English Pages 288 [299] Year 2024
“Evaluating Research in Academic Journals: A Practical Guide to Realistic Evaluation is a textbook that must be adopted for anyone teaching a course in Research Methods. From the first day of class students can begin gathering the knowledge, jargon-free, to precisely and properly evaluate peer-reviewed academic research articles. The material presented is unique in its coverage not only of quantitative-based research, but also that involving qualitative, mixed methods, as well as action-oriented research. A student- and instructor-friendly read that packs a powerful punch of information.” – Penny M. Geyer, John Jay College, the City University of New York (CUNY), USA
“The latest edition of Evaluating Research in Academic Journals continues to improve on a classic ‘how to’ research text, and one that is written in an incredibly accessible manner. This book will be helpful to students, of all levels, backgrounds, and disciplines, who are evaluating existing scientific research for a dissertation or thesis project, a class paper, or even just to become better informed about the research on a given topic. Along with the other updates in the latest edition, the addition of a new chapter devoted entirely to evaluating qualitative research makes this text even more relevant and important to students and budding scientific researchers. At a minimum, this book should be in the personal academic library of every graduate student in a research-oriented field, bearing dozens of dog-eared pages, a fraying coffee-stained cover, and multiple creases along the spine from the many times it has been read and reread during the graduate school years (and beyond).” – Kelly M. Socia, University of Massachusetts Lowell, USA
“This book is an incredibly useful complement to traditional research methods texts, which typically focus on idealized and abstracted procedures. Here, students are presented with the realities of the conduct of research and are given tools for assessing empirical articles from stem to stern. Checklists, examples, and exercises provide structured opportunities for students to apply their newfound skills.” – Sonja Siennick, Florida State University, USA
“Evaluating Research in Academic Journals: A Practical Guide to Realistic Evaluation is itself an example of the hallmarks of the best academic writing – it’s written and organized in a way that distills the complexities of academic research into a format that students will find digestible, intellectually engaging, and actionable. The text is written for students in academic programs, but it would also be helpful for authors as a final checklist before submitting their own work for peer-review. Tcherni-Buzzeo and Pyrczak have expertly presented an intimidating topic for students in a way that students will find edifying, and dare I say even enjoyable! The text opens with guidelines for reviewing research and a reflective self-assessment. Each chapter guides students on a specific part of empirical research as presented in academic journals (e.g., beginning with evaluating titles, abstracts, and finishing with evaluating discussions) and includes built-in exercises with author commentary, guided class assignments, and real examples from the literature. This flow of the text will help make students better consumers of academic research and will set a solid foundation for those who wish to produce their own scholarly research in the future.” – Michael Jenkins, University of Scranton, USA
“This is a wonderful book that aims to help students become intelligent consumers of research in the social and behavioral sciences by reading and evaluating it for themselves. It provides students with clear and practical advice about how to assess each aspect of an article, from beginning to end, with relevant examples and useful exercises. Best of all, the book encourages students not to expect perfection from the research they read; instead, it teaches them to accept the inherent limitations of the research process without thinking that all studies are equally flawed or undervaluing its cumulative contribution to knowledge.” – Walter Forrest, School of Law, University of Limerick, Ireland
Evaluating Research in Academic Journals
Evaluating Research in Academic Journals is a guide for students learning how to evaluate reports of empirical research published in academic journals. It breaks down the process of evaluating a journal article into easy-to-understand steps and emphasizes the practical aspects of evaluating research. The book describes the nuances that may make an article publishable, even when it has serious methodological flaws. Students learn when and why certain types of flaws may be tolerated, and why evaluation should not be performed mechanically. Each chapter is organized around evaluation questions, and the book includes numerous examples from journals in the social and behavioral sciences to illustrate the application of evaluation questions and provide actual instances of strong and weak features of published reports. Common-sense models for evaluation, combined with a lack of jargon, make it possible for students to start evaluating research articles in the first week of class, making this the ideal textbook for instructors and students across a range of disciplines.
New to this edition:
■ A new chapter on Types of Research
■ Coverage of the new realities of online survey methods and research using big data
■ A new appendix on Emerging Issues in Survey Research
■ More emphasis and information on qualitative research, case studies, and action research
■ Expanded discussion of research ethics, including additional research-ethics-oriented guidelines, and new appendices devoted to noteworthy cases of research ethics breaches.
The accompanying Instructor and Student Resources provide free digital materials designed to test student knowledge and save time when preparing lessons, including over 150 multiple-choice questions, articles, videos, and weblinks for students to test their knowledge of the material and further their understanding of concepts; and downloadable lecture slides and test banks for instructors.
Maria Tcherni-Buzzeo is Professor and Director of the Ph.D. Program in Criminal Justice at the University of New Haven, USA. She received her Ph.D. in Criminal Justice from the University at Albany (SUNY), and her research has been published in the Journal of Quantitative Criminology, Justice Quarterly, Aggressive Behavior, Journal of Developmental and Life-Course Criminology, and other academic outlets.
Evaluating Research in Academic Journals A Practical Guide to Realistic Evaluation Eighth Edition
Maria Tcherni-Buzzeo and Fred Pyrczak
Designed cover image: Philip Thurston / Getty images Figures produced with DALL-E® Eighth edition published 2024 by Routledge 605 Third Avenue, New York, NY 10158 and by Routledge 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN Routledge is an imprint of the Taylor & Francis Group, an informa business © 2024 Maria Tcherni-Buzzeo and Fred Pyrczak The right of Maria Tcherni-Buzzeo and Fred Pyrczak to be identified as authors of this work has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. The purchase of this copyright material confers the right on the purchasing institution to photocopy or download pages which bear a copyright line at the bottom of the page. No other parts of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe. First edition published by Pyrczak Publishing 1999 Seventh edition published by Routledge 2019 ISBN: 978-1-032-42408-8 (hbk) ISBN: 978-1-032-42409-5 (pbk) ISBN: 978-1-003-36266-1 (ebk) DOI: 10.4324/9781003362661 Typeset in Times New Roman and Trade Gothic by Apex CoVantage, LLC Access the Instructor and Student Resources: www.routledge.com/cw/tcherni-buzzeo
Contents
Introduction to the Eighth Edition
1. Types of Research
2. Background for Evaluating Research
3. Evaluating Titles
4. Evaluating Abstracts
5. Evaluating Introductions and Literature Reviews
6. Evaluating Samples When Researchers Generalize
7. Understanding and Evaluating Context-Specific Research (Samantha A. Tosto)
8. Evaluating Measures
9. Evaluating Experimental Procedures
10. Evaluating Analysis and Results Sections: Quantitative Research
11. Evaluating Methods and Analysis Sections: Qualitative Research (Stephanie Bonnes)
12. Evaluating Analysis and Results Sections: Mixed Methods Research (Anne Li Kringen)
13. Evaluating Discussion Sections
14. Evaluating Systematic Reviews and Meta-Analyses: Towards Evidence-Based Practice
15. Putting It All Together
Concluding Comment
Appendix A1: Research Ethics: Two of the Most (In)famous Studies
Appendix A2: Research Ethics: An Egregious Case that Led to Children’s Deaths
Appendix B: Program/Policy Evaluation
Appendix C: Limitations of Significance Testing
Appendix D: Emerging Issues in Survey Research
Appendix E: Checklist of Evaluation Questions
Index
Introduction to the Eighth Edition
When students in social and behavioral sciences take advanced courses in their major field of study, they are often required to read and evaluate original research reports published as articles in academic journals. This book is designed as a guide for students who are first learning how to engage in this process.
Major Assumptions
First, it is assumed that the students using this book have limited knowledge of research methods, even though they may have taken a course in introductory research methods (or may be using this book while taking such a course). Because of this assumption, technical terms and jargon, such as true experiment, are defined when they are first used in this book. Second, it is assumed that students have only a limited grasp of elementary statistics. Thus, the chapter on evaluating statistical reporting in research reports is confined to criteria that such students can easily comprehend. Finally, and perhaps most importantly, it is assumed that students with limited backgrounds in research methods and statistics can produce adequate evaluations of research reports – evaluations that get to the heart of important issues and allow students to draw sound conclusions from published research.
This Book Is Not Written for …
This book is not written for journal editors or members of their editorial review boards. Such professionals usually have had firsthand experience in conducting research and have taken advanced courses in research methods and statistics. Published evaluation criteria for use by these professionals are often terse, full of jargon, and composed of many elements that cannot be fully comprehended without advanced training and experience. This book is aimed at a completely different audience: students who are just beginning to learn how to evaluate original reports of research published in journals.
Applying the Evaluation Questions in This Book
Chapters 3 through 15 are organized around evaluation questions that may be answered with a simple “yes” or “no,” where a “yes” indicates that students judge a characteristic to be satisfactory. However, for evaluation questions that deal with complex issues, students may also want to rate each one using a scale from 1 to 5, where 5 is the highest rating. In addition, N/A (not applicable) may be used when students believe a characteristic does not apply, and I/I (insufficient information) may be used if the research report does not contain sufficient information for informed judgment.
Note from the Authors
I took over the updating of this text for its previous, 7th edition, due to Fred Pyrczak’s untimely departure from this earth in 2014. His writing in this book is amazing: structured, clear, and concise. It is no surprise that the text has been highly regarded by multiple generations of students who have used it in their studies. In fact, many students in my Methods classes have commented on how much they like this text and how well-written and helpful it is. I have truly enjoyed updating the 7th and now 8th editions of this book for the new generation of students and have tried my best to retain all the strengths of Fred’s original writing. I am also grateful to my colleagues Samantha A. Tosto, Stephanie Bonnes, and Anne Li Kringen, who are experts in their respective fields and contributed relevant chapters to the current edition.
I have added a new Chapter 1 (Types of Research) to this edition, as well as new appendices on research ethics and emerging issues in survey research. The rest of the chapters and appendices have been updated throughout. I hope these will serve you well in your adventures of reading research articles!
You may also be wondering whether the rise of artificial intelligence (AI) tools can replace your judgment when evaluating research. My answer is: probably not. Moreover, learning to understand and evaluate research is like building muscles – will AI build your intellectual muscle for you in the absence of hard work? ;)
This book is devoted to my amazing family – they light up my world!
Maria Tcherni-Buzzeo
New Haven, 2023
My best wishes are with you as you master the art and science of evaluating research. With the aid of this book, you should find the process both undaunting and fascinating as you seek defensible conclusions regarding research on topics that interest you.
Fred Pyrczak
Los Angeles, 2014
CHAPTER 1
Types of Research
Before we dive into the nitty-gritty of research types and assess studies published in peer-reviewed journals, it is important to look at the bigger picture and figure out the goals of science.
Question 1: Why do scientists conduct their research? What are they trying to achieve (besides satisfying their personal curiosity)? What are the benefits of research to humankind? What answer would you give before you read the answer below?
Answer: Research helps us to understand how the world works and how to make it better. If you think of innovations and breakthroughs in various fields of science – from physics to medicine to public health to psychology – they point out something that was not clear before, discover the way that things work or may work around us or inside us, and then serve as a basis for subsequent solutions and products that improve our lives.
An example of such a breakthrough in social and behavioral sciences is a relatively recent discovery that genetics plays a huge role in determining personality traits, intellectual abilities, and mental health conditions of children, with the remaining environmental influences mostly coming from sources other than parents or their child-rearing methods. The social scientist who contributed the most to this breakthrough and its popularization was Judith Rich Harris with her book The Nurture Assumption, originally published in 1998.¹ In this groundbreaking book, she synthesized and reinterpreted a vast amount of research from behavioral genetics, psychology, education, and several other disciplines.
Even though the revelation about the limited role of parental influence beyond genetics may upset or even offend some people, it has been hugely liberating to many others. For example, before the advent of genetic research, parents of children with autism, especially mothers, were considered the culprits of their children’s condition due to the lack of warmth that the parents exhibited to their children. This now debunked theory is called the “refrigerator mother theory.”² It must have been a very painful experience for these parents who were blaming themselves when, in reality, they did nothing wrong to cause autism in their children. Thanks to science, their mental suffering has been alleviated. Moreover, knowing the true causes of adverse events allows us to focus our efforts on correct prevention mechanisms to improve the situation. In this example, it would have been futile and counterproductive to focus on improving parental warmth as a “cure” for their children’s autism.
¹ Harris, J. R. (2009). The nurture assumption: Why children turn out the way they do. Free Press. ISBN: 9781439135082. www.simonandschuster.com/books/The-Nurture-Assumption/Judith-Rich-Harris/9781439135082
² See more information on it in the online resources for this chapter.
DOI: 10.4324/9781003362661-1
To summarize, scientists conduct research to figure out how the world works; this type of research is called pure research or basic research. People can then apply the newly acquired knowledge of how things work to arrive at solutions for how to make the world a better place; applied research is focused on this goal.
Question 2: What are the examples of pure research? Are there different types of pure research? How do they help us better understand the world?
Answer: Again, the main purpose of pure research is to arrive at a clearer and more accurate picture of reality. Before embarking on the noble task of improving the world, we need to better understand how it works. There are several types of research that accomplish this important goal.
Exploratory studies take a closer look at specific phenomena to better understand people’s experiences. These are usually qualitative studies that use observation, interviews, and document analysis (ethnographic methods) to gain an in-depth understanding of participants’ feelings, beliefs, and perspectives (Chapter 11 of this book discusses the methodology of qualitative studies in more detail). For instance, Copes and colleagues (2019) used ethnographic research and visual imagery expressed in contextualized photographs to better understand and relay the experiences of methamphetamine drug users and to counteract stigma. Example 1.2.1 contains an excerpt in which the authors explain the goals of their study.
Example 1.2.1 – Copes et al. (2019)³
THE GOALS OF AN EXPLORATORY STUDY: BETTER UNDERSTANDING
While there are certainly deleterious effects of meth use and the stereotypes often ring true, existing narratives and imagery fall short of describing the more complex, and contradictory, realities of people’s lives. Indeed, people are complex. Even those who use meth daily are more than the stigmatized monsters portrayed in media (Boeri, 2013; Marsh, Copes, & Linnemann, 2017). They can be caring mothers, dedicated friends, and sympathetic listeners. But these sides to them are easy to ignore or brush aside when confronted with images of them on their worst days (e.g., mugshots). While we do not wish to romanticize those who use meth, we do think it is important to see them as the complex people they are. With this in mind, we engaged in a photo-ethnography of people who use meth in rural Alabama with a larger goal of acting as a counter-narrative and counter-visual to these general perceptions. Our aim was to go beyond presenting mere visuals and instead explicate the meaning of these visuals to produce and present contextualized, representative images of people who use meth. Our aim for the project was to understand how people who use meth in rural Alabama make sense of their lives and navigate their drug use within the context of rural poverty.
Descriptive studies do what is implied in their name: they describe a phenomenon using statistics such as averages, percentages, and rates. These are usually quantitative studies that help us better understand the big picture, in contrast to the up-close and personal look at things in qualitative studies. Chapter 10 of this book focuses on assessing quantitative studies.
Some examples of descriptive quantitative studies are as follows:
■ comparing male and female college students in the United States in terms of their physical activity, sleep quality, and mood disorders (Example 1.2.2),
■ comparing European countries in terms of educational losses of schoolchildren during the COVID-19 pandemic (Example 1.2.3), and
■ assessing how the prevalence of mental illness among incarcerated people in Australia changed over the last two decades (Example 1.2.4).
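For readers who find it helpful to see the arithmetic, the three kinds of statistics named above (averages, percentages, and rates) can be sketched in a few lines of Python. This is only an illustration: the sleep values, the 7-hour cutoff used to flag short sleepers, the event count, and the function names are all invented for this sketch and are not taken from any of the studies cited in this chapter.

```python
from statistics import mean

# Hypothetical nightly sleep (in hours) reported by five survey respondents.
# These numbers are invented for illustration only.
sleep_hours = [6.5, 7.0, 5.5, 8.0, 6.0]

# Flag respondents who report fewer than 7 hours (an assumed cutoff).
short_sleepers = [hours < 7 for hours in sleep_hours]


def percentage(flags):
    """Share of True values in a list, expressed as a percentage."""
    return 100 * sum(flags) / len(flags)


def rate_per_100k(events, population):
    """Number of events per 100,000 people in the population."""
    return events * 100_000 / population


average_sleep = mean(sleep_hours)        # an average
pct_short = percentage(short_sleepers)   # a percentage
# A rate: 250 hypothetical events in a population of 1,000,000 people.
example_rate = rate_per_100k(250, 1_000_000)
```

An average summarizes a typical value, a percentage summarizes how common a characteristic is within a sample, and a rate puts a count on a common population base so that groups of different sizes can be compared.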
Example 1.2.2 – Glavin et al. (2022)⁴
THE GOALS OF A DESCRIPTIVE STUDY: GENDER COMPARISONS
Insufficient sleep is a serious and growing public health problem among college students. Approximately 32% of those 18 to 24 years of age report habitually sleeping less than the recommended 7 to 9 hours per night (Liu et al., 2016), and college students rate sleep difficulties second only to stress as a factor that negatively affects academic performance (American College Health Association, 2019). … To date, very few studies have examined the relationships between Physical Activity Guideline adherence, exercise frequency, sleep, and mood in college students. … This study was designed to fill this gap and determine if there are gender differences in the relationships between exercise frequency and physical activity guideline adherence, and sleep and mood outcomes.
³ Copes, H., Tchoula, W., & Ragland, J. (2019). Ethically representing drug use: Photographs and ethnographic research with people who use methamphetamine. Journal of Qualitative Criminal Justice and Criminology, 8(1), 21–36. https://doi.org/10.21428/88de04a1.2e48b8e5
⁴ Glavin, E. E., Matthew, J., & Spaeth, A. M. (2022). Gender differences in the relationship between exercise, sleep, and mood in young adults. Health Education & Behavior, 49(1), 128–140. https://doi.org/10.1177/1090198120986782
Example 1.2.3 – Blaskó et al. (2022)⁵
THE GOALS OF A DESCRIPTIVE STUDY: CROSS-NATIONAL COMPARISONS
It is widely discussed that the pandemic has impacted educational inequalities across the world. However, in contrast to data on health or unemployment, data on education outcomes are not timely. Hence, we have extremely limited knowledge about pandemic-related learning losses at the national and cross-national levels. As it might take years to get suitable comparative data, this study uses the latest large-scale international achievement survey from before the pandemic, the Trends in International Mathematics and Science Study 2019, to answer two research questions. First, which European countries are most likely to have experienced higher learning loss among their children? Second, which European countries have most likely experienced the greatest increases in learning inequalities? Results based on 4th graders’ school achievements indicate that educational inequalities between and within countries are likely to have augmented substantially throughout Europe. Some European countries are probably already facing an education crisis.
⁵ Blaskó, Z., Costa, P. D., & Schnepf, S. V. (2022). Learning losses and educational inequalities in Europe: Mapping the potential consequences of the COVID-19 crisis. Journal of European Social Policy, 32(4), 361–375. https://doi.org/10.1177/09589287221091687
Example 1.2.4 – Browne et al. (2023)⁶
THE GOALS OF A DESCRIPTIVE STUDY: ASSESSING PREVALENCE
Rates of mental illness in people imprisoned in Australia are higher than in the general population (Butler et al., 2006), a finding that is consistent with prison mental illness prevalence studies worldwide (Fazel and Seewald, 2012). … Despite consistent findings regarding elevated rates of mental illness in prisons, prevalence estimates vary widely between studies. … To date, no studies have directly examined the prevalence of mental illness in prison over time using data from repeated surveys. … The current study examines data from three large prison health surveys conducted in New South Wales (NSW) over a 15-year period, to determine whether the rates of mental illness in custody are increasing, in line with increases in self-reported mental illness in the general population and the apparent trend in national prison surveys. Any increase in the prevalence of mental illness in prisons over time has important implications for planning and resourcing both local health and prison services given the poorer health and criminal justice outcomes of this group.
Most studies in social and behavioral sciences are descriptive. Not all of them use strictly quantitative methods; some of them use mixed methods research, described in detail in Chapter 12 of this textbook. Overall, exploratory and descriptive studies help generate hypotheses about why things happen in the way they do, that is, help formulate causal explanations. Such causal explanations, in their complete form, become a theory. Testing theories and explanations is the goal of explanatory studies.
⁶ Browne, C. C., Korobanova, D., Yee, N., Spencer, S. J., Ma, T., Butler, T., & Dean, K. (2023). The prevalence of self-reported mental illness among those imprisoned in New South Wales across three health surveys, from 2001 to 2015. Australian & New Zealand Journal of Psychiatry, 57(4), 550–561. https://doi.org/10.1177/00048674221104411
Explanatory studies are a type of pure research held in high esteem in science. These are the studies that test cause-and-effect relationships or explanations of how things work. A typical explanatory study tests a theory or hypothesis that X has an effect on Y. Examples of explanatory studies are as follows: in Example 1.2.5, a study tests whether, in the United States, drivers’ race (i.e., visible skin color) affects police officers’ decisions to stop the driver; in Example 1.2.6, a study tests the self-determination theory on public healthcare workers in Italy, determining whether their job satisfaction is affected by their sense of autonomy, competence, and relatedness.
Example 1.2.5 – Pierson et al. (2020)⁷
THE GOALS OF AN EXPLANATORY STUDY: ASSESSING CAUSE-AND-EFFECT
[W]e assess potential bias in stop decisions by applying the ‘veil of darkness’ test developed by Grogger and Ridgeway [21]. This test is based on a simple observation: because the sun sets at different times throughout the year, one can examine the racial composition of stopped drivers as a function of sunlight while controlling for time of day.… If black drivers comprise a smaller share of stopped drivers when it is dark and accordingly difficult to determine a driver’s race, that suggests black drivers were stopped during daylight hours in part because of their race. In both state patrol and municipal police stops, we find that black drivers comprise a smaller proportion of drivers stopped after sunset, suggestive of discrimination in stop decisions.
⁷ Pierson, E., Simoiu, C., Overgoor, J., Corbett-Davies, S., Jenson, D., Shoemaker, A., ... & Goel, S. (2020). A large-scale analysis of racial disparities in police stops across the United States. Nature Human Behaviour, 4(7), 736–745. https://doi.org/10.1038/s41562-020-0858-1
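The logic of the “veil of darkness” comparison quoted in Example 1.2.5 can be illustrated with a deliberately simplified sketch. It is not the authors’ analysis, which draws on a very large dataset and statistical controls for time of day. The sketch only compares the share of Black drivers among stops made in daylight with the share among stops made after dark. The records, the `Stop` class, and the function name are hypothetical, and all stops are assumed to fall within the same clock-time window.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    driver_race: str  # hypothetical coding, e.g., "black" or "white"
    is_dark: bool     # True if the stop occurred after sunset


def share_of_group(stops, group, dark):
    """Proportion of stops involving `group` among stops made in the given lighting."""
    subset = [s for s in stops if s.is_dark == dark]
    if not subset:
        raise ValueError("no stops recorded for this lighting condition")
    return sum(s.driver_race == group for s in subset) / len(subset)


# Invented records, assumed to share the same clock-time window
# (for example, stops made around 7 p.m. in different seasons).
stops = [
    Stop("black", False), Stop("black", False),
    Stop("white", False), Stop("white", False),
    Stop("black", True), Stop("white", True),
    Stop("white", True), Stop("white", True),
]

daylight_share = share_of_group(stops, "black", dark=False)
dark_share = share_of_group(stops, "black", dark=True)

# A positive gap means Black drivers make up a larger share of stops
# when officers can more easily see who is driving.
gap = daylight_share - dark_share
```

In these invented data, the share is higher in daylight than after dark, which is the pattern the quoted passage describes as suggestive of discrimination. A real analysis would also need to establish that the difference is larger than what chance alone could produce.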
Example 1.2.6 – Battaglio et al. (2022)⁸
THE GOALS OF AN EXPLANATORY STUDY: TESTING A THEORY
This study puts self-determination theory to an empirical test through a series of discrete choice experiments across three samples of public healthcare workers, for a total of 4,743 subjects. The three replications provide convergent evidence in support of the hypotheses that autonomy, competence, and three types of relatedness – with supervisors, peers, and beneficiaries – simultaneously and independently increase employee satisfaction. Meaningful differences emerge in the relative importance of those five factors. In particular, the fulfilment of one’s need for competence turns out to have the greatest positive impact across experimental replications, whereas the need for autonomy consistently comes last.
To summarize, these are some of the ways scientists conduct pure research to better understand how the world works: through exploratory, descriptive, and explanatory studies. To explain the patterns and regularities observed and described in exploratory and descriptive studies, scientists construct theories that are then tested in explanatory studies. Once we gain a better understanding of reality through pure research, this knowledge can be put to work in order to improve the world. This is where applied research becomes very helpful because it focuses on solving practical problems, improving decision making, and bridging the gap between theory and practice.
Question 3: What are the examples of applied research? How does it help to improve the world?
Answer: To demonstrate how knowledge acquired through pure research can be applied to solving real-world problems, let us consider some of the insights gained from the examples above. In the explanatory study highlighted in Example 1.2.6, Battaglio and colleagues (2022) found that what affected employee job satisfaction the most was their sense of competence. This research finding can be translated into developing a program intended to improve employees’ satisfaction and retention by improving their competence: for example, offering regular trainings to help workers upskill or recognizing employees with special awards conferred on a regular basis in several important areas of achievement. If a program of this type were to be instituted, it would be important to assess its effectiveness in improving employee satisfaction. This is exactly what evaluation studies do, and they constitute the most common type of applied research.
⁸ Battaglio, R. P., Belle, N., & Cantarelli, P. (2022). Self-determination theory goes public: Experimental evidence on the causal relationship between psychological needs and job satisfaction. Public Management Review, 24(9), 1411–1428. https://doi.org/10.1080/14719037.2021.1900351
Evaluation studies focus, as their name suggests, on evaluating the effectiveness of a policy or program. They help us determine what works and what does not when we try to make the world a better place. For example, given the evidence of an increasing prevalence of mental illness among inmates (from descriptive studies like the one by Browne and colleagues (2023) in Example 1.2.4), it would be important to evaluate whether providing mental health treatment to people during imprisonment and after release from prison would reduce their chances of committing another crime. In fact, evaluation studies of this type were conducted in Australia by Thomas and her colleagues (2022) – see Example 1.3.1 – and in Canada by Palis and her colleagues (2022) – see Example 1.3.2.
Example 1.3.1 – Thomas et al. (2022)⁹
THE GOALS OF APPLIED RESEARCH: EVALUATING THE EFFECTIVENESS OF A POLICY OR PROGRAM (AUSTRALIA)
People released from prison who experience mental health and substance use problems are at high risk of reincarceration. This study aimed to examine the association between contact with mental health and substance use treatment services, and reincarceration, among adults released from prison.
Example 1.3.2 – Palis et al. (2022) 10 THE GOALS OF APPLIED RESEARCH: EVALUATING THE EFFECTIVENESS OF A POLICY OR PROGRAM (CANADA)
Diagnosis of mental disorder is prevalent among people who have been incarcerated. [...] In Canada, nearly all incarcerated people (>95%) will eventually return to the community [7,8]. The days and weeks immediately following release represent a time of elevated risk for poor outcomes, such as overdose, recidivism, and death [9–11]. This cohort study investigates (1) the association of postrelease mental health services access (MHSA) with reincarceration risk and (2) the association of timeliness of MHSA with time to reincarceration. These findings can be used to identify current gaps in the provision of care, highlighting the proportion of people with mental disorder diagnoses for whom mental health service needs remain unmet in the transition from corrections to community.
9 Thomas, E. G., Spittal, M. J., Taxman, F. S., Puljević, C., Heffernan, E. B., & Kinner, S. A. (2022). Association between contact with mental health and substance use services and reincarceration after release from prison. PLoS ONE, 17(9), e0272870. https://doi.org/10.1371/journal.pone.0272870
10 Palis, H., Hu, K., Rioux, W., Korchinski, M., Young, P., Greiner, L., ... & Slaunwhite, A. (2022). Association of mental health services access and reincarceration among adults released from prison in British Columbia, Canada. JAMA Network Open, 5(12), e2247146. https://doi.org/10.1001/jamanetworkopen.2022.47146
Interestingly, the results of these two studies were different. The study in Australia found that the use of mental health services is not effective in preventing subsequent repeat incarceration while the study in Canada found that mental health services substantially reduce subsequent reincarceration among the mentally ill. In fact, such situations are common. Multiple evaluation studies of the same type of program or policy are often conducted – in different locations, at different times, and with different populations – and they may produce different findings. Meta-analysis is one of the best options for combining information from multiple studies and making sense of the resulting picture. Chapter 14 of this book discusses meta-analyses and systematic reviews and provides guidelines for evaluating these types of studies.
Returning to the topic of single evaluation studies (not aggregated into a meta-analysis), they encompass a wide range. Some of them assess the impact of broad governmental policies – see the study of Shah, Britton, and Bogdanovica (2022) on assessing e-cigarette regulations/policies and the associated smoking cessation among people in European Union countries, in Example 1.3.3. Other evaluation studies may focus on the impact of a specific intervention on campus – see the evaluation of an educational program for undergraduate students, intended to reduce their meat consumption, in Example 1.3.4.
Example 1.3.3 – Shah et al. (2022) 11 EVALUATION STUDY: ASSESSING POLICY EFFECTS (EU COUNTRIES)
E-cigarettes have been available on European markets since 2007, and in 2014 in the European Union (EU) around 48.5 million people had ever used an e-cigarette (Farsalinos et al., 2016). While the overall prevalence of daily use of e-cigarettes among Europeans remains low at between 1% and 2.9% (Kapan et al., 2020), prevalence varies considerably across the EU countries with ever use ranging from around 5% in Portugal and Italy to about 20% in Latvia (Eurobarometer, 2017).… The aim of this study was to develop and test an equivalent regulatory and policy control scale to quantify the implementation of e-cigarette policies and to investigate associations between e-cigarette scale scores and a range of variables including e-cigarette market size, prevalence of e-cigarette use and changes in the proportion of former smokers across the current 27 EU nations and the UK.… [Our results] suggest that countries with more regulation of e-cigarette sales, advertising and safety might be more successful in promoting quitting and reducing harm from tobacco smoking.
11 Shah, A., Britton, J., & Bogdanovica, I. (2022). Developing a novel e-cigarette regulatory and policy control scale: Results from the European Union. Drugs: Education, Prevention and Policy, 29(6), 719–725. https://doi.org/10.1080/09687637.2021.1959520
Example 1.3.4 – Jalil et al. (2020) 12 EVALUATION STUDY: ASSESSING PROGRAM EFFECTS (COLLEGE CAMPUS IN THE UNITED STATES)
Meat consumption is a major driver of climate change. Interventions that reduce meat consumption may improve public health and promote environmental sustainability. We conducted a randomized controlled trial to examine the effects of an awareness-raising intervention on meat consumption. We randomized undergraduate classes into treatment and control groups. Treatment groups received a 50-minute lecture on how food choices affect climate change, along with information about the health benefits of reduced meat consumption. Control classrooms received a lecture on a placebo topic. We analyzed 49,301 students’ meal purchases in the college dining halls before and after the intervention. We merged food purchase data with survey data to study heterogenous treatment effects and disentangle mechanisms. Participants in the treatment group reduced their purchases of meat and increased their purchases of plant-based alternatives after the intervention. [...] Our study provides evidence that an intervention based on informing consumers and encouraging voluntary shifts can effectively reduce the demand for meat. Our findings help to inform the international food policy debate on how to counter rising global levels of meat consumption to achieve climate change goals.
12 Jalil, A. J., Tasoff, J., & Bustamante, A. V. (2020). Eating to save the planet: Evidence from a randomized controlled trial using individual-level food purchase data. Food Policy, 95, 101950. https://doi.org/10.1016/j.foodpol.2020.101950
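For readers who find it easier to see a procedure than to read about it, the random assignment of whole classes described in Example 1.3.4 can be sketched in a few lines of Python. This is only an illustration of the general idea, not the procedure the study's authors actually used: the class labels, the equal split, and the function name `assign_classes` are all assumptions made for the example.

```python
import random


def assign_classes(class_ids, seed=None):
    """Randomly split class IDs into treatment and control groups.

    Illustrative sketch only: a real study may stratify, block, or use
    unequal group sizes. A seed makes the assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(class_ids)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)        # chance, not the researcher, decides the order
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical class labels (not taken from the study)
classes = [f"class_{i}" for i in range(1, 11)]
groups = assign_classes(classes, seed=42)
print(groups)
```

Because chance alone decides which classes hear the lecture of interest, any pre-existing differences between classes should, on average, balance out across the two groups.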
You may have noticed that, in Example 1.3.4, the evaluation of an educational program on campus (the one that is meant to reduce students’ meat consumption) was conducted using a randomized controlled trial (RCT). This is a technical term for one of the most rigorous methods for testing the effectiveness of a policy or program – experimental research, in which participants are randomly assigned to different groups (treatment and control), with the treatment group receiving the intervention of interest and the control group receiving an alternative program or “treatment as usual.” Chapter 9 of this book delves into the topic of experiments and points out the most important aspects to pay attention to when evaluating experimental research studies.
Action research is an important type of applied research. Its main distinguishing feature is that it includes the collaboration of researchers with community partners, with an explicit goal of addressing real-world problems or issues and producing changes within a specific context. Community partners are practitioners and organizations that are best equipped to know the situation “on the ground,” advise in terms of issues and problems that need to be resolved, and identify potential obstacles standing in the way. These community partners, or stakeholders, participate in every stage of the research process: from design to analysis to implementation. This is why this type of research is also called participatory action research or community-based research. It is conducted in multiple fields: education, public health, social work, management, and others. More information about action research is provided in Chapter 7, but to get a sense of what it looks like, consider Example 1.3.5 about participatory action research to improve the eating habits of children in Nepal.
Example 1.3.5 – Upreti et al. (2023) 13 PARTICIPATORY ACTION RESEARCH: NUTRITION AND EDUCATION (NEPAL)
Nutrition education at school can contribute to developing healthy nutritional behaviours in schoolchildren. This paper critically reflects on how participatory action research (PAR) empowered university researchers and a school community to co-develop a school-based nutrition education programme (SBNEP) that promotes healthy nutritional behaviours in basic-level schoolchildren (Grades 1–8).… This study was conducted in a public school located in the Chitwan district of Nepal from June 2018 to August 2022. The study involved basic-level schoolteachers, fourth to eighth-grade students and their parents/guardians, school leaders, and the PAR committee members as the co-researchers. The study used in-depth interviews, focus groups, participant observation, informal talks, and bridging-the-gap workshop methods. The interpretive phenomenological method was used to explain the meaning of the data. The findings of the study reflect that exploring the needs for good nutritional behaviours, prioritising them, and co-designing the SBNEP utilising the PAR methodology is a time-consuming project since it demands prolonged fieldwork, self-motivation, commitment, action with critical reflection (praxis), dialogic relation, and negotiation skills from both researchers and co-researchers.
13 Upreti, Y. R., Devkota, B., Bastien, S., & Luitel, B. C. (2023). Developing a school-based nutrition education programme to transform the nutritional behaviours of basic-level schoolchildren: A case from participatory action research in Nepal. Educational Action Research, [online first]. https://doi.org/10.1080/09650792.2023.2206580
Another important category of applied research is translational research. Translational research is specifically focused on using fundamental scientific discoveries, sometimes from several areas of basic research, and putting them to use, or ‘translating’ them into policy and practice. The real-world implementation of research-based improvements is the goal of translational research. At the same time, this type of research is rarely published in peer-reviewed academic journals; it is much more likely to appear in professional magazines geared toward practitioners and policymakers (and thus, it is largely outside the scope of this book).
Together, pure research and applied research comprise empirical studies. The word empirical means based on data or evidence collected through a systematic data collection method, not based on pure logic or conjecture. Thus, empirical research articles specifically include an original analysis of empirical data (data can be qualitative, quantitative, or mixed). However, empirical studies are not the only type of article published in peer-reviewed journals. See Figures 1.1 and 1.2 illustrating various types of research and their relationships.
Figure 1.1 Types of research, based on nature of the data.
Question 4: What types of articles published in academic journals are NOT empirical? How are they useful?
Answer: In addition to empirical articles, journals publish the following types of articles:
■ literature reviews (including systematic reviews)
■ theoretical analyses
■ opinions, commentaries, and critiques
■ book reviews
Although none of these article types is empirical, they still contribute to the accumulation, interpretation, and dissemination of knowledge and to moving science forward. To properly evaluate non-empirical articles such as theoretical analyses and other similar pieces, one needs to be knowledgeable in the relevant field of science, so the evaluation of these types of articles falls outside the scope of this book (since it is intended for students who are just getting more familiar with academic articles). Guidelines for evaluating literature reviews (both a standalone review and the one that constitutes the front part of an empirical article) are discussed in Chapter 5 of this book. Guidelines for evaluating systematic reviews synthesizing multiple empirical studies selected in a systematic way are discussed in Chapter 14.
Figure 1.2 Types of research overall, including non-empirical articles.
Question 5: What other types of research are important to know?
Answer: You may have heard before of such terms as cross-sectional and longitudinal research. This distinction is based on whether the study was conducted at one point in time (cross-sectional) or at multiple time points, with the same participants being studied annually or every few years, to determine how things change over time (longitudinal).
Another important term used in the social sciences is survey research. Survey research can be used to identify public opinion on a variety of issues such as gun control, climate change, and healthcare. Policymakers can then use this information to make decisions regarding these issues. Many changes have occurred in survey research and the way surveys are administered and evaluated, as technology and people’s preferences for communication (in-person, calls, emails, texts, social media messages, etc.) are changing. Appendix D discusses emerging issues in survey research.
Finally, it is important to mention the term field research, which refers to research conducted in a natural or real-world setting as opposed to a laboratory. Field research provides an important context and ability for social scientists to immerse themselves in an environment where things are happening naturally, as well as to test changes in the way they unfold in real life (see examples of natural experiments in Chapter 9). Example 1.5.1 is an excerpt from a study that uses both survey and field research to better understand the prevalence of domestic violence against women in Pakistan.
Example 1.5.1 – Andersson et al. (2009) 14 FIELD AND SURVEY RESEARCH: DOMESTIC VIOLENCE AGAINST WOMEN (PAKISTAN)
This article describes the first national survey of violence against women in Pakistan from 2001 to 2004 covering 23,430 women. The survey took account of methodological and ethical recommendations, ensuring privacy of interviews through one person interviewing the mother-in-law while another interviewed the eligible woman privately. The training module for interviewers focused on empathy with respondents, notably increasing disclosure rates. Only 3% of women declined to participate, and 1% were not permitted to participate. Among women who disclosed physical violence, only one third had previously told anyone. Surveys of violence against women in Pakistan not using methods to minimize underreporting could seriously underestimate prevalence.
14 Andersson, N., Cockcroft, A., Ansari, N., Omer, K., Chaudhry, U. U., Khan, A., & Pearson, L. (2009). Collecting reliable information about violence against women safely in household interviews: Experience from a large-scale national survey in South Asia. Violence Against Women, 15(4), 482–496. https://doi.org/10.1177/1077801208331063
Example 1.5.1 also illustrates how some types of research can overlap: the study described is descriptive (pure) research conducted as field research while using survey research methods. If the study is conducted at one point in time, it is cross-sectional. If it is repeated with the same respondents several years later, it becomes longitudinal. See Figures 1.3 and 1.4 for illustration.
Figure 1.3 Types of research, based on timing.
Figure 1.4 Types of research, including overlapping.
Question 6: How can I determine the type of research?
Answer: Most of the time, carefully reading the title and abstract of a journal article helps you figure out the type of research used in the study. Examples 1.2.3, 1.2.6, 1.3.4, and 1.5.1 above use abstracts of the articles because these abstracts convey information about the study type and goals very well. Check the title of each article these examples are from (see links in the footnotes) and re-read each abstract to determine the type of research each study used. Chapter 3 of this book points out the key criteria to pay attention to when evaluating article titles, and Chapter 4 does the same for article abstracts. In subsequent chapters, the types of research discussed in this chapter will be mentioned as they are relevant for evaluating various aspects of research reports.
Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
Chapter 1 Exercises
Part A: Familiarity
Directions: The main types of research discussed in this chapter are listed below. For each, indicate the extent to which you were already familiar with it before reading this chapter. Use a scale ranging from 1 (not at all familiar) to 5 (very familiar). For the ones you are more familiar with, how would you explain what it means in your own words?
Qualitative research Familiarity rating: 1 2 3 4 5
Quantitative research Familiarity rating: 1 2 3 4 5
Mixed methods research Familiarity rating: 1 2 3 4 5
Pure research (or basic research) Familiarity rating: 1 2 3 4 5
Applied research Familiarity rating: 1 2 3 4 5
Exploratory research Familiarity rating: 1 2 3 4 5
Descriptive research Familiarity rating: 1 2 3 4 5
Explanatory research Familiarity rating: 1 2 3 4 5
Evaluation research Familiarity rating: 1 2 3 4 5
Experimental research Familiarity rating: 1 2 3 4 5
Action research Familiarity rating: 1 2 3 4 5
Translational research Familiarity rating: 1 2 3 4 5
Cross-sectional research Familiarity rating: 1 2 3 4 5
Longitudinal research Familiarity rating: 1 2 3 4 5
Survey research Familiarity rating: 1 2 3 4 5
Field research Familiarity rating: 1 2 3 4 5
Part B: Application
Directions: Read a recent empirical research article published in an academic, peer-reviewed journal and try to determine which type of research the study represents. The article may be one that you select based on your interests or one assigned by your instructor. If you are using this book without any prior training in research methods, do the best you can. As you work through this book, your ability to identify the type of research will improve.
CHAPTER 2
Background for Evaluating Research
The vast majority of research reports are initially published in academic journals. In these reports, or empirical journal articles,1 researchers describe how they have identified a research problem, made relevant observations or measurements to gather data, and analyzed the data they collected. The articles usually conclude with a discussion of the results in view of the study limitations, as well as the implications of these results for policy or practice. This chapter provides an overview of some of the general characteristics of such research. Subsequent chapters present specific questions that should be applied when evaluating empirical research articles.
✓
Guideline 1: Narrow Focus: Researchers often examine narrowly defined problems
Comment: While researchers are usually interested in broad problem areas, they often examine only narrow aspects of the problems because of limited resources and the desire to keep the research manageable by limiting its focus. Furthermore, quantitative studies often examine problems in such a way that the results can be easily reduced to statistics, further limiting the breadth of their research. Even though qualitative researchers generally take a slightly broader view when defining a problem to be explored in research (they are not constrained by the need to reduce the results to numbers and statistics), they still maintain their research focus on an issue they are examining.
Example 2.1.1 briefly describes a study on two correlates of prosocial behavior (i.e., helping behavior). To make the study of this issue manageable, the researchers have greatly limited its scope. Specifically, they examined only one very narrow type of prosocial behavior (making donations to homeless men who were begging in public).
1 As described in Chapter 1, empirical research articles are different from other types of articles published in peer-reviewed journals in that they specifically include an original analysis of empirical data (qualitative, quantitative, or mixed data).
DOI: 10.4324/9781003362661-2
Example 2.1.1 A STUDY ON PROSOCIAL BEHAVIOR, NARROWLY DEFINED
In order to study the relationship between prosocial behavior and gender as well as age, researchers located five men who appeared to be homeless and were soliciting money on street corners using cardboard signs. Without approaching the men, the researchers observed them from a short distance for two hours each. For each pedestrian who walked within ten feet of the men, the researchers recorded whether the pedestrian made a donation. The researchers also recorded the gender and approximate age of each pedestrian.
Because researchers often conduct their research on narrowly defined problems, an important task in the evaluation of research is to judge whether a researcher has defined the problem so narrowly that it fails to make an important contribution to the advancement of knowledge.
✓
Guideline 2: Artificial Settings: Researchers often conduct studies in artificial settings
Comment: Laboratories on university campuses are often the research setting. To study the effects of alcohol consumption on driving behavior, a group of participants might be asked to drink carefully measured amounts of alcohol in a laboratory and then “drive” using virtual-reality simulators. Example 2.2.1 describes the preparation of cocktails in a study of this type.
Example 2.2.1 – Barkley et al. (2006) 2 ALCOHOLIC BEVERAGES PREPARED FOR CONSUMPTION IN A LABORATORY SETTING
The preparation of the cocktail was done in a separate area out of view of the participant. All cocktails were a 16-oz mixture of orange juice, cranberry juice, and grapefruit juice (ratio 4:2:1, respectively). For the cocktails containing alcohol, we added 2 oz of 190-proof grain alcohol mixed thoroughly. For the placebo cocktail, we lightly sprayed the surface of the juice cocktail with alcohol using an atomizer placed slightly above the juice surface to impart an aroma of alcohol to the glass and beverage surface. This placebo cocktail was then immediately given to the participant to consume. This procedure results in the same alcohol aroma being imparted to the placebo cocktail as the alcohol cocktail …
2 Barkley, R. A., Murphy, K. R., O’Connell, T., Anderson, D., & Connor, D. F. (2006). Effects of two doses of alcohol on simulator driving performance in adults with attention-deficit/hyperactivity disorder. Neuropsychology, 20(1), 77–87. https://doi.org/10.1016/j.jsr.2005.01.001
Such a study might have limited generalizability to drinking in out-of-laboratory settings, such as nightclubs, the home, picnics, and other places where people consuming alcohol may be drinking different amounts at different rates while eating (or not eating) various foods. Nevertheless, conducting such research in a lab allows researchers to simplify, isolate, and control variables such as the amount of alcohol consumed, the types of food being consumed, the type of distractions during the “car ride,” and so on. In short, researchers often opt against studying variables in complex, real-life settings for the more interpretable research results typically obtained in a laboratory.
✓
Guideline 3: Being Brief: Many research reports lack information on matters that are potentially important for evaluating the quality of research
Comment: In most journals, research reports of more than 15 pages are rare. Journal space is limited by economics: journals have limited readership and thus limited paid circulation, and they seldom have advertisers. Even with electronic-only versions, it is important to curb the editorial/peer-review workload; thus, researchers are required to describe the study as concisely as possible.3 Given this situation, researchers must judiciously choose the details to include in the report. Sometimes, they may omit information that readers deem important. Omitted details can cause problems during research evaluation. For instance, it is common for researchers to describe in general terms the questionnaires and attitude scales they used, without reporting the exact wording of the questions.4 Yet, there is considerable research indicating that how items are worded can affect the results of a study.
Another important source of information about a study is descriptive statistics for the main variables included in subsequent analyses. This information is often crucial in judging the sample, as well as the appropriateness of the analytical and statistical methods used in the study. The fact that full descriptive statistics are provided can also serve as an important proxy for the authors’ diligence, professionalism, and integrity. Chapter 10 provides more information on how to evaluate the most common statistical information presented in research articles.
As you apply the evaluation criteria in the remaining chapters of this book to evaluate a research study, you may often find that there is “insufficient information to make a judgment” and thus put I/I (insufficient information) instead of grading the evaluation criterion on a scale from 1 (very unsatisfactory) to 5 (very satisfactory).
3 Also consider the fact that our culture is generally moving towards a more fast-paced, quick-read environment (140 characters, anyone?), which makes long(ish) pieces often untenable.
4 This statement appears in each issue of The Gallup Poll Monthly: “In addition to sampling error, readers should bear in mind that question wording … can introduce additional systematic error or ‘bias’ into the results of opinion polls.” Accordingly, The Gallup Poll Monthly reports the exact wording of the questions it uses in its polls. Other researchers cannot always do this because the measures they use may be too long to include in a journal article or may be copyrighted by publishers prohibiting the release of the items to the public.
✓
Guideline 4: Journal Matters: As a rule, the quality of a research article is correlated with the quality of a journal where the article is published
Comment: It is no surprise that most authors want their research published in the best, highest-ranked journals. Thus, the top journals in each field of science receive the most article submissions and, as a result, can be very selective in choosing which research articles to publish (basically, the best ones). Those authors whose submission was rejected from the top journal usually move down the list and submit the article (or its revised version) to the next best journal. If rejected again, the article is then submitted to a second-tier journal, and so on. This typical process is another reason why journal ranking is usually a good proxy for the quality of articles published there. Top academic publishers (Elsevier, Routledge/Taylor & Francis, SAGE, Springer, Wiley, etc.) often stand behind the journals publishing on their platforms, so this is another way to discern publications of reputable quality.5
Generally, the journal impact factor is a metric that provides a good idea of journal quality. The impact factor for a journal is calculated based on how often the studies recently published in the journal are cited by other researchers. A quick Google search of journal rankings by discipline can provide an easy way to see how journals stack up against one another in your field of study.6
Thus, with hundreds of editors and contributors to academic journals, it is understandable that published empirical articles vary in quality, with some being very weak in terms of their research methodology.7 We will discuss the main sources of research weaknesses in the next guideline.
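To make the impact factor mentioned in the comment above more concrete, here is a minimal Python sketch of the commonly used two-year version: citations received in a given year by items the journal published in the two preceding years, divided by the number of citable items published in those two years. The chapter itself does not specify a formula, so treat the two-year window as an assumption; the function name and all numbers below are invented for illustration.

```python
def two_year_impact_factor(citations_this_year, citable_items_prev_two_years):
    """Two-year impact factor under the common definition (an assumption here).

    citations_this_year: citations received this year by items the journal
        published in the previous two years.
    citable_items_prev_two_years: number of citable items the journal
        published in those same two years.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("The journal must have published at least one citable item.")
    return citations_this_year / citable_items_prev_two_years


# Hypothetical journal: 150 articles published in 2021-2022,
# cited 450 times in total during 2023.
print(two_year_impact_factor(450, 150))  # 3.0
```

In this invented case, an article published in the journal during the two-year window was cited three times, on average, in the following year. An average like this says nothing about the quality of any single article, which is why the guidelines in the remaining chapters still apply.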
✓
Guideline 5: Research Quality Varies: Researchers use less-than-perfect methods of measurement, sampling, and analysis, so some studies are weaker than others
Comment: No study is perfect – they all take place in the real world. However, this does not mean that they are all similarly flawed. You will need to get some experience in reading research articles and evaluating their sampling, measures, and analyses, to become a better judge of research quality. The chapters of this book will guide you through each part of the research article to help you pinpoint some of the most important aspects to pay attention to. Several key aspects of research quality are summarized below.
5 See more information about predatory journals and publishers in the online resources for this chapter.
6 The reader should also be very cautious of any journal that has no impact factor metric (unless the journal is associated with a reputable publisher and has been launched recently).
7 Many journals are refereed, or peer reviewed. This means that the editor has experts who act as referees by evaluating each paper submitted for possible publication. These experts make their judgments without knowing the identity of the researcher who submitted the paper (that is why the process is also called “blind peer review”), and the editor uses their input in deciding which papers to publish as journal articles. The author then receives the editor’s decision, which includes anonymous peer reviews of the author’s manuscript.
5a. Imperfect Measures
In research, measurement can take many forms – from online multiple-choice achievement tests to essay examinations, from administering a paper-and-pencil attitude scale with choices from “strongly agree” to “strongly disagree” to conducting unstructured interviews to identify interviewees’ attitudes.8 Observation is a type of measurement that includes direct observation of individuals interacting in either their natural environments or laboratory settings.
It is safe to assume that all observation and measurement methods are flawed to some extent. To understand why this is so, consider a professor/researcher interested in studying racial relations in society in general. Because of limited resources, the researcher decides to make direct observations of White and African American students interacting (and/or not interacting) in the college cafeteria. The observations will necessarily be limited to the types of behaviors typically exhibited in cafeteria settings, which is a weakness of the researcher’s method of observation. In addition, observations will be limited to certain overt behaviors because, for instance, it will be difficult for the researcher to hear most of what is being said without intruding on the privacy of students.
On the other hand, suppose that another researcher decides to measure racial attitudes by having students respond anonymously to racial statements by circling “agree” or “disagree” for each one. This researcher has an entirely different set of weaknesses in the method of measurement. First is the matter of whether students will reveal their real attitudes on such a scale – even if the response is anonymous – because most college students are aware that negative racial attitudes are severely frowned upon in academic communities. Thus, some students might indicate what they believe to be socially desirable (i.e., socially or politically “correct”) rather than revealing their true attitudes. Moreover, people may often be unaware of their own implicit racial biases.9
In short, there is no perfect method for measuring complex variables. Instead of expecting perfection, a consumer of research should consider the following question: Is the method sufficiently valid and reliable to provide potentially useful information? Examples 2.5.1 and 2.5.2 show statements from research articles in which the researchers acknowledge limitations in their methods of measurement.
8 Researchers sometimes refer to measurement tools as instruments, especially in older research literature.
9 For more information, check Project Implicit hosted by Harvard University and run by an international collaboration of researchers (see the link in the online resources for this chapter).
Example 2.5.1 – Kor et al. (2012) 10 RESEARCHERS’ ACKNOWLEDGMENT OF A LIMITATION OF THEIR MEASURE
In addition, the assessment of marital religious discord was limited to one item. Future research should include a multiple-items scale of marital religious discord and additional types of measures, such as interviews or observational coding, as well as multiple informants.
Example 2.5.2 – Callahan et al. (2011) 11 RESEARCHERS’ ACKNOWLEDGMENT OF LIMITATIONS OF SELF-REPORTS
Despite these strengths, this study is not without limitations. First, the small sample size decreases the likelihood of finding statistically significant interaction effects. […] Fourth, neighborhood danger was measured from mothers’ self-reports of the events which had occurred in the neighborhood during the past year. Adding other family member reports of the dangerous events and official police reports would clearly strengthen our measure of neighborhood danger.
Chapter 8 provides more information on the evaluation of observational methods and measures typically used in empirical studies. Generally, it is important to look for whether the researchers properly acknowledge some key limitations of their measurement strategies.
5b. Imperfect Sampling
Comment: Arguably, the most common sampling flaw in research reported in academic journals is the use of convenience samples (i.e., samples that are readily accessible to researchers). Most researchers are professors, and professors often use samples of college students – obviously as a matter of convenience. Another common flaw is the reliance on voluntary responses to mailed or online surveys, for which response rates are often quite low. For online surveys, such as those on social media, it may be even more difficult to evaluate the response rate unless we know how many people saw the survey solicitation ad. (Problems related to the use of online surveys are discussed in Chapter 6 and Appendix D.) Other samples are flawed because researchers cannot identify and locate all members of a population (e.g., injection drug users). Without being able to do this, it is impossible to draw a sample that a researcher can reasonably defend as representative of the population. In addition, researchers often have limited resources, which forces them to use small samples, which in turn might produce unreliable results. On the other hand, qualitative researchers emphasize selecting a purposive sample – one that focuses on people with specific characteristics and is likely to yield useful information – rather than a representative sample (Chapter 11 covers qualitative research in more detail). Chapter 7 specifically discusses the types of research where samples are NOT meant for generalization – such as case studies and action research. However, most studies in social sciences are meant for generalization, or extending the study results to a larger population or other settings. The sampling evaluation criteria for such studies are examined in Chapter 6.
Researchers sometimes explicitly acknowledge the limitations of their samples. Examples 2.5.3 and 2.5.4 show portions of such statements from research articles.
10 Kor, A., Mikulincer, M., & Pirutinsky, S. (2012). Family functioning among returnees to Orthodox Judaism in Israel. Journal of Family Psychology, 26(1), 149–158. https://doi.org/10.1037/a0025936
11 Callahan, K. L., Scaramella, L. V., Laird, R. D., & Sohr-Preston, S. L. (2011). Neighborhood disadvantage as a moderator of the association between harsh parenting and toddler-aged children’s internalizing and externalizing problems. Journal of Family Psychology, 25(1), 68–76. https://doi.org/10.1037/a0022448
Example 2.5.3 – Jiang et al. (2011) 12 RESEARCHERS’ ACKNOWLEDGMENT OF LIMITATION OF SAMPLING (CONVENIENCE SAMPLE)
The present study suffered from several limitations. First of all, the samples were confined to university undergraduate students and only Chinese and American students. For broader generalizations, further studies could recruit people of various ages and educational and occupational characteristics.
Example 2.5.4 – Melgaard et al. (2022) 13 RESEARCHERS’ ACKNOWLEDGMENT OF LIMITATION OF SAMPLING (SMALL SIZE, LIMITED DIVERSITY)
Our study’s limitations are that of any other small-N sample study, i.e., by the fact that the participants are from a single institution and limited to the department of technology. Future work would include expanding this study to a larger audience and explore the possibility of digitally nudging [12; 53] the procrastinators to see if we could increase their online engagement and improve their learning experience through follow-ups.
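The point made in the comment on imperfect sampling, that small samples might produce unreliable results, can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it assumes simple random sampling, uses the standard normal-approximation formula for the standard error of a proportion, and works with a hypothetical survey result of 50%. The function names are made up for this example and do not come from the studies cited above.

```python
import math


def standard_error_of_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion, assuming simple random sampling."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return math.sqrt(p * (1.0 - p) / n)


def margin_of_error_95(p: float, n: int) -> float:
    """Approximate 95% margin of error (normal approximation, z = 1.96)."""
    return 1.96 * standard_error_of_proportion(p, n)


if __name__ == "__main__":
    # Hypothetical survey result: 50% of respondents agree with a statement.
    for n in (25, 100, 400, 1600):
        moe = margin_of_error_95(0.5, n)
        print(f"n = {n:4d}: margin of error is about {moe * 100:.1f} percentage points")
```

Under these assumptions, quadrupling the sample size only halves the margin of error. Note that the formula addresses random sampling error only: a larger convenience sample is still a convenience sample, so the bias concerns described above remain.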
5c. Pitfalls Happen
Comment: You know how a small miscalculation early on can lead to an incorrect result when solving a math problem? The situation in research is very similar. A seemingly minor flaw, such as a poorly worded question on attitudes in a survey questionnaire, might lead to incorrect results. Similarly, a treatment that has been misapplied in an experiment may produce misleading conclusions regarding its effectiveness. Alternatively, a sample that includes only volunteers eager to participate in a specific study can lead to skewed results. (This type of situation can lead to self-selection bias, which is discussed in more detail in Chapter 6.) For these reasons, empirical research articles should be detailed, so that consumers of research have enough information to judge whether the research methods were flawed.
Obviously, data-input errors and computational errors are also possible sources of errors in the results. Unfortunately, the process of data input and checking for mechanical errors in entering data is hardly ever mentioned in research reports published in academic journals. In addition, alternative statistical methods are available for most problems, and different methods can yield different results. Finally, even a non-statistical analysis can be problematic. For instance, if two or more researchers review extensive transcripts of unstructured interviews, their interpretations of interviewees’ responses might differ. Discrepancies such as these suggest that the results may be flawed or at least subject to different interpretations. Chapter 10 provides evaluation criteria for quantitative Methods and Results sections of research reports, Chapter 11 does the same for qualitative sections, and Chapter 12 for mixed methods research.
12 Jiang, F., Yue, X. D., & Lu, S. (2011). Different attitudes toward humor between Chinese and American students: Evidence from the Implicit Association Test. Psychological Reports, 109(1), 99–107. https://doi.org/10.2466/09.17.21.PR0.109.4.99-107
13 Melgaard, J., Monir, R., Lasrado, L. A., & Fagerstrøm, A. (2022). Academic procrastination and online learning during the COVID-19 pandemic. Procedia Computer Science, 196, 117–124. https://doi.org/10.1016/j.procs.2021.11.080
5d. Why Weaker Studies Matter
Undoubtedly, some weak articles simply slip past less-skilled editors. More likely, an editor may make a deliberate decision to publish a weak study report because the problem it explores is of current interest to the journal’s readers. This is especially true when there is a new topic of interest, such as a new educational reform, a newly recognized disease, or a new government initiative. The editorial board of a journal might reasonably conclude that publishing studies on such new topics is important, even if the initial studies are weak. Sometimes, studies with very serious methodological problems are labeled as pilot studies, in either their titles or the introductions to the articles. A pilot study is a preliminary study that allows a researcher to try out new methods and procedures for conducting research, often with small samples. Pilot studies may be refined in subsequent, more definitive, larger studies. The publication of pilot studies, despite their limited samples and other potential weaknesses, is justified on the basis that they may point other researchers in the direction of promising new leads and methods for further research. Chapter 7 provides more details on pilot studies and discusses their limited generalizability.
5e. Acknowledgment of Limitations
Comment: Many researchers briefly point out the most obvious flaws in their research. They typically do this in the last section of the article, the Discussion section. While they tend to mention only the most obvious problems, these acknowledgments can be a good starting point for evaluating the study. Chapter 13 provides additional details on evaluating the Discussion sections. Example 2.5.5 shows the researchers’ description of the limitations of their research on Mexican American men’s college persistence intentions.
Example 2.5.5 – Ojeda et al. (2011) 14 RESEARCHERS’ DESCRIPTION OF THE LIMITATIONS OF THEIR RESEARCH
Despite the contributions of this study in expanding our understanding of Mexican American men’s college persistence intentions, there also are some clear limitations that should be noted. First, several factors limit our ability to generalize this study’s findings to other populations of Mexican American male undergraduates. The participants attended a Hispanic-serving 4-year university in a predominantly Mexican American midsize southern Texas town located near the U.S.-México border. While the majority of U.S. Latinos live in the Southwest region, Latinos are represented in communities across the U.S. (U.S. Census Bureau, 2008c). Additionally, the study’s generalizability is limited by the use of nonrandom sampling methods (e.g., self-selection bias) and its cross-sectional approach (Heppner, Wampold, & Kivlighan, 2007).
Again, it is important to look for statements in which researchers honestly acknowledge the limitations of their study. It does not mitigate the resulting problems but helps to properly recognize some likely biases and problems with the generalizability and interpretation of the study results. If there is no mention of study limitations in the article, this does not mean that the study is perfect. Quite the opposite: it merits extra caution since the absence of statements on study limitations in a research article is a huge ‘red flag.’ Take the role of an investigator when reading a research article – pay attention to details. This leads us to the next guideline.
✓
Guideline 6: Read Carefully: Research reports often contain many details, which can be very important when evaluating a study
Comment: The old saying “The devil is in the details” applies here. Students who have relied exclusively on secondary sources for information about their major field of study may be surprised at the level of detail in many research reports (even brief ones!). Typically, the amount of detail is much greater than what is implied in sources such as textbooks and classroom lectures. Example 2.6.1 illustrates the level of detail that can be expected in many empirical research articles published in academic journals. It describes a part of an intervention (nutritional supplementation) intended to reduce aggression in children.
14 Ojeda, L., Navarro, R. L., & Morales, A. (2011). The role of la familia on Mexican American men’s college persistence intentions. Psychology of Men & Masculinity, 12(3), 216–229. https://doi.org/10.1037/a0020091
Example 2.6.1 – Raine et al. (2016) 15 AN EXCERPT FROM AN ARTICLE ILLUSTRATING THE LEVEL OF DETAIL OFTEN INCLUDED IN RESEARCH REPORTS IN ACADEMIC JOURNALS
Nutritional supplementation. Nutritional supplementation consisted of omega-3 fatty acids, multivitamins, and calcium. The omega-3 supplement consisted of a daily 200 ml drink (SmartFish Recharge) containing 1,000 mg of omega-3 (300 mg of DHA, 200 mg of EPA, 400 mg of alpha-linolenic acid, and 100 mg of DPA) (see also Appendix S1, available online). This drink was chosen because: (a) it contains an appreciably higher dosage of omega-3 than standard capsules in a relatively small liquid quantity (60.6% of the size of a standard can of cola) suitable for child consumption and (b) the fruit-flavored drink may be better tolerated and result in higher compliance with children than standard capsules. The daily dose of multivitamins consisted of 12 vitamins and seven minerals administered as one chewable tablet (see Appendix S1) together with one fruit-flavored chewable tablet containing calcium (600 mg) and vitamin D (400 micrograms). […] Adherence to protocol. Adherence to the treatment regimen was assessed by assays of fasting serum omega-6 and omega-3 fats from venous blood drawn by a nurse at both baseline and 3 months (end of treatment). Samples were assayed blinded for treatment condition at the Section of Nutritional Neurosciences, National Institute of Alcohol Abuse and Alcoholism […].
15 Raine, A., Cheney, R. A., Ho, R., Portnoy, J., Liu, J., Soyfer, L., ... & Richmond, T. S. (2016). Nutritional supplementation to reduce child aggression: A randomized, stratified, single‐blind, factorial trial. Journal of Child Psychology and Psychiatry, 57(9), 1038–1046. https://doi.org/10.1111/jcpp.12565
Note the level of detail, such as (a) the dosage and specifications of vitamins and minerals, including the taste of chewable tablets; (b) the explanations for the taste selection and amount of drink that needed to be consumed by children; and (c) the procedures for assessing whether the nutritional supplements were actually consumed (“Adherence to protocol”). Such details are useful for helping consumers of research understand exactly the nature of the intervention examined in the study. Knowing what was done to the participants and how it was done makes it possible to render informed evaluations of the study. Having detailed descriptions is also helpful for other researchers who might want to replicate the study in order to confirm its findings.
✓
Guideline 7: Definitions Matter: Many research articles provide precise definitions of key terms to help guide the measurement of the associated concepts
Comment: Often, students complain that research articles are dry and boring and “Why do they include all those definitions anyway?” To the credit of the researchers writing these articles, they include definitions to help rather than annoy the reader. Consider some complex concepts that need to be measured in a typical study. For example, researchers are interested in the prevalence of domestic violence (DV). What is domestic violence? Do we only consider physical acts as domestic violence or psychological and verbal abuse as well? What about financial abuse? What about threats? These questions can be answered using a careful and precisely worded definition of domestic violence. This can also help the reader figure out what the researchers may be missing if they use police reports rather than a survey of self-reported victimization. Example 2.7.1 illustrates some of the issues:
Example 2.7.1 – Morgan & Jasinski (2017) 16 AN EXCERPT FROM AN ARTICLE ILLUSTRATING HOW DOMESTIC VIOLENCE DEFINITION IS RELATED TO ITS MEASUREMENT
By using different definitions and ways of operationalizing DV, other forms of family violence may be omitted from the analysis. Pinchevsky and Wright (2012) note that researchers should expand their definitions of abuse in future research to be broader and more inclusive of different types of abuse. The current research uses a broader definition of DV by examining all domestic offenses that were reported in Chicago and each of the counties in Illinois and aims to capture a more accurate representation of the different forms of DV.
Thus, precise definitions of key terms help guide the most appropriate strategy to measure these terms and help translate the concept into a variable. More information about conceptual and operational definitions of key terms in research is provided in Chapter 5, devoted to evaluating literature reviews.
16 Morgan, R. E., & Jasinski, J. L. (2017). Tracking violence: Using structural-level characteristics in the analysis of domestic violence in Chicago and the state of Illinois. Crime & Delinquency, 63(4), 391–411. https://doi.org/10.1177/0011128715625082
✓
Guideline 8: Theory Rules: Other things being equal, research related to theories is more important than non-theoretical research
Comment: A given theory helps explain the interrelationships among a number of variables and often has implications for understanding human behavior in a variety of settings.17 Theories provide major causal explanations to help us “see the forest for the trees,” to make sense of how the world around us works. Why do people commit crimes? What causes autism? How do children learn a language? Why are people reluctant to consider evidence that contradicts their worldview? Why are lower-class voters less likely to participate in elections? These and many other questions can be best answered with a logical big-picture explanation, or theory. Studies that test theories are explanatory (see Chapter 1 for more details). Those that produce results consistent with a theory lend support to that theory. Those with inconsistent results argue against this theory. Importantly, no one study ever provides proof (which is the subject of the next guideline). After a number of explanatory studies on the same topic have been conducted, their results provide accumulated evidence that argues for or against the theory or can assist in modifying the theory. Researchers often explicitly discuss theories relevant to their research, as illustrated in Example 2.8.1.
Example 2.8.1 – Leimberg & Lehmann (2022) 18 RESEARCHERS’ DISCUSSION OF A THEORY RELATED TO THEIR RESEARCH
In Gottfredson and Hirschi’s (1990) A General Theory of Crime, low self-control was proposed to be a trait established during childhood, and they defined it as the degree to which a person is “vulnerable to the temptations of the moment” (p. 87). According to Gottfredson and Hirschi, individuals who are low in self-control are more inclined to commit criminal acts, are adversely influenced in their performance in school, work, health, finances, and their personal lives, and are more likely to engage in drug use (Grasmick et al., 1993; Moffitt et al., 2011). […] The influence of impulsivity on hard drug use is of central importance to the focus of the current study. A key facet of low self-control, impulsivity influences the way an individual rationalizes his or her behaviors. This characteristic produces a vulnerability that diminishes the ability for individuals to consider the consequences of their actions and a more instantaneous reaction to engage in, and become more susceptible to, crime and drug use (Baron, 2010; Hay & Meldrum, 2015).
The role of theoretical considerations in evaluating research is discussed in greater detail in Chapter 5.
17 Notice that the word theory has a similar meaning when used in everyday language: for example, “I have a theory on why their relationship did not work out.”
18 Leimberg, A., & Lehmann, P. S. (2022). Unstructured socializing with peers, low self-control, and substance use. International Journal of Offender Therapy and Comparative Criminology, 66(1), 3–27. https://doi.org/10.1177/0306624X20967939
✓
Guideline 9: Replication Matters: No individual study provides “proof”
Comment: Conducting research is fraught with pitfalls; any one study may have very misleading results, and all studies can be presumed to be flawed to some extent. However, this does not mean that all research is bad – some studies are much better than others. Therefore, individual empirical research articles should be carefully evaluated to identify those that are most likely to provide sound results. Additionally, it is important to consider the entire body of research on a given problem. If different researchers using different research methods, with different strengths and weaknesses, reach similar conclusions, consumers of research may say that they have considerable confidence in the conclusions of the body of research. Replication is the process of conducting repeated studies on the same topic using different methods or target populations. It is one of the most important ways in science to check whether the findings of previous studies hold true or are a result of random chance. To the extent that the body of research on a topic yields mixed results, consumers of research should lower their degree of confidence. For instance, if studies with a more scientifically rigorous methodology point in one direction, while weaker ones point in a different direction, consumers of research might say that they have some confidence in the conclusion suggested by the stronger studies, but that the evidence is not yet conclusive. To help consumers of research pull together and assess information from multiple studies on the same topic, researchers conduct and publish systematic reviews and meta-analyses. Reading this type of study is the best way to gather information on the evidence base about a specific issue. Chapter 14 discusses systematic reviews and meta-analyses in more detail and provides evaluation criteria to assess their quality.
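To give a rough sense of how a meta-analysis can pull together results from multiple studies, the sketch below combines several effect estimates using inverse-variance weighting (a fixed-effect model). This is one common approach, offered here only as an illustration; actual meta-analyses often use other models, and Chapter 14 covers how to evaluate them. The function name and the study values are hypothetical and are not taken from any article discussed in this chapter.

```python
import math
from typing import Sequence, Tuple


def pool_fixed_effect(studies: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (estimate, standard_error) pairs with inverse-variance weights.

    Returns the pooled estimate and its standard error under a
    fixed-effect model, which assumes all studies estimate the same effect.
    """
    if not studies:
        raise ValueError("at least one study is required")
    weights = []
    for _, standard_error in studies:
        if standard_error <= 0:
            raise ValueError("standard errors must be positive")
        # More precise studies (smaller standard errors) get larger weights.
        weights.append(1.0 / standard_error ** 2)
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical effect estimates and standard errors from three studies.
    hypothetical_studies = [(0.20, 0.10), (0.35, 0.15), (0.10, 0.08)]
    estimate, se = pool_fixed_effect(hypothetical_studies)
    print(f"pooled estimate = {estimate:.3f}, standard error = {se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is one way of expressing why a body of consistent studies warrants more confidence than any individual study. The calculation says nothing about study quality, so the careful evaluation of each article described above is still needed.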
✓
Guideline 10: Acquire Expertise: To become an expert on a topic, one must become an expert at evaluating original reports of research
Comment: An expert is someone who knows not only the broad generalizations about a topic but also the nuances of the research that underlie them. In other words, they know the particular strengths and weaknesses of the major studies used to arrive at the generalizations. Put another way, an expert on a topic knows the quality of the evidence regarding that topic and draws generalizations from the research literature based on that knowledge.
Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
Chapter 2 Exercises
Part A: Familiarity
Directions: The ten guidelines discussed in this chapter are repeated below. For each one (plus sub-parts of Guideline 5), indicate the extent to which you were already familiar with it before reading this chapter. Use a scale from 1 (not at all familiar) to 5 (very familiar).
Guideline 1. Narrow Focus: Researchers often examine narrowly defined problems
Familiarity rating: 1 2 3 4 5
Guideline 2. Need for Control: Researchers often conduct studies in artificial settings
Familiarity rating: 1 2 3 4 5
Guideline 3. Being Brief: Many research reports lack information on matters that are potentially important for evaluating the quality of research
Familiarity rating: 1 2 3 4 5
Guideline 4. Journal Matters: As a rule, the quality of a research article is correlated with the quality of a journal where the article is published
Familiarity rating: 1 2 3 4 5
Guideline 5. Research Quality Varies: Researchers use less-than-perfect methods of measurement, sampling, and analysis, so some studies are weaker than other ones
Familiarity rating: 1 2 3 4 5
5a. Imperfect measures
Familiarity rating: 1 2 3 4 5
5b. Imperfect sampling
Familiarity rating: 1 2 3 4 5
5c. Pitfalls happen
Familiarity rating: 1 2 3 4 5
5d. Why weaker studies matter
Familiarity rating: 1 2 3 4 5
5e. Acknowledgment of limitations
Familiarity rating: 1 2 3 4 5
Guideline 6. Read Carefully: Research reports often contain many details – these can be very important when evaluating a study
Familiarity rating: 1 2 3 4 5
Guideline 7. Definitions Matter: Many research articles provide precise definitions of key terms to help guide the measurement of the associated concepts
Familiarity rating: 1 2 3 4 5
Guideline 8. Theory Rules: Other things being equal, research related to theories is more important than non-theoretical research
Familiarity rating: 1 2 3 4 5
Guideline 9. Replication Matters: No individual study provides “proof”
Familiarity rating: 1 2 3 4 5
Guideline 10. Acquire Expertise: To become an expert on a topic, one must become an expert at evaluating original reports of research
Familiarity rating: 1 2 3 4 5
Part B: Application
Directions: Read an empirical research article published in an academic, peer-reviewed journal, and respond to the following questions. The article may be one that you select or one that is assigned by your instructor. If you are using this book without any prior training in research methods, do the best you can in answering the questions at this point. As you work through this book, your evaluations will become increasingly sophisticated.
1 How narrowly is the research problem defined? In your opinion, is it too narrow? Is it too broad? Explain.
2 Was the research setting artificial (e.g., a laboratory setting)? If yes, do you think that the gain in the control of extraneous variables offsets the potential loss of information that would be obtained in a study in a more realistic setting? Explain.
3 Does the article lack information on matters that are potentially important for evaluating it?
4 Can you assess the quality of the journal the article is published in? Can you find information online about the journal’s ranking or impact factor?
5 Are there any obvious flaws or weaknesses in the researcher’s methods of measurement or observation? Explain. (Note: This aspect of research is usually described under the subheading Method or Measures.)
6 Are there any obvious sampling flaws? Explain. (Note: This aspect of research is usually described under the subheading Method or Participants.)
7 Overall, does the study seem to have some obvious flaws or weaknesses? If yes, briefly describe its weaknesses and speculate on why it was published despite them.
8 Do the researchers include a discussion of the limitations of their study?
9 Were the descriptions of procedures and methods sufficiently detailed? Were any important details missing? Explain.
10 Are definitions of the key terms provided? Is the measurement strategy for the associated variables aligned with the provided definitions? Explain.
11 Does the researcher describe relevant theory?
12 Was the analysis statistical or non-statistical? Was the description of the results easy to understand? Explain.
13 Do the researchers imply that their research proves something? Do you believe that it proves something? Explain.
14 Do you think that as a result of reading this chapter and evaluating the research article you are gaining some expertise in evaluating research reports? Explain.
CHAPTER 3
Evaluating Titles
The title is typically the first thing that consumers of research see when they look for journal articles on a specific topic. You can do a preliminary evaluation of article titles using the criteria in this chapter without reading the full articles. Later on, the article title can be evaluated more comprehensively after you have read the article, to ensure that it accurately reflects the contents of the study. Apply the questions that follow while evaluating titles. The questions are stated as “yes–no” questions, where a “yes” indicates that you judge the characteristic to be satisfactory. You may also want to rate each characteristic using a scale from 1 to 5, where 5 is the highest rating. N/A (not applicable) and I/I (insufficient information to make a judgment) may also be used when necessary.
___ Question 1: Specific: Is the title sufficiently specific?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: On any major topic in the social and behavioral sciences, there are likely to be many hundreds of research articles published in academic journals. In order to help potential readers locate those that are most relevant to their needs, researchers should use titles that are sufficiently specific so that each article can be differentiated from the other research articles on the same topic. Consider the topic of depression, which has been extensively investigated. The title in Example 3.1.1 is insufficiently specific. Contrast it with the titles in Example 3.1.2, each of which contains information that differentiates it from the others.
Example 3.1.1 A TITLE THAT IS INSUFFICIENTLY SPECIFIC
‒ An Investigation of Adolescent Depression and Its Implications
DOI: 10.4324/9781003362661-3
Titles
Example 3.1.2 THREE TITLES THAT ARE MORE SPECIFIC THAN THE ONE IN EXAMPLE 3.1.1
‒ Gender Differences in the Expression of Depression by Adolescent Children of Alcoholics
‒ The Impact of Social Support on the Severity of Postpartum Depression Among Adolescent Mothers
‒ The Effectiveness of Cognitive Therapy in the Treatment of Adolescent Students with Severe Clinical Depression
___ Question 2: Concise: Is the title reasonably brief?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: While a title should be specific (see the previous evaluation question), it should be fairly concise. Titles of research articles in academic journals are typically 15 words or fewer. When a title contains more than 20 words, it is likely that the researcher is providing more information than is needed by consumers of research who want to locate articles.1
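The word-count rule of thumb in the comment above can be applied mechanically when screening many titles at once. The sketch below is a minimal illustration: the thresholds of 15 and 20 words come from the comment, while the function names, the three labels, and the treatment of 16 to 20 words as "borderline" are assumptions made for this example.

```python
def count_title_words(title: str) -> int:
    """Count whitespace-separated words in a title."""
    return len(title.split())


def classify_title_length(title: str, typical_max: int = 15, long_threshold: int = 20) -> str:
    """Label a title by length using the rule of thumb from the comment.

    The labels themselves are illustrative, not an established standard.
    """
    n_words = count_title_words(title)
    if n_words <= typical_max:
        return "typical"
    if n_words <= long_threshold:
        return "borderline"
    return "probably too long"


if __name__ == "__main__":
    example = "Gender Differences in Opinions on Social Justice Issues"
    print(count_title_words(example), classify_title_length(example))
```

A count like this is only a first screen: a short title can still be vague, and a longer one may be justified, so the judgment about whether a title gives readers what they need remains with the evaluator.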
___ Question 3: No Simple Answer: Has the author avoided using a “yes–no” question as a title?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Because research rarely yields simple, definitive answers, it is seldom appropriate to use a title that poses a simple “yes–no” question. For instance, the title in Example 3.3.1 implies that there is a simple answer to the question it poses. However, a study on this topic undoubtedly explores the extent to which men and women differ in their opinions on social justice issues – a much more interesting topic than the one suggested by the title. The Improved Version is cast as a statement and is more appropriate as the title of a research report for publication in an academic journal.
1 The titles of theses and dissertations tend to be longer than those of journal articles.
Example 3.3.1 A TITLE THAT INAPPROPRIATELY POSES A “YES–NO” QUESTION
‒ Do Men and Women Differ in Their Opinions on Social Justice Issues?
Improved Version of Example 3.3.1 A TITLE AS A STATEMENT
‒ Gender Differences in Opinions on Social Justice Issues
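When screening a long list of titles, a simple heuristic can flag those that may pose a “yes–no” question, as in Example 3.3.1. The sketch below is a rough illustration, not a reliable classifier: it assumes that such titles end with a question mark and begin with an auxiliary verb, and both the function name and the word list are assumptions made for this example.

```python
# Words that commonly open a yes-no question in English (illustrative list).
AUXILIARY_OPENERS = frozenset({
    "do", "does", "did", "is", "are", "was", "were", "can", "could",
    "will", "would", "should", "has", "have", "had", "may", "might", "must",
})


def looks_like_yes_no_question(title: str) -> bool:
    """Return True if the title ends with '?' and starts with an auxiliary verb."""
    stripped = title.strip()
    if not stripped.endswith("?"):
        return False
    first_word = stripped.split()[0].lower()
    return first_word in AUXILIARY_OPENERS


if __name__ == "__main__":
    for title in (
        "Do Men and Women Differ in Their Opinions on Social Justice Issues?",
        "Gender Differences in Opinions on Social Justice Issues",
    ):
        print(looks_like_yes_no_question(title), "-", title)
```

Titles that open with words such as "how" or "why" are not flagged, because they do not imply a simple yes or no answer. Any flagged title still needs to be read and judged by the evaluator.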
___ Question 4: No Jargon: Is the title free of jargon and acronyms that might be unknown to the audience for the research report?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Professionals in all fields use jargon and acronyms (i.e., shorthand for phrases, usually in capital letters) for efficient and accurate communication with their peers. However, their use in the titles of research reports is inappropriate, unless the researchers are writing exclusively for such peers. Consider Example 3.4.1. If ACOA (which stands for Adult Children of Alcoholics) is likely to be well known to all the readers of the journal in which this title appears, its use is probably appropriate. Otherwise, the acronym should be spelled out or its meaning paraphrased in the article title. As you can see, it can be difficult to make this judgment without being familiar with the journal and its audience.
Example 3.4.1 A TITLE WITH AN ACRONYM THAT IS NOT SPELLED OUT (MAY BE INAPPROPRIATE IF NOT WELL-KNOWN TO THE JOURNAL AUDIENCE)
‒ Job Satisfaction and Motivation to Succeed Among ACOA in Managerial Positions
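Acronyms such as the one in Example 3.4.1 can be located automatically as a first step, leaving the evaluator to judge whether the journal's audience would know them. The sketch below is a minimal illustration that treats any run of two or more capital letters as a possible acronym. The function name and the `known_to_audience` parameter are hypothetical, and deciding which acronyms belong in that set still requires familiarity with the journal and its readers.

```python
import re

# A run of two or more capital letters standing alone as a word.
ACRONYM_PATTERN = re.compile(r"\b[A-Z]{2,}\b")


def find_unfamiliar_acronyms(title: str, known_to_audience: frozenset = frozenset()) -> list:
    """Return capitalized acronyms in the title that are not in the known set."""
    candidates = ACRONYM_PATTERN.findall(title)
    return [acronym for acronym in candidates if acronym not in known_to_audience]


if __name__ == "__main__":
    title = "Job Satisfaction and Motivation to Succeed Among ACOA in Managerial Positions"
    print(find_unfamiliar_acronyms(title))
    print(find_unfamiliar_acronyms(title, known_to_audience=frozenset({"ACOA"})))
```

This pattern would flag every word in a title printed entirely in capital letters, so it suits titles in ordinary title case. It also cannot detect jargon that is not an acronym.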
___ Question 5: No Results: Has the author avoided describing results in the title?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: It is usually inappropriate for a title to describe the results of a research project. This is not because authors want to provide a “cliffhanger,” but because research typically raises more questions than it answers. In addition, research results are often subject to more than one interpretation. Given that titles need to be concise, attempting to state the results in a title is likely to lead to oversimplification.
Consider the title in Example 3.5.1, which undoubtedly oversimplifies the results of the study. A meaningful accounting of the results should address issues such as the following: What type of social support (e.g., parental support, peer support, and so on) is effective? How strong does it need to be to lessen the depression? By how much is depression lessened by strong social support? Because it is almost always impossible to state results accurately and unambiguously in a short title, the results ordinarily should not be stated at all, as illustrated in the Improved Version of Example 3.5.1.
Example 3.5.1 A TITLE THAT INAPPROPRIATELY DESCRIBES RESULTS
‒ Strong Social Support Lessens Depression in Juvenile Delinquents
Improved Version of Example 3.5.1 A TITLE THAT APPROPRIATELY DOES NOT DESCRIBE RESULTS
‒ The Relationship Between Social Support and Depression in Young People in the Albanian Justice System
In addition to handling the study results better, the title in the Improved Version of Example 3.5.1 specifies the location of the study (Albania) and uses wording that is less stigmatizing than “juvenile delinquents.” Names and definitions should be chosen carefully, favoring neutral, less stigmatizing language that reduces the negative impact of labeling.
___ Question 6: Mentions Variables: Are the key variables mentioned in the title?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Variables are the characteristics that vary from one participant to another. In Example 3.6.1, the variables are (1) television viewing habits, (2) mathematics achievement, and (3) reading achievement. For instance, children vary (or differ) in their reading achievement, with some children achieving more than others. Likewise, they vary in terms of their mathematics achievement and television viewing habits.
Example 3.6.1 A TITLE THAT MENTIONS THREE VARIABLES
‒ The Relationship Between Young Children’s Television Viewing Habits and Their Achievement in Mathematics and Reading
Note that “young children” is not a variable because the title clearly suggests that only young children were studied. In other words, being a young child did not vary in this study. Instead, it is a common trait of all the participants in the study, or a characteristic of the study sample (more details on sampling are provided in Chapters 6 and 7 of this book).
When researchers examine many specific variables in a given study, they may refer to the types of variables in their titles rather than naming each one individually. For instance, suppose a researcher administered a standardized achievement test that measured spelling ability, reading comprehension, vocabulary knowledge, mathematical problem-solving skills, and so on. Naming all these variables would create a title that is too long. Instead, the researcher could refer to this collection of variables measured by the test as academic achievement, which is done in Example 3.6.2.
Example 3.6.2 A TITLE IN WHICH TYPES OF VARIABLES (ACHIEVEMENT VARIABLES) ARE IDENTIFIED WITHOUT BEING NAMED SPECIFICALLY
‒ The Relationship Between Parental Involvement in Schooling and Academic Achievement in the Middle Grades
___ Question 7: Mentions Participants: Does the title identify the types of individuals who participated in the study or the types of aggregate units in the sample?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: It is often desirable to include the names of populations in the title. From the title in Example 3.7.1, it is reasonable to infer that the population of interest consists of graduate students taking a statistics class. This would be of interest to those consumers of research who are searching through a list of hundreds of published articles on cooperative learning. For instance, knowing that the research report deals with this particular population might help consumers rule it out as an article of interest if they are trying to locate research on cooperative learning in elementary school mathematics.
Example 3.7.1 A TITLE IN WHICH THE TYPE OF PARTICIPANTS IS MENTIONED
‒ Effects of Cooperative Learning in a Graduate-Level Statistics Class
Example 3.7.2 also names an important characteristic of the research participants – the fact that they are registered nurses employed by public hospitals.
Example 3.7.2 A TITLE IN WHICH THE TYPE OF PARTICIPANTS IS MENTIONED
‒ Administrative Management Styles and Job Satisfaction Among Registered Nurses Employed by Public Hospitals
Sometimes, instead of using individuals in a sample, studies use aggregate-level sampling units (such as schools, cities, states, or countries) and compare them to one another. For titles of such research reports, it is important to mention the type of units in the study sample as well. In Example 3.7.3, neighborhoods are such sampling units.
Example 3.7.3 A TITLE IN WHICH THE TYPE OF UNITS IN THE SAMPLE IS NOT ADEQUATELY MENTIONED
‒ Domestic Violence and Socioeconomic Status: Does the Type of Neighborhood Matter?
Take a closer look at the title in Example 3.7.3: Does it provide sufficiently specific information about where the study was conducted? In fact, it is an inadequate title because it fails to mention the key characteristic of the neighborhoods in the study – that they are all located in the city of São Paulo, Brazil. Thus, a researcher looking, say, for studies conducted in South American countries may not even realize that this article should be examined. A more appropriate title for the study would be: “Domestic Violence and Socioeconomic Status in São Paulo, Brazil: Does the Type of Neighborhood Matter?”
Often, researchers use a particular group of participants only because they are readily available, such as college students enrolled in an introductory psychology class who are required to participate in research projects. Researchers might use such individuals, even though they are conducting research applicable to all types of individuals. For instance, a researcher might conduct a study to test social relations theory that applies to all types of individuals (explanatory research). In such a case, the researcher might omit mentioning the types of individuals (e.g., college students) in the title because the research is not specifically directed at that population.
___ Question 8: Mentions Theory: If a study is strongly tied to a theory, is the name of the specific theory mentioned in the title?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Theories help advance science because they are propositions regarding relationships that have applications in many diverse, specific situations. For instance, a particular learning theory might have applications for teaching kindergarten children as well as for training astronauts. A useful theory leads to predictions about human behavior that can be tested through research. Many consumers of research are seeking information on specific theories, and their mention in titles helps these consumers to identify reports of relevant research. Thus, when research is closely tied to a theory, the theory should be mentioned in the article title. Example 3.8.1 shows two titles in which specific theories are mentioned.
Example 3.8.1 TWO TITLES THAT MENTION SPECIFIC THEORIES (DESIRABLE)
‒ Application of Terror Management Theory to Treatment of Rural Battered Women
‒ Achievement in Science-Oriented Charter Schools for Girls: A Critical Test of the Social Learning Theory
Note that simply using the term theory in a title without mentioning the name of the specific theory is not useful to consumers of research. Example 3.8.2 has this undesirable characteristic.
Example 3.8.2 A TITLE THAT REFERS TO THEORY WITHOUT NAMING THE SPECIFIC THEORY (UNDESIRABLE)
‒ An Examination of Voting Patterns and Social Class in a Rural Southern Community: A Study Based on Theory
As discussed in Chapter 2 (Guideline 8), studies that test theories (explanatory studies) are some of the most important in science since they allow us to see the big picture of how the world works and what causes lead to what effects. The next guideline examines why it is important for researchers to avoid overstating the extent to which their research helps establish causality.
___ Question 9: Caution with Causality: If the title implies causality, does the method of research justify it?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: As the famous saying goes, “correlation does not imply causation.” Example 3.9.1 implies that causal relationships (i.e., cause-and-effect relationships) have been examined because the title contains the word effects. This keyword is frequently used by researchers in their titles to indicate that they have explored causality in their studies.
Example 3.9.1 A TITLE IN WHICH CAUSALITY IS IMPLIED BY THE WORD EFFECTS
‒ The Effects of Computer-Assisted Instruction in Mathematics on Students’ Computational Skills
A common method for examining causal relationships is to conduct an experiment. An experiment is a study in which researchers give treatments to participants to determine whether the treatments cause changes in the outcomes.2 In a traditional experiment, different groups of participants are given different treatments (e.g., one group receives computer-assisted instruction, while a more traditional method is used to teach another group). The researcher then compares the outcomes obtained through the application of the various treatments.3 When such a study is conducted, the use of the word effects in the title is justified. Note that this evaluation question merely asks whether there is a basis for suggesting causality in the title. This question does not ask for an evaluation of the quality of the experiment or quasi-experiment (Chapter 9 provides more information about experimental research and evaluating experiments). The title in Example 3.9.2 also suggests that the researcher examined a causal relationship because of the inclusion of the word effects. Note that in this case, however, the researcher probably did not investigate the relationship using an experiment because it would be unethical to manipulate breakfast as an independent variable (i.e., researchers would not want to assign some students to receive breakfast while denying it to others for the purposes of an experiment).
Example 3.9.2 A TITLE IN WHICH CAUSALITY IS IMPLIED BY THE WORD “EFFECTS”
‒ The Effects of Breakfast on Student Achievement in the Primary Grades
When it is not possible to conduct an experiment on a causal issue, researchers often conduct what are called ex post facto studies (also called causal-comparative or quasi-experimental studies). In these studies, researchers identify students who differ on some outcome (such as students who are high and low in achievement in the primary grades), but who are the same on demographics and other potentially influential variables (such as parents’ highest level of education, parental income, quality of the schools the children attend, and so on). Comparing the breakfast-eating habits of the two groups (i.e., high- and low-achievement groups) might yield some useful information on whether eating breakfast affects4 students’ achievement because the two groups are similar on other variables that might account for differences in achievement (e.g., their parents’ level of education is similar). If a researcher has conducted such a study, the use of the word effects in the title is justified.
2 Notice that the word experiment is used in a similar way in everyday language: for example, “I don’t know if using local honey would actually relieve my allergy symptoms but I will try it as an experiment.”
3 Experiments can also be conducted by treating a given person or group differently at different points in time. For instance, a researcher might praise a child for staying in his or her seat in the classroom on some days and not praise him or her on others and then compare the child’s seat-staying behavior under the two conditions.
4 Note that in reference to an outcome caused by some treatment, the word is spelled effect (i.e., it is a noun). As a verb meaning “to influence”, the word is spelled affect.
Note that simply examining a relationship without controlling for potentially confounding variables does not justify a reference to causality in the title. For instance, if a researcher merely compared the achievement of children who regularly eat breakfast with those who do not, without controlling for other explanatory variables, a causal conclusion (and, hence, a title suggesting it) usually cannot be justified. Also note that synonyms for effect are influence and impact. They should usually be reserved for use in the titles of studies that are either experiments or quasi-experiments (such as ex post facto studies).
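As a rough illustration of the first screening step behind this evaluation question, the Python sketch below flags titles that contain the causal keywords named in this chapter (effects, influence, impact). The function name and the keyword list are hypothetical, chosen only for this example. A flag means only that the reader should check whether the study was an experiment or quasi-experiment; it does not mean that the title is inappropriate.

```python
import re

# Hypothetical keyword list, based on the words this chapter names as
# implying causality: "effects" and its synonyms "influence" and "impact".
CAUSAL_WORDS = {"effect", "effects", "influence", "influences", "impact", "impacts"}


def implies_causality(title: str) -> bool:
    """Return True if the title contains at least one causal keyword."""
    # Lowercase the title and split it into alphabetic words.
    words = re.findall(r"[a-z]+", title.lower())
    return any(word in CAUSAL_WORDS for word in words)


breakfast = "The Effects of Breakfast on Student Achievement in the Primary Grades"
support = ("The Relationship Between Social Support and Depression "
           "in Young People in the Albanian Justice System")

print(implies_causality(breakfast))  # flagged: contains "effects"
print(implies_causality(support))    # not flagged: no causal keyword
```

Keyword matching cannot distinguish a noun from a verb or judge the research design, so the evaluation itself still requires reading the article's method section.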
___ Question 10: Special Features: Are any highly unique or very important characteristics of the study referred to in the title or subtitle?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: There may be hundreds of studies on many topics in social and behavioral sciences. To help readers identify those with highly unusual or very important characteristics, references to these should be made in the title. For instance, in Example 3.10.1, the mention of a “nationally representative sample” may help distinguish that study from many others employing only local convenience samples. In Example 3.10.2, the fact that the study is based on a “natural experiment” is a noteworthy feature indicative of a rigorous test of causal effects (as mentioned above, Chapter 9 of this book discusses how experiments can help establish cause-and-effect relationships).
Example 3.10.1 A TITLE THAT POINTS OUT AN IMPORTANT STRENGTH IN SAMPLING
‒ The Relationship Between Teachers’ Job Satisfaction and Compensation in a Nationally Representative Sample
Example 3.10.2 A TITLE THAT POINTS OUT AN IMPORTANT STRENGTH IN METHOD
‒ The Impact of School Quality on Health and Income Outcomes in Adulthood: School Lottery as a Natural Experiment
___ Question 11: Titles with Subtitles: If there are a main title and a subtitle, do both provide important information about the research?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Failure on this evaluation question often results from an author’s use of a “clever” main title that is vague or catchy,5 followed by a subtitle that identifies the specific content of the research report. Example 3.11.1 illustrates this problem. In this example, the main title fails to impart specific information. In fact, it could be applied to thousands of studies in dozens of fields, as diverse as psychology and physics, in which researchers find that various combinations of variables (the parts) contribute to our understanding of a complex whole.
Example 3.11.1 A TWO-PART TITLE WITH A VAGUE MAIN TITLE (INAPPROPRIATE)
‒ The Whole Is Greater Than the Sum of Its Parts: The Relationship Between Playing with Pets and Longevity Among the Elderly
In contrast to the previous example, Example 3.11.2 has a main title and a subtitle that both refer to specific variables examined in a research study. The first part names two major variables (“attachment” and “well-being”), while the second part names the two groups that were compared in terms of these variables.
Example 3.11.2 A TWO-PART TITLE IN WHICH BOTH PARTS PROVIDE IMPORTANT INFORMATION
‒ Attachment to Parents and Emotional Well-Being: A Comparison of African American and White Adolescents
The title in Example 3.11.2 can also be rewritten as a single statement without a subtitle, as illustrated in Example 3.11.3.
5 For additional information about amusing or humorous titles in research literature, see the online resources for this chapter.
Example 3.11.3 A REWRITTEN VERSION OF EXAMPLE 3.11.2
‒ A Comparison of the Emotional Well-Being and Attachment to Parents in African American and White Adolescents
Examples 3.11.2 and 3.11.3 are equally good. The evaluation question being considered here is neutral on whether a title should be broken into a main title and subtitle. Rather, it suggests that, if broken into two parts, both parts should provide important information specific to the research being reported.
___ Question 12: Overall: Is the title effective and appropriate?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Rate this evaluation question after considering your answers to the earlier ones in this chapter, and taking into account any additional considerations and concerns you may have after reading the entire research article. Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
Chapter 3 Exercises
Part A Directions: Evaluate each of the following titles to the extent that it is possible to do so without reading the complete research reports. The references for the titles are given below. All are from journals that are widely available in large academic libraries. Ideally, you would want to read each article before using the evaluation criteria from this chapter to evaluate the article title. However, the next best thing is to read the article abstract to get a sense of what the study is about (more information about evaluating article abstracts is in Chapter 4 of this book). Keep in mind that there can be considerable subjectivity in determining whether a title is adequate.
1 Sugar and Spice and All Things Nice: The Role of Gender Stereotypes in Jurors’ Perceptions of Criminal Defendants6
2 The Living Experience of Feeling Peaceful7
3 Implicit Attitudes Toward Robots Predict Explicit Attitudes, Semantic Distance Between Robots and Humans, Anthropomorphism, and Prosocial Behavior: From Attitudes to Human–Robot Interaction8
4 On the Triple Exclusion of Older Adults During COVID-19: Technology, Digital Literacy and Social Isolation9
5 Liar, Liar, Pants on Fire! Social Desirability Bias in Software Piracy Research10
6 Beliefs in Conspiracy Theories Following Ostracism11
7 Omega-3 Supplements Reduce Self-Reported Physical Aggression in Healthy Adults12
8 How Changing Needs Change Technological Practices During a Crisis: An Explanation Using Practice Theory13
6 Strub, T., & McKimmie, B. M. (2016). Sugar and spice and all things nice: The role of gender stereotypes in jurors’ perceptions of criminal defendants. Psychiatry, Psychology and Law, 23, 487–498. https://doi.org/10.1080/13218719.2015.1080151
7 Reding, N. (2022). The living experience of feeling peaceful. Nursing Science Quarterly, 35(4), 464–474. https://doi.org/10.1177/08943184221115133
8 Spatola, N., & Wudarczyk, O. A. (2021). Implicit attitudes towards robots predict explicit attitudes, semantic distance between robots and humans, anthropomorphism, and prosocial behavior: From attitudes to human–robot interaction. International Journal of Social Robotics, 13, 1149–1159. https://doi.org/10.1007/s12369-020-00701-5
9 Zapletal, A., Wells, T., Russell, E., & Skinner, M. W. (2023). On the triple exclusion of older adults during COVID-19: Technology, digital literacy and social isolation. Social Sciences & Humanities Open, 8(1), 100511. https://doi.org/10.1016/j.ssaho.2023.100511
10 Gergely, M., & Rao, V. S. (2022). Liar, liar, pants on fire! Social desirability bias in software piracy research. Behaviour & Information Technology, 41(13), 2796–2818. https://doi.org/10.1080/0144929X.2021.1950834
11 Poon, K. T., Chen, Z., & Wong, W. Y. (2020). Beliefs in conspiracy theories following ostracism. Personality and Social Psychology Bulletin, 46(8), 1234–1246. https://doi.org/10.1177/0146167219898944
12 Bègue, L., Zaalberg, A., Shankland, R., Duke, A., Jacquet, J., Kaliman, P., … & Bushman, B. J. (2018). Omega-3 supplements reduce self-reported physical aggression in healthy adults. Psychiatry Research, 261, 307–311. https://doi.org/10.1016/j.psychres.2017.12.038
13 Schlosser, P. G., Chung, T. R., & Grover, V. (2023). How changing needs change technological practices during a crisis: An explanation using practice theory. Computers in Human Behavior, 107799. https://doi.org/10.1016/j.chb.2023.107799
9 When Single Parents Marry: Do Children Benefit Academically?14
10 Socioeconomic Disparities, Nighttime Bedroom Temperature, and Children’s Sleep15
11 Federal Statutes and Environmental Justice in the Navajo Nation: The Case of Fracking in the Greater Chaco Region16
12 Intergenerational Transmission of Dyslexia: How do Different Identification Methods of Parental Difficulties Influence the Conclusions Regarding Children’s Risk for Dyslexia?17
13 The Good, the Bad, and the Ugly? A Triarchic Perspective on Psychopathy at Work18
14 Outcome of the AVID College Preparatory Program on Adolescent Health: A Randomized Trial19
Part B Directions: Examine several academic journals that publish on topics of interest to you. Identify two empirical articles with titles you think are especially strong in terms of the evaluation questions presented in this chapter. Also, identify two titles that you believe have clear weaknesses. Bring the four titles to class for discussion.
14 Usevitch, M. T., & Dufur, M. J. (2021). When single parents marry: Do children benefit academically? Family Relations, 70(4), 1206–1221. https://doi.org/10.1111/fare.12535
15 Hinnant, B., Buckhalt, J. A., Brigham, E. F., Gillis, B. T., & El-Sheikh, M. (2023). Socioeconomic disparities, nighttime bedroom temperature, and children’s sleep. Journal of Applied Developmental Psychology, 86, 101530. https://doi.org/10.1016/j.appdev.2023.101530
16 Atencio, M., James-Tohe, H., Sage, S., Tsosie, D. J., Beasley, A., Grant, S., & Seamster, T. (2022). Federal statutes and environmental justice in the Navajo Nation: The case of fracking in the Greater Chaco region. American Journal of Public Health, 112(1), 116–123. https://doi.org/10.2105/AJPH.2021.306562
17 Khanolainen, D., Salminen, J., Eklund, K., Lerkkanen, M. K., & Torppa, M. (2022). Intergenerational transmission of dyslexia: How do different identification methods of parental difficulties influence the conclusions regarding children’s risk for dyslexia? Reading Research Quarterly, 58(2), 220–239. https://doi.org/10.1002/rrq.482
18 Kranefeld, I., & Blickle, G. (2022). The good, the bad, and the ugly? A triarchic perspective on psychopathy at work. International Journal of Offender Therapy and Comparative Criminology, 66(15), 1498–1522. https://doi.org/10.1177/0306624X211022667
19 Dudovitz, R. N., Chung, P. J., Dosanjh, K. K., Phillips, M., Tucker, J. S., Pentz, M. A., … & Wong, M. D. (2023). Outcome of the AVID college preparatory program on adolescent health: A randomized trial. Pediatrics, 151(1), e2022057183. https://doi.org/10.1542/peds.2022-057183
CHAPTER 4
Evaluating Abstracts
The abstract is a summary of a research report that appears below its title. Like the title, it helps consumers of research to identify articles of interest. This function of abstracts is so important that online databases in the social and behavioral sciences provide the abstracts as well as the titles of the articles they include. Many journals have a policy regarding the maximum length of their abstracts. It is common to allow a maximum of 100–250 words.1 When evaluating abstracts, you will need to make subjective decisions about how much weight to give to the various elements included within them, given that their length is typically severely restricted. Make a preliminary evaluation of an abstract when you first encounter it. After reading the associated article, re-evaluate the abstract. The evaluation questions that follow are stated as “yes–no” questions, where a “yes” indicates that you judge the characteristic being considered as satisfactory. You may also want to rate each characteristic on a scale of 1 to 5, where 5 is the highest rating. N/A (not applicable) and I/I (insufficient information to make a judgment) can also be used when necessary.
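The bookkeeping described above can be sketched in a few lines of Python. The names used here (word_count, within_limit, Rating) and the 250-word default are illustrative assumptions rather than part of this book or of any journal's policy; actual length limits vary by journal.

```python
from dataclasses import dataclass
from typing import Optional


def word_count(abstract: str) -> int:
    """Count whitespace-separated words in an abstract."""
    return len(abstract.split())


def within_limit(abstract: str, max_words: int = 250) -> bool:
    """Check an abstract against an assumed maximum length."""
    return word_count(abstract) <= max_words


@dataclass
class Rating:
    """One answer to an evaluation question: a 1-5 score, or N/A, or I/I."""
    question: str
    score: Optional[int] = None  # 1 (very unsatisfactory) to 5 (very satisfactory)
    flag: Optional[str] = None   # "N/A" or "I/I" when no score can be given

    def __post_init__(self) -> None:
        # Exactly one of score and flag must be supplied.
        if (self.score is None) == (self.flag is None):
            raise ValueError("give either a 1-5 score or a flag, not both")
        if self.score is not None and not 1 <= self.score <= 5:
            raise ValueError("score must be between 1 and 5")
        if self.flag is not None and self.flag not in ("N/A", "I/I"):
            raise ValueError("flag must be 'N/A' or 'I/I'")


# A preliminary rating, to be revisited after reading the full article.
first_pass = Rating("Research Purpose", score=4)
no_theory = Rating("Theory Piece", flag="N/A")
print(first_pass, no_theory)
```

Keeping the preliminary and the post-reading ratings as separate records makes it easy to see how the evaluation of an abstract changed after the article was read.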
___ Question 1: Research Purpose: Is the purpose of the study referred to or at least clearly implied?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Many authors begin their abstracts with a brief statement on the problem that needs to be addressed and the purpose of their research. Example 4.1.1 shows the first sentences of an abstract in which this was done.
1 The latest, 7th edition of Publication Manual of the American Psychological Association (APA) suggests that an abstract should be between 150 and 250 words. See online resources for this chapter for more info.
DOI: 10.4324/9781003362661-4
Example 4.1.1 – Luo et al. (2022) 2 FIRST SENTENCES OF AN ABSTRACT: SPECIFIC STATEMENT OF THE PROBLEM AND STUDY PURPOSE
School bullying, as a public health problem, has been linked to many emotional disorders. However, the overall status of school bullying among adolescent students in China is unknown. This nationwide study aimed to investigate school bullying in China and evaluate the relationships between school bullying and mental health status.
Note that even though the word purpose is not used in Example 4.1.1, the purpose of the study is clearly stated: the study investigates the connections between school bullying and mental health in a particular population.
___ Question 2: Research Methods: Does the abstract mention highlights of the research methodology?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Given the shortness of an abstract, researchers can usually provide only limited information on their research methodology. However, even brief highlights can be helpful to consumers of research who are looking for research reports of interest. Consider Example 4.2.1, which is taken from an abstract. The fact that the researchers used a qualitative methodology employing interviews with small samples is an important methodological characteristic that might set this study apart from others on the same topic.
Example 4.2.1 – Verity et al. (2022) 3 AN EXCERPT FROM AN ABSTRACT: HIGHLIGHTS OF RESEARCH METHODOLOGY
There is limited qualitative research on the experience of loneliness in adolescence, meaning key facets of the loneliness experience that are important in adolescence may have been overlooked. The current study addresses that gap in the literature and explores how loneliness is experienced in the context of adolescence from the perspective of adolescents. About 67 online counseling conversations between Childline counselors and adolescents (ages 12–18 years; 70% females) who had contacted Childline to talk about loneliness were analyzed using Thematic Framework Analysis to establish commonalities and salient issues involved in adolescent experiences of loneliness.
2 Luo, X., Zheng, R., Xiao, P., Xie, X., Liu, Q., Zhu, K., … & Song, R. (2022). Relationship between school bullying and mental health status of adolescent students in China: A nationwide cross-sectional study. Asian Journal of Psychiatry, 70, 103043. https://doi.org/10.1016/j.ajp.2022.103043
3 Verity, L., Yang, K., Nowland, R., Shankar, A., Turnbull, M., & Qualter, P. (2022). Loneliness from the adolescent perspective: A qualitative analysis of conversations about loneliness between adolescents and Childline counselors. Journal of Adolescent Research, [online first]. https://doi.org/10.1177/07435584221111121
Likewise, Example 4.2.2 provides important information about the research methodology: the study was cross-sectional (i.e., conducted at one point in time), it was administered online, it used a convenience sample of over 1,000, and the participants were nurses in Iran.
Example 4.2.2 – Kakemam et al. (2021) 4 AN EXCERPT FROM AN ABSTRACT: HIGHLIGHTS OF RESEARCH METHODOLOGY
We conducted a cross-sectional online study among 1,004 Iranian nurses through the convenience sampling technique.
4 Kakemam, E., Chegini, Z., Rouhi, A., Ahmadi, F., & Majidi, S. (2021). Burnout and its relationship to self‐reported quality of patient care and adverse events during COVID‐19: A cross‐sectional online survey among nurses. Journal of Nursing Management, 29(7), 1974–1982. https://doi.org/10.1111/jonm.13359
___ Question 3: No Specs: Has the researcher omitted the titles of measures (except when these are the focus of the research)?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Including the full, formal titles of published measures, such as tests, questionnaires, and scales, in an abstract is usually inappropriate (see the exception below) because their names take up space that could be used to convey more important information. Note that consumers of research who are interested in the topic will be able to find the full names of the measures in the body of the article, where space is less limited than in an abstract. A comparison of Examples 4.3.1 and 4.3.2 shows how much space can be saved by omitting the names of the measures while conveying the same essential information.
Example 4.3.1 AN EXCERPT FROM AN ABSTRACT: NAMES THE TITLES OF MEASURES (INAPPROPRIATE DUE TO SPACE LIMITATIONS IN ABSTRACTS)
A sample of 483 college males completed the Attitudes Toward Alcohol Scale (Fourth Edition, Revised), the Alcohol Use Questionnaire, and the Manns–Herschfield Quantitative Inventory of Alcohol Dependence (Brief Form).
Example 4.3.2 AN IMPROVED VERSION OF EXAMPLE 4.3.1
A sample of 483 college males completed measures of their attitudes toward alcohol, their alcohol use, and their dependence on alcohol.
The exception: If the primary purpose of the research is to evaluate the reliability and validity of one or more specific measures, it is appropriate to name them in the abstract and title. This will help readers locate research on the characteristics of specific measures. In Example 4.3.3, mentioning the name of a specific measure is appropriate because the purpose of the study is to determine a characteristic of the measure (its reliability).
Example 4.3.3 AN EXCERPT FROM AN ABSTRACT: PROVIDES THE TITLE OF A MEASURE (APPROPRIATE BECAUSE THE PURPOSE OF THE RESEARCH IS TO INVESTIGATE THE MEASURE)
Test-retest reliability of the Test of Variables of Attention (T.O.V.A.) was investigated in two studies using two different time intervals: 90 min and 1 week (7 days). To investigate the 90-min reliability, 31 school-age children (M = 10 years, SD = 2.66) were administered the T.O.V.A., then re-administered the test.
___ Question 4: Key Findings: Are the highlights of the results described?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Example 4.4.1 shows three sentences of an abstract where the highlights of the study results are described. Notice that the researchers make general statements about their results, such as “real-life social support was then associated with reduced depression,” without stating precisely by how much depression was lower. General statements of this type are acceptable, given the need for brevity in an abstract. In other words, it is acceptable to highlight the results in general terms.
Example 4.4.1 – Meshi & Ellithorpe (2021) 5 THREE SENTENCES OF AN ABSTRACT: HIGHLIGHTS OF STUDY RESULTS
Our analysis revealed that problematic social media use was significantly associated with decreased real-life social support and increased social support on social media. Importantly, real-life social support was then associated with reduced depression, anxiety, and social isolation, while social support on social media was not associated with these mental health measures. Our findings reveal the value of real-life social support when considering the relationship between problematic social media use and mental health.
5 Meshi, D., & Ellithorpe, M. E. (2021). Problematic social media use and social support received in real-life versus on social media: Associations with depression, anxiety and social isolation. Addictive Behaviors, 119, 106949. https://doi.org/10.1016/j.addbeh.2021.106949
Note that there is nothing inherently wrong with providing specific statistical results in an abstract if space permits and the statistics are understandable within the limited context of the abstract. Example 4.4.2 illustrates how this might be done.
Example 4.4.2 – Hou et al. (2020) 6 PART OF AN ABSTRACT: SPECIFIC RESULTS REPORTED AS HIGHLIGHTS
Of 3063 participants eligible for analysis, the total prevalence of depression and anxiety was 14.14 and 13.25%. Females were experiencing more severe stress and anxiety symptoms, while males showed better resilience to stress.
___ Question 5: Theory Piece: If the study is strongly tied to a theory, is the theory mentioned in the abstract?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: As indicated in the previous chapter, a theory that is central to a study might be mentioned in the title. If such a theory is not mentioned in the title, it should be mentioned in the abstract, as illustrated in Example 4.5.1. It is also acceptable to mention it in both the title and abstract, as illustrated in Example 4.5.2. (Note that italics have been used in these examples for emphasis.)
6 Hou, F., Bi, F., Jiao, R., Luo, D., & Song, K. (2020). Gender differences of depression and anxiety among social media users during the COVID-19 outbreak in China: A cross-sectional study. BMC Public Health, 20, 1648. https://doi.org/10.1186/s12889-020-09738-7
Example 4.5.1 – Qi et al. (2011) 7 TITLE AND ABSTRACT: A SPECIFIC THEORY IS NAMED IN THE ABSTRACT BUT NOT IN THE TITLE (ACCEPTABLE TO DE-EMPHASIZE THEORY)
Title: Self-Efficacy Program to Prevent Osteoporosis Among Chinese Immigrants
Objectives: The aim of this study was to evaluate the preliminary effectiveness of an educational intervention based on the self-efficacy theory aimed at increasing the knowledge of osteoporosis and adoption of preventive behaviors, including regular exercise and osteoporosis medication adherence, designed for Chinese immigrants, aged 45 years or above, living in the United States.
7 Qi, B.-B., Resnick, B., Smeltzer, S. C., & Bausell, B. (2011). Self-efficacy program to prevent osteoporosis among Chinese immigrants. Nursing Research, 60(6), 393–404. https://doi.org/10.1097/NNR.0b013e3182337dc3
Example 4.5.2 – Cornacchione et al. (2016) 8 TITLE AND ABSTRACT: A SPECIFIC THEORY IS MENTIONED IN THE TITLE AND ABSTRACT (ACCEPTABLE TO EMPHASIZE THEORY)
Title: An Exploration of Female Offenders’ Memorable Messages from Probation and Parole Officers on the Self-Assessment of Behavior from a Control Theory Perspective
Abstract (first half): Guided by control theory, this study examines memorable messages that women on probation and parole receive from their probation and parole agents. Women interviewed for the study were asked to report a memorable message they received from an agent, and to describe situations if/when the message came to mind in three contexts likely to emerge from a control theory perspective: when they did something of which they were proud, when they stopped themselves from doing something they would later regret, and when they did something of which they were not proud.
___ Question 6: No “Implications Are Discussed”: Has the researcher avoided making vague references to implications and future research directions?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Most researchers discuss the implications of their research and directions for future research near the end of their articles. The limited amount of space allotted to abstracts should not be used to make vague references to these matters. Example 4.6.1 is the closing sentence from an abstract. It contains vague references to implications and future research.
Example 4.6.1 LAST SENTENCE OF AN ABSTRACT: VAGUE REFERENCES TO IMPLICATIONS AND FUTURE RESEARCH (INAPPROPRIATE)
This article concludes with a discussion of both the implications of the results and directions for future research.
The phrase in Example 4.6.1 could safely be omitted from the abstract without causing a loss of important information because most readers will correctly assume that most research reports discuss these elements. An alternative is to state something specific about these matters, as illustrated in Example 4.6.2. Note that in this example, the researcher does not describe the implications, but indicates that the implications will be of special interest to a particular group of professionals – school counselors. This will alert school counselors that this article (among hundreds of others on drug abuse) might be of special interest to them. If space does not permit such a long closing sentence in the abstract, it could be shortened to “Implications for school counselors are discussed.”
8 Cornacchione, J., Smith, S. W., Morash, M., Bohmert, M. N., Cobbina, J. E., & Kashy, D. A. (2016). An exploration of female offenders’ memorable messages from probation and parole officers on the self-assessment of behavior from a control theory perspective. Journal of Applied Communication Research, 44(1), 60–77. https://doi.org/10.1080/00909882.2015.1116705
Example 4.6.2 IMPROVED VERSION OF EXAMPLE 4.6.1 (LAST SENTENCE OF AN ABSTRACT)
While these results have implications for all professionals working with drug-abusing adolescents, special attention is given to the implications for school counselors.
In short, implications and future research need not necessarily be mentioned in the abstracts. However, if they are mentioned, something specific should be said about them.
___ Question 7: Perfect Abstract Trifecta: Does the abstract include the purpose/objectives, methods, and results of the study?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Some academic journals started requiring the partitioning of abstracts into Objective–Methods–Results, or even more commonly in recent years, into Purpose–Methods– Results–Conclusions, and other similar variations. This is a convenient way to ensure that key pieces of information are included in the abstract with an explicit subheading. Examples 4.7.1 and 4.7.2 provide an illustration of such partitioned abstracts (notice that both studies were published in the Journal of American College Health, with a 12-year gap).
Example 4.7.1 – Cremeens et al. (2011) 9 A TRI-PARTITIONED ABSTRACT: OBJECTIVE/METHODS/RESULTS
Objective: The purpose of this study was to examine challenges and recommendations (identified by college administrators) to enforcing alcohol policies implemented at colleges in the south eastern United States. Methods: Telephone interviews were conducted with 71 individuals at 21 institutions. Results: Common challenges included inconsistent enforcement, mixed messages received by students, and students’ attitudes toward alcohol use. The most common recommendations were ensuring a comprehensive approach, collaboration with members of the community, and enhanced alcohol education.
9 Cremeens, J. L., Usdan, S. L., Umstattd, M. R., Talbott, L. L., Turner, L., & Perko, M. (2011). Challenges and recommendations to enforcement of alcohol policies on college campuses: An administrator’s perspective. Journal of American College Health, 59(5), 427–430. https://doi.org/10.1080/07448481.2010.502201
Example 4.7.2 – Johnson et al. (2023) 10 A PENTA-PARTITIONED ABSTRACT: OBJECTIVE/PARTICIPANTS/METHODS/ RESULTS/CONCLUSIONS
Exposure to potentially traumatic race-based experiences poses a risk factor for risky drinking among college students from historically marginalized racial/ethnic backgrounds. Objective: The current study examined the relationship between both the level (severity) and pattern of race-based traumatic stress (RBTS) reactions and risky drinking. Participants: The current study sample was made up of 62 male (23.5%) and 202 female (76.5%) Latino/a/x, Black, and Asian college students attending a minority-serving institution. Methods: Study participants were asked to participate in an anonymous online survey. Results: A criterion profile analysis revealed that higher scores on RBTS reactions overall, and elevated scores on RBTS – avoidance, low self-esteem, and anger, specifically, were indicative of more risky drinking. Conclusions: These findings highlight a distinct pattern of RBTS scores that may predict a vulnerability to risky drinking and underscore the importance of racial trauma healing in alcohol use prevention and intervention efforts.
10 Johnson, V. E., Chng, K., & Courtney, K. (2023). Racial trauma as a risk factor for risky alcohol use in diverse college students. Journal of American College Health, [online first]. https://doi.org/10.1080/07448481.2023.2214247
However, even if a particular journal does not require the partitioning of abstracts, it is still a good rule of thumb to look for these key pieces of information when evaluating an abstract. Moreover, Objectives-Methods-Results are just as relevant for visual abstracts as they are for written ones. A recent trend among academic journals has been to add a graphical or video abstract and, in some cases, a plain language summary to the standard, traditional abstracts. The goal is to improve accessibility and appeal by providing additional useful formats for absorbing information about the study. Some emerging research comparing the effectiveness of different formats seems to show that video abstracts and plain language summaries are especially effective and may lead to higher visibility of the articles.11
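For readers who screen many abstracts at once, the check for the three key elements can be roughed out in code. The Python sketch below is only an illustration and is not part of the evaluation scheme in this chapter: the function name `labeled_sections` and the label groups in `SECTION_LABELS` are hypothetical choices, and the code detects only explicit "Label:" subheadings of the kind shown in Examples 4.7.1 and 4.7.2.

```python
import re

# Hypothetical label groups (an assumption for this sketch); journals use many variations.
SECTION_LABELS = {
    "purpose": ("objective", "objectives", "purpose", "aim", "aims"),
    "methods": ("method", "methods", "participants", "design"),
    "results": ("results", "findings"),
}


def labeled_sections(abstract: str) -> dict[str, bool]:
    """Report which key elements appear as explicit 'Label:' subheadings."""
    # Collect every capitalized word that is immediately followed by a colon.
    found = {label.lower() for label in re.findall(r"\b([A-Z][a-z]+):", abstract)}
    return {
        element: any(label in found for label in labels)
        for element, labels in SECTION_LABELS.items()
    }


sample = (
    "Objective: The purpose of this study was to examine challenges. "
    "Methods: Telephone interviews were conducted with 71 individuals. "
    "Results: Common challenges included inconsistent enforcement."
)
print(labeled_sections(sample))
```

An abstract without subheadings may still contain all three elements, so a result of False from this sketch means only that no explicit label was found; the reader's own judgment is still needed for Evaluation Question 7.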
___ Question 8: Overall: Is the abstract effective and appropriate?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Rate this evaluation question after considering your answers to the earlier ones in this chapter, while taking into account any additional considerations and concerns you may have. When answering this evaluation question, pay special attention to whether all three major elements described in the previous section (objectives, methods, and results) are included in the abstract.
Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
Chapter 4 Exercises
Part A
Directions: Evaluate each of the following abstracts (to the extent that it is possible to do so without reading the associated articles) by answering Evaluation Question 8 (“Overall, is the abstract effective and appropriate?”) using a scale from 1 (very unsatisfactory) to 5 (very satisfactory). In the explanations for your ratings, refer to the other evaluation questions in this chapter. Point out both strengths and weaknesses, if any, of the abstracts.
11 See the online resources for this chapter for examples of visual abstracts and the relevant research on their effectiveness.
1
PIQUERO ET AL. (2022)12
Title: A COVID-19 Public Health Silver Lining? Reductions in Driving under the Influence Arrests and Crashes in Miami-Dade County
Abstract: The health crisis that began in early 2020 has generated a large amount of interest in the effect of COVID-19 on public health. The majority of this work has centered around trying to better understand how the virus spreads, where it spreads, who is at risk and when, in order to provide evidence-based guidance to the public, and stop the pandemic. The Centers for Disease Control has continued to report the largely somber findings; however, there are silver linings. The temporary reduction in daily global CO2 emissions was one of these, but there are others. In this case study on Miami-Dade County, Fl, a regression discontinuity model is used to highlight reductions in both drunk driving crashes and driving under the influence arrests. While we observed immediate reductions in both crashes and arrests as a result of the March 2020 lockdown, more importantly over the duration of the year since the lockdown we observed a staggering reduction of over 800 fewer driving under the influence arrests and almost 150 alcohol-related motor-vehicle crashes.
2
JACKSON & VAUGHN (2017)13
Title: Sleep and Preteen Delinquency: Is the Association Robust to ADHD Symptomatology and ADHD Diagnosis?
Abstract: Both qualitative and quantitative aspects of sleep have been linked to multiple dimensions of well-being. An emerging body of research has also revealed that poor sleep during adolescence can increase the likelihood of delinquent involvement. The contribution of early sleep difficulties to later delinquency, however, is often overlooked. Furthermore, the role that ADHD symptomatology and/or diagnosis might play in this association has not been adequately addressed, despite findings suggesting that both sleep disturbances and delinquent involvement are more common among children with ADHD symptomatology or an ADHD diagnosis. The current study examines the associations between sleep behaviors and preteen delinquency, and the extent to which ADHD symptomatology and/or diagnosis might inform these associations. Data from the Fragile Families and Child Wellbeing Study (FFCWS) were employed to explore these associations and logistic regression techniques were utilized to analyze the data. The findings reveal that both sleep problems and sleep duration are associated with the odds of ADHD symptomatology, an ADHD diagnosis, and preteen delinquency. Even so, the results also suggest that persistent sleep problems are not significantly associated with the odds of preteen delinquency once ADHD symptomatology and diagnosis are taken into account. The influence of sleep duration on preteen delinquency, however, is robust to the association between ADHD measures and preteen delinquency. Poor sleep, therefore, appears to be an important modifiable risk factor for preteen delinquency. Even so, future investigations into the link between sleep and delinquency should account for developmental risks and/or disorders that commonly co-occur with sleep problems.
12 Piquero, A. R., Kurland, J., Piquero, N. L., & Talpins, S. K. (2022). A COVID-19 public health silver lining? Reductions in driving under the influence arrests and crashes in Miami-Dade County. Deviant Behavior, 43(10), 1285–1291. https://doi.org/10.1080/01639625.2021.1986683
13 Jackson, D. B., & Vaughn, M. G. (2017). Sleep and preteen delinquency: Is the association robust to ADHD symptomatology and ADHD diagnosis? Journal of Psychopathology and Behavioral Assessment, 39, 585–595. https://doi.org/10.1007/s10862-017-9610-1
3
ROQUES ET AL. (2020)14
Title: Using a Mixed-Methods Approach to Analyze Traumatic Experiences and Factors of Vulnerability among Adolescent Victims of Bullying.
Abstract: A number of studies have analyzed the bullying phenomenon among adolescent victims. Relatively few studies, however, have specifically addressed the associated post-traumatic stress disorder (PTSD). Our clinical practice and therapeutic encounters with adolescents reveal that the majority of bullied adolescents suffer from high levels of PTSD. The objective of this study is to further explore bullied adolescents’ traumatic experiences. In an attempt to analyze these experiences, this article presents a mixed-methods approach. Such an approach will allow to analyze the PTSD that results from bullying as well as subjects’ psychic and family-relevant vulnerabilities. First, bullying will be defined in the context of adolescence. Then the main studies on bullying will be presented. The objectives, tools and methods of analysis will be presented. The interviews will be analyzed according to the Interpretative Phenomenological Analysis (IPA) method. Projective tools, family drawings, Rorschach and Thematic Apperception Test (TAT), will be analyzed using a psychoanalytic interpretation method. Each qualitative tool will be used alongside a validated quantitative tool. The Clinical Administered PTSD Scale (CAPS-CA-5 questionnaire) and the interviews conducted will thus allow to analyze PTSD and traumatic experiences. The Family Assessment Device (FAD) and the family drawing test will enable to assess family functioning; lastly, the Symptom Check List (SCL-90) that will be used alongside Rorschach and TAT tests will allow to analyze individual psychological vulnerabilities. This approach will increase data validity. The originality of this research study is based on a mixed-methods approach, our methodology which is based on clinical psychology, and the choice of certain research tools which have received little attention to date. Ultimately, this study may help improve how bullying is identified and could contribute toward the reinforcement or revision of the criteria that characterize bullying. Lastly, it may help us explore various unexamined dimensions of bullying. A possible limitation is the complexity associated with such a protocol.
14 Roques, M., Laimou, D., Camps, F. D., Mazoyer, A. V., & El Husseini, M. (2020). Using a mixed-methods approach to analyze traumatic experiences and factors of vulnerability among adolescent victims of bullying. Frontiers in Psychiatry, 10, 890. https://doi.org/10.3389/fpsyt.2019.00890
4
KWON ET AL. (2017)15
Title: The Multifaceted Nature of Poverty and Differential Trajectories of Health Among Children
Abstract: The relationships between poverty and children’s health have been well documented, but the diverse and dynamic nature of poverty has not been thoroughly explored. Drawing on cumulative disadvantage and human capital theory, we examined to what extent the duration and depth of poverty, as well as the level of material hardship, affected changes in physical health among children over time. Data came from eight waves of the Korea Welfare Panel Study between 2006 and 2013. Using children who were under age 10 at baseline (N = 1657, Observations = 13,256), we conducted random coefficient regression in a multilevel growth curve framework to examine poverty group differences in intraindividual change in health status. Results showed that chronically poor children were most likely to have poor health. Children in households located far below the poverty line were most likely to be in poor health at baseline, while near-poor children’s health got significantly worse over time. Material hardship also had a significant impact on child health.
5
EREN & OWENS (2023)16
Title: Economic Booms and Recidivism
Abstract:
Objectives This paper examines the impact of local economic activity on criminal behavior. We build on existing research by relaxing the identification assumptions required for causal inference, and estimate the impact of local economic activity on recidivism.
Methods We use the fracking boom as a source of credibly exogenous variation in the economic conditions into which incarcerated people are released. We replicate and extend existing instrumental variables analyses of fracking on how many released offenders return to state prison separately from aggregate crime and arrests.
Results Our instrumental variables estimates imply that a ten thousand dollar increase in the value of per capita production is associated with a 2.8% reduction in the 1-year recidivism of ex-offenders at the county level. Improved labor market conditions, specifically an increase in wages for young adults, may explain a non-negligible fraction of the reduction in recidivism associated with economic booms. In contrast, we replicate existing work finding that fracking increased aggregate measures of crime and arrests.
Conclusion Increased economic opportunity appears to have a different impact of overall crime than on recidivism. This suggests that the relationship between economic opportunity and offending may be conditioned by local social ties. Further research examining how social connections and labor markets affect individual criminal behavior is needed.
15 Kwon, E., Kim, B., & Park, S. (2017). The multifaceted nature of poverty and differential trajectories of health among children. Journal of Children and Poverty, 23(2), 141–160. https://doi.org/10.1080/10796126.2017.1300575
16 Eren, O., & Owens, E. (2023). Economic booms and recidivism. Journal of Quantitative Criminology, [online first]. https://doi.org/10.1007/s10940-023-09571-2
6
HENRY & SOLARI (2020)17
Title: Targeting Oral Language and Listening Comprehension Development for Students with Autism Spectrum Disorder: A School-Based Pilot Study
Abstract: This study investigates the effects of an integrated oral language and listening comprehension intervention for early elementary students with ASD. Students (n = 43) were randomly assigned to intervention or control comparison conditions, with intervention students receiving instruction in small groups of 3 or 4. Groups were led by special education classroom teachers 4 days per week across 20 weeks in the school year. Significant group differences were detected on measures of expressive vocabulary, narrative ability, and listening comprehension. This study provides preliminary evidence of the intervention’s feasibility and effectiveness for intervening in language and early reading skills for students with ASD.
17 Henry, A. R., & Solari, E. J. (2020). Targeting oral language and listening comprehension development for students with autism spectrum disorder: A school-based pilot study. Journal of Autism and Developmental Disorders, 50, 3763–3776. https://doi.org/10.1007/s10803-020-04434-2
Part B
Directions: Examine several academic journals that publish on topics of interest to you. Identify two empirical articles with abstracts that you think are especially strong in terms of the evaluation questions presented in this chapter. Also, identify two abstracts that you believe have clear weaknesses. Bring the four abstracts to class for discussion.
CHAPTER 5
Evaluating Introductions and Literature Reviews
Research reports in academic journals usually begin with an introduction in which research literature is cited.1 An introduction with an integrated literature review has the following five purposes:
■ introduce the problem area,
■ establish its importance,
■ provide an overview of the relevant literature,
■ show how the current study will advance knowledge in the area, and
■ provide background for the researcher’s specific research questions, purposes, or hypotheses (which are usually stated in the last paragraph of the introduction or literature review).
This chapter presents evaluation questions to assess the front end of a research report consisting of an introduction and literature review. To cover both the most salient characteristics of an introduction/literature review and the details indicative of its quality, the evaluation questions in this chapter are organized into four major sections:
1. General Structure
2. Language Matters
3. Quality of Sources
4. Quality of Analysis
Each of these four sections has several specific questions that you need to pay attention to when evaluating introductions and literature reviews.
1. General Structure
The organization, or structure, of an introduction/literature review is critical to its overall effectiveness. Creating an outline can be especially helpful for those preparing to write their own literature reviews. The evaluation questions below will help you understand how literature reviews should be structured.
1 In theses and dissertations, the first chapter usually is the introduction, with relatively few references to the literature. This is followed by a chapter that provides a comprehensive literature review.
DOI: 10.4324/9781003362661-5
___ Question 1a: Intro Funnel: Does the researcher begin by quickly moving from a general topic to a more specific problem area?2
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I2
Comment: Some researchers start their introductions with statements that are so broad that they fail to identify the specific area of investigation. As the beginning of an introduction to the study of factors associated with the decline in juvenile violence in the United States, Example 5.1.1 is deficient. Notice that it fails to identify the specific area (decline in violence among youth) to be explored in the research.
Example 5.1.1 BEGINNING OF AN INAPPROPRIATELY BROAD INTRODUCTION
There are many crimes being committed in the USA. They bring a lot of devastation to the families of victims and cause problems to the families of offenders as well. There have been multiple studies that looked into the reasons for why people commit crime, as well as many studies that tried to determine why people desist, that is, stop committing crime. However, we still do not have a complete picture of the reasons for criminal behavior.
Example 5.1.2 illustrates a more appropriate beginning for a research report on violence declines among youth.
2 Continuing with the same scheme as in the previous chapters, N/A stands for “Not applicable” and I/I stands for “Insufficient information to make a judgment.”
Example 5.1.2 – Tcherni-Buzzeo (2023) 3 A SPECIFIC BEGINNING (COMPARE WITH Example 5.1.1)
The recent COVID-19 pandemic has been associated with an increase in violent crime, especially homicide and mass shootings, in the USA (Rosenfeld et al., 2021; Schildkraut & Turanovic, 2022). At the same time, evidence shows that these increases in violence largely did not involve the youth population (Mendel, 2022). In fact, violent crime among young people has been decreasing since the early 1990s, both based on official arrest data (Puzzanchera, 2019, 2020), National Crime Victimization Survey (Irwin et al., 2021; Kaylen et al., 2017), as well as Youth Risk Behavior Surveillance System (Perlus et al., 2014). In fact, during the 1990s, the most dramatic decrease in crime and delinquency happened among young people (Baumer et al., 2021; Cook & Laub, 2002; Mendel, 2022; Tcherni-Buzzeo, 2019). For example, the percentage of all arrests in the USA that involved youth under 18 more than halved—from 15% in 2000 to 7% in 2019 (Mendel, 2022), while arrests for homicide in this age group decreased by three quarters—from around 120 per 1 million in the early 1990s to under 30 per 1 million in the 2010s (Tcherni-Buzzeo, 2019, p. 314). Thus, the possible explanations about the “great American crime decline” must take into account this important fact….
Deciding whether a researcher has started the introduction by being reasonably specific often involves some subjectivity. As a general rule, the researcher should get to the point quickly, without using valuable journal space to outline a very broad problem area rather than the specific one(s) that they directly studied.
___ Question 1b: Problem Importance: Does the researcher establish the importance of the problem area?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Researchers select research problems that they believe are important, and they should specifically address this belief early in their introductions. Often, this is done by citing previously published statistics that indicate how widespread a problem is, how many individuals are affected by it, and so on. Example 5.1.2 above shows how the first paragraph of a study can incorporate important statistical information. Example 5.1.3 below provides another illustration of the effective use of statistics in the introduction of a program evaluation study focused on reducing school bullying.
3 Tcherni-Buzzeo, M. (2023). Increased prescribing of psychotropic drugs or school-based services for children with disabilities? Associations of these self-control-boosting strategies with juvenile violence at the state level. Journal of Developmental and Life-Course Criminology, [online first]. https://doi.org/10.1007/s40865-023-00223-4
Example 5.1.3 – Hall & Chapman (2018) 4 FIRST PARAGRAPH OF AN INTRODUCTION THAT INCLUDES STATISTICS TO ESTABLISH THE IMPORTANCE OF A PROBLEM AREA
Bullying in schools is a pervasive and ongoing threat to the mental health and school success of students. A meta-analysis of 21 U.S. studies showed that on average 18% of youth were involved in bullying perpetration, 21% of youth were involved in bullying victimization, and 8% of youth were involved in both perpetration and victimization (Cook, Williams, Guerra, & Kim, 2010).5 In addition, the Youth Risk Behavior Survey, which started measuring bullying victimization in 2009, has shown that the prevalence rate has remained at 20% since that time (Centers for Disease Control and Prevention [CDC], 2016).
Instead of providing statistics on the prevalence of problems, researchers sometimes use other strategies to convince readers of the importance of the research problems they have studied. One approach is to show that prominent individuals or influential authors have considered and addressed the issue being researched. Another approach is to show that a topic is of current interest because of the actions taken by governments (such as legislative actions), major corporations, and professional associations. Example 5.1.4 illustrates the latter technique, in which both legislative actions and public debate around them are cited.
Example 5.1.4 – Raus et al. (2021) 6 BEGINNING OF AN INTRODUCTION THAT USES A NONSTATISTICAL ARGUMENT TO ESTABLISH THE IMPORTANCE OF A PROBLEM
Euthanasia was decriminalized in Belgium in 2002, making it one of only a handful of countries where this practice is allowed under certain conditions (Law of 28 May 2002 on Euthanasia). Although the passing of the Euthanasia Law in 2002 was the result of significant parliamentary debate, it was in no way an end point; societal and political debate continues, for example, on whether or not to widen the scope of this law (Van den Broek and Eeckhaut, 2017). The interpretation and application of the Belgian Euthanasia Law are far from settled. Witness to this is the fact that in the 18-year period that has passed since the enactment of the Euthanasia Law, there have been many political attempts to amend it. Proposals, for example, have been submitted to legally oblige physicians who receive a euthanasia request but are unwilling to perform euthanasia themselves to refer the patient to another (more willing) physician. A number of legislative proposals have also been filed to allow euthanasia for patients with advanced dementia and for minors who are unable to consent (e.g., neonates).
Finally, a researcher may attempt to establish the nature and importance of a problem by citing anecdotal evidence or personal experiences. While this is arguably the weakest way to establish the importance of a problem, a unique and interesting anecdote might convince readers that the problem is sufficiently important to investigate.
A caveat: When you apply Evaluation Question 1b to the introduction of a research report, do not confuse the importance of the problem with your personal interest in it. It is possible to have little personal interest in a problem, yet still recognize that a researcher has established its importance. On the other hand, it is possible to have a strong personal interest in a problem, but judge that the researcher has failed to make a strong argument (or present convincing evidence) to establish its importance.
4 Hall, W. J., & Chapman, M. V. (2018). Fidelity of implementation of a state antibullying policy with a focus on protected social classes. Journal of School Violence, 17(1), 58–73. https://doi.org/10.1080/15388220.2016.1208571
5 Notice that in many of the articles published before 2020, references to studies with multiple authors are NOT listed in the (First Author et al., YEAR) format that became standard when the 7th edition of APA formatting guidelines came out at the end of 2019.
6 Raus, K., Vanderhaegen, B., & Sterckx, S. (2021). Euthanasia in Belgium: Shortcomings of the law and its application and of the monitoring of practice. The Journal of Medicine and Philosophy: A Forum for Bioethics and Philosophy of Medicine, 46(1), 80–107. https://doi.org/10.1093/jmp/jhaa031
___ Question 1c: Theory Explained: Are any underlying theories adequately described?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: If a theory is named in the introduction to a research article, the theory should be adequately described. As a general rule, even a well-known theory should be described in at least one short paragraph (along with one or more references where additional information can be found). Lesser-known and new theories should be described in more detail. Example 5.1.5 briefly but clearly summarizes a key aspect of general strain theory, which underlies the author’s research.7
7 Notice that this is a very brief description of a theory in the introduction of a research article. Further in the article, discussion of the theory is expanded considerably.
Example 5.1.5 – Paez (2018) 8 AN EXCERPT FROM THE INTRODUCTION TO A RESEARCH ARTICLE THAT DESCRIBES A THEORY UNDERLYING THE RESEARCH
This study applies general strain theory to contribute to literature that explores factors associated with engagement in cyberbullying. General strain theory posits that individuals develop negative emotions as a result of experiencing strain (e.g., anger and stress), and are susceptible to engaging in criminal or deviant behavior (Agnew, 1992). In contrast with other studies on cyberbullying, this study applies general strain theory to test the impact that individual and social factors of adolescents have on engagement in cyberbullying.
Note that much useful research is non-theoretical. In Chapter 1, we discuss exploratory, descriptive, explanatory, and evaluation studies. Of these four types, explanatory studies are most likely to involve a theory. The other three study types are mostly non-theoretical: in many exploratory and descriptive studies, the purpose is to collect and interpret data in order to understand the phenomenon better; in some descriptive and most evaluation studies, the purpose may be to make a practical decision. For instance, a researcher might poll parents to determine what percentage favors a proposed regulation that would require students to wear uniforms when attending school. Non-theoretical information on parents’ attitudes toward requiring uniforms might be an important consideration when a school board is making a decision on the issue. In another example of a descriptive study, without regard to theory, a researcher might collect data on the percentage of pregnant women attending a county medical clinic who use tobacco products during pregnancy. The resulting data will help decision-makers determine the prevalence of this problem within their clinic’s population.
8 Paez, G. R. (2018). Cyberbullying among adolescents: A general strain theory perspective. Journal of School Violence, 17(1), 74–85. https://doi.org/10.1080/15388220.2016.1220317
In a non-theoretical evaluation study, researchers might ask whether boot camps reduce juvenile delinquency more effectively than a traditional community service approach. To find out, they could secure a judge’s agreement to randomly assign half of the youth adjudicated for minor offenses to boot camps and the other half to community service, and then compare the rates of recidivism between the two groups a year later. Evaluation research is covered in more detail in Appendix B: Program/Policy Evaluation. Evaluation studies are very important in assessing the effectiveness of various interventions and treatments but are unlikely to involve a theoretical basis. Chapter 14 provides more information about evidence-based programs and research aimed at creating such an evidence base. When applying Evaluation Question 1c (Theory Explained) to non-theoretical research, “not applicable” (N/A) is usually the fitting answer.
A special note for evaluating qualitative research: qualitative researchers often explore problem areas without an initial reference to theories and hypotheses (this type of research is often exploratory). Sometimes, they develop new theories (models and other generalizations) as they collect and analyze data. Such theories, developed in qualitative research or by summarizing data/observations, are called grounded theories. The data often take the form of transcripts from open-ended interviews, notes on direct observation, involvement in activities with participants, and so on. Thus, in a research article reporting on qualitative research, a theory might not be described until the Analysis section of the article (instead of the Introduction). When this is the case, apply Evaluation Question 1c to the point at which the theory is discussed. Chapter 7 provides examples of qualitative studies that are not intended for generalization. Chapter 11 examines qualitative research in more depth.
___ Question 1d: Logical Flow: Does the literature review move from topic to topic instead of from citation to citation?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: A literature review that fails on this evaluation question is typically organized around citations rather than topics. For instance, a researcher might inappropriately first summarize Smith’s study, then Jones’s study, then Miller’s study, and so on. The result is a series of annotations that are merely strung together, which makes it look more like an annotated bibliography (or list of article abstracts) than a literature review. This fails to show the reader how various sources relate to each other, and what they mean as a whole. In contrast, a good literature review should be organized around topics and subtopics, with references cited as needed, often in groups of two or more sources per point. For instance, if four empirical studies support a certain point, the point should usually be stated with all four references cited together (as opposed to citing them in separate statements or paragraphs that summarize each of the four studies separately). In Example 5.1.6, there are separate citations for each point, including citations in groups of two, four, and five sources per point.
Example 5.1.6 – Tcherni-Buzzeo (2023) 9 AN EXCERPT FROM A LITERATURE REVIEW WITH SOURCES CITED IN GROUPS
Since the crime declines happened during approximately the same time—starting in the early-to-mid 1990s—not only in the USA, but in other developed (and many developing) countries as well (Eisner et al., 2016; Tseloni et al., 2010), theories for the crime drop must be universal enough to go beyond localized explanations like policing strategies (Blumstein & Wallman, 2006) or abortion legalization (Donohue & Levitt, 2001). One such rather universal possibility is the increased prescribing of psychotropic medications to children and adolescents and the respective increased consumption of psychotropic drugs (Bouvy & Liem, 2012; Finkelhor & Johnson, 2017; Finkelhor & Jones, 2006; Marcotte & Markowitz, 2011; Pappadopulos et al., 2006). Importantly, the increased psychotropic medication prescribing applies to adults as well, even though the increase was much more drastic among juveniles (Olfson et al., 2002, 2006, 2012; Pappadopulos et al., 2006).
When a researcher discusses a particular source that is crucial to a point being made, that source should be discussed in more detail than the mere mention in Example 5.1.6. However, because research reports in academic journals are expected to be relatively brief, detailed discussions of individual sources should be presented sparingly and only for the most important and relevant literature.
9 Tcherni-Buzzeo, M. (2023). Increased prescribing of psychotropic drugs or school-based services for children with disabilities? Associations of these self-control-boosting strategies with juvenile violence at the state level. Journal of Developmental and Life-Course Criminology, [online first]. https://doi.org/10.1007/s40865-023-00223-4
___ Question 1e: Organized Subsections: Are very long introductions or literature reviews broken into subsections, each with its own subheading?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: When there are a number of issues to be covered in a long introduction or literature review, there may be several parts, each with its own subheading. The subheadings help to guide readers through long introductions, visually and conceptually breaking them down into more easily ‘digestible’ parts. For instance, Example 5.1.7 shows how subheadings can be used within the introduction to a study of the risk and protective factors for alcohol and marijuana use among urban and rural adolescents.
Example 5.1.7 – Clark et al. (2011) 10 FIVE SUBHEADINGS USED WITHIN AN INTRODUCTION
– Individual Factors
– Family Factors
– Peer Factors
– Community Factors
– Risk and Protective Factors among Urban and Rural Youths
In Example 5.1.8, a typical way to number subsections is demonstrated in a study of older adults’ exclusion experiences during the COVID-19 pandemic.
Example 5.1.8 – Zapletal et al. (2023)11 NUMBERED SUBHEADINGS USED WITHIN AN INTRODUCTION
1. Introduction
1.1. Challenges and opportunities of digital technology use in older adults
1.2. Older adults, digital technology, and COVID-19
1.3. Digital technology and social isolation and exclusion
10 Clark, T. T., Nguyen, A. B., & Belgrave, F. Z. (2011). Risk and protective factors for alcohol and marijuana use among African American rural and urban adolescents. Journal of Child & Adolescent Substance Abuse, 20(3), 205–220. https://doi.org/10.1080/1067828X.2011.581898
11 Zapletal, A., Wells, T., Russell, E., & Skinner, M. W. (2023). On the triple exclusion of older adults during COVID-19: Technology, digital literacy and social isolation. Social Sciences & Humanities Open, 8(1), 100511. https://doi.org/10.1016/j.ssaho.2023.100511
Important: When you are planning to write your own literature review, it is a good idea to start from an outline that looks very much like the list of subheadings in the examples above: it lists the subtopics you need to discuss.
2 Language Matters
From providing clear definitions to citing the sources of information (and knowing when they are needed and how many sources to cite per point) to using proper research language to describe the findings of previous research, this section will help you evaluate introductions and literature reviews as well as write your own.
___ Question 2a: Clear Definitions: Has the researcher provided adequate conceptual definitions of key terms?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Often, researchers will pause at appropriate points in their introductions to offer formal conceptual definitions, such as the one shown in Example 5.2.1. Chapter 2 discusses the importance of definitions in research reports. A conceptual definition explains what the term means or includes, while an operational definition explains how the term is measured in the study.12
Note that it is acceptable for a researcher to cite a previously published definition, which is done in Example 5.2.1. In addition, note that the researchers mention both the conceptual definition (what burnout involves) and the operational definition (that it is measured by a specific inventory).
Example 5.2.1 – Dijxhoorn et al. (2021)13 A CONCEPTUAL DEFINITION PROVIDED IN AN ARTICLE’S INTRODUCTION
Burnout founds its origin in caring occupations which due to its aim to help people in need can be experienced as emotionally stressful [7].14 One of the early and commonly used definitions of burnout by Maslach et al. describes burnout as a psychological syndrome involving emotional exhaustion, depersonalization, and a diminished sense of personal accomplishment that occurred among professionals who work with other people in challenging situations [7]. These constructs are also represented in the widely used Maslach Burnout Inventory (MBI) [8].
At times, researchers may not provide formal conceptual definitions because the terms have widespread and commonly held definitions. For instance, in a research article on various methods of teaching handwriting, a researcher may not offer a formal definition of handwriting, which might be acceptable. In sum, this evaluation question should not be applied mechanically by examining whether there is a specific definition. The mere absence of one does not necessarily mean that a researcher has failed on this evaluation question, because a conceptual definition is not needed for some variables. When this is the case, you may give the article a rating of N/A (“not applicable”) for this evaluation question.
12 A conceptual definition identifies a term using only general concepts but with enough specificity that the term is not confused with other related terms or concepts. As such, it resembles a dictionary definition. In contrast, an operational definition describes the physical process used to create the corresponding variable. This type of definition usually appears later in a research report, under the heading Measures (Chapter 8 discusses measures in more detail).
13 Dijxhoorn, A. F. Q., Brom, L., van der Linden, Y. M., Leget, C., & Raijmakers, N. J. (2021). Healthcare professionals’ work-related stress in palliative care: A cross-sectional survey. Journal of Pain and Symptom Management, 62(3), e38–e45. https://doi.org/10.1016/j.jpainsymman.2021.04.004
14 Notice that the citation format in this example (and some subsequent ones) is called Vancouver style. It is different from the APA style of in-text citations where the author and year of each article are listed. In Vancouver style, references are numbered as they appear in the text, which is more typical of journals in biomedical, health and other sciences.
___ Question 2b: Sources Cited: Has the researcher cited sources for ‘factual’ statements?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Researchers should avoid making statements that sound like facts without referring to their sources. Example 5.2.2 is deficient in this respect. Compare it with its Improved Version, in which sources are cited for various assertions.
Example 5.2.2 UNREFERENCED FACTUAL CLAIMS (UNDESIRABLE)
Nursing is widely recognized as a high-stress occupation, which is highly demanding yet has limited resources to support nurses with their occupational stress. Providing palliative care to patients with fatal diseases is especially stressful, causing both emotional and professional challenges for healthcare professionals.
Improved Version of Example 5.2.2 – Dijxhoorn et al. (2021) 15 SOURCES CITED FOR FACTUAL CLAIMS (COMPARE WITH Example 5.2.2)
A recent systematic literature review on the prevalence of burnout among healthcare professionals providing specialist palliative care showed that 17% of these healthcare professionals suffered from a burnout [11]. In other words, almost one in five healthcare professionals providing specialist palliative care are at risk of dropping out as a result of work-related stress. At the same time, as a result of a decrease in the working-age population and a growing need for palliative care in coming years due to people getting older and having more co-morbidities, the workload for health-care professionals will likely further increase [12−14].
At the same time, not every factual statement should be provided with a reference. Some factual statements reflect common knowledge and thus do not need any references to a specific source of such knowledge. For example, an assertion like “violent crime has devastating consequences not only for the victims but also for the victims’ families” is fairly self-evident and reflects a common understanding about the direct and indirect effects of violent crime.
15 Dijxhoorn, A. F. Q., Brom, L., van der Linden, Y. M., Leget, C., & Raijmakers, N. J. (2021). Healthcare professionals’ work-related stress in palliative care: A cross-sectional survey. Journal of Pain and Symptom Management, 62(3), e38–e45. https://doi.org/10.1016/j.jpainsymman.2021.04.004
___ Question 2c: Number of Sources: Has the researcher avoided citing a large number of sources for a single point?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: As a rough rule, citing more than six sources for a single point is often inappropriate. When there are too many sources for a single point, three things can be done. First, the researcher can divide them into two or more subgroups. For instance, sources dealing with one population (such as children) might be cited in one group, whereas sources dealing with another population (such as adolescents) might be cited in another group. Second, the researcher can cite only the most salient (or methodologically strong) sources as examples of sources that support a point, as illustrated in Example 5.2.3. Notice that when the authors list explanations for the sources of racial disparities in income, they use e.g. (meaning “for example”) to cite the most prominent research studies for each explanation. The implication is that there are many more studies in each category besides those mentioned.
Example 5.2.3 – Chetty et al. (2020) 16 USING E.G., TO CITE SELECTED SOURCES (ITALICS USED FOR EMPHASIS)
The sources of these disparities [racial disparities in income] have been heavily studied and debated, with proposed explanations ranging from residential segregation (e.g., Wilson 1987; Massey and Denton 1993) and discrimination (e.g., Pager 2003; Eberhardt et al. 2004; Bertrand and Mullainathan 2004) to differences in family structure (e.g., McAdoo 2002; Autor et al. 2019).
Third, to avoid citing a long string of references for a single point, researchers may refer the reader to the most recent comprehensive review of the relevant literature, as illustrated in Example 5.2.4.
Example 5.2.4 – Tcherni et al. (2016) 17 REFERRING TO A SINGLE COMPREHENSIVE RECENT SOURCE THAT SUMMARIZES OTHER RELEVANT RESEARCH SOURCES (ITALICS USED FOR EMPHASIS)
Thus, individual victimizations only represent the tip of the iceberg in terms of financial losses. Different methodologies of calculating losses and different definitions of online crime (identity theft, credit/debit card fraud, etc.) lead to different estimates of per person and overall losses. Moreover, surveys of individuals can bias estimates of losses upwards, if the percentage of population affected is small and may not be represented well, even in fairly large samples (see Florencio & Herley, 2013, for an excellent discussion of this issue).
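When drafting your own literature review, the rough rule of no more than six sources per point can be checked mechanically. The Python sketch below is only an illustration under stated assumptions: it assumes author-year parenthetical citations separated by semicolons, and the names count_sources, exceeds_rough_limit, and MAX_SOURCES_PER_POINT are hypothetical helpers, not part of any citation tool. It counts each publication year as one source, so a group such as (Olfson et al., 2002, 2006, 2012; Pappadopulos et al., 2006) is counted as four sources.

```python
import re

# Rough rule from the text: more than six sources for one point is often too many.
# This threshold is a guideline, not a hard limit.
MAX_SOURCES_PER_POINT = 6

# Matches a four-digit publication year such as 1992 or 2006a (illustrative pattern).
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}[a-z]?\b")


def count_sources(citation_group: str) -> int:
    """Count the sources in one author-year citation group.

    Entries are assumed to be separated by semicolons. Each year found in an
    entry counts as one source; an entry with no year still counts as one.
    """
    total = 0
    for entry in citation_group.strip("() ").split(";"):
        if not entry.strip():
            continue  # skip empty fragments such as a trailing semicolon
        years = YEAR_PATTERN.findall(entry)
        total += max(len(years), 1)
    return total


def exceeds_rough_limit(citation_group: str) -> bool:
    """Return True when a citation group goes beyond the rough six-source rule."""
    return count_sources(citation_group) > MAX_SOURCES_PER_POINT


if __name__ == "__main__":
    group = "(Olfson et al., 2002, 2006, 2012; Pappadopulos et al., 2006)"
    print(count_sources(group), exceeds_rough_limit(group))
```

A flagged group is a prompt to apply one of the three strategies described above (split into subgroups, cite selected examples with e.g., or point to a comprehensive review), not an automatic error.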
16 Chetty, R., Hendren, N., Jones, M. R., & Porter, S. R. (2020). Race and economic opportunity in the United States: An intergenerational perspective. The Quarterly Journal of Economics, 135(2), 711–783. https://doi.org/10.1093/qje/qjz042
17 Tcherni, M., Davies, A., Lopes, G., & Lizotte, A. (2016). The dark figure of online property crime: Is cyberspace hiding a crime wave? Justice Quarterly, 33(5), 890–911. https://doi.org/10.1080/07418825.2014.994658
___ Question 2d: Objective Language (1): Has the researcher distinguished between opinions and research findings?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Researchers should use wording that helps readers understand whether the cited literature presents opinions or research results. To indicate that a citation is research-based, there are a variety of options, several of which are shown in Example 5.2.5.
Example 5.2.5 EXAMPLES OF KEY TERMS AND EXPRESSIONS INDICATING THAT INFORMATION IS RESEARCH-BASED
– Recent data suggest that …
– In laboratory experiments …
– Recent test scores show …
– Group A has outperformed its counterparts on measures of …
– Research on XYZ has established …
– Data from surveys comparing …
– Doe (2017) has found that the rate of …
– These studies have greatly increased knowledge of …
– The mean scores for women exceed …
– The percentage of men who have performed …
Note that if a researcher cites a specific statistic from the literature (e.g., “A systematic review found that approximately 19.2% of postpartum women may have minor or major depression during the first three months after childbirth, and 7.1% experience major depression (Gavin et al. 2005)”),18 it is safe to assume that factual information is being cited. Sometimes, researchers cite the opinions of others. When they do this, they should word their statements in such a way that readers are made aware that opinions (and not data-based research findings) are cited. Example 5.2.6 shows some examples of keywords and phrases that researchers sometimes use to do this.
Example 5.2.6 EXAMPLES OF KEY TERMS AND EXPRESSIONS INDICATING THAT AN OPINION IS BEING CITED
– Jones (2016) has argued that …
– These kinds of assumptions were …
– Despite this speculation …
– These arguments predict …
– This logical suggestion …
– Smith has strongly advocated the use of …
– Based on the theory, Miller (2018) predicted that …
___ Question 2e: Objective Language (2): Has the researcher interpreted the research literature in light of the inherent limits of empirical research?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: As indicated in Chapter 2, empirical research has inherent limitations. As a result, no research study offers definitive proof. Instead, research results offer degrees of evidence that are sometimes extremely strong (such as the relationship between cigarette smoking and health), but much more often only modest or weak (such as the relationship between mental illness and crime).
18 As cited in Dagher, R. K., Pérez-Stable, E. J., & James, R. S. (2021). Socioeconomic and racial/ethnic disparities in postpartum consultation for mental health concerns among US mothers. Archives of Women’s Mental Health, 24, 781–791. https://doi.org/10.1007/s00737-021-01132-5
Some words and phrases that researchers might use to indicate that the research results offer strong evidence are shown in Example 5.2.7.
Example 5.2.7 EXAMPLES OF TERMINOLOGY (IN ITALICS) THAT CAN BE USED TO INDICATE STRONG EVIDENCE
– Results of three recent studies strongly suggest that X and Y are …
– Most studies of X and Y clearly indicate the possibility that X and Y are …
– This type of evidence has led most researchers to conclude that X and Y …
The terms that researchers can use to indicate that the research results offer moderate to weak evidence are shown in Example 5.2.8.
Example 5.2.8 EXAMPLES OF TERMINOLOGY (IN ITALICS) THAT CAN BE USED TO INDICATE MODERATE TO WEAK EVIDENCE
– The results of a recent pilot study suggest that X and Y are …
– To date, there is only limited evidence that X and Y are …
– Although empirical evidence is inconclusive, X and Y seem to be …
– Recent research implies that X and Y may be …
– The relationship between X and Y has been examined, with results pointing toward …
It is not necessary for a researcher to indicate the degree of confidence that should be accorded to every finding discussed in a literature review. However, if a researcher merely states what the results indicate without qualifying terms, readers will assume that the research cited is reasonably strong.
___ Question 2f: No Quotes: Has the researcher avoided direct quotations from the literature?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Direct quotations should rarely (almost never) be used in literature reviews for two reasons. First, they often occupy more journal space, which is very limited, than a paraphrase would take. Second, they often interrupt the flow of the text because of the differences in the writing styles of the reviewer and the author of the literature.
An occasional quotation may be used if it expresses an idea or concept that would lose its impact in a paraphrase. When something is written so perfectly or beautifully that it enhances the narrative of the article citing it, it is a good idea to include such a direct quote. Another reason is to relay something that is already stated impeccably and briefly or serves as a succinct definition. This is the case with the quotation shown in Example 5.2.9, which provides a summary of the self-control theory of criminal behavior.
Example 5.2.9 – Rocque et al. (2017) 19 A DIRECT QUOTATION IN A LITERATURE REVIEW (ACCEPTABLE IF DONE VERY SPARINGLY)
The theory was fully specified in Gottfredson and Hirschi’s 1990 book, A General Theory of Crime. The perspective starts with the idea that to understand why people engage in crime, one must fully understand what crime involves. By defining crimes as “acts of force or fraud taken in the pursuit of self-interest” (p. 15), they do not exclude similar behaviors not sanctioned by the state, thus avoiding problems with a scientific approach to studying a social construct such as crime. They then described the common characteristics of criminal acts, noting that they are generally simple, physical acts that do not require much in the way of advanced planning, and that, importantly, generally resulted in short-term or immediate gain at the expense of long-term costs. These elements of crime led to a theory that views offenders as individuals who prefer to engage in physical as opposed to mental activities, who do not have high cognitive abilities, and who do not consider the long-term consequences of their behavior.
19 Rocque, M., Posick, C., & Piquero, A. R. (2017). Self-control and crime: Theory, research, and remaining puzzles. In K. D. Vohs & R. F. Baumeister (Eds.), Handbook of self-regulation: Research, theory, and applications (3rd ed., pp. 514–532). Guilford Publications. www.guilford.com/books/Handbook-of-Self-Regulation/Vohs-Baumeister/9781462533824
3 Quality of Sources
In Chapter 2, we discussed how academic journals of higher quality are more likely to publish articles of higher quality. Guideline 4 in that chapter presented information about the journal impact factor and reputable versus questionable academic publishers. Keep this in mind when assessing the sources cited in a literature review, or when deciding which sources to select for your own literature review.
___ Question 3a: Up-to-date Evidence: Is current research cited?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: The currency of the literature can be checked by noting whether the research published in recent years has been cited. Keep in mind, however, that relevance to the research topic is more important than recency. A 15-year-old study that is highly relevant and has a superior research methodology may deserve more attention than a less relevant, methodologically weaker one that has been published more recently. When this is the case, the researcher should explicitly state why an older research article is being discussed in more detail than the newer ones. Why is it important to cite the latest studies? First, more recent studies update and extend the ones published before them, so they provide the most up-to-date empirical evidence. Second, a more recent article would cite any important previous studies in its literature review section (or, as Sir Isaac Newton famously said: “If I have seen further, it is by standing on the shoulders of giants”). Note that a researcher may still want to cite older sources to establish the historical context of their study. In Example 5.3.1, researchers link a particular finding to Ferster and Skinner’s work published in 1957. Skinner is the best known of the early behavior analysts. References to more current literature follow.
Example 5.3.1 – Chen & Reed (2023) 20 REFERENCES SHOWING HISTORICAL LINKS AND RECENT SOURCES
Overall response rates were higher on the RR [random ratio] schedule than the RI [random interval] schedule, replicating findings from many studies with nonhuman (Ferster & Skinner, 1957; Peele et al., 1984), and human (Chen & Reed, 2020; Reed et al., 2015), participants.
20 Chen, X., & Reed, P. (2023). The effect of brief mindfulness training on the micro-structure of human free-operant responding: Mindfulness affects stimulus-driven responding. Journal of Behavior Therapy and Experimental Psychiatry, 79, 101821. https://doi.org/10.1016/j.jbtep.2022.101821
___ Question 3b: Balanced Sources: Has the author cited any contradictory research findings?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: It is very rare that studies on a certain topic all arrive at similar results. More often, some studies show that, say, coffee is good for you and extends your years of life, while other studies find that it has no health benefits. In a new study of how coffee affects life and health, the authors should not review only the research literature that supports their case while ignoring studies that contradict (or do not support) their hypotheses and findings. Maintaining impartiality and objectivity in science requires that authors cite both the studies that support their views and those that produced opposite or inconclusive results. It is possible that such unfavorable results came from studies that are methodologically weaker than those with supportive findings; in that case, the researchers can discuss the limitations and draw comparisons. However, if the authors cite only those studies that are in line with their thinking, while omitting any mention of “inconvenient” contradictory findings, this is a serious flaw in the literature review. In Example 5.3.2, contradictory findings regarding the success of job training programs for former prisoners are cited and explained.
Example 5.3.2 – Cook et al. (2015) 21 CONTRADICTORY FINDINGS ARE INCLUDED IN THE LITERATURE REVIEW (BOTH SUPPORTIVE RESULTS AND UNFAVORABLE ONES)
The main findings were quite discouraging. SVORI [Serious and Violent Offender Reentry Initiative] provided modest enhancements in services to offenders before and after release, and appears to have had some effect on intermediate outcomes like self-reported employment, drug use, housing, and criminal involvement. However, there was no reduction in recidivism as measured by administrative data on arrest and conviction (Lattimore et al. 2010). … The most prominent experiment of the decade of the 1970s was the National Supported Work Demonstration program, which provided recently released prisoners and other high-risk groups with employment opportunities on an experimental basis. … A re-analysis by Christopher Uggen (2000) which combined the ex-offenders with illicit-drug abusers and youthful dropouts found some reduction in arrests for older participants (over age 26), but not for the younger group. He has speculated that older offenders are more amenable to employment-oriented interventions (Uggen and Staff 2001), perhaps because they are more motivated. … In sum, the evidence on whether temporary programs that improve employment opportunities have any effect on recidivism is mixed. There have been both null findings and somewhat encouraging findings.
21 Cook, P. J., Kang, S., Braga, A. A., Ludwig, J., & O’Brien, M. E. (2015). An experimental evaluation of a comprehensive employment-oriented prisoner re-entry program. Journal of Quantitative Criminology, 31(3), 355–382. https://doi.org/10.1007/s10940-014-9242-5
4 Quality of Analysis
When it comes to writing your own literature review, one of the hardest tasks is to know how to present previous research and its findings clearly and concisely, identifying not only what has been done but also what remains to be determined in future research. Analyzing how other researchers have accomplished this in their literature reviews and paying attention to the evaluation questions below can help with this task.
___ Question 4a: Critical Assessment: Does the literature review identify the strengths and weaknesses of prior studies?
Very unsatisfactory  1  2  3  4  5  Very satisfactory  or N/A  I/I
Comment: A researcher should consider the strengths and weaknesses of previously published studies. Articles based on a reasonably strong methodology may be cited without comments on their strengths. However, researchers are obligated to point out if the study design is rather weak. This might be done with comments such as “A small pilot study suggested …” or “Even though the authors were not able to test other likely alternative explanations of their results …” (see other phrasing suggestions in Examples 5.2.7 and 5.2.8 above). An instance of this is shown in Example 5.4.1, where the authors point out the weaknesses of a previous study: its basic-level statistical analyses and problems with establishing causality.
Example 5.4.1 – Dagher et al. (2021) 22 NEGATIVE CRITICISM IN A LITERATURE REVIEW
There is scarce research in the U.S. examining mental health care utilization among postpartum women and their mental health seeking behaviors. A survey of 574 women who delivered at an Australian hospital found that at 4 months postpartum, depressed women had a higher likelihood of visiting a psychiatrist, social worker, postnatal depression group, pediatrician, or a general practitioner than non-depressed women (Webster et al. 2001). However, this study used bivariate analyses and did not control for potential confounders.
In Example 5.4.2, note the instances of both positive critical appraisal (where the authors refer to “well-designed” studies) and negative criticism (where they discuss “weak studies”).
22 Dagher, R. K., Pérez-Stable, E. J., & James, R. S. (2021). Socioeconomic and racial/ethnic disparities in postpartum consultation for mental health concerns among US mothers. Archives of Women’s Mental Health, 24, 781–791. https://doi.org/10.1007/s00737-021-01132-5
Example 5.4.2 – Abrams et al. (2018) 23 POSITIVE AND NEGATIVE CRITICISM IN A LITERATURE REVIEW
Do E-Cigarettes Help Smokers Quit or Do They Inhibit Cessation?
The public health benefits of e-cigarettes are enhanced if they promote complete cessation of smoking. Four randomized controlled trials (RCTs) and well-designed observational studies show that e-cigarettes are effective in helping some adult smokers successfully quit smoking (4, 16, 18, 31, 39, 41, 72, 78, 91, 93, 114, 126, 144). Rates of cessation using e-cigarettes are similar to or higher than rates of cessation from previous clinical trials of NRT (103, 112, 126). Although some studies with loosely defined measures of use (e.g., ever use, not necessarily for cessation), inadequate or no appropriate comparison groups, or inability to rule out plausible confounders or selection bias have reported that e-cigarette use may be associated with no change or negative correlations with cessation (41, 126), those studies with more robust measures of how e-cigarettes were used (e.g., duration of use, type of device, use specifically for cessation) suggest that daily vaping can facilitate quit attempts and cessation (11, 15, 51, 75, 126). Weak observational studies that did not meet the minimum criteria for scientific rigor [see details in Villanti et al. (126)] were also excluded from two reviews (47, 78) that employed the Cochrane criteria for inclusion in systematic reviews and meta-analyses (50). One other meta-analysis did not employ Cochrane standards, included most of the weak studies (56), and reported a negative association among e-cigarette use and smoking cessation, concluding that e-cigarettes inhibit cessation.
Sometimes, the authors are very subtle in the way they assess previous research, highlighting its strengths while still mentioning its weaknesses in a balanced way, as Example 5.4.3 shows.
23 Abrams, D. B., Glasser, A. M., Pearson, J. L., Villanti, A. C., Collins, L. K., & Niaura, R. S. (2018). Harm minimization and tobacco control: Reframing societal views of nicotine use to rapidly save lives. Annual Review of Public Health, 39, 193–213. https://doi.org/10.1146/annurev-publhealth-040617-013849
Example 5.4.3 – Voigt et al. (2017) 24 BALANCED CRITICISM IN A LITERATURE REVIEW
Previous research on police–community interactions has relied on citizens’ recollection of past interactions [10] or researcher observation of officer behavior [17–20] to assess procedural fairness. Although these methods are invaluable, they offer an indirect view of officer behavior and are limited to a small number of interactions. Furthermore, the very presence of researchers may influence the police behavior those researchers seek to measure [21].
___ Question 4b: Clear Picture of Research: After reading the literature review, does a clear picture emerge of what the previous research has accomplished?
Very unsatisfactory  1  2  3  4  5  Very satisfactory  or N/A  I/I
Comment: A good literature review is supposed to educate the reader on the state of research about the issue the study sets out to investigate. The key findings and highlights from the literature should be clearly synthesized in the introduction and literature review. The following questions are useful to ask after you have read the front-end portion (introduction and literature review) of an empirical article:
■ Does it provide enough information on the state of research about the problem the study sets out to investigate?
■ Are the key findings and highlights from the literature clearly synthesized in the review?
■ Do you feel that you understand the state of the research related to the main research question or goal of the study (as stated or implied in the title of the article)?
If, after reading the literature review, you are still confused about what the previous studies have found about the narrow topic the study is focused on, give a low rating on this evaluation question.
24 Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017). Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 114(25), 6521–6526. https://doi.org/10.1073/pnas.1702413114
___ Question 4c: Unanswered Questions: Has the researcher noted any gaps in the literature, any questions that still remain unresolved?
Very unsatisfactory  1  2  3  4  5  Very satisfactory  or N/A  I/I
Comment: Gaps in the literature on a topic (areas not fully explored in previous studies or research questions without clear answers) are just as important to identify as the areas already well studied by researchers and their findings in those areas. The gaps in the research indicate what needs to be determined in subsequent research. In Example 5.4.4, the researchers pointed out such a gap.
Example 5.4.4 – Gattamorta et al. (2019) 25 AN EXCERPT POINTING OUT A GAP IN THE LITERATURE
Although existing literature has documented the increased frequency and greater impact of parental rejection among Hispanic sexual minorities (Alpaslan, Johnston, & Goliath, 2014; Ben-Ari, 1995; Borhek, 1988; Conley, 2011; Fields, 2001; Robinson, Walters, & Skeen, 1989; Saltzburg, 2004; Strommen, 1989), minimal research exists that examines the coming out experience from the perspective of Hispanic parents, as well as the impact a child’s coming out process has on Hispanic parents. This makes it difficult to determine whether Hispanic parents have similar experiences to those documented in previous research.
25 Gattamorta, K. A., Salerno, J., & Quidley-Rodriguez, N. (2019). Hispanic parental experiences of learning a child identifies as a sexual minority. Journal of GLBT Family Studies, 15(2), 151–164. https://doi.org/10.1080/1550428X.2018.1518740
Note that the presence of a gap in the literature can then be used to justify the current study when its purpose is to fill the gap. If the literature review does not explain clearly which questions remain unanswered and what still must be discovered in the research area of focus, give a low rating on this question.
___ Question 4d: The Stage is Set: Do the specific research questions or goals of the study flow logically from the overview of the research literature?
Very unsatisfactory  1  2  3  4  5  Very satisfactory  or N/A  I/I
Comment: Typically, the specific research purposes, questions, or hypotheses on which a study is based are stated in the final paragraphs of the Introduction or Literature Review section.26 The material preceding them should set the stage and logically lead to them. For instance, if a researcher argues that the research methods used in previous studies are not well suited to answering certain research questions, it would not be surprising to learn that the author’s own research purpose is to re-examine the research questions using alternative methods. In Example 5.4.5, the authors refer to the studies that they have reviewed in their literature review. This sets the stage for their specific research questions, which are stated in the last sentence of the example.
Example 5.4.5 – Hellfeldt et al. (2018) 27 LAST PARAGRAPHS OF A LITERATURE REVIEW: BRIEF SUMMARY OF THE PREVIOUS RESEARCH REVIEWED LEADING TO A STATEMENT OF PURPOSE FOR THE CURRENT STUDY
These somewhat conflicting results [of studies reviewed above] point to a need of further research into how persistence of victimization and variation in experiences of bullying relate to different aspects of children’s lives.… The goal for this study is to examine patterns, including gender differences, of stability or persistence of bullying victimization, and how experiences of being bullied relate to children’s general well-being, including somatic and emotional symptomology.
26 Some researchers state their research purpose and research questions or hypotheses in general terms near the beginning of their introductions, and then restate them more specifically at the end of the introduction or literature review.
27 Hellfeldt, K., Gill, P. E., & Johansson, B. (2018). Longitudinal analysis of links between bullying victimization and psychosomatic maladjustment in Swedish schoolchildren. Journal of School Violence, 17(1), 86–98. https://doi.org/10.1080/15388220.2016.1222498
___ Question 5: Overall: Is the introduction/literature review effective and appropriate?
Very unsatisfactory  1  2  3  4  5  Very satisfactory  or N/A  I/I
Comment: Rate this evaluation question after considering your answers to the earlier ones in this chapter and taking into account any additional considerations and concerns you may have. Be prepared to explain your overall evaluation.
Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
Chapter 5 Exercises
Part A
Directions: Below, read excerpts from various sections of introductions and literature reviews of research articles. Answer the questions that follow each one.
1
Yasuhara et al. (2023) 28 – the first three paragraphs of the introduction/literature review of a study of psychiatric disorders among people in professions requiring firearm carrying: There are a number of professions that not only come with high levels of occupational stress, but also high exposure to traumatic events and danger. This is especially the case for police who are routinely exposed to potentially traumatizing events as a part of their job (Cross & Ashley, 2004; Karafa & Koch, 2016; Kirschman et al., 2013; Violanti et al., 2006, 2017). Police officers, for example, have been found to have higher rates of posttraumatic stress disorder (PTSD), depression, and anxiety compared to the general population (Andrew et al., 2008; Lawson et al., 2012; Lee et al., 2016). One study showed that approximately 25% of police officers had suicidal ideation in their lifetime (Violanti et al., 2008), and another showed that twice as many officers died by suicide than of line-of-duty incidents in 2019 (Help, 2019). In general, public safety personnel are not only exposed to extreme stress and tragedy, but also the threat of violence (Krakauer et al., 2020). However, the bulk of research examining the impact of public safety professions on mental health has been dominated by policing studies. Other professions, including correctional officers, park rangers, private security, customs agents, and bailiffs, are jobs in which workers may need to carry firearms, due to experiencing a heightened risk of personal injury or encountering violent situations (Duhart, 2001; Krakauer et al., 2020; Warren, 2001). Research has yet to consider whether the nature of gun carrying has an impact on police officers or anyone else carrying a firearm as a part of their profession. Further, data collection of mental health symptoms and diagnoses among non-law enforcement gun carrying professionals is nonexistent. Therefore, little work has been done to examine the mental health of individuals who carry a gun as a part of their profession outside of law enforcement and military. To fill this gap, the current study uses national survey data to examine mental health diagnoses of individuals who have worked in a job where they were required to carry a firearm.
a. Do the authors begin by moving from a general topic to a more specific area? Explain.
b. How well have the researchers established the importance of the problem area? Explain.
c. Does the narrative move from topic to topic instead of from citation to citation? Explain.
d. Have the researchers cited sources for factual statements? Explain.
e. Is the number of sources per point adequate? Are the sources recent enough?
f. Have the authors identified gaps in the research literature to set the stage for their own study? Explain.
28 Yasuhara, K., Morreale, K., Talley, D., Cooper, D. T., Hoy-Watkins, M., & Coker, K. L. (2023). Psychiatric disorders among employment requiring firearms. Behavioral Sciences & the Law, 41(1), 19–29. https://doi.org/10.1002/bsl.2570
2
Henson et al. (2015) 29 – the first paragraph of the introduction to a research article about the effectiveness of interventions to reduce alcohol drinking among college students: Estimates reveal that more than 500,000 student injuries, more than 600,000 assaults, more than 80,000 sexual assaults, and nearly 2,000 deaths occur annually because of college student drinking (Hingson, Zha, & Weitzman, 2009; Hingson & White, 2014). Generally, individual-level alcohol interventions targeted toward college students have been shown to be efficacious, but the effects of these interventions tend to be small and short-lived (see Carey, Scott-Sheldon, Carey, & DeMartini, 2007; Carey, Scott-Sheldon, Elliott, Garey, & Carey, 2012, for meta-analyses). These interventions range from face-to-face individual or group interventions to computer-delivered interventions. Thus, it is important to determine which students benefit from these interventions and which need alternative approaches to effectively reduce drinking consequences.
a. Do the authors begin by moving from a general topic to a more specific area? Explain.
b. How well have the researchers established the importance of the problem area? Explain.
c. Does the narrative move from topic to topic instead of from citation to citation? Explain.
29 Henson, J. M., Pearson, M. R., & Carey, K. B. (2015). Defining and characterizing differences in college alcohol intervention efficacy: A growth mixture modeling application. Journal of Consulting and Clinical Psychology, 83(2), 370–381. https://doi.org/10.1037/a0038897
3
Hefner et al. (2019) 30 – the last two paragraphs of the introduction/literature review in a study of the use of cigarettes, e-cigarettes, and alcohol among college students and the associations of this use with psychiatric disorders: While mental illness is a known risk factor for combustible cigarette use (Lasser et al., 2000), and e-cigarette use is prevalent among adults with psychiatric and substance use disorders (Hefner et al., 2016; Hefner, Valentine, & Sofuoglu, 2017; Bianco, 2019), the relationship between e-cigarette use and mental illness within college students remains unclear. A recent study suggests a unique association between e-cigarette use and depressive symptoms among college students (Bandiera, Loukas, Li, Wilkinson, & Perry, 2017), while another longitudinal study found no relationship between e-cigarette use and depression or anxiety (Spindle et al., 2017). To the extent that mental health conditions may be associated with increased propensity to use e-cigarettes, which may contribute to nicotine dependence and/or combustible cigarette use in this population (Loukas et al., 2018; Spindle et al., 2017), improved clarity is needed. The present study examined e-cigarette, combustible cigarette, and alcohol use among college students, as well as motivations and perceptions associated with using e-cigarettes. Specifically, associations between alcohol consumption patterns and e-cigarette use were considered, comparing a) any alcohol use vs. no alcohol use, and b) binge drinking (5 or more drinks per episode) vs. moderate alcohol use. In addition, we examined perceptions and motivations for using e-cigarettes by alcohol consumption patterns, as well as reported e- and combustible cigarette use during drinking episodes. Finally, to clarify inconsistent findings regarding the relationship between e-cigarettes and mental illness, we examined rates of e-cigarette use among college students with psychiatric and substance use disorders as compared to those without these conditions.
a. Have the researchers cited sources for factual statements? Explain.
b. Is the number of sources per point adequate? Are the sources recent enough?
c. Have the authors identified gaps in the research literature to set the stage for their own study? Explain.
30 Hefner, K. R., Sollazzo, A., Mullaney, S., Coker, K. L., & Sofuoglu, M. (2019). E-cigarettes, alcohol use, and mental health: Use and perceptions of e-cigarettes among college students, by alcohol use and mental health status. Addictive Behaviors, 91, 12–20. https://doi.org/10.1016/j.addbeh.2018.10.040
4
Fraser et al. (2022) 31 – a paragraph from a literature review explaining the key terms in a study designed to improve critical reflection writing in teacher education:
On critical reflection in teacher education
Reflection is a complex process that involves interpreting events and experiences of the past in the context of the present, informing future action and behaviour (Blumer 1986). According to Brookfield (2017), however, reflection is not, by definition, critical. Critical reflection as a concept is located further along the continuum of reflection, with a greater focus on encouraging movement beyond recollections and observations, towards analysis, action and change. Described as a complex, rigorous, intellectual, and emotional enterprise (Rodgers 2002, 844), it provides space to explore ‘both personal and professional belief systems’ (Larrivee 2008, 343). While critical reflection’s value and role in teacher education has been well documented (e.g. Körkkö 2021; Alana and Augustin 2020; Liu 2015), less clear is how students can demonstrate their ability to critically reflect effectively on experiential learning in their writing.
a. Are the conceptual definitions of key terms adequate? Explain.
5
MacCormack & Lindquist (2019) 32 – two paragraphs from the part of a literature review explaining theoretical perspectives on the feeling of “hanger”: One common assumption, both in folk and experimental psychology, is that hunger impacts emotions, judgments, and behaviors because it impairs self-regulation. In this view, hunger releases the constraints that typically keep people from feeling unbridled emotions, making impulsive judgments, or aggressing against others (e.g., Bushman et al., 2014; DeWall, Deckman, Gailliot, & Bushman, 2011; DeWall, Pond, & Bushman, 2010). Until recently, much research on self-regulation was guided by the “regulation as muscle” analogy, which hypothesizes that self-control fails when biological resources such as glucose are depleted (Baumeister, 2003, 2014; Gailliot & Baumeister, 2007; Gailliot et al., 2007; Muraven & Baumeister, 2000; Vohs et al., 2014). This regulatory depletion hypothesis was first inspired by work demonstrating that mental effort can deplete blood glucose (Fairclough & Houston, 2004; Hall & Brown, 1979). Thus, it is assumed that negative, high arousal emotions or outbursts of aggression when hungry occur because individuals cannot regulate their feelings without sufficient blood glucose (e.g., DeWall et al., 2011). However, the regulatory depletion hypothesis has been critiqued in recent years following failed replications and mixed findings (e.g., Carter, Kofler, Forster, & McCullough, 2015; Job, Walton, Bernecker, & Dweck, 2013; Kurzban, 2010; Miles et al., 2016; Vadillo, Gold, & Osman, 2016; see review in Inzlicht, Schmeichel, & Macrae, 2014). Moreover, the underlying biological premise may be unfounded, as it is unlikely that short-term shifts in cognitive exertion alter blood glucose levels substantially in the central nervous system (e.g., Coker & Kjaer, 2005; Peters et al., 2004).
a. Is the theory (hypothesis) adequately described? Explain.
b. Which other evaluation questions from this chapter are relevant for assessing this excerpt from a literature review?
31 Fraser, M., Wotring, A., Green, C. A., & Eady, M. J. (2022). Designing a framework to improve critical reflection writing in teacher education using action research. Educational Action Research, [online first]. https://doi.org/10.1080/09650792.2022.2038226
32 MacCormack, J. K. & Lindquist, K. A. (2019). Feeling hangry? When hunger is conceptualized as emotion. Emotion, 19(2), 301–319. https://doi.org/10.1037/emo0000422
Part B
Directions: Answer the following questions.
1
Consider Statements A and B (based on Besemer et al., 2017).33 They both contain the same citations. In your opinion, which statement is superior? Explain.
Statement A
Labeling theory suggests that people’s behavior is influenced by the label attached to them by society [1–4]. This label can be a critical factor to a more persistent criminal life course for individuals who might just be experimenting with delinquent activity. Previous studies have shown a considerable impact of convictions on subsequent criminal behavior [17–25].
Statement B
Labeling theory suggests that people’s behavior is influenced by the label attached to them by society [1–4]. This label can be a critical factor to a more persistent criminal life course for individuals who might just be experimenting with delinquent activity. Previous studies have shown that policing interventions increase subsequent criminal behavior of youth, where the effects of labeling occur through arrests [17], incarceration [18–20], or any contact with the juvenile justice system [21–23]. Moreover, labeling effects have been shown to last well into adulthood [24, 25].
33 Besemer, S., Farrington, D. P., & Bijleveld, C. C. (2017). Labeling and intergenerational transmission of crime: The interaction between criminal justice intervention and a convicted parent. PLoS One, 12(3), e0172419. https://doi.org/10.1371/journal.pone.0172419
2
Consider Statement C. This statement could have been used as an example for which evaluation question(s) in this chapter?
Statement C – Novak & Fagan (2022)34
To date, limited research has examined the relationship between expulsion and offending, leaving a large component of the school-to-prison pipeline unexplored. Some investigations of the school-to-prison pipeline have used combined measures of school exclusion rather than disentangling the effects of suspension and expulsion (Arum & Beattie, 1999; Fabelo et al., 2011; Monahan et al., 2014). Further, of studies including discrete or combined measures of expulsion, none examine the association between expulsion and self-reported offending behavior.
3
Consider Statement D. This statement could have been used as an example for which evaluation question(s) in this chapter?
Statement D – Gaucher & Veenhoven (2022)35
The quality of work life (QWL) has been defined in different ways and typically not very sharply. Carlson (1983) considers that QWL is a goal, an ongoing process for achieving that goal, and a concept of management. The goal of QWL is to create more involving, satisfying, and effective jobs and work environments for people at all levels of an organization. […] Job satisfaction would seem to be a more discrete concept, but definitions also differ in this case and lack clear boundaries. Hoppock (1935) introduced the concept of job satisfaction and defined it as a combination of psychological, physiological and environmental circumstances that cause a worker truthfully to say that he or she is satisfied with his or her job. Locke (1969) defines job satisfaction as the pleasurable emotional state resulting from the appraisal of one’s job as achieving or facilitating the achievement of one’s job values. Spector (1997) is the clearest in his definition: job satisfaction is ‘the degree to which people like their jobs’. For a review of the definitions of job satisfaction, see Aziri (2011).
4
Consider Statement E. This statement could have been used as an example for which evaluation question in this chapter?
Statement E – Akutsu et al. (2007)36
When speaking of “help-seeking” behaviors or patterns, Rogler and Cortes (1993) proposed that “from the beginning, psychosocial and cultural factors impinge upon the severity and type of mental health problems; these factors [thus] interactively shape the [help-seeking] pathways’ direction and duration” (p. 556).
34 Novak, A., & Fagan, A. (2022). Expanding research on the school-to-prison pipeline: Examining the relationships between suspension, expulsion, and recidivism among justice-involved youth. Crime & Delinquency, 68(1), 3–27. https://doi.org/10.1177/0011128721999334
35 Gaucher, R., & Veenhoven, R. (2022). What is in the name? Content analysis of questionnaires on perceived quality of one’s work life. Quality & Quantity, 56(3), 1045–1072. https://doi.org/10.1007/s11135-021-01165-z
36 Akutsu, P. D., Castillo, E. D., & Snowden, L. R. (2007). Differential referral patterns to ethnic-specific and mainstream mental health programs for four Asian American groups. American Journal of Orthopsychiatry, 77(1), 95–103. https://doi.org/10.1037/0002-9432.77.1.95
Part C
Directions: Read two empirical articles in academic journals on a topic of interest to you. Apply the evaluation questions in this chapter to their introductions and literature reviews and select the one to which you have given the highest ratings. Bring it to class for discussion. Be prepared to discuss its strengths and weaknesses.
CHAPTER 6
Evaluating Samples when Researchers Generalize
Immediately after the introduction and literature review, most researchers insert the main heading of Method or Data and Methods. In the Methods section, researchers almost always begin by describing the individuals they studied. This description is usually prefaced with one of the following subheadings: Data, or Sample, or Subjects, or Participants.1 A population is any group in which a researcher is ultimately interested. It might be large, such as all registered voters in Pennsylvania, or small, such as all members of a local teachers’ association. Researchers often study only samples (i.e., a subset of a population) for the sake of efficiency and then generalize their results to the population of interest. In other words, they infer that the data they collected by studying a sample are similar to the data they would have obtained by studying the entire population. Such generalizability only makes sense if the sample is representative of the population. In this chapter, we discuss some of the criteria that can help determine whether a study sample is representative, and thus, whether the study results can be generalized to a wider population. Because many researchers do not explicitly state whether they are attempting to generalize, consumers of research often need to make a judgment on this matter in order to decide whether to apply the evaluation questions in this chapter to the empirical research article being evaluated. To make this decision, consider the following questions:
■ Does the researcher imply that the results apply to a larger population?
■ Does the researcher discuss the implications of his or her research for a larger group of individuals than those directly studied?
If the answers are clearly “yes,” apply the evaluation questions in this chapter to the article being evaluated. Note that the evaluation of samples when researchers do not intend to generalize to populations (a less likely scenario for social science research) is considered in the next chapter (Chapter 7).
However, before we move on to explore connections between sampling and generalization, two categories of studies that are special in terms of sampling must be considered: total population studies and opt-in panels. The number of studies in both categories is growing rapidly owing to technological advances and the rise of big data, and it is important to know about these types of studies and their advantages.
1 In older research literature, the term participants would indicate that the individuals being studied had consented to participate after being informed of the nature of the research project, its potential benefits, and its potential harm; while the use of the term subjects would be preferred when there was no consent – such as in animal studies.
DOI: 10.4324/9781003362661-6
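The idea of generalizing from a sample to a population, introduced at the start of this chapter, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population (10,000 ages), the sample size (500), and the "under 40 only" selection rule are all hypothetical and do not come from any study discussed in this book. It compares how closely a simple random sample and an unrepresentative sample reproduce the population average.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 members of some group of interest.
population = [random.randint(18, 90) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Simple random sample: every member has an equal chance of being selected.
random_sample = random.sample(population, 500)

# Unrepresentative sample (assumed for illustration): only members
# younger than 40 can be selected, e.g., because of how they were recruited.
biased_pool = [age for age in population if age < 40]
biased_sample = random.sample(biased_pool, 500)

# Distance between each sample mean and the true population mean.
random_error = abs(statistics.mean(random_sample) - population_mean)
biased_error = abs(statistics.mean(biased_sample) - population_mean)

print(f"Population mean age:    {population_mean:.1f}")
print(f"Random sample mean age: {statistics.mean(random_sample):.1f}")
print(f"Biased sample mean age: {statistics.mean(biased_sample):.1f}")
```

In a run of this sketch, the random sample's mean should land close to the population mean, while the biased sample's mean should fall far below it, no matter how many members it contains. This is the sense in which generalization depends on representativeness rather than on sample size alone.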
Total Population Studies
As their name suggests, total population studies (also called census- or population-based studies) collect information from every member of the population. Such studies can be conducted for a specific population of interest, as long as each member is included. For example, The Swedish Twin Registry by the Karolinska Institute in Stockholm, Sweden, has been collecting data on all twins born in the country since 1886 (data collection started in the 1950s). Registry data are used for multiple research projects in the biomedical and social sciences, such as the study in Example 6.0.1.
Example 6.0.1 TOTAL POPULATION STUDY BASED ON SWEDISH TWIN REGISTRY
Taylor et al. (2020)2 – There has been an increase in autism-spectrum disorders in recent years. It is possible that environmental factors played a role in this increase. To determine if the role of genetic and environmental factors in the etiology of ASD has changed over the past 20–25 years, Taylor and colleagues used twin pairs from the Swedish Twin Registry since comparing twins allows for using a precise calculation of genetic relatedness. Very little change over time has been found in the role of environmental factors in causing ASD. Genetics still plays an overwhelming role in this process.
Most total population studies belong to one of the following two types:
1) Big data studies, which often cover the population of a whole country and use administrative data (e.g., data from de-identified IRS tax returns for the total population of the United States or register-based data for the total population of a Nordic country).
2) School, organization, or community surveys, where every member of the population of interest is included (as in Example 6.0.2, where every high school student in a specific city in China was included and assessed for myopia).
2 Taylor, M. J., Rosenqvist, M. A., Larsson, H., Gillberg, C., D’Onofrio, B. M., Lichtenstein, P., & Lundström, S. (2020). Etiology of autism spectrum disorders and autistic traits over time. JAMA Psychiatry, 77(9), 936–943. https://doi.org/10.1001/jamapsychiatry.2020.0680
Samples when Researchers Generalize
Example 6.0.2 – Chen et al. (2018) 3 TOTAL POPULATION STUDY OF STUDENTS IN A SCHOOL DISTRICT (CHINA)
Background
Myopia is the leading cause of preventable blindness in children and young adults. Multiple epidemiological studies have confirmed a high prevalence of myopia in Asian countries. However, fewer longitudinal studies have been performed to evaluate the secular changes in the prevalence of myopia, especially high myopia in China. In the present study, we investigated trends in the prevalence of myopia among high school students in Fenghua city, eastern China, from 2001 to 2015.
Methods
This was a population-based, retrospective study. Data were collected among 43,858 third-year high school students. […]
Conclusions
During the 15-year period, there was a remarkable increase in the prevalence of high and very high myopia among high school students, which might become a serious public health problem in China for the next few decades.
The benefits of total population studies are clear: they correctly describe things happening within the population. At first glance, generalizations and matters of statistical significance do not seem to apply, since total population sampling already covers the population. However, in most total population studies, the goal is still to generalize the findings – to other countries, other time periods, or other school districts. In Chapter 1, we discussed how the goal of pure research is to better understand the causal mechanisms of what is happening. Note that this goal is often served well by total population studies, and their findings are applicable elsewhere. For example, the mechanisms generating autism-spectrum disorders examined in the Example 6.0.1 study are likely to generalize well beyond Sweden, and the findings of the study in Example 6.0.2, about the increased prevalence of eyesight problems in one school district in China, are likely to generalize to the rest of China and other Asian countries.
Opt-In Panels
In this chapter, much attention is paid to low response rates among the potential study participants who were approached to participate. Researchers have grown increasingly concerned about decreasing response rates across the board and the resulting questions about how representative the samples of participants are, that is, how well they reflect the population of interest. Opt-in panels resolve this issue by gathering a large panel of participants who signed up to complete surveys online and get paid for their participation. The most well-known platforms for such opt-in panels are Amazon Mechanical Turk (MTurk), Lucid, and YouGov. For a fee, they allow researchers to collect data from a national, geographically diverse sample of participants, who can also be screened for specific demographic or geographic characteristics that are important for the researchers. Despite these clear advantages, however, the online format still leaves some populations of interest out or barely represented, limits the type of studies that can be conducted, and requires researchers to pay a non-trivial amount of money to use the opt-in panel. Appendix D discusses this type of survey research and its strengths and weaknesses in more detail.
3 Chen, M., Wu, A., Zhang, L., Wang, W., Chen, X., Yu, X., & Wang, K. (2018). The increasing prevalence of myopia and high myopia among high school students in Fenghua city, eastern China: A 15-year population-based survey. BMC Ophthalmology, 18, 1–10. https://doi.org/10.1186/s12886-018-0829-8
1 GOLD STANDARD OF SAMPLING
___ Question 1a: Probability Sampling: Was random sampling used?4
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I4
Comment: Using random, or probability, sampling (such as drawing names out of a hat) yields an unbiased sample (i.e., a sample that does not systematically favor any particular type of individual or group in the selection process).5 This type of sampling involves selecting from a sampling frame – a list that represents the target population. If a sample is unbiased and reasonably large, researchers are likely to make sound generalizations. (Sample size will be discussed later in this chapter.) The desirability of using random sampling procedures as the basis for generalization is so widely recognized among researchers that they are almost certain to mention the use of probability sampling if it was employed in the study. Examples 6.1.1 and 6.1.2 show two instances of how this has recently been expressed in published research.
Example 6.1.1 – Barnes et al. (2014) 6 RANDOM SAMPLING PROCEDURE (TO OBTAIN A NATIONALLY REPRESENTATIVE SAMPLE OF ADOLESCENTS IN THE UNITED STATES)
Data for this study came from the National Longitudinal Study of Adolescent Health (Add Health; Harris, 2009). The Add Health is a longitudinal and nationally representative sample of adolescents enrolled in grades 7 through 12 for the 1994–1995 academic year. The general focus of the Add Health study was to assess the health and development of American adolescents. In order to do so, a sample of high schools was first selected by employing stratified random sampling techniques. During this step, 132 schools were selected for participation and all students attending these schools were asked to complete a self-report questionnaire (N ~ 90,000). Beginning in April 1995 and continuing through December 1995, the Add Health research team collected more detailed information from a subsample of the students who completed the in-school surveys. Not all 90,000 students who completed in-school surveys also completed the follow-up interview (i.e. wave 1). Instead, students listed on each school’s roster provided a sample frame from which respondents were chosen. In all, wave 1 in-home interviews were conducted with 20,745 adolescents. Respondents ranged between 11 and 21 years of age at wave 1.
4 Continuing with the same scheme as in the previous chapters, N/A stands for “Not applicable” and I/I stands for “Insufficient information to make a judgment.”
5 For a more modern version of this procedure, see the online resources for this chapter (a link to a random number generator).
6 Barnes, J. C., Golden, K., Mancini, C., Boutwell, B. B., Beaver, K. M., & Diamond, B. (2014). Marriage and involvement in crime: A consideration of reciprocal effects in a nationally representative sample. Justice Quarterly, 31(2), 229–256. https://doi.org/10.1080/07418825.2011.641577
Example 6.1.2 – LaVan et al. (2017) 7 RANDOM SAMPLING PROCEDURE (TO OBTAIN A REPRESENTATIVE SAMPLE OF COURT CASES WHERE SCHIZOPHRENIA IS SUSPECTED OR CONFIRMED)
The litigated cases are a 10% random sample of 3543 cases litigated in all courts during the period 2010 to 2012 in which one of the keywords is “schizophrenia.” The cases were retrieved from the Lexis Nexis database of court cases at all court levels. Only cases in which the person with schizophrenia was a litigant were included. This reduced the total number of usable cases to 299.
___ Question 1b: Stratification: If random sampling was used, was it stratified?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Researchers use stratified random sampling by drawing individuals separately at random from different strata (i.e., subgroups) within a population. In Example 6.1.1 above, the sample of schools selected for the National Longitudinal Study of Adolescent Health (Add Health) was stratified by region of the country, urbanicity, and school size and type to ensure that schools from various parts of the country were represented, that rural, urban, and suburban schools were included in the sample, that small as well as large schools were represented, and so on. Stratifying will improve a sample only if the stratification variable (e.g., geography) is related to the variables of interest. For instance, if a researcher is planning to study how psychologists work with illicit substance abusers in New York State, stratifying by geography will improve the sample if the various areas of the state (for example, rural upstate New York areas versus areas in and around New York City) tend to have different types of drug problems, which may require different treatment modalities.
7 LaVan, M., LaVan, H., & Martin, W. M. M. (2017). Antecedents, behaviours, and court case characteristics and their effects on case outcomes in litigation for persons with schizophrenia. Psychiatry, Psychology and Law, 24(6), 866–887. https://doi.org/10.1080/13218719.2017.1316176
Note that geography is often an excellent variable for stratification because people tend to cluster geographically on the basis of many variables that are important in the social and behavioral sciences. For instance, they often cluster according to race/ethnicity, income/personal wealth, language preference, religion, and so on. Thus, a geographically representative sample is also likely to be representative of these other variables. Other common stratification variables are gender, age, occupation, highest educational level attained, and political affiliation. In Example 6.1.3, geography was used as a stratification variable.
Example 6.1.3 STRATIFIED RANDOM SAMPLING
The data for our study came from a survey of 1,767 students in grades 7, 8, and 9 from 40 middle and high schools from randomly selected cities and towns in the state of Massachusetts. Four strata were used: (1) cities with a minimum population of 100,000, (2) cities/ towns with population sizes between 50,000 and 100,000, (3) cities/towns with population sizes between 15,000 and 50,000, and (4) towns with population sizes below 15,000.8
8 This example is loosely based on two articles: Ousey, G. C., & Wilcox, P. (2005). Subcultural values and violent delinquency: A multilevel analysis in middle schools. Youth Violence and Juvenile Justice, 3(1), 3–22. https://doi.org/10.1177/1541204004270942 and Hughes, L. A., Botchkovar, E. V., Antonaccio, O., & Timmer, A. (2022). Schools, subcultural values, and the risk of youth violence: The influence of the code of
If random sampling without stratification is used (as in Example 6.1.2 in the previous section, where 10% of all relevant cases were randomly selected), the technique is referred to as simple random sampling. In contrast, if stratification is first used to form subgroups from which random samples are drawn, the technique is called stratified random sampling.
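The difference between the two techniques can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 1,000 schools, the rural/urban strata, the 10% sampling fraction, and the function names are hypothetical and do not come from any study cited in this chapter.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def stratified_random_sample(frame, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from each stratum separately."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 1,000 schools, 200 rural and 800 urban.
frame = [("rural" if i < 200 else "urban", i) for i in range(1000)]

srs = simple_random_sample(frame, 100, seed=1)
strat = stratified_random_sample(frame, lambda unit: unit[0], 0.10, seed=1)

rural_in_srs = sum(1 for stratum, _ in srs if stratum == "rural")
rural_in_strat = sum(1 for stratum, _ in strat if stratum == "rural")
print("rural schools in simple random sample:", rural_in_srs)
print("rural schools in stratified sample:", rural_in_strat)
```

In the simple random sample, the number of rural schools varies from draw to draw around its expected value of 20. In the stratified sample, it is fixed at exactly 20 by design, which is how stratification guarantees that each subgroup is represented in proportion to its size.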
Important Consideration
Despite the almost universal acceptance that an unbiased sample obtained through simple or stratified random sampling is highly desirable for making generalizations, the vast majority of research from which researchers want to make generalizations is based on studies in which non-random (biased) samples were used. There are three major reasons for this:
a) Although a random selection of names might have been drawn, researchers often cannot convince all those selected to participate in the research project. This problem is addressed in the next three evaluation questions.
b) Many researchers have limited resources with which to conduct research: limited time, money, and assistance. Often, they reach out to individuals who are readily accessible or convenient to use as participants. For instance, college professors conducting research often find that the most convenient samples consist of students enrolled in their classes, which are not even random samples of students on their campuses. This is called convenience sampling.9
c) For certain populations, it is difficult to identify all members or obtain their listing, or sampling frame. If a researcher cannot do this, he or she obviously cannot draw a random sample of the population. Examples of populations whose members are difficult to identify are the homeless in a large city, successful burglars (i.e., those who have never been caught), avid social media users, or gamblers among older adults.
Because so many researchers study non-random samples, it is unrealistic to count failures on the first two evaluation questions in this chapter as fatal flaws in the research methodology. If journal editors routinely refused to publish research articles with this type of deficiency, there would be very little published research on many of the most important problems in social and behavioral sciences.
Thus, when researchers use non-random samples when attempting to generalize, the additional evaluation questions in this chapter help distinguish between studies from which it is reasonable to make tentative, very cautious generalizations and those that are hopelessly flawed with respect to their sampling.
the street among students in three US cities. Journal of Youth and Adolescence, 51(2), 244–260. https://doi.org/10.1007/s10964-021-01521-0
9 Even though convenience sampling is an inherently biased method for drawing samples from which to generalize, it is important to consider whether the study participants are a reasonable target population for the research question posed by the researchers. This is discussed in more detail in Evaluation Question 3a further in this chapter.
2 BIASING INFLUENCES
___ Question 2a: Response Rate: Have the researchers discussed the participation rate in a study where only some of the solicited participants decided to participate?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: In the past, response rates were judged against a benchmark of what was considered reasonably high. For instance, just a few years ago, a professional survey organization with trained personnel and substantial resources would have been concerned if it had a response rate of less than 80% when conducting a national survey. However, with the proliferation of cell phones and spam calling (plus spam call filters based on caller ID), as well as exponential increases in surveys of every kind imaginable over the last decade, response rates to even large national surveys (those using elaborate probability sampling techniques to represent the population of the country) have been falling rapidly. The most recently reported response rates for national surveys are in the 40–60% range.10 The situation becomes even murkier when electronic or online surveys are solicited through email, text messages, or advertisements on a website. Technology advances so quickly, and changes in the use of phones, email, apps, and specific social media platforms are so unpredictable, that it is difficult to make any specific judgments or set even tentative thresholds for “typical” or “reasonably high” response rates for online surveys. Research and guidance on this topic become outdated within a few months or years of publication because of the rapid pace of change. Appendix D describes emerging issues in survey research and changes in the methods and modes of data collection. Example 6.2.1 presents an excerpt from a recent study reporting the response rate to an online targeted survey of UK prison employees.
Example 6.2.1 – Moran & Turner (2022) 11 RESEARCHERS REPORTING THE ONLINE SURVEY RESPONSE RATE
The link to the online survey for current staff was disseminated by email to staff at six prisons within the public sector, under the terms of research access granted by the National Research Council for Her Majesty’s Prison and Probation Service. [...] For the former-staff survey, links to a hosting website were posted on social media using a dedicated Twitter account. [...] The current-staff survey ran for six months in 2019, and the former-staff survey for 12 months across 2019–2020. For the current-staff survey, N = 83. The six establishments together employ about 1700 eligible staff, and the response rate of 4.88%, although low, is in line with expectations for an untargeted (i.e. not personally addressed) online survey distributed by an employer on behalf of an external organization. For the former-staff survey, N = 145. Since the number of potential participants is unknown, no response rate can be calculated.
10 Krieger, N., LeBlanc, M., Waterman, P. D., Reisner, S. L., Testa, C., & Chen, J. T. (2023). Decreasing survey response rates in the time of COVID-19: Implications for analyses of population health and health inequities. American Journal of Public Health, 113(6), 667–670. https://doi.org/10.2105/AJPH.2023.307267
11 Moran, D., & Turner, J. (2022). Drill, discipline and decency? Exploring the significance of prior military experience for prison staff culture. Theoretical Criminology, 26(3), 396–415. https://doi.org/10.1177/13624806211031248
When considering response rates to surveys (for example, […]
12 Winters, K. C., Toomey, T., Nelson, T. F., Erickson, D., Lenk, K., & Miazga, M. (2011). Screening for alcohol problems among 4-year colleges and universities. Journal of American College Health, 59(5), 350–357. https://doi.org/10.1080/07448481.2010.509380
13 Lipson, S. K., Zhou, S., Abelson, S., Heinze, J., Jirsa, M., Morigney, J., ... & Eisenberg, D. (2022). Trends in college student mental health and help-seeking by race/ethnicity: Findings from the national healthy minds study, 2013–2021. Journal of Affective Disorders, 306, 138–147. https://doi.org/10.1016/j.jad.2022.03.038
[…] 4,000 students, our team recruited a random sample of 4,000 degree-seeking students from the full population; at smaller institutions, all students were invited to participate. Sample files, containing information for recruitment and nonresponse analyses, were obtained from the Registrar at each site.
Students had to be at least 18 years old to participate; there were no other exclusion criteria. Students were recruited via email. To incentivize participation, students were informed of their eligibility for one of several prizes totaling $2,000 annually. Incentives were not contingent on participation. Upon clicking a personalized link in the email, students were presented with an informed consent page and had to agree to the terms before entering the survey. Response rates were as follows: 16% in 2013, 23% in 2014–15, 27% in 2015–16, 23% in 2016–17, 23% in 2017–18, 16% in 2018–19, 16% in fall 2019, 13% in winter/spring 2020, 14% in fall 2020, and 15% in winter/spring 2021. To adjust for potential differences between responders and nonresponders, the study team constructed sample weights. […] Weights are larger for respondents with underrepresented characteristics, making estimates representative of the full population in terms of known characteristics.
Note, however, that even though the incentive amount seems huge ($2,000), it did not lead to response rates higher than 30%, and most of the yearly response rates are between 13% and 16%. A possible explanation for this inconsistency is that $2,000 is the total yearly amount of all participation prizes across 375 campuses and hundreds of thousands of students, and it is not clear exactly what the monetary incentive is for each individual student to participate.
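The idea that weights are larger for respondents with underrepresented characteristics can be illustrated with a minimal sketch. It assumes the simplest possible scheme (one known characteristic, with each group's weight equal to its population share divided by its share among respondents). The numbers are invented for illustration, and this is not the weighting procedure actually used in the study quoted above.

```python
def poststratification_weights(population_share, respondent_counts):
    """Weight for each group = population share / share among respondents."""
    total = sum(respondent_counts.values())
    return {
        group: population_share[group] / (count / total)
        for group, count in respondent_counts.items()
    }


def weighted_rate(rate_by_group, respondent_counts, weights):
    """Weighted average of group-level rates across all respondents."""
    numerator = sum(
        rate_by_group[g] * respondent_counts[g] * weights[g]
        for g in respondent_counts
    )
    denominator = sum(respondent_counts[g] * weights[g] for g in respondent_counts)
    return numerator / denominator


# Hypothetical campus: half women, half men, but respondents skew toward women.
population_share = {"women": 0.50, "men": 0.50}
respondent_counts = {"women": 700, "men": 300}
rate_by_group = {"women": 0.40, "men": 0.20}  # invented outcome rates

weights = poststratification_weights(population_share, respondent_counts)
total_respondents = sum(respondent_counts.values())
unweighted = (
    sum(rate_by_group[g] * n for g, n in respondent_counts.items())
    / total_respondents
)
weighted = weighted_rate(rate_by_group, respondent_counts, weights)
print(weights)
print(round(unweighted, 2), round(weighted, 2))
```

In this invented case, men are underrepresented among respondents and so receive the larger weight. The unweighted estimate is about 0.34, while the weighted estimate is about 0.30, which matches what a sample with the population's gender split would show. Weighting of this kind corrects only for characteristics that are known, so it cannot rule out differences on characteristics that were not measured.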
___ Question 2c: Addressing Biases: Is there reason to believe that the participants and non-participants are similar on relevant variables?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: In some instances, as in Example 6.2.3 above, researchers have information about those who do not participate, which allows for a comparison between non-participants and participants. In institutional settings such as schools, it is often possible to determine whether participants and non-participants differ in important respects. College students who answered the survey may differ from nonresponders (and thus from the entire student body) in terms of sociodemographic characteristics, such as gender, GPA, and age. If there are substantial differences, the results should be interpreted considering these differences (nonresponse bias). Even with the weights applied to ensure that the sample characteristics resemble those of the whole student body, the researcher should be cautious in generalizing the results to all students. This is exactly what Lipson and colleagues (2022)14 are concerned about when they say in the study abstract: “Response rates raise the potential of nonresponse bias. Sample weights adjust along known characteristics, but there may be differences on unobserved characteristics.” Nonresponse bias is a serious problem when political polls or surveys are conducted in countries with authoritarian regimes where people may fear punishment for giving a wrong 14
Lipson, S. K., Zhou, S., Abelson, S., Heinze, J., Jirsa, M., Morigney, J., ... & Eisenberg, D. (2022). Trends in college student mental health and help-seeking by race/ethnicity: Findings from the national healthy minds study, 2013–2021. Journal of Affective Disorders, 306, 138–147. https://doi.org/10.1016/j.jad.2022.03.038
response (the one disapproved by the government). One of the solutions researchers use in such situations is comparing groups or tracking fluctuations over time (longitudinally) in people’s refusals to answer, which can be telling in itself. Besides nonresponse bias, another important one to consider is self-selection bias. A common method for researchers to recruit people into their samples is to ask for volunteers. However, do those who volunteer for study participation differ in some important ways from those who have never responded to the study recruitment ads? Could this selective volunteering (or self-selection) have affected the study results and conclusions? Research demonstrates that this is what may have happened in the famous Stanford Prison Experiment (SPE): its results would have likely been very different if a different way of recruiting participants had been used.15 The ad looking for SPE volunteers mentioned “a psychological study of prison life,” which might have attracted people with more psychopathic personalities than the general student population on campus. As a result, such volunteers might have been more prone to using emotionally abusive tactics in the “prison guard” role.16 Appendix A1 discusses Zimbardo’s SPE and Milgram’s experiments on obedience in more detail, from the point of view of modern research ethics and the value of research findings. Another possible bias to consider is attrition, or selective dropout of participants from the study, especially for longitudinal studies conducted over longer periods of time.17 Imagine: Of the 120 participants who signed up for a study and completed the first round of interviews, only 70 were left by the third round of interviews several months later. It is important to compare the characteristics of those who dropped out of the study with those who stayed. 
If the two groups differ on some important study variables or demographic characteristics, the possibility of bias should be discussed by the researchers. It is very likely that by the third wave, the remaining participants are not as representative of the larger population as the original 120 were; thus, the study results could be misleading or difficult to generalize.
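The comparison described above can be sketched as follows. The sketch uses the 120 and 70 participants from the scenario in the text, but the age data, the variable names, and the functions are hypothetical and serve only to show the logic of checking for selective dropout.

```python
from statistics import mean


def attrition_rate(n_start, n_remaining):
    """Share of the original participants who dropped out."""
    return (n_start - n_remaining) / n_start


def compare_on(variable, baseline, stayer_ids):
    """Mean baseline value of a variable for stayers and for dropouts."""
    stayers = [row[variable] for pid, row in baseline.items() if pid in stayer_ids]
    dropouts = [
        row[variable] for pid, row in baseline.items() if pid not in stayer_ids
    ]
    return mean(stayers), mean(dropouts)


# Hypothetical baseline data: 120 participants with an age recorded at round 1.
baseline = {pid: {"age": 18 + pid // 10} for pid in range(120)}
# Hypothetical set of the 70 participants still in the study at round 3.
stayer_ids = set(range(70))

rate = attrition_rate(len(baseline), len(stayer_ids))
stayer_age, dropout_age = compare_on("age", baseline, stayer_ids)
print(f"attrition: {rate:.1%}")
print(f"mean age of stayers: {stayer_age}, mean age of dropouts: {dropout_age}")
```

With these invented data, about 42% of participants were lost, and those who dropped out were on average six years older than those who stayed. A gap of this kind on a relevant variable is exactly what researchers should report and discuss as a possible source of bias.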
15 Carnahan, T., & McFarland, S. (2007). Revisiting the Stanford Prison Experiment: Could participant self-selection have led to the cruelty? Personality and Social Psychology Bulletin, 33(5), 603–614. https://doi.org/10.1177/0146167206292689
16 For more information about the Stanford Prison Experiment and possible interpretations of its results, see the online resources for this chapter.
17 Attrition is especially important to consider for studies that involve experiments. These issues are discussed in more detail in Chapter 9.
3 BALANCING INFLUENCES
___ Question 3a: Targeted Sampling: If a sample is not random, was it at least drawn from the target group for the generalization?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: There are many instances in the published literature in which a researcher studied one type of participant (e.g., college freshmen) and used the data to generalize to a different target group (e.g., young adults in general). When deciding which population the results can be generalized to, it is important to take into account the research question investigated in the study and the details of sample selection. For most research questions in descriptive studies seeking to estimate prevalence (e.g., assessing the prevalence of marijuana smoking in college students), the results can only be generalized to the actual population from which the sample was drawn. However, for studies assessing correlations, and especially for those testing causal explanations (explanatory studies), the results can often be generalized to a wider population. (Recall that these types of research were explained in Chapter 1.) Example 6.3.1 describes the purposive (non-probability) sample used in a study on how students’ self-control and smartphone use explain their academic performance. The researchers wanted to apply the results to college students only. Thus, the sample is adequate in terms of this evaluation question because the sample was drawn from the target group. Note that the researchers used both monetary and non-monetary incentives highly valuable to students – academic credits.
Example 6.3.1 – Troll et al. (2021)18 NON-RANDOM SAMPLE FROM THE TARGET GROUP (COLLEGE STUDENTS)
Study 1 The first correlational study sought to examine (1) the extent to which smartphone-use distracts university students from studying compared to other known sources of distraction and (2) whether students’ trait self-control is associated with the extent of distraction (via smartphones) while studying.
Methods
Participants and procedure
Participants were 454 students (74.9% woman; 0.7% diverse; Mage = 22.29, SD = 3.88) from Germany (N = 344), Switzerland (N = 81), and Austria (N = 29). They were recruited through an online research participation system at {Institution}, different online platforms (e.g., Facebook), and scientific networks. Participants received brief, general information about the study and could access the questionnaire via a URL link. Participants first completed all measures before providing demographic information. They had the chance to win one of two 25€ vouchers for an online store. Participating students from {Institution} additionally received course credit (0.25 hours).
18 Troll, E. S., Friese, M., & Loschelder, D. D. (2021). How students’ self-control and smartphone-use explain their academic performance. Computers in Human Behavior, 117, 106624. https://doi.org/10.1016/j.chb.2020.106624
___ Question 3b: Diverse Sample: If a sample is not random, was it drawn from diverse sources?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Did a researcher generalize to all college students after studying only students attending a small religious college in which 99% of the students had the same ethnic/racial background? Did a researcher generalize to men and women regarding the relationship between exercise and health after studying only men attending a cardiac unit exercise program? An answer of “yes” to these types of questions might lead to a low rating for this evaluation question. When a researcher wishes to generalize to a larger population in the absence of random sampling, consider whether the researcher sought participants from several sources, which increases the odds of sample representativeness. For instance, much educational research has been conducted in only one school. Using students from several schools within the district would increase the odds that the resulting sample would reflect the diversity of the district. In Example 6.3.2, the researchers used multiple methods to draw a sample for the study of parents of children with learning and attentional disabilities. This is vastly superior to using just one method for locating participants in a hard-to-reach population.
Example 6.3.2 – Park et al. (2020) 19 DIVERSE SOURCES FOR A SAMPLE (HELPS INCREASE REPRESENTATIVENESS)
Using data from different perspectives (triangulation) to strengthen the study’s validity (Patton 1999), we enrolled professional LAD [learning and attentional disability] experts from organizations and parents of children with LAD (Denzin 2006). […] We distributed study flyers online through local and national organizations, schools (public, private, special needs), support service centers, online advocacy groups, and Special Education Advisory Councils to recruit professionals. We also made calls to schools and organizations to describe our study and request that they post our flyers and refer members of their community to our study. Parents contacted the study coordinator to be screened and consented. Enrolled parents were asked to complete a sociodemographic survey prior to their phone-based interview, and they received $25 remuneration for their participation. 19
Park, E. R., Perez, G. K., Millstein, R. A., Luberto, C. M., Traeger, L., Proszynski, J., ... & Kuhlthau, K. A. (2020). A virtual resiliency intervention promoting resiliency for parents of children with learning and attentional disabilities: A randomized pilot trial. Maternal and Child Health Journal, 24, 39–53. https://doi.org/10.1007/s10995-019-02815-3
___ Question 3c: Generalizability Limitations Discussed: If a sample is not random, does the researcher explicitly discuss this limitation and how it may have affected the generalizability of the study findings?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: While researchers may discuss the limitations of their methodology (including sampling) in any part of their reports, many explicitly discuss the limitations in the Discussion section at the end of their articles. Example 6.3.3 appeared near the end of the research report.
Example 6.3.3 – Craig et al. (2019) 20 STATEMENT OF A LIMITATION IN SAMPLING
The findings of the current study should be considered in light of its limitations. […] [Our] sample consisted of higher risk adjudicated delinquents from a single southeastern state in the United States, thus limiting its generalizability.
Such acknowledgments of limitations do not remedy the flaws in the sampling procedure nor do they improve the researchers’ ability to generalize. However, they do perform two important functions: (a) they serve as warnings to naïve readers regarding the problem of generalizing, and (b) they reassure all readers that the researchers are aware of a serious flaw in their methodology.
4 CHARACTERISTICS OF THE SAMPLE
___ Question 4a: Sample Demographics: Has the author described relevant characteristics of the sample?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Researchers should describe the relevant background characteristics of the sample. For instance, when studying physicians’ attitudes toward assisted suicide, it would be relevant to know their religious affiliation. For studying consumers’ preferences, it would be helpful to know their economic status.
20 Craig, J. M., Intravia, J., Wolff, K. T., & Baglivio, M. T. (2019). What can help? Examining levels of substance (non) use as a protective factor in the effect of ACEs on crime. Youth Violence and Juvenile Justice, 17(1), 42–61. https://doi.org/10.1177/1541204017728998
In addition to the participants’ characteristics that are directly relevant to the variables being studied, it is usually desirable to provide an overall demographic profile, including variables such as age, gender, race/ethnicity, and highest level of education. This is especially important when a non-random convenience sample is used because readers will want to visualize the particular participants who were part of such a sample. Example 6.4.1 is an excerpt from a study on how religious functioning is related to mental health outcomes in military veterans.
Example 6.4.1 – Boals & Lancaster (2018) 21 DESCRIPTION OF RELEVANT DEMOGRAPHICS
Military veterans (N = 90) completed an online survey for the current study. The sample was primarily male (80%) and Caucasian (79%). The mean age of the sample was 39.46 (SD = 15.10). Deployments were primarily related to Operation Iraqi Freedom/Operation Enduring Freedom (OIF/OEF) (n = 62), with other reported deployments to Vietnam (n = 12), the Balkan conflict (n = 4), and other conflicts (n = 3). Nine participants did not report the location of their deployments. The mean number of deployments was 1.47, and the mean time since last deployment was 13.10 years (SD = 13.56; Median = 8.00).
21 Boals, A., & Lancaster, S. (2018). Religious coping and mental health outcomes: The mediating roles of event centrality, negative affect, and social support for military veterans. Military Behavioral Health, 6(1), 22–29. https://doi.org/10.1080/21635781.2017.1333060
When information on a large number of demographic characteristics has been collected, researchers often present them in statistical tables instead of in the narrative of the report.
___ Question 4b: Sample Size: Is the overall size of the sample adequate?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Students who are new to research methods are sometimes surprised to learn that there is often no simple answer to the question of how large a sample size should be. First, it depends, in part, on how much error a researcher is willing to tolerate. For public opinion polls, a stratified random sample of about 1,500 produces a margin of error of about 1–3%. A sample size of 400 produces a margin of error of about 4–6%.22 If a researcher is trying to predict the outcome of a close election, clearly a sample size of 400 would be inadequate.23
Responding to a public opinion poll usually takes little time and may be of interest to many participants, thus making it easier for the researchers to reach a large sample size. However, other types of studies may be of less interest to potential participants and/or may require extensive effort on the part of participants. Additionally, certain data collection methods (such as individual interviews) may require expenditure of considerable resources by researchers. Under such circumstances, it may be unrealistic to expect researchers to use larger samples.
Thus, a consumer of research should ask whether the researchers used a reasonable number, given the particular circumstances of their study. Would it have been an unreasonable burden to use substantially more participants? Is the number of participants so low that there is little hope of making sound generalizations? Would it be reasonable to base an important decision on the results of the study given the number of participants used? Subjective answers to these types of questions will guide consumers of research on this evaluation question.24
It is important to keep in mind that a large sample size does not compensate for a sampling bias due to the failure to use random sampling. Using large numbers of unrepresentative participants does not get around the problem of their unrepresentativeness.
22 The exact size of the margin of error depends on whether the sample was stratified and on other sampling issues that are beyond the scope of this book.
23 With a sample of only 400 individuals, there would need to be an 8–12% difference (twice the 4–6 point margin of error) between the two candidates for a reliable prediction to be made (i.e., a statistically significant prediction).
24 There are statistical methods for estimating optimum sample sizes under various assumptions. While these methods are beyond the scope of this book, note that they do not take into account the practical matters raised here.
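The margin-of-error figures quoted for samples of 400 and about 1,500 can be roughly checked with the standard textbook approximation for a proportion. The sketch below is only illustrative and rests on assumptions that are not stated in the chapter: a simple random sample (not a stratified one), a 95% confidence level (z = 1.96), and the most conservative case of an evenly split opinion (p = 0.5). The function name margin_of_error is hypothetical. As the footnotes note, the exact margin depends on the sampling design, so these numbers should be read as ballpark values only.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample of size n, a 95% confidence
    level (z = 1.96), and p = 0.5 as the most conservative case.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (400, 1500):
    moe = margin_of_error(n)
    # Rule of thumb from the footnote: the gap between two candidates
    # should be about twice the margin of error for a reliable call.
    print(
        f"n = {n}: margin of error is about {moe * 100:.1f} points; "
        f"lead needed is about {2 * moe * 100:.1f} points"
    )
```

Under these assumptions, the results fall inside the ranges given in the text (about 4–6 points for 400 respondents and about 1–3 points for 1,500), which is why a sample of 400 cannot settle an election in which the candidates are only a few points apart.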
___ Question 4c: Subgroup Size: Is the number of participants in each subgroup sufficiently large?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: When several groups of people are compared, for example, in the context of an experiment where one group has received “treatment” while the other is a comparison group, the rule of thumb is to have at least 30 participants per group (or subgroup) if the groups are fairly homogenous.25 A larger number of participants per group is needed if the groups are heterogeneous, that is, if there is a lot of variation among the participants in sociodemographic characteristics such as race, gender, age, income, or any other relevant variable. Consider the hypothetical information in Example 6.4.2, where the number of participants in each subgroup is indicated by n, and the mean (average) scores are indicated by m.
Example 6.4.2 A SAMPLE IN WHICH SOME SUBGROUPS ARE VERY SMALL
A random sample of 100 college freshmen was surveyed for their knowledge of alcoholism. The mean (m) scores on the knowledge scale of 1–25 were as follows: White (m = 18.5, n = 78), African American (m = 20.1, n = 11), Hispanic/Latino (m = 19.9, n = 9), and Chinese American (m = 17.9, n = 2). Thus, for each of the four ethnic/racial groups, there was a reasonably high average knowledge about alcoholism.
Although the total number in the sample is 100 (a number that might be acceptable for some research purposes), the numbers of participants in the last three subgroups in Example 6.4.2 are so small that it would be highly inappropriate to generalize from them to their respective populations. The researcher should either obtain a larger number of participants in each subgroup or refrain from reporting the individual subgroups separately. Note that there is nothing wrong with indicating ethnic/racial backgrounds (such as the fact that there were two Chinese American participants) in describing the demographics of the sample. Instead, the problem is that the number of individuals in some of the subgroups used for comparison is too small to justify calculating a mean and making any valid comparisons or inferences about them. For instance, a mean of 17.9 for Chinese Americans is meaningless for the purpose of generalization because there are only two individuals in this subgroup. Here, at least 30 people per subgroup would be required.
25 There is nothing magical about the number 30 – the reasons are purely statistical and have a lot to do with statistical significance testing (see more on this topic in Appendix C: Limitations of Significance Testing).
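The subgroup check described in the comment on Question 4c can be made concrete with a few lines of code. The sketch below uses the hypothetical numbers from Example 6.4.2 and flags every subgroup that falls below the rule-of-thumb minimum of 30 participants. The names MIN_SUBGROUP_N and flag_small_subgroups are illustrative assumptions, and the threshold is only the rule of thumb mentioned in the text, not a strict statistical cutoff.

```python
# Rule-of-thumb minimum from the text; an assumption, not a strict cutoff.
MIN_SUBGROUP_N = 30

# Hypothetical subgroup sizes and means from Example 6.4.2.
subgroups = {
    "White": {"n": 78, "mean": 18.5},
    "African American": {"n": 11, "mean": 20.1},
    "Hispanic/Latino": {"n": 9, "mean": 19.9},
    "Chinese American": {"n": 2, "mean": 17.9},
}


def flag_small_subgroups(groups, minimum=MIN_SUBGROUP_N):
    """Return the names of subgroups with fewer than `minimum` participants."""
    return [name for name, stats in groups.items() if stats["n"] < minimum]


total_n = sum(stats["n"] for stats in subgroups.values())
too_small = flag_small_subgroups(subgroups)

print(f"Total sample size: {total_n}")
print(f"Subgroups too small to report separately: {too_small}")
```

With these numbers, the overall sample of 100 looks acceptable, but three of the four subgroups fall below the threshold, which is the point the example makes: an adequate total can hide subgroups that are far too small to support separate means or comparisons.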
5 ETHICAL CONSIDERATIONS
___ Question 5a: Research Ethics Review: Has the study been approved by a research ethics committee/Institutional Review Board (IRB)?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: For any study that involves human subjects, even if indirectly, researchers planning the study must undergo a research ethics review process. In the United States, committees responsible for such ethics reviews are called Institutional Review Boards (IRBs). In Canada, similar agencies are known as Research Ethics Boards (REBs). In the United Kingdom and European countries, there is a system of Research Ethics Committees (RECs). Such an ethics committee checks whether the study meets the required ethical standards and does not present any undue danger of harm to the participants (usually, three types of harm are considered: physical, psychological, and legal harm). The study can only commence after approval from the relevant ethics committee. It is not necessary to mention the IRB’s or an analogous agency’s approval in the research report, but it is often a good idea to do so. Example 6.5.1 shows how such approval can be stated in an article (although a separate subheading is uncommon).
Example 6.5.1 – Geoffroy et al. (2018) 26 A BRIEF MENTION OF APPROVAL BY THE RELEVANT ETHICS REVIEW COMMITTEE, UNDER A SEPARATE SUBHEADING
Ethics Approval
The Ethics Committee of the Institut de la statistique du Québec and the Research Ethics Board of the CHU Sainte-Justine Research Center approved each phase of the study, and informed consent was obtained.
There may be times when a consumer of research judges that the study is so innocuous that an ethics review or informed consent (discussed next) is not needed. An example is an observational study in which individuals are observed in public places, such as a public park or shopping mall, while the observers are in plain view. Because public behavior is being observed by researchers in such instances, privacy would not normally be expected, and informed consent may not be required. However, approval from an ethics review committee is required even for benign studies.
Appendix A2 describes a case of research fraud in which the lead researcher, Andrew Wakefield, deliberately selected (cherry-picked) distorted data, and the horrible consequences that followed: the rise of anti-vaccine movements worldwide and multiple deaths of children from vaccine-preventable diseases. Fortunately, such cases are rare, but this example shows how important the research ethics review is, beyond protecting study participants.
26 Geoffroy, M. C., Boivin, M., Arseneault, L., Renaud, J., Perret, L. C., Turecki, G., … & Tremblay, R. E. (2018). Childhood trajectories of peer victimization and prediction of mental health outcomes in midadolescence: A longitudinal population-based study. Canadian Medical Association Journal, 190(2), E37–E43. https://doi.org/10.1503/cmaj.170219
___ Question 5b: Informed Consent: Has informed consent been obtained from study participants?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: It is almost always highly desirable to obtain written informed consent from participants in a study. Participants should be informed of the nature of the study and, at least in general terms, the nature of their involvement. They should also be informed of their right to withdraw from the study at any time, without penalty. Typically, researchers report this matter only very briefly, as illustrated in Example 6.5.2, which presents a statement similar to many found in research reports in academic journals. It is unrealistic to expect much more detail than is shown here because, by convention, the discussion of this issue is typically brief.
Example 6.5.2 A BRIEF DESCRIPTION OF INFORMED CONSENT
Students from the departmental subject pool volunteered to participate in this study for course credit. Prior to participating in the study, students were given an informed consent form that had been approved by the university’s Institutional Review Board. The form described the experiment as “a study of social interactions between male and female students” and informed them that if they consented, they were free to withdraw from the study at any time without penalty.
___ Question 5c: Data and Privacy Protections: Have the researchers used safeguards to protect the participants’ data and privacy?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Even in secondary research (the research that uses existing or previously gathered data), protecting participants’ data and privacy is of paramount importance. In total population studies based on big data (discussed at the beginning of this chapter) or studies using data from social media, no contact with participants is involved, only the analyses of their data. However, the inadvertent release of data or intentional data breaches by hackers can cause enormous harm to those whose data are disclosed or compromised. Therefore, it is important that researchers ensure data protection. This is rarely mentioned in journal articles because this issue is understood implicitly (and was presumably examined by the relevant ethics committee before the start of the study). Example 6.5.3 shows a rare instance when the data protection procedures are mentioned explicitly (note that this study did not involve any direct contact with its subjects but did involve sensitive data).
Example 6.5.3 – Gay et al. (2015) 27 DESCRIPTION OF DATA PROTECTION THROUGH DE-IDENTIFICATION
The researcher applied for and received ethics approval from the Department of Community Health (DCH) Institutional Review Board (IRB). All data were kept confidential and presented in an anonymous format such that individual defendants were unidentifiable. This study was archival in nature and did not involve any direct contact with the study subjects. A list of competency assessments conducted through a state psychiatric facility in the southeastern United States was generated for the years 2010 to 2013, including both inpatient and outpatient evaluations.
___ Question 6: Overall: Is the sample appropriate for generalizing?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Rate this evaluation question after considering your answers to the earlier ones in this chapter and taking into account any additional considerations and concerns you may have. Be prepared to discuss your response to this evaluation question.
27 Gay, J. G., Ragatz, L., & Vitacco, M. (2015). Mental health symptoms and their relationship to specific deficits in competency to proceed to trial evaluations. Psychiatry, Psychology and Law, 22(5), 780–791. https://doi.org/10.1080/13218719.2015.1013009
Concluding Comment
Although the primary goal of much research in all sciences is to make sound generalizations from samples to populations, researchers in the social and behavioral sciences face special problems regarding access to and cooperation from samples of human participants. Unlike other published lists of criteria for evaluating samples, the criteria discussed in this chapter urge consumers of research to be pragmatic when making these evaluations. A researcher may exhibit some relatively serious flaws in sampling, yet a consumer may conclude that the researcher did a reasonable job under these circumstances. However, this does not preclude the need to be exceedingly cautious in making generalizations from studies with weak, non-representative samples. Confidence in certain generalizations based on weak samples can be increased, however, if various researchers with different patterns of weaknesses in their sampling methods arrive at similar conclusions when studying the same research problems (this important process, already mentioned in Chapter 2, is called replication).
In the next chapter, the evaluation of samples when researchers do not attempt to generalize is considered.
Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
Chapter 6 Exercises
Part A
Directions: Answer the following questions:
1. Suppose a researcher conducted a survey on a college campus by interviewing students that she or he approached while they were having dinner in the campus cafeteria one evening. In your opinion, is this a random sample of all students enrolled in the college? Could the method be improved? How?
2. Briefly explain why geography is often an excellent variable on which to stratify when sampling.
3. According to this chapter, the vast majority of research is based on biased samples. Cite one reason that is given in this chapter for this circumstance.
4. If multiple attempts have been made to contact potential participants, and yet the response rate is low, would you be willing to give the report a reasonably high rating for sampling? Explain.
5. Is it important to know whether participants and non-participants are similar on relevant variables? Explain.
6. Does the use of a large sample compensate for a bias in sampling? Explain.
Part B
Directions: Locate several research reports in academic journals in which the researchers are concerned with generalizing from a sample to a population, and apply the evaluation questions in this chapter. Select the one to which you gave the highest overall rating and bring it to class for discussion. Be prepared to discuss the strengths and weaknesses of the sampling method used.
CHAPTER 7
Understanding and Evaluating Context-Specific Research
Samantha A. Tosto
As indicated in previous chapters, researchers often study samples in order to make inferences about the larger population from which the sample was drawn. This process is known as generalizing. However, there are several instances in both qualitative and quantitative research in which the purpose is not to generalize but rather to gain further nuanced insight into one specific sample or case. Before evaluating sampling in this kind of research, it is important to briefly discuss what types of studies are not meant for generalization and what type of sampling is used in such studies.
Types of Research with Nongeneralizable Findings
Nongeneralizable research is both common and important for the development of scientific knowledge, typically allowing for exploratory insight into areas of inquiry or programs/policies being developed. Some common forms of nongeneralizable research are highlighted below.
Ethnography
Ethnography is a form of research focused on in-depth exploration of a specific community, culture, or group. The goal of this work is not to generalize to any community beyond that which is studied, but rather to gather rich, nuanced data about what group membership in this culture encompasses. Often, interviews and field observation are used in conjunction with ethnography so that researchers may fully immerse themselves in the lived experiences of this group. Ethnography is most commonly seen in literature exploring niche subcultures, small communities, and other groups that are not easily accessed by sampling the general population. Often, ethnographic research results are compiled into book format with smaller sections of the results being published in various journal articles. Researchers may also write about their own lived experience within a specific sociocultural context, a process known as autoethnography. Example 7.0.1 provides a summary of ethnographic research of court work.
DOI: 10.4324/9781003362661-7
Context-Specific Research
Example 7.0.1 EXAMINING INJUSTICE IN AMERICA’S COURTS
Gonzalez Van Cleve (2016)1 – In this book, Gonzalez Van Cleve conducts a thorough and in-depth analysis of the criminal courts of Cook County, Illinois. Over the course of several years, the author conducts over 1,000 hours of courtroom observations and over 100 interviews with lawyers, judges, and other courtroom actors to gain insight into how racial injustice is embedded within the formal practices and informal social institutions that make up the nation’s largest criminal court. In seeking such rich, nuanced information about the way Cook County Courts run, Gonzalez Van Cleve is able to gain access to a unique subculture of the American criminal legal system that is not generalizable but provides a level of insight not typically gained from other forms of research.
Phenomenological Research
Phenomenology is a means of scientific inquiry focused on the lived human experience and the ways in which individuals consciously process a given phenomenon. At its core, phenomenology seeks to bring light to phenomena not typically focused on through the conscious thought and processes of humans who have experienced them directly. Through this interpretive exploration, a more comprehensive understanding of these phenomena is created; it is through the experience of, and reaction to, these phenomena that reality must be constructed. Researchers who conduct this type of work will often seek the perspectives of several individuals with the hopes of uncovering the universal essence or shared understanding of what it means to experience a given phenomenon. It is also central to this type of work, contrary to that of ethnography and other interpretative forms of research, that the researchers remove themselves from the data and have only first-person experiences included in the analysis. They attempt to suspend any preconceived ideas they have about the phenomenon in question in order to allow for a more concrete comprehension of how others process and navigate these experiences. Examples 7.0.2 and 7.0.3 describe phenomenological research.
Example 7.0.2 BEING A NURSE DURING THE COVID-19 PANDEMIC
White (2021)2 – Online interviews with nurse managers and assistant nurse managers combatting the coronavirus outbreak in the United States were conducted to better understand the experiences of frontline leadership. As individuals who both were most at risk of infection and experienced the greatest workload shifts following the COVID-19 outbreak, nurses and nurse management provide a unique insight into the pandemic and how it has affected those most central to related healthcare changes. Nurse managers reported several shared experiences including being there for staff, leadership challenges, struggles and coping, and strengthening one’s role during the pandemic. The lived experiences of these managers provide insight into the ways in which healthcare staff were expected to both provide emotional support for others and cope with their own trauma during the pandemic.
1 Gonzalez Van Cleve, N. (2016). Crook County: Racism and injustice in America’s largest criminal court. Stanford University Press. www.sup.org/books/title/?id=23968
2 White, J. H. (2021). A phenomenological study of nurse managers’ and assistant nurse managers’ experiences during the COVID-19 pandemic in the United States. Journal of Nursing Management, 29(6), 1525–1534. https://doi.org/10.1111/jonm.13304
Example 7.0.3 STUDENT EXPERIENCES OF CYBERBULLYING AND BULLYING
Chan et al. (2020)3 – In this phenomenological work, the authors conducted several focus groups with Malaysian youth and school counselors to explore their experiences with bullying and cyber bullying. In these focus groups, the authors were able to uncover what shared experiences existed among members and how they made sense of the interconnected nature of both in-person and online bullying. Participants shared their experiences with both being the victim of bullying and being bystanders, particularly in online spaces. Such evidence helps with the conceptualization and academic understanding of cyberbullying and the experiences of the students involved.
Action research
Action research, also commonly referred to as action-oriented research or participatory-action research, has a unique focus on incorporating participants and key stakeholders as active participants in the research process. The goal is collaborative development and/or evaluation of policies, practices, and programs that both improve community outcomes and contribute to the scientific knowledge base of that field. In engaging these stakeholders in the research/learning process, the emphasis is on allowing communities to have a say in the decisions that affect them and potentially provide unique insight that only those most affected by the problem at hand might have. This form of research is particularly common in education, public health, and other fields that rely heavily on community engagement.
For example, if a team of researchers was hired to help a school district implement a restorative conferencing program, they may incorporate key stakeholders such as the Board of Education, school administrators, faculty, staff, and even parents from the community. These stakeholders would be asked to provide feedback on what the program would look like in their specific districts, what resources they feel they have and/or need, and whether any unique challenges/barriers exist within their community that would be important to consider during the research and implementation process. This provides insight into issues that the researchers, who do not work for school districts, may otherwise not be aware of. Examples 7.0.4 through 7.0.6 summarize empirical articles focused on action research and explain why generalizations from this type of research are limited.
3 Chan, N. N., Ahrumugam, P., Scheithauer, H., Shultze-Krumbholtz, A., & Ooi, P. B. (2020). A hermeneutic phenomenological study of students’ and school counsellors’ “lived experiences” of cyberbullying and bullying. Computers and Education, 146. https://doi.org/10.1016/j.compedu.2019.103755
Example 7.0.4 EMPOWERING VOICES – PARTICIPATORY ACTION RESEARCH TO ADDRESS MIDDLE SCHOOL DROPOUT
Phillips et al. (2010)4 – In this participatory-action research, the authors sought to incorporate youth into the process of evaluating a middle school program aimed at reducing drop-out rates and providing an accelerated opportunity for youth to catch up to an appropriate grade level for their age. Working with both teachers and students who are part of the program in this urban, working-class school, the researchers gained critical insight into some of the key areas of improvement in this program space and empowered marginalized students/faculty to have their voices heard during the evaluation process. In doing so, this research identified key gaps between practice and theory in educational programming and how teachers/students who are served by these programs experience unique challenges.
Example 7.0.5 ACTION RESEARCH TO ADDRESS HEALTH INEQUITY AND CLIMATE CHANGE
Friel et al. (2011)5 – This study reviews the work of the Global Research Network on Urban Health Equity to identify the connections between climate change, urbanization, and urban health inequities, particularly in at-risk developing nations. The authors identify several possible causal pathways in this relationship and attempt to create an action-oriented research agenda for focusing on and prioritizing the voices of those most impacted by these global changes. Given the many institutional and systemic structures that must be part of the solutions at local and nation-wide levels, the researchers suggest that more work be done to understand the unique challenges faced by urban communities most affected by climate change and the need for both quantitative and qualitative assessment of program/policy implications in these areas.
4 Phillips, E. N., Berg, M. J., Rodriguez, C., & Morgan, D. (2010). A case study of participatory action research in a public New England middle school: Empowerment, constraints and challenges. American Journal of Community Psychology, 46(1–2), 179–194. https://doi.org/10.1007/s10464-010-9336-7
5 Friel, S., Hancock, T., Kjellstrom, T., McGranahan, G., Monge, P., & Roy J. (2011). Urban health inequities and the added pressure of climate change: An action-oriented research agenda. Journal of Urban Health: Bulletin of the New York Academy of Medicine, 88(5), 886–895. https://doi.org/10.1007/s11524-011-9607-0
Example 7.0.6 UNDERSTANDING THE IMPLEMENTATION OF A SCHOOL-BASED HEALTH PROGRAM IN THE NETHERLANDS
Bartelink et al. (2018)6 – This protocol paper shows the process of a contextual action research approach, which seeks to understand the implementation process of a Dutch school-based health program into various lower-class community schools. Utilizing information gathered from school teachers and staff, health coordinators, and parents of enrolled children, the authors gain unique insight into how the skills from the program are being utilized, its long-term health effects, and context-specific barriers that might be present or affect implementation processes. This form of contextual action research is not generalizable because the goal is to find points of implementation where change or needed adaptations interact with context-specific factors, meaning that each context, or school, may have different findings or outcomes.
Case studies
Case studies are utilized to obtain rich, nuanced details about a specific phenomenon or issue in the real-world context in which it is experienced. Case studies are utilized almost exclusively for exploratory or descriptive research, instances in which the goal is simply to learn more detail about how a given phenomenon or issue plays out in its natural context. Case studies often include very specific boundaries of what is defined as the phenomenon under question and who may be included as part of this research. In case study research, case(s) are often chosen not because of their representativeness but rather their uniqueness, with generalizability not being the goal. Case studies may include one singular case or person or multiple cases for comparison or exploration of the different ways these phenomena may be experienced. A singular case study may also be done on an entire entity such that a whole school, organization, event, or other multi-person agency is considered as one unit.
For example, researchers may be interested in how racial/ethnic minority groups who are also members of the LGBTQIA+ community are exposed to or experience hate crimes or other biased behaviors. Given the niche nature of this group (being both a racial and sexual minority), research in this area would provide unique insight into how a member of this specific group experiences these phenomena and their effects but would not be generalizable to heterosexual racial/ethnic minorities or to White members of the LGBTQIA+ community and may not generalize well to LGBTQIA+ minorities in other contexts. Examples 7.0.7 through 7.0.9 describe several different types of case studies in published research articles.
6 Bartelink, N. H. M., van Assema, P., Jansen, M. W. J., Savelberg, H. H. C. M., Willeboordsem, M., & Kremers, S. P. (2018). The healthy primary school of the future: A contextual action-oriented research approach. International Journal of Environmental Research and Public Health, 15(10). https://doi.org/10.3390/ijerph15102243
Example 7.0.7 EXPLORING CRIMINOLOGICAL CONCEPTS THROUGH DOCUMENTARY
Redmon (2017)7 – In this article, the author explores the use of documentaries as a means for delving into criminological concepts and producing knowledge within the field. As part of this work, Redmon conducts a case study of the film Girl Model, which focuses on the lived experiences of a Russian girl’s exploitation by the fashion/modeling industry. In utilizing sight, sound, and other film-making techniques, the directors provide real-world, experiential narratives that viewers can utilize to better understand the criminal underground of human trafficking. While not generalizable to all victims of exploitation in the industry, Girl Model’s narrative explores the experience of the subject while also highlighting the larger use of documentary as part of criminological research methodology.
Example 7.0.8 A FORENSIC CASE STUDY OF ANTISOCIAL PERSONALITY DISORDER AND CRIMINALITY
DeLisi et al. (2021)8 – Case studies are frequently used in the field of psychology to highlight the experiences of people with specific mental disorders and the behaviors associated with them. In this study, the authors conduct a forensic case study of an individual with antisocial personality disorder (ASPD) and detail their lengthy criminal legal involvement. The comprehensive qualitative data provided by this single-person case study creates a rich description of how ASPD and associated behaviors may present and affect the lives of those afflicted. Such depth of understanding cannot be achieved with available quantitative information about diagnostic prevalence. This type of case study allows for the development and fine-tuning of psychological theory and our understanding of how such mental disorders may present in some of the most severe of cases.
Example 7.0.9 A CASE STUDY OF SCHOOL CLIMATE AND DISCIPLINARY EQUITY
Gullo (2018)9 – Often, research is needed to explore the context-specific needs of an organization or large entity, which is then considered as one “case” to be evaluated. In response to community calls for change within a singular school district and an equity audit that found racial and ethnic disparities in disciplinary practices for students, Gullo (2018) conducted a district-wide school climate survey for all parents and students. In this survey, participants were asked to respond to questions about their perceived safety or child’s safety, awareness of school disciplinary policies, and perceptions of daily experiences. In collecting data from the entire district with the goal of better understanding this specific community’s unique needs, this study represents a case study of the ability of students and other impacted community members in the district to acknowledge and evaluate the efficacy of the district’s disciplinary practices.
7 Redmon, D. (2017). Documentary criminology: “Girl Model” as case study. Crime Media Culture, 13(3), 357–374. https://doi.org/10.1177/1741659016653994
8 DeLisi, M., Drury, A. J., & Elbert, M. J. (2021). Psychopathy and pathological violence in a criminal career: A forensic case report. Aggression and Violent Behavior, 60, 101521. https://doi.org/10.1016/j.avb.2020.101521
9 Gullo, G. L. (2018). Using data for school change: The discipline equity audit and school climate survey. Journal of Cases in Educational Leadership, 21(2), 28–51. https://doi.org/10.1177/1555458917728758
Types of Research with Limited Generalizability
Pilot studies
Pilot studies are utilized primarily to determine the feasibility of a specific method or research design for studying specific problems. In this type of work, researchers will attempt to implement a small-scale version of the research they are interested in conducting. This is done to help researchers determine what in their research design should be adapted (what worked well and what did not) prior to conducting a subsequent full-scale study. This not only improves the validity of the full-scale research but provides unique insight into barriers or new research questions that otherwise may not have been considered. As such, these studies are included in the chapter as a type of research with limited generalizability – the goal of such work is to help plan or troubleshoot larger studies or interventions being planned but isn’t meant to generalize beyond the larger roll-out.
Typically, pilot studies utilize a convenience sample to test the program or data collection instruments and receive feedback from the sample. This allows the researcher to also understand what type of full-scale sample is best suited for this work and how to gather such an appropriate sample.
Several types of pilot studies exist and can be utilized depending on the research objectives, the needs of the researchers prior to full implementation, and the overall stage in the research process. Feasibility pilot studies evaluate the practicality of rolling out the study on a larger scale, including evaluating various aspects such as problems with participant recruitment, data collection procedures, research instruments, and potential challenges or limitations that may arise during the main study. Instrumentation pilot studies focus on testing the instruments or measures the researchers intend to use. Process pilot studies seek to fine-tune the implementation logistics of the project.
Intervention development pilot studies are utilized primarily in the behavioral sciences and seek to evaluate a target intervention on a small sample. Data collection and analysis pilot studies both focus on the methodological concerns of the study and allow researchers the opportunity to ensure they have a solid plan for both data collection and analysis, respectively. Some authors may also refer to their work as a pilot study if they were unable to gather the target sample size and retrospectively determine that this study is a preliminary look into the given issue. While not the proper use of the term “pilot study,” this may help inform how the researcher conducts future work and gains access to a larger sample. Given the exploratory nature of pilot studies, the goal is to generalize the findings but NOT beyond the population for which the researchers envision the full-scale roll-out of the intervention. Example 7.0.10 is a summary of a published pilot study with limited generalizability.
Example 7.0.10 DEVELOPING AN APP FOR PARENTS OF TEENS USING DRUGS
Becker et al. (2021b)10 – A team of researchers seeks to implement a phone-based application for parents of youth with a substance use disorder that will help them better engage the skills learned through traditional in-office continuing care. Piloting the app with a smaller, random sample of patients and their families helps the researchers explore the feasibility of implementing this app-based program and its acceptability to parents (their interest in and approval of the app). This will provide crucial information for improving the app and program for widespread roll-out. Given its exploratory nature, this study’s findings cannot be generalized to other programs or diagnoses beyond substance use disorder.
10 Becker, S. J., Helseth, S. A., Janssen, T., Kelly, L. M., Escobar, K. I., Souza, T., Wright, T., & Spirito, A. (2021). Parent SMART (Substance Misuse in Adolescents in Residential Treatment): Pilot randomized trial of a technology-assisted parenting intervention. Journal of Substance Abuse Treatment, 127, 108457. https://doi.org/10.1016/j.jsat.2021.108457
Sampling in Research Not Meant for Generalization
In these forms of nongeneralizable research, researchers either try to study an entire population or program or utilize purposive sampling strategies. For purposive samples, which are most common in qualitative research, the sample size may be determined by the availability of participants who fit the sampling profile for the purposive sample. For example, if a researcher wanted to understand the college experiences of first-generation students, they would seek out only participants who fit that description (being a first-generation student) to ensure that the data they collect truly represents what they are trying to capture. Typically, these samples will be drawn until the researcher believes they have reached what is called saturation. Saturation is the point at which additional participants are adding little new information to the picture that is emerging from the data the researchers are collecting. In other words, saturation occurs when new participants are revealing the same types of information as those who have already participated, and new data does not reveal anything uniquely different from that which has already been shared.
Purposive sampling may also be utilized when seeking to evaluate one specific case or when interested in engaging with entire specific communities/populations. This is especially true in institutional settings such as schools, where, for example, all the high school students in a school district (the population) might be tested or teachers may be asked to incorporate and provide feedback on new policies or practices. Since these samples are distinctly specific in their selection or inclusion criteria, findings from such samples typically cannot be generalized beyond the groups covered. Asking all students in the school district to participate in a survey about their experiences utilizes data from an entire, specific population but may not be generalizable outside of that one district or to other generations of students. The current section describes common sampling procedures within each type of research not intended for generalization.
Sampling in ethnography
Ethnography often involves both interviews with individuals and observation of the community, culture, or group of interest. Researchers often select a purposive sample of individuals from the group being studied with which to conduct interviews or gain insider access to more in-depth understanding of the phenomena being observed. Beyond this small group of informants, the researcher will likely also observe the behaviors and public actions of the group at large, providing insight into how individuals interact and what the day-to-day operations of the community or culture are like. Through this observation of and participation within a large group and selective interviews with key individuals, ethnography allows for multiple samples to be incorporated into the data and analysis.
Sampling in phenomenological studies
Phenomenological studies focus on in-depth understandings of how individuals experience a given phenomenon first-hand. As a result, these studies often involve small samples of people who have unique insight into these experiences and can articulate in great detail the conscious processes that occur as a result of these experiences. Researchers will often conduct in-depth interviews or focus groups with these samples in order to get the richest information. Such efforts help researchers avoid imposing their own interpretations of what the phenomenon is like and instead obtain only first-hand accounts from the persons most affected by or central to the issue at hand.
Sampling in action research
Action research (also called participatory-action research or action-oriented research) seeks to involve a purposive sample of persons most affected by the issue being researched. In doing so, this sample provides unique insight into the issue and highlights context-specific factors that may impact attempts to address it. In selecting the people most impacted by the problem to be part of the research process, action research allows for the exploration of the problem through the lens of people most often left out of such dialogues. These voices position the researchers as insiders to information they otherwise may not have explored. As a result, purposive sampling of key stakeholders is central to the success of action research.
Sampling in case studies
Case studies select samples or individuals to investigate based on the specific information being explored. In this type of research, as in many types of qualitative work, the goal is not to generalize but rather to explore the unique experiences and uncover rich descriptions of a specific phenomenon. The purposive sampling in this research has the goal of finding individual(s) who explicitly experience this phenomenon and can provide data that the general population could not. Case studies rarely, if ever, attempt to generalize beyond the small sample included in the research, even to others who may have relatively similar experiences. Purposive sampling is the primary sampling strategy of case studies because the use of probability sampling from the general population may not produce participants with the lived experiences the researchers are trying to explore. When utilizing whole entities as the “case,” participants must be central to the operations of the organization or entity and should be able to speak to the experiences of being part of this establishment.
Sampling in pilot studies
Pilot studies often include subsamples of the target population for the full implementation of the program/policy/practice. These purposive samples seek to provide preliminary insight into the barriers, challenges, and successes associated with implementing the new program. Collecting a purposive sample that mimics the full population of interest is important to this process as the pilot study must closely mirror the true intended full-scale design. This allows for an accurate understanding of what will contribute to or hinder the success of the program. While this sample is meant to generalize to the larger population for which full implementation is intended, it is not meant to generalize beyond that to other individuals in contexts outside of the research loci.
Evaluating Research Design with Nongeneralizable Samples
Given what is now known about research with nongeneralizable samples, it is important to evaluate such studies differently than those that have the goal of generalizability. Below are the factors one should consider when evaluating studies of this type.
___ Question 1: Generalizability Intent: Is the researcher seeking to generalize the findings?11
Very unsatisfactory   1   2   3   4   5   Very satisfactory   or   N/A   I/I11
Comment: If the goal of the study is generalizability, such efforts should be clearly outlined in the methods and results sections of the report. In addition, generalizability is best suited to a random sample, a sampling type not commonly found in the forms of research discussed in this chapter (case studies, action research, etc.). While researchers rarely make broad claims about generalizability from a single study, they may encourage the replicability of the study design, claiming feasibility in other settings. Such efforts can be found in Example 7.1.1, which is an excerpt from Engel and colleagues (2022) who encourage replication of a successful study with other training programs for law enforcement.
11 Continuing with the same grading scheme as in the previous chapters, N/A stands for “Not applicable” and I/I stands for “Insufficient information to make a judgement.”
Example 7.1.1 – Engel et al. (2022) 12 SPEAKING TO GENERALIZABILITY
While promising, this study does not speak to the validity or efficacy of any other de-escalation training program. As noted previously, de-escalation trainings vary dramatically in their content, delivery method, and dosage. It is therefore unknown if these findings are generalizable to other trainings. This is a question that can only be answered through additional research. There is still much to learn about the effects of changes in police training generally and de-escalation training specifically before we may come to solid conclusions about the value of these practices.
___ Question 2: Sample/Population Description: Has the researcher described the sample/population in sufficient detail?
Very unsatisfactory   1   2   3   4   5   Very satisfactory   or   N/A   I/I
Comment: As indicated in the previous chapters, researchers should describe relevant demographics (i.e., background characteristics) of the participants when conducting studies in which they are generalizing from a sample to a population. This is also true when researchers are not attempting to generalize or have limited generalizing potential as it will provide insight into what makes this sample or entire population unique or prime for studying. Sample or population demographics and other descriptive statistics are often presented in large tables within the research reports’ design or methods section and briefly summarized within the text, as an article excerpt in Example 7.2.1 demonstrates.
Example 7.2.1 – Becker et al. (2021a) 13 IDENTIFYING PARTICIPANT CHARACTERISTICS
Participant demographics are presented in Table 1. Fifty-eight of the 61 parents/legal guardians (95%) identified as biological parents, with the remainder identifying as other blood relatives: the term “parents” is used hereafter for simplicity. Only one within site difference was found at the short-term facility: more adolescents who self-identified as multiracial were randomized to TAU than the Parent SMART condition. In addition, several between site differences were found among the adolescents. Relative to adolescents at the short-term facility, those at the long-term facility were significantly older, had more years of education, were more likely to identify as male, reported significantly more days of any substance use, and reported significantly more substance-related problems at baseline. No other between or within site differences were found on any of the parenting or adolescent variables, and none of the variables were systematically related to attrition.
12 Engel, R. S., Corsaro, N., Isaza, G. T., & McManus, H. D. (2022). Assessing the impact of de-escalation training on police behavior: Reducing police use of force in the Louisville, KY Metro Police Department. Criminology & Public Policy, 21(2), 199–233. https://doi.org/10.1111/1745-9133.12574
13 Becker, S. J., Helseth, S. A., Janssen, T., Kelly, L. M., Escobar, K., & Spirito, A. (2021). Parent Smart: Effects of a technology-assisted intervention for parents of adolescents in residential substance use treatment on parental monitoring and communication. Evidence-Based Practice in Child and Adolescent Mental Health, 6(4), 459–472. https://doi.org/10.1080/23794925.2021.1961644
___ Question 3: Sample Size/Saturation: Is the sample size adequate for the given study goals? In qualitative research, has the author claimed to have reached saturation in their findings?
Very unsatisfactory   1   2   3   4   5   Very satisfactory   or   N/A   I/I
Comment: Researchers must provide a rationale for their sample size. This may include explanation of the barriers that prevented a larger sample or the methodological reasons that their sample size was most appropriate. If the goal of the study is simply to explore or describe a phenomenon, such as in a case study, a single case may be adequate. If multiple cases are included, the goal may be to gather enough information to compare and contrast the various lived experiences within the sample. In pilot studies or action research, an adequate sample may include enough stakeholders or participants to fully understand how a program or intervention can best be implemented or adapted to meet the needs of the unique group. Qualitative researchers should provide a sufficient rationale for their sample size. Ideally, interviews or focus groups should be conducted until theoretical saturation is reached and the authors should reflect this in their writing. If data collection is stopped prior to saturation being reached, the authors should provide adequate detail as to why data collection ended at the final sample size. In Example 7.3.1, the authors clarify that sampling continued until said saturation was reached.
Example 7.3.1 – Kerman et al. (2020)14 VALIDATING SAMPLE SIZE
Data were collected in two phases. In the first phase, interviews were conducted using a semi-structured guide. Interviews began by exploring participants’ experiences using [Supervised Consumption Sites (SCS)], including aspects that contributed to positive and negative experiences. The interview then transitioned to discussion of how SCSs affected SDOH-related outcomes. On average, the interviews lasted slightly less than 30 min. All interviews were audio-recorded and conducted over a two-week period in March 2019 by the lead author. Interviews were conducted until new data mostly replicated previously discussed perceptions and experiences, with emerging narratives being identified in multiple interviews.
14 Kerman, N., Manoni-Millar, S., Cormier, L., Cahill, T., & Sylvestre, J. (2020). “It’s not just injecting drugs”: Supervised consumption sites and the social determinants of health. Drug and Alcohol Dependence, 213, 108078. https://doi.org/10.1016/j.drugalcdep.2020.108078
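For readers who find a concrete tally helpful, the idea of saturation discussed under Question 3 (new participants add little new information) can be sketched as a simple count of how many previously unseen themes each successive interview contributes. The short Python sketch below is only an illustration under stated assumptions: the theme labels are hypothetical, and the stopping rule (three consecutive interviews with no new themes) is an arbitrary choice made for this example, not a rule from this chapter or from any of the cited studies. In practice, judging saturation remains a qualitative decision by the researchers.

```python
def new_themes_per_interview(coded_interviews):
    """Return, for each interview in order, how many themes it adds
    that did not appear in any earlier interview."""
    seen = set()
    counts = []
    for themes in coded_interviews:
        fresh = set(themes) - seen   # themes not heard before
        counts.append(len(fresh))
        seen |= set(themes)
    return counts


def saturation_point(coded_interviews, window=3, max_new=0):
    """Return the 1-based number of the interview that completes a run of
    `window` consecutive interviews each adding at most `max_new` new themes,
    or None if no such run occurs. Both thresholds are illustrative assumptions."""
    counts = new_themes_per_interview(coded_interviews)
    for i in range(window - 1, len(counts)):
        recent = counts[i - window + 1 : i + 1]
        if all(c <= max_new for c in recent):
            return i + 1
    return None


# Hypothetical theme codes assigned to six interviews (not real data).
interviews = [
    {"safety", "stigma"},
    {"safety", "housing"},
    {"stigma", "trust"},
    {"housing", "trust"},
    {"safety", "stigma"},
    {"trust"},
]

print(new_themes_per_interview(interviews))
print(saturation_point(interviews))
```

A report that presents this kind of tally, or describes the equivalent judgment in words as Kerman and colleagues do, gives the reader a basis for rating whether the stated sample size was adequate.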
___ Question 4: Ethnography Participant Selection Rationale: Does the sample interviewed or observed closely align with the community/culture/group of interest? Has insider access been obtained?
Very unsatisfactory   1   2   3   4   5   Very satisfactory   or   N/A   I/I
Comment: The focus of ethnography is to better understand the lived experiences within a specific community or culture. As such, it is vital to the validity and reliability of the study that researchers gain access to this space and speak with individuals who are familiar with what it means to be a member of the group. Field observations should also be done with individuals who can display behaviors that are reflective of the core processes and shared meanings of the community or organization. Researchers should clarify how these individuals were accessed and what behaviors were observed that help provide unique insight. In her book Crook County, Gonzalez Van Cleve (2016) delves deep into the methodological choices that allow her such access to courtroom actors central to the outcomes of the Cook County court system – see the excerpt in Example 7.4.1.
Example 7.4.1 – Gonzalez Van Cleve (2016)15 OBSERVING AND LEARNING FROM INSIDERS
Over the course of 7 years (1997–2004), I completed three ethnographic visits of the field site, amounting to nine months of observations. In order to incorporate both participant and observer roles, I worked as a law clerk for the Cook County State’s Attorney’s Office (six months in 1997–1998), and for the Cook County Public Defender’s Office (three months in 2004). This allowed access to front-stage and backstage environments, including offices, courtrooms, lockups, and judge’s chambers.… In addition to ethnography, I conducted supplemental interviews in private settings, outside the purview and influence of other courtroom workgroup members. These interviews acted as a follow-up to more fully interrogate the rationale for courtroom practices.
15 Gonzalez Van Cleve, N. (2016). Crook County: Racism and injustice in America’s largest criminal court. Stanford, CA: Stanford University Press. www.sup.org/books/title/?id=23968
___ Question 5: Phenomenology Participant Selection Rationale: Have participants been chosen explicitly for their first-hand experiences related to the phenomenon being evaluated? Has the author identified what makes these individuals most uniquely situated to provide information aligned with the research question?
Very unsatisfactory   1   2   3   4   5   Very satisfactory   or   N/A   I/I
Comment: Phenomenology is the study of a specific phenomenon and how people experience it by recording and compiling the first-hand experiences of people involved. How individuals cognitively process and navigate these phenomena is the crux of this type of research, meaning that all participants must be intimately familiar with these experiences and be able to articulate what this exposure means to them. Researchers must also make it clear how the individuals chosen meet these criteria and why such insider knowledge is important for advancing academic knowledge. Examples 7.5.1 and 7.5.2 show how the researchers highlight the importance of the chosen samples and the ways in which they can provide unique insight that could not be obtained without a purposive sample.
Example 7.5.1 – White (2021)16 UNIQUE EXPOSURE TO A GLOBAL PANDEMIC
To date, no qualitative phenomenological studies have reported the experiences of nurse managers’ roles during the pandemic. Furthermore, there are no known published research studies to date on the experiences of nurse managers during the pandemic in the United States.… Three hospitals in this one large health care system were chosen for recruiting because they had admitted the majority of the patients with COVID-19 during March 2020–September 2020. The recruited participants were considered a whole sample regardless of the facility employing them. Nurse managers were assigned to oversee a unit that cared for COVID-19 patients either an ICU unit or medical surgical unit that had been converted to a COVID-19 patient unit. They reported to a director of nursing. Assistant nurse managers were assigned to each shift and reported to the unit nurse manager.
16 White, J. H. (2021). A phenomenological study of nurse managers’ and assistant nurse managers’ experiences during the COVID-19 pandemic in the United States. Journal of Nursing Management, 29(6), 1525–1534. https://doi.org/10.1111/jonm.13304
Example 7.5.2 – Chan et al. (2020)17 STUDENTS AS VICTIMS AND BYSTANDERS TO BULLYING
Studies were conducted on cyberbullying with implications for school counsellors such as more counsellor education programmes informed by research findings, and development of school policies and procedures that encourage and adopt a multi-disciplinary approach involving students, teachers, parents and school administrators (Bhat, 2008; Nordahl et al., 2013). To date, there appears to be no study of secondary school counsellors’ responses and perceptions of cyberbullying in schools. This qualitative study aims to discover the “lived experiences” of 70 students (ages 13–17 years from 6 national and 1 private schools) and 18 professionally trained school counsellors (ages 27–57 years) with cyberbullying in Malaysia. In investigating this phenomenon, the main question in this study is “What is ‘cyberbullying’ to students and school counsellors?”
___ Question 6: Participant Inclusion in Action Research: In action research, has the study included the appropriate participants relevant to the program or policy?
Very unsatisfactory   1   2   3   4   5   Very satisfactory   or   N/A   I/I
Comment: Action research incorporates key stakeholders or those most affected by a given program, practice, policy, or phenomenon as part of the research process. Inclusion of these individuals allows for unique insight into the issues experienced by these participants that might otherwise be missed by outside researchers. Given the importance of the insight gained from this process, the researchers should be sure to include all key stakeholders or participants who would be most affected by or have the greatest impact on the implementation of said program or policy and should include this rationale in their published work (as shown in the excerpt in Example 7.6.1). In doing so, the researchers are more likely to collect comprehensive and exhaustive data on the potential barriers, unique challenges, and needs of the given implementation context.
Example 7.6.1 – Phillips et al. (2010)18 IDENTIFYING PARTICIPATORY-ACTION STAKEHOLDERS
Participatory action research affirms the rights of students and their ability to have a say in the decisions that shape their lives.… Youth PAR seeks to democratize knowledge by putting the tools of inquiry into the hands of those affected by issues, thereby moving them from being the object of study to the central position of researcher. By privileging youth voices and experiences in the learning process, Youth PAR attempts to reduce, if not eliminate, the hegemony of powerful stakeholders, such as adults in the case of middle-school students, oppressive knowledge and structures.… Youth PAR with teacher action research or reflexive praxis provides the opportunity to positively engage students and correspondingly offers middle school teachers the opportunity to learn first hand about research methods, test their theories of effective educational strategies and engage in empowering practices.
17 Chan, N. N., Ahrumugam, P., Scheithauer, H., Schultze-Krumbholz, A., & Ooi, P. B. (2020). A hermeneutic phenomenological study of students’ and school counsellors’ “lived experiences” of cyberbullying and bullying. Computers and Education, 146, 103755. https://doi.org/10.1016/j.compedu.2019.103755
18 Phillips, E. N., Berg, M. J., Rodriguez, C., & Morgan, D. (2010). A case study of participatory action research in a public New England middle school: Empowerment, constraints and challenges. American Journal of Community Psychology, 46(1–2), 179–194. https://doi.org/10.1007/s10464-010-9336-7
___ Question 7: Case Study Selection Rationale: Does the sample inform the findings? Have the authors adequately described why they selected that case?
Very unsatisfactory   1   2   3   4   5   Very satisfactory   or   N/A   I/I
Comment: Case studies utilize a singular case, unit, or a very small sample to explore lived experiences and real-world contexts of a given phenomenon. As a result of the nuanced nature of this exploratory work, the author(s) should provide a rationale as to why this specific case or cases are worthy of specialized consideration and/or how the study provides information not already known in the literature. Case studies utilize very specific, carefully chosen samples and the author must include justification for this selection process in order for the reader to adequately understand the importance of this research. In Example 7.7.1, DeLisi and colleagues (2021) justify the unique value that their chosen case adds to forensic literature.
Example 7.7.1 – DeLisi et al. (2021)19 SELECTING THE RIGHT CASE
A biographical richness about the most pathological offenders and granular insights into the criminal career is missing. Here, we present a forensic case report of Mr. Z, an offender whose antisocial career and criminal justice system involvement spans the late 1940s to the present and whose behavioral odyssey dovetails with significant events in correctional history in the United States in the middle to late 20th century, and who was a multiple homicide offender during his decades of confinement.
19 DeLisi, M., Drury, A. J., & Elbert, M. J. (2021). Psychopathy and pathological violence in a criminal career: A forensic case report. Aggression and Violent Behavior, 60, 101521. https://doi.org/10.1016/j.avb.2020.101521
___ Question 8: Pilot Study Sample Selection: For a pilot study, has the researcher used a sample with relevant demographics?
Very unsatisfactory   1   2   3   4   5   Very satisfactory   or   N/A   I/I
Comment: Pilot studies are essentially trial runs of a program or policy that is ready for widespread implementation or of a larger research effort. As a result, a pilot study should utilize a sample that is similar to the target population for the larger roll-out. Researchers should clarify how the demographics or characteristics of this sample make them well suited for the program or policy and how such findings will provide insight into the success of future implementation. As previously mentioned, the goal of pilot studies is to improve the success of subsequent implementation and sustainability, but their findings typically cannot be generalized beyond the target populations meant to be served by this specific program or policy. Example 7.8.1 highlights how researchers may indicate that they are utilizing samples meant to reflect the larger target of full implementation.
Example 7.8.1 – Becker et al. (2021b)20 TESTING A FUTURE RESEARCH DESIGN
To qualify for the study, parents had to meet the following inclusion criteria: (1) legal guardian of a 12–17-year-old admitted to residential treatment due to problems related to SU; (2) would remain the custodial guardian of the adolescent post-discharge; (3) fluent in English or Spanish; (4) willing and able to complete a baseline assessment prior to the adolescent’s discharge; and (5) had reliable access to a phone that could receive text messages and an internet-capable device to receive the TAI. Adolescents qualified if they had a parent who met the aforementioned criteria, and if they confirmed recent SU during the baseline assessment (i.e., alcohol or other drug use in the past 90 days).… The intervention was tested in both short- and long-term residential settings to inform the design of a future trial.
20 Becker, S. J., Helseth, S. A., Janssen, T., Kelly, L. M., Escobar, K. I., Souza, T., Wright, T., & Spirito, A. (2021). Parent SMART (Substance Misuse in Adolescents in Residential Treatment): Pilot randomized trial of a technology-assisted parenting intervention. Journal of Substance Abuse Treatment, 127, 108457. https://doi.org/10.1016/j.jsat.2021.108457
Ethical Considerations in Nongeneralizable Research
There are several ethical considerations that must always be weighed whenever research is being conducted. First and foremost, all research protocols must be approved by the governing Institutional Review Board or research ethics committee, made up of research scholars who confirm the design is ethically sound. Additionally, informed consent and the opportunity for participants to ask questions about their rights and protections must always take place. In addition, researchers must always prioritize the safety and well-being of participants, physically and emotionally, when dealing with sensitive topics and asking for private insight into the participant’s challenges.
When evaluating research with unique samples, it is also of particular importance to ensure that safeguards have been developed to protect participants and maintain confidentiality or anonymity. Given the relatively small and specific nature of the purposive samples utilized in nongeneralizable research, participants are at increased risk of being identified. The researchers must always take great precautions to protect participants’ identities by either utilizing completely anonymous data or keeping those identities confidential. Such efforts may include providing pseudonyms for participants in any publicly available content, keeping their names separate from any data files, and always keeping information on a password-protected server or within a locked drawer. When determining if a study has taken these proper precautions, consider the following guidelines.
___ Question 9: Research Ethics Review: Has the study been approved by a research ethics committee/Institutional Review Board (IRB)?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: All research must be approved in advance by a research ethics committee or Institutional Review Board overseeing studies at the researcher’s institution. The purpose of the IRB is to review all research protocols and planned methodologies to ensure that participants will be protected to the greatest extent possible and that all practices of the researcher meet ethical standards of the field. Approval from the IRB should be obtained prior to any data collection and mentioned in the part of the article describing the study. Research articles will often address this briefly in the methods or procedure section, making clear that all protocols have been previously approved for implementation – see the relevant article excerpts in Examples 7.9.1 and 7.9.2.
Example 7.9.1 – Spence et al. (2023)²¹ OBTAINING RESEARCH APPROVAL
All methodological concerns were subject to ethical approval, which was granted for the study by the university’s psychology research ethics committee.
21 Spence, R., Harrison, A., Bradbury, P., Bleakley, P., Martellozzo, E., & DeMarco, J. (2023). Content moderators’ strategies for coping with the stress of moderating content online. Journal of Online Trust and Safety, 1(5), 1–18. https://doi.org/10.54501/jots.v1i5.91
Example 7.9.2 – DeLisi et al. (2021)²² UTILIZING COURT DATA ETHICALLY
The Chief District Judge in this federal jurisdiction provided research approval for the study and Mr. Z voluntarily signed an informed consent form granting release of anonymized information.
___ Question 10: Informed Consent: Has informed consent been obtained from study participants?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Informed consent should be obtained from all participants in research. This process allows participants to understand the goals of the study, their role in the research and what will be asked of them, and their rights as a participant. Informed consent may be obtained verbally or in writing depending on the potential risks associated with participation. Often, it is not specifically mentioned in a research report, but it is implied that any study that received approval by a research ethics committee or IRB includes informed consent of participants. Example 7.10.1 highlights how informed consent might be addressed within a research article to make it clear that all participants understood their rights and engaged with the study voluntarily.
Example 7.10.1 – Scott et al. (2021)²³ ENSURING PARTICIPANT UNDERSTANDING
[The clinic] leaders and counselors needed to provide verbal informed consent to participate. Participants were assured of their rights to confidentiality and that their participation would not affect their employment as part of informed consent.
___ Question 11: Participant Identity Protection: Have precautions been taken to protect participants’ identity (especially in case studies or research with particularly small samples)?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
22 DeLisi, M., Drury, A. J., & Elbert, M. J. (2021). Psychopathy and pathological violence in a criminal career: A forensic case report. Aggression and Violent Behavior, 60, 101521. https://doi.org/10.1016/j.avb.2020.101521
23 Scott, K., Murphy, C. M., Yap, K., Moul, S., Hurley, L., & Becker, S. J. (2021). Health professional stigma as a barrier to contingency management implementation in opioid treatment programs. Translational Issues in Psychological Science, 7(2), 166–176. https://doi.org/10.1037/tps0000245
Comment: The anonymity or confidentiality of participants’ identities is vital to conducting ethical research but may be particularly difficult to maintain when utilizing small, unique samples from which identities may be deduced. The researcher should use pseudonyms for participants in any public documentation and, if necessary, withhold information about the location or timing of the study if such details could lead to the identification of subjects. Examples 7.11.1 and 7.11.2 highlight some of the many ways that participants’ identities may be protected and how such protections are explained within academic writing.
Example 7.11.1 – Tosto & Bonnes (2022)²⁴ USING PSEUDONYMS
To ensure participant anonymity, we refer to the lawyers by pseudonyms and have removed references to identifying information such as the name of military bases and location.
Example 7.11.2 – Spence et al. (2023)²⁵ ADDITIONAL PARTICIPANT PROTECTIONS
Participants were invited to take part in an online interview at a time of their choosing. They were given the choice to be interviewed with cameras on or off. Specific employment-related details such as the CM’s employer were not sought; however, clarification of the participant’s role was sought at the beginning of each interview.… The interviewers did not specifically ask about non-disclosure agreements but guaranteed anonymity and confidentiality by using participant codes and not storing any personal details. All data was secured on password-protected drives in line with Data Protection and UK GDPR regulations, and the interview recordings were deleted after transcription.
Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
24 Tosto, S. A., & Bonnes, S. (2022). “She clearly thought that something bad had happened to her”: How military lawyers construct narratives of victim legitimacy and perceived harm in sexual assault cases. Armed Forces and Society, [online first]. https://doi.org/10.1177/0095327X221108526
25 Spence, R., Harrison, A., Bradbury, P., Bleakley, P., Martellozzo, E., & DeMarco, J. (2023). Content moderators’ strategies for coping with the stress of moderating content online. Journal of Online Trust and Safety, 1(5), 1–18. https://doi.org/10.54501/jots.v1i5.91
Chapter 7 Exercises
Part A
Directions: Answer the following questions.
1. In your own words, describe what makes studies with limited generalizability (i.e., pilot studies) different from other forms of nongeneralizable research discussed within this chapter.
2. What is purposive sampling and how does it apply to studies of limited/no generalizability?
3. Is purposive sampling more common in (a) qualitative or (b) quantitative research?
4. Suppose you were conducting an ethnography of an indigenous community. How might you go about gaining access and/or who within the community would you most like to interview or speak with?
5. Explain the value of nongeneralizable research to the field of social sciences. How do these studies still contribute to academic knowledge?
Part B
Directions: Locate two research reports of interest to you in academic journals in which the researchers are not directly concerned with generalizing from a sample to a population, and apply the evaluation questions in this chapter. Select the one to which you gave the highest overall rating and bring it to class for discussion. Be prepared to discuss its strengths and weaknesses.
CHAPTER 8
Evaluating Measures
Immediately after describing the sample or population, researchers typically describe their measurement procedures. A measure is any tool or method used for measuring a trait or characteristic. The description of measures in research reports is usually identified with the subheading Measures.¹ The term measures refers to the materials, scales, and tests that are used to make the observations or obtain the measurements. Participants (or Sample) and Measures are typical subheadings under the main heading Method in a research report.
Often, researchers use published measures developed by other researchers. About as often, researchers use measures that they devise specifically for their particular research purposes. As a general rule, researchers should provide more information about such newly developed measures than about previously published ones, which have been described in detail in other publications. How well the researchers provided such details is addressed in the evaluation questions of section (1) Details of the Measures.
In social sciences, researchers measure concepts that are not easy to capture, for example, “grit” or “delinquency.” Asking the right questions and seeking information from the right sources (not only from self-reports), as well as reducing subjectivity in measurement, are the focus of the evaluation questions in (2) Efforts to Reduce Biases.
In many studies, researchers use existing data to answer new research questions (secondary data research). Researchers must then use items gathered during the original data collection, often intended to measure different concepts. For example, suppose the original survey included several questions asking respondents about their level of impulsivity, and another researcher wants to use people’s answers to these and several other questions to create a measure of low self-control. To better understand some of the validity issues involved in constructing measures and judging their quality, you can use the evaluation questions in (3) Reliability and Validity of the Measures.
While you would need to take several sequential courses in measurement to become an expert, the evaluation questions discussed in this chapter will help you make preliminary evaluations of researchers’ measurement procedures.
1 As indicated in Chapter 2, observation is one of the ways of measurement.
DOI: 10.4324/9781003362661-8
1 DETAILS OF THE MEASURES
___ Question 1a: Examples of Questions: Have the actual items and questions (or at least a sample of them) been provided?²
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I²
Comment: Providing sample items and questions is highly desirable because they help operationalize what was measured. Note that researchers operationalize when they specify ways of measuring the concepts of interest. These sample items may refer to questions asked in a survey or interview, assignments in an achievement test, and so on. In Example 8.1.1, the researchers provide sample items for measuring family stress.
Example 8.1.1 – Voisin et al. (2020)³ SAMPLE SURVEY ITEMS
Family stress. Family stress was assessed by summing the following three items: the number of adults in the household who have been incarcerated (e.g. spent time in jail or prison), who experienced mental health problems (e.g. depression, anxiety, schizophrenia, and posttraumatic stress disorder), and who use controlled substances (e.g. cocaine, marijuana, and alcohol). The response options for each of the items were based on a 4-point Likert-type scale (0 = “none,” 1 = “one,” 2 = “two,” 3 = “three,” and 4 = “more than 4”). A composite score was calculated, with higher scores indicating higher levels of family stress.
Note that when the actual words used in the questions are provided, consumers of research can evaluate whether the wording is appropriate and unambiguous. Keep in mind, however, that many measures are copyrighted, and their copyright holders might insist on keeping the actual items secure from public exposure. Obviously, a researcher should not be faulted for failing to provide sample questions when this is the case.
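To make the idea of operationalizing concrete, the summing described in Example 8.1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item names and the respondent's answers below are hypothetical, and the 0 to 4 coding follows the response options quoted in the excerpt. It is not the original researchers' scoring code.

```python
# Hypothetical item names standing in for the three survey items in Example 8.1.1.
ITEMS = ("adults_incarcerated", "adults_mental_health", "adults_substance_use")


def family_stress_score(responses):
    """Sum the three item responses (each coded 0-4) into one composite score."""
    for item in ITEMS:
        value = responses[item]
        if not 0 <= value <= 4:
            raise ValueError(f"{item} must be coded 0-4, got {value}")
    return sum(responses[item] for item in ITEMS)


# Made-up answers for one respondent (illustrative only).
respondent = {
    "adults_incarcerated": 1,
    "adults_mental_health": 2,
    "adults_substance_use": 0,
}
print(family_stress_score(respondent))  # 3
```

Because the items and their coding are spelled out in the article, a reader can reproduce the composite in this way and judge whether a simple sum is a sensible way to capture the concept.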
___ Question 1b: Details Provided: Are any specialized response formats, settings, and/or restrictions described in detail?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
2 Continuing with the same grading scheme as in the previous chapters, N/A stands for “Not applicable” and I/I stands for “Insufficient information to make a judgement.”
3 Voisin, D. R., Kim, D. H., Bassett, S. M., & Marotta, P. L. (2020). Pathways linking family stress to youth delinquency and substance use: Exploring the mediating roles of self-efficacy and future orientation. Journal of Health Psychology, 25(2), 139–151. https://doi.org/10.1177/1359105318763992
Comment: It is desirable for researchers to indicate the response format (e.g., multiple-choice responses on a Likert scale from “Strongly Agree” to “Strongly Disagree”). Examples of settings that should be mentioned are the place where the measures were used (e.g., in the participants’ homes), whether other individuals were present (such as whether parents were present while their children were interviewed), and whether a laptop was handed to the participants for the sensitive-topic portion of the interview. Examples of restrictions that should be mentioned are time limits and tools that participants are permitted (or not permitted) to use, such as not allowing the use of calculators during a mathematics test. Qualitative researchers should also provide details on these matters. This is illustrated in Example 8.1.2, taken from a study of homelessness and emergency department use, in which the location and length of the interviews are noted, examples of questions are provided, and the recording and transcription process is described.
Example 8.1.2 – McCallum et al. (2020)⁴ DESCRIPTION OF DATA COLLECTION IN A QUALITATIVE STUDY
Interviews occurred in private office space at either a local mental health facility or a downtown university. These locations were chosen due to the participants’ familiarity with them. ... Interviews lasted 60 to 90 minutes each and included prompts and questions such as the following: “What is it like when you visit an ED [emergency department]?” “Tell me about a memorable time. What happened?” “Why were you there?” or “Tell me about a positive/negative experience in the ED.” The interviews were audio recorded and professionally transcribed for content, and pseudonyms were used to protect confidentiality. Notation was also added to the transcripts to denote pauses, overlapping speech, and changes in speed, volume pitch, and emphasis.
You will find more details on evaluating various aspects of qualitative research in Chapter 11.
___ Question 1c: Additional Sources: For published measures, have sources been cited where additional information can be obtained?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Researchers should provide references to sources of additional information on the published measures used in their research.
4 McCallum, R., Medved, M. I., Hiebert-Murphy, D., Distasio, J., Sareen, J., & Chateau, D. (2020). Fixed nodes of transience: Narratives of homelessness and emergency department use. Qualitative Health Research, 30(8), 1183–1195. https://doi.org/10.1177/1049732319862532
Some measures have been published or previously reproduced in full in journal articles. Such studies typically describe the development and statistical properties of these measures. Other measures are published by commercial publishers as separate publications (e.g., test booklets) that usually have accompanying manuals describing technical information on the measures. In Example 8.1.3, the researchers briefly describe the nature of one of the measures they used, following it with a statement that the validity and reliability of the measure have been established and providing references to the relevant publications (shown in italics).
Example 8.1.3 – LaBrie et al. (2011)⁵ BRIEF DESCRIPTION OF A MEASURE IN WHICH A REFERENCE FOR MORE INFORMATION ON RELIABILITY AND VALIDITY IS PROVIDED (ITALICS ADDED FOR EMPHASIS)
Motivations for drinking alcohol were assessed using the 20-item Drinking Motives Questionnaire (DMQ-R; Cooper, 1994), encompassing the 4 subscales of Coping (α = .87), Conformity (α = .79), Enhancement (α = .92), and Social Motives (α = .94). The DMQ-R has proven to be the most rigorously tested and validated measurement of drinking motives (Maclean & Lecci, 2000; Stewart, Loughlin, & Rhyno, 2001). Respondents were prompted with, “Thinking of the time you drank in the past 30 days, how often would you say that you drank for the following reasons?” Participants rated each reason (e.g., “because it makes social gatherings more fun” and “to fit in”) on a 1 (almost never/never) to 5 (almost always/always).
If a study did not include previously published measures, the most fitting answer to this evaluation question would be N/A (not applicable).
2 EFFORTS TO REDUCE BIASES
___ Question 2a: Triangulation: When appropriate, were multiple methods or sources used to collect data/information on the key variables?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: As indicated in Chapter 2, it is safe to assume that all measurement methods (e.g., testing, interviewing, and making observations) have flaws. Thus, the results of a study can be more definitive if more than one method for collecting data or more than one source of data is used for the key variables.
5 LaBrie, J. W., Kenney, S. R., Migliuri, S., & Lac, A. (2011). Sexual experience and risky alcohol consumption among incoming first-year college females. Journal of Child & Adolescent Substance Abuse, 20(1), 15–33. https://doi.org/10.1080/1067828X.2011.534344
In quantitative research, researchers emphasize the development of objective measures that meet statistical standards for reliability⁶ and validity,⁷ which are discussed later in this chapter. When researchers use highly developed measures, they often do not believe that it is important to use multiple measures. For instance, they might use a well-established reading comprehension test that has been extensively validated. Quantitative researchers are unlikely to supplement such highly developed measures with other measures, such as teachers’ ratings of students’ reading comprehension. Other quantitative researchers may use multiple sources of data on the same variable (for example, violence rates) to compensate for the weaknesses of each data source. Example 8.2.1 provides an illustration of such a study that uses triangulation, or multiple measures of the same variable using different sources.
Example 8.2.1 – Steffensmeier et al. (2023)⁸ TRIANGULATION: USING MEASURES FROM MULTIPLE SOURCES
Recent media and academic reports project rising levels of girls’ violence and a narrowing gender gap. In response, the authors investigate 21st century trends in girls’ violence as reported across multiple official and unofficial longitudinal sources: Uniform Crime Reports (UCR) arrest and juvenile court referral statistics; National Crime Victimization Survey (NCVS) victimization data; and three sources of self-reported violent offending—Monitoring the Future, Youth Risk Behavior Surveillance System, and National Survey on Drug Use and Health.
6 Reliability of a measure refers to how well its results are reproduced in repeated measurements, or how consistent the results are when they are measured the same way (and the characteristic being measured has not changed). For example, if we administer the Stanford–Binet IQ test again a week later, will its results be the same if there has been no change in intellectual abilities of the children (and no training has been administered in between the two measurements)? If the answer is yes, the test is reliable.
7 Validity refers to whether the instrument measures what it is designed to measure. For example, if the Stanford–Binet IQ test is designed to measure innate intelligence while it actually measures a combination of innate intelligence and the quality of education received by the child, the test is not a valid measure of innate intelligence, even if the test is a reliable measure.
8 Steffensmeier, D., Schwartz, J., Slepicka, J., & Zhong, H. (2023). Twenty-first century trends in girls’ violence and the gender gap: Triangulated findings from official and unofficial longitudinal sources. Journal of Interpersonal Violence, [online first]. https://doi.org/10.1177/08862605231169733
In qualitative studies, researchers are also likely to use triangulation of data sources, or multiple measures of a single phenomenon, for several reasons. First, qualitative researchers strive to conduct research that is intensive and yields highly detailed results (often in the form of themes supported by verbal descriptions – as opposed to numbers). The use of multiple measures helps qualitative researchers probe more intensively from different points of view. In addition, qualitative researchers tend to view their research as exploratory.⁹ When conducting exploratory research, it is difficult to know which type of measure for a particular variable is likely to be most useful; thus, it would make sense to use several ways of measuring or observing the same phenomenon if possible. Finally, qualitative researchers see the use of multiple measures as a way to check the validity of their results. In other words, if different measures of the same phenomenon yield highly consistent results, our confidence in those results increases.
It is not realistic to expect researchers to use multiple measures for all key variables. The measurement of some variables is so straightforward that it would be a poor use of a researcher’s time to measure them in several ways. For instance, when assessing the age of students participating in a study, most of the time it is sufficient to ask them to indicate it. If this variable is more important (for example, to ensure that nobody under the age of 18 is included), the researcher may use information about the students’ birth dates collected from the Registrar’s Office of the university. But in either case, it is unnecessary to use several sources of data on the participants’ age (unless the study specifically focuses on a research question such as: Which personality characteristics are associated with lying about one’s age?).
Moreover, it is not a common practice for researchers to use triangulation, and many are not even able to use multiple sources of measurement. For most studies, the likely rating for this question is N/A (not applicable).
___ Question 2b: Accurately Measuring Sensitive Matters: When delving into sensitive matters, is there reason to believe that accurate data were obtained?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Some issues are sensitive because they deal with illegal matters such as illicit substance use and gang violence. Other topics are sensitive because of societal taboos, such as those regarding certain forms of sexual behavior. Still others may be sensitive because of idiosyncratic personal views on privacy. For example, sexual orientation and income are sensitive issues for many individuals. Participants often decline to answer these questions or may not answer honestly. Thus, participants’ self-reports may sometimes lack validity, with answers distorted by social desirability¹⁰ or other biases. The authors in Example 8.2.2 discuss one possible approach for accurately measuring sensitive topics in self-reports: the use of ‘forgiving language.’
9 We have discussed the types of pure research (descriptive, exploratory, and explanatory) in Chapter 1.
10 Social desirability refers to the tendency of some respondents to provide answers that are considered socially desirable, i.e., make the respondent look good.
Example 8.2.2 – Charles & Dattalo (2018)¹¹ MEASURING SENSITIVE TOPICS IN SELF-REPORTS: USING FORGIVING LANGUAGE
Noting that the use of self-report by providers is inherently a problematic approach because of the high probability of measurement error due to social desirability bias, the language of the developed surveys’ items were purposefully varied.… For example, one item reads “I frequently refer to clients by diagnoses they have, not their name.” Alternatively, other items used forgiving language, an approach described above, thereby encouraging more truthful responses (Groves et al., 2009; Sudman & Bradburn, 1982), and avoiding defensiveness. For example: “In the past, I have occasionally made reference to a client using a diagnostic label they have, instead of their name.”
Another common technique for encouraging honest answers to sensitive questions is to collect the responses anonymously. This approach may work well in online surveys and paper-and-pencil surveys completed in large groups, but it is usually impossible to provide anonymity for some types of data collection, such as in-person or phone interviews, or direct physical observation. Anonymity is also not an option for longitudinal studies, where researchers must connect each person’s responses from one year to the next (or among several waves of data collection). The most a researcher might be able to do is ensure confidentiality. Such assurance is likely to work best if the participants already know and trust the interviewer (such as a school counselor), or if the researcher has spent enough time with the participants to develop rapport and trust. The latter is more likely to occur in qualitative research than in quantitative research because qualitative researchers often spend substantial amounts of time interacting with their participants in an effort to bond with them.
Another technique for increasing the likelihood of honest answers about sensitive matters in a questionnaire, for example, when measuring involvement in illegal activities like shoplifting or drug use, is to ask questions about innocuous activities first and then proceed to questions about sensitive topics. This is called a foot-in-the-door technique in marketing and advertising. Also, it helps to ask how often respondents think their peers do “so and so” before asking the respondents themselves whether they do “so and so.”¹²
11 Charles, J. L. K., & Dattalo, P. V. (2018). Minimizing social desirability bias in measuring sensitive topics: The use of forgiving language in item development. Journal of Social Service Research, 44(4), 587–599. https://doi.org/10.1080/01488376.2018.1479335
12 In fact, research shows that when people are asked about the illegal activities of their peers (especially the type of activities about which direct knowledge is limited), they often project their own behavior onto their peers. For more, see Haynie, D. L., & Osgood, D. W. (2005). Reconsidering peers and delinquency: How do peers matter? Social Forces, 84(2), 1109–1130. https://doi.org/10.1353/sof.2006.0018
___ Question 2c: Reducing Undue Influences: Have steps been taken to prevent the measures from influencing any overt behaviors that were observed?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: If participants know that they are being directly observed, they may temporarily change their behavior.¹³ Clearly, this is likely to happen in the study of highly sensitive behaviors, but it can also affect data collection on other matters. For instance, some students may show their best behavior if they come to class to find a newly installed video camera scanning the classroom (to gather research data). Other students may show off by acting up in the presence of a camera. One possible solution would be to make surreptitious observations, such as with a hidden video camera or one-way mirror. However, in most circumstances, these techniques pose serious ethical and legal problems. Another, more realistic solution is to make observational procedures a routine part of the research setting. For instance, if it is routine for a classroom to be visited frequently by outsiders (e.g., parents, school staff, and university observers), the presence of a researcher may be unlikely to affect student behavior.
___ Question 2d: Interrater Reliability: If the collection and coding of observations involves subjectivity, is there evidence of interrater (or inter-observer) reliability?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Suppose a researcher observes groups of adolescent females interacting in various public settings, such as shopping malls, in order to collect data on aggressive behavior. Identifying certain aggressive behaviors may require considerable subjectivity. If an adolescent puffs out her chest, is this a threatening behavior or merely a manifestation of a big sigh of relief? Is a scowl a sign of aggression or merely an expression of unhappiness? Answering these questions sometimes requires considerable subjectivity. An important technique for addressing this issue is to have two or more independent observers make observations of the same participants simultaneously (or watch the recordings and rate them on a scale). If the rate of agreement on the identification and classification of the behavior is reasonably high (say, 80% or more), a consumer of research will be assured that the resulting data are not idiosyncratic to one particular observer and their powers of observation and possible biases.
13 This is referred to as the Hawthorne effect, though a recent reanalysis of the original study has debunked it. Modern efforts to replicate the effect have been inconclusive.
The rate of agreement is referred to as the interrater reliability (also called intercoder or interobserver reliability). When the observations are reduced to scores for each participant, the scores based on two independent raters’ observations can be expressed as an interrater reliability coefficient or intraclass correlation coefficient (ICC). These coefficients can range from 0.00 to 1.00, with values of about 0.70 or higher indicating adequate interobserver reliability.¹⁴ In Example 8.2.3, the researchers reported the rates of agreement between the coders with ICCs of .85, .95, and .98. Note that to achieve such high rates of agreement, the researchers first trained the coders extensively.
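As a rough illustration of the two ideas above, the following Python sketch computes a simple rate of agreement for categorical codes and a correlation between two raters’ numeric scores. All data are made up for illustration. The Pearson correlation is used here only as a simplified stand-in for a reliability coefficient; the ICC reported in Example 8.2.3 is calculated differently and usually requires statistical software.

```python
from math import sqrt


def percent_agreement(codes_a, codes_b):
    """Percentage of observations that two observers coded identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both observers must code the same, non-empty set of observations")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)


def pearson_r(x, y):
    """Pearson correlation between two raters' scores (a stand-in, not an ICC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical codes assigned by two observers to the same ten behaviors.
observer_1 = ["aggressive", "neutral", "neutral", "aggressive", "neutral",
              "aggressive", "neutral", "neutral", "aggressive", "neutral"]
observer_2 = ["aggressive", "neutral", "aggressive", "aggressive", "neutral",
              "aggressive", "neutral", "neutral", "neutral", "neutral"]
print(percent_agreement(observer_1, observer_2))  # 80.0

# Hypothetical scores given by two raters to the same five participants.
rater_1 = [3, 5, 2, 8, 6]
rater_2 = [4, 5, 2, 7, 6]
print(round(pearson_r(rater_1, rater_2), 2))  # 0.97
```

In this made-up case, the observers agree on 8 of 10 behaviors, which just meets the 80% rule of thumb mentioned above, and the two raters’ scores are strongly correlated.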
Example 8.2.3 – Eubanks et al. (2019)¹⁵ DISCUSSION OF CODER TRAINING AND INTERRATER RELIABILITY
All patients in this sample provided informed consent to participate in the study and receive 30 weekly sessions of CBT. Patients paid a small fee for each therapy session on an income-based sliding scale in order to approximate a naturalistic treatment setting. All therapy sessions were videotaped. ... Each session was coded by a pair of coders, drawn from a pool of six graduate students in clinical psychology who were trained in the use of the measure. Coders rated sessions independently, and their scores were averaged. All coders received at least 20 hours of training with the first author of the coding manual, and also engaged in practice coding of therapy sessions not included in this study. Coders were blind to study hypotheses and termination status (drop or completer). ... 3RS [Rupture Resolution Rating System] Interrater reliability. Interrater reliability on the 3RS was generally high: for the overall frequency of withdrawal markers reliability was ICC (1, 2) = .85; for confrontation markers ICC (1, 2) = .98; for resolution strategies ICC (1, 2) = .95.
3 RELIABILITY AND VALIDITY OF THE MEASURES
___ Question 3a: Internal Consistency: If a measure is designed to measure a single unitary trait, does it have adequate internal consistency?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
14 Mathematically, these coefficients are the same as correlation coefficients, which are covered in all standard introductory statistics courses. Correlation coefficients can range from –1.00 to 1.00, with a value of 0.00 indicating no relationship. In practice, however, negatives are not found in reliability studies. Values near 1.00 indicate a high rate of agreement.
15 Eubanks, C. F., Lubitz, J., Muran, J. C., & Safran, J. D. (2019). Rupture resolution rating system (3RS): Development and validation. Psychotherapy Research, 29(3), 306–319. https://doi.org/10.1080/10503307.2018.1552034
Comment: A test of computational skills in mathematics at the primary grade level measures a relatively homogeneous trait. However, a mathematics battery that measures verbal problem-solving and mathematical reasoning, in addition to computational skills, measures a more heterogeneous trait. Likewise, a self-report questionnaire on depression measures a much more homogeneous trait than a questionnaire on overall mental health. For measures designed to measure homogeneous traits, it is important to ask whether they are internally consistent (i.e., to what extent do the items or questions within the measure yield results that are consistent with each other?). Although it is beyond the scope of this book to explain how and why it works, a statistic known as Cronbach’s alpha (whose symbol is α) provides a statistical measure of internal consistency.16 As a special type of correlation coefficient, it ranges from 0.00 to 1.00, with values of approximately 0.70 or above indicating acceptable internal consistency and values above 0.90 indicating excellent consistency. Values below 0.70 suggest that the measure is tapping more than one trait, which is undesirable when a researcher wants to measure only one homogeneous trait. In Example 8.3.1, the value of Cronbach’s alpha is above the cutoff point of 0.70.
Example 8.3.1 – Zimmerman et al. (2015) 17 STATEMENT REGARDING INTERNAL CONSISTENCY USING CRONBACH’S ALPHA
We employed the widely used Grasmick et al. (1993) scale to measure self-control attitudinally. Respondents answered 24 questions addressing the six characteristics of self-control (i.e. impulsive, risk seeking, physical, present oriented, self-centered, and simple minded). Response categories were adjusted so that higher values represent higher levels of self-control. The items were averaged and then standardized. Consistent with the behavioral measure of self-control, sample respondents reported a slightly higher than average level of attitudinal self-control (3.3 on the unstandardized scale ranging from 1.3 to 4.6). The scale exhibits good internal reliability (α = .82).

Internal consistency (also called internal reliability) is usually regarded as an issue only when a measure is designed to measure a single homogeneous trait and yields numerical scores (as opposed to qualitative measures used to identify patterns described in words). If a measure does not meet these two criteria, “not applicable” is an appropriate answer to this evaluation question.
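For readers curious about what lies behind a reported α, the following is a minimal sketch of the standard formula for Cronbach’s alpha: the number of items, divided by that number minus one, multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The function name and the response data are hypothetical and serve only to illustrate the calculation.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of item responses.

    responses: one row per respondent, one column per item
    (hypothetical layout; all rows must have the same length).
    """
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # regroup scores by item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up answers of five respondents to a three-item scale.
answers = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(answers)
print(f"alpha = {alpha:.2f}; meets the 0.70 guideline: {alpha >= 0.70}")
```

When respondents who score high on one item also tend to score high on the others, the variance of the total scores is large relative to the summed item variances, and alpha approaches 1.00.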
16 Split-half reliability also measures internal consistency, but Cronbach’s alpha is widely considered a superior measure. Hence, split-half reliability is seldom reported.
17 Zimmerman, G. M., Botchkovar, E. V., Antonaccio, O., & Hughes, L. A. (2015). Low self-control in “bad” neighborhoods: Assessing the role of context on the relationship between self-control and crime. Justice Quarterly, 32(1), 56–84. https://doi.org/10.1080/07418825.2012.737472
___ Question 3b: Temporal Stability: For stable traits, is there evidence of temporal stability?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Suppose a researcher wants to measure aptitude (i.e., potential) for learning algebra. Such an aptitude is widely regarded as being stable. In other words, it is unlikely to fluctuate much from one week to another. Hence, a test of such an aptitude should yield results that are stable across at least short periods of time. For instance, if a student’s score on such a test administered this week indicates that he or she has very little aptitude for learning algebra, the test should yield a similar assessment if administered to the same student next week. Likewise, in the area of personality measurement, most measures should yield results that have temporal stability (i.e., are stable over time). For instance, a researcher would expect that a student who scored very high on a measure of self-control one week would also score high the following week because self-control is unlikely to fluctuate much over short periods of time.

The most straightforward approach to assessing temporal stability (or stability of the measurements over time) is to administer a measure to the same group of participants twice, typically a couple of weeks apart. The two sets of scores can be correlated, and if a correlation coefficient (whose symbol is r) of about 0.70 or more (on a scale from 0.00 to 1.00) is obtained, there is evidence of temporal stability. This type of reliability is commonly known as test–retest reliability. It is usually examined only for tests or scales that yield scores (as opposed to open-ended interviews that yield meanings and ideas derived from responses).

In Example 8.3.2, researchers describe how they established the test–retest reliability of a measure. Note that they report values above the suggested cutoff point of 0.70 for middle-aged adults and a less optimal range of r values for older adults. The authors also use the symbol r when discussing their results.
Example 8.3.2 – Iwasa & Yoshida (2018) 18 STATEMENT REGARDING TEMPORAL STABILITY (TEST-RETEST RELIABILITY) ESTABLISHED BY THE RESEARCHERS
To conduct another survey for test–retest reliability purposes, the company again emailed those who participated in survey 1 with an invitation to and link for the web survey two weeks after the Survey 1 (Survey 2). All told, 794 participants responded to the second round of the survey (re-response proportion: 90.0%).… The correlation coefficients between TIPI-J [Ten-Item Personality Inventory, Japanese version] scores at the two time points were 0.74–0.84 (middle-aged individuals) and 0.67–0.79 (older individuals).… These results are consistent with previous studies: Oshio et al. (2012) reported test–retest reliability of the TIPI-J among undergraduates as ranging from r = 0.64 (Conscientiousness) to r = 0.86 (Extraversion), and Gosling et al. (2003) reported values ranging from 0.62 to 0.77. As a whole, these findings indicate the almost acceptable reliability of the TIPI-J.

18 Iwasa, H., & Yoshida, Y. (2018). Psychometric evaluation of the Japanese Version of Ten Item Personality Inventory (TIPI-J) among middle-aged and elderly adults: Concurrent validity, internal consistency and test–retest reliability. Cogent Psychology, 5(1), 1426256. https://doi.org/10.1080/23311908.2018.1426256
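The test–retest procedure discussed under Question 3b (administer the measure twice, then correlate the two sets of scores) can be sketched in a few lines. This is only an illustration: the scores are invented, the function names are hypothetical, and the 0.70 cutoff is the rule of thumb suggested in this chapter rather than a universal standard.

```python
from math import sqrt
from statistics import mean


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    mean_first, mean_second = mean(first), mean(second)
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    return cross / sqrt(ss_first * ss_second)


def has_temporal_stability(first, second, cutoff=0.70):
    """Apply the chapter's rule of thumb to a test-retest correlation."""
    return pearson_r(first, second) >= cutoff


# Made-up scores of the same five participants, two weeks apart.
week_1 = [10, 12, 15, 18, 20]
week_3 = [11, 12, 14, 19, 19]
r = pearson_r(week_1, week_3)
print(f"r = {r:.2f}; evidence of temporal stability: {has_temporal_stability(week_1, week_3)}")
```

The scores in each list must be in the same participant order, because the correlation pairs each participant’s first score with his or her second score.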
___ Question 3c: Content Validity: When appropriate, is there evidence of content validity?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: An important issue in the evaluation of achievement tests is the extent to which the contents of the tests are suitable in light of the research purpose. For instance, if a researcher has used an achievement test to study the extent to which second-graders in a school district have achieved the expected skills at this grade level, a consumer of the research will want to know whether the contents of the test are aligned with (or match) the contents of the second-grade curriculum. While content validity is most closely associated with the measurement of achievement, it is also sometimes used as a construct for evaluating other types of measures. Content validity means that the contents of a measure (its items or questions) adequately cover what the measure is supposed to measure. For instance, in Example 8.3.3, the researchers had the contents of the measure of depression evaluated by experts.
Example 8.3.3 – Li et al. (2011) 19 A MEASURE SUBJECTED TO CONTENT VALIDATION BY EXPERTS
To test content validity, the C-PDSS [Chinese Version of the Postpartum Depression Screening Scale] was submitted to a panel consisting of six experts from different fields, including a psychology professor, a clinician from a psychiatric clinic, a senior nurse in psychiatric and mental health nursing, a university professor in obstetric nursing, and two obstetricians from two regional public hospitals. The rating of each item was based on two criteria: (a) the applicability of the content (applicability of expression and content to the local culture and the research object) and (b) the clarity of phrasing.
19 Li, L., Liu, F., Zhang, H., Wang, L., & Chen, X. (2011). Chinese version of the Postpartum Depression Screening Scale: Translation and validation. Nursing Research, 60(4), 231–239. https://doi.org/10.1097/NNR.0b013e3182227a72
___ Question 3d: Empirical Validity: When appropriate, is there evidence of empirical validity?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Empirical validity refers to the validity established by collecting data using a measure in order to determine the extent to which the data make sense. Contrast it with face validity, which is a subjective assessment of whether the measure seems like it measures what it is supposed to measure, based on one’s understanding of the underlying concept and logic (or common sense). For instance, a depression scale might have face validity if a consumer of research decides that it seems to measure depression well. To be empirically validated, the depression scale could, for instance, be administered to an institutionalized, clinically depressed group of adult patients as well as to a random sample of adult patients visiting family physicians for annual checkups. A researcher would expect the scores of the two groups to differ substantially in the predicted direction (i.e., the institutionalized sample should have higher depression scores). If the scores did not differ significantly between the two groups, the scale would have low empirical validity (and thus, questionable value).

Empirical validity comes in many forms, but its full exploration is beyond the scope of this book. Several key terms that suggest that empirical validity has been explored are predictive validity, concurrent validity, criterion-related validity, convergent validity, discriminant validity, construct validity, and factor analysis. When researchers describe empirical validity, they usually do so in a way that is fairly comprehensible to individuals with limited training in tests and measurements. In Example 8.3.4, researchers briefly describe the empirical validity of the measure used in their research.
Example 8.3.4 – Diotaiuti et al. (2021) 20 STATEMENT REGARDING EMPIRICAL VALIDITY OF A MEASURE
This first Italian validation study of the Temporal Focus Scale (TFS) has shown a reliable measurement to assess the tendency of individuals to characteristically think about different periods of their lives.… Convergent validity assessment confirmed predictive indications with variables such as life satisfaction, optimistic/pessimistic orientation, perceived general self-efficacy, self-regulatory modes, anxiety, depression.
20 Diotaiuti, P., Valente, G., & Mancone, S. (2021). Validation study of the Italian version of Temporal Focus Scale: Psychometric properties and convergent validity. BMC Psychology, 9, 19. https://doi.org/10.1186/s40359-020-00510-5
Often, the information on validity is exceptionally brief. Note that it is traditional for researchers to address empirical validity only for measures that yield scores, as opposed to qualitative measures such as semi-structured, open-ended interviews.
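The two-group check described in the comment on Question 3d (a clinically depressed group versus a general sample of patients) can be sketched as follows. The sketch reports the mean difference, Welch’s t statistic, and Cohen’s d; choosing these particular statistics is an assumption made for illustration, and the scores and the function name are invented.

```python
from math import sqrt
from statistics import mean, variance


def known_groups_comparison(clinical, community):
    """Compare scale scores of two groups expected to differ."""
    mean_1, mean_2 = mean(clinical), mean(community)
    var_1, var_2 = variance(clinical), variance(community)
    n_1, n_2 = len(clinical), len(community)

    # Welch's t does not assume equal variances in the two groups.
    welch_t = (mean_1 - mean_2) / sqrt(var_1 / n_1 + var_2 / n_2)
    # Cohen's d expresses the difference in pooled standard deviations.
    pooled_sd = sqrt(((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2))
    return {
        "mean_difference": mean_1 - mean_2,
        "welch_t": welch_t,
        "cohens_d": (mean_1 - mean_2) / pooled_sd,
    }


# Made-up depression scores (higher = more depressed).
clinical_scores = [30, 28, 35, 32, 31]
community_scores = [12, 15, 10, 14, 9]
result = known_groups_comparison(clinical_scores, community_scores)
print(result)
```

A positive and large mean difference here would be in the predicted direction; a difference near zero would cast doubt on the scale’s empirical validity. A full analysis would also convert the t statistic to a p-value, which is omitted from this sketch.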
___ Question 4: Limitations Discussed: Do the researchers discuss the obvious limitations of their measures?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: By discussing the limitations of their measures, researchers help consumers of research understand the extent to which the data that shaped the results can be trusted. In Evaluation Question 2b above, we discussed how the limitations of using self-reports may affect research results. In Example 8.4.1 that follows, researchers discuss other possible limitations.
Example 8.4.1 – Williams & Chapman (2011) 21 STATEMENT ACKNOWLEDGING A WEAKNESS IN MEASURES
With regard to measurement, it should be noted that the history of victimization measure was limited by a one-year historical time frame. This time frame might have excluded youths who were still experiencing the traumatic effects of victimizing events that occurred over a year before their completion of the survey. The victimization measure was also limited in that it did not include a measure of sexual victimization for male youths.

21 Williams, K. A., & Chapman, M. V. (2011). Comparing health and mental health needs, service use, and barriers to services among sexual minority youths and their peers. Health & Social Work, 36(3), 197–206. https://doi.org/10.1093/hsw/36.3.197

To answer this evaluation question, it is important to look for the researchers’ own discussion of the limitations of their measures (usually in the Discussion section of the article).
___ Question 5: Overall: Are the measures adequate?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: The amount of information about the measures used in research that is reported in academic journals is often quite limited. The provision of references to obtain additional information helps overcome this problem. If a researcher provides too little information for a consumer of research to make an informed judgment about the measures used in the study and/or does not provide references where additional information can be obtained, the consumer should give it a low rating on this evaluation question or respond that there is insufficient information (I/I). Even if enough information or additional references about the measures are provided, rate this evaluation question, taking into account your answers to the previous questions in this chapter, as well as any additional considerations you may have about the measures.

Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
Chapter 8 Exercises
Part A
Directions: Answer the following questions.
1 Name two or three issues that some participants might regard as sensitive and, hence, are difficult to measure. Answer this question with examples that are not mentioned in this chapter. (See the discussion of Evaluation Question 2b.)
2 Have you ever changed your behavior because you knew (or thought) you were being observed? If yes, briefly describe how or why you were being observed and what behavior(s) you changed. (See Evaluation Question 2c and online resources for this chapter.)
3 According to this chapter, what is a reasonably high rate of agreement when two or more independent observers classify behavior (i.e., of interrater reliability)?
4 For which of the following would it be more important to consider internal consistency using Cronbach’s alpha? Explain your answer.
A For a single test of mathematics ability for first graders that yields a single score.
B For a single test of reading and mathematics abilities for first graders that yields a single score.
5 Suppose a researcher obtained a test–retest reliability coefficient of 0.86. According to this chapter, does this indicate adequate temporal stability? Explain.
6 Which type of validity is mentioned in this chapter as being an especially important issue in the evaluation of achievement tests?
Part B Directions: Locate two research reports of interest to you in academic journals. Evaluate the descriptions of the measures in light of the evaluation questions in this chapter, taking into account any other considerations and concerns you may have. Select the one to which you gave the highest overall rating, and bring it to class for discussion. Be prepared to discuss both its strengths and weaknesses.
CHAPTER 9
Evaluating Experimental Procedures
An experiment is a study in which treatments are given in order to determine their effects. For instance, one group of students might be trained to use conflict-resolution techniques (the experimental group), while the control group is not given any training (a condition often called treatment as usual). Then, the students in both groups are observed on the playground to determine whether the experimental group uses more conflict-resolution techniques than the control group. The treatments (i.e., training versus no training) constitute what are known as the independent variables.1 The resulting behavior on the playground constitutes the dependent variable.2

Any study in which even a single treatment is given to only a single participant is an experiment as long as the purpose of the study is to determine the effects of the treatment on another variable (some sort of outcome). A study that does not meet this minimal condition is not an experiment. Thus, for instance, a survey in which questions are asked but no treatment is given is not an experiment and should not be referred to as such.

The following evaluation questions cover the basic guidelines for evaluating experiments (as opposed to quasi-experiments, which we will discuss later in the chapter).
1 COMPOSITION OF THE EXPERIMENTAL AND CONTROL GROUPS
This is the first and most important set of considerations for any experiment: How similar are the groups? The most dependable way to make the groups similar is random assignment to the groups.
___ Question 1a: Random Assignment: If two or more groups were compared, were the participants assigned at random to the groups?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I3

1 These are sometimes called the stimuli or input variables. In economics, these variables are called exogenous.
2 These are sometimes called the output (or outcome) or response variable. In economics, these variables are called endogenous.
3 Continuing with the same grading scheme as in the previous chapters, N/A stands for “Not applicable” and I/I stands for “Insufficient information to make a judgment.”
DOI: 10.4324/9781003362661-9
Comment: Assigning participants at random to groups guarantees that there is no bias in the assignment, so the groups are similar on average4 or “reasonably well balanced on all baseline variables, whether measured, unmeasured, or unknown” (Andrade, 2022).5 For instance, random assignment to two groups in the experiment on conflict-resolution training (mentioned above) would provide assurance that there is no bias, such as systematically assigning less aggressive children to the experimental group. Random assignment is a key feature of a true experiment, also known as a randomized controlled trial (RCT). Note that it is not safe to assume the assignment was random unless a researcher explicitly states that it was.6 Example 9.1.1 illustrates how this can be stated in reports of three different experiments.
Example 9.1.1 THREE EXPERIMENTS WITH RANDOM ASSIGNMENT
Experiment 1 (based on Avery et al., 2022)7: 508 students at the University of Oxford were randomly assigned to two conditions of a three-week experiment: (1) wearing a Fitbit wrist device to track their sleep and physical activity and getting paid for meeting their bedtime and sleep duration targets that they set for themselves (experimental group) or (2) wearing a Fitbit wrist device to track their sleep and physical activity with no additional interventions (control group).
Experiment 2 (roughly based on Rogers et al., 2019)8: 118 children 14–24 months of age with autism spectrum disorders were randomly assigned to two groups: three months of weekly parent coaching sessions in intensive play routines (experimental group) and providing reading and video material to parents with no coaching sessions involved (control group).
Experiment 3 (based on Atkin-Plunk, 2023)9: 175 people returning from incarceration in Palm Beach County, FL, were randomly assigned to either a transitional employment program (TEP) helping with job allocation and training (experimental group, N = 90) or a waitlist for TEP (control group, N = 85).

4 Similar on average means that there is no direct correspondence of each participant from Group A to another similar participant from Group B. But for any important characteristics that can affect outcomes, the average of Group A on this characteristic is similar to the average of Group B.
5 Andrade, C. (2022). Intent-to-treat (ITT) vs completer or per-protocol analysis in randomized controlled trials. Indian Journal of Psychological Medicine, 44(4), 416–418. https://doi.org/10.1177/02537176221101996
6 Since true experiments (the ones with random assignment) are the strongest research design to establish a cause-and-effect relationship, researchers are very unlikely to omit this crucial feature of their study.
7 Avery, M., Giuntella, O., & Jiao, P. (2022). Why don’t we sleep enough? A field experiment among college students. The Review of Economics and Statistics. https://doi.org/10.1162/rest_a_01242
8 Rogers, S. J., Estes, A., Lord, C., Munson, J., Rocha, M., Winter, J., ... & Talbott, M. (2019). A multisite randomized controlled two-phase trial of the Early Start Denver Model compared to treatment as usual. Journal of the American Academy of Child & Adolescent Psychiatry, 58(9), 853–865. https://doi.org/10.1016/j.jaac.2019.01.004
9 Atkin-Plunk, C. A. (2023). Examining the effects of a transitional employment program for formerly incarcerated people on employment and recidivism: A randomized controlled trial during COVID-19. Journal of Experimental Criminology, [online first]. https://doi.org/10.1007/s11292-023-09578-6

Note that assigning individuals to treatments at random is vastly superior to assigning previously existing groups to treatments at random. For instance, in educational research, it is not uncommon to assign one class to an experimental treatment and have another class serve as the control group. As students are not ordinarily randomly assigned to classes, there may be systematic differences between the students in the two classes. For instance, one class may have more highly motivated students or more parental involvement. Thus, a consumer of research should not answer “yes” to this evaluation question unless individuals (or a large number of aggregate units) were assigned at random.

What are the aggregate units? This term may refer to police beats, neighborhoods, or schools. Some (rare) studies in which a large number of such aggregate units are randomly assigned to treatment and control conditions can be found in criminal justice research. Example 9.1.2 describes a random assignment of so-called crime “hot spots” (areas with a high frequency of violent crime) to different police intervention strategies in a field experiment (one that takes place in the natural environment rather than in a lab).
Example 9.1.2 – Taylor et al. (2011)10 A TRUE EXPERIMENT: A LARGE NUMBER OF AGGREGATE UNITS IS RANDOMLY ASSIGNED TO THE EXPERIMENTAL AND CONTROL GROUPS
Jacksonville is the largest city in Florida.… Like many large cities, Jacksonville has a violent crime problem. The number of violent crimes in Jacksonville has gone up from 2003 to 2008.… For this project, JSO [Jacksonville Sheriff’s Office] experimented with a more geographically focused approach to violence reduction that involved concentrating patrol and problem-solving efforts on well-defined “micro” hot spots of violence. [W]e took 83 violent hot spots and randomly assigned them to one of three conditions: 40 control hot spots, 21 saturation/directed patrol hot spots (we use this hybrid term to capture the fact that officers were directed to specific hot spots and that their extended presence at these small locations, which typically lasted for several hours at a time, amounted to a saturation of the areas), or 22 problem-oriented policing (POP) hot spots. Each of these three conditions was maintained for a 90-day period.
10 Taylor, B., Koper, C. S., & Woods, D. J. (2011). A randomized controlled trial of different policing strategies at hot spots of violent crime. Journal of Experimental Criminology, 7(2), 149–181. https://doi.org/10.1007/s11292-022-09541-x
Again, if the answer to this evaluation question is “yes,” the experiment being evaluated is a true experiment. Note that this term does not imply that the experiment is perfect in every respect. Instead, it indicates only that participants were assigned randomly to comparison groups to make the groups approximately similar. There are other important features to consider, including the size of the groups, which will be discussed next. (Also, the term true distinguishes this design from quasi-experiments that we will discuss later in the chapter.)
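To make the idea of groups that are “similar on average” more concrete, here is a minimal sketch of random assignment of individuals, loosely modeled on the conflict-resolution example. All names and data are hypothetical: 200 simulated children receive a made-up baseline aggression score, are shuffled, and are split into two groups whose baseline averages can then be compared.

```python
import random
from statistics import mean


def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two groups."""
    rng = random.Random(seed)     # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Simulated children with a made-up baseline aggression score.
data_rng = random.Random(42)
children = [{"id": i, "aggression": data_rng.gauss(50, 10)} for i in range(200)]

experimental, control = randomly_assign(children, seed=1)
print("Experimental group mean:", round(mean(c["aggression"] for c in experimental), 1))
print("Control group mean:", round(mean(c["aggression"] for c in control), 1))
```

Because chance alone decides who lands in which group, the two baseline averages will usually be close, and they tend to get closer as the groups get larger. No child in one group is matched to a particular child in the other.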
___ Question 1b: Random Sampling or Random Assignment? Has the researcher distinguished between random selection and random assignment?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: The desirability of using random selection to obtain samples from which researchers can confidently generalize to larger populations is discussed in Chapter 6. Such selection is highly desirable in most studies, whether they are experiments or not. Random assignment, on the other hand, refers to the process of assigning participants to the treatment and control conditions. Note that in any given experiment, selection may or may not be random (most often, convenience samples of volunteers are used). Similarly, group assignment may or may not be random.
Figure 9.1 An ideal combination of random selection (sampling) and random assignment (experiment)
Figure 9.1 illustrates an ideal situation, where first, there is random selection from a population of interest to obtain a sample. This is followed by random assignment to treatment conditions. When discussing the generalizability of the results of an experiment, the researcher should consider the sampling used. In other words, a properly selected sample (ideally, one selected at random) allows for more confidence in generalizing the results to a population.11 However, when discussing the comparability of the two groups, researchers should consider the type of assignment used. Random assignment of participants to the groups increases researchers’ confidence that the two groups were initially equal, permitting a valid comparison of the outcomes of the treatment and control conditions and increasing the internal validity of the experiment, or ability to isolate the causal relationship (discussed in more detail later in the chapter).
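The two-step ideal described above (random selection from a population, followed by random assignment to conditions) can be sketched as two separate operations. This is an illustration under simplifying assumptions: the population is a plain list of hypothetical ID numbers, and the function name is invented.

```python
import random


def select_then_assign(population, sample_size, seed=None):
    """Randomly select a sample, then randomly assign it to two conditions."""
    rng = random.Random(seed)

    # Step 1: random selection (supports generalizing to the population).
    sample = rng.sample(population, sample_size)

    # Step 2: random assignment (supports comparing the two groups).
    # random.sample already returns items in random order; the explicit
    # shuffle is kept only to show assignment as its own step.
    rng.shuffle(sample)
    half = sample_size // 2
    return {"treatment": sample[:half], "control": sample[half:]}


population_ids = list(range(1000))      # hypothetical population of 1,000 people
groups = select_then_assign(population_ids, sample_size=60, seed=7)
print(len(groups["treatment"]), len(groups["control"]))
```

In a study that uses a convenience sample of volunteers, Step 1 would be replaced by whoever signs up, while Step 2 could remain unchanged. That is why an experiment can support a valid comparison between its groups and still have limited generalizability.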
2 SIMILARITY OF TREATMENT AND CONTROL CONDITIONS FROM THE PARTICIPANTS’ PERSPECTIVE
The questions and guidelines below about equalizing the conditions of treatment among the groups apply both to true experiments (with random assignment to groups) and quasi-experiments (without random assignment to groups). Putting yourself into the shoes of a participant in the experiment you are evaluating would help you answer the evaluation questions below.

11 Recall the discussion about the Stanford Prison Experiment in Chapter 6 – it could be that its results are not generalizable due to the way the sample was selected (asking for volunteers for a “psychological study of prison life”), even though random assignment to “guards” and “prisoners” was used in the experiment. See more details about this in Appendix A1 online.
___ Question 2a: Similarity of Conditions: Except for differences in the treatments, were all other conditions the same in the experimental and control groups?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: The results of an experiment can be influenced by many variables other than the independent variable (treatment) and composition of the groups. For instance, if the experimental and control groups are treated at different times of the day or in different rooms in a building (where one room is noisy and the other is not), these factors might influence the outcome of an experiment. Researchers refer to such variables as confounding variables12 because they confound the interpretation of the results.

One especially striking illustration of such confounding comes from experiments testing the effects of surgery for a specific health condition. Example 9.2.1 below shows an experiment answering the following question: Is surgery the best treatment for osteoarthritis of the knee? If the patients were randomly assigned to either undergo surgery or complete a round of physical therapy, the results would be confounded by the patients’ knowledge of which treatment they had received. Thus, it would be difficult to say whether it was the surgery or the knowledge that one had the surgery that made them feel better. To remove this confounding variable, the researchers went to great lengths to equalize the experimental and control group conditions: patients were randomly assigned to either a real or a placebo surgery (sometimes also called sham surgery).
Example 9.2.1 A TRUE EXPERIMENT USING PLACEBO SURGERY IN A DOUBLE-BLIND PROCEDURE
Moseley et al. (2002)13: Many surgeries are conducted without clear evidence as to whether the procedure leads to a better quality of life for the patients compared to a non-surgical alternative. The team of researchers led by Dr. Moseley conducted a true experiment to determine whether arthroscopy of the knee for patients with osteoarthritis lowers their experience of pain and improves their ability to walk and climb stairs. A total of 180 patients were randomly assigned to a real surgery or a placebo surgery (in which the knee incision was made under anesthesia but no further surgical procedures were conducted). Neither the patients themselves nor the nurses assessing the results of the surgery (over a period of two years after the procedure) knew which patient was assigned to which group. In addition to a questionnaire based on patients’ subjective reports of knee pain, the nurses used objective measures of patients’ walking speed and ability to climb stairs. There were no differences between the real surgery group and placebo surgery group in any of the outcome measures.

12 In quasi-experimental designs, it is even harder to rule out confounders than in true experiments. For example, consider a study where a group of adults who experienced abuse or neglect as children has been matched ex post facto (after the fact) with a control group of adults of the same age, gender, race, and socioeconomic status who grew up in the same neighborhoods as the child maltreatment survivors. Then the researchers compare the two groups in terms of such outcomes as engaging in violence as adults. Let’s say the study has found that the control group of adults has far fewer arrests for violence than maltreatment survivors. How can we be sure that this difference in outcomes is a result of child maltreatment experiences? It is very likely that other important variables confound the intergenerational transmission of violence, for example, low socioeconomic status or genetics.
13 Moseley, J. B., O’Malley, K., Petersen, N. J., Menke, T. J., Brody, B. A., Kuykendall, D. H., … & Wray, N. P. (2002). A controlled trial of arthroscopic surgery for osteoarthritis of the knee. New England Journal of Medicine, 347(2), 81–88. https://doi.org/10.1056/NEJMoa013259
Thus, in the ground-breaking study by Moseley and his colleagues (2002) described in Example 9.2.1 above, no participating patient knew whether they had undergone real or simulated surgery. Admittedly, this was a much more involved experiment than randomizing patients into taking a drug or a placebo pill. However, it dramatically reduced an important confounding difference between the groups by essentially equalizing the subjective experiences of participants in the experimental and control groups.
___ Question 2b: Demand Characteristics: When appropriate, have the researchers considered the possibility of biasing participants toward what is expected of them?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: If participants have no knowledge of whether they are in the experimental or control group (as in the sham surgery experiments described above), such experiments are called blind or blinded (double-blind if the evaluators of the outcomes also do not know the participants’ group assignments). However, it is not always possible to conduct blind experiments. If participants know (or suspect) the purpose of the experiment, this knowledge may influence their responses. For instance, in a study on the effects of a film showing the negative consequences of drinking alcohol, the experimental-group participants might report more negative attitudes toward alcohol only because they suspect this is what the researcher wants. Participants try to give the researchers what they think the researchers expect. This is known as a demand characteristic. It is named this way because the phenomenon operates as though researchers are ‘subtly demanding’ a certain outcome. Certain types of measures (more subjective ones) are more prone to the effects of demand characteristics than others. We discuss the issue of objectivity in measuring outcomes in Evaluation Question 4b.
3 SUFFICIENT NUMBERS OF PARTICIPANTS
The questions and guidelines below about the number of participants in an experiment apply both to true experiments (with random assignment to groups) and quasi-experiments (without random assignment to groups).
___ Question 3a: Large Group Size: If two or more groups were compared, were there enough participants (or aggregate units) per group?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Recall the rule of thumb mentioned in Chapter 6 for studies that involve comparisons among subgroups: each group or subgroup should have at least 30 individuals or aggregate units (if the groups are fairly homogeneous, that is, have similar participants). The same rule applies to the groups in an experiment. Consider Example 9.3.1, which illustrates this rule.
Example 9.3.1 PLANNING AN EXPERIMENT TO TEST INTERVENTIONS FOR DEPRESSION
The researchers aimed to test two different interventions for patients experiencing depressive symptoms at a subclinical level (not reaching the diagnosis of clinical depression):
■ a new antidepressant drug (Experimental Group 1),
■ exercise therapy (Experimental Group 2), and
■ waitlisted for one of the interventions (Control Group).
Ideally, patients should be randomly assigned to one of three groups. The researchers wanted to see which group experienced a reduced incidence of depression following the treatments, and whether the difference in outcomes between the experimental and control groups was statistically significant. To achieve this, each group ideally needs to have 30 or more patients.14
If we apply this rule of the minimum group size to aggregate units in the field experiment described in Example 9.1.2 earlier in the chapter, we need 30+ hot spots for each of the three conditions (saturation/direct patrol, problem-oriented policing, and control group). Unfortunately, the experiment had 21, 22, and 40 hot spots, respectively, so it would not get the highest rating on this evaluation question, but it comes fairly close to the required 30+ units per group. Obviously, we have to be realistic when assessing randomized field experiments: it is much harder to get 30+ aggregate units per group than 30+ individuals per group.
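To see how a group-size rule of thumb relates to statistical power, the relationship can be sketched in a few lines of Python. This is only an illustration and not part of any study discussed here: it assumes a comparison of just two groups on a continuous outcome, a standardized effect size (Cohen's d), a two-sided test, and a normal approximation to the t-test. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Assumption: normal approximation with equal group sizes; effect_size
    is a standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)

def approx_n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate group size needed to reach the requested power."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)

# Illustrative effect sizes only; real studies should justify their own.
for d in (0.5, 0.8):
    print(f"d = {d}: power with 30 per group is about {approx_power(d, 30):.2f}; "
          f"about {approx_n_per_group(d)} per group needed for power 0.80")
```

Under these assumptions, groups of 30 approach a power of 0.80 only when the standardized difference is fairly large, and smaller expected effects call for larger groups. This is one reason to treat 30 per group as a minimum rather than a target.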
___ Question 3b: Attrition: Has the researcher considered participant dropout?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: The phenomenon of individuals dropping out of a study is referred to as attrition (sometimes called experimental mortality). In Chapter 6, we have already mentioned this regarding longitudinal studies. Attrition can be particularly problematic in experiments. The longer and more complex the experimental treatment, the more likely it is that some participants would drop out. This can affect the generalizability of the results because they would apply only to the types of individuals who continued their participation. For researchers conducting experiments, differential attrition can be an important source of confounding (also referred to as attrition bias). Differential attrition refers to the possibility that those who drop out of an experimental condition are systematically different from those who drop out of a control condition. For instance, in an experiment on a weight-loss program, those in the experimental group who were discouraged by failing to lose weight may have dropped out. Thus, those who remained in the experimental condition were more successful in losing weight, leading to an overestimation of the beneficial effects of the weight-loss program. Researchers usually cannot physically prevent attrition (according to modern research ethics standards, participants should be free to withdraw from the study, and it should be clearly communicated to them in the informed consent form). However, researchers can compare the characteristics of those who dropped out with those who remained in the study in an effort to determine whether these two groups are similar in important ways. Example 9.3.2 shows a portion of a statement dealing with this matter.
14 This would give the researchers a good chance of detecting a moderate-to-large effect size with a power (the probability of detecting a statistically significant effect, given that the effect exists) of about 0.80 and an alpha level (the probability of making a Type I error) of 0.05. Detecting smaller effects with the same power requires larger groups.
Example 9.3.2 – Moore and Cocas (2006)15 DESCRIPTION OF AN ATTRITION ANALYSIS
The participant attrition rate in this study raised the concern that the participants successfully completing the procedure were different from those who did not in some important way that would render the results less generalizable. Thus, an attrition analysis was undertaken to determine which, if any, of a variety of participant variables could account for participant attrition. Participant variables analyzed included ages of the participants and the parents, birth order and weight, socioeconomic status, duration of prenatal health care, prenatal risk factor exposure, hours spent weekly in day care, parental ratings of quality of infant’s previous night’s sleep, and time elapsed since last feeding and diaper change. This analysis revealed two effects: On average, participants who completed the procedure had been fed more recently than those who did not complete the procedure …, and those who completed the procedure were slightly younger (153.5 days) than those who did not (156 days).
An alternative approach to dealing with differential attrition, which is becoming a modern standard for analyzing the outcomes of experiments, is an intent-to-treat (ITT) analysis, in which treatment dropouts are included in the calculations along with the participants who have completed the treatment. This approach is very conservative because it is less likely to find statistically significant treatment effects (since the dropouts are less likely to exhibit positive treatment outcomes). However, if a treatment is found to have a statistically significant impact using ITT analysis, we can be much more confident in the actual effectiveness of the treatment.
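The contrast between analyzing only those who completed a treatment and an intent-to-treat analysis can be illustrated with a small sketch. The numbers below are invented for illustration (loosely following the weight-loss example in the comment above), and the sketch assumes that outcomes were still measured for participants who dropped out. The variable and function names are hypothetical.

```python
from statistics import mean

# Hypothetical records: (weight change in kg, completed the program?)
treatment = [(-6, True), (-5, True), (-7, True), (-4, True),
             (0, False), (1, False), (-1, False), (0, False)]
control = [(-1, True), (0, True), (-2, True), (1, True),
           (-1, True), (0, True), (-1, False), (0, False)]

def completers_only_effect(treated, controls):
    """Difference in mean change using only participants who completed."""
    t = mean(change for change, completed in treated if completed)
    c = mean(change for change, completed in controls if completed)
    return t - c

def intent_to_treat_effect(treated, controls):
    """Difference in mean change using everyone as originally assigned."""
    t = mean(change for change, _ in treated)
    c = mean(change for change, _ in controls)
    return t - c

print("Completers only:", completers_only_effect(treatment, control))
print("Intent-to-treat:", intent_to_treat_effect(treatment, control))
```

In this made-up data set, the completers-only estimate is more than twice as large as the intent-to-treat estimate, because the discouraged dropouts in the treatment group are excluded from the first calculation. A full analysis would also test whether either difference is statistically significant.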
4 HOW OUTCOMES OF THE EXPERIMENT WERE EVALUATED
Again, the next set of questions applies to both true and quasi-experiments.
___ Question 4a: Blind Evaluation of Outcomes: Were the effects of treatment evaluated by individuals who were not aware of the group assignment status?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: If a researcher is also the individual administering the treatment (or one of the treatment providers), it is very important that the outcomes are assessed by someone else – importantly, by a person who is not aware of the treatment group assignment. Even if the effects of the treatment or intervention are evaluated using fairly objective procedures and tests, the assessor’s knowledge of the group assignment status can inadvertently impact the assessments and bias the results.
It is considered the gold standard of experimentation to use what is called a double-blind procedure: (1) the participants are not aware of whether they are in a treatment or a control group, and (2) the individuals assessing the outcomes are not aware of the participants’ group assignment. For example, in the placebo surgery experiment described earlier, neither the nurses assessing the outcomes of surgery nor the patients themselves were aware of the type of surgery they had undergone, as illustrated in Example 9.4.1.
15 Moore, D. S., & Cocas, L. A. (2006). Perception precedes computation: Can familiarity preferences explain apparent calculation by human babies? Developmental Psychology, 42(4), 666–678. https://doi.org/10.1037/0012-1649.42.4.666
Example 9.4.1 – Moseley et al. (2002) 16 ASSESSMENT OF THE RESULTS OF EXPERIMENT USING DOUBLE-BLIND PROCEDURES
Study personnel who were unaware of the treatment group assignments performed all postoperative outcome assessments; the operating surgeon did not participate in any way. Data on end points were collected 2 weeks, 6 weeks, 3 months, 6 months, 12 months, 18 months, and 24 months after the procedure. To assess whether patients remained unaware of their treatment group assignment, they were asked at each follow-up visit to guess which procedure they had undergone. Patients in the placebo group were no more likely than patients in the other two groups to guess that they had undergone a placebo procedure.
___ Question 4b: Objective Measures of Outcomes: Were the effects of treatment evaluated using objective measures not amenable to the perceptual effects?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: When assessing changes in patients’ knee function after the placebo surgery experiment, the nurses used both subjective measures of pain (something like: “On a scale of 1 to 10 …”) and objective measures like the number of seconds it takes a patient to climb a flight of stairs. Which outcome measure do you think provides an unbiased picture of reality? As you can imagine, subjective measures, such as self-reports, are especially sensitive to biases such as social desirability and demand characteristics (discussed in Evaluation Question 2b). When interpreting the results obtained from self-reports, researchers should consider whether biases are likely to be in play. One way to overcome this difficulty is to supplement self-report measures with other measures if available, such as reports by friends or significant others.
16 Moseley, J. B., O’Malley, K., Petersen, N. J., Menke, T. J., Brody, B. A., Kuykendall, D. H., … & Wray, N. P. (2002). A controlled trial of arthroscopic surgery for osteoarthritis of the knee. New England Journal of Medicine, 347(2), 81–88. https://doi.org/10.1056/NEJMoa013259
It is important for consumers of research to consider how sensitive the outcome measures used in a study may be to subjective biases. For example, an achievement test is a more objective measure than self-report. Likewise, many physical or biological measures are less sensitive to subjective influences. In an experiment testing behavioral interventions for reducing cocaine use, for instance, a participant will not be able to alter the results of their blood tests for the presence of cocaine.
5 CONSIDERATIONS FOR QUASI-EXPERIMENTS
When researchers use pre-existing, non-randomly formed groups rather than assigning participants to the groups at random, this type of experiment is called a quasi-experiment. Quasi-experiments also allow researchers to take advantage of naturally occurring changes in policies or laws to compare the resulting changes in the outcome variables. For example, after a state-wide outdoor smoking ban, did the heart attack rate in the state decrease? If the experiment you are evaluating involves random assignment to groups, you can give N/A (not applicable) to the next three evaluation questions.
___ Question 5a: Comparison Group Similarity: If two or more comparison groups were not formed at random, is there evidence that they were initially equal in important ways?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Suppose a researcher wants to study the impact of a new third-grade reading program being used with all third graders in a school (the experimental group). For a comparison group, the researcher will have to use third graders in another school. The use of two intact, already formed groups is known as a non-equivalent group design (NEGD) quasi-experiment.17 Because students are not randomly assigned to schools, this experiment will get low scores on Evaluation Question 1a. However, if the researcher selects a comparison school in which students have standardized reading test scores similar to those in the experimental school and are similar in other important respects, such as parents’ socioeconomic status, the experiment may yield useful experimental evidence.
Note, however, that the similarity between groups is not as satisfactory as assigning participants to groups at random. For instance, the children in the two schools may differ in some important respects that the researcher has overlooked or about which the researcher has no information. Perhaps the children’s teachers in the experimental school were more experienced. Their experience in teaching, rather than the new reading program, might be the cause of the differences in reading achievement between the two groups.
When using two intact groups (such as classrooms), it is important to give both a pretest and a posttest to measure the dependent variable before and after the treatment. For instance, to evaluate the reading program, a researcher should give a pretest in reading in order to establish the baseline reading scores and to check whether the two intact groups are initially similar on the dependent variable. Of course, if the two groups are highly dissimilar, the results of the experiment will be difficult to interpret.18
Note that some pre-existing groups could have been formed at random. For example, if court cases are assigned to different judges at random, then the groups of cases ruled on by each judge can be expected to be approximately equal on average. That is, even if there is a lot of variation among such cases, each judge is supposed to get a group with a similar range of variations (if there is a sufficiently large number of cases in each group). Researchers could then wait a few years and compare the groups to see whether offenders are more likely to commit new crimes when their cases are decided by more punitive judges or by more lenient ones.19 Thus, although it was not the researchers who formed the groups using random assignment, this example represents a true experiment.
17 There are other types of quasi-experiments, besides the non-equivalent group design (NEGD). Some of the most popular among them are ex post facto designs, before-and-after natural experiments, interrupted time-series designs, regression discontinuity design (RDD), and a recently popular statistical approach of propensity score matching.
___ Question 5b: Alternating Treatments: If only a single group is used, have the treatments been alternated?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Not all quasi-experiments involve comparison of two or more groups. Consider, for instance, a teacher who wants to try using increased praise for appropriate behaviors in the classroom to see whether it reduces behaviors such as inappropriate out-of-seat behavior (IOSB). To conduct an experiment on this, the teacher could count the number of IOSBs for a week or two before administering the increased praise. This would yield what is called the baseline data. Suppose the teacher then introduces the extra praise and finds a decrease in the IOSBs. This might suggest that the extra praise caused the improvement.20 However, such a conclusion would be highly tenuous because children’s environments are constantly changing in many ways, and other environmental influences (such as the school principal scolding the students on the playground without the teacher’s knowledge) might be the real cause of the change. A more definitive test would be for the teacher to reverse the treatment and go back to giving less praise, then revert to the higher-praise condition again. If the data form an expected pattern, the teacher would have reasonable evidence that increased praise reduces IOSB.21
Notice that in the example being considered, the single group serves as the comparison group during the baseline, serves as the experimental group when the extra praise is initially given, then serves as the control group again when the condition is reversed, and finally serves as the experimental group again when the extra praise is reintroduced. Such a design has this strength: the same children with the same backgrounds are both the experimental and comparison groups. (In a two-group experiment, the children in one group may be different from the children in the other group in some important way that affects the outcome of the experiment.)
The major drawback of a single-group design is that the same children are being exposed to multiple treatments, which may lead to unnatural reactions. How does a child feel when some weeks he or she receives extra praise for appropriate behaviors but other weeks does not? Such reactions might confound the results of the experiment. Researchers refer to this problem as multiple-treatment interference.
If two pre-existing classes were available for the type of experiment being considered, the teacher could use what is called a multiple baseline design, in which the initial extra-praise condition is started on a different week for each group. If the pattern of decreased IOSB under the extra-praise condition holds up across both groups, the causal conclusion would be even stronger than when only one group was used.
18 If the groups are initially dissimilar, a researcher should consider locating another group that is more similar to serve as the control. If this is not possible, a statistical technique known as analysis of covariance can be used to adjust the post-test scores in light of the initial differences in pretest scores. Such a statistical adjustment can be risky if the assumptions underlying the test have been violated, a topic beyond the scope of this book.
19 In fact, the study that inspired this example has found that there is no statistically significant difference among the groups, even though there is a tendency of offenders to recidivate more if their cases happen to be assigned to more punitive judges: Green, D. P., & Winik, D. (2010). Using random judge assignments to estimate the effects of incarceration and probation on recidivism among drug offenders. Criminology, 48(2), 357–387. https://doi.org/10.1111/j.1745-9125.2010.00189.x
20 If the teacher stopped the experiment at that point, it would represent a basic before-and-after quasi-experiment.
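The "expected pattern" in the reversal design described in the comment above (baseline, extra praise, praise withdrawn, extra praise again) can be made concrete with a small sketch. The daily IOSB counts are invented, the function names are hypothetical, and comparing phase means is a deliberate simplification of how such data would be examined in practice.

```python
from statistics import mean

# Hypothetical daily counts of IOSB in each phase of the reversal design
phases = [
    ("baseline", [12, 11, 13, 12, 12]),
    ("extra praise", [7, 6, 8, 7, 7]),
    ("praise withdrawn", [11, 12, 10, 11, 11]),
    ("extra praise again", [6, 7, 5, 6, 6]),
]

def phase_means(phases):
    """Return (label, mean count) for each phase, in order."""
    return [(label, mean(counts)) for label, counts in phases]

def follows_reversal_pattern(phases):
    """True if the mean drops, rises, and drops again across the four phases."""
    a1, b1, a2, b2 = (m for _, m in phase_means(phases))
    return b1 < a1 and a2 > b1 and b2 < a2

for label, average in phase_means(phases):
    print(f"{label}: mean IOSB per day = {average}")
print("Expected reversal pattern:", follows_reversal_pattern(phases))
```

A steady decline across all four phases would fail this check, which is the point of the design: a change that continues regardless of whether praise is given or withdrawn points to some other influence.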
___ Question 5c: History Effect: For before-and-after quasi-experiments, were other relevant events considered?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Suppose researchers evaluated the effects of a program intended to increase student enrollments in a Russian Studies minor. The recruitment program took place as planned, from January to February 2022, and the results were assessed in March. When the researchers compared the enrollment of students into Russian Studies to the previous year’s enrollment numbers, the drop was staggering. Can this be attributed to the recruitment program’s negative effect or is it more likely that the war that Russia started against Ukraine in February 2022 was responsible for the change in enrollments?
Most natural experiments are before-and-after experiments in which data are collected on the dependent variable before and after an intervention that happens naturally (such as a policy change). For example, when gun control legislation is passed, it can serve as a natural intervention point for comparing firearm homicide rates before and after the law went into effect. The hardest part is to rule out the history effect, when other possible influences are happening at the same time as the intervention of interest and possibly affecting the outcomes. For instance, a gang turf war might break out in a big city at the same time as the gun control law goes into effect there.
21 This type of experimentation is often referred to as single-subject research.
6 TREATMENT DETAILS
As with many other types of research, “the devil is in the details” when it comes to how the treatments were administered and monitored. The following five evaluation questions apply to true experiments and to those quasi-experiments where a manipulation of the independent variable (or treatment) was involved.
___ Question 6a: Detailed Description: Are the treatments described in sufficient detail?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Because the purpose of an experiment is to estimate the effects of treatment on dependent variables, researchers should provide detailed descriptions of the treatment administered. If the treatments are complex, such as two types of therapy in clinical psychology, researchers should provide references to additional publications where detailed accounts can be found. The same rule applies to treatments used in previous studies, and references to previous research should be provided. At the same time, it is important to describe the key components of the interventions so that readers can understand the gist. In Example 9.6.1, the researchers begin by providing references for the program used and then describing it in some detail. Only a portion of their detailed description of the treatment is shown in the example.
Example 9.6.1 – Olweus et al. (2019) 22 EXCERPT SHOWING REFERENCES FOR MORE INFORMATION ON EXPERIMENTAL TREATMENT FOLLOWED BY A DETAILED DESCRIPTION (PARTIAL DESCRIPTION SHOWN HERE)
The Olweus Bullying Prevention Program (OBPP, Olweus 1991, 1993, Olweus and Limber 2010a, b), the oldest and most researched school-based bullying prevention program (National Academies 2016), was first developed and evaluated by Dan Olweus in Norway in the mid-1980s. Initially designed for students in elementary, middle, and junior high school grades, the goals of the OBPP are to reduce bullying among children and youth, prevent new bullying problems, and more generally, achieve better relations among peers at school (Olweus 1993; Olweus and Limber 2010b). To achieve these goals, school personnel focus on restructuring the school environment to reduce opportunities and rewards for bullying and on building a sense of community. The OBPP is built upon four key principles …
22 Olweus, D., Limber, S. P., & Breivik, K. (2019). Addressing specific forms of bullying: A large-scale evaluation of the Olweus bullying prevention program. International Journal of Bullying Prevention, 1, 70–84. https://doi.org/10.1007/s42380-019-00009-7
___ Question 6b: Quality Assurance: If the treatments were administered by individuals other than the researcher, were those individuals properly trained and monitored?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Researchers often rely on other individuals such as graduate assistants, teachers, and psychologists to administer the treatments they use in experiments. When this is the case, it is desirable for the researcher to ensure proper training. Otherwise, it is possible that the treatments were modified in some unknown way. Example 9.6.2 shows a statement regarding the supervision and training of therapists who administered the treatments in an experiment. Note that such statements are typically brief.
Example 9.6.2 – Butler et al. (2011) 23 EXCERPT ON TRAINING THOSE WHO ADMINISTERED THE TREATMENTS
For this study, the MST team comprised three therapists and a supervisor. Staff changed minimally during the trial and a total of five therapists delivered all the interventions. The therapists held master’s-level qualifications in counseling psychology or social work, and had a minimum of 2 years’ experience working with families. They received MST training as part of the study.
Even if those who administered the treatments were trained, they normally should be monitored. This is particularly true for long and complex treatment cycles. For instance, if psychologists try out new techniques with clients over a period of several months, they should be monitored by spot-checking their efforts to determine whether they are applying the techniques they learned in their training. This can be achieved by directly observing them or questioning them.
___ Question 6c: Person Effect: If each treatment group had a different person administering the treatment, did the researcher try to eliminate the person effect?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
23 Butler, S., Baruch, G., Hickey, N., & Fonagy, P. (2011). A randomized controlled trial of multisystemic therapy and a statutory therapeutic intervention for young offenders. Journal of the American Academy of Child & Adolescent Psychiatry, 50(12), 1220–1235. https://www.sciencedirect.com/science/article/abs/pii/S089085671100880X
Comment: Suppose that the purpose of an experiment is to compare the effectiveness of three methods for teaching decoding skills in first-grade reading instruction. If a different teacher uses each method, differences in the teachers (such as their ability to build rapport with students, level of enthusiasm, and ability to build effective relationships with parents) may cause observed differences in achievement. That is, the teachers’ personal characteristics, rather than their respective teaching methods, may have impacted the outcome. One solution to this problem is to have each of the three methods used by a large number of teachers, with the teachers assigned at random to the methods. If such a large-scale study is not possible, another solution is to have each teacher use all three methods. In other words, Teacher A could use Method X, Method Y, and Method Z at different points in time with different children. The other two teachers would do likewise. When the results are averaged, the personal effect of each teacher will have contributed to the average scores earned under each of the three methods. If this issue is not applicable to the experiment you are evaluating, give it “N/A” on this evaluation question.
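One way to picture the second solution (each teacher using all three methods at different times) is as a rotation schedule. The sketch below is only an illustration: the simple cyclic rotation is an assumption rather than a procedure prescribed in this chapter, it requires as many teachers as methods, and the function name is hypothetical.

```python
teachers = ["Teacher A", "Teacher B", "Teacher C"]
methods = ["Method X", "Method Y", "Method Z"]

def rotation_schedule(teachers, methods):
    """Rotate methods so each teacher uses every method once, one per period.

    Assumption: the number of teachers equals the number of methods.
    """
    n = len(methods)
    schedule = []
    for period in range(n):
        assignments = {
            teacher: methods[(index + period) % n]
            for index, teacher in enumerate(teachers)
        }
        schedule.append(assignments)
    return schedule

for period, assignments in enumerate(rotation_schedule(teachers, methods), start=1):
    print(f"Period {period}: {assignments}")
```

In this schedule every method is taught by every teacher, and all three methods are in use during every period, so neither a particular teacher nor a particular time of year is tied to a single method.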
___ Question 6d: Treatment Compliance: If treatments were self-administered, did the researcher check treatment compliance?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Some treatments are self-administered, out of the researcher’s view. For instance, an experimental group might be given a new antidepressant drug for self-administration over a period of months. The researcher could check treatment compliance by asking participants how faithful they are being in taking the drug. More elaborate checks would include instructing participants to keep a diary of their drug-taking schedule or even conducting tests that detect the presence of the drug.
___ Question 6e: Cross-Group Contamination: Could diffusion of treatment or information exchange occur between the experimental and control group participants?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: The problem of possible treatment contamination is relevant when people in the experimental and control groups have a chance to interact: for example, if students from the same class were randomly assigned to different educational interventions happening three times a week for one hour each. Almost inevitably, the students would talk to each other about what they did during the intervention periods. Such diffusion of treatment may make it less likely to find significant differences between the experimental and control group outcomes.
Such cross-group contamination may be remedied when whole classes, rather than individuals, are assigned to different treatments (cluster-randomized approach), but there are some serious trade-offs. Researchers are unlikely to have access to 60+ classrooms compared to 60+ students, and if only two pre-existing classrooms of 30 students each are used as the experimental and comparison groups, a true experiment becomes a quasi-experiment, with inevitable losses in terms of establishing the treatment effect.
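The cluster-randomized approach mentioned above can be sketched as follows: whole classrooms, rather than individual students, are shuffled and assigned to conditions, so students who share a classroom always share a condition. The classroom labels, the seed, and the function name are hypothetical, and dealing shuffled clusters out in turn is just one simple way to obtain near-equal numbers per condition.

```python
import random

def assign_clusters(cluster_ids, conditions, seed=None):
    """Randomly assign whole clusters (e.g., classrooms) to conditions.

    Clusters are shuffled and then dealt out in turn, which keeps the number
    of clusters per condition as equal as possible.
    """
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    return {
        cluster: conditions[i % len(conditions)]
        for i, cluster in enumerate(shuffled)
    }

classrooms = [f"class_{i:02d}" for i in range(1, 9)]  # eight hypothetical classrooms
assignment = assign_clusters(classrooms, ["experimental", "control"], seed=42)

for classroom in classrooms:
    print(classroom, "->", assignment[classroom])
```

Note that the unit being randomized here is the classroom, so the group-size rule of thumb applies to the number of classrooms per condition, not to the number of students.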
___ Question 7: Natural Settings: Is the experiment conducted in an environment that is natural for the participants?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Sometimes researchers conduct experiments in artificial settings. When they do this, they limit the external validity (or generalizability) of the study. What is found in the artificial environment of an experiment may not be found in more natural settings (i.e., the finding may not be valid outside the laboratory where the study took place). Therefore, experiments conducted in laboratory settings are likely to have poor external validity. Notice the unnatural aspects of Example 9.7.1 below. First, the amount and type of alcoholic beverages were assigned (rather than being selected by the participants as they would be in a natural setting). Second, the female was an accomplice of the experimenters (not someone the males were actually dating). Third, the setting was a laboratory, where the males would be likely to suspect that their behavior was being monitored in some way. While the researchers have achieved a high degree of physical control over the experimental setting, they have sacrificed external validity in the process.
Example 9.7.1 EXPERIMENT WITH POOR EXTERNAL VALIDITY
A research team was interested in the effects of alcohol consumption on aggressiveness in males when dating. In the experiment, some of the males were given moderate amounts of beer to consume, while the controls were given nonalcoholic beer. Then, all males were observed interacting with a female accomplice of the experimenters. The interactions took place in a laboratory on a college campus, and observations were made through a one-way mirror.
At the same time, experiments conducted in the field, such as the experiment on different policing strategies described in Example 9.1.2 earlier in this chapter, present the opposite problem: it is often impossible for researchers to control all the important aspects in a field experiment. Experiments in natural settings often present internal validity problems. The internal validity of an experiment refers to whether the experiment can properly test a cause-and-effect relationship by ruling out confounding variables (or alternative explanations for the results).
___ Question 8: Research Ethics: Has the researcher used ethical and politically acceptable treatments?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: This evaluation question is important for a wide range of research designs. For instance, has the researcher used treatments to promote classroom discipline that are acceptable to parents, teachers, and the community? It is important to remember that research ethics standards evolve with changes in cultural values and sensitivities, so some treatments that were acceptable 20 years ago may become unacceptable. At the same time, if the proposed treatments are unethical, they are usually ruled out at the research ethics committee or Institutional Review Board (IRB) review stage, before the experiment even takes place, so this guideline might be more relevant when evaluating older studies or studies that were not subjected to review by an IRB or ethics committee. Appendix A1 discusses two of the most infamous studies – Zimbardo’s Stanford Prison Experiment and Milgram’s Experiments on Obedience – that would not pass the IRB review nowadays.
___ Question 9: Overall: Was the experiment properly conducted?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Rate the overall quality of the experimental procedures on the basis of the answers to the evaluation questions in this chapter and take into account any other concerns you may have.
Visit the Instructor & Student Resources website for multiple choice questions and additional resources: www.routledge.com/cw/tcherni-buzzeo
Chapter 9 Exercises
Part A
Directions: Answer the following questions.
1 In an experiment, a treatment constitutes what is known as
   A an independent variable.
   B a dependent variable.
2 Which of the following is described in this chapter as being vastly superior to the other?
   A Assigning a small number of previously existing groups to treatments at random.
   B Assigning individuals to treatments at random.
3 Suppose a psychology professor conducted an experiment in which one of her sections of Introduction to Social Psychology was assigned to be the experimental group and the other section served as the control group during a given semester. The experimental group used computer-assisted instruction while the control group received instruction via a traditional lecture/discussion method. Although both groups are taking a course in social psychology during the same semester, the two groups might be initially different in other ways. Speculate on what some of the differences might be. (See Evaluation Question 5a.)
4 In this chapter, what is described as a strength of an experimental design in which one group serves as both the treatment group and its own control group? What is the weakness of this experimental design?
5 Very briefly describe how the person effect might confound an experiment.
6 What is the difference between a blind and a double-blind experiment?
7 What is the name of the phenomenon in which participants may be influenced by knowledge of the purpose of an experiment?
8 What are the main advantages and drawbacks of natural experiments? What about lab experiments?
9 Briefly explain how random selection differs from random assignment.
10 Is it possible to have nonrandom selection yet still have random assignment in an experiment? Explain.
Part B
Directions: Locate empirical articles on two experiments on topics of interest to you. Evaluate them in light of the evaluation questions in this chapter, taking into account any other considerations and concerns you may have. Select the one to which you gave the highest overall rating, and bring it to class for discussion. Be prepared to discuss its strengths and weaknesses.
CHAPTER 10
Evaluating Analysis and Results Sections: Quantitative Research
This chapter discusses the evaluation of the Analysis and Results sections in quantitative research reports. These almost always contain statistics that summarize the data collected, such as means, medians, and standard deviations. These statistics are known as descriptive statistics. The Results sections of quantitative articles also usually contain inferential statistics (such as various regression analyses), which help make inferences from the sample that was actually studied to the population from which the sample was drawn. It is assumed that the reader has a basic knowledge of elementary statistical methods. Note that the evaluation of Methods and Analysis sections of qualitative research reports is covered in the next chapter. The guidelines for evaluating the Analysis and Results sections of mixed-methods research are explained in Chapter 12.
____ Question 1: % and n: When percentages are reported, are the underlying numbers of cases also reported?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I¹
Comment: Percentages are widely reported in empirical articles published in academic journals. When reporting percentages, it is important that researchers report the underlying number of cases for each percentage. Otherwise, the results might be misleading. Consider Example 10.1.1, which contains only percentages. The percentage decrease in this example seems dramatic. However, when the underlying numbers of cases (whose symbols are n or N) are shown, as in Example 10.1.2, it becomes clear that the percentage represents only a very small decrease in absolute terms (i.e., a decrease from four to two students).
¹ Continuing with the same grading scheme as in the previous chapters, N/A stands for “Not applicable” and I/I stands for “Insufficient information to make a judgement.”
DOI: 10.4324/9781003362661-10
Analysis and Results: Quantitative
Example 10.1.1 PERCENTAGE REPORTED WITHOUT UNDERLYING NUMBER OF CASES (POTENTIALLY MISLEADING)
Since the end of the Cold War, interest in Russian language studies has decreased dramatically. For instance, at Zaneville Language Institute, the number of students majoring in Russian has decreased by 50% from a decade earlier.
Example 10.1.2 PERCENTAGE REPORTED WITH UNDERLYING NUMBER OF CASES (NOT MISLEADING)
Since the end of the Cold War, interest in Russian language studies has decreased dramatically. For instance, at Zaneville Language Institute, the number of students majoring in Russian has decreased by 50% from a decade earlier (n = 4 in 2002, n = 2 in 2012).
On the other hand, it is also important to remember that percentages or rates (for example, the number of cases per 1,000 or per 100,000) are essential when comparing units of different sizes (such as cities, countries, schools, etc.). For example, when comparing the number of violent crimes in New York City (NYC) versus Chicago in 2022, we can look at the numbers: approximately 45,500 in NYC and about 17,600 in Chicago. This comparison seems to imply that NYC is much more violent than Chicago. However, we should take the population difference into account: in 2022, NYC had about 8.5 million people while Chicago had about 2.7 million people. Thus, by calculating violent crime rates (number of cases divided by population and multiplied by 100,000), we see that NYC, with its violent crime rate of about 535 per 100,000, is actually a safer place than Chicago, with a violent crime rate of about 652 per 100,000.
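The arithmetic behind both points can be sketched in a few lines of Python. This is only an illustration: the function names `percent_change` and `rate_per_100k` are made up for this sketch, and the inputs are the approximate figures quoted above.

```python
def percent_change(old_n: int, new_n: int) -> float:
    """Percentage change from old_n to new_n (a negative value is a decrease)."""
    return (new_n - old_n) / old_n * 100


def rate_per_100k(cases: float, population: float) -> float:
    """Number of cases per 100,000 residents."""
    return cases / population * 100_000


# Example 10.1.2: a 50% decrease that amounts to only two students.
russian_majors_change = percent_change(4, 2)

# Approximate 2022 counts and populations quoted in the text.
nyc_rate = rate_per_100k(45_500, 8_500_000)
chicago_rate = rate_per_100k(17_600, 2_700_000)

print(f"Russian majors: {russian_majors_change:.0f}% (n = 4 to n = 2)")
print(f"NYC: {nyc_rate:.0f} per 100,000; Chicago: {chicago_rate:.0f} per 100,000")
```

Reporting the percentage together with the counts, and the counts together with the rate, lets a reader see both the relative and the absolute size of a difference.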
____ Question 2: Means are Meaningful: Are averages other than means (like a median or mode) reported for skewed distributions?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: The mean, which is the most commonly reported average, should be used when the distribution is not highly skewed (when the distribution is approximately symmetrical). A skewed distribution is one in which there are extreme scores on one side of the distribution (such as very high scores without a similar share of very low scores to counterbalance them). Example 10.2.1 shows a skewed distribution. It is skewed because there is a very high score of 310, which is not balanced out by a very low score at the lower end of the distribution of scores. This is known as a distribution that is skewed to the right or is positively skewed. The mean, which is supposed to represent the central tendency of the entire group of scores, has been pulled up by a single very high score, resulting in a mean of 82.45, which is higher than all of the scores except the highest score (310).
Example 10.2.1 A SKEWED DISTRIBUTION (SKEWED TO THE RIGHT) AND A MISLEADING MEAN
Scores: 55, 55, 56, 57, 58, 60, 61, 63, 66, 66, 310
Mean = 82.45, standard deviation = 75.57
The raw scores for which a mean was calculated are seldom included in research reports, which makes it impossible to inspect the distribution for skewness directly. However, some simple computations using only the mean and standard deviation (which are usually reported) can reveal whether the mean was misapplied to a distribution that is highly skewed to the right. These are the calculations:
1 Round the mean and standard deviation to whole numbers (to simplify the computations). Thus, the rounded mean is 82 and the rounded standard deviation is 76 for Example 10.2.1.
2 Multiply the standard deviation by two (i.e., 76 × 2 = 152).
3a SUBTRACT the result of Step 2 from the mean (i.e., 82 − 152 = −70).
3b ADD the result of Step 2 to the mean (i.e., 82 + 152 = 234).
Steps 3a and 3b show the lower and upper bounds of an approximately normal distribution that would be fittingly described by the mean. If the result of Step 3a is lower than the lowest possible score, which is usually zero, then the distribution is highly skewed to the right.² (In this example, −70 is much lower than zero.) This indicates that the mean was applied to a skewed distribution, resulting in a reported average that is misleadingly high.³
If the result of Step 3b is higher than the highest possible score, the distribution is highly skewed to the left or is negatively skewed. In such a case, the mean would be a misleadingly low value for an average.
This type of inappropriate selection of an average is rather common, perhaps because researchers often compute the mean and standard deviation as a matter of standard practice. A more appropriate measure of central tendency for skewed distributions would be a median (the midpoint of the distribution if the raw scores are listed from lowest to highest) or a mode (the most common raw score in the distribution).⁴
If you detect that a mean has been computed for a highly skewed distribution by performing the set of calculations described above, there is little that you can do to correct it short of contacting the researcher to request the raw scores. If alternative measures of central tendency (the median or mode) are not provided for skewed distributions in the research report, the mean should be interpreted with great caution, and the article should be given a low rating on this evaluation question.
² In a normal, symmetrical distribution, nearly all scores (about 99.7%) fall within 3 standard deviation units on each side of the mean. Thus, there should be room for 3 standard deviation units on both sides of the mean in a distribution that is not skewed. In this example, there are not even 2 standard deviation units to the left of the mean (because the standard deviation was multiplied by 2). Even without understanding this theory, a consumer of research can still apply the simple steps described here to identify skewed distributions. Note that there are precise statistical methods for detecting a skew. However, for their use to be possible, the original scores would be needed, and those are almost never available to consumers of research.
³ This procedure will not detect all highly skewed distributions. If the result of Step 3a is lower than the lowest score obtained by any participant, the distribution is also skewed. However, researchers seldom report the lowest score obtained by participants.
⁴ A mode is also the only measure of central tendency that can be used for describing non-numerical data, but it is much more common to present the distribution of non-numerical data as percentages (for example, “65% of the sample was White, 23% African American, 4% Asian, and 8% were other or mixed race”).
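The check in Steps 1 through 3b can be sketched in Python as follows. This is a minimal illustration, not a formal test of skewness: the helper names `two_sd_bounds` and `skew_flag` are made up for this sketch, and the default lowest possible score of zero is an assumption that must be adjusted to the measure being evaluated. The raw scores from Example 10.2.1 are used only to reproduce the reported mean and standard deviation; a reader of a published article would normally start from those two reported values.

```python
import statistics
from typing import Optional

scores = [55, 55, 56, 57, 58, 60, 61, 63, 66, 66, 310]  # Example 10.2.1

mean = statistics.mean(scores)       # about 82.45
sd = statistics.stdev(scores)        # sample standard deviation, about 75.57
median = statistics.median(scores)   # the middle score, unaffected by the outlier


def two_sd_bounds(mean: float, sd: float) -> tuple:
    """Steps 1 to 3b: round, double the SD, then subtract from and add to the mean."""
    rounded_mean, rounded_sd = round(mean), round(sd)   # Step 1
    spread = 2 * rounded_sd                             # Step 2
    return rounded_mean - spread, rounded_mean + spread  # Steps 3a and 3b


def skew_flag(mean: float, sd: float,
              lowest_possible: float = 0,
              highest_possible: Optional[float] = None) -> str:
    """Return 'right', 'left', or 'not detected' based on the two bounds."""
    lower, upper = two_sd_bounds(mean, sd)
    if lower < lowest_possible:
        return "right"
    if highest_possible is not None and upper > highest_possible:
        return "left"
    return "not detected"


print(two_sd_bounds(mean, sd))   # lower and upper bounds from Steps 3a and 3b
print(skew_flag(mean, sd))       # direction of skew flagged by the check
print(mean, median)              # the mean sits far above the median here
```

When raw scores are available, as in this example, comparing the mean with the median gives a quick second look: a mean far above the median is consistent with a distribution skewed to the right.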
____ Question 3: Descriptives: Have the researchers presented descriptive statistics before presenting the results of inferential tests?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Descriptive statistics include frequencies, percentages, averages (usually the mean or median), correlation coefficients (usually Pearson’s r), and measures of variability (usually standard deviation or interquartile range). Descriptive statistics are so called because they describe the sample. Inferential statistics, such as tests of differences between the means or other tests of statistical significance, confidence intervals, regression analyses, and so on, allow researchers to infer, or generalize, from the sample statistics to the population. In technical terms, inferential statistics determine the probability that any differences among descriptive statistics are due to chance (random sampling error). Obviously, it makes no sense to discuss the results of a test performed on descriptive statistics unless those descriptives have first been presented. Failure on this evaluation question is rather rare, but it represents a serious flaw in a research report.
____ Question 4: Significance: If any differences are statistically significant but substantively small, have the researchers noted that they are small?
Very unsatisfactory 1 2 3 4 5 Very satisfactory or N/A I/I
Comment: Statistically significant differences are sometimes very small, especially when researchers are using large samples. (See Appendix C for an explanation of this point.) When this is the case, it is a good idea for a researcher to point this out. Obviously, a small but statistically significant difference should be interpreted differently from a large and statistically significant difference. Example 10.4.1 illustrates how a significant but substantively small difference might be pointed out.
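The effect of sample size on statistical significance can be illustrated with a minimal Python sketch. All numbers here are hypothetical and are not taken from any study or example in this book: a fixed difference of 0.9 points between two group means, a common standard deviation of 10, and equal group sizes. The cutoff of 1.96 is the approximate two-tailed .05 critical value for large samples, and the function name `t_equal_groups` is made up for this sketch.

```python
import math


def t_equal_groups(mean_diff: float, sd: float, n_per_group: int) -> float:
    """t statistic for two independent groups of equal size with a common SD."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_diff / standard_error


MEAN_DIFF = 0.9    # hypothetical difference between the two group means
COMMON_SD = 10.0   # hypothetical standard deviation shared by both groups
CRITICAL_T = 1.96  # approximate two-tailed .05 cutoff for large samples

for n in (50, 500, 5000):
    t = t_equal_groups(MEAN_DIFF, COMMON_SD, n)
    verdict = "significant" if abs(t) > CRITICAL_T else "not significant"
    # The standardized difference (mean difference / SD) does not depend on n.
    print(f"n per group = {n:>4}: t = {t:.2f} ({verdict}), "
          f"standardized difference = {MEAN_DIFF / COMMON_SD:.2f}")
```

In this sketch the difference between the means never changes; only the sample size does. The same small difference moves from nonsignificant to significant as the groups grow, which is why a significant result should still be described as small when it is small.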
Example 10.4.1 DESCRIPTION OF A SMALL BUT STATISTICALLY SIGNIFICANT DIFFERENCE
Although the difference between the means of the experimental group (M = 24.55) and control group (M = 23.65) was statistically significant (t = 2.075, p