“Harlow breaks down concepts in simple terms and draws insightful comparisons across different statistical techniques. The text is an excellent resource for all who conduct statistical analyses, from undergraduates to professionals, and everywhere in between.”
A. Nayena Blankson, Full Professor of Psychology, Spelman College, USA

“Once again, Harlow writes with authority and great clarity. There’s discussion of estimation, replication, and reproducibility, and all through there’s R code and guidance. This shrewdly revised new edition is a wonderfully future-oriented guide to the multivariate world.”
Geoff Cumming, Professor Emeritus, La Trobe University, Melbourne, Australia

“The Essence of Multivariate Thinking provides a gentle introduction to an expansive toolkit of methods with a through line focused on aptly applying them to research questions. Dr. Harlow seamlessly ties together several modeling frameworks (e.g., multiple regression, MANOVA, discriminant function analysis, logistic regression, SEM, and latent growth modeling) by emphasizing commonalities in their underlying assumptions, statistical tests, and effect size interpretations. All this was done with a clear and accessible writing style accompanied by a unifying data example using R, SAS, and SPSS.”
Jolynn Pek, Associate Professor of Quantitative Psychology, Ohio State University, USA

“In this third edition of her landmark text on multivariate methods, readers are furnished with ways to think about a variety of multivariate techniques from start to finish: preliminary considerations, testing assumptions, conducting analyses, and writing up and interpreting the results. They will learn how the various multivariate methods are related, and when each should be used. Readers can pattern their own analyses after those used as examples in the book. Her example analyses are easy to follow and emulate. Every chapter ends by summarizing a topic around a set of core themes to help researchers understand and select appropriate multivariate methods. Professor Harlow is known for her didactic style and ability to communicate complex topics in clear, approachable language. In this third edition she adds new material on multi-sample SEM and latent growth curve models. The provision of R code for all analyses is a welcome addition. Readers will emerge knowing about a variety of multivariate methods—how they are related and distinct, when to use them, how to conduct them, and how to communicate the results to readers. Harlow has a knack for explaining complicated methods in clear, approachable language. In this revised third edition of her landmark text, she provides readers with all the tools they need to become expert users and consumers of multivariate techniques.”
Kristopher J. Preacher, Lois Autrey Betts Chair in Education & Human Development, Vanderbilt University, USA

“Lisa Harlow’s third edition of The Essence of Multivariate Thinking is the perfect textbook for introductory graduate statistics or a multivariate statistics course. Each chapter provides readers with the foundational knowledge and relevant software code to start implementing the analyses in their own work right away. Harlow’s accessible chapters strike the right balance of unique information and related themes.”
Alyssa Counsell, Assistant Professor in Quantitative Psychology, Toronto Metropolitan University, Ontario, Canada
The Essence of Multivariate Thinking
Focusing on the underlying themes that run through most multivariate methods, in this fully updated 3rd edition of The Essence of Multivariate Thinking Dr. Harlow shares the similarities and differences among multiple multivariate methods to help ease the understanding of the basic concepts. The book continues to highlight the main themes that run through just about every quantitative method, describing the statistical features in clear language. Analyzed examples are presented in 12 of the 15 chapters, showing when and how to use relevant multivariate methods, and how to interpret the findings both from an overarching macro- and a more specific micro-level approach that includes a focus on statistical tests, effect sizes, and confidence intervals.

This revised 3rd edition offers thoroughly revised and updated chapters to bring them in line with current information in the field, the addition of R code for all examples, continued SAS and SPSS code for seven chapters, two new chapters on structural equation modeling (SEM) covering multiple sample analysis (MSA) and latent growth modeling (LGM), and applications with a large longitudinal dataset in the examples of all methods chapters.

Of interest to those seeking clarity on multivariate methods often covered in a statistics course for first-year graduate students or advanced undergraduates, this book will be key reading and provide greater conceptual understanding and clear input on how to apply basic and SEM multivariate statistics taught in psychology, education, human development, business, nursing, and other social and life sciences.

Lisa L. Harlow is a professor emerita of psychology at the University of Rhode Island, USA.
Multivariate Applications Series

Sponsored by the Society of Multivariate Experimental Psychology, the goal of this series is to apply statistical methods to significant social or behavioral issues, in such a way as to be accessible to a nontechnically oriented readership (e.g., non-methodological researchers, teachers, students, government personnel, practitioners, and other professionals). Applications from a variety of disciplines such as psychology, public health, sociology, education, and business are welcome. Books can be single- or multiple-authored or edited volumes that (1) demonstrate the application of a variety of multivariate methods to a single, major area of research; (2) describe a multivariate procedure or framework that could be applied to a number of research areas; or (3) present a variety of perspectives on a topic of interest to applied multivariate researchers.

Anyone wishing to submit a book proposal should send the following: (1) author/title; (2) timeline including completion date; (3) brief overview of the book’s focus, including table of contents and, ideally, a sample chapter (or chapters); (4) a brief description of competing publications; and (5) targeted audiences. For more information, please contact the series editor, Lisa Harlow, at Department of Psychology, University of Rhode Island, 10 Chafee Road, Suite 8, Kingston, RI 02881–0808; phone (401) 874–4242; fax (401) 874–5562; or e-mail [email protected].

Longitudinal Data Analysis: A Practical Guide for Researchers in Aging, Health, and Social Sciences, co-edited by Jason T. Newsom, Richard N. Jones, and Scott M. Hofer (2011)
Structural Equation Modeling with Mplus: Basic Concepts, Applications, and Programming, written by Barbara M. Byrne (2012)
Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis, written by Geoff Cumming (2012)
Frontiers of Test Validity Theory: Measurement, Causation and Meaning, written by Keith A. Markus and Denny Borsboom (2013)
The Essence of Multivariate Thinking: Basic Themes and Methods, Second Edition, written by Lisa L. Harlow (2014)
Longitudinal Analysis: Modeling Within-Person Fluctuation and Change, written by Lesa Hoffman (2015)
Handbook of Item Response Theory Modeling: Applications to Typical Performance Assessment, co-edited by Steven P. Reise & Dennis Revicki (2015)
Longitudinal Structural Equation Modeling: A Comprehensive Introduction, written by Jason T. Newsom (2015)
Higher-Order Growth Curves and Mixture Modeling with Mplus: A Practical Guide, by Kandauda A. S. Wickrama, Tae Kyoung Lee, Catherine Walker O’Neal, & Frederick O. Lorenz
What If There Were No Significance Tests?: Classic Edition, by Lisa L. Harlow, Stanley A. Mulaik, & James H. Steiger
Introduction to Item Response Theory Models and Applications, by James E. Carlson
Higher-Order Growth Curves and Mixture Modeling with Mplus: A Practical Guide, 2nd Edition, by Kandauda A. S. Wickrama, Tae Kyoung Lee, Catherine Walker O’Neal, & Frederick O. Lorenz
The Essence of Multivariate Thinking: Basic Themes and Methods, Third Edition, written by Lisa L. Harlow
The Essence of Multivariate Thinking Basic Themes and Methods Third edition
Lisa L. Harlow
Designed cover image: Instants/E+ via Getty Images

Third edition published 2023 by Routledge, 605 Third Avenue, New York, NY 10158, and by Routledge, 4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

First edition published by Psychology Press 2005

© 2023 Lisa L. Harlow

The right of Lisa L. Harlow to be identified as author of this work has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
A catalog record for this title has been requested

ISBN: 978-0-367-21970-3 (hbk)
ISBN: 978-0-367-21972-7 (pbk)
ISBN: 978-0-429-26910-3 (ebk)

DOI: 10.4324/9780429269103

Typeset in Times New Roman by Taylor & Francis Books
In Memory of Jacob Cohen and Barbara M. Byrne
Contents

List of figures
List of tables
About the Author
Acknowledgements
Preface to the Third Edition

PART I Overview
1 Introduction and Multivariate Themes
2 Background Considerations

PART II Intermediate Multivariate Methods with One Continuous Outcome
3 Multiple Regression
4 Analysis of Covariance

PART III Multivariate Group Methods with Categorical Variable(s)
5 Multivariate Analysis of Variance
6 Discriminant Function Analysis
7 Logistic Regression

PART IV Multivariate Dimensional Methods with Continuous Variables
8 Principal Components and Factor Analysis

PART V Structural Equation Modeling
9 Structural Equation Modeling
10 Path Analysis
11 Confirmatory Factor Analysis
12 Latent Variable Modeling
13 Multiple Sample Analysis
14 Latent Growth Modeling

PART VI Summary
15 Integration of Multivariate Methods

Appendix A
Appendix B
Index
Figures

3.1 Multiple Regression Model with Five Predictors (at Baseline) of a DV, Diet Behavior (at 12 Months)
3.2 Scatterplots of Three Diet Temptations Parcels with DV, Diet Behavior, N = 2,654
3.3 Q-Q Plot of Standardized Residuals and t Quantiles for the MR Model
3.4 Histogram with Normal Density Curve for the Residuals from the MR Model
4.1 ANCOVA with IV = Group, Covariate = Diet Behavior at Baseline, and DV = Diet Behavior at 12 Months
4.2 Scatterplots of Diet Behavior at Baseline (Covariate) and 12 Months (DV)
4.3 Q-Q Plot of Standardized Residuals and t Quantiles for the ANCOVA Model
4.4 Histogram with Normal Density Curve for the Residuals from the ANCOVA Model
4.5 Boxplots of Diet Behavior at 12 Months for ANCOVA by Group
5.1 Scatterplots and Histograms for the Four MANOVA Dependent Variables, N = 2,710
5.2a–d Boxplots of Four Diet Behavior Dependent Variables by Group
5.3 Micro-Level Results: Cohen’s d and Means for Four DVs, by Group IV, N = 2,710
6.1 Plots of the Group Centroids of the Discriminant (aka Canonical) Scores for the Control (1) and Treatment (2) Groups, and Structure Coefficients (i.e., Loadings) for the Four Diet Behavior Variables
6.2 Diagram of Four Diet Behavior Predictors and Treatment Group DV for Follow-up DFA with Discriminant Loadings and Effect Size
7.1 LR with Treatment Group DV and Odds Ratios and CIs Provided for Four Diet Behavior Predictors
8.1 Depiction of Two Dimensions of Diet Behavior and Diet Temptations, each with Several Relevant (bold-lined) and Several Non-essential (dashed-lined) Loadings
8.2a Scatterplots and Histograms for Four Diet Behavior Variables
8.2b Scatterplots and Histograms for Three Diet Temptations Variables
8.3a Parallel Analysis Scree Plot for Four Diet Behavior and Three Diet Temptations Item Parcels from Principal Components Analysis
8.3b Parallel Analysis Scree Plot for Four Diet Behavior and Three Diet Temptations Item Parcels from Factor Analysis
10.1 Depiction of Three Path Analysis Model Versions
10.2 Scatterplots for Two Numeric Variables in the Path Analysis Models
11.1 Diagram of CFA Diet Behavior and Diet Temptations Factors
11.2 Scatterplots among the Seven Variables in the CFA Example
12.1 Depiction of Three Latent Variable Model Versions
12.2 Scatterplots among the Seven Numeric Variables in the LVM Example
13.1 Multiple Sample Analysis CFA Diagram
14.1 Latent Growth Model Path Diagram for a Basic (Unconditional) Model with Three Occasions
14.2 Latent Growth Model Path Diagram for a Conditional Model with Three Occasions and One Exogenous Predictor
14.3 Parallel Process Latent Growth Model Path Diagram for a Bivariate Model with Three Occasions
15.1 Visual Summary of the Multivariate Methods
Tables

1.1 Summary of the Definition, Benefits, Drawbacks, and Context for Multivariate Methods
1.2 Summary of Multivariate Themes
2.1 Summary of Background Themes to Consider for Multivariate Methods
2.2 Six Questions to Ask for Each Multivariate Method
2.3a R Syntax for Test–Retest Reliability and Coefficient Omega
2.3b SAS Syntax for Test–Retest Reliability and Coefficient Alpha
2.3c SPSS Syntax for Test–Retest Reliability and Coefficient Alpha
2.4a Diet Behaviors Test–Retest Reliability Coefficients, N = 5,942
2.4b Diet Temptations Test–Retest Reliability Coefficients, N = 2,331
3.1a R Syntax for MR
3.1b SAS Syntax for MR
3.1c SPSS Syntax for MR
3.2 Descriptive Statistics for the DV, Diet Behavior, and Five IVs, N = 2,654
3.3 Correlation Coefficients among DV, Diet Behavior, and Five IVs, N = 2,654
3.4 Summary of Standard MR Output, DV = Diet Behavior (12 mos.), N = 2,654
3.5 Summary of Macro-Level Fit for Five Hierarchical MR Models, N = 2,654
3.6 Assess if Hierarchical Models Add Significant Predictor(s), N = 2,654
3.7 Micro-Level Summary for Final Hierarchical Model 5, DV = Diet Behavior (12 mos.), N = 2,654
3.8 Stepwise MR Macro-Level Selection Summary, DV = Diet Behavior (12 mos.), N = 2,654
3.9 Micro-Level Summary for Final Stepwise Model, DV = Diet Behavior (12 mos.), N = 2,654
3.10 Main Themes Applied to Multiple Regression
4.1a R Syntax for Descriptive Statistics and ANCOVA
4.1b SAS Syntax for Descriptive Statistics for ANCOVA
4.1c SPSS Syntax for Descriptive Statistics and ANCOVA
4.2 Pearson Correlation Coefficients, ANCOVA Example
4.3 Group Frequencies, ANCOVA Example
4.4 Descriptive Statistics, by Group
4.5 Testing for Homogeneity of Regressions
4.6 ANOVA Macro-Level Results
4.7 ANCOVA Macro- and Micro-Level Results
4.8 Main Themes Applied to ANCOVA
5.1a R Syntax for MANOVA
5.1b SAS Syntax for MANOVA
5.1c SPSS Syntax for MANOVA
5.2 Descriptive Statistics on the Four DVs, by Group, N = 2,710
5.3a Correlations among the Five Variables for MANOVA, N = 2,710
5.3b Test–Retest Reliability for the Four DVs from 00 to 12 Months, N = 2,671
5.4 MANOVA Tests for Homogeneity of Variance Assumption
5.5 MANOVA Macro-Level Tests of the Set of Four DVs, by Group
5.6 ANOVA Follow-up Tests for Each of the Four DVs, by Group
5.7 Main Themes Applied to MANOVA
6.1a R Syntax for DFA
6.1b SAS Syntax for DFA
6.1c SPSS Syntax for DFA
6.2 Macro-Level Results for the Follow-up DFA
6.3 Micro-Level Discriminant Weights for the Follow-up DFA
6.4 Classification Results for the Follow-up DFA
6.5 Main Themes Applied to DFA
7.1a R Syntax for Logistic Regression
7.1b SAS Syntax for LR
7.1c SPSS Syntax for LR
7.2 Descriptive Statistics for Five LR Variables, N = 2,710
7.3a Correlations among the Five Variables in LR, N = 2,710
7.3b Test–Retest Reliability for the Four Predictors from 00 to 12 Months, N = 2,671
7.4a Macro-Level Log Likelihood Statistics for LR Model Fit, N = 2,710
7.4b R Pseudo-R2 Effect Size Indices for LR Model (M)
7.4c SAS Pseudo-R2 Effect Size Indices for LR Model (M)
7.4d SPSS Pseudo-R2 Effect Size Indices for LR Model (M)
7.5 Micro-Level LR Results for Four Covariates with Treatment Group, N = 2,710
7.6 Classification Results for LR with Four Covariates and Treatment Group, N = 2,710
7.7 Main Themes Applied to LR
8.1 Abbreviations for Four Diet Behavior and Three Diet Temptations Item Parcels, with Total Coefficient Omega Internal Consistency
8.2a R Syntax for PCA and FA
8.2b SAS Syntax for PCA and FA
8.2c SPSS Syntax for PCA and FA
8.3 Descriptive Statistics on Four Diet Behavior and Three Diet Temptations Item Parcels, N = 3,656
8.4 Correlation Matrix for Four Diet Behavior and Three Diet Temptations Item Parcels, with Number of Possible Dimensions and Eigenvalues, N = 3,656
8.5 Macro-Level Fit Assessment for Two Random Samples and Combined Sample
8.6 Varimax- and Oblimin-Rotated PCA Pattern Matrices with Communalities for Four Diet Behavior and Three Diet Temptations Item Parcels on Combined Sample, N = 3,656
8.7a Varimax-Rotated ULS FA Pattern Matrix with Bootstrapped Confidence Intervals and Communalities for Four Diet Behavior and Three Diet Temptations Item Parcels on Combined Sample, N = 3,656
8.7b Oblimin-Rotated ULS FA Pattern Matrix with Bootstrapped Confidence Intervals and Communalities for Four Diet Behavior and Three Diet Temptations Item Parcels on Combined Sample, N = 3,656
8.8 Main Themes Applied to PCA and FA
9.1 Summary of Symbols, Matrices, and R Notation for PA
9.2 Summary of Symbols, Matrices, and R Notation for CFA
9.3 Summary of Symbols, Matrices, and R Notation for LVM
9.4a Sample Outline of a Generic R File for PA
9.4b Sample Outline of a Generic R File for CFA
9.4c Sample Outline of a Generic R File for LVM
9.5 Main Themes Applied to SEM
10.1 Summary of Symbols for Path Analysis
10.2 Test–Retest Reliability Correlations and Confidence Intervals
10.3 R Script and Descriptive Statistics for the Four PA Measures
10.4 R Syntax for Path Analysis Models in First Random Sample
10.5 Macro-Level Fit for Three PA Models in First Random Sample, Na = 1,151
10.6a Standardized Residuals from Full PA Model in First Random Sample, Na = 1,151
10.6b Standardized Residuals from Full PA Model in Second Random Sample, Nb = 1,152
10.7 Parameter Estimates for Full PA Model in First Random Subsample, Na = 1,151
10.8 Macro-Level Fit for Three PA Models in Second Random Subsample, Nb = 1,152
10.9 Parameter Estimates for Full PA Model in Second Random Subsample, Nb = 1,152
10.10 Main Themes Applied to Path Analysis
11.1 Summary of Symbols for Confirmatory Factor Analysis
11.2 Coefficient Omegas for Parcels for CFA Constructs
11.3 R Script and Descriptive Statistics for Seven CFA Measures
11.4 Correlations among the Seven Variables to Assess Possible Collinearity
11.5 R Syntax for CFA Models
11.6a Macro-Level Fit for Three CFA Models, Sample a, N = 1,902
11.6b Macro-Level Fit for Three CFA Models, Sample b, N = 1,903
11.6c Macro-Level Fit for Correlated CFA Model, Combined Sample, N = 3,805
11.7 Standardized Residuals from the Correlated CFA Model, N = 3,805
11.8 Factor Structure with Loadings and Confidence Intervals for the Correlated CFA Model on the Combined Sample, N = 3,805
11.9 Modification Indices of 7.0 or More for CFA Correlated Model, N = 3,805
11.10 Main Themes Applied to CFA
12.1 Summary of Symbols for LVMs
12.2 Coefficient Omegas for Parcels for LVM Latent Constructs
12.3 R Syntax and Descriptive Statistics for the Nine LVM Measures
12.4 Correlations among the Seven Numeric Variables to Assess Collinearity
12.5a R Syntax for Latent Variable Models, Na = 1,151 (Nb = 1,152)
12.5b R Syntax for LVMs with Indirect and Total Effects, Combined N = 2,303
12.6a Macro-Level Fit for Three LVM Versions, Na = 1,151
12.6b Macro-Level Fit for Three LVM Versions, Nb = 1,152
12.6c Macro-Level Fit for Full LVM, Combined Sample, N = 2,303
12.7 Standardized Residuals from the Full LVM, Combined N = 2,303
12.8 Factor Loadings for the Full LVM, Combined N = 2,303
12.9 Regression Effects for the Full LVM, Combined N = 2,303
12.10 (Co)Variances for the Full LVM on Combined N = 2,303
12.11 Modification Indices of 7.0+ for the Full LVM, Combined N = 2,303
12.12 Main Themes Applied to LVM
13.1 Greek Symbols for Main Parameters in MSA with R Input
13.2 Reliability Coefficients for Measures Used in the MSA Example
13.3 R Syntax for MSA
13.4 Univariate Statistics for Measured Variables in MSA
13.5 Nested Model Comparison
13.6 Macro-Level Fit for Three MSA Models, n = 4,274 Women and n = 1,796 Men
13.7 Micro-Level Fit for the Metric Invariance MSA Model
13.8 Main Themes Applied to MSA
14.1 Greek Symbols for Main Parameters in LGM with R Input
14.2 Descriptive Statistics (Mean [M], Standard Deviation [SD]) of the Temptations for, and Stage of Change from, a High-Fat Diet Scale across Intervention Groups
14.3a R Syntax for Preliminary Analyses before LGM
14.3b R lavaan Syntax for LGM Analyses
14.4 Macro-Level Fit for Three LGMs
14.5 Unstandardized Micro-Level Results for LGMs
14.6 Main Themes Applied to LGM
15.1 Models and Central Themes Applied to Multivariate Methods
15.2 Initial Considerations for Multivariate Methods
15.3 Background Themes Applied to Multivariate Methods
15.4 Interpretation Themes Applied to Multivariate Methods
A.1 Codebook for the Data
B.1 Summary of Matrix Concepts
B.2 Summary of Matrix Calculations
About the Author
Lisa L. Harlow (1985 PhD, UCLA) is an emerita professor at URI emphasizing quantitative methods (e.g., multivariate statistics, structural equation modeling) with a focus on increasing interest, performance, and diversity in quantitative science. She has published 100+ articles and seven books and received $8,600,000+ in funding, largely on multivariate methods, applications, health, and advancing underrepresented groups in quantitative science. She is editor of the Multivariate Applications Book Series, former editor of Psychological Methods, and past associate editor of Structural Equation Modeling. She is also a past president of the American Psychological Association (APA) Division 5 (Quantitative) and of the Society of Multivariate Experimental Psychology. Honors include two Scholarly Excellence Awards; Fellow status in APS and five APA Divisions (1, 2, 5, 38, 52); the Jacob Cohen Distinguished Contributions to Teaching and Mentoring award; a Distinguished Fellowship at the Institute for Advanced Study, Melbourne, Australia; and a Fulbright Scholar Award at York University, Toronto, Canada.
About the Chapter Co-Authors

Zachary J. Kunicki (2017 PhD, University of Rhode Island: URI) (Chapter 13) is an assistant professor at Brown University, also serving there as the assistant director of the Quantitative Science Program in the Department of Psychiatry and Human Behavior. He has published over 40 articles, with a research focus on applying psychometric and longitudinal latent variable modeling to cognitive aging and factors that alter the trajectory of aging such as dementia and delirium. He is also a much-enjoyed instructor of numerous courses (e.g., Quantitative Methods, Psychological Research, Introduction to R). Honors include multiple departmental excellence awards during graduate school, and serving as a Faculty Scholar on the Stakeholder Engagement Team for the NIA IMPACT Collaboratory. In addition to his doctorate, he has earned three Master’s degrees: a 2013 MA in Psychology from Southern Connecticut State University, a 2018 MS in Statistics from URI, and a 2019 MPH (Public Health) from Brown University.

Leslie Ann Daline Brick (2015 PhD, University of Rhode Island: URI) (Chapter 14) is an assistant professor at Brown University, where she is also the Quantitative Sciences Associate Director. She has over 70 peer-reviewed publications, with research focusing on how underlying genetic risk contributes to substance use behavior and stress-sensitive sequelae (PTSD, depression), and how genetic risk is associated with the social environment and trauma exposure. She is also interested in quantitative psychology and applying latent variable and structural equation modeling in psychology, and has taught several courses (e.g., Quantitative Methods, Conducting Research at the Forefront of Science, Introduction to R). Numerous honors include the URI Graduate Student Excellence in Behavioral Science Award, the Peter Merenda Prize in Statistics and Research Methodology, a Graduate Fellowship, and a Neuroscience Post-baccalaureate Certificate. She has also collaborated on more than 25 grants related to Alzheimer’s Disease, mental health, substance use, stress, adolescent risks, and longitudinal assessments, among other areas.
Acknowledgements
Thanks are extended to several agencies that provided grants to support the data collection and relevant research for the current book, including grants CA50087, CA27821, and CA71356 from the National Cancer Institute (NCI), and grant AG024490 from the National Institute on Aging (NIA) at the National Institutes of Health (NIH). The book was also supported by grant G20RR030883 from NIH.

I am very grateful for the R training I received from the Advance Clinical and Translational Research Virtual Learning R Module Series at Brown University, which was funded by an NIH grant, U54GM115677. The R series carried me through six months of home confinement in 2020, when I was called back to the States from a sabbatical in Singapore due to the COVID pandemic. I cannot thank the R biostatistics instructor, Anarina Murillo, enough for all of her wisdom and expertise, and for allowing me to be part of an already over-enrolled course. Without that R series, I cannot imagine how the current third edition could have been completed.

A shout-out also goes to several people from the University of Rhode Island (URI) who helped get me started on my ongoing journey into R. Harrison Dekker, the director of the URI Library AI Lab, taught a number of R workshops and was incredibly patient with my many questions. In graduate quantitative courses that I have taught, the students and I have been merrily mesmerized by the seeming perils and wonders of R by several remarkable teaching assistants, including Angela Astorini, (Joshua) Ray Tanzer, Wenqiu Cao, and Travis Dean. I also want to thank Andrea Paiva, a URI fountain of statistical and research acumen, who was instrumental in helping to organize the data used in this book, and Mike Cheung, who graciously welcomed me to Singapore in 2020. All of these talented individuals are surely going places and are cheerfully and skillfully taking others along with them in their own quantitative quests.

Thanks are offered to faculty and staff at the University of Rhode Island who generously offered resources and support. I am particularly grateful for James O. Prochaska, the Director of the Cancer Prevention Research Center, whose research and data made possible the multivariate examples used in this book. The data were culled from three longitudinal studies on reducing risks and improving health, with the multivariate chapter examples focusing on analyses about increasing healthy diet behaviors to avoid or lower high-fat foods. I also want to recognize a long-term URI mentor, Dr. Peter Merenda, who died in 2019 after 97 full years of sharing his enthusiasm and knowledge of statistical methods.

High regards to all the Multivariate Applications book series advisory board members: Leona Aiken, Daniel Bauer, Jeremy Biesanz, (the late) Barbara Byrne, Gwyneth Boodoo, Katherine Masyn, Scott Maxwell, Dan McNeish, and Stephen West; as well as our Taylor & Francis/Routledge editors, Adam Woods, Danielle Dyal, and Lucy McClune, and all the contributors and readers of our book series. Much appreciation to the Society of Multivariate Experimental Psychology (SMEP), which keeps quant nerds happy and hopping in a warm and engaging forum. In particular, I cherish the memories of Jacob Cohen, Barbara Byrne, and Wayne Velicer, long-term and wise former SMEP members who are no longer with us, but whose influence, inspiration, and friendship will never be forgotten.

Finally, my utter gratitude goes to my treasured husband, Gary, our daughter, Rebecca, and her husband, Jeremy, who are each a constant source of support and encouragement to me. They are the center of my world and have my unending love and admiration. I count my blessings every day that I have the privilege and joy of sharing my life with them; my appreciation for them is beyond words.
Preface to the Third Edition
A hearty welcome to my readers who are willing to venture into the wonderful world of multivariate thinking. The book continues to highlight the main themes that run through just about every quantitative method, describing the statistical features in clear language. Analyzed examples are presented in 12 of the 15 chapters, showing when and how to use relevant multivariate methods, and how to interpret the findings both from an overarching macro- and a more specific micro-level approach. My wish is that every reader ends up feeling more confident and empowered in understanding others’ research, and in applying these methods in one’s own work, thereby enriching and inspiring others in an ever-widening multivariate connection.
New to This Edition

This third edition of The Essence of Multivariate Thinking extends earlier work (Harlow, 2005, 2014) in the following ways.

1 The biggest change is that R code was added for all the examples used in the book. R is an open-source software environment that is readily available worldwide and is now extensively used in statistical computing. Adding R analyses for the examples in the third edition extends the reach of the book beyond the use of only SPSS or SAS, which are still used in seven of the current chapters. Information is provided on how to set up the R code for all of the multivariate methods discussed in the book, along with how to interpret the output from R.
2 A structural equation modeling chapter on multiple sample analysis (MSA) was added, co-authored by Zachary Kunicki. This new chapter presents MSA, which involves analyzing a structural equation model across two or more samples. The focus is on determining whether parameter estimates are invariant across different samples; if so, there is some degree of generalizability for the model with respect to the nature of the samples. The application example takes the reader through the conceptualization of an MSA, with a description of relevant data, R input code, and how to interpret the main findings. As in other chapters, readers will leave with enough input to understand the methodology and be able to evaluate others’ use of MSA as well as conduct their own.
3 A structural equation modeling chapter on latent growth modeling (LGM) was also added, co-authored by Leslie Brick. This new chapter presents the highlights of LGM, which involves analyzing the intercept and slope of a construct across three or more time points. The focus is on determining whether the means (intercepts) and the slopes change over time, and whether there is a different rate of change depending on what the initial mean is for a construct. LGM allows for an assessment of intra- and inter-individual information across time, providing greater insight into the process of change. The LGM chapter describes the basic methodology, computer code, and interpretation of the results when conducting LGM analyses using the R lavaan software. Readers will leave knowing how to understand and apply LGM to their own work after reading the chapter.
4 The third edition uses a different dataset than previous editions in the examples for all methods chapters. The first edition analyzed two-wave data from over 500 women, exploring psycho-existential functioning, condom use, and HIV risk; the second edition analyzed cross-sectional data from 265 university faculty to understand several aspects of work environment and satisfaction. For this third edition, data were drawn from a three-wave secondary-data sample of 8,784 participants, pooled from three randomized longitudinal studies, which focused on a baseline risk for one or more high-risk behaviors (i.e., high-fat diet, smoking, and sun exposure: see Linnan et al., 2002; Prochaska et al., 2004; Prochaska et al., 2005). As not all variables were included in each of the three studies, sample sizes varied for the book examples, depending on the variables used, ranging from over 2,000 to over 6,000. Analyses largely assess a randomly assigned diet treatment group, stage of readiness for a healthy diet, and measures of diet behavior and diet temptations for a high-fat diet, defined as having at least 30% fat. These data provided an opportunity for showcasing: ANCOVA, MANOVA, discriminant function analysis, and logistic regression with a randomized categorical grouping variable; multiple regression, path analysis, and latent variable modeling across longitudinal cross-sections of three time points; principal components, factor analysis, and confirmatory factor analysis validated in two random subsets of the data; MSA across gender binary groups in the new Chapter 13; and a three-wave LGM analysis in the new Chapter 14.
5 The topic of multilevel modeling was not covered in this edition, although it was featured in the second edition. The data used in this third edition do not readily lend themselves to this topic, which is covered very well elsewhere (e.g., Hox, Moerbeek, & van de Schoot, 2018; Snijders & Bosker, 2012).
6 All chapters were updated with current information in the field. The second edition added more information on effect sizes, confidence intervals, and statistical assumptions than the first edition. The third edition continues to improve the presentation of relevant statistical inference procedures (e.g., effect sizes, confidence intervals, fit indices, graphs, and statistical tests of assumptions). Recent material on open science, replication, and reproducibility is also discussed. Further, the final, integrating Chapter 15 was expanded to include the new material from the two added structural equation modeling chapters (i.e., MSA and LGM). Previous readers have commented on how helpful this final chapter has been in pulling together the main highlights of all of the methods. The final chapter summary tables were expanded to incorporate the new methods, as well as updated information.
7 Readers can check out the website that includes information on the R, SAS, and SPSS code and output for the multivariate chapter examples (https://sites.google.com/site/multivariatethirdedition).
The book should be useful in illuminating basic and structural equation modeling multivariate methods often covered in a statistics course for first-year graduate students or advanced undergraduates. Awareness of common themes helps bring greater understanding of the basic concepts essential to multivariate thinking. To keep a conceptual focus, formulas are still kept to a minimum, such that the book does not require knowledge of advanced mathematical methods beyond basic algebra and finite mathematics. Finally, the pervasive use of R will be instructive to those wishing to learn how to apply this universally available, although sometimes seemingly daunting, programming language. If you are like me, you might initially find R challenging, although quickly compelling and almost downright addictive!

References

Harlow, L. L. (2005). The essence of multivariate thinking: Basic themes and methods. Erlbaum.
Harlow, L. L. (2014). The essence of multivariate thinking: Basic themes and methods (2nd ed.). Routledge.
Hox, J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel analysis: Techniques and applications (3rd ed.). Routledge.
Linnan, L. A., Emmons, K. M., Klar, N., Fava, J. L., Laforge, R. G., & Abrams, D. B. (2002). Challenges to improving the impact of worksite cancer prevention programs: Comparing reach, enrollment, and attrition using active versus passive recruitment strategies. Annals of Behavioral Medicine, 24(2), 157–166. https://doi.org/10.1207/S15324796ABM2402_13
Prochaska, J. O., Velicer, W. F., Rossi, J. S., Redding, C. A., Greene, G. W., Rossi, S. R., Sun, X., Fava, J. L., Laforge, R., & Plummer, R. A. (2004). Multiple risk expert systems interventions: Impact of simultaneous stage-matched expert system interventions for smoking, high-fat diet, and sun exposure in a population of parents. Health Psychology, 23(5), 503–516. https://doi.org/10.1037/0278-6133.23.5.503
Prochaska, J. O., Velicer, W. F., Redding, C., Rossi, J. S., Goldstein, M., DePue, J., Greene, G. W., Rossi, S. R., Sun, X., Fava, J. L., Laforge, R., Rakowski, W., & Plummer, B. A. (2005). Stage-based expert systems to guide a population of primary care patients to quit smoking, eat healthier, prevent skin cancer, and receive regular mammograms. Preventive Medicine, 41(2), 406–416. https://doi.org/10.1016/j.ypmed.2004.09.050
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Sage.
Part I
Overview
1 Introduction and Multivariate Themes
1.1 What is Multivariate Thinking?

Amidst ongoing questioning and ever-widening perspectives across disciplines and peoples, whether in the realm of science or the broader sphere of life, we are trying to understand the underlying truth in the seeming quagmire of observable reality. Kaku (2009), a theoretical physicist, argues that there may be a set of simple rules in nature “but the applications of those rules may be inexhaustible. Our goal is to find the rules” (p. 302). How can we uncover these essential truths? Hayes (2011) writes that by breaking down stimuli into small segments and noticing points of contrast and similarity, the most salient aspects are revealed. He summarizes this process by stating that “the aim is to explore the kinds of patterns that appear frequently in our environment, in the hope of identifying and understanding some characteristic themes or features” (p. 422). In a similar vein, Hoffmann (2011) suggests that scientific curiosity is fed by having a great deal of background knowledge about a phenomenon, and then noticing anomalies, developing intuitions, and finding connections. Huff (2011) agrees, speaking of how engaging curiosity and overarching synthesis lead to scientific discovery. I would like to argue that the search for simplicity, truth, and latent order becomes much more attainable when approached with a mindset of multivariate thinking, which involves noticing similarities, differences, and common themes while examining data from an overarching macro-level as well as a specific micro-level perspective.

Multivariate thinking is defined as a body of thought processes that illuminate inter-relatedness between and within sets of variables. The essence of multivariate thinking as portrayed in this book strives to reveal the inherent structure and uncover the meaning within these sets of variables through the application and interpretation of various multivariate statistical methods with real-world data. The multivariate methods we examine are a set of tools for analyzing multiple variables in an integrated and powerful way. The methods allow the examination of richer and more realistic designs than can be assessed with traditional univariate methods, which analyze only one outcome variable and usually just one or two independent variables (IVs). Compared to univariate methods, multivariate methods allow us to analyze a complex array of variables, providing greater assurance that we can come to some synthesizing conclusions with less error and more validity than if we were to analyze variables in isolation. Multivariate knowledge offers greater flexibility and options for analyses that extend and enrich other statistical methods with which we have some familiarity. Ultimately, a study of multivariate thinking and methods encourages coherence and integration in research that can motivate policy and practice. For those interested in other approaches to multivariate methods, Tabachnick and Fidell (2019) and Pituch and Stevens (2016) are both excellent.
With a preliminary understanding of what is meant by multivariate thinking, it is useful to itemize several benefits and drawbacks of studying multivariate methods.
1.2 Benefits

Several benefits can be derived from understanding and using multivariate methods.

a. First, our thinking is stretched to embrace a larger context in which we can envision more complex and realistic theories and models than could be rendered with univariate methods. Knowledge of multivariate methods provides a structure and order with which to approach research, demystifying the aura of secrecy and laying bare the essence, as most phenomena of interest to researchers are elaborate, involving several possible variables and patterns of relationship. We gain insight into methods that were previously perceived as abstract and incomprehensible by increasing our understanding of multivariate statistical terminology. The knowledge builds on itself, providing increased understanding of statistical methodology. Thus, multivariate thinking offers intellectual exercise that expands our sense of knowing and discourages isolated, narrow perspectives. It helps sort out the seeming mystery in a research area, providing a large set of real-world approaches for analysis to explain variability in a non-constant world. In this regard, check out Rodgers (2010), who does a beautiful job explaining the wonders of statistical modeling to gain an overarching understanding in our research.
b. Second, a thorough grounding in multivariate thinking helps us in understanding others’ research, giving us a richer understanding when reading the literature. By studying basic features and applications of these statistical tools, we become better consumers of research, achieving greater comprehension of particular findings and their implications. Several students have reported that whereas they had previously just scanned the abstracts and possibly the introduction and discussion sections of research articles, studying multivariate methods gave them the intellectual curiosity and the know-how to venture into the methods and results sections. Reading the statistical portion of research articles provides greater enjoyment when reading the literature and opens up a whole world replete with multiple approaches that can be applied to a research area. After continued exposure and experience with the many ways to apply multivariate methods, we can begin to develop a more realistic and critical view of others’ research and gain more clarity on the merits of a body of research. Even if we never choose to conduct our own analyses, knowledge of multivariate methods opens our eyes to a wider body of research than would be possible with only univariate methods of study.
c. Third, multivariate thinking helps expand our capabilities by informing application in our own research. We are encouraged to consider multiple methods for our research, and the methods needed to perform research are more fully understood. An understanding of multivariate methods increases our ability to evaluate complex, real-world phenomena and encourages ideas on how to apply rigorous methods to our own research. Widening our lens to see more and own more information regarding research, we are encouraged to think in terms that lead to asking deeper, clearer, and richer questions. With this broadened perspective, we are able to see the connection between theory and statistical methods and to potentially inform theory development. Empirically, a background in multivariate methods allows us to crystallize theory into testable hypotheses and to provide empirical support for our observations. Thus, it can increase the credibility of our research and help us add to existing literature by informing an area with our unique input. We are also offered greater responsibility and challenged to contribute to research and scholarly discourse in general, not exclusively in our own area of interest.
d. Fourth, multivariate thinking enables researchers to examine large sets of variables in an encompassing and integrated analysis, thereby controlling for overall error rate and also taking correlations among variables into account. This is preferred to conducting a large number of univariate analyses that would increase the probability of making an incorrect decision while falsely assuming that each analysis is orthogonal. More variables can also be analyzed within a single multivariate test, thereby reducing the risk of Type I errors (rejecting the null hypothesis too easily), which can be thought of as liberal, assertive, and exploratory (Mulaik, Raju, & Harshman, 1997). We can also reduce Type II errors (retaining the null hypothesis too easily), also described as conservative, cautious, and confirmatory (Abelson, 1995). Analyzing more variables in a single analysis also minimizes the amount of unexplained or random error while maximizing the amount of explained systematic variance. This provides a much more realistic and rigorous framework for analyzing our data than with univariate methods.
e. Fifth, multivariate thinking reveals several assessment indices to see if the overall or macro-analysis, as well as the specific part or micro-analysis, are behaving as expected. These overall and specific aspects encompass both omnibus (e.g., F-test) and specific (e.g., Tukey) tests of significance, along with associated effect sizes (e.g., R-squared and Cohen’s d). Acknowledging the need for more balanced thought about significance testing (e.g., Alger, 2022; Amrhein, Greenland, & McShane, 2019; Anderson, 2020; Benjamin et al., 2018; Giofrè, Boedker, Cumming, Rivella, & Tressoldi, 2022; Hardwicke et al., 2022; Kepes, Keener, McDaniel, & Hartman, 2022; Kline, 2013; Lakens, 2022; Schwab et al., 2022), I agree with recommendations for fuller discussion of findings that includes supplemental information such as effect sizes and confidence intervals (e.g., Calin-Jageman, 2022; Cumming, 2012; Grissom & Kim, 2012; Kelley & Preacher, 2012). Later in this chapter we’ll discuss the topic of macro- and micro-assessment in greater detail to help interpret findings from multivariate analyses.
f. Finally, multivariate participation in the research process encourages more positive attitudes towards statistics in general. Active involvement increases our confidence in critiquing others’ research and our enthusiasm for applying methods to our own research. A greater feeling of empowerment occurs, with less anticipatory anxiety when approaching statistics and research. We may well find ourselves asking more complex research questions with greater assurance, thereby increasing our own understanding. All of this should help us to feel more comfortable articulating multiple ideas in an intelligent manner and engage less in doubting our own capabilities with statistics and research. This is consequential because the large quantity of available multivariate information could intimidate many who would rather not delve into it without coaxing. However, my experience has been that more exposure to the capabilities and applications of multivariate methods empowers us to pursue greater understanding and hopefully provide greater contributions to the body of scientific knowledge.
1.3 Drawbacks Due to the size and complexity of most multivariate designs, several drawbacks may be evident. I present three drawbacks that could emerge when thinking about multivariate methods, and end with two additional drawbacks that are more tongue-in-cheek perceptions that could result:
6
Overview
a
First, statistical assumptions (e.g., normality, linearity, and homoscedasticity) common to the general linear model must be met for most multivariate methods, and less is known about the robustness of these to violations, compared to univariate methods. More is said about assumptions in the section on Inferential Statistics in Chapter 2. Second, many more participants are usually needed to adequately test a multivariate design, compared to smaller univariate studies. One guideline suggests having at least five to 10 participants per variable or per parameter, though as many as 20 to 50 partici pants per variable or parameter may be necessary when assumptions are not met (e.g., Tabachnick & Fidell, 2019). Others (e.g., Comrey & Lee, 1992) recommend having a sample size of 200–500, with smaller sample sizes allowed when there are large effect sizes (e.g., Green, 1991; Guadagnoli & Velicer, 1988). Third, interpretation of results from a multivariate analysis may be difficult due to having several layers to examine. With multivariate methods, we can often examine:
b
c
i The overall significance to assess the probability that results were due to chance; ii The main independent variables that are contributing to the analysis; iii The nature of the dependent variable(s) showing significance; and iv The specific pattern of the relationship between relevant independent and depen dent variables. d
e
Fourth, some researchers speculate that multivariate methods are too complex to take the time to learn; an inaccurate perception because the basic themes are clear and reoccurring, as we will shortly see. Fifth, after immersing ourselves in multivariate thinking, it could become increasingly difficult to justify constructing or analyzing a narrow and unrealistic research study. We might even find ourselves thinking from a much wider and global perspective.
1.4 Context for Multivariate Thinking The main focus of learning and education is knowledge consumption and development in which we are taught about the order that others have uncovered and learn methods to seek our own vision of order. During our early years, we are largely consumers of others’ knowledge, learning from experts about what is important and how it can be understood. As we develop in our education, we move more into knowledge development and generation, which is explored and fine-tuned through the practice of scientific research. The learning curve for research can be very slow, though both interest and expertise increase with exposure and involvement. After a certain point, one that widely varies depending on individual interests and instruction the entire process of research clicks and becomes unbe lievably compelling. We become hooked, getting a natural high from the process of discovery, creation, and verification of scientific knowledge. I personally believe that all of us are latent scientists of sorts, if only at an informal level. We each go about making hypotheses about everyday events and situations, based on more or less formal theories. We then collect evidence for or against these hypotheses and make conclusions and future predictions based on our find ings. When this process is formalized and validated in well-supported environments, the oppor tunity for a major contribution by a well-informed individual is made much more likely. Further, this is accompanied by a deeply felt sense of satisfaction and reward. That has certainly been my experience. Both knowledge consuming and generating endeavors, particularly in the social and behavioral sciences, are greatly enhanced by the study of multivariate thinking. One of our
Introduction and Multivariate Themes
7
Table 1.1 Summary of the Definition, Benefits, Drawbacks, and Context for Multivariate Methods 1. Definition 2. Benefits
Set of tools for identifying relationships among multiple variables
a b c d e f
Stretch thinking to embrace a larger context Help in understanding others’ research Expand capabilities with our own research Examine large sets of variables in a single analysis Provide several macro- and micro-assessment indices Engender more positive attitudes towards statistics in general
3. Drawbacks
a b c d e
Less is known about robustness of multivariate assumptions Larger sample sizes are needed Results are sometimes more complex to interpret Methods may be challenging to learn Broader focus requires more expansive thinking
4. Context
a b
Knowledge consumption of others’ research Knowledge generation from one’s own research
roles as multivariate social-behavioral scientists is to attempt to synthesize and integrate our understanding and knowledge in an area. Piecemeal strands of information are useful only to the extent that they eventually get combined to allow a larger, more interwoven fabric of comprehension to emerge. For example, isolated symptoms are of little value in helping an ailing patient unless a physician can integrate them into a well-reasoned diagnosis. Multi variate thinking helps us in this venture and allows us to clearly specify our understanding of a behavioral process or social phenomenon. Table 1.1 presents a summary of the definition, benefits, drawbacks, and context for multivariate methods. Multivariate Themes Next, we gain more specificity by taking note of various themes that run through all of multivariate thinking. Quantitative methods have long been heralded for their ability to synthesize the basic meaning in a body of knowledge. Aristotle emphasized meaning through the notion of “definition” as the set of necessary and sufficient properties that allowed an unfolding of understanding about concrete or abstract phenomena; Plato thought of essence or meaning as the basic form (see Lakoff & Núñez, 2000). Providing insight into central meaning is at the heart of most mathematics, which uses axioms and categorical forms to define the nature of specific mathematical systems. In this chapter, I focus on the delineation of themes that reoccur within statistics, parti cularly with multivariate methods, in the hope of making conscious and apprehensible the core tenets, if not axioms, of multivariate thinking.
1.5 Central Themes As with basic descriptive and inferential statistics, multivariate methods help us understand and quantify how variables (co-)vary. Multivariate methods provide a set of tools to analyze how scores from several variables covary, whether through group differences, correlations, or underlying dimensions in order to explain systematic variance over and above random error variance. Thus, we are trying to explain or make sense of the variance in a set of variables with as little random error variance as possible. Multivariate methods intimately
8
Overview
involve the concepts of variance, covariance, and ratios of these (co-) variances. We will also examine the theme of creating linear combinations of the variables since this is central to most multivariate methods, as well as latent variables used in some methods. Variance Variance is the average of the squared difference between a set of scores and their mean. Variance is what we usually want to analyze with any statistic. When a variable has a large variance, sample scores tend to be very different, having a wide range. It’s useful to try to predict how scores vary, to find other variables that help explain the variation. Statistical methods help identify systematic, explained variance, acknowledging that there will most likely be a portion of unknowable and random or error variance. The goal of most statistics is to try to explain how scores vary so that we can predict or understand them better. Var iance is an important theme particularly in multivariate thinking, and can be analyzed in several ways, as we shall see later in this chapter. Covariance Covariance is the average of the product of the differences between one variable and its mean and a second variable and its mean. Covariance or its standardized form, correlation, depicts the existence of a linear relationship between two or more variables. When variables rise and fall together (e.g., study time and Grade Point Average), they positively covary or co-relate. If scores vary in opposite directions (e.g., greater practice is associated with a lower golf score), negative covariance occurs. The theme of covariation is fundamental to multivariate methods since we are interested in whether a set of variables tend to co-occur together, indicating a strong relationship. Multivariate methods most often assess covariance by assessing the relationship among variables while also taking into account the covariation among other variables included in the analysis. Thus, multivariate methods allow a more informed test of the relationships among variables than can be analyzed with univariate methods that expect separate or orthogonal relationships with other variables. Ratio of (Co-) Variances Many methods examine a ratio of how much (co-) variance there is between variables or groups, relative to how much variance there is within variables or within groups. When the between information is large relative to the within information, we usually conclude that the results are significantly different from those that could be found based on chance. The reason for this is that when there are greater differences across domains than there are within domains, whether from different variables or different groups, there is some indica tion of systematic shared or associated variance that is not just attributable to random error. It’s useful to see how correlation and ANOVA, two central, univariate statistical methods, embody a ratio of variances. We can then extend this thinking to multivariate methods. Correlation shows a ratio of covariance between variables over variance within variables. When the covariance between variables is almost as large as the variance within either variable, this indicates a stronger relationship between variables. Thus, a large correlation indicates that much of the variance within each variable is shared or covaries between the variables. With group-difference statistics (e.g., ANOVA), we often form an F-ratio of how much the group means vary relative to how much variance there is within each group. 
Ratio of (Co-)Variances

Many methods examine a ratio of how much (co-)variance there is between variables or groups, relative to how much variance there is within variables or within groups. When the between information is large relative to the within information, we usually conclude that the results are significantly different from those that could be found based on chance. The reason is that when there are greater differences across domains, whether from different variables or different groups, than within them, there is some indication of systematic, shared, or associated variance that is not just attributable to random error. It is useful to see how correlation and ANOVA, two central univariate statistical methods, embody a ratio of variances; we can then extend this thinking to multivariate methods. Correlation shows a ratio of covariance between variables over variance within variables. When the covariance between variables is almost as large as the variance within either variable, this indicates a stronger relationship between the variables. Thus, a large correlation indicates that much of the variance within each variable is shared, or covaries, between the variables. With group-difference statistics (e.g., ANOVA), we often form an F-ratio of how much the group means vary relative to how much variance there is within each group. When the means are much more different between groups (i.e., large variance between groups) than
the scores are within each group (i.e., smaller variance within groups), we have evidence of a relationship between the grouping (e.g., categorical, independent) and outcome (e.g., continuous, dependent) variables. Here, a large F-ratio indicates significant group-difference variance. These ratios, whether correlational or ANOVA-based, are also found in multivariate methods. In fact, just about every statistical significance test is based on some kind of ratio of variances or covariances. Knowing this fact, and understanding the nature of the ratio for each analysis, helps us make more sense of our statistical results, whether from univariate or multivariate methods.

Linear Combinations

A basic theme throughout most multivariate methods is that of finding the relationship between two or more sets of variables. This is usually accomplished by forming linear combinations of the variables in each set: additive composites that maximize the amount of variance drawn from the variables. A simple example of a linear combination is the course grade received in many classes. The grade, call it Y, is formed from a weighted combination of various scores. Thus, a course grading scheme of Y = 0.25 (Homework) + 0.25 (Midterm Exam) + 0.30 (Final Exam) + 0.20 (Project) is a linear combination showing the weights attached to the four course requirements. These linear combination scores are then analyzed, summarizing the many variables in a simple, concise form. With multivariate methods, we are often trying to assess the relationship between sets of variables, which is often the shared variance between linear combinations of those variables.
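As a quick illustration, here is a small R sketch (with made-up scores) that forms the course-grade linear combination just described:

# Hypothetical component scores for one student, each on a 0-100 scale
homework <- 88; midterm <- 75; final <- 82; project <- 90

# Weighted additive composite, matching Y = 0.25(HW) + 0.25(Mid) + 0.30(Final) + 0.20(Project)
Y <- 0.25 * homework + 0.25 * midterm + 0.30 * final + 0.20 * project
Y   # 83.35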
Several multivariate methods analyze different kinds of linear combinations.

Components. A component is a linear combination of variables that maximizes the variance extracted from the original set of variables. The use of components or linear combinations synthesizes information by redistributing most of the variance from a larger set of variables, usually into a smaller set of summary scores (e.g., discriminant scores used to examine the relationships among categorical grouping variable(s) and continuous, measured variables).

Factors. We have just seen how linear combinations can be thought of as dimensions that summarize the essence of a set of variables. If we are conducting a factor analysis, whether exploratory or confirmatory, we refer to these dimensions as factors. Factors differ from linear combinations in that the former are latent dimensions that have separated common, shared variance among the variables from any unique or measurement error variance within the variables. Thus, a factor is sometimes believed to represent the underlying true dimension in a set of variables, after removing the portion of variance in the variables that is not common to the others (i.e., the unique or error portion).

Constructs. The core of what researchers want to measure is represented by constructs. In its simplest form, a construct can be represented by a single item, although it is probably more fully and reliably measured as a latent variable with several items or composite scores (see Chapters 9 to 14).

Summary of Central Themes

In discussing central themes, we saw the pivotal role of variances, covariances, and ratios of these, particularly in multivariate statistics. Ultimately, we want to explain how variables vary and covary, and we often do so by examining a ratio of variances or covariances. The ratio informs us of the proportion of explained variance, which is often used as an indication of effect size. We also considered the concept of a linear combination that incorporates
much of the variance from several variables. Several multivariate methods use linear combinations to summarize information in sets of variables. Depending on the method, linear combinations are referred to with different terms (e.g., components, factors, and discriminant functions). Regardless of the label, multivariate linear combinations synthesize information from a larger set of variables to make analyses more manageable and comprehensible. Latent variables, similarly, draw on several items or composite scores. Now we turn to ways to evaluate and interpret results from multivariate methods, using both a macro- and a micro-assessment focus.
1.6 Interpretation Themes

When interpreting results to assess whether an analysis is successful, we should evaluate from several perspectives. Most statistical procedures, whether univariate or multivariate, allow a macro-assessment of how well the overall analysis explains the variance among pertinent variables. It is also important to focus on a micro-assessment of the specific aspects of the multivariate results. In keeping with concern about the exclusive use of significance testing (e.g., Abelson, 1997; APA, 2020; Cumming, 2012; Harlow, Mulaik, & Steiger, 2016; Kline, 2013), I advocate that each result be evaluated with a significance test, effect size, and confidence interval whenever possible. That is, it is helpful to know whether a finding is significantly different from chance, the magnitude of the effect, and how certain we are of it.

Macro-Assessment

The first way to evaluate an analysis is at a global or macro-level, which usually involves a significance test and some synthesis of the variance in a multivariate dataset. A macro-summary usually depicts whether there is significant covariation or mean difference in the data, relative to how much variation there is among scores within specific groups or variables.

Macro-Level Significance Test. A significance test, along with an accompanying probability or p-value, is usually the first step of macro-assessment in a multivariate design (APA, 2020). Significance tests tell us whether our empirical results are likely to be due to random chance or not. It is useful to be able to rule out, with some degree of certainty, an accidental or anomalous finding. Of course, we always risk making an error no matter what our decision. When we accept our finding too easily, we could be guilty of a Type I error, saying that our research had veracity when in fact it was a random finding. When we are too cautious about accepting our findings, we may be committing a Type II error, saying that we have no significant findings when in fact we do. Significance tests help us to make probabilistic decisions about our results within an acceptable margin of error, usually set at 1% to 5%. We would like to say that we have more reason to believe that our results are true than that they are not. Significance tests give us some assurance in this regard and are essential when we have imperfect knowledge or a lack of certainty about an area. We can help rule out false starts and begin to accrue a growing knowledge base with these tests (Mulaik, Raju, & Harshman, 1997). Most univariate and multivariate significance tests involve a ratio of (co-)variances. For example, group-difference methods tend to use an F-test, which is a ratio of the variance between means over the variance within scores, to assess whether observed differences are significantly different from what we would expect based on chance alone. Correlational methods can also make use of an F-test to assess whether the covariance among variables is
large relative to the variance within variables. When the value of an F-test is significantly large (as determined by consulting appropriate statistical tables), we can conclude that there is sufficient evidence of relationships occurring at the macro-level. This would suggest that there is a goodness of approximation between the model and the data (McDonald, 1997). When this occurs, it is helpful to quantify the extent of the relationship, usually with effect sizes.

Macro-Level Effect Size. Effect sizes (ESs) provide an indication of the magnitude of our findings at an overall level. They are a useful supplement to the results of a significance test (APA, 2020). Quite often, particularly for multivariate analyses, an ES takes the form of a proportion of shared variance between the independent and dependent variables. Guidelines for multivariate shared variance are 0.02, 0.13, and 0.26 for small, medium, and large ESs, respectively (Cohen, 1992). Although much more can be said about effect sizes (see Cohen, 1988), we will focus largely on those that involve a measure of shared variance.

Shared Variance. Shared variance is a common theme throughout most statistical methods and often forms the basis of an ES. We are always trying to understand how to explain the extent to which scores vary or covary. Most often this involves two sets of variables, such that the focus is on how much variance is shared between the two sets (e.g., a set of IVs and a set of DVs, or a set of components or factors and a set of measured variables). With multivariate methods, one of the main ways of summarizing the essence of shared variance is with squared multiple correlations. Indices of shared variance, or squared multiple correlations, can inform us of the strength of relationship or effect size (e.g., Cohen, 1988). The squared multiple correlation, R², indicates the amount of shared variance between the variables. It is useful in providing a single number that conveys how much the scores from a set of variables (co-)vary in the same way (i.e., rise and fall together), relative to how much the scores within each variable differ among themselves. A large R² value (e.g., 0.26 or greater) indicates that the participants' responses on a multivariate set of variables tend to behave similarly, such that a common or shared phenomenon may be occurring among the variables. Many statistical methods make use of the concept of R² or shared variance. Pearson's correlation coefficient, r, is an index of the strength of relationship between two variables. Squaring this correlation yields R², sometimes referred to as a coefficient of determination, which indicates how much overlapping variance is shared between two variables. In Appendix B we'll see how shared variance across linear combinations can be summarized with functions of eigenvalues, traces, or determinants, each of which is described in the discussion of matrix notation and calculations.

Residual or Error Variance. Leftover variance is another consideration when interpreting central themes. Many statistical procedures benefit from assessing the amount of residual or error variance in an analysis. In prediction methods, we often want to examine prediction error variance (i.e., 1 minus R²), which is how much variation in the outcome variable was not explained by the predictors.
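As a small illustration, here is a hedged R sketch, using simulated (made-up) predictor and outcome scores, showing how R² and the prediction error variance (1 minus R²) fall out of a regression:

# Simulated data for illustration only
set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- 0.5 * x1 + 0.3 * x2 + rnorm(100)

fit <- lm(y ~ x1 + x2)
r2  <- summary(fit)$r.squared   # proportion of shared (explained) variance
r2                              # macro-level effect size
1 - r2                          # residual (unexplained) error variance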
In other multivariate methods we can get an indication of the residuals by subtracting eta-squared (i.e., η²: the ratio of between-groups variance over total variance) from one; this leftover proportion is known as Wilks' (1932) lambda.
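For instance, in a one-way design, η² and this residual proportion can be computed directly from the ANOVA sums of squares; a minimal R sketch with simulated groups (illustration only):

# Simulated one-way design: three groups of 30
set.seed(2)
g <- factor(rep(c("A", "B", "C"), each = 30))
y <- c(rnorm(30, 0), rnorm(30, 0.5), rnorm(30, 1))

ss   <- summary(aov(y ~ g))[[1]][["Sum Sq"]]   # between- and within-groups sums of squares
eta2 <- ss[1] / sum(ss)                        # between variance over total variance
eta2
1 - eta2                                       # residual proportion (Wilks' lambda in this simple case)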
Confidence Intervals. When presenting effect sizes, it is important to recognize the degree of uncertainty, or conversely, the degree of precision, for what is actually an estimate of some population value. Confidence intervals (CIs) provide a useful way to express the possible range of values that could occur if an estimate of an effect were obtained in many samples of the same size. A CI is usually formed by multiplying a standard error by a critical value (e.g., t, z) and then both subtracting and adding this product to an effect size. The result is a set of lower and upper limits that provides a reasonable guess as to the range of values within which the effect size is expected to occur most of the time. If alpha is set to 0.05, then we can create a 95% CI, within which we can expect to find the population estimate 95% of the time. Varying CIs can be formed: for example, a 99% CI provides a wider range with less precision but more certainty, whereas a 90% CI provides a narrower range with more precision but less certainty. Conventional computer programs do not always provide readily available CIs for macro-level ESs such as a shared variance term, but see Soper (2006–2022) for a handy online tool for calculating a CI for R-squared values.

Micro-Assessment

After finding significant results and a meaningful ES at a macro-level, it is then useful to examine results at a more specific, micro-level. Micro-assessment involves examining specific facets of an analysis (e.g., means, weights) to determine what is contributing to the overall relationship. In micro-assessment we ask whether there are specific coefficients or values that can shed light on which aspects of the system are working and which are not.

Means. With group-difference methods, micro-assessment entails an examination of the differences between means. We can do this by simply presenting a descriptive summary of the means and standard deviations for the variables across groups, and possibly graphing them to allow visual examination of any trends. Other methods can also be used to assess mean differences.

Standardized Effect Size for Means. Cohen's d is a useful micro-level ES, formed from the ratio of the difference between means over the (pooled) standard deviation. Cohen (1988) suggests that a small ES indicates a mean difference of about a fifth (i.e., 0.20) of a standard deviation, a medium ES a difference of half (i.e., 0.50) of a standard deviation, and a large ES almost a whole (i.e., 0.80) standard deviation difference between means.
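To make the standardized effect size and its interval estimate concrete, here is a minimal R sketch with simulated scores for two groups; t.test() supplies the CI for the raw mean difference, and d is computed from the pooled standard deviation:

# Simulated scores for two groups of 40 (illustration only)
set.seed(3)
g1 <- rnorm(40, mean = 10, sd = 2)
g2 <- rnorm(40, mean = 11, sd = 2)

t.test(g1, g2)$conf.int                        # 95% CI for the raw mean difference

sd_pooled <- sqrt((39 * var(g1) + 39 * var(g2)) / 78)   # pooled SD, df = n1 + n2 - 2
d <- (mean(g2) - mean(g1)) / sd_pooled         # Cohen's d: mean difference in SD units
d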
Bonferroni Comparisons. Simply conducting a series of t-tests between pairs of groups allows us to evaluate the significance of differences between group means. A Bonferroni approach establishes a set alpha level (e.g., p = 0.05) and then distributes the available alpha over the number of paired comparisons. Thus, if there were four comparisons and a desired alpha level of 0.05, each comparison of means could be evaluated with an equal p-value of 0.0125. Of course, the alpha can also be unevenly split (e.g., 0.02 for one comparison and 0.01 for each of three other comparisons between means).

Planned Comparisons. We could also see which pairs of means were statistically different using a multiple comparison method such as Tukey's (1953) Honestly Significant Difference (HSD) approach, which builds in control of Type I error. Ideally, these micro-level tests should directly follow from our hypotheses, rather than simply testing for any possible difference.

Fisher's Protected Tests. The practice of conducting an overall significance test and then following up with individual t-tests for pairs of means is referred to as a Fisher protected t-test (e.g., Carmer & Swanson, 1973). Whereas Bonferroni and planned comparison approaches are conservative and may not provide enough power to detect a meaningful effect, Fisher's protected approach is often preferred due to its ability to control experiment-wise Type I error and still find individual differences between pairs of means (Cohen, Cohen, West, & Aiken, 2003).

Weights. With correlational or dimensional methods, we often want to examine weights that indicate how much a specific variable is contributing to an analysis. In Multiple
Regression (MR), we examine least squares regression weights that tell us how much a predictor variable covaries with an outcome variable after taking into account the relationships with the other predictor variables. In an unstandardized metric, the weight represents the change in an outcome variable that can be expected when a predictor variable is changed by one unit. In a standardized metric, the regression weight, often referred to as a beta weight, gives the number of standard deviation units that the outcome will change when the value of a predictor increases by one standard deviation, after controlling for the other predictor variables in the equation. Thus, the weight provides an indication of the unique importance of a predictor for an outcome variable. In other multivariate methods the unstandardized weights are actually eigenvector weights (see Appendix B on matrix notation) used in forming linear combinations of the variables. These are often standardized to provide an interpretation similar to that of the standardized weight in MR. In factor analysis (FA), weights are also examined; these can indicate the amount of relationship (i.e., a factor loading) between a variable and an underlying dimension. In each of these instances, the weight informs us of how much a specific variable relates to some aspect of the analysis. A loading, weight, or correlation can also be used as a micro-level ES. Guidelines for interpreting small, medium, and large micro-level correlational ESs are 0.10, 0.30, and 0.50, respectively (Cohen, 1992).

Summary of Interpretation Themes

For most multivariate methods, there are two themes that help us with interpreting our results. The first of these, macro-assessment, focuses on whether the overall analysis is significant, the magnitude of the effect size, and the width of a confidence interval. In addition, we often want to examine residuals to assess the unexplained or error variance. The magnitude and nature of error variance inform us of where, and how much, the multivariate method is failing to explain the data. Second, micro-assessment focuses on the specific aspects of an analysis that are important in explaining the overall relationship. These micro-aspects can be means, particularly with group-difference methods, or weights, particularly with correlational or prediction methods.

Summary of Multivariate Themes

In the realm of methodology, the multivariate themes covered to this point (Table 1.2) permeate and come together under the effective tools of multivariate statistics to inform us of the fundamental nature of what is being studied and how it can be explained. These themes will be featured in, and help you make sense of, each of the methods discussed in this book. I hope to convince you that multivariate statistics is not a set of isolated, complex, and abstract tools, but rather a set of connected, understandable, and illuminating procedures that lay bare the structure of the data and allow meaning into its essential nuggets of truth. Stick around and see what you think.

Table 1.2 Summary of Multivariate Themes

1. Central Themes (All multivariate methods focus on these central themes):
   Variance
   Covariance
   Ratios of variance (and covariance)
   Linear combinations

2. Interpretation Themes (All multivariate methods summarize at big-picture and specific levels):
   Macro-assessment (e.g., significance test and effect size with confidence interval)
   Micro-assessment (e.g., examining means or weights)
References

Abelson, R. P. (1995). Statistics as principled argument. Erlbaum.
Abelson, R. P. (1997). The surprising longevity of flogged horses: Why there is a case for the significance test. Psychological Science, 8, 12–15.
Alger, B. E. (2022). Neuroscience needs to test both statistical and scientific hypotheses. Journal of Neuroscience, 42(45), 8432–8438. https://doi.org/10.1523/JNEUROSCI.1134-22.2022
American Psychological Association (APA). (2020). Publication manual of the American Psychological Association (7th ed.). Washington, DC: Author.
Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567(7748), 305–307. doi:10.1038/d41586-019-00857-9
Anderson, S. F. (2020). Misinterpreting p: The discrepancy between p values and the probability the null hypothesis is true, the influence of multiple testing, and implications for the replication crisis. Psychological Methods, 25(5), 596–609. https://doi.org/10.1037/met0000248
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6–10. https://doi.org/10.1038/s41562-017-0189
Calin-Jageman, R. J. (2022). Better inference in neuroscience: Test less, estimate more. Journal of Neuroscience, 42(45), 8427–8431. https://doi.org/10.1523/JNEUROSCI.1133-22.2022
Carmer, S. G., & Swanson, M. R. (1973). An evaluation of ten pairwise multiple comparison procedures by Monte Carlo methods. Journal of the American Statistical Association, 68, 66–74.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Academic Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Erlbaum.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Erlbaum.
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.
Giofrè, D., Boedker, I., Cumming, G., Rivella, C., & Tressoldi, P. (2022). The influence of journal submission guidelines on authors' reporting of statistics and use of open research practices: Five years later. Behavior Research Methods. https://doi.org/10.3758/s13428-022-01993-3
Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 449–510.
Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). Routledge.
Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103, 265–275.
Hardwicke, T. E., Salholz-Hillel, M., Malički, M., Szűcs, D., Bendixen, T., & Ioannidis, J. P. A. (2022). Statistical guidance to authors at top-ranked journals across scientific disciplines. The American Statistician. Advance online publication, November 8, 2022. doi:10.1080/00031305.2022.2143897
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (2016). What if there were no significance tests? (Classic ed.). Erlbaum.
Hayes, B. (2011). Making sense of the world. American Scientist, 99, 420–422.
Hoffmann, R. (2011). That's interesting. American Scientist, 99, 374–377.
Huff, T. E. (2011). Intellectual curiosity and the scientific revolution: A global perspective. Cambridge University Press.
Kaku, M. (2009). Physics of the impossible. Anchor Books.
Kelley, K., & Preacher, K. J. (2012). On effect sizes. Psychological Methods, 17, 137–152.
Kepes, S., Keener, S. K., McDaniel, M. A., & Hartman, N. S. (2022). Questionable research practices among researchers in the most research-productive management programs. Journal of Organizational Behavior. Published March 8, 2022. https://doi.org/10.1002/job.2623
Kline, R. B. (2013). Beyond significance testing: Reforming data analysis methods in behavioral research (2nd ed.). American Psychological Association.
Lakens, D. (2022). (Guest post): Averting journal editors from making fools of themselves. Error Statistics Philosophy. Posted January 5, 2022. https://errorstatistics.com/2022/01/05/lakens-guest-post-averting-journal-editors-from-making-fools-of-themselves/
Lakoff, G., & Núñez, R. E. (2000). Where mathematics comes from: How the embodied mind brings mathematics into being. Basic Books.
McDonald, R. P. (1997). Goodness of approximation in the linear model. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 199–219). Erlbaum.
Mulaik, S. A., Raju, N. S., & Harshman, R. A. (1997). A time and place for significance testing. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 65–115). Erlbaum.
Pituch, K. A., & Stevens, J. P. (2016). Applied multivariate statistics for the social sciences: Analyses with SAS and IBM's SPSS (6th ed.). Routledge.
Rodgers, J. L. (2010). The epistemology of mathematical and statistical modeling: A quiet methodological revolution. American Psychologist, 65, 1–12.
Schwab, S., Janiaud, P., Dayan, M., Amrhein, V., Panczak, R., Palagi, P. M., et al. (2022). Ten simple rules for good research practice. PLoS Computational Biology, 18(6), e1010139. https://doi.org/10.1371/journal.pcbi.1010139
Soper, D. (2006–2022). R-square confidence interval calculator. Retrieved October 29, 2022, from http://www.danielsoper.com/statcalc3/calc.aspx?id=28
Tabachnick, B. G., & Fidell, L. S. (2019). Using multivariate statistics (7th ed.). Pearson.
Tukey, J. W. (1953). The problem of multiple comparisons. Unpublished manuscript, Princeton University (mimeo).
Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24, 471–494.
2 Background Considerations
2.1 Preliminary Considerations before Multivariate Analyses

Before beginning the invigorating job of conducting multivariate analyses, it is important to consider some initial issues. Stay tuned to read more about these considerations, which will go a long way in preparing for any statistical analysis.

Open Science

The topic of open science and transparency in research is of central importance and is receiving extensive discussion in the literature. For example, Nosek and his colleagues (e.g., Morling & Calin-Jageman, 2020; Nosek & Errington, 2020; Nosek & Olson, 2020; Open Science Collaboration, 2015) have developed a number of principles, guidelines, and services to encourage open science methods that are reproducible and integrative (see https://www.cos.io/). Among their suggestions, researchers are encouraged to pre-register their studies and to openly share their procedures, materials, and data. Aczel, Szaszi, Sarafoglou, Kekecs, and Kucharský (2020) offer a checklist to help researchers achieve transparency in their research. Simonsohn and colleagues (e.g., Simonsohn, Nelson, & Simmons, 2014; Simonsohn, Simmons, & Nelson, 2015) discuss ways to identify and avoid p-hacking (i.e., conducting analyses until you find significant p-values; also see Friese & Frankenbach, 2020). Kerr (1998) introduced the concept of HARKing (i.e., hypothesizing after results are known), arguing that this practice was largely responsible for the inability to replicate findings, although Rubin (2022) contends that there is not enough evidence to support the reported detrimental effects of HARKing with respect to replication. Schiavone and Vazire (2022) and Grahe (2021) offer suggestions and summarize a number of these issues, such as the importance of transparency, preregistering studies, replicating findings, and avoiding questionable research practices such as p-hacking and HARKing. Bishop (2019) also has an excellent article on avoiding four major factors that hinder reproducing reliable results: p-hacking, HARKing, publication bias, and conducting a study with low statistical power (e.g., by using small samples, or research that involves very small effects that are difficult to discern). Further, Evans (2022) calls for more normative transparency in research, consistent with what Hagger (2022) refers to as an open science mindset. Bosma and Granger (2022) agree and provide guidelines on practicing open science, arguing that this will lead to more ethically responsible research.

Theory

Before embarking on a research study, it is essential to inquire about meta-frameworks that can provide a structure with which to conduct our research. Are there multiple divergent perspectives to consider? Are any of them more central or salient than the others? Which
offer a more encompassing way to view an area of study, while also providing a basis for strong investigations? Meehl (1997) talks of the need to draw on theory that makes risky predictions that are capable of being highly refuted. Strong theories are much preferred to weak ones that make vague and vacuous propositions. Others agree with Meehl's emphasis on theory. Wilson (1998) speaks of theory in reverent words, stating that "Nothing in science – nothing in life, for that matter – makes sense without theory. It is our nature to put all knowledge into context in order to tell a story, and to re-create the world by this means" (p. 56). Theory provides a coherent theme to help us find meaning and purpose in our research. Wheatley (1994) speaks of the power and coherence of theory in terms of providing an overall meaning and focus in our research. She writes, "As long as we keep purpose in focus … we are able to wander through the realms of chaos … and emerge with a discernible pattern or shape …" (p. 136). Abelson (1995) discusses theory as being able to cull together a wide range of findings into "coherent bundles of results" (p. 14). Thus, a thorough understanding of the theories that are germane to our research will provide purpose and direction in our quest to perceive the pattern of meaning that is present in a set of variables explaining a phenomenon. This level of theoretical understanding makes it more likely that meaningful hypotheses can be posited that are grounded in a coherent structure and framework.

Hypotheses

Upon pondering a number of theories of a specific phenomenon, several hypotheses or predictions will undoubtedly emerge. In our everyday lives we all formulate predictions and hypotheses, however informal. These can be as mundane as a prediction about what will happen during our day, or about how the weather will unfold. In scientific research, we strive to formalize our hypotheses so that they directly follow from well-thought-out theory. The more specific and precise we make our hypotheses, the more likely we are either to refute them or to find useful evidence to corroborate them (Meehl, 1997). Wilson (1998) makes this clear by stating that theoretical tests of hypotheses "are constructed specifically to be blown apart if proved wrong, and if so destined, the sooner the better" (p. 57). Multivariate statistics allows us to formulate multiple hypotheses that can be tested in conjunction. Thus, we should try to formulate several pivotal hypotheses or research questions that allow for rigorous tests of our theories, allowing us to hone and fine-tune our theories or banish them as useless (Wilson, 1998). The testing of these hypotheses is the work of empirical research.

Empirical Studies

Having searched out pertinent theories that lead to strong predictions, it is important to investigate what other researchers have found in our research area. Are there multiple empirical studies that have previously touched on aspects of these theories and predictions? Are there multiple contributions that could be made with new research that would add to the empirical base in this area? Borenstein, Hedges, Higgins, and Rothstein (2021) emphasize the need to accrue results from multiple studies and assess them within a meta-analysis framework. This allows the regularities and consistent ideas to emerge as a larger truth than could be found from single studies. Fletcher (2022) and Howard and Maxwell (2023) reinforce the value of meta-analysis in meeting the goals of replication. Abelson (1995) describes this process as the development of "the lore," whereby "… well-articulated research … is likely to be absorbed and repeated by other investigators" as a collective understanding of a phenomenon (pp. 105–106). No matter the empirical area of interest, a
thorough search of previous research on a topic should illuminate the core constructs that could be viewed as pure or meta-versions of our specific variables of interest. After taking into account meaningful theories, hypotheses, and empirical studies, we are ready to consider how to measure the major constructs we plan to include in our research.

Measurement

When conducting empirical research, it is useful to ask about the nature of measurement for the constructs of interest (Bandalos, 2018; Markus & Borsboom, 2013). Are there several pivotal constructs to be delineated and measured? Are there multiple ways to measure each of these constructs? Are there multiple, different items or variables for each of these measures? Classical Test Theory (e.g., Lord & Novick, 1968) and Item Response Theory (e.g., Embretson & Reise, 2000; McDonald, 2000) emphasize the importance of modeling the nature of an individual's response to a measure and the properties of the measures. Reliability Theory (e.g., Anastasi & Urbina, 1997; Lord & Novick, 1968; McDonald, 1999) emphasizes the need to have multiple items for each scale or subscale we wish to measure. Similarly, statistical analysts conducting principal components or factor analyses emphasize the need for a minimum of three or four variables to anchor each underlying dimension or construct (e.g., Gorsuch, 2015; Velicer & Jackson, 1990). The more variables we use, the more likely it is that we are tapping the true dimension of interest. In everyday terms, this is comparable to realizing that we cannot expect someone else to know who we are if we use only one or two terms to describe ourselves. Certainly, students would agree that if a teacher were to ask just a single exam question to tap all of their knowledge in a topic area, this would hardly begin to do the trick. Multivariate thinking aids us in this regard, by not only encouraging but requiring multiple variables to be examined in conjunction. This makes it much more likely that we will come to a deeper understanding of the phenomenon under study. Having identified several pertinent variables, it is also important to consider whether there are multiple time points across which a set of variables can be analyzed.

Multiple Time Points

Does a phenomenon change over time? Does a certain period of time need to pass before a pattern emerges or takes form? These questions are often important when examining change or stability over time (e.g., Hoffman, 2015). Assessing samples at multiple time points aids us in discerning which variables are most likely the causal agents and which are more the receptive outcomes. If the magnitude of a relationship is always stronger when one variable precedes another in time, there is some evidence that the preceding (independent) variable (IV) may be impacting the other, more dependent, outcome. Having contemplated the possibility of multiple time points, it is important to consider how to build in multiple controls.

Multiple Controls

Perhaps the most veritable way to assure causal inferences is to implement controls within a research design (e.g., Pearl, Glymour, & Jewell, 2016). The three most salient controls involve a test of clear association between variables, evidence of temporal ordering of the variables, and the ability to rule out potential confounds or extraneous variables (e.g., Bullock, Harlow, & Mulaik, 1994). This can be most elegantly achieved with an experimental design that:
1 examines the association between carefully selected reliable variables,
2 manipulates the IV such that one or more groups receive a treatment, whereas at least one group does not, and
3 randomly selects a sufficient number of participants from a relevant population and randomly assigns them to either the treatment or control group.
With this kind of design, there is a greater likelihood that non-spurious relationships will emerge in which the IV can clearly be identified as the causal factor, with potential confounding variables safely ruled out through random selection and assignment (Fisher, 1925). Despite the virtues of an experimental design in ensuring control over one's research, it is often difficult to enact such a design. Variables, particularly those used in the social sciences, cannot always be easily manipulated. For example, I would be loath to experimentally manipulate the amount of substance abuse that is needed to bring about a sense of meaninglessness in life. These kinds of variables are examined more ethically in a quasi-experimental design that tries to systematically rule out relevant confounds (e.g., Shadish, Cook, & Campbell, 2002). These types of designs could include background variables (e.g., income, education, age at first substance abuse, history of substance abuse, history of meaninglessness) or covariates (e.g., network of substance users in one's environment, stressful life events) that could be statistically controlled while examining the perceived relationship between IVs and dependent variables (DVs). Needless to say, it is very difficult to ensure that adequate controls are in place without an experimental design, though the realities of real-world research make it necessary to consider alternative designs. In addition to multiple controls, it is useful to consider collecting data from multiple samples.

Multiple Samples

Are there several pertinent populations or samples from which data could be gathered to empirically study the main constructs and hypotheses? Samples are a subset of entities (e.g., persons) from which we obtain data to statistically analyze. Ideally, samples are randomly drawn from a relevant population, though much research is conducted with convenience samples such as classrooms of students. Another type of sampling, called "purposive," refers to forming a sample that is purposely heterogeneous or typical of the kind of population to which generalization is possible (Shadish, Cook, & Campbell, 2002). When samples are not drawn at random or purposively, it is difficult to generalize past the sample to a larger population (e.g., Shadish, 1995). Still, results from a non-random sample can offer descriptive or preliminary information that can be followed up in other research. Procedures such as propensity score analysis (Rosenbaum, 2002) can be used to identify covariates that address selection bias in a non-random sample, thus allowing the possibility of generalizing to a larger population. The importance of identifying relevant and meaningful samples is pivotal to all research. In multivariate research, samples are usually larger than when fewer variables are examined. Whether analyzing univariate or multivariate data from a relevant sample, it is preferable to verify whether your findings are consistent. Fisher (1935) highlighted the need for replicating findings in independent samples. Current thinking re-affirms this emphasis, reiterating the importance of demonstrating that findings can be replicated and cross-validated (e.g., Anderson & Kelley, 2022; Anderson & Maxwell, 2016; Derksen & Morawski, 2022; Grahek, Schaller, & Tackett, 2021; Klein et al., 2018; Pashler & Wagenmakers, 2012; Wilson, Harris, & Wixted, 2020). Statistical procedures have been developed in several areas of statistics that incorporate findings from multiple samples.
For example, Jöreskog (1971) and Sörbom (1974)
developed multiple sample procedures for assessing whether a hypothesized mathematical model holds equally well in more than one sample. These multiple sample procedures allow for tests of increasing rigor of replication, or equality, across the samples, starting with a test of an equal pattern of relationships among hypothesized constructs, up through equality of sets of parameters (e.g., factor loadings, regressions, and means) among constructs (see Chapter 13). If a hypothesized model can be shown to hold equally well across multiple samples, particularly when constraining the parameters to be the same, this provides a strong test of the generalizability of a model (Jöreskog, 1971; Molenaar, 2020). Even though most multivariate methods do not have specific procedures for cross-validating findings, efforts should be taken to ensure that results would generalize to multiple samples, thus allowing greater confidence in their applicability.

Practical Implications

Although research does not have to fill an immediately apparent practical need, it is helpful to consider what implications can be derived from a body of research. When multiple variables are examined, there is a greater likelihood that connections among them will manifest in ways that suggest practical applications. For example, research in the health sciences often investigates multiple plausible predictors of disease or, conversely, of well-being (Diener, 2009), which can be used in developing interventions to prevent illness and sustain positive health. Practical applications do not have to originate with the initial research in an area. For example, John Nash researched mathematical game theory, which only later was used to understand economics, bringing Nash a Nobel Prize (Nash, 2002). Lastly, it is important to consider a number of multivariate methods from which you could select for your specific research goals.

Multiple Statistical Methods

Are there several analyses needed to address the main questions? What kinds of analyses are needed? It is often important to examine our research using several multivariate methods (e.g., Tabachnick & Fidell, 2019). John Tukey (1977) championed the idea of liberally exploring our data to find what they can reveal to us. In this respect, it is not unlike an artist using several tools and utensils to work with a mound of clay until the underlying form and structure are made manifest. Throughout the book, examples are provided of how the themes pertain to various multivariate methods. Here, a brief overview of several kinds of multivariate methods is given. One set of methods focuses on group differences (e.g., Maxwell, Delaney, & Kelley, 2018; Tabachnick & Fidell, 2019). For group-difference methods, the main question is: Are there significant mean differences across groups, over and above what would occur by random chance, and how much of a relationship is there between the grouping and outcome variables? Group-difference methods help us discern whether the average scores between groups are more different than the scores within each group. If the between-group differences are greater than the random differences found among scores within each group, we have some evidence that the nature of the group is associated with the outcome scores. This is useful information, especially when resources are scarce or decisions need to be made that affect specific groups.
For example, it is important to be able to differentiate groups that do and don't need medical treatment, educational enrichment, or psychological interventions.

Prediction methods allow us to predict an outcome on the basis of several predictor variables. The main question addressed with these methods is: How much of an outcome
can we know given a set of predictor variables, and is the degree of relationship significantly different from zero? Predictive methods allow us to assess possible relationships between variables, such that scores increase or decrease in predictable ways. If the pattern of increase and decrease between two sets of scores is almost as large as the average random differences among the scores for each variable, there is some evidence of an association between the pair of variables. Prediction methods are helpful in determining which set of variables is most closely linked to a specific outcome. For example, it would be useful to predict who will do well in recovering from a disease, or in achieving success in an educational or business environment.

Exploratory dimensional methods delineate the underlying dimensions in a large set of variables or individuals. Such analyses can help identify the main dimensions that underlie a large set of variables; for example, we could identify several dimensions to explain a large number of items measuring various facets of intelligence. Similarly, modern modeling methods (Harlow, 2010; Rodgers, 2010) can explore multiple relationships and paths, confirm underlying dimensions, estimate grouping effects, and conceptualize latent variables to explain complex relationships.

Again, before addressing how the sets of themes relate specifically to a number of multivariate methods, it is helpful to notice several more background themes that are essential to an understanding of quantitative methods, whether multivariate or univariate. As delineated in the preceding sections, all well-considered investigation begins with well-reasoned theories, articulate hypotheses, careful empirical research, accurate measurement, representative samples, and relevant statistical methods. When time and multiple controls are possible, we may also be able to discern the causal nature of the relationships among variables. Devlin (1994) emphasizes that "… abstract patterns are the very essence of thought, of communication, of computation, of society, and of life itself" (p. 7). The field of mathematics has long been concerned with noticing the fundamental patterns and connections in an area of study. In the more applied field of statistics, we can notice some basic themes that tend to permeate quantitative thinking. At least a preliminary understanding of them will help us later when delving into the complexities of specific multivariate methods.

Data

Data constitute the pieces of information (i.e., variables) on a phenomenon of interest. Data that can be assigned meaningful numerical values can be analyzed with a number of statistical methods. We usually assign a (numerical) score to each variable for each entity and store these in an "N by p" data matrix, where "N" stands for the number of participants or entities and "p" stands for the number of variables (predictors or outcomes). A data matrix is the starting point for statistical analysis: it is the large, numerical knowledge base that we can combine, condense, and synthesize in order to derive meaningful and relevant statistical nuggets that capture the essence of the original information. Obviously, a data matrix will tend to have more columns (of variables) and most likely more rows (of participants) in multivariate research than with univariate methods. To the extent that the data are collected from a large and representative random sample, the matrix offers a strong foundation and workplace for subsequent analyses.
At the end of this and later chapters, we'll examine a data matrix drawn from a combined sample of 8,784 participants who responded to questions about diet behavior. The chapters will use different subsets of the data, depending on the variables selected for the examples that will be analyzed. Because not every measure was administered to every participant, the sample sizes (i.e., the number of rows in the data matrix) and the variables analyzed (i.e., the number of columns in the data matrix) will vary across the examples.
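As a toy illustration of the N-by-p idea, here is a small R sketch that builds a hypothetical data frame of N = 4 participants measured on p = 3 variables (the variable names are invented for illustration, not taken from the book's dataset):

# A hypothetical 4-by-3 data matrix: 4 participants (rows), 3 variables (columns)
diet <- data.frame(
  treatment = c(1, 0, 1, 0),         # categorical IV: 1 = treatment, 0 = control
  stage     = c(2, 1, 4, 3),         # ordinal stage-of-change score
  behavior  = c(3.2, 2.5, 4.1, 3.0)  # continuous healthy-diet behavior score
)
dim(diet)   # returns N (rows) and p (columns): 4 3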
Measurement Scales

Variables can be measured on a continuous or a categorical scale. Variables measured on a continuous scale have numerical values that can be characterized by a smooth flow of arithmetically meaningful quantitative measurement, whereas categorical variables take on finite values that are discrete and more qualitatively meaningful. Age and height are examples of continuous variables that can take on many values that have quantitative meaning. In contrast, variables like gender and ethnicity have categorical distinctions that are not meaningfully aligned with numerical values. Some variables can also have measurement scales with both numerical and categorical properties. Likert scale variables have several distinct categories that have at least ordinal, if not precisely quantitative, values. For example, variables that ask participants to rate a statement anywhere from "1 = Strongly Disagree" to "5 = Strongly Agree" are using an ordinal Likert scale of measurement. Continuous variables can be used as either predictors or outcomes (e.g., in multiple regression). Categorical variables are often used to separate people into groups for analysis with group-difference methods. For example, we may assign participants to a treatment or a control group with the categorical variable of treatment (with scores of 1 = yes, or 0 = no), which we'll see later in the examples presented at the ends of the chapters on ANCOVA, MANOVA, discriminant function analysis, and logistic regression. Due to the common use of Likert scales in social science research, such ordinal variables can be designated as either categorical or quasi-continuous, depending on whether the analysis calls for a grouping or a quantitative variable. As we will see later in this chapter, the choice of statistical analysis often depends, at least in part, on the measurement scales of the variables being studied. This is true for both multivariate and univariate methods. If variables are reliably and validly measured, whether categorical or continuous, then the results of analyses will be less biased and more trustworthy. We'll also see that prior to conducting multivariate methods, we often begin by analyzing frequencies on categorical data and examining descriptive statistics (e.g., means, standard deviations, skewness, and kurtosis) on continuous data, as well as graphs to display and feature the highlights of the data (Friendly & Wainer, 2021). We will have a chance to do this in later chapters when providing the details of a fully worked-through example for each method.
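In R, these distinctions are made explicit when a variable is declared; here is a hedged sketch with invented scores, using factor() for categorical variables and ordered() for Likert-type ordinal variables:

# Continuous variable: a plain numeric vector
age <- c(24, 31, 19, 45)

# Categorical variable: declared as a factor so analyses treat it as groups
treatment <- factor(c(1, 0, 1, 0), labels = c("control", "treatment"))

# Likert item: an ordered factor preserves the ordinal ranking of the categories
agree <- ordered(c(5, 3, 4, 2), levels = 1:5,
                 labels = c("Strongly Disagree", "Disagree", "Neutral",
                            "Agree", "Strongly Agree"))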
Roles of Variables

Variables can be independent (i.e., a perceived precipitating cause), dependent (i.e., a perceived outcome), or mediating (i.e., forming a sequential link between independent and dependent variables). In research, it is useful to consider the role that each variable plays in understanding phenomena. A variable that is considered a causal agent is sometimes labeled independent or exogenous. It is not explained by a system of variables, but is believed to have an effect on other variables. Affected variables are often referred to as dependent or endogenous, implying they were directly impinged upon by other, more inceptive variables. Another kind of endogenous variable can be conceptualized as intermediate: it intervenes between, or changes the nature of, the relationship between independent variables (IVs) and dependent variables (DVs). When a variable is conceived as a middle pathway between independent and dependent variables, it is often labeled an intervening or mediating variable (e.g., MacKinnon, 2008; Nguyen, Schmid, & Stuart, 2021). Jung (2021) provides an understandable description of the mediation process, along with several examples from the literature. Later, in Chapter 10 on path analysis, an example is analyzed where
the IV, a randomized diet treatment (versus a control group), predicts a mediator (M), stage of diet behavior change 12 months later, which in turn predicts a DV of healthy diet behaviors at 24 months. The pattern of relationships would reflect pure mediation if the IV significantly related to M, and M was significantly related to the DV, but there was no significant relationship between the IV and the DV after adding M. However, as we'll see later in the book, whereas there were significant links with the mediator, there was also a small but significant relationship between the IV (i.e., treatment group) and the DV (i.e., healthy diet behaviors), revealing that the mediator was not exclusively linking the IV and the DV.

Variables are referred to as moderator variables when they change the nature of the relationship between the independent and dependent variables (e.g., Baron & Kenny, 1986; MacKinnon, 2008). For example, teaching style may be a predictor of an outcome, school performance. If another variable is identified, such as gender, that when multiplied by teaching style changes the nature of the predictive relationship, then gender is seen as a moderator variable. Thus, a moderator is an interaction formed between a hypothesized IV and another exogenous variable believed to have some effect on a DV when taken in conjunction with the first IV. In this example, we might find that teaching style is significantly related to school performance, whereas gender may not be related by itself. However, if an interaction is formed by multiplying a gender binary score (e.g., 1 = man, 2 = woman) by the teaching style score (e.g., 1 = lecture, 2 = interactive), we may find that this interaction is positively related to school performance. This would imply that individuals who identify as women and who are taught with an interactive teaching style are more apt to have higher school performance than other students. This finding suggests that gender binary moderated the relationship between teaching style and school performance (though we could just as easily state that teaching style moderated the relationship between gender binary and school performance, highlighting the importance of theory in hypothesis generation and interpretation of findings).
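To show how such a moderator is tested in practice, here is a hedged R sketch with simulated (invented) scores for the teaching-style example; in R, lm() with the * operator adds the interaction (product) term alongside the main effects:

# Simulated scores for the teaching-style example (illustration only)
set.seed(4)
n <- 200
teaching    <- sample(1:2, n, replace = TRUE)   # 1 = lecture, 2 = interactive
gender      <- sample(1:2, n, replace = TRUE)   # 1 = man, 2 = woman
performance <- 50 + 2 * teaching * gender + rnorm(n, sd = 5)

# teaching * gender expands to: teaching + gender + teaching:gender
fit <- lm(performance ~ teaching * gender)
summary(fit)   # a significant teaching:gender term indicates moderation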
Moderating or mediating variables are also sometimes referred to as covariates. Covariates are variables that may correlate with a DV and are ideally not correlated with other IVs. Failing to consider covariates could hinder the interpretation of relationships between independent and dependent variables, especially with non-random samples. Covariates help to statistically isolate an effect, especially when random assignment and/or manipulation are not accomplished. When several well-selected covariates (i.e., confounds, extraneous variables) are included in a study, and the relationship between the IVs and DVs still holds after controlling for the effects of one or more covariates, there is greater assurance that we have isolated the effect.

We should also realize that the designation of a variable as independent, dependent, mediating, or moderating refers only to the role that the variable plays within a specific research design. Using the variable self-efficacy, we could assign it the role of IV (e.g., predictor of some outcome), mediator (e.g., intervening between some IV and a DV), moderator (e.g., part of an interaction with another variable, say stage of behavior change, that is significantly related to a DV), or DV (e.g., an outcome predicted by other IVs, mediators, and/or moderators). Ultimately, confirmation of a designated role is drawn from several sources of evidence (e.g., experimental design, longitudinal research, replication). Still, it is important to clearly articulate the intended role of variables in any design, hopefully with support and justification from previous theory and empirical research. Finally, we should realize that statistical methods can analyze multiple variables at a time, with multivariate methods allowing larger and more complex patterns of variables than other procedures.

Incomplete Information

Inherent in all statistical methods is the idea of analyzing incomplete information, where only a portion of knowledge is available. For example, we analyze a subset of the data by
selecting a sample from the full population, since this is all we have available. We examine only a subset of the potential causal agents or explanatory variables, since it is nearly impossible to conceive of all possible predictors. We collect data from only a few measures for each variable of interest, since we don't want to burden our participants. We describe the main themes in the data (e.g., factors, dimensions) and try to infer past our original sample and measures to a larger population and set of constructs. In each case, there is a need to infer a generalizable outcome from a subset to a larger universe in order to explain how scores vary and covary. Ultimately, we would like to be able to demonstrate that associations among variables can be systematically explained with as little error as possible. For example, a researcher might find that substance use scores vary depending on the level of distress and the past history of substance abuse. It may be that the higher the level of distress and the greater the past history of substance abuse, the more likely someone is to engage in greater substance use. It is most likely true that other variables are also important in explaining an outcome. Even when conducting a large, multivariate study, it is important to recognize that we cannot possibly examine the full set of information, largely because it is not usually known or accessible. Instead, we try to assess whether the pattern of variation and covariation in the data demonstrates enough evidence for statistically significant relationships, over and above what could be found from sheer random chance.

Missing Data

Another way that incomplete information plays out is when there are missing data (Enders, 2022; Garson, 2012; Little & Rubin, 2019). If data are missing completely at random (i.e., MCAR), there is no discernible pattern between an incidence of missing data and any other observable or unobservable variable. As this is not always easy to establish, it may be that data are more realistically missing at random (i.e., MAR). This would imply that the missing values are not related to the values of the variables that contain them, although the missingness may be related to other observed variables. A less desirable category for a researcher occurs when data are not missing at random (NMAR). Data that are NMAR suggest that participants neglected to respond to specific items due to the nature of the items, such that they were intentionally left blank. This could occur in, say, a health practices survey when respondents were hesitant to divulge the extent of their substance use or sexual behavior. There are several excellent sources that address the topic of missing data more thoroughly (e.g., Enders, 2022; Graham, 2009; Little & Rubin, 2019; Madley-Dowd, Hughes, Tilling, & Heron, 2019; McNeish, 2017).
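Before choosing a missing-data strategy, it helps to look at how much is missing and where; here is a minimal R sketch (with a small invented data frame) for a first inspection:

# Invented data frame with some missing values (NA)
d <- data.frame(distress = c(3, NA, 5, 2, 4),
                history  = c(1, 0, NA, NA, 1),
                use      = c(2, 4, 5, 1, NA))

colSums(is.na(d))         # count of missing values per variable
mean(complete.cases(d))   # proportion of rows with no missing data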
Descriptive Statistics

Descriptive statistics provide a clear and concise view of data from a specific sample or population. This often involves summarizing the central nature of variables (e.g., the average or mean score; the mid-point or median score; and the most frequently occurring or modal score), ideally from a representative sample. It can also comprise the spread or range of scores, as well as the average difference of each score from the mean (i.e., the standard deviation). Descriptive statistics can also include measures of skewness and kurtosis to indicate how asymmetric or lopsided, and how peaked or heavy-tailed, respectively, a distribution of scores is. Thus, descriptive statistics summarize basic characteristics of a distribution such as central tendency, variability, skewness, and kurtosis. They can be calculated for large multivariate studies that investigate the relationships among a number of variables, ideally based on a well-selected and large sample, and they are presented for each example used in the chapters.
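Most of these summaries can be obtained in a single call with the describe() function from the psych package, which is used throughout this book; the simulated data below are just a stand-in for real scores.

# summarize central tendency, variability, skewness, and kurtosis
library(psych)
set.seed(1)
dat <- data.frame(score1 = rnorm(100), score2 = rexp(100))  # simulated scores
describe(dat)  # n, mean, sd, median, skew, kurtosis, etc., for each variable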
Another form of descriptive statistics occurs when we synthesize information from multiple variables in a multivariate analysis using inferential statistics on a specific, non-random sample. For example, an instructor may want to describe the nature of class performance from a specific set of variables (e.g., quizzes, tests, projects, homework) and sample (e.g., one classroom). If she wanted to describe group differences between students from science versus non-science disciplines, she could conduct a multivariate analysis of variance with a categorical IV, college discipline (STEM: science, technology, engineering, mathematics vs. non-STEM), and the several continuous outcomes she measured from students' performance. Results would not necessarily generalize beyond her immediate classroom, though they could provide a descriptive summary of the nature of performance between the STEM and non-STEM groups in her class of students.

Inferential Statistics

Inferential statistics allow us to generalize beyond our sample data to a larger population. With most statistical methods, inferences are more apt to generalize past one's specific data when statistical assumptions are met. Inferential statistics allow estimates of population characteristics from samples representative of the population. Inferences are strengthened when potential extraneous variables are identified and taken into account. Likewise, if we can show that the data follow expected assumptions, such that we can rule out random chance with our findings, then results are more conclusive. Multivariate research that shares these same features (i.e., representative samples, controlling for confounds, and normally distributed data) can provide a basis for inferences beyond the immediate sample to a large, relevant population. In the next section, more is said about statistical assumptions, which apply to many statistics, including most of the multivariate methods discussed in this book.

Statistical Assumptions

Statistical assumptions for many multivariate analyses include independence, normality, linearity, and homoscedasticity, each of which is described below.

Independence in the data implies that scores from participants are separate from each other, and not dependent, as would be the case when sampling relatives or classmates without taking into account that scores could be similar due to the shared experience. If scores are related, as would occur with repeated measures or clustered data, specific methods should be used that address the dependency. For example, a within-groups design takes into account the correlation between measures collected over time or from the same person on a similar characteristic. Hierarchical linear models, also called multilevel models, take into account data that are clustered in groups that tend to respond similarly. This could occur when sampling students from the same classroom, individuals from the same family, or even the same country in an international study.

Data should be normally distributed with little skewness or kurtosis, which would also reveal a random pattern when forming a scatterplot of the residuals. Scatterplots are often available in computer programs for multiple regression, which can also be used for other multivariate methods. See later discussion in this chapter, and Chapter 3 on multiple regression, where there is more description of how to assess many of the assumptions with R, SAS, and SPSS.
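As a quick first pass, normality can be checked graphically. The sketch below uses base-R plots on simulated scores; a roughly straight diagonal pattern in the Q-Q plot suggests approximate normality.

# simple graphical normality checks on a hypothetical set of scores
set.seed(5)
scores <- rnorm(200)              # simulated, approximately normal scores
hist(scores)                      # roughly bell-shaped if normal
qqnorm(scores); qqline(scores)    # points near the line suggest normality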
Nonnormal distributions with skewness values greater than about 1.0 are called “right skewed” or “positively skewed”, suggesting that most scores were relatively low with just a
few high values. In these cases, data distributions are lopsided, with a few high scores trailing off on the positive or right side of the graph and most scores forming a high peak at the low end. Conversely, when skewness is less than about –1.0, data distributions are said to be "left skewed" or "negatively skewed," with a few low scores trailing off to the left in the negative direction and most scores piled up at the high end. Students can probably see that when measuring a desirable outcome, such as performance, they would prefer to hear that test scores were negatively skewed, indicating that most people had high scores with only a few scoring poorly. On the other hand, when measuring an undesirable trait or occurrence, such as the number of speeding tickets received, it would be preferable to have a positively skewed distribution where most people had very few tickets and only a small set had a large number. Nonnormal distributions with kurtosis indicate that the data are piled up into a high peak with a number of points falling in either or both tails of the distribution. Kurtosis values greater than about 1.0 or 2.0 indicate some degree of kurtosis that would be helpful to address.

Relationships should also be homoscedastic, such that the variance for one variable is approximately the same when assessed at every level of another variable. When homoscedasticity is present, the pattern of points that intersect the scores for two variables looks like a diagonal ellipse, revealing a relatively even distribution of scores across the two variables. Thus, scores do not overly cluster at one end, with very few scores at the other end. The reason for this assumption is that many statistics require that variances be averaged or pooled across groups or variables. In order to have a reasonable estimate of the average variance, the values should be similar, or homoscedastic. Consider the analysis of variance, which uses the average within-groups variance in the denominator of the F-test. If the variances differ dramatically across groups, such that one variance is more than 20 times another, this would indicate heteroscedasticity and would not meet assumptions.

Another assumption often required is that there are linear relationships among variables, such that variables follow an additive pattern of the form Y = B1X1 + B2X2 + … + error. Moreover, the pattern of relationship between pairs of variables that are linear stays constant (i.e., just increasing, or just decreasing), without changing direction after certain points. For example, when correlating anxiety with performance, the pattern may initially look positively linear, with increases from small to moderate amounts of anxiety accompanied by small to moderate increases in performance. However, after a certain point, anxiety tends to be negatively related to performance, revealing a non-linear pattern that would not be adequately assessed with traditional linear model methods.

Assumptions can often be assessed by examining skewness and kurtosis values, bivariate plots of the variables, and correlations among the variables. Scatterplots are easily constructed in most computer programs and can help researchers verify that points roughly follow a linear pattern (i.e., a fairly consistent slanted elliptical shape), instead of a curvilinear pattern that would indicate non-linearity and violate assumptions.
Scatterplots also provide some indication as to whether the data are homoscedastic, such that the variance of scores on one variable is approximately the same at every level of another variable. For example, consider a possible plot of the relationship between drug use (on a 1–5 scale on the horizontal X-axis) and distress (on a similar 1–5 scale, plotted on the vertical Y-axis). Suppose that those with low drug use report low levels of distress, whereas those with high drug use have a whole range of scores on distress, from very little to very much. This would result in a scatterplot with points collecting fairly tightly around the lower left portion and points in the upper right that are very spread out, with the appearance of the top of a funnel. This would clearly reveal heterogeneity of variance that would violate the assumption of homoscedasticity.
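The funnel pattern just described is easy to see in a plot. Below is a minimal simulated sketch (not the book's dataset) in which the spread of the outcome grows with the predictor.

# simulate heteroscedastic data and inspect the scatterplot
set.seed(2)
drug_use <- runif(200, 1, 5)
distress <- 1 + 0.5 * drug_use + rnorm(200, sd = 0.3 * drug_use)  # spread grows with X
plot(drug_use, distress)          # funnel shape suggests heteroscedasticity
abline(lm(distress ~ drug_use))   # fitted line for reference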
Although it is not always done in published studies, testing for statistical assumptions is important and relatively doable with most computer routines. Analyses are conducted on sample data in the upcoming chapters to provide an examination of assumptions, and computer code for R, SPSS, and SAS is provided.

Multicollinearity

Before conducting multivariate methods, it is important to make sure that variables are not too highly overlapping, resulting in what is called multicollinearity. This would cause instability in statistical analyses such that it would be difficult to decide to which variable a weight should be attached. Generally, if variables are correlated greater than |0.90|, collinearity is most likely present and decisions will have to be made as to whether to drop one of the variables or to combine them into a composite variable. In Chapter 3 on multiple regression, it will become more apparent how to assess possible collinearity by examining "variance inflation factor" (VIF) values, where:

VIF = 1 / (1 − R²)    (2.1)
In Equation 2.1, the value in the denominator is the unexplained variance after predicting one variable from one or more other variables. If a variable is highly correlated with one or more variables in an analysis, it will be difficult to assign a unique coefficient to either variable, leading to overly large variances and unstable estimates in procedures such as multiple regression (i.e., see Chapter 3). Myers (1990) suggests that VIF values greater than 10 indicate collinearity, requiring a decision about whether to drop or combine the highly related, multicollinear variables. Using Equation 2.1, a VIF value of 10 would correspond to an R² value of 0.90:

VIF = 1 / (1 − 0.90) = 1 / 0.10 = 10

This would mean that the multiple correlation between that variable and the others in a set is 0.95 (i.e., the square root of 0.90). Thus, it might make sense to be concerned even sooner, say, when VIF values are larger than about 5, as this would correspond to a multiple R of about 0.90 (i.e., R² of 0.81):

VIF = 1 / (1 − 0.81) = 1 / 0.19 = 5.26

VIF values are requested in Chapter 3 on multiple regression to assess for possible multicollinearity among a set of variables.
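Equation 2.1 is simple enough to compute by hand. The sketch below builds two deliberately collinear simulated predictors and applies the formula; the vif() function from the car package, if installed, automates this for a fitted model.

# hand-rolled VIF per Equation 2.1, using simulated collinear predictors
set.seed(6)
x1 <- rnorm(100)
x2 <- 0.9 * x1 + rnorm(100, sd = 0.3)   # x2 strongly overlaps x1
y  <- x1 + x2 + rnorm(100)
r2 <- summary(lm(x1 ~ x2))$r.squared    # R^2 from regressing x1 on the other predictor(s)
1 / (1 - r2)                            # VIF; values above about 5-10 flag collinearity
# library(car); vif(lm(y ~ x1 + x2))    # equivalent, assuming car is installed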
Transforming the Data

After assessing assumptions and possible collinearity, it is also useful to consider whether any variables need to be transformed to meet assumptions. If variables have skewness greater than an absolute value of about 1.0, or kurtosis greater than about 2.0, it may be helpful to consider taking a logarithmic or other transformation (e.g., square root) to reduce nonnormality. Although transformed scores are not in the original metric and interpretation may be difficult, many scores used in the social sciences have arbitrary scales. For example, a variable such as substance abuse could be highly lopsided, with a peak at the low end and a tail trailing off on the right side of a graph (i.e., positively or right skewed). This pattern would indicate that most individuals reported low or no substance abuse, whereas a few individuals reported moderate to high substance use. In this case, it would be helpful to consider taking a natural log or square root of the scores in order to transform the variable into one that more closely follows assumptions (e.g., normality). For interested readers, several excellent discussions of transformations are offered elsewhere (e.g., Cohen, Cohen, West, & Aiken, 2003; Tabachnick & Fidell, 2019). Thus, transformations, although somewhat controversial, may still be preferred to increase the power of analyses and decrease bias by meeting assumptions (e.g., Johnson & Wichern, 2002). Consider making transformations with your research, especially when analyzing data on extreme variables such as drug use and sexual risk. If the data can be transformed to meet assumptions, then inferences made on these data may be more accurate.
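A minimal sketch of this idea, using simulated right-skewed scores rather than real substance-use data; the skew() function from the psych package provides the skewness estimate.

# reduce positive skew with a log transformation
set.seed(3)
substance <- rexp(500)                 # simulated right-skewed scores
psych::skew(substance)                 # skewness well above 1.0
log_substance <- log(substance + 1)    # +1 offset guards against log(0)
psych::skew(log_substance)             # much closer to 0 after transformation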
Reliability

Reliability coefficients give an indication of how accurate and consistent the data are for a particular construct. There are several kinds of reliability coefficients, depending on the nature of the data. If data are collected across two time points, test–retest reliability, which is the correlation between scores at the two points, is an excellent indication of how stable the construct is. If data are collected using two different forms, parallel-forms reliability can be calculated, also with a correlation between scores assessed on the two forms for the same sample of individuals. Probably the most common form of reliability is internal consistency, as it only requires an assessment at a single time point on a single form, as long as there are multiple items. An example is shown shortly on calculating internal consistency and test–retest reliability for two multi-item constructs. Note that reliability coefficients should be calculated or obtained for the main constructs before conducting the main analyses in any research study. Since many methods assume that variables are perfectly reliable, parameter estimates will be biased when using unreliable variables. To the extent that data are randomly selected and relevant, that data meet assumptions, and that variables are reasonably reliable (e.g., reliability coefficients of at least 0.70 or 0.80), there is greater basis for inferring past the initial sample to a larger population with some degree of confidence. It is not unusual for test–retest reliability coefficients to be somewhat lower, particularly when there is a long time-frame between the time points. We'll see this later in the chapter when examining test–retest reliability across 12-month and 24-month periods. In the upcoming example, coefficient omega internal consistency coefficients (McDonald, 1999) are also calculated for two scales used later in the multivariate examples in this book (i.e., Diet Behavior and Diet Temptations). Coefficient omega is becoming more widely used, due to potential problems with the traditionally used coefficient alpha internal consistency index (e.g., Flora, 2020; McNeish, 2018; Peters, 2014; Sijtsma, 2009). As common computer programs do not always provide calculations for coefficient omega, realize that there may be some degree of bias in internal consistency estimates when alpha is used instead.

Roles of Variables and Choice of Methods

Although many statistical methods can be subsumed under a single, generalized linear model (McCullagh & Nelder, 1989), it is often useful to characterize different statistical methods by the roles played by the variables in an analysis. Intermediate multivariate methods, such as multiple regression (MR) and analysis of covariance (ANCOVA), are more sophisticated than univariate methods (e.g., correlation and analysis of variance: ANOVA).
MR and ANCOVA (see Chapters 3 and 4, respectively) are similar in that they both feature a single, continuous outcome variable and two or more IVs, at least one of which is continuous. They differ in that MR variables (i.e., the multiple IVs and single outcome) are often all continuous. In contrast, ANCOVA always has at least one categorical IV, much like ANOVA. ANCOVA differs from ANOVA in that ANCOVA has both continuous and categorical IVs, allowing the use of continuous covariates as well as grouping IV(s) to help explain the single outcome variable.

Categorical or grouping multivariate methods such as multivariate analysis of variance (MANOVA), discriminant function analysis (DFA), and logistic regression (LR) allow an examination of links between a set of variables and a set of categories or groups. MANOVA (see Chapter 5) is an extension of ANOVA, allowing multiple, continuous DVs and one or more categorical IVs that each have two or more groups. Variables used in DFA (see Chapter 6) have the same characteristics as those in MANOVA, except that the roles are reversed: in DFA, the set of continuous variables is viewed as predictor variables and the set of groups is seen as aspects of a categorical outcome variable. When there is a mixture of both categorical and continuous predictors and a categorical outcome, LR (see Chapter 7) is often used. DFA and LR are very similar in the nature and roles of their variables, with some differences. DFA requires multivariate assumptions (e.g., normality, linearity, and homoscedasticity), whereas LR does not. Further, DFA focuses on weights that can be interpreted as how much the categorical variable will change when the predictor variable changes by one point. In LR, weights are interpreted as the odds that an individual with the IV characteristic will end up in a specific outcome group. If there is a 50–50 chance of ending up in a specific group, the odds will be equal to 1.0. If there is less chance, the odds will be less than 1.0; conversely, if there is greater than a 50–50 chance of being in a specific group based on the IV characteristic, the odds will be greater than 1.0. Predictor variables that are meaningful in an analysis generally have an odds value that is clearly different from 1.0, whether above or below it (a brief sketch follows at the end of this section).

Multivariate structural methods, such as principal components analysis (PCA) and factor analysis (FA), incorporate a large set of continuous variables. In PCA or FA (see Chapter 8), there is one set of continuous variables, with the goal of identifying a smaller set of latent dimensions that underlie the variables. Structural equation modeling (SEM) provides another set of structural methods that can be very useful in examining the pattern of relationships among multiple variables, whether latent or measured. Chapters 9 to 14 discuss an overview, path analysis, confirmatory factor analysis, latent variable modeling, multiple sample SEM analysis, and latent growth modeling, respectively. Chapter 15 gives an integrated summary of all of the methods and their main features.
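To make the odds interpretation concrete, here is a minimal simulated logistic regression sketch (not the book's data) using R's built-in glm(); exponentiating the coefficients yields the odds interpretation discussed above.

# logistic regression on simulated data; exp(coef) gives odds ratios
set.seed(4)
x1 <- rnorm(200); x2 <- rnorm(200)
group <- rbinom(200, 1, plogis(0.8 * x1 - 0.5 * x2))  # simulated binary outcome
fit <- glm(group ~ x1 + x2, family = binomial)
exp(coef(fit))   # values clearly above or below 1.0 suggest meaningful predictors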
Summary of Background Considerations

Multivariate statistical methods build on these background themes, allowing more realistic designs among multiple variables than methods that analyze just one (i.e., univariate) or two (i.e., bivariate) key variables. Multivariate methods can help us see relationships that might only occur in combination with a set of well-selected variables. Background themes involve practicing open science, having strong theory, consideration of the data used to test the hypotheses, the scale of measurement and nature of the variables, both descriptive and inferential statistics, assumptions, and how the scale of the variables often affects the choice of method (e.g., Intermediate, Categorical or Grouping Multivariate, or Structural Multivariate) for a particular study. A summary of background themes to consider for multivariate methods is presented in Table 2.1.
Table 2.1 Summary of Background Themes to Consider for Multivariate Methods

Initial considerations informing multivariate themes at all levels of focus, with greater adoption generally leading to greater reliability, validity, and generalization:

a. Open science
b. Theory
c. Hypotheses
d. Empirical studies
e. Measures
f. Multiple time points
g. Multiple controls
h. Multiple samples
i. Practical implications
j. Multiple statistical methods

Background considerations before conducting multivariate analyses:

a. Data matrix (N rows of participants and p columns of variables)
b. Measurement scales (categorical or continuous)
c. Types of variables (independent, dependent, mediating, or moderating)
d. Incomplete information (analyze only a subset)
e. Missing data
f. Descriptive statistics (central tendency, variability, skewness, and kurtosis)
g. Inferential statistics (requiring assumptions to generalize beyond the sample)
h. Assumptions, considerations, and reliability
i. Types of variables and choice of methods (intermediate methods with 1 continuous DV; multivariate group-difference methods; and multivariate correlational methods)
2.2 Questions to Help Apply Themes to Multivariate Methods

Having considered a number of themes that enhance multivariate thinking, it is instructive to briefly outline how these themes apply to several multivariate methods. To do this in the upcoming chapters, it is helpful to reflect on a set of questions to ask. These questions can help us focus on how the major themes apply to various statistical procedures. My hope is that by working through a set of relevant questions on several multivariate statistics, it will be much easier to extend this thinking to other methods not discussed in this volume. Thus, the main focus of this book is not to present complete detail on every multivariate method. Rather, I present a set of themes and questions to help enlarge our thinking to recognize the common facets of many statistical methods. In keeping with Devlin's (1994) claim that "most mathematicians now agree [that] mathematics is the science of patterns" (p. 3), I argue that multivariate statistics embody a set of prevalent ideas or propositions. To the extent that these elemental ideas are apparent, it is much easier to engage in multivariate thinking.

The first question to ask is: What is this method and how is it similar to and different from other methods? For each method, a brief description of the main features is presented. I then point to other methods that are similar to, and different from, the particular method being discussed, thereby inviting an initial understanding of the connections among methods.
A second question I ask is: When is a specific method used and what research questions can it address? In this section, I present a few common uses of the specific multivariate method, often providing brief examples. Information presented in this section can help when trying to decide which method to use for a particular research hypothesis.

The third question asks: What is the statistical model or main equations for a method? The topic of models can appear obscure and abstract. It is helpful to think of a model as a concise representation of a large system or phenomenon; thus, a model is just an abbreviated description. To the extent that a model is quantified, it becomes more capable of being examined and tested in a rigorous, scientific arena. I like to think of a progression of models from a verbal or conceptual description, to symbolic and pictorial representation (as in a flow chart or path diagram), to one or more mathematical equations. Many statistical methods can be accommodated under what is known as the generalized linear model (e.g., McCullagh & Nelder, 1989), showing how a variable is an additive linear combination of other constants or variables. For example, in basic mathematics, a straight line is modeled as Y = A + BX, where Y is an outcome variable, A is a (constant) point at which the line intersects the Y-axis, B is the (constant) slope of the line (i.e., rise over run), and X is a predictor or independent variable (a brief sketch of fitting this model follows Table 2.2). This simple linear formulation emerges in each of the multivariate models presented in this volume.

A fourth question is: What are the preliminary considerations and assumptions for a method? I work through much of this topic in the current chapter as a general discussion, and then again in the other chapters specific to each method. The main goal here is to explicate a number of considerations that are important to think about prior to conducting the main multivariate analyses.

The fifth question asked is: How are results interpreted at macro- and micro-levels for each method? To aid with interpretation, it is useful initially to examine results at a global, overriding level, often with a test of significance and a measure of the magnitude of an effect, along with confidence intervals. Cumming (2012), Grissom and Kim (2012), and Kelley and Preacher (2012) provide excellent discussions of effect sizes, as well as confidence intervals, which are useful to consider when interpreting research findings. Pek and Flora (2018) also offer guidance on effect sizes, particularly unstandardized effects, along with several examples. For each multivariate method presented here, a macro-level approach is described and then applied with real-world data. Subsequently, the focus is on the specific micro-level aspects of a multivariate analysis, often involving some description of means or weights that give some indication as to the importance of specific variables. Micro-level assessment is presented for each multivariate method in this book.

The sixth question asks: What is an example of applying this method to a research question? For each multivariate method presented in this volume, we work through an example using a consistent data set.
Table 2.2 Six Questions to Ask for Each Multivariate Method

1. What is this method and how is it similar to and different from other methods?
2. When is this method used and what research questions can it address?
3. What is the statistical model or main equations for this method?
4. What are the preliminary considerations and assumptions for this method?
5. How are results for this method interpreted at macro- and micro-levels?
6. What is an example of applying this method to a research question?
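As noted above, a minimal sketch of fitting the straight-line model Y = A + BX with R's built-in lm(), using simulated data.

# estimate the intercept A and slope B of a straight line
set.seed(7)
X <- runif(50, 0, 10)
Y <- 2 + 0.5 * X + rnorm(50)   # true A = 2, B = 0.5, plus error
coef(lm(Y ~ X))                # estimated intercept (A) and slope (B)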
For this chapter, it is useful to provide an example of calculating test–retest reliability over 12- and 24-month periods for the composite scores of Diet Behavior and Diet Temptations. Coefficient omega (McDonald, 1999) is also calculated for each of the two constructs, providing a less-biased estimate of internal consistency reliability than the more widely used Cronbach's (1951) coefficient alpha (e.g., McNeish, 2018; Peters, 2014). The example is presented below, based on data that are used for many of the examples in the book.
2.3 An Example of Conducting Reliability Analyses for Major Constructs

The example draws on a secondary data sample of 6,620 participants pooled from an originally combined sample of 8,784 participants from three randomized studies who showed a baseline risk for a high-fat diet (see Brick, Yang, Harlow, Redding, & Prochaska, 2019; Prochaska et al., 2004; Prochaska et al., 2005). The reduced pooled sample (N = 6,620) was 63% female and 91% White, 2% Black, and 2% Hispanic, with ages from 18 to 76 (mean = 43.75, standard deviation = 1.77), and all participants at risk for a high-fat diet (i.e., estimated >30% fat). For each of the three studies, participants were randomized to an intervention or assessment-only control group, and a tailored expert system intervention was applied to improve diet in the intervention group. Intervention was given after baseline measurement, and at 6 and 12 months post baseline, with assessments conducted at baseline, 12 months, and 24 months. Over the three studies, there were varying rates of attrition, although Greene et al. (2013) found only small demographic and behavioral effects on retention, no stage effect on retention, and no site-specific interaction with study outcome. A codebook for these data is provided in Appendix A, giving more details on the actual items.

The R open-source computer package is used to calculate test–retest reliability and coefficient omega, with the R code presented here, as with other methods and examples in this book. Note that the focus in this book is more on understanding a set of widely used multivariate methods, with the computer code secondarily provided to facilitate learning. For those who are interested in more details about these computer programs, check into books that are more specifically focused on multivariate methods with R (e.g., Field, Miles, & Field, 2012; Flora, 2018), SPSS (e.g., Field, 2018; Leech, Barrett, & Morgan, 2015), or SAS (e.g., Delwiche & Slaughter, 2019).

Let's take a look at Table 2.3a, which shows the R code for calculating reliability coefficients with both test–retest correlations across 12- and 24-month periods and coefficient omega for the two main constructs just discussed. The first step is to install and load the needed packages and their corresponding libraries. For example, the "dplyr" package allows you to select subsets of variables from the larger dataset before conducting specific analyses, and the "psych" package and library calculate coefficient omega (see the omega() function), among other routines. We also ask to drop incomplete cases that are not available (see the na.omit() function) before conducting analyses. Test–retest correlations are easily obtained (see the cor() function).

SAS syntax for test–retest and internal consistency reliability coefficients is shown in Table 2.3b. The PROC CORR command is used to calculate test–retest correlations, and this same PROC CORR, followed by ALPHA, requests the procedure (i.e., proc) that calculates coefficient alpha based on the correlations among the relevant items. This procedure examines how internally consistent the items are for the two constructs (i.e., Diet Behavior and Diet Temptations). Unfortunately, coefficient omega is not readily calculated in SAS, nor in SPSS, so the code for coefficient alpha is presented instead.
Table 2.3a R Syntax for Test–Retest Reliability and Coefficient Omega

# install packages and load libraries
install.packages("psych")   # Coef. omega, etc.
install.packages("dplyr")   # Selecting data
library(psych); library(dplyr)

# Variables and correlations for Diet Behavior across 3 time points
DB3tVars
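A hypothetical sketch of how the steps just described (selecting variables, dropping incomplete cases, and correlating scores across time points) might look in full; the column names (e.g., DietBeh0, DietBeh12, DietBeh24) and the data frame name dat are illustrative stand-ins, not the dataset's actual identifiers.

# hypothetical continuation, assuming dat holds the pooled data (not shown here)
DB3tVars <- select(dat, DietBeh0, DietBeh12, DietBeh24)  # Diet Behavior at 3 times
DB3tVars <- na.omit(DB3tVars)    # drop incomplete cases
cor(DB3tVars)                    # test-retest correlations across the 3 time points
# omega() from psych would similarly be applied to each construct's item-level data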