Exploratory Factor Analysis
Quantitative Applications in the Social Sciences
A SAGE PUBLICATIONS SERIES

1. Analysis of Variance, 2nd Edition Iversen/Norpoth
2. Operations Research Methods Nagel/Neef
3. Causal Modeling, 2nd Edition Asher
4. Tests of Significance Henkel
5. Cohort Analysis, 2nd Edition Glenn
6. Canonical Analysis and Factor Comparison Levine
7. Analysis of Nominal Data, 2nd Edition Reynolds
8. Analysis of Ordinal Data Hildebrand/Laing/Rosenthal
9. Time Series Analysis, 2nd Edition Ostrom
10. Ecological Inference Langbein/Lichtman
11. Multidimensional Scaling Kruskal/Wish
12. Analysis of Covariance Wildt/Ahtola
13. Introduction to Factor Analysis Kim/Mueller
14. Factor Analysis Kim/Mueller
15. Multiple Indicators Sullivan/Feldman
16. Exploratory Data Analysis Hartwig/Dearing
17. Reliability and Validity Assessment Carmines/Zeller
18. Analyzing Panel Data Markus
19. Discriminant Analysis Klecka
20. Log-Linear Models Knoke/Burke
21. Interrupted Time Series Analysis McDowall/McCleary/Meidinger/Hay
22. Applied Regression, 2nd Edition Lewis-Beck/Lewis-Beck
23. Research Designs Spector
24. Unidimensional Scaling McIver/Carmines
25. Magnitude Scaling Lodge
26. Multiattribute Evaluation Edwards/Newman
27. Dynamic Modeling Huckfeldt/Kohfeld/Likens
28. Network Analysis Knoke/Kuklinski
29. Interpreting and Using Regression Achen
30. Test Item Bias Osterlind
31. Mobility Tables Hout
32. Measures of Association Liebetrau
33. Confirmatory Factor Analysis Long
34. Covariance Structure Models Long
35. Introduction to Survey Sampling Kalton
36. Achievement Testing Bejar
37. Nonrecursive Causal Models Berry
38. Matrix Algebra Namboodiri
39. Introduction to Applied Demography Rives/Serow
40. Microcomputer Methods for Social Scientists, 2nd Edition Schrodt
41. Game Theory Zagare
42. Using Published Data Jacob
43. Bayesian Statistical Inference Iversen
44. Cluster Analysis Aldenderfer/Blashfield
45. Linear Probability, Logit, and Probit Models Aldrich/Nelson
46. Event History and Survival Analysis, 2nd Edition Allison
47. Canonical Correlation Analysis Thompson
48. Models for Innovation Diffusion Mahajan/Peterson
49. Basic Content Analysis, 2nd Edition Weber
50. Multiple Regression in Practice Berry/Feldman
51. Stochastic Parameter Regression Models Newbold/Bos
52. Using Microcomputers in Research Madron/Tate/Brookshire
53. Secondary Analysis of Survey Data Kiecolt/Nathan
54. Multivariate Analysis of Variance Bray/Maxwell
55. The Logic of Causal Order Davis
56. Introduction to Linear Goal Programming Ignizio
57. Understanding Regression Analysis, 2nd Edition Schroeder/Sjoquist/Stephan
58. Randomized Response and Related Methods, 2nd Edition Fox/Tracy
59. Meta-Analysis Wolf
60. Linear Programming Feiring
61. Multiple Comparisons Klockars/Sax
62. Information Theory Krippendorff
63. Survey Questions Converse/Presser
64. Latent Class Analysis McCutcheon
65. Three-Way Scaling and Clustering Arabie/Carroll/DeSarbo
66. Q Methodology, 2nd Edition McKeown/Thomas
67. Analyzing Decision Making Louviere
68. Rasch Models for Measurement Andrich
69. Principal Components Analysis Dunteman
70. Pooled Time Series Analysis Sayrs
71. Analyzing Complex Survey Data, 2nd Edition Lee/Forthofer
72. Interaction Effects in Multiple Regression, 2nd Edition Jaccard/Turrisi
73. Understanding Significance Testing Mohr
74. Experimental Design and Analysis Brown/Melamed
75. Metric Scaling Weller/Romney
76. Longitudinal Research, 2nd Edition Menard
77. Expert Systems Benfer/Brent/Furbee
78. Data Theory and Dimensional Analysis Jacoby
79. Regression Diagnostics, 2nd Edition Fox
80. Computer-Assisted Interviewing Saris
81. Contextual Analysis Iversen
82. Summated Rating Scale Construction Spector
83. Central Tendency and Variability Weisberg
84. ANOVA: Repeated Measures Girden
85. Processing Data Bourque/Clark
86. Logit Modeling DeMaris
87. Analytic Mapping and Geographic Databases Garson/Biggs
88. Working With Archival Data Elder/Pavalko/Clipp
89. Multiple Comparison Procedures Toothaker
90. Nonparametric Statistics Gibbons
91. Nonparametric Measures of Association Gibbons
92. Understanding Regression Assumptions Berry
93. Regression With Dummy Variables Hardy
94. Loglinear Models With Latent Variables Hagenaars
95. Bootstrapping Mooney/Duval
96. Maximum Likelihood Estimation Eliason
97. Ordinal Log-Linear Models Ishii-Kuntz
98. Random Factors in ANOVA Jackson/Brashers
99. Univariate Tests for Time Series Models Cromwell/Labys/Terraza
100. Multivariate Tests for Time Series Models Cromwell/Hannan/Labys/Terraza
101. Interpreting Probability Models: Logit, Probit, and Other Generalized Linear Models Liao
102. Typologies and Taxonomies Bailey
103. Data Analysis: An Introduction Lewis-Beck
104. Multiple Attribute Decision Making Yoon/Hwang
105. Causal Analysis With Panel Data Finkel
106. Applied Logistic Regression Analysis, 2nd Edition Menard
107. Chaos and Catastrophe Theories Brown
108. Basic Math for Social Scientists: Concepts Hagle
109. Basic Math for Social Scientists: Problems and Solutions Hagle
110. Calculus Iversen
111. Regression Models: Censored, Sample Selected, or Truncated Data Breen
112. Tree Models of Similarity and Association Corter
113. Computational Modeling Taber/Timpone
114. LISREL Approaches to Interaction Effects in Multiple Regression Jaccard/Wan
115. Analyzing Repeated Surveys Firebaugh
116. Monte Carlo Simulation Mooney
117. Statistical Graphics for Univariate and Bivariate Data Jacoby
118. Interaction Effects in Factorial Analysis of Variance Jaccard
119. Odds Ratios in the Analysis of Contingency Tables Rudas
120. Statistical Graphics for Visualizing Multivariate Data Jacoby
121. Applied Correspondence Analysis Clausen
122. Game Theory Topics Fink/Gates/Humes
123. Social Choice: Theory and Research Johnson
124. Neural Networks Abdi/Valentin/Edelman
125. Relating Statistics and Experimental Design: An Introduction Levin
126. Latent Class Scaling Analysis Dayton
127. Sorting Data: Collection and Analysis Coxon
128. Analyzing Documentary Accounts Hodson
129. Effect Size for ANOVA Designs Cortina/Nouri
130. Nonparametric Simple Regression: Smoothing Scatterplots Fox
131. Multiple and Generalized Nonparametric Regression Fox
132. Logistic Regression: A Primer Pampel
133. Translating Questionnaires and Other Research Instruments: Problems and Solutions Behling/Law
134. Generalized Linear Models: A Unified Approach, 2nd Edition Gill/Torres
135. Interaction Effects in Logistic Regression Jaccard
136. Missing Data Allison
137. Spline Regression Models Marsh/Cormier
138. Logit and Probit: Ordered and Multinomial Models Borooah
139. Correlation: Parametric and Nonparametric Measures Chen/Popovich
140. Confidence Intervals Smithson
141. Internet Data Collection Best/Krueger
142. Probability Theory Rudas
143. Multilevel Modeling, 2nd Edition Luke
144. Polytomous Item Response Theory Models Ostini/Nering
145. An Introduction to Generalized Linear Models Dunteman/Ho
146. Logistic Regression Models for Ordinal Response Variables O’Connell
147. Fuzzy Set Theory: Applications in the Social Sciences Smithson/Verkuilen
148. Multiple Time Series Models Brandt/Williams
149. Quantile Regression Hao/Naiman
150. Differential Equations: A Modeling Approach Brown
151. Graph Algebra: Mathematical Modeling With a Systems Approach Brown
152. Modern Methods for Robust Regression Andersen
153. Agent-Based Models Gilbert
154. Social Network Analysis, 2nd Edition Knoke/Yang
155. Spatial Regression Models, 2nd Edition Ward/Gleditsch
156. Mediation Analysis Iacobucci
157. Latent Growth Curve Modeling Preacher/Wichman/MacCallum/Briggs
158. Introduction to the Comparative Method With Boolean Algebra Caramani
159. A Mathematical Primer for Social Statistics Fox
160. Fixed Effects Regression Models Allison
161. Differential Item Functioning, 2nd Edition Osterlind/Everson
162. Quantitative Narrative Analysis Franzosi
163. Multiple Correspondence Analysis LeRoux/Rouanet
164. Association Models Wong
165. Fractal Analysis Brown/Liebovitch
166. Assessing Inequality Hao/Naiman
167. Graphical Models and the Multigraph Representation for Categorical Data Khamis
168. Nonrecursive Models Paxton/Hipp/Marquart-Pyatt
169. Ordinal Item Response Theory Van Schuur
170. Multivariate General Linear Models Haase
171. Methods of Randomization in Experimental Design Alferes
172. Heteroskedasticity in Regression Kaufman
173. An Introduction to Exponential Random Graph Modeling Harris
174. Introduction to Time Series Analysis Pickup
175. Factorial Survey Experiments Auspurg/Hinz
176. Introduction to Power Analysis: Two-Group Studies Hedberg
177. Linear Regression: A Mathematical Introduction Gujarati
178. Propensity Score Methods and Applications Bai/Clark
179. Multilevel Structural Equation Modeling Silva/Bosancianu/Littvay
180. Gathering Social Network Data adams
181. Generalized Linear Models for Bounded and Limited Quantitative Variables Smithson/Shou
182. Exploratory Factor Analysis Finch
Sara Miller McCune founded SAGE Publishing in 1965 to support the dissemination of usable knowledge and educate a global community. SAGE publishes more than 1000 journals and over 800 new books each year, spanning a wide range of subject areas. Our growing selection of library products includes archives, data, case studies and video. SAGE remains majority owned by our founder and after her lifetime will become owned by a charitable trust that secures the company’s continued independence. Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne
Exploratory Factor Analysis
W. Holmes Finch
Ball State University
FOR INFORMATION:

SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

SAGE Publications Ltd.
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
United Kingdom

SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Asia-Pacific Pte. Ltd.
18 Cross Street #10-10/11/12
China Square Central
Singapore 048423

Copyright © 2020 by SAGE Publications, Inc.

All rights reserved. Except as permitted by U.S. copyright law, no part of this work may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without permission in writing from the publisher.

All third party trademarks referenced or depicted herein are included solely for the purpose of illustration and are the property of their respective owners. Reference to these trademarks in no way indicates any relationship with, or endorsement by, the trademark owner.

Printed in the United States of America

ISBN: 9781544339887

This book is printed on acid-free paper.

Acquisitions Editor: Helen Salmon
Associate Editor: Chelsea Neve
Editorial Assistant: Megan O’Heffernan
Production Editor: Jyothi Sriram
Copy Editor: Diane DiMura
Typesetter: Hurix Digital
Proofreader: Sarah J. Duffy
Indexer: Amy Murphy
Cover Designer: Ginkhan Siam
Marketing Manager: Shari Countryman

19 20 21 22 23 10 9 8 7 6 5 4 3 2 1
CONTENTS

Series Editor’s Introduction
About the Author
Acknowledgments

Chapter 1: Introduction to Factor Analysis
    Latent and Observed Variables
    The Importance of Theory in Doing Factor Analysis
    Comparison of Exploratory and Confirmatory Factor Analysis
    EFA and Other Multivariate Data Reduction Techniques
    A Brief Word About Software
    Outline of the Book

Chapter 2: Mathematical Underpinnings of Factor Analysis
    Correlation and Covariance Matrices
    The Common Factor Model
    Correspondence Between the Factor Model and the Covariance Matrix
    Eigenvalues
    Error Variance and Communalities
    Summary

Chapter 3: Methods of Factor Extraction in Exploratory Factor Analysis
    Eigenvalues, Factor Loadings, and the Observed Correlation Matrix
    Maximum Likelihood
    Principal Axis Factoring
    Principal Components Analysis
    Principal Components Versus Factor Analysis
    Other Factor Extraction Methods
    Example
    Summary

Chapter 4: Methods of Factor Rotation
    Simple Structure
    Orthogonal Versus Oblique Rotation Methods
    Common Orthogonal Rotations
        Varimax Rotation
        Quartimax Rotation
        Equamax Rotation
    Common Oblique Rotations
        Promax Rotation
        Oblimin
        Geomin Rotation
    Target Factor Rotation
    Bifactor Rotation
    Example
    Deciding Which Rotation to Use
    Summary
    Appendix

Chapter 5: Methods for Determining the Number of Factors to Retain in Exploratory Factor Analysis
    Scree Plot and Eigenvalue Greater Than 1 Rule
    Objective Methods Based on the Scree Plot
    Eigenvalues and the Proportion of Variance Explained
    Residual Correlation Matrix
    Chi-Square Goodness of Fit Test for Maximum Likelihood
    Parallel Analysis
    Minimum Average Partial
    Very Simple Structure
    Example
    Summary

Chapter 6: Final Issues in Factor Analysis
    Proper Reporting Practices for Factor Analysis
    Factor Scores
    Power Analysis and A Priori Sample Size Determination
    Dealing With Missing Data
    Exploratory Structural Equation Modeling
    Multilevel EFA
    Summary

References
Index
SERIES EDITOR’S INTRODUCTION

Social and behavioral research rests fundamentally on measurement. Much of what we wish to study cannot be directly observed. Exploratory Factor Analysis describes tools for exploring relationships between observed variables and latent variables, also known as factors. The volume provides an accessible introduction to applied exploratory factor analysis (EFA) for those new to the method. Professor Finch’s goal is to give readers sufficient background to embark on their own analyses, follow up on topics of interest, and more broadly engage the literature. In doing so, he draws on his experience teaching this material as well as his expertise as a contributor to the literature. Exploratory Factor Analysis is thorough, but understandable. What the reader needs to know is explained “just in time.” Forty years ago, in the early years of the QASS series, Jae-On Kim and Charles W. Mueller contributed two volumes on factor analysis, one introducing the method and the other explaining its application. This volume replaces them both.

Professor Finch begins with problems of measurement, explaining that variables of interest are often latent constructs, not directly observable, and discussing the implications. After a quick review of the mathematical underpinnings of factor analysis, Chapters 3-5 discuss factor extraction, rotation, and retention, in that order. Professor Finch walks through the requirements, assumptions, and logic of each of these steps in an EFA. Throughout, he gives much practical advice about how to implement them and about the many choices an analyst must make. For factor extraction, should the analyst use ML (maximum likelihood), PAF (principal axis factoring), or some other method? What are the differences? What is the point of factor rotation? Which is better for a particular purpose, orthogonal or oblique rotation, and given the choice, how does the analyst choose among the many rotation methods available? Which method for determining the number of factors has the most support in the literature? And importantly, for results to be reproducible, how should the analysis be described and results be presented? In the final chapter, Professor Finch discusses factor scores and the problems that can arise using them, determining appropriate sample size, and how to handle missing data, as well as introducing some more advanced topics.

A strength of the volume is that it clearly differentiates EFA not only from confirmatory factor analysis (Chapters 1 and 2), but also from principal components analysis (Chapter 3), discriminant analysis, partial least squares, and canonical correlation (Chapter 6). Two examples enliven the explanations. The first involves achievement motivation. This very simple example, based on four subscales, is used to
demonstrate calculations and interpretation in context. The other example is the Adult Temperament Scale. This is a very “real” example in that it reveals some of the challenges associated with EFA. In EFA, there are many mathematically plausible solutions: what criteria does the analyst use to choose among them? Professor Finch uses the ATS example to demonstrate how comparisons of results based on different approaches to extraction and rotation can serve as robustness checks. However, it is also sometimes the case that the different methods do not point to a single, unified conclusion. In the ATS example, different methods provide inconsistent guidance on factor retention. This can happen, and Professor Finch comments on how the analyst might respond. Data and software code for both examples are contained in a companion website at study.sagepub.com/researchmethods/qass/finch-exploratory-factor-analysis.

As Professor Finch explains, there is a continuum of factor analysis models, ranging from purely exploratory models, which incorporate no a priori information, to confirmatory models, in which all aspects of the model are specified by the researcher as hypotheses about measurement structure. The approach taken in Exploratory Factor Analysis puts it somewhere in the middle of this continuum. Throughout, Professor Finch stresses the importance of theoretical expectations in making choices and assessing results. Although formal hypotheses about measurement structure are not tested in EFA, theory nevertheless guides the application of this data analytic technique.

Barbara Entwisle
Series Editor
ABOUT THE AUTHOR

W. Holmes Finch (Ph.D., South Carolina) is the George and Frances Ball Distinguished Professor of Educational Psychology in the Department of Educational Psychology at Ball State University. Prior to coming to Ball State, he worked for 12 years as a consultant in the Statistics Department at the University of South Carolina, advising faculty and graduate students on the appropriate statistical methods for their research. Dr. Finch teaches courses in statistical and research methodology as well as psychometrics and educational measurement. His research interests involve issues in psychometrics, including dimensionality assessment, differential item functioning, generalizability theory, and unfolding models. In addition, he pursues research in multivariate statistics, particularly nonparametric techniques. He is the co-author of Multilevel Modeling Using R (with Holden, J. E., & Kelley, K., CRC Press, 2014); Applied Psychometrics Using SAS (with French, B. F., & Immekus, J., Information Age, 2014); and Latent Variable Models in R (with French, B. F., Routledge, 2015).
ACKNOWLEDGMENTS

I would like to acknowledge several folks for their help with this book. First, Barbara Entwisle and Helen Salmon were invaluable sources of encouragement, guidance, and editorial ideas throughout the writing of the book. In addition, I would like to acknowledge the many great teachers and mentors with whom I’ve had the pleasure to work over the years, in particular John Grego, Brian Habing, and Huynh Huynh. Finally, I would like to acknowledge Maria, without whose love and support none of this work would be possible.

I would like to thank the following reviewers for their feedback:

Damon Cann, Utah State University
Stephen G. Sapp, Iowa State University
Michael D. Biderman, University of Tennessee at Chattanooga
Chapter 1
INTRODUCTION TO FACTOR ANALYSIS

Factor analysis is perhaps one of the most widely used statistical procedures in the social sciences. An examination of the PsycINFO database for the period between January 1, 2000, and September 19, 2018, revealed a total of approximately 55,000 published journal articles indexed with the keyword factor analysis. Similar results can be found by examining the ERIC database for education research and JSTOR for other social sciences. Thus, it is not an exaggeration to state that understanding factor analysis is key to understanding much published research in the fields of psychology, education, sociology, political science, anthropology, and the health sciences.

The purpose of this book is to provide you with a solid foundation in exploratory factor analysis, which, along with confirmatory factor analysis, represents one of the two major strands within this broad field. Indeed, a portion of this first chapter will be devoted to comparing and contrasting these two ways of conceptualizing factor analysis. However, before getting to that point, we first need to describe what, exactly, factors are and the differences between latent and observed variables. We will then turn our attention to the importance of having strong theory to underpin the successful use of factor analysis, and how this theory should serve as the basis upon which we understand the latent variables that this method is designed to describe. We will then conclude the chapter with a brief discussion of the software available for conducting factor analysis and an outline of the book itself. My hope in writing this book is to provide you, the reader, with a sufficient level of background in the area of exploratory factor analysis so that you can conduct analyses of your own, delve more deeply into topics that might interest you, and confidently read research that has used factor analysis. If this book achieves these goals, then I will count it as a success.
Latent and Observed Variables

Much research in fields such as psychology is focused on variables that cannot be directly measured. These variables are often referred to as being latent, and include such constructs as intelligence, personality, mood, affect, and aptitude. These latent variables are frequently featured in social science research and are also the focus for clinicians who want to gain insights into the psychological functioning of their clients. For example, a researcher might be interested in determining whether there is a relationship between
extraversion and job satisfaction, whereas a clinician may want to know whether her client is suffering from depression. In both cases, the variables of interest (extraversion, job satisfaction, and depression) are conceived of as tangible, real constructs, though they cannot be directly measured or observed. We talk about an individual as being an extravert or we conclude that a person is suffering from depression, yet we have no direct way of observing either of those traits. However, as we will see in this book, these latent variables can be represented in the statistical model that underlies factor analysis.

If latent variables are, by their very nature, not observable, then how can we hope to measure them? We make inferences about these latent variables by using variables that we can measure, and which we believe are directly impacted by the latent variables themselves. These observed variables can take the form of items on a questionnaire, a test, or some other score that we can obtain directly, such as ratings made by a researcher of a child’s behavior on the playground. We generally conceptualize the relationship between the latent and observed variables as being causal, such that one’s level on the latent variable will have a direct impact on scores that we obtain on the observed variable. This relationship can take the form of a path diagram, as in Figure 1.1.

[Figure 1.1. Example Latent Model Structure: a path diagram in which the latent factor F1 points to each of the observed variables X1 through X5.]
We can see that each observed variable, represented by the squares, is linked to the latent variable, denoted as F1, with unidirectional arrows. These arrows come from the latent variable to the observed variables, indicating that the former has a causal impact on the latter. Note also that each observed variable has an additional unique source of variation, known as error and represented by the circles at the far right of the diagram. Error represents everything that might influence scores on the observed variable other than the latent variable that is our focus. Thus, if the latent variable is mathematics aptitude, and the observed variables are responses to five items on a math test, then the errors are all of the other things that might influence those math test responses, such as an insect buzzing past, distracting noises occurring during the test administration, and so on. Finally, latent variables (i.e., the factor and error terms) in this model are represented by circles, whereas observed variables are represented by squares. This is a standard way in which such models are diagrammed, and we will use it throughout the book.

In summary, we conceptualize many constructs of interest in the social sciences to be latent, or unobserved. These latent variables, such as intelligence or aptitude, are very important, both to the goal of understanding individual human beings as well as to understanding the broader world around us. However, these constructs are frequently not directly measurable, meaning that we must use some proxy, or set of proxies, in order to gain insights about them. These proxy measures, such as items on psychological scales, are linked to the latent variable in the form of a causal model, whereby the latent variable directly causes manifest outcomes on the observed variables. All other forces that might influence scores on these observed variables are lumped together in a latent variable that we call error, which is unique to each individual indicator variable. Next, we will describe the importance of theory in both constructing and attempting to measure these latent variables.
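To make this causal structure concrete, here is a minimal R sketch (not taken from the book or its companion website) that simulates data from the single-factor model in Figure 1.1; the loadings and sample size are arbitrary illustrative values:

```r
# Simulate Figure 1.1: one latent variable (F1) causally determines five
# observed variables, each of which also has its own unique error term.
set.seed(123)
n <- 1000
f1 <- rnorm(n)                           # the latent variable; unobservable in practice
lambda <- c(0.8, 0.7, 0.6, 0.75, 0.65)   # illustrative effects of F1 on X1-X5
X <- sapply(lambda, function(l) l * f1 + rnorm(n, sd = sqrt(1 - l^2)))
colnames(X) <- paste0("X", 1:5)
round(cor(X), 2)  # the Xs correlate only because they share the common cause F1
```

In real data, only X would be available; factor analysis works backward from the correlations among the observed variables to the latent structure that could have produced them.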
The Importance of Theory in Doing Factor Analysis

As we discussed in the previous section, latent variables are not directly observable, and we only learn about them indirectly, through their impact on observed indicator variables. This is a very important concept for us to keep in mind as we move forward in this book, and with factor analysis more generally. How can we know that performance or scores on the observed variables are in fact caused by the latent variable of interest? The short answer is that we cannot know for sure. Indeed, we cannot know that the latent variable does in fact exist. Is depression a concrete,
real disease? Is extraversion an actual personality trait? Is there such a thing as reading aptitude? The answer to these questions is that we don’t know for sure. How, then, can we make statements about an individual suffering from depression, or say that Juan is a good reader, or that Yi is an extravert? We can make such statements because we have developed a theoretical model that explains how our observed scores should be linked to these latent variables. For example, psychologists have taken prior empirical research as well as existing theories about mood to construct a theoretical explanation for a set of behaviors that connote the presence (or absence) of depression. These symptoms might include sleep disturbance (trouble sleeping or sleeping too much), a lack of interest in formerly pleasurable activities, and contemplation of suicide. Taken alone, these are simply behaviors, each of which could arise from a variety of sources. Perhaps an individual has trouble sleeping because he is excited about a coming job change. However, if there is a theoretical basis for linking all of these behaviors together through some common cause (depression), then we can use observed responses to a questionnaire asking about them to make inferences about the latent variable.

Similarly, political scientists have developed conceptual models of political outlook to characterize how people view the world. Some people have views that are characterized as conservative, others have liberal views, and still others fall somewhere in between the two. This notion of political viewpoint is based on a theoretical model and is believed to drive the attitudes that individuals express regarding particular societal and economic issues, which in turn are manifested in responses to items on surveys. However, as with depression, it is not possible to say with absolute certainty that political viewpoint is a true entity. Rather, we can only develop a model and then assess the extent to which observations taken from nature (i.e., responses to survey questions) match what our theory predicts.

Given this need to provide a rationale for any relationships that we see among observed variables, and that we believe are the result of some unobserved variable, having strong theory is crucial. In short, if we are to make claims about an unobserved variable (or variables) causing observed behaviors, then we need to have some conceptual basis for doing so. Otherwise, claims about such latent relationships carry no weight. Given that factor analysis is the formalized statistical modeling of these latent variable structures, theory should play an essential role in its use. This means that prior to conducting factor analysis, we should have a theoretical basis for what we expect to find in terms of the number of latent variables (factors) and for how the observed indicator variables will be associated with these factors. This does not mean that we cannot use factor analysis in an exploratory way. Indeed, the entire focus
of this text is on exploratory factor analysis. However, it does mean that we should have some sense for what the latent variable structure is likely to be. This translates into having a general sense for the number of factors that we are likely to find (e.g., somewhere between two and four) and for how the observed variables would be expected to group together (e.g., items 1, 3, 5, and 8 should be measuring a common construct and thus should group together on a common factor). Without such a preexisting theory about the likely factor structure, we will not be able to ascertain when we have an acceptable factor solution and when we do not. Remember, we are using observed data to determine whether predictions from our factor model are accurate. This means that we need to have a sufficiently well-developed factor model so as to make predictions about what the results should look like. For example, what does theory say about the relationship between depression and sleep disturbance? It says that individuals suffering from depression will experience what for them are unusual sleep patterns. Thus, we would expect depressed individuals to indicate that they are indeed suffering from unusual sleep patterns. In short, having a well-constructed theory about the latent structure that we expect to find is crucial if we are to conduct the factor analysis properly and make good sense of the results that it provides to us.
Comparison of Exploratory and Confirmatory Factor Analysis

Factor analysis models, as a whole, exist on a continuum. At one extreme is the purely exploratory model, which incorporates no a priori information, such as the possible number of factors or how the indicators are associated with the factors. At the other extreme lies the purely confirmatory factor model, in which the number of factors, as well as the way in which the observed indicators group onto these factors, is specified by the researcher. These modeling frameworks differ both conceptually and statistically.

From a conceptual standpoint, exploratory models are used when the researcher has little or no prior information regarding the expected latent structure underlying a set of observed indicators. For example, if very little prior empirical work has been done with a set of indicators, or there is not much in the way of a theoretical framework for a factor model, then by necessity the researcher would need to engage in an exploratory investigation of the underlying factor structure. In other words, without prior information on which to base the factor analysis, the researcher cannot make any presuppositions regarding what the structure might look like, even with regard to the number of factors underlying the observed indicators. In other situations, there may be a strong theoretical basis upon which a hypothesized latent structure rests,
such as when a scale has been developed using well-established theories. However, if very little prior empirical work exists exploring this structure, the researcher may not be able to use a more confirmatory approach and thus would rely on exploratory factor analysis (EFA) to examine several possible factor solutions, which might be limited in terms of the number of latent variables by the theoretical framework upon which the model is based. Conceptually, a confirmatory factor analysis (CFA) approach would be used when there is both a strong theoretical expectation regarding the expected factor structure and prior empirical evidence (usually in the form of multiple EFA studies) supporting this structure. In such cases, CFA is used to (a) ascertain how well the hypothesized latent variable model fits the observed data and (b) compare a small number of models with one another in order to identify the one that yields the best fit to the data.

From a statistical perspective, EFA and CFA differ in terms of the constraints that are placed upon the factor structure prior to estimation of the model parameters. With EFA, there are few, if any, constraints placed on the model parameters. Observed indicators are typically allowed to have nonzero relationships with all of the factors, and the number of factors is not constrained to be a particular number. Thus, the entire EFA enterprise is concerned with answering the question of how many factors underlie an observed set of indicators, and what structure the relationship between factors and indicators takes. In contrast, CFA models are highly constrained. In most instances, each indicator variable is allowed to be associated with only a single factor, with relationships to all other factors set to 0. Furthermore, the specific factor upon which an indicator is allowed to load is predetermined by the researcher. This is why having strong theory and prior empirical evidence is crucial to the successful fitting of CFA models. Without such strong prior information, the researcher may have difficulty properly defining the latent structure, potentially creating a situation in which an improper model is fit to the data. The primary difficulty with fitting an incorrect model is that it may appear to fit the data reasonably well, based on statistical indices, and yet not be the correct model. Without earlier exploration of the likely latent structure, however, it would not be possible for the researcher to know this. CFA does have the advantage of being a fully determined model, which is not the case with EFA, as we have already discussed. Thus, it is possible to come to more definitive determinations regarding which of several CFA models provides the best fit to a set of data, because they can be compared directly using familiar tools such as statistical hypothesis testing. Conversely, determining the optimal EFA model for a set of data is often not a straightforward or clear process, as we will see later in the book.
In summary, EFA and CFA sit at opposite ends of a modeling continuum, separated by the amount of prior information and theory available to the researcher. The more such information is available and the stronger the theory, the more appropriate CFA will be. Conversely, the less such prior evidence is available, and the weaker the theories about the latent structure, the more appropriate EFA will be.

Finally, researchers should take care not to use both EFA and CFA on the same set of data. In cases where a small set of CFA models do not fit a set of sample data well, a researcher might use EFA in order to investigate potential alternative models. This is certainly an acceptable approach; however, the same set of data used to investigate these EFA-based alternatives should not then be used with an additional CFA model to validate what the exploration has suggested might be optimal models. In such cases, the researcher would need to obtain a new sample upon which the CFA would be fit in order to investigate the plausibility of the EFA findings. If the same data were used for both analyses, the CFA model would likely show spuriously good fit, given that the sample data had already yielded the factor structure being tested through the EFA.
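A minimal sketch of this sample-splitting advice in R, where `dat` is a placeholder name for a data frame of observed indicators:

```r
# Randomly split the sample: explore on one half, hold out the other half
# so that a later CFA is fit to data the EFA has never seen.
set.seed(42)
idx <- sample(nrow(dat), floor(nrow(dat) / 2))
efa_half <- dat[idx, ]   # used to explore possible factor structures
cfa_half <- dat[-idx, ]  # reserved to test the structure the EFA suggests
```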
EFA and Other Multivariate Data Reduction Techniques

Factor analysis belongs to a larger family of statistical procedures known collectively as data reduction techniques. In general, all data reduction techniques are designed to take a larger set of observed variables and combine them in some way so as to yield a smaller set of variables. The differences among these methods lie in the criteria used to combine the initial set of variables. We discuss this criterion for EFA at some length in Chapter 3, namely the effort to find a factor structure that yields accurate estimates of the covariance matrix of the observed variables using a smaller set of latent variables.

Another statistical analysis with the goal of reducing the number of observed variables to a smaller number of unobserved variates is discriminant analysis (DA). DA is used in situations where a researcher has two or more groups in the sample (e.g., treatment and control groups) and would like to gain insights into how the groups differ on a set of measured variables. However, rather than examining each variable separately, it is more statistically efficient to consider them collectively. In order to reduce the number of variables to consider in this case, DA can be used. As with EFA, DA uses a heuristic to combine the observed variables with one another into a smaller set of latent variables that are called discriminant functions. In this case, the algorithm finds the combination(s) that maximize the group mean differences on these functions. The number of
possible discriminant functions is the minimum of p and J − 1, where p is the number of observed variables and J is the number of groups. The functions resulting from DA are orthogonal to one another, meaning that they reflect different aspects of the shared group variance associated with the observed variables. The discriminant functions in DA can be expressed as follows:
$D_{fi} = w_{f1}x_{1i} + w_{f2}x_{2i} + \cdots + w_{fp}x_{pi}$  (Equation 1.1)
where

$D_{fi}$ = Value of discriminant function f for individual i
$w_{fp}$ = Discriminant weight relating function f and variable p
$x_{pi}$ = Value of variable p for individual i.

For each of these discriminant functions ($D_f$), there is a set of weights that are akin to regression coefficients, as well as correlations between the observed variables and the functions. Interpretation of the DA results usually involves an examination of these correlations. An observed variable having a large correlation with a discriminant function is said to be associated with that function, in much the same way that indicator variables with large loadings are said to be associated with a particular factor. Quite frequently, DA is used as a follow-up procedure to a statistically significant multivariate analysis of variance (MANOVA). Variables associated with discriminant functions that have statistically significantly different means among the groups can be concluded to contribute to the group mean difference associated with that function. In this way, the functions can be characterized just as factors are, by considering the variables that are most strongly associated with them.

Canonical correlation (CC) works in much the same fashion as DA, except that rather than having a set of continuous observed variables and a categorical grouping variable, CC is used when there are two sets of continuous variables for which we want to know the relationship. As an example, consider a researcher who has collected intelligence test data that yield five subtest scores. In addition, she has also measured executive functioning for each subject in the sample, using an instrument that yields four subtests. The research question to be addressed in this study is, how strongly related are the measures of intelligence and executive functioning? Certainly, individual correlation coefficients could be used to examine how pairs of these variables are related to one another. However, the research question in this case is really about the extent and nature of relationships between the two
sets of variables. CC is designed to answer just this question, by combining each set into what are known as canonical variates. As with DA, these canonical variates are orthogonal to one another so that they extract all of the shared variance between the two sets. However, whereas DA creates the discriminant functions by finding the linear combinations of the observed indicators that maximize group mean differences, CC finds the linear combinations for each variable set that maximize the correlation between the resulting canonical variates. Just as with DA, each observed variable is assigned a weight that is used in creating the canonical variates. The canonical variate is expressed as in Equation 1.2:

$C_{vi} = w_{c1}x_{1i} + w_{c2}x_{2i} + \cdots + w_{cp}x_{pi}$  (Equation 1.2)
where

$C_{vi}$ = Value of canonical variate v for individual i
$w_{cp}$ = Canonical weight relating variate v and variable p
$x_{pi}$ = Value of variable p for individual i.

Note how similar Equation 1.1 is to Equation 1.2. In both cases, the observed variables are combined to create one or more linear combination scores. The difference between the two approaches is in the criteria used to obtain the weights. As noted above, for DA the criterion involves maximizing group separation on the means of $D_f$, whereas for CC the criterion is the maximization of the correlation between $C_v$ for the two sets of variables.

The final statistical model that we will contrast with EFA is partial least squares (PLS), which is similar to CC in that it seeks to find linear combinations of two sets of variables such that the relationship between the sets is maximized. This goal stands in contrast to EFA, in which the criterion for determining factor loadings is the optimization of accuracy in reproducing the observed variable covariance/correlation matrix. PLS differs from CC in that the criterion it uses to obtain weights involves both the maximization of the relationship between the two sets of variables and the maximization of the explained variance for the variables within each set. CC does not involve this latter goal. Note that PCA, which we discuss in Chapter 3, also involves the maximization of variance explained within a set of observed variables. Thus, PLS combines, in a sense, the criteria of both CC and PCA (maximizing relationships among variable sets and maximizing explained variance within variable sets) in order to obtain linear combinations of each set of variables.
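Both DA and CC are available in base R and the MASS package; the sketch below uses the built-in iris data purely for illustration (this example is not from the book):

```r
library(MASS)  # provides lda()
# Discriminant analysis: weights chosen to maximize separation of the group
# means; with p = 4 variables and J = 3 groups there are min(4, 2) = 2 functions
da_fit <- lda(Species ~ ., data = iris)
head(predict(da_fit)$x)  # scores on the two discriminant functions

# Canonical correlation: weights chosen to maximize the correlation between
# linear combinations of two sets of continuous variables
sepal <- iris[, 1:2]  # first variable set
petal <- iris[, 3:4]  # second variable set
cancor(sepal, petal)$cor  # correlations between the pairs of canonical variates
```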
A Brief Word About Software

There are a large number of computer software packages that can be used to conduct exploratory factor analysis. Many of these are general statistical software packages, such as SPSS, SAS, and R. Others are specifically designed for latent variable modeling, including Mplus and EQS. For many exploratory factor analysis problems, these various software packages are all equally useful. Therefore, you should select the one with which you are most comfortable and to which you have access. On the other hand, when faced with a nonstandard factor analysis problem, such as having multilevel data, the use of specialized software designed for these cases might be necessary.

In order to make this text as useful as possible, I have included on the book website, at study.sagepub.com/researchmethods/qass/finch-exploratory-factor-analysis, example computer code and annotated output for all of the examples included in the text, as well as additional examples designed to demonstrate the various analyses described here. I have attempted to avoid including computer code and output in the book itself so that we can keep our focus on the theoretical and applied aspects of exploratory factor analysis, without getting too bogged down in computer programming. However, this computer-related information does appear on the book website, and I hope that it will prove helpful to you.
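As a taste of what such code looks like, a basic EFA run in R might be as follows, where `dat` is a placeholder for a data frame of observed indicators; the number of factors and the rotation settings are arbitrary here:

```r
# Base R: maximum likelihood extraction with varimax rotation
ml_fit <- factanal(dat, factors = 3, rotation = "varimax")
print(ml_fit, cutoff = 0.3)  # suppress small loadings for readability

# psych package: principal axis extraction with an oblique (oblimin) rotation;
# oblique rotations also require the GPArotation package
library(psych)
pa_fit <- fa(dat, nfactors = 3, fm = "pa", rotate = "oblimin")
pa_fit$loadings
```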
Outline of the Book

The focus of this book is on the various aspects of conducting and interpreting exploratory factor analysis. It is designed to serve as an accessible introduction to this topic for readers who are wholly unfamiliar with factor analysis and as a reference for those who are familiar with it and who need a primer on some aspect of the method.

In Chapter 2, we will lay out the mathematical foundations of factor analysis. This discussion will start with the correlation and covariance matrices for the observed variables, which serve as the basis upon which the parameters associated with the factor analysis model are estimated. We will then turn our attention to the common factor model, which expresses mathematically what we see in Figure 1.1. We will conclude Chapter 2 with a discussion of some important statistics that will be used throughout the book to characterize the quality of a particular factor solution, including eigenvalues, communalities, and error variances.

Chapter 3 presents the first major step in conducting a factor analysis, extraction of the factors themselves. Factor extraction involves the initial estimation of the latent variables that underlie a set of observed indicators.
We will see that there is a wide range of methods for extracting the initial factor structure, all with the goal of characterizing the latent variables in terms of the observed ones. The relationships between the observed and latent variables are expressed in the form of factor loadings, which can be interpreted as correlations between the observed and latent variables. The chapter describes various approaches for estimating these loadings, with a focus on how they differ from one another. Finally, we conclude Chapter 3 with an example.

Chapter 4 picks up with the initially extracted factor loadings, which are rarely directly interpretable. In order to render them more useful in practice, we must transform them using a process known as rotation. We will see that there are two general types of rotation: one allowing the factors to be correlated (oblique) and the other restricting the correlations among the factors to be 0 (orthogonal). We will then describe how several of the more popular of these rotations work, after which we present a full example and conclude the chapter with a discussion of how to decide which rotation to use.

One of the truths about exploratory factor analysis is that the model is indeterminate in nature. This means that there are an infinite number of mathematically plausible solutions, and no one of them can be taken as optimal over the others. Thus, we need criteria for deciding what the optimal solution is likely to be. Making this determination is the focus of Chapter 5. First and foremost, we must be sure that the solution we ultimately decide upon is conceptually meaningful. In other words, the factor model must make sense and have a basis in theory in order for us to accept it. Practically speaking, this means that the way in which the variables group together in the factors is reasonable. In addition to this theoretically based determination, there are also a number of statistical tools available to us when deciding on the number of factors to retain. Several of these are ad hoc in nature and may not provide terribly useful information. Others, however, are based in statistical theory and can provide useful inference regarding the nature of the final factor analysis model. We will devote time to a wide array of approaches, some more proven than others, but all useful to a degree. We close the chapter with a full example and some discussion regarding how the researcher should employ these various methods together in order to make the most informed decision possible regarding the number of factors to retain.

We conclude the book with a chapter dealing with a variety of ancillary issues associated with factor analysis. These include the calculation and use of factor scores, which is somewhat controversial. Factor scores are simply individual estimates of the latent trait being measured by the observed indicator variables. They can be calculated for each member
of the sample and then used in subsequent analyses, such as linear regression. Given the indeterminacy of the exploratory factor model, however, there is disagreement regarding the utility of factor scores. We will examine different methods for calculating them and delve a bit into the issue of whether or not they are useful in practice. We will then consider important issues such as a priori power analysis and sample size determination, as well as the problem of missing data. These are both common issues throughout statistics and are important in exploratory factor analysis as well.

We will then focus our attention on two extensions of EFA. The first is for cases in which we would like to investigate relationships among latent variables, but do not have a clear sense for what the factors should be. This exploratory structural equation modeling merges the flexibility of EFA with the ability to estimate relationships among latent variables. We will then turn our attention to the case when we have multilevel data, such that individuals are nested within some collective, such as schools or nations. We will see how ignoring this structure can result in estimation problems for the factor model parameters, but that there is a multilevel factor model available to deal with such situations. We will conclude the chapter and the book with discussions of best practices for reporting factor analysis results and of where exploratory factor analysis sits within the broader framework of statistical data reduction. This discussion will include tools such as discriminant analysis, canonical correlation, and partial least squares regression.

Upon completing this book, I hope that you are comfortable with the basics of exploratory factor analysis, and that you are aware of some of the exciting extensions available for use with it. Factor analysis is a powerful tool that can help us understand the latent structure underlying a set of observed data. It includes a set of statistical procedures that can be quite subtle to use and interpret. Indeed, it is not hyperbole to say that successfully using factor analysis involves as much art as it does science. Thus, it is important that when we do make use of this tool, we do so with a good sense for what it can and cannot do, and with one eye fixed firmly on the theoretical underpinnings that should serve as our foundation. With these caveats in mind, let’s dive in.
Chapter 2
MATHEMATICAL UNDERPINNINGS OF FACTOR ANALYSIS

As we discussed in Chapter 1, factor analysis is an extremely popular and widely used statistical methodology across a wide array of disciplines. It is particularly common in the social and behavioral sciences, where much work focuses on understanding phenomena that cannot be directly observed, such as cognitive ability, personality, motivation, socioeconomic status, and academic achievement. In Chapter 1, we also described how factor analysis is typically classified into one of two broad categories, exploratory (EFA) and confirmatory (CFA). As was noted, each of these paradigms has its own use, and the decision regarding which approach to take is largely driven by the amount of theory and prior research that exists, with CFA requiring more of both than does EFA.

In this chapter, we will briefly outline the core mathematical underpinnings of factor analysis. Despite their different applications, EFA and CFA actually share the same model and are based upon the same data source: the correlation or covariance matrix of the observed indicator variables. This chapter will not, by any means, be an exhaustive examination of the mathematics underlying the factor model. Rather, this discussion is intended to introduce the core concepts that underlie the estimation and expression of this model and the accompanying analyses. I hope that upon finishing this chapter the reader will have a solid understanding of the various parameters that make up the common factor model and of the link between the observed variable correlation (or covariance) matrix and the factor model parameters. Readers interested in a more rigorous treatment of the factor model are encouraged to read texts such as Gorsuch (1983) or Mulaik (2010). In addition, Tabachnick and Fidell (2013) provide an accessible but relatively detailed discussion of these issues.
Correlation and Covariance Matrices

Estimation of factor analysis models is based upon the correlation or covariance matrix associated with the observed indicator variables. These matrices contain measures of the associations among the indicators in the off-diagonal locations and their variances on the diagonal. The difference between the two types of matrices is simply whether they are standardized (correlation) or not (covariance). The correlation matrix for a set of four indicator variables is presented in Table 2.1.
Table 2.1 Correlation Matrix for Four Indicator Variables

        X1        X2        X3        X4
X1      1         rX1,X2    rX1,X3    rX1,X4
X2      rX1,X2    1         rX2,X3    rX2,X4
X3      rX1,X3    rX2,X3    1         rX3,X4
X4      rX1,X4    rX2,X4    rX3,X4    1

The covariance matrix for these indicators is identical to the correlation matrix, except that the values are not standardized. It appears in Table 2.2.

Table 2.2 Covariance Matrix for Four Indicator Variables

        X1          X2          X3          X4
X1      σ²X1        covX1,X2    covX1,X3    covX1,X4
X2      covX1,X2    σ²X2        covX2,X3    covX2,X4
X3      covX1,X3    covX2,X3    σ²X3        covX3,X4
X4      covX1,X4    covX2,X4    covX3,X4    σ²X4

Either of these matrices can be used as the basis for the estimation of factor models in the context of both EFA and CFA. However, when the analyses are conducted on the correlation matrix, we need not be concerned that the indicator variables are on different scales, because the data are standardized, with each indicator having a variance of 1 and the relationships among the indicators bounded between −1 and 1. On the other hand, if we were to use the covariance matrix with variables on different scales, the variances and covariances in Table 2.2 would reflect not only the actual variability in the data, but also differences in the measurement scales of the variables. For example, if some of the variables included in the factor analysis were on the commonly used intelligence test scale, which has a mean of 100 and standard deviation of 15, whereas others were on a T scale, with a mean of 50 and standard deviation of 10, their elements in Table 2.2 would not be comparable. The result of this difference in scale would be that the indicators with larger scales (e.g., mean of 100 and standard deviation of 15) would have a disproportionate influence on the final results of the factor analysis. Therefore, using the correlation matrix as the basis for estimating the model would be preferable in this situation, because all of the variables would be on the same standardized scale.

One final issue that we should consider with respect to the correlation and covariance matrices is their role not only in model parameter estimation, but also in the assessment of how well a factor model fits the data. A key aspect of using factor analysis involves determining whether a given factor model provides an accurate reflection of the observed data. In other words, is the latent variable structure implied by a specific factor model an accurate representation of the relationships among the observed indicator variables? As we will see in later chapters, several of the primary tools for determining how well a given factor model fits the data are based on how accurately the model can predict the correlation or covariance matrix among the observed indicators. Thus, the correlation and covariance matrices can be seen as the central pieces of information in the estimation and verification of factor models, for both EFA and CFA.
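Before moving on, the equivalence noted above, namely that the correlation matrix is simply the covariance matrix of the standardized variables, is easy to verify in R; the sketch below simulates two variables on the differing scales described in this section:

```r
# Two simulated variables on deliberately different measurement scales.
set.seed(1)
sim_dat <- data.frame(
  iq = rnorm(200, mean = 100, sd = 15),  # intelligence-test scale
  ts = rnorm(200, mean = 50, sd = 10)    # T scale
)
cov(sim_dat)  # entries reflect the differing scales as well as the association
cor(sim_dat)  # standardized: 1s on the diagonal, entries bounded by -1 and 1
# Standardizing first and then taking the covariance recovers the correlation:
all.equal(cov(scale(sim_dat)), cor(sim_dat), check.attributes = FALSE)  # TRUE
```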
The Common Factor Model

The common factor model that links the observed indicator variables to the latent factors can be thought of as a type of regression in which the indicators serve as the dependent variables and the factors as the independent variables. We can express this model as follows:
y = Λξ + ε
(Equation 2.1)
where

y = Matrix of observed indicator variables
ξ = Matrix of factors
Λ = Factor loading matrix linking observed indicators with factors
ε = Matrix of unique factors for the indicators; that is, sources of variance in the indicators that are not associated with ξ.

We can also express the factor model in terms of a single individual, as below:

y_ij = Λξ_il + ε_j
(Equation 2.2)
where

y_ij = Value of indicator j for individual i
ξ_il = Value of factor l for individual i
ε_j = Random error for indicator j.

From Equation 2.2, we can see that in the factor model, the value of the observed indicator is a function of an individual's level on the latent variable(s), the relationship between the latent and observed variables, and random error associated with the indicator. As with regression models, we assume that ε for a given indicator variable is random and independent of the ε for all other indicators, as well as independent of ξ. Within the matrix Λ are separate factor loadings relating each indicator to each factor. Very much like slopes in a regression equation, these loadings reflect the relationships between the factors and the indicators, with larger values being indicative of a closer association between the latent and observed variables. When the data are standardized (i.e., z scores), the loadings will generally range between −1 and 1, much like correlation coefficients, though in some specific instances this will not be the case, as we will discuss in Chapter 3. We can see a visualization of Equation 2.1 in Figure 2.1, which is what is commonly referred to as a path diagram. In path diagrams, circles are used to represent latent variables, and squares or rectangles represent observed variables. This diagram illustrates the relationships expressed in Equation 2.1. Namely, we see that each of the indicator variables (X1–X10) is predicted by each of the factors (F1 and F2). This is a hallmark quality of an EFA model. In contrast, for a confirmatory factor model, each indicator would be associated with only one of the factors. Also note that values on the observed variables are partially determined by the random errors, which are represented by the circles to the right of the observed variables. The absence of correlation among the errors is evident from the fact that there are no paths connecting them with one another. On the other hand, the two factors are allowed to be correlated with one another, as is expressed by the double-headed arrow linking F1 and F2. Finally, it is important to note that whereas this model allows each indicator to be associated with both factors, it is desirable that for each indicator only one of the loadings be relatively large, with the others being as close to 0 as possible. We will address this issue in much more detail in Chapters 3 and 4.
Figure 2.1  Path Diagram for a Two-Factor EFA Model With Five Indicator Variables Associated With Each Factor
[Figure: indicators X1–X10 appear as rectangles, with X1–X5 grouped around F1 and X6–X10 around F2; each indicator has its own error term, and a double-headed arrow links F1 and F2.]
Correspondence Between the Factor Model and the Covariance Matrix

The parameters in Equation 2.1 can be used to predict the correlation (or covariance) matrix of the indicator variables, as expressed in Equation 2.3:

Σ = ΛΨΛ′ + Θ
(Equation 2.3)
where

Σ = Model-predicted correlation or covariance matrix of the indicators
Ψ = Correlation matrix for the factors
Θ = Diagonal matrix of unique error variances.

In other words, it is possible to take the factor model full circle from the observed correlation or covariance matrix used to estimate the parameters to a predicted correlation/covariance matrix for these indicators. As we discuss briefly below, and in more detail in Chapter 5, the correspondence between Σ and the observed correlation/covariance matrix can be used to assess the quality of the factor model. When trying to decide whether a particular factor model is appropriate given our data, one of the primary tools that we will use is the predicted correlation matrix from Equation 2.3. Specifically, we will be interested in comparing Σ to S, the observed correlation matrix for our indicator variables. In general, if the values in Σ and S are close to one another, then we can conclude that the proposed factor model fits the data well. On the other hand, if the corresponding elements of the two matrices are far apart, we would conclude that the model does not fit the data well. For example, consider the relationship between indicator variables X1 and X2, for which the observed correlation is 0.55. If the factor model fit the data well, then we would expect the predicted correlation between these two variables in Σ to be close to 0.55. However, if the model does not do a good job of representing the observed data, then this predicted correlation value would be relatively far from 0.55. Such a result would indicate that the proposed latent variable structure does not accurately capture the relationship between these two observed indicators. If such divergence between the observed and factor model predicted correlations were common, we would conclude that the proposed factor model does not work well for the data more generally. There are a number of statistical tools for assessing factor model fit based upon this comparison, which we will discuss in later chapters.
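The following sketch (numpy again; the loadings and factor correlation are hypothetical) shows Equation 2.3 in action, computing the model-implied matrix Σ and the predicted correlation for the X1–X2 pair discussed above:

```python
import numpy as np

# Hypothetical two-factor solution for four standardized indicators
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.5]])
Psi = np.array([[1.0, 0.4],          # factor correlation matrix
                [0.4, 1.0]])
# Unique variances: whatever is left after the factors' contribution
Theta = np.diag(1.0 - np.sum((Lambda @ Psi) * Lambda, axis=1))

Sigma = Lambda @ Psi @ Lambda.T + Theta   # Equation 2.3

print(round(Sigma[0, 1], 2))   # model-implied r(X1, X2) = 0.8 * 0.7 = 0.56
# If the observed correlation were 0.55, the small residual (0.55 - 0.56)
# would suggest the model reproduces this element of S quite well.
```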
Eigenvalues

Eigenvalues are a key concept underlying the linkage between the factor model parameters and the correlation matrix expressed in Equation 2.3. The mathematical details of how and where eigenvalues and eigenvectors originate are beyond the scope of this book. However, it will be helpful for us to have a conceptual understanding of what these quantities are, as they will play a central role in our use and interpretation of factor models, particularly EFA. Eigenvalues are the variances of the latent variables obtained through a mathematical decomposition of the correlation matrix. In factor analysis, each latent variable (i.e., factor) has an associated eigenvalue, with the first factor having the largest eigenvalue, the second factor the second largest, and so on. Factors with larger eigenvalues account for a greater share of the variance in the indicators. As we will see in Chapter 3, the fact that eigenvalues reflect the variance accounted for in the observed indicators by the factors will be very useful in helping us to determine how many factors should be retained in the EFA. Eigenvalues are derived through a mathematical decomposition of the correlation matrix using a set of weights known as eigenvectors. Each indicator and factor pairing has associated with it a value in the eigenvector, and the full eigenvector is in turn used to derive the eigenvalues. Thus, these eigenvectors serve as weights linking the indicators to the latent factors. The eigenvalues themselves play a key role in the estimation of the factor loadings, Λ, as we can see in Equation 2.4:

Λ = V√L
(Equation 2.4)
where

V = Matrix of eigenvectors
L = Vector of eigenvalues.

Again, we will not delve further into the issues of eigenvalues and eigenvectors, but it is helpful for us to have made their acquaintance. In order to demonstrate the calculation of factor loadings using the eigenvalues and eigenvectors, we will refer to a small correlation matrix taken from a larger example that we will be using throughout the text. This example involves scores from an adult temperament scale (ATS), which is described in more detail in Chapter 3. For our purposes here, we will use a subset of four of the scale scores, corresponding to a latent variable called negative affect. The correlation matrix appears in Table 2.3.
Table 2.3  Correlation Matrix for Negative Affect Variables

             Fear     Frustration  Sadness  Discomfort
Fear         1        0.292        0.469    0.387
Frustration  0.292    1            0.245    0.188
Sadness      0.469    0.245        1        0.186
Discomfort   0.387    0.188        0.186    1
We can see that the correlations in Table 2.3 range from relatively small (0.186) to moderate (0.469) in size. We will not describe in detail the extraction of eigenvalues and eigenvectors in this book. However, we will use these values to calculate factor loadings for the four variables associated with the negative affect factor. The eigenvectors appear in Table 2.4, with one column per eigenvector.

Table 2.4  Eigenvectors for Negative Affect Variables

             1        2        3        4
Fear         −0.60    0.09     −0.23    0.77
Frustration  −0.42    −0.52    0.74     −0.05
Sadness      −0.52    −0.35    −0.58    −0.53
Discomfort   −0.45    0.78     0.27     −0.36
The eigenvalues for this correlation matrix are 1.91, 0.83, 0.79, and 0.47. Applying Equation 2.4, we obtain the factor loadings for the negative affect variables shown in Table 2.5.

Table 2.5  Factor Loadings for Negative Affect Variables

Variable     Loading
Fear         0.85
Frustration  0.38
Sadness      0.54
Discomfort   0.43
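A minimal numpy sketch of the mechanics behind Equation 2.4, applied to the Table 2.3 matrix, follows. Note that eigenvector signs are arbitrary, and the loadings reported in Table 2.5 reflect the extraction and rounding used for this example, so the code's output should be read as illustrating the computation rather than reproducing the table digit for digit:

```python
import numpy as np

R = np.array([[1.000, 0.292, 0.469, 0.387],
              [0.292, 1.000, 0.245, 0.188],
              [0.469, 0.245, 1.000, 0.186],
              [0.387, 0.188, 0.186, 1.000]])

vals, vecs = np.linalg.eigh(R)      # eigh returns eigenvalues in ascending order
order = np.argsort(vals)[::-1]      # reorder so the largest comes first
vals, vecs = vals[order], vecs[:, order]

loadings = vecs * np.sqrt(vals)     # Equation 2.4: Lambda = V * sqrt(L)

print(np.round(vals, 2))            # approximately [1.91, 0.83, 0.79, 0.47]
print(np.round(loadings[:, 0], 2))  # loadings on the first (largest) factor
```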
Error Variance and Communalities

As we saw in Figure 2.1, and Equations 2.1 and 2.2, factor analysis can be viewed as a kind of regression model in which the observed indicators play the role of dependent variables and the factors serve as the independent or predictor variables. In addition, there is error associated with each of the indicators, which is assumed to be random in nature. This randomness means that the errors are implicitly assumed to be uncorrelated with one another. This lack of correlation among the errors is also referred to as local independence. Local independence simply means that scores for indicator variables associated with the same latent variable are not correlated with one another, once the latent trait is accounted for by the model. In other words, any correlation between the variables is due to their being associated with a common latent variable. From Equation 2.1 and Figure 2.1, we can see that the factor model accounts for all of the observed variation in each of the indicators, either through their individual relationships with the factor(s) or through error. Given this fact, we can use the results of the model to apportion the variability associated with the factors and the variability associated with error. Typically, we express these separate variance components in terms of proportions, so that their sum is equal to 1. The proportion of variance that is accounted for by the factors is called the communality, and for uncorrelated factor models it is simply the sum of the squared factor loadings from Equation 2.1, or

C_j = λ_j1² + λ_j2² + . . . + λ_jl²

(Equation 2.5)

where

λ_jl = The standardized loading relating factor l to indicator variable j.

The amount of error variance can then be calculated as 1 − C_j. As an example of calculating the communalities for the indicator variables, consider an EFA for which there are 3 factors and 12 indicators. Now imagine that for indicator variable 1, the loadings for the factors were −0.06, 0.03, and 0.58. We would then calculate the communality as

C₁ = λ₁₁² + λ₁₂² + λ₁₃² = (−0.06)² + 0.03² + 0.58² = 0.34.
From this value, we can conclude that together the three factors account for approximately 34% of the variance in variable X1. This fact also implies that 0.66 (1 − 0.34), or 66%, of the variance is associated with random error. In the case of the negative affect factor described above, we have only a
single loading for each variable. The squared value of this loading represents the proportion of variance in each indicator that is explained by the factor:

Fear: 0.85² = 0.72
Frustration: 0.38² = 0.15
Sadness: 0.54² = 0.29
Discomfort: 0.43² = 0.19

Thus, we can see that Fear has the most variance explained by the negative affect factor, and Frustration the least.
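These calculations are simple enough to verify directly; a short sketch (numpy assumed) for the three-factor example above:

```python
import numpy as np

# Loadings for indicator 1 on the three factors (from the worked example)
lam = np.array([-0.06, 0.03, 0.58])

communality = np.sum(lam ** 2)   # Equation 2.5: sum of squared loadings
unique_var = 1.0 - communality   # proportion attributed to random error

print(round(communality, 2))     # 0.34
print(round(unique_var, 2))      # 0.66
```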
Summary

In this chapter, we have very briefly examined some of the core mathematical concepts that undergird the factor model. This is certainly not an exhaustive discussion, and interested readers who wish to see a more detailed description of the model will want to examine one of the recommended readings mentioned earlier. However, I do hope that this discussion has provided the reader with an understanding of the common factor model and its relationship to the observed correlation/covariance matrix. In addition, we have attempted to make clear the correspondence between the factor model and regression, such that the outcomes of interest (the observed indicators) are functions of a set of predictors (factors) and random error. The relationships between the factors and the outcomes are expressed in the form of factor loadings, such that larger values indicate a stronger relationship between a latent variable and an indicator. In the next three chapters, we will turn our attention to the fitting of EFA models. Chapter 3 will focus on the methods for initially estimating the factor loading matrix in a process known as extraction. As we will see, there are a number of methods for this purpose, with some having been shown to be optimal in certain situations. In Chapter 4, our focus will turn to the rotation of the factor loading matrix. Rotation involves transforming the initially extracted set of loadings in order to make them more interpretable. We will discuss both what we mean by more interpretable and how rotations seek to carry out this task. The goal in Chapter 5 is to describe a variety of methods for determining how many factors should be retained in the context of EFA. As noted above, a number of these approaches are directly derived from the correspondence between the observed and model-predicted correlation/covariance matrices. We will limit our attention to those methods for determining the number of factors to retain that are supported in the research literature.
Chapter 3
METHODS OF FACTOR EXTRACTION IN EXPLORATORY FACTOR ANALYSIS

In Chapter 2, we learned about the basic factor model and how it can be used to provide insights into the latent structure underlying a set of observed indicators, such as items on a scale, subscales obtained from a psychological or educational test battery, or measurements made on a biological process, among many other contexts. As we saw, factor analysis is traditionally divided into two separate types, based upon the use to which researchers plan to put it. When there are strong a priori hypotheses supported by previous research, researchers may use factor analysis in a confirmatory fashion in order to test whether these hypotheses about the underlying structure are likely to hold in the population. In this context, we refer to our factor model as confirmatory factor analysis (CFA). On the other hand, when such a priori hypotheses do not exist, or there is not much prior research to support them, then exploratory factor analysis (EFA) may be preferable because it places fewer restrictions on the form of the latent variable model (Brown, 2015). Therefore, as was described in Chapter 2, researchers making use of EFA may have ideas or hypotheses regarding the latent structure underlying the observed data, but they are not limiting their model's structure to a small number of possibilities, as is the case with CFA. Rather, EFA, which is the focus of this book, allows for a very large array of potentially viable models and provides the researcher with information about these. As we will see in the following pages, this wide scope of models is both an advantage and a disadvantage of using EFA. Though it presents the data analyst with the largest possible pool of potential models, thereby not foreclosing on just a few solutions, EFA may also create a situation lacking clarity regarding which of these models is optimal. In this and the succeeding chapters, we will discuss these issues and the statistical tools that will help us make decisions regarding which model(s) may prove optimal. EFA is not really a single statistical analysis; rather, it consists of multiple distinct steps, beginning with what is called factor extraction, followed by factor rotation. Concomitantly with extraction and rotation, the researcher will use several statistical tools to determine the number of factors that should be retained in the final analysis. The first step, factor extraction, involves the estimation of an initial set of factor loadings that
link the observed indicators to the unobserved factors. In this chapter, we will delve into the methods by which the initial factor structure in an EFA is obtained. As we will see, there exist a number of such approaches, each differing based upon the statistical criterion used to identify the optimal factor model parameter estimates. These initially extracted solutions are almost never interpretable in their own right, and will therefore need to be transformed in a process known as factor rotation, which we will discuss in Chapter 4. However, obtaining interpretable factor loadings through rotation cannot occur until after we first have an initial set of factor loading estimates. In order to provide some context for this chapter, let's consider an example, which involves several subscales from an adult temperament scale (ATS), and which we touched on briefly in Chapter 2. The instrument consists of 13 subscales, each of which is associated with one of four factors: Negative Affect, Effortful Control, Extraversion/Surgency, and Orienting Sensitivity. This assumed relationship between subscales and factors appears in Table 3.1.

Table 3.1  Proposed Latent Structure of the Adult Temperament Scale

Subscale                          Hypothesized Factor
Fear                              Negative Affect
Frustration                       Negative Affect
Sadness                           Negative Affect
Discomfort                        Negative Affect
Activation Control                Effortful Control
Attentional Control               Effortful Control
Inhibitory Control                Effortful Control
Sociability                       Extraversion/Surgency
High Intensity Pleasure           Extraversion/Surgency
Positive Affect                   Extraversion/Surgency
Neutral Perceptual Sensitivity    Orienting Sensitivity
Affective Perceptual Sensitivity  Orienting Sensitivity
Associative Sensitivity           Orienting Sensitivity
It is important to note at this point that although the scale developers do have a hypothesis regarding the factor structure underlying the various subscales, they are not prepared to conduct a CFA. In order to use CFA, they would need a strong theory about this latent structure, backed by prior empirical work. In the current example, the scale is newly developed, and whereas its authors do have a fairly strong theoretical basis for the latent structure of the instrument, which they used in developing it, there has been virtually no prior factor analytic work to investigate whether this structure does indeed appear to hold in practice. Therefore, the EFA described in this and the following chapters is an initial examination of the hypothesized structure. With this background information now in hand, along with an example dataset on which to work, we are ready to learn about the various methods for extracting initial factor solutions. There are a large number of techniques available, although relatively few of these are currently used in practice. We will discuss those approaches that have been shown to be most effective in some detail, and provide a brief mention of others that are perhaps no longer widely used, but which are still of some interest either because they are known to be useful in very specific circumstances or for historical reasons. Prior to this discussion, however, we need to briefly review some important core ideas that we touched on in Chapter 2, and which will serve as the basis for our understanding of how factor extraction works.
Eigenvalues, Factor Loadings, and the Observed Correlation Matrix

In Chapter 2, we discussed the basic factor model, eigenvalues, and their relationship to the correlation matrix for the observed indicator variables. As a quick reminder, the factor model can be written as follows:

y = Λξ + ε
(Equation 3.1)
where

y = Matrix of observed indicator variables
ξ = Matrix of factors
Λ = Factor loading matrix linking observed indicators with factors
ε = Set of unique variances for the indicators; that is, variance in the indicators that is not associated with the factors.
The factor loadings can be interpreted as measures of the relationships between each of the latent variables (ξ) and each of the observed indicators (y). The unique variances are simply those parts of the indicators that are not associated with the underlying latent variables. The loadings and the unique variances are negatively associated with one another, such that the larger a set of loadings is for an observed variable (i.e., the stronger the relationships between the variable and the factors), the smaller the unique variance that is associated with that indicator. As we noted in Chapter 2, when using the factor model we assume that the unique variances for the indicators are uncorrelated with one another. This means that any correlation among the indicators is assumed to result only from the factor structure. Another important quantity implied by the model in Equation 3.1 is the communalities, the proportion of variance in each of the observed variables that is accounted for by the set of unobserved variables, or factors. Conceptually, a communality is very much analogous to R² in regression analysis, with values ranging between 0 and 1, and those closer to 1 indicating a higher proportion of variation in the indicator being accounted for by the factors. For a given indicator, the communality can be calculated as the sum of the squared factor loadings. Thus, if variable X had factor loadings of 0.45 and 0.7 for a 2-factor solution, its corresponding communality would be 0.6925 (0.45² + 0.7²). In EFA, variables are said to be associated with, or load on, the factor for which they have the largest loading. We will discuss interpretation of factor analysis below, when we consider the results from our example. As we defined them in Chapter 2, eigenvalues are the variances of the latent variables obtained through a mathematical decomposition of the correlation matrix. In factor analysis, each latent variable (i.e., factor) has an associated eigenvalue, with the first factor having the largest eigenvalue, the second factor the second largest, and so on. The factor with the largest eigenvalue accounts for the largest portion of variance in the set of observed indicators. If the number of factors retained in an analysis is equal to the number of observed indicators (which is the maximum number of factors that could be retained), then all of the observed variance in the indicators would be accounted for by the set of factors. Of course, as we discussed in Chapter 2, a primary goal of factor analysis is to reduce the dimensionality in a set of data by retaining fewer factors than there are observed variables. Because the factors are extracted so as to be uncorrelated with one another (orthogonal), the proportion of variance in the indicators that is accounted for by a given factor solution is equal to the sum of the eigenvalues for the retained factors divided by the sum of all possible eigenvalues, which is equal to the number of observed indicators. Thus, if the dataset contains 20 indicators, and 3 factors are retained, the proportion of variance accounted for by this solution is the sum of the first three eigenvalues (corresponding to the 3 retained factors) divided by 20.
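As a quick worked sketch of that last point (the eigenvalues here are invented for illustration):

```python
# Hypothetical first five eigenvalues from a 20-indicator correlation matrix;
# the full set of 20 eigenvalues would sum to 20.
eigenvalues = [5.1, 3.2, 1.9, 1.1, 0.9]
n_indicators = 20

retained = eigenvalues[:3]                    # keep the first three factors
proportion = sum(retained) / n_indicators     # (5.1 + 3.2 + 1.9) / 20

print(round(proportion, 2))                   # 0.51, i.e., 51% of the variance
```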
In this chapter, we will learn about methods designed to obtain initial estimates for the factor model parameters, and particularly the factor loadings (Λ) that appear in Equation 3.1. As we will see, these parameter estimates are determined by minimizing some measure of the difference between the observed correlation matrix and the one predicted using the model parameter estimates themselves. The predicted correlation matrix for the observed variables can be expressed in terms of the factor loadings, covariances, and error variances as

Σ = ΛΨΛ′ + Θ
(Equation 3.2)
where

Λ = Factor loading matrix
Σ = Model-predicted correlation or covariance matrix of the indicators
Ψ = Correlation matrix for the factors
Θ = Diagonal matrix of unique error variances.

Thus, factor loading estimates are used to predict the correlations among the observed indicators. If the predictions are very close to the actual values, then we can conclude that the solution is a good one. On the other hand, if the predictions are far from the actual values, then we must conclude that our solution is poor. In the following pages, we will examine some methods for estimating factor model parameters that work by assessing how close the predicted correlation values are to the observed ones. One final point to make in regard to factor extraction is that when the number of factors retained is equal to the number of observed variables, the correlation matrix resulting from the application of Equation 3.2 will be identical to the observed correlation matrix for the indicators. However, when the number of factors retained is less than the number of observed variables (which is usually the goal of factor analysis), the predicted correlation matrix will not be identical to the observed one. Thus, the goal of factor extraction methods is to retain the fewest factors that will still yield accurate predictions of the correlation matrix. Indeed, this is the primary problem to be solved by the extraction methods described below. It is important to reiterate, however, that the final determinant of whether a given factor solution is good rests upon its theoretical coherence. Thus, a solution that makes sense statistically but does not make sense in terms of the theory about how the observed indicators should be constituted in the latent variable context is not a good solution.
Maximum Likelihood

One of the most popular and widely used methods of factor extraction is maximum likelihood (ML). ML is an estimation technique that is used throughout statistics in order to obtain model parameter estimates in areas as diverse as linear and nonlinear modeling, time series analysis, and text mining, among others. In all of these applications, ML has as a common goal the calculation of parameter estimates that will maximize the likelihood of the observed data. In the case of EFA, we are particularly interested in the correlation or covariance matrix of the observed variables and thus focus on the likelihood of obtaining it given a set of factor model parameter estimates. Mathematically, we express this likelihood in the form of a fit function. In the context of factor analysis, ML is designed to provide parameter estimates that will minimize the value of the following fit function:

F_ML = ln|Σ| + tr(Σ⁻¹S) − ln|S| − p
(Equation 3.3)
where

Σ = Predicted correlation matrix for the observed variables based upon the factor analysis model parameters
|Σ| = Determinant of Σ
S = Observed correlation matrix for the observed variables
p = Number of observed indicator variables.

ML uses an iterative methodology to obtain the factor model parameter estimates (i.e., loadings, factor variances, and covariances), which are in turn used to calculate Σ using the approach that we outlined in Equation 3.2. The algorithm explores a range of values for these parameters, updating as it goes, and then stops when the estimates from one step to the next are very similar to one another, that is, when F_ML changes very little. At this point, the algorithm is said to have converged, and ML stops. If, after a predetermined number of steps, the algorithm does not converge, it stops and returns an error, essentially telling the data analyst that it was not able to achieve a viable solution. At this point, the researcher can either change her model, increase the number of iterations, or try a different estimator. She should not, however, trust results from an ML solution that did not converge. ML has been shown to be an effective method for extracting an initial factor solution, provided that the assumptions underlying the model are met (e.g., Jöreskog, 1967). The primary assumption is multivariate normality of
the observed indicators. In addition to this distributional assumption, ML is also sensitive to the size of the sample. In particular, if the sample size is small, it can be difficult for the algorithm to reach convergence. For very large samples, the goodness of fit test, which we will discuss in more detail in Chapter 5, may be so sensitive to small model misspecifications that it ceases to be useful.
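As a practical illustration, scikit-learn's FactorAnalysis class fits the factor model by maximum likelihood (via an EM-type algorithm). The sketch below assumes a hypothetical file ats_subscales.csv holding the 13 ATS subscale scores; standardizing first puts the solution in the correlation metric:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data file: one row per respondent, 13 ATS subscale columns
X = np.loadtxt("ats_subscales.csv", delimiter=",")
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the indicators

fa = FactorAnalysis(n_components=4, max_iter=1000)
fa.fit(Z)

loadings = fa.components_.T      # rows = indicators, columns = factors
unique_var = fa.noise_variance_  # estimated unique (error) variances
print(np.round(loadings, 3))
```

Note that this yields an unrotated solution with orthogonal factors, akin to the initial extraction discussed in this chapter.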
Principal Axis Factoring

Another very popular factor extraction method is principal axis factoring (PAF). PAF is popular because it has been shown to work well in terms of recovering the underlying factor structure across a number of conditions (de Winter, Dodou, & Wieringa, 2009), and it does not rely on the assumption of multivariate normality of the observed indicators that underlies ML. It is based upon an ordinary least squares methodology (the same family of estimators that is used in standard linear regression analysis), and minimizes the following function:
F_PAF = tr[(S − Σ)²]
(Equation 3.4)
The matrices Σ and S are as defined for Equation 3.3, and tr denotes the trace, here applied to the squared difference matrix. The diagonal of the correlation matrix S in Equation 3.4 contains estimates of the communalities for each of the observed variables. Following is a brief description of the iterative process underlying PAF. As is the case with ML, PAF is an iterative procedure that updates factor model parameter estimates in a series of steps that continue until the estimates do not change from one step to the next. In the initial step, the communality estimates are the squared multiple correlation (R²) values for each of the observed variables. These initial communality estimates are obtained by treating each observed indicator in turn as the dependent variable in a regression model, with all of the other indicators serving as independent variables. The resulting R² values are then placed in the diagonal of S, and parameter estimates (e.g., factor loadings) are estimated so as to minimize F_PAF in Equation 3.4. After this first pass, new communality values in S are obtained based on the initial set of factor model parameter estimates from Step 1, and new parameter estimates are then obtained by again minimizing the value in Equation 3.4 based upon this updated set of communalities in S. These steps are repeated until there is very little to no change in F_PAF, at which point the algorithm is said to have converged. As an example of S, consider the simple correlation matrix in Table 3.2 for the four subscales that are thought to make up the negative affect factor.
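The following is a minimal sketch of this iterative procedure, assuming numpy; it is a bare-bones illustration of the logic described above, not a production implementation:

```python
import numpy as np

def principal_axis(R, n_factors, max_iter=200, tol=1e-6):
    """Toy principal axis factoring: iteratively re-estimate communalities."""
    # Initial communalities: squared multiple correlations (SMCs)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    Rw = R.copy()
    for _ in range(max_iter):
        np.fill_diagonal(Rw, h2)                   # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rw)
        idx = np.argsort(vals)[::-1][:n_factors]   # keep the largest factors
        L = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
        h2_new = np.sum(L ** 2, axis=1)            # updated communalities
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return L, h2

R = np.array([[1.000, 0.292, 0.469, 0.387],
              [0.292, 1.000, 0.245, 0.188],
              [0.469, 0.245, 1.000, 0.186],
              [0.387, 0.188, 0.186, 1.000]])

loadings, communalities = principal_axis(R, n_factors=1)
print(np.round(communalities, 3))   # final communality estimates
```

The starting values of h2 in this sketch are the SMCs, which for this matrix should approximate the diagonal entries shown in Table 3.2.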
Table 3.2  Correlation Matrix for a Set of Observed Variables, With Communalities in the Diagonal

             Fear     Frustration  Sadness  Discomfort
Fear         0.337    0.292        0.469    0.387
Frustration  0.292    0.119        0.245    0.188
Sadness      0.469    0.245        0.233    0.186
Discomfort   0.387    0.188        0.186    0.152
The off-diagonal elements of the matrix are the correlations between pairs of the observed indicators, and the diagonal elements are the initial communalities, or R² values. After the first set of factor model parameter estimates is obtained using this matrix, the elements on the diagonal (and only those elements of the matrix) are reestimated, and replace those that appear in Table 3.2. The algorithm will then use the criterion in Equation 3.4 to update the factor model parameter estimates, recalculate communality values, and update S, until the change in F_PAF is below a predetermined (very small) threshold value, at which point the algorithm stops. The model parameter estimates at this point are considered to be final and are what appear in the statistical output that the data analyst sees.
Principal Components Analysis

Another statistical methodology that is very closely associated with, but different from, factor analysis is principal components analysis (PCA). PCA works in a manner very similar to that described above for PAF. Indeed, the only difference between the two approaches is that whereas for PAF the diagonal of S contains the communalities of the indicators, for PCA the diagonal contains 1s. Similar though the two methods are, the use of 1s in the diagonal rather than the communalities carries with it implications for both the data analysis and the interpretation of the results. With respect to the data analysis, PCA differs from PAF in that it seeks to extract as much of all observed variance in the data as possible, whereas PAF seeks to maximize the amount of shared variance among the variables that is accounted for by the model. An examination of the correlation matrix in Table 3.2 shows that if the communalities of the observed variables are very large, the S matrices used in PAF and PCA will be quite similar to one another, as will the resulting estimates from the two techniques.
However, as the communalities become smaller, solutions obtained by PAF and PCA will diverge from one another, and in some cases quite markedly (e.g., Widaman, 2007). The inclusion of 1s in the diagonal of S also has implications for how we interpret the results of PCA as compared to PAF. When employed with psychometric and educational scales, use of PCA implicitly assumes that our measurements are made without error. In other words, by placing 1s in the diagonal of the correlation matrix used to extract the factors, we are assuming that all of the relevant variance with regard to the construct being measured is contained in these values, and there is nothing left that is unaccounted for. In practice, this is rarely if ever the case, of course, and has served to spur debate about when PCA or factor analysis (particularly PAF) should be used, which we address in the following section. Before discussing this issue, though, we do need to note here how PCA might be best used in practice. Given that it is designed to extract all of the variance in a set of variables, and not only the shared variation, PCA is used in a somewhat different fashion from PAF or other factor extraction methods. In particular, PCA may be best thought of as a method of data reduction, as opposed to a tool for exploring the latent structure in a set of variables. Indeed, there is a long history of researchers first using PCA to reduce the dimensionality in a set of variables and creating composite scores of these variables, prior to conducting another analysis such as multiple regression (Gupta & Kabundi, 2010; Ongel, Kohler, & Harvey, 2008; Tan, Shi, Tong, & Wang, 2005; Weiss, Gale, Batty, & Deary, 2013). As an example, a researcher studying how various size measurements for a cricket (e.g., carapace length, wingspan) are related to survival may first use PCA to reduce the set of variables from a relatively large number, such as 20, to a more manageable number like five. These five components from PCA would then serve as the independent variables in the analysis linking cricket body dimensions and survival. Presumably, the components would reflect common aspects of the body dimensions and therefore make sense being included in the second data analysis. Variants of PCA have also been proposed for use with high dimensional data problems in which the number of observed variables is as large as, or even exceeds, the sample size (Yu, Yu, Tresp, Kriegel, & Wu, 2006), and as a way of dealing with the problem of collinearity, which occurs when predictors in a linear model are highly associated with one another (Fox, 2016).
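A brief sketch of PCA used in exactly this data-reduction role, via scikit-learn; the file name and dimensions are hypothetical stand-ins for the cricket example:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical (n x 20) array of cricket size measurements
body_dims = np.loadtxt("cricket_measurements.csv", delimiter=",")
Z = (body_dims - body_dims.mean(axis=0)) / body_dims.std(axis=0)

pca = PCA(n_components=5)        # reduce 20 measures to 5 components
scores = pca.fit_transform(Z)    # component scores, one row per cricket

# Each component's share of the total variance in the 20 measurements;
# the scores can now serve as predictors in a survival model.
print(np.round(pca.explained_variance_ratio_, 3))
```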
Principal Components Versus Factor Analysis

Given the apparent differences between PCA and factor analysis in terms of what is being fit, and how that impacts interpretation of results, there has been much discussion in the latent variable modeling literature regarding
under what conditions PCA or factor analysis should be used (Bentler & Kano, 1990; Floyd & Widaman, 1995; MacCallum & Tucker, 1991; Schönemann, 1990; Velicer & Jackson, 1990). It is important to point out that although researchers sometimes use PCA when investigating latent structure, in fact, that is not PCA's designed purpose. As Widaman (2007) notes, PCA is designed to find linear combinations of the observed variables that will maximize the total variation accounted for in them, and that can yield variables for use in other analyses. On the other hand, factor analysis is designed to identify sources of shared variation (factors) underlying a set of variables, which can provide theoretically meaningful explanations for observed patterns of correlations among these variables. Indeed, as we will see in Chapter 6, there are inherent problems with using the scores from a factor analysis that are not present with components produced by PCA. Finally, simulation studies (e.g., Snook & Gorsuch, 1989; Widaman, 1993, 2007) have demonstrated that the solutions produced by PCA and factor analysis can differ markedly, leading to different conclusions regarding the nature of the unobserved structure in the data. Therefore, it is generally recommended that researchers interested in understanding the latent structure underlying a set of data rely on one of the factor analysis extraction methods, whereas those interested in reducing the dimensionality of a dataset and creating variables for use in further analyses use PCA.
Other Factor Extraction Methods

In addition to the popular ML and PAF methods described above, there are a number of other extant factor extraction methods available to the data analyst. These approaches are much less frequently used in practice but do remain available in statistical software packages. One such approach is image factoring (Jöreskog, 1969). With this technique, each indicator variable is treated as the dependent variable in a regression equation where all of the other indicators serve as independent variables. Predicted values (image scores) of the dependent indicator are then obtained based upon this regression model. This process is repeated for all of the indicators, and the resulting image scores are then correlated with one another in order to create an image correlation matrix. The diagonal of this image matrix consists of the variances of the image scores, and factor model parameters are then extracted from the image matrix rather than the observed correlation matrix. Another alternative factor extraction method is alpha factoring (Kaiser & Caffrey, 1965), which is designed to identify a factor structure that maximizes Cronbach's α coefficient of reliability. Cronbach's α is a measure of the generalizability, sometimes referred to as the consistency, of a score on a psychological or educational scale. When α is large, we can conclude that
the variance in the items making up the scale is largely accounted for by the true variance in the latent construct being measured. There is a direct relationship between the eigenvalue of a factor and its α value, as is demonstrated below:

α = [n / (n − 1)] (1 − 1/λ)

(Equation 3.5)

where

λ = Eigenvalue associated with the factor
n = Number of indicator variables.

There exist two approaches for factor extraction that rely on the least squares methodology that is also a part of PAF. With unweighted least squares (ULS), the algorithm finds the factor solution that minimizes the squared difference in off-diagonal elements of the observed and factor model predicted correlation matrices, much as we saw in Equation 3.4. The difference between ULS and PAF is that in the latter, the communality estimates are included in the estimation of the factor solution, as they are updated along with the factor model parameters in the iterative process described above. However, with ULS the communalities are only obtained at the end of the estimation process and are not updated with each step. Thus, ULS is simply a special case of PAF in which communalities are estimated separately from the other model parameters. Weighted (generalized) least squares (WLS) extraction is similar to ULS in that it seeks to find a solution that minimizes the squared difference between the model predicted and the observed off-diagonal elements of the correlation matrix, while providing an estimate of the communalities at the end of the algorithm. With WLS, however, each indicator has applied to it a weight that is proportional to the variance shared by that variable with the others in the analysis. Thus, indicators that are more closely associated with the other variables in the factor analysis will play a larger role in determining the final factor solution.
Example

In order to gain a sense for how factor extraction works, let's consider again the example that we introduced at the beginning of this chapter. In Table 3.1, we saw a set of 13 subscales from the ATS that its developers believe should be associated with four factors. In order to illustrate the factor extraction techniques that were discussed above, we will investigate the factor structure of this scale, assuming that the developers were correct. In Chapter 5, we will devote much more time to considering methods for
determining the number of factors that should be retained. In this chapter, however, our purpose is to gain insight into factor extraction. Table 3.3 includes the unrotated factor loadings (often called the initial factor pattern) for the ATS with four factors, based upon the PAF extraction methodology. In addition to the loadings of each subscale with each factor, this table also includes the communalities for each of the indicator variables. Recall that variables are considered to belong to the factor(s) for which they have the largest loading. To help with interpretation, researchers commonly use cutoff values when determining the factor(s) on which a given indicator loads.

Table 3.3  Unrotated Factor Loadings for Adult Temperament Scale Scores: PAF Extraction

                                  Factor
ATS Subscale                      1       2       3       4       Com*
Fear                              .594    −.239   −.347   .249    .592
Frustration                       .263    −.462   −.130   .072    .305
Sadness                           .595    −.115   −.040   .220    .417
Discomfort                        .396    −.021   −.481   .065    .393
Activation Control                .157    .651    −.059   .374    .591
Attentional Control               −.136   .546    −.049   .036    .320
Inhibitory Control                −.174   .507    −.154   .017    .312
Sociability                       .144    −.138   .578    .315    .472
High Intensity Pleasure           .050    −.185   .582    −.117   .389
Positive Affect                   .182    .153    .475    .290    .366
Neutral Perceptual Sensitivity    .538    .197    .153    −.140   .371
Affective Perceptual Sensitivity  .697    .343    .193    −.291   .726
Associative Sensitivity           .479    .120    .029    −.392   .398

*Communality.
If a loading equals or exceeds the cutoff, then it can be concluded that the indicator loads on the factor. Common cut-values are 0.3, 0.4, or even 0.5. When using such cutoffs, we are concerned with the absolute value of the loading, so that the sign is not an issue when determining to which factor(s) an indicator belongs. For the purposes of this illustration, we will use a cutoff of 0.3 for determining that an indicator loads on a factor. When considering Table 3.3, we see that the Fear subscale loads primarily on Factor 1, but also on Factor 3. The Frustration subscale, on the other hand, loads only on Factor 2. We can go through each of the indicators in a similar fashion, in order to determine to which factor each belongs. From the perspective of the factors, it appears that Factor 1 includes the Fear, Sadness, Discomfort, Neutral Perceptual Sensitivity, Affective Perceptual Sensitivity, and Associative Sensitivity subscales. When reflecting on Table 3.3, we certainly notice that many of the indicators load on multiple factors, making interpretation of the results difficult. This unclear pattern of loadings illustrates the fact that the results of an initial factor extraction, such as this one, are rarely interpretable, making further analysis in the form of factor rotation necessary prior to our making any final interpretations. We will discuss this issue in a bit more detail below, and in great detail in Chapter 4. For now, we simply need to note that the initial extraction phase of our analysis is transitional and, while necessary, does not provide us with the final results that we would use in understanding the latent structure underlying our data. In addition to the factor loadings, Table 3.3 also includes the communalities for each of the subscales. Remember that this number reflects the proportion of variance in each indicator that is accounted for by the factors and can be calculated as the sum of the squared factor loadings associated with that variable. For example, the communality for the Fear subscale is calculated as 0.594² + (−0.239)² + (−0.347)² + 0.249² = 0.592. From these results, we can see that the Affective Perceptual Sensitivity subscale had the largest communality, that is, had the greatest amount of variance explained by the factors, and Frustration had the lowest communality.
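Applying the 0.3 cutoff programmatically is straightforward; a small sketch (numpy assumed) using the first two rows of Table 3.3:

```python
import numpy as np

# Fear and Frustration loadings on the four factors, from Table 3.3
loadings = {
    "Fear":        np.array([0.594, -0.239, -0.347, 0.249]),
    "Frustration": np.array([0.263, -0.462, -0.130, 0.072]),
}

cutoff = 0.3
for name, row in loadings.items():
    factors = (np.where(np.abs(row) >= cutoff)[0] + 1).tolist()
    print(f"{name} loads on factor(s): {factors}")
# Fear loads on factor(s): [1, 3]
# Frustration loads on factor(s): [2]
```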
Table 3.4 contains the initial unrotated factor loadings based on ML extraction. A comparison of these results with those in Table 3.3 shows a very similar pattern in terms of which variables are associated with which factors, though the values of the loadings themselves are different.

Table 3.4  Unrotated Factor Loadings for Adult Temperament Scale Scores: ML Extraction

                                  Factor
ATS Subscale                      1       2       3       4       Com*
Fear                              .468    −.447   −.371   .208    .600
Frustration                       .101    −.505   −.090   .133    .291
Sadness                           .520    −.282   −.101   .275    .436
Discomfort                        .343    −.212   −.462   −.065   .380
Activation Control                .314    .597    −.274   .330    .639
Attentional Control               .019    .565    −.144   −.026   .341
Inhibitory Control                −.031   .491    −.241   −.054   .303
Sociability                       .112    −.056   .490    .433    .443
High Intensity Pleasure           .033    −.087   .597    .091    .374
Positive Affect                   .220    .168    .351    .373    .339
Neutral Perceptual Sensitivity    .589    .063    .135    −.055   .372
Affective Perceptual Sensitivity  .792    .137    .193    −.210   .727
Associative Sensitivity           .516    −.046   .108    −.367   .415

*Communality.

Of course, the purpose of using factor analysis is to gain insights into the latent structure underlying a set of observed indicator variables. In the current example, we would like to know whether the four factors
that the developers of the ATS proposed are actually present in the population. Thus, two issues are of primary importance: (1) determination of the number of factors to be retained and (2) interpretation of the factor solution based upon the loadings. We will deal with the first issue in Chapter 5, as we investigate a variety of tools that can help us in ascertaining the optimal number of factors to retain. The second issue, interpretability, gets at the heart of how we understand the factors and their meanings. Ideally, for interpretation to be clear we need to have an unambiguous pattern of factor loadings such that each indicator is associated with only one factor, and each factor has multiple (preferably three or more)
indicators associated with it. Of course, the loadings are the tool that we use to determine which indicators are associated with which factors, as we saw above. However, as was already noted, the loadings matrix from an initial extraction will rarely be interpretable because many indicators will have high loadings on more than one of the factors. Therefore, we cannot take these initial extraction results and use them to understand the latent structure in our data. Rather, we are going to need to include an additional step in our analysis, known as factor rotation (see Chapter 4), in order to yield a more easily interpretable factor loading matrix. Here we define more easily interpretable to mean that the factor loadings conform to the guidelines that we laid out above. Namely, each indicator is associated with only a single factor, and each factor has multiple indicators associated with it. Factor rotation is more likely to make this desired set of conditions a reality, though by no means does it guarantee that they will hold true. However, it is the case that they are more likely to be present after rotation has been carried out than after only the initial extraction is completed.
Summary

Our focus in this chapter has been on methods for the initial extraction of parameters for the factor model that we introduced in Chapter 2. Factor extraction takes information in the correlation matrix of the observed indicators and translates it into information about an underlying factor model and, in particular, the relationships between the latent and observed variables. These relationships are expressed through the factor loadings, which are generally interpreted as correlations between the observed indicators and latent factors. We saw that there exist several approaches for factor extraction, with ML and PAF being perhaps the most popular and widely used in practice. In addition to introducing various extraction methods in this chapter, we also described PCA, which, although closely related to factor analysis, differs from it in some fundamental ways. In particular, whereas factor analysis extraction methods such as PAF and ML seek to find model parameters that account for as much of the shared variance among the indicators as possible, PCA identifies components that account for the maximum amount of total variance in the indicators. This mathematical difference in the models leads to differences in the way that factor analysis and PCA are typically used. The goal of a factor analysis is to learn how the indicators group together with factors, and thereby to gain a greater understanding of what the indicators are actually measuring. For example, when we use EFA with the ATS we want to learn about how temperament is organized in the respondents' minds. Does it fit the predetermined structure that the scale
developers believe exists, or does it conform to some other structure? The manner in which the indicators group together within factors will help us to answer this question. In contrast, because PCA extracts the total variance in a set of variables, it is most appropriately used to reduce the dimensionality in a set of indicators by creating linear combinations of them that can be used in other analyses. For this reason, PCA is a powerful tool for dealing with high dimensional data and collinearity among predictors in a regression, and for creating new variables for use in further analyses. It is not, however, generally recommended for use by researchers looking for insights into the latent structure for a set of observed variables (Costello & Osborne, 2005; Widaman, 2007). We concluded this chapter with an example demonstrating factor extraction using ML and PAF, and walked through the interpretation of the factor loading matrix. However, we also noted that the results from an initial factor extraction will rarely meet the level of interpretability that researchers require. Indicators will frequently have loadings in excess of standard cutoffs (e.g., 0.3) for multiple factors, making it difficult, if not impossible, to clearly determine which observed variables are associated with which latent ones. As we noted, the ideal situation is for each indicator to load on only one factor and each factor to have multiple associated indicators. The further from this ideal a given solution strays, the less interpretable it will be. In almost every real-world scenario, the initial factor extraction will stray far from this ideal. Therefore, to deal with this problem, we will need to use factor rotation, which is the topic of the next chapter. Factor rotation will generally, though not always, yield more interpretable solutions, based upon the guidelines listed above. In summary then, factor extraction is the first step in conducting EFA and is key to gaining an understanding of the latent structure underlying a set of data. It is not the final step, however, and will be used in conjunction with rotation methods, such as those that we describe in Chapter 4, to yield a final EFA solution.
Chapter 4
METHODS OF FACTOR ROTATION

In Chapter 3, we devoted our attention to the initial extraction of factors in exploratory factor analysis (EFA). We learned that there are a number of approaches for doing this, with some, such as maximum likelihood and principal axis factoring, being more widely used than others. As we also noted in the previous chapter, these approaches for extracting factors in EFA yield results that are inherently indeterminate in nature. In other words, there are an infinite number of combinations of the factor loadings that will yield precisely the same mathematical fit to the data, that is, the same prediction of the observed covariance matrix. We also saw that the initially extracted loadings are often not particularly interpretable because the indicator variables are associated with, or load on, multiple factors. Given these facts, how can we identify a single factor loading solution for our set of data? As we will discuss in this chapter, a more interpretable factor loading matrix can be derived from the initially extracted one using a process known as factor rotation. Simply put, rotation refers to a mathematical transformation of the initially extracted set of factor loadings with the goal of relating each observed indicator variable to a single latent factor. In the following pages, we will learn about a variety of rotations, including how they work, how they are similar to and different from one another, and the ways in which they can be interpreted.
Simple Structure

When factor rotation works well, the result is an interpretable factor loading matrix that provides a simple structure solution. Simple factor structure was defined by Thurstone (1947) as occurring when the following conditions were met: (1) Each indicator has at least one zero loading; (2) each factor has at least m zero loadings, where m is the number of factors; (3) every pair of factors has multiple indicators with a zero loading for one of the factors but not the other; (4) if more than four factors are retained, then every pair of factors should have several indicators with zero loadings on both factors; and (5) every pair of factors should have few rows with nonzero loadings for both factors. Thus, the goal of factor rotation is to adjust all of the loadings so that each indicator has a large value for one of the factors and small values for all of the others. This is the way in which simple structure is typically described in practice. It is important to note
that even as rotation transforms the loadings to conform to a simple structure solution, however, the underlying fit of the model is not altered at all, meaning that the predicted values of Σ for the unrotated and rotated solutions are exactly the same. This means that the variance in the observed indicators that is accounted for by the factor structure does not change with rotation. In the remainder of this chapter, we will first define the two broad categories of rotations, orthogonal and oblique, after which we will dive more deeply into descriptions of some of the more popular rotations that fall within each of these broad categories. We will then discuss two different approaches to rotation, known as target and bifactor rotations, which are more similar to confirmatory factor analysis. Finally, we will conclude the chapter with a demonstration of rotation using the example data that we introduced in the previous chapter.
Orthogonal Versus Oblique Rotation Methods

Factor rotation methods are generally divided into two broad families, orthogonal and oblique. Orthogonal rotations constrain the correlations among factors to be 0, whereas oblique rotations allow the factors to have a nonzero correlation. The issue of correlation among the factors is important to consider both conceptually and statistically. From a conceptual point of view, we are likely to be interested in whether the latent traits that are associated with the observed indicators are, or are not, correlated with one another. For example, an educational researcher investigating creativity may wish to know whether creativity is associated with intelligence, both of which can be represented as latent variables in a factor model. Likewise, a psychologist may have hypotheses about relationships between latent variables associated with personality and those associated with executive functioning. In both situations, prior literature in the field of study will provide guidance with regard to whether latent variables should be correlated with one another. In addition, the researcher may be interested in the extent to which two measures are redundant, or measure the same construct. Such redundancy can be investigated through an examination of the correlations among the latent variables underlying the measures. If the correlation is quite large (e.g., 0.9), then we may conclude that the two instruments associated with these latent variables are really measuring the same or a closely aligned construct, and thus do not need to be included simultaneously in further analyses. In addition to these conceptual issues regarding correlations among the latent factors, there are also statistical considerations to be made. Specifically, the decision of whether or not to allow factors to be correlated in an
EFA has implications for the interpretation of the factor loadings themselves. When an orthogonal rotation is used, the resulting factor loadings are the correlations between the observed indicators and the latent factors. On the other hand, an oblique rotation produces two separate sets of factor loadings, one known as the pattern matrix and the other as the structure matrix. The values in the pattern matrix represent the unique relationships between each of the observed indicator variables and each of the latent factors. In other words, a pattern value for observed variable X1 on factor F1 reveals the relationship between X1 and F1, after controlling for the influence of the other factors in the model. In this respect, oblique pattern loadings are very much akin to partial regression coefficients. On the other hand, the structure matrix reflects the relationships between the observed indicators and the individual factors, without controlling for the other factors in the model. Thus, the structure value for indicator X1 on factor F1 reflects the direct relationship between X1 and F1, as well as the variation that is shared by F1 with the other factors in the model. This extra loading matrix results in greater complexity for researchers when interpreting the EFA results. In actual practice, researchers most frequently use the pattern matrix to interpret the results of EFA, given that it reflects the unique relationships between the observed variables and the latent factors, with the impact of all other latent variables removed. In doing so, however, it is important to keep in mind that the pattern loading does not represent the correlation between the latent and observed variables, but is more similar to a partial regression coefficient in meaning. Given these two broad families of rotations, a natural question for the reader to ask is, under what conditions is one approach preferable to the other? As with many issues in statistics (and in life more generally), there is no single best answer to this question for all situations. However, it is possible to make some general recommendations for practice. Perhaps the first such suggestion would be for the researcher to apply an oblique rotation to the initially extracted results, in order to obtain the interfactor correlation matrix. If one or more of these correlations are relatively large, then the researcher is probably best served by using an oblique rotation, as it better reflects the data at hand. Tabachnick and Fidell (2013) recommend a cutoff value of 0.32 for identifying interfactor correlations that are sufficiently large to warrant the use of an oblique solution. Alternatively, the effect size guidelines for interpreting Pearson's correlation coefficient provided by Cohen (1988) could also be employed. Thus, values of 0.2 or more, approaching a moderate relationship according to Cohen, might be sufficient to warrant the application of an oblique rotation. The point here is that whereas it is recommended to use an examination of the interfactor correlations to determine whether an oblique or orthogonal rotation is optimal, the magnitude of these
correlations that would suggest the use of one rotation approach over the other is less clear. The smaller the correlations among the factors, the more similar the pattern and structure matrices will be to one another, and to the rotated loading matrix produced by an orthogonal rotation. In the final analysis, the decision regarding which family of rotations to use must be scientifically defensible (i.e., make sense theoretically) and be based on the observed data, in the form of the interfactor correlation matrix (Browne, 2001).
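To make this screening step concrete, here is a minimal sketch in Python with NumPy; the correlation values below are illustrative placeholders, not results from any particular analysis:

```python
import numpy as np

# Hypothetical interfactor correlation matrix from an oblique rotation
phi = np.array([
    [1.00, 0.29, 0.09],
    [0.29, 1.00, 0.03],
    [0.09, 0.03, 1.00],
])

# Largest absolute off-diagonal correlation
off_diag = np.abs(phi[~np.eye(phi.shape[0], dtype=bool)])
max_r = off_diag.max()

# Tabachnick and Fidell (2013) suggest 0.32 as a cutoff; Cohen's (1988)
# small-to-moderate benchmarks (0.1-0.3) are another option.
cutoff = 0.32
print(f"max |r| = {max_r:.2f}; oblique rotation warranted: {max_r >= cutoff}")
```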
Common Orthogonal Rotations

Within the broad framework of orthogonal rotations, there are a number of possibilities from which to choose, much as was the case with factor extraction. In this chapter, we will only discuss the rotations that are most commonly used and well supported by research, but it is important to remember that they do not represent the sum total of techniques that could be employed in practice. In addition, we should keep in mind that statistical research has not identified a single rotation that is optimal in every situation. Thus, the approaches described here, while certainly useful in many situations, should not be seen as the full set of methods that could be used, but rather should be viewed as a starting point for further investigation by interested readers. Let's begin our discussion of orthogonal rotations by taking a look at the unrotated factor loadings for the adult temperament scale (ATS) from Chapter 3, which appear in Table 4.1. An examination of these results makes it clear that Thurstone's simple structure is not present. Several of the variables are cross-loaded, meaning that they have nontrivial factor loadings on multiple factors. As we discussed in Chapter 3, a loading may be considered nontrivial if it exceeds 0.3 in absolute value. For example, the Fear subscale is cross-loaded on Factors 1 and 3, whereas Activation Control is cross-loaded on Factors 2 and 4. Indeed, several other of the subscales also exhibit cross-loaded factor loading patterns, making interpretation of the latent variables rather difficult. This is where factor rotation can prove helpful.
Table 4.1 Unrotated Factor Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4  Com*
Fear                                   .59      −.24      −.35       .25     .59
Frustration                            .26      −.46      −.13       .07     .30
Sadness                                .59      −.11      −.04       .22     .42
Discomfort                             .40      −.02      −.48       .06     .39
Activation Control                     .16       .65      −.06       .37     .59
Attentional Control                   −.14       .55      −.05       .04     .32
Inhibitory Control                    −.17       .51      −.15       .02     .31
Sociability                            .14      −.14       .58       .31     .47
High Intensity Pleasure                .05      −.18       .58      −.12     .39
Positive Affect                        .18       .15       .47       .29     .37
Neutral Perceptual Sensitivity         .54       .20       .15      −.14     .37
Affective Perceptual Sensitivity       .70       .34       .19      −.29     .73
Associative Sensitivity                .48       .12       .03      −.39     .40

*Communality.
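The cross-loading check described above is easy to automate. The following sketch (assuming Python with NumPy) applies the 0.3 rule to the Table 4.1 loadings and flags the cross-loaded subscales noted in the text:

```python
import numpy as np

# Unrotated ATS loadings from Table 4.1 (rows = subscales, columns = factors)
subscales = ["Fear", "Frustration", "Sadness", "Discomfort",
             "Activation Control", "Attentional Control", "Inhibitory Control",
             "Sociability", "High Intensity Pleasure", "Positive Affect",
             "Neutral Perceptual Sensitivity",
             "Affective Perceptual Sensitivity", "Associative Sensitivity"]
loadings = np.array([
    [ .59, -.24, -.35,  .25],
    [ .26, -.46, -.13,  .07],
    [ .59, -.11, -.04,  .22],
    [ .40, -.02, -.48,  .06],
    [ .16,  .65, -.06,  .37],
    [-.14,  .55, -.05,  .04],
    [-.17,  .51, -.15,  .02],
    [ .14, -.14,  .58,  .31],
    [ .05, -.18,  .58, -.12],
    [ .18,  .15,  .47,  .29],
    [ .54,  .20,  .15, -.14],
    [ .70,  .34,  .19, -.29],
    [ .48,  .12,  .03, -.39],
])

# Flag indicators with nontrivial (|loading| > 0.3) loadings on 2+ factors
for name, row in zip(subscales, loadings):
    hits = np.where(np.abs(row) > 0.3)[0] + 1  # 1-based factor numbers
    if len(hits) > 1:
        print(f"{name}: cross-loaded on factors {list(hits)}")
```

Running this flags six subscales, including Fear (Factors 1 and 3) and Activation Control (Factors 2 and 4), matching the count noted later in the chapter for the unrotated solution.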
Varimax Rotation

Perhaps the most widely used orthogonal factor rotation method is Varimax, or variance maximizing (Kaiser, 1958). Quite simply, the goal of this approach is to maximize the variance in the factor loadings for each factor across the indicators, by making relatively large loadings even larger, and relatively small loadings even smaller. This transformation of the loadings is carried out using matrix algebra, by multiplying the initially extracted factor loadings (e.g., those in Table 4.1) with a transformation matrix that contains the sines and cosines of a particular angle, Ψ. The elements of the transformation matrix are determined by maximizing the Varimax criterion, which appears in Table A1 in the Appendix to this chapter. Thus, the algorithm underlying the Varimax method identifies the transformation matrix that can be applied to the original, unrotated factor loadings (λ in the equation) so as to maximize the Varimax criterion in Table A1. The resulting loadings should more clearly reflect simple structure and can then be used by the researcher to characterize the underlying latent variables. In order to see how this process works in some detail, let's consider a small dataset in which we have four indicators and two factors. We will introduce these data in more detail in the example near the end of this chapter. The initial unrotated factor loading matrix appears in Table 4.2.
Table 4.2 Unrotated Factor Loadings for Achievement Motivation Scale

Indicator   Factor 1   Factor 2
MAP            .30        .73
MAV            .43        .72
PAP            .84       −.31
PAV            .91       −.29
Using the Varimax criterion, the factor transformation matrix was identified, and appears in Table 4.3.

Table 4.3 Varimax Factor Transformation Matrix for Achievement Motivation Scale

            Factor 1   Factor 2
Factor 1       .92        .39
Factor 2      −.39        .92
Using matrix algebra, we then multiply these matrices by one another in order to obtain the Varimax transformed loading matrix.

Table 4.4 Calculation of Varimax Rotated Factor Loadings

Indicator   Factor 1                            Factor 2
MAP         .30(.92) + .73(−.39) = −.0001       .30(.39) + .73(.92) = .79
MAV         .43(.92) + .72(−.39) = .12          .43(.39) + .72(.92) = .82
PAP         .84(.92) + (−.31)(−.39) = .90       .84(.39) + (−.31)(.92) = .04
PAV         .91(.92) + (−.29)(−.39) = .95       .91(.39) + (−.29)(.92) = .09
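The multiplication in Table 4.4 can be verified directly. A minimal sketch in Python with NumPy follows; small discrepancies from the printed values reflect the rounding of the two-decimal inputs:

```python
import numpy as np

# Unrotated loadings (Table 4.2) and Varimax transformation matrix (Table 4.3)
unrotated = np.array([[.30,  .73],
                      [.43,  .72],
                      [.84, -.31],
                      [.91, -.29]])
transform = np.array([[ .92, .39],
                      [-.39, .92]])

# Rotated loadings = unrotated loadings times the transformation matrix
rotated = unrotated @ transform
for name, row in zip(["MAP", "MAV", "PAP", "PAV"], rotated):
    print(f"{name}: {row[0]: .2f} {row[1]: .2f}")
```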
The Varimax rotated loadings for the ATS appear in Table 4.5. For the unrotated solution, 6 of the 13 indicators were cross-loaded, meaning that they had loadings of 0.3 or larger (in absolute value) for more than one factor. In the Varimax rotated solution, only two of the indicators were cross-loaded, with the others exhibiting simple structure. Thus, the results from the rotated solution should be easier to interpret conceptually than are those from the unrotated solution.
Table 4.5 Varimax Rotated Factor Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4  Com*
Fear                                   .14       .74      −.15      −.03     .59
Frustration                           −.02       .36      −.42      −.01     .30
Sadness                                .26       .55      −.10       .20     .42
Discomfort                             .14       .54       .03      −.28     .39
Activation Control                     .08       .21       .71       .18     .59
Attentional Control                    .04      −.15       .54      −.05     .32
Inhibitory Control                    −.01      −.12       .52      −.16     .31
Sociability                           −.005     −.003     −.14       .67     .47
High Intensity Pleasure                .17      −.29      −.31       .43     .39
Positive Affect                        .10       .01       .14       .58     .37
Neutral Perceptual Sensitivity         .56       .16       .06       .16     .37
Affective Perceptual Sensitivity       .82       .14       .13       .14     .73
Associative Sensitivity                .62       .07      −.06      −.09     .40

*Communality.
Finally, notice that the communalities (the sum of the squared factor loadings for each indicator variable), or the proportion of variance in the indicators that is explained by the latent variables, are the same in the rotated and unrotated solutions. As an exercise, let's calculate the communality for the first indicator variable, Fear:

Communality = 0.139^2 + 0.740^2 + (−0.155)^2 + (−0.034)^2 = 0.592

This result matches the communality for the unrotated loading matrix that we examined in Chapter 3. We discussed earlier the fact that mathematically the rotated and unrotated solutions are identical, meaning that the communalities will be the same for the two, even though the specific loadings may be quite different. Thus, we can think of rotation as the process of reorganizing how indicators are related to each factor within a given model, with no impact on how strongly each indicator is related to the set of factors.
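This invariance is easy to confirm numerically. A brief sketch, using Fear's loadings from Table 4.1 and the more precise rotated values quoted above:

```python
import numpy as np

# Fear's unrotated (Table 4.1) and Varimax-rotated loadings
unrotated_fear = np.array([.59, -.24, -.35, .25])
rotated_fear = np.array([.139, .740, -.155, -.034])

# Communality = sum of squared loadings; rotation leaves it unchanged
print(np.sum(unrotated_fear**2))  # approximately 0.59
print(np.sum(rotated_fear**2))    # approximately 0.59
```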
Quartimax Rotation

An alternative orthogonal rotation to Varimax is Quartimax (Carroll, 1953), which is designed to maximize the variation of the loadings within indicators, across the factors. Note that this is the opposite of the approach underlying Varimax, which maximizes variance across indicators within each of the factors. This criterion is known as Quartimax because it involves raising the loadings to the 4th power, as can be seen in Table A1. The algorithm underlying Quartimax identifies the transformation matrix that maximizes the Quartimax criterion, thereby yielding the rotated loading values. The Quartimax loadings for the ATS appear in Table 4.6. These results are very similar to those for the Varimax rotation. Indeed, in many (perhaps most) cases, the results obtained from the different orthogonal rotation approaches will be very similar to one another, particularly with respect to which variables are associated with which factors.
Table 4.6 Quartimax Rotated Factor Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4
Fear                                   .13       .74      −.16      −.03
Frustration                           −.02       .36      −.42      −.01
Sadness                                .25       .55      −.11       .21
Discomfort                             .14       .54       .03      −.28
Activation Control                     .08       .22       .71       .19
Attentional Control                    .04      −.14       .55      −.05
Inhibitory Control                    −.01      −.11       .53      −.15
Sociability                           −.002     −.01      −.15       .67
High Intensity Pleasure                .17      −.30      −.31       .42
Positive Affect                        .11       .002      .13       .58
Neutral Perceptual Sensitivity         .56       .17       .06       .16
Affective Perceptual Sensitivity       .82       .14       .13       .14
Associative Sensitivity                .62       .07      −.06      −.09
Equamax Rotation

A third orthogonal rotation that is frequently used in practice is Equamax (Crawford & Ferguson, 1970), which combines the Varimax and Quartimax approaches. This rotation criterion appears in Table A1. One primary difference in the equation for the Equamax rotated solution is that it includes a term for the number of factors, m, which does not appear in either Varimax or Quartimax. The Equamax solution for the adult temperament scale appears in Table 4.7. The results for this solution are quite similar to those for both Quartimax and Varimax, which is in keeping with our earlier statement that these orthogonal approaches quite often yield very similar results to one another. As with the other orthogonal rotations, two of the indicators remain cross-loaded.
Table 4.7 Equamax Rotated Factor Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4
Fear                                   .14       .74      −.15      −.04
Frustration                           −.02       .37      −.41      −.01
Sadness                                .26       .55      −.09       .20
Discomfort                             .15       .54       .04      −.29
Activation Control                     .08       .20       .72       .16
Attentional Control                    .04      −.15       .54      −.06
Inhibitory Control                    −.01      −.13       .52      −.16
Sociability                           −.01       .004     −.13       .67
High Intensity Pleasure                .16      −.28      −.305      .44
Positive Affect                        .10       .01       .14       .58
Neutral Perceptual Sensitivity         .56       .16       .06       .16
Affective Perceptual Sensitivity       .82       .14       .13       .14
Associative Sensitivity                .612      .06      −.06      −.09
Common Oblique Rotations

Earlier in this chapter, we introduced the two broad families of factor rotations: orthogonal, which constrain all factor correlations to be equal to 0, and oblique, which allow factors to have nonzero correlations with one another. We noted that a major advantage of the orthogonal approach is in interpretation, which involves examining only one set of rotated factor loadings. Oblique rotations, however, produce two sets of loadings that tell us different things about the relationships between the indicators and factors. The pattern matrix reveals the independent relationship between each indicator and each factor, after the other factors have been accounted for, whereas the structure matrix reveals the relationships between indicators and factors, without first removing the influence of the other factors in the model. We also noted that in practice researchers more often refer to the pattern matrix because it provides information about the unique relationships among the indicators and factors. However, this is not to say that the structure matrix is without meaning, as we will see in the extended example below. Finally, before discussing specific oblique rotations, it is worth noting once again that in most real-world research scenarios we would anticipate the latent variables under investigation to be correlated with one another, at least to some degree. Next, we will consider some of the more commonly used oblique rotation strategies that appear in the literature. We do need to keep in mind, however, that many other alternatives are available and that it is worth the interested reader's time to investigate these.
Promax Rotation

Perhaps the most commonly used of all rotations, both oblique and orthogonal, is Promax (Hendrickson & White, 1964). This rotation algorithm uses a Varimax rotation solution as an initial set of rotated loadings. These initial loadings are then transformed further by being raised to a particular power, typically the 4th, after which a transformation matrix is obtained using least squares estimation so as to maximize the Promax criterion. The values in this second transformation matrix allow the factors to be correlated with one another. The power used in the Promax criterion, known as kappa, can be set by the researcher. Its value controls, in part, the maximum possible correlation that can be obtained for a given dataset. Larger values of kappa generally yield larger interfactor correlations and pattern loadings that tend toward more simple structure solutions. However, in practice, it is unusual for a researcher to alter kappa from the default value of 4.
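To illustrate the mechanics, the following sketch (Python with NumPy) builds the Promax power target from a set of Varimax loadings and computes a least squares transformation toward it. The full Promax algorithm additionally rescales this transformation before producing the final pattern loadings, so this is only an illustration of the first steps, not a complete implementation:

```python
import numpy as np

# Varimax-rotated loadings for the achievement motivation example (Table 4.4)
varimax = np.array([[-.0001, .79],
                    [ .12,   .82],
                    [ .90,   .04],
                    [ .95,   .09]])

# Promax builds a target by raising loadings to the power kappa while
# preserving sign; larger kappa pushes the target toward 0/1 structure.
kappa = 4
target = np.sign(varimax) * np.abs(varimax) ** kappa

# A least squares transformation toward this target, column by column
T, *_ = np.linalg.lstsq(varimax, target, rcond=None)
print(np.round(target, 3))
print(np.round(T, 3))
```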
The unrotated, Varimax, and Promax loadings for Factors 1 and 2 are plotted in Figure 4.1.

Figure 4.1 Unrotated, Varimax, and Promax Loadings for Factors 1 and 2 of the Adult Temperament Scale

[Figure: three scatterplots of the ATS subscale loadings, with Factor 1 on the x axis and Factor 2 on the y axis; one panel each for the unrotated, Varimax, and Promax solutions.]
The loadings plotted in the three panels of Figure 4.1 show that the Promax and Varimax loadings are generally similar to one another in location, with the Promax values conforming slightly more closely to simple structure (most loadings are near one axis and far from the other) than do the Varimax loadings. When interpreting results obtained from an oblique rotation, perhaps the first thing that a researcher needs to do is examine the factor correlation matrix. This matrix for the ATS appears in Table 4.8.

Table 4.8 Factor Correlations for the Promax Rotation of the Adult Temperament Scale

            Factor 1  Factor 2  Factor 3  Factor 4
Factor 1      1.00      .29       .09       .29
Factor 2       .29     1.00       .03       .00
Factor 3       .09      .03      1.00      −.09
Factor 4       .29      .00      −.09      1.00

Our goal in examining these correlation values is twofold. First, we need to determine whether an oblique rotation is appropriate for the data at hand, which is the case if one or more of the correlation coefficients are sufficiently large. What is sufficiently large? As we discussed earlier in the chapter, there is not one agreed-upon standard, but common values are 0.1, corresponding to the small effect size suggested by Cohen (1988), 0.2, or even 0.3, which corresponds to Cohen's moderate effect size recommendation. Whichever value is selected, it is important for the researcher to make a conceptually compelling case that the factors exhibit sufficient correlations with one another so as to warrant the use of an oblique rotation. In this example, the correlations between Factor 1 and Factors 2 and 4 are close to 0.3, which provides sufficient evidence that an oblique rotation is appropriate in this case. One other note regarding this decision is that if the factors are not highly correlated, the loadings obtained using orthogonal and oblique rotations will be very similar to one another, meaning that we will likely come to similar conclusions about the latent structure in such cases, regardless of the rotation strategy that we choose. The Promax rotated pattern loadings appear in Table 4.9. Remember that the pattern loadings reflect the unique relationship between each indicator and each factor.
Table 4.9 Promax Rotated Factor Pattern Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4
Fear                                   .005      .76      −.13      −.01
Frustration                           −.07       .40      −.40      −.004
Sadness                                .13       .55      −.07       .21
Discomfort                             .08       .54       .03      −.28
Activation Control                    −.03       .18       .74       .22
Attentional Control                    .05      −.18       .53      −.05
Inhibitory Control                     .01      −.15       .51      −.15
Sociability                           −.11       .01      −.09       .69
High Intensity Pleasure                .18      −.30      −.30       .39
Positive Affect                        .002     −.01       .17       .59
Neutral Perceptual Sensitivity         .53       .10       .05       .09
Affective Perceptual Sensitivity       .81       .05       .11       .04
Associative Sensitivity                .66       .001     −.10      −.19
These results are very similar to those obtained using any of the orthogonal techniques, with two of the indicators exhibiting cross-loaded values. However, this similarity in results between orthogonal and oblique rotations is not always the case, particularly when the correlations among the factors are large, unlike in the current example, where the correlations are relatively modest in size. Before moving on from Promax rotation, we should consider the structure matrix, which reflects the relationships between individual indicators and factors, without accounting for the other factors that appear in the model. The pattern (Λ) and structure (V_R) loading matrices are related through this equation:

Λ = V_R D_V    (Equation 4.1)

where

D_V = diag(R_VV^−1)

R_VV = the matrix of correlations among the reference axes.

The structure loadings for the ATS appear in Table 4.10.
Table 4.10 Promax Rotated Factor Structure Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4
Fear                                   .21       .76      −.11       .004
Frustration                            .003      .36      −.40       .01
Sadness                                .34       .58      −.06       .25
Discomfort                             .16       .57       .08      −.25
Activation Control                     .15       .19       .72       .14
Attentional Control                    .03      −.15       .54      −.08
Inhibitory Control                    −.03      −.13       .52      −.19
Sociability                            .08      −.03      −.17       .67
High Intensity Pleasure                .18      −.26      −.32       .47
Positive Affect                        .19      −.001      .12       .58
Neutral Perceptual Sensitivity         .59       .26       .10       .24
Affective Perceptual Sensitivity       .84       .28       .18       .26
Associative Sensitivity                .60       .19      −.02       .01
As with the comparison of the Promax pattern loadings to the Varimax loadings, there are not great differences between the pattern and structure loadings here. Once again, this similarity in values is due to the relatively modest correlations among the factors. Larger such correlations will result in a greater difference between the two sets of loadings.
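The structure loadings can also be recovered from the pattern loadings by way of the interfactor correlation matrix, using the commonly cited identity that the structure matrix equals the pattern matrix postmultiplied by the factor correlation matrix (a simpler route than the reference-axis form in Equation 4.1). A quick numerical check against Tables 4.8 through 4.10:

```python
import numpy as np

# Promax pattern loadings for two ATS subscales (Table 4.9) and the
# interfactor correlation matrix (Table 4.8)
pattern = np.array([[.005, .76, -.13, -.01],   # Fear
                    [.13,  .55, -.07,  .21]])  # Sadness
phi = np.array([[1.00, .29, .09, .29],
                [ .29, 1.00, .03, .00],
                [ .09, .03, 1.00, -.09],
                [ .29, .00, -.09, 1.00]])

# Structure loadings = pattern loadings times factor correlation matrix
structure = pattern @ phi
print(np.round(structure, 2))
# The first row is close to Fear's structure values in Table 4.10:
# .21, .76, -.11, .00
```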
Oblimin

A second commonly used oblique rotation is Oblimin (Jennrich & Sampson, 1966). It is sometimes referred to as Direct Oblimin because it provides a direct estimate of the pattern loading matrix. The criterion for this oblique rotational strategy is designed to yield simple structure by minimizing the cross-products of the factor loadings, as can be seen in the Oblimin criterion that appears in Table A1. Oblimin was originally developed as a combination of two earlier rotation strategies that were found to be somewhat problematic in practice, quartimin (Carroll, 1957) and covarimin (Kaiser, 1958). However, the Oblimin approach has been shown to be more effective than its predecessors at recovering the latent structure (Gorsuch, 1983). As with Promax, there is a control parameter, delta, that can be adjusted by the researcher. And similarly to kappa, delta impacts the level of correlation among the latent traits, and therefore the values of the factor loadings as well. When delta is set to a positive value, the correlations among the factors will be relatively larger than when it is set to a negative number. Gorsuch (1983) recommends that the researcher using Oblimin consider a range of values between 1 and −4, and select the value that maximizes simple structure among the pattern loadings. The default delta value in common software packages such as SPSS is 0, and typically this value can be used successfully. Table 4.11 contains the Oblimin pattern loadings for the ATS using a delta of 0.
Table 4.11 Oblimin Rotated Factor Pattern Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4
Fear                                   .22      −.19      −.01       .76
Frustration                            .02      −.43       .01       .38
Sadness                                .34      −.13       .24       .56
Discomfort                             .18       .03      −.26       .57
Activation Control                     .13       .69       .16       .17
Attentional Control                    .02       .55      −.07      −.16
Inhibitory Control                    −.04       .54      −.18      −.13
Sociability                            .06      −.18       .67      −.05
High Intensity Pleasure                .17      −.31       .45      −.29
Positive Affect                        .17       .10       .58      −.04
Neutral Perceptual Sensitivity         .59       .06       .21       .21
Affective Perceptual Sensitivity       .84       .13       .21       .21
Associative Sensitivity                .61      −.04      −.03       .14
When comparing these results to those obtained using the Promax rotation, perhaps the most interesting point to note is that the factors with which some of the indicator variables are associated (i.e., load on) differ between the two rotations, although the groupings of the variables are the same. For example, the variables Fear, Sadness, Discomfort, and Frustration (which cross-loads) are all associated with Factor 4 in the Oblimin rotation results, whereas in the Promax rotation results, they were associated with Factor 2. Similarly, the indicators Activation Control, Attentional Control, Inhibitory Control, and Frustration were all associated with Factor 3 in the Promax solution and Factor 2 in the Oblimin rotation. This switching of factor associations from one solution to another is not problematic and does not represent any inconsistency in the results. Recall from Chapter 3 that the factors are arbitrary mathematical constructs, so that any meaning that they carry in the real world is due solely to the ways in which the observed variables are associated with them. This means that Factor 2 in the Promax rotation corresponds directly to Factor 4 in the Oblimin rotation, for example. A final interesting point of divergence between the two rotated solutions is that in the case of Promax, the variable High Intensity Pleasure cross-loaded on Factors 2, 3, and 4, whereas for Oblimin it only cross-loaded on Factors 2 and 3. As an exercise, you might try several different values of delta with the Oblimin solution to see how the results change for this dataset, which can be found on the companion website for the book. The interfactor correlation matrix for the Oblimin rotation appears in Table 4.12.
Table 4.12 Factor Correlations for the Oblimin Rotation of the Adult Temperament Scale

            Factor 1  Factor 2  Factor 3  Factor 4
Factor 1      1.00      .02       .19       .22
Factor 2       .02     1.00      −.10      −.09
Factor 3       .19     −.10      1.00      −.06
Factor 4       .22     −.09      −.06      1.00
The interfactor correlations for the Oblimin rotation are similar to those that were obtained from the Promax rotation. Note once again that the factor labels are switched for the Oblimin rotation when compared to the Promax results.
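For readers who want to see how delta enters the calculations, the following sketch evaluates the Oblimin criterion from Table A1 for a given pattern matrix. The exact scaling of this criterion varies somewhat across sources, so treat this as illustrative:

```python
import numpy as np

def oblimin_criterion(L, delta=0.0):
    """Value of the (direct) Oblimin criterion for a pattern matrix L:
    cross-products of squared loadings over factor pairs, with a
    delta-based correction. Smaller values indicate simpler structure."""
    p, m = L.shape
    L2 = L ** 2
    total = 0.0
    for j in range(m):
        for k in range(m):
            if j == k:
                continue
            total += (L2[:, j] @ L2[:, k]
                      - (delta / p) * L2[:, j].sum() * L2[:, k].sum())
    return total

# One could evaluate candidate rotated solutions across several delta
# values, per Gorsuch's (1983) recommendation, and compare the results.
```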
Geomin Rotation

The next rotation approach that we will consider in this chapter is Geomin (Yates, 1987). Geomin is an oblique rotation that allows the factors to be correlated with one another, as do Promax and Oblimin. The Geomin criterion, which appears in Table A1, adds a small penalty term to the evaluation of the factor loadings. Prior research has demonstrated that Geomin performs particularly well at recovering the latent structure underlying a set of observed indicators in simple structure situations, and when there are fewer than three nonzero loadings for each of the variables (Asparouhov & Muthén, 2009; Finch, 2011; Sass & Schmitt, 2010). With regard to the penalty term, ε, Marsh et al. (2010) recommended using a value of 0.5, which is what is typically employed in standard statistical software, such as Mplus. The interfactor correlations for the four-factor solution with the ATS data based on the Geomin criterion appear in Table 4.13.
Table 4.13 Factor Correlations for the Geomin Rotation of the Adult Temperament Scale

            Factor 1  Factor 2  Factor 3  Factor 4
Factor 1      1.00      .02      −.05       .27
Factor 2       .02     1.00      −.14       .17
Factor 3      −.05     −.14      1.00       .20
Factor 4       .27      .17       .20      1.00
These correlation estimates are generally similar to those obtained using the Promax and Oblimin rotations. The Geomin factor loadings appear in Table 4.14. These loadings tell a similar story to the results presented above with respect to the underlying latent structure of the ATS. Fear, Frustration, Sadness, and Discomfort are associated with Factor 1, whereas Activation Control, Attentional Control, Inhibitory Control, and High Intensity Pleasure are associated with Factor 2. The scales loading on Factor 3 are Sociability, High Intensity Pleasure, and Positive Affect, and those loading on Factor 4 are Neutral Perceptual Sensitivity, Affective Perceptual Sensitivity, and Associative Sensitivity. Thus, we might call Factor 1 Negative Affect, Factor 2 Control, Factor 3 Positive Affect, and Factor 4 Sensitivity. The interfactor correlations in Table 4.13 indicate that the strongest relationship was found between Negative Affect and Sensitivity, with weaker (but still not negligible) relationships between Sensitivity and both Control and Positive Affect.
Table 4.14 Geomin Rotated Factor Pattern Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4
Fear                                   .76      −.03      −.05       .01
Frustration                            .45      −.30       .01      −.12
Sadness                                .60       .02       .20       .09
Discomfort                             .39       .07      −.20       .13
Activation Control                     .02       .73       .19      −.002
Attentional Control                   −.24       .50      −.05       .01
Inhibitory Control                    −.17       .42      −.09       .001
Sociability                            .01      −.07       .63      −.07
High Intensity Pleasure               −.14      −.31       .34       .15
Positive Affect                       −.002      .12       .60       .03
Neutral Perceptual Sensitivity         .13       .04       .15       .46
Affective Perceptual Sensitivity       .02       .01       .02       .80
Associative Sensitivity               −.01      −.07      −.11       .58
Target Factor Rotation

The next rotation strategy that we will discuss in this chapter is target factor rotation. Target rotation was first described by Horst (1941) and Tucker (1940), with contributions by Lawley and Maxwell (1964), Jöreskog (1965), and Browne (1972). The core idea when using this rotation strategy is for the researcher to create a target loading matrix, B, for the factor solution, with some elements specified and others left unspecified. Typically, the specified values will be set to 0, corresponding to what the researcher expects to be very small factor loading values. The rest of the loadings in B are left unspecified, indicating that the researcher anticipates them to be nonzero, though they may in the end be of any value, depending on what provides the best fit to the data. Each column in B must have at least m−1 specified values, where m is the number of factors. Therefore, in our example m is 4, and each column of B would need to have at least three specified values. The rotation algorithm then works to minimize the target criterion through minimization of the sum of squared differences between the rotated loadings and the target values (see Table A1). Thus, if the target loadings are set to 0, then the algorithm will attempt to find rotated loadings that are as close to 0 as possible. As we discuss below, however, this is not always possible given the data at hand. When considering target rotation as a whole, we can see that it exhibits some elements of confirmatory factor analysis, in that the researcher at least partially prespecifies the expected factor structure. For example, if prior research suggests that certain indicator variables should be associated with one another through a common factor, then loadings for those variables on a factor will be left unspecified. Conversely, if theory suggests that some variables should not be associated with one another on a particular factor, then the researcher could specify those loadings to be 0. It is important to keep in mind that specifying a value of 0 in the target matrix does not guarantee that the final rotated loading estimates will be 0, or even small in value. If these loadings do have a large value despite being targeted at 0 initially, then we would conclude that the target matrix is misspecified, and would then construct a new target matrix accounting for this additional information. In other words, our presupposition about some indicators having very small loadings for some factors was wrong, and we need to rethink our target matrix. In order to illustrate the process of creating a target matrix, let's consider the ATS again. In Chapter 3, we saw that theory suggests the presence of four factors underlying the 13 scale scores. Thus, if we define the elements of B to conform to this structure, we would obtain the target matrix appearing in Table 4.15.
Table 4.15 Target Matrix for Adult Temperament Scale Scores

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4
Fear                                    ?         0         0         0
Frustration                             ?         0         0         0
Sadness                                 ?         0         0         0
Discomfort                              ?         0         0         0
Activation Control                      0         ?         0         0
Attentional Control                     0         ?         0         0
Inhibitory Control                      0         ?         0         0
Sociability                             0         0         ?         0
High Intensity Pleasure                 0         0         ?         0
Positive Affect                         0         0         ?         0
Neutral Perceptual Sensitivity          0         0         0         ?
Affective Perceptual Sensitivity        0         0         0         ?
Associative Sensitivity                 0         0         0         ?
The ? indicates an unspecified value in the target matrix, meaning that we do not have theoretical evidence as to the value of the loading, but we do not expect it to be 0. The 0s in B are for factor indicator relationships that we expect to be very small because the indicator is not expected to be associated with that factor. Again, however, if the target specification is incorrect and one or more of the 0 target loadings are actually relatively large, this will become apparent in the final rotated matrix. The target rotation loadings using the B matrix specified above appear in Table 4.16. The loadings for this example obtained using the target rotation are generally very similar to those obtained from the other rotations that we have examined in this chapter. One interesting point to note is that there are no cross-loaded indicator variables with the target loadings, whereas each of the other methods did yield two cross-loaded variables. Thus, we can see this as an advantage of the target rotation for this particular example.
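Constructing B is straightforward in software. A minimal sketch (Python with NumPy), using NaN to mark the unspecified elements and evaluating the target criterion from Table A1:

```python
import numpy as np

# Target matrix B from Table 4.15: 0 = loading expected to be near zero,
# NaN = unspecified (freely estimated). Rows follow the ATS subscale order.
B = np.zeros((13, 4))
B[0:4, 0] = np.nan    # Fear through Discomfort load on Factor 1
B[4:7, 1] = np.nan    # the three Control subscales load on Factor 2
B[7:10, 2] = np.nan   # Sociability, High Intensity Pleasure, Positive Affect
B[10:13, 3] = np.nan  # the three Sensitivity subscales load on Factor 4

def target_criterion(L, B):
    """Sum of squared differences between rotated loadings L and the
    specified (non-NaN) elements of the target matrix (see Table A1)."""
    mask = ~np.isnan(B)
    return np.sum((L[mask] - B[mask]) ** 2)
```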
In addition to the factor loadings, we might also be interested in examining the interfactor correlation coefficients for the target rotation, which appear in Table 4.17.

Table 4.16 Target Rotated Factor Pattern Loadings for Adult Temperament Scale Scores: PAF Extraction

ATS Subscale                        Factor 1  Factor 2  Factor 3  Factor 4
Fear                                   .76      −.01      −.04       .01
Frustration                            .42      −.28       .02      −.12
Sadness                                .54       .04       .13       .16
Discomfort                             .47       .04      −.27       .10
Activation Control                     .12       .76       .14      −.01
Attentional Control                   −.23       .50      −.08       .04
Inhibitory Control                    −.15       .44      −.22       .01
Sociability                            .04      −.03       .71      −.07
High Intensity Pleasure               −.24      −.26       .43       .16
Positive Affect                       −.02       .18       .56       .06
Neutral Perceptual Sensitivity         .11       .07       .10       .52
Affective Perceptual Sensitivity       .04       .07       .07       .81
Associative Sensitivity               −.01      −.15      −.13       .64
Table 4.17 Factor Correlations for the Target Rotation of the Adult Temperament Scale

            Factor 1  Factor 2  Factor 3  Factor 4
Factor 1      1.00     −.08       .00       .27
Factor 2      −.08     1.00      −.06       .15
Factor 3       .00     −.06      1.00       .16
Factor 4       .27      .15       .16      1.00
As with the factor loadings, the interfactor correlations for the target rotation are quite similar to those for the other oblique rotations, such as Geomin and Promax.
Bifactor Rotation

The final rotation criterion that we will discuss is based on work by Jennrich and Bentler (2011) on the application of the bifactor model to the EFA context. The bifactor model has traditionally been used with confirmatory factor models, as well as in the context of item response theory. Bifactor structure specifies that all indicators be associated with a single primary factor and that, in addition, each indicator be associated with a secondary factor. An example of such structure in practice might be a test of cognitive ability, in which all items are indicators of overall cognitive ability, and each is associated with an additional latent trait, such as processing speed or short-term memory. As represented in Jennrich and Bentler, the factor pattern matrix for a bifactor structure involving six indicators with one primary and two secondary factors appears in Table 4.18. The * indicate loadings that are freely estimated, whereas the 0s indicate where loadings are set to 0. The factor structure implied in Table 4.18 associates each of the indicators with the primary factor, indicators X1 through X3 with secondary Factor 1, and indicators X4 through X6 with secondary Factor 2. Note that this table shares some similarities with the target matrix B, in that we prespecify which loadings are likely to be small in value.
Table 4.18 Factor Pattern Matrix for Bifactor Structure of a Confirmatory Factor Analysis Model

Indicator   Primary Factor   Secondary Factor 1   Secondary Factor 2
X1               *                   *                    0
X2               *                   *                    0
X3               *                   *                    0
X4               *                   0                    *
X5               *                   0                    *
X6               *                   0                    *
In order to implement the bifactor rotation, the algorithm described by Jennrich and Bentler (2011) uses the rotation criterion B(Λ), where Λ is a factor loading matrix. When the factor structure for a set of indicators is perfectly bifactor, then B(Λ) = 0. Thus, the algorithm seeks to find the factor loading structure for a given sample that minimizes B(Λ). It is important to note that B(Λ) is only applied to the last k−1 columns of the factor loading matrix, but the rotation is applied to the entire set of loadings, including the first column. In the context of bifactor rotation for an EFA model, a standard rotation, such as quartimin, is used, with all indicators being allowed to load on the first factor, that is, to have nonzero loadings in the first column of the loading matrix. Jennrich and Bentler discuss the possibility of applying alternative rotations to the initially extracted loading matrix but caution that such a choice must be made carefully. For example, they demonstrate that the Varimax rotation criterion is not a good choice, because the Varimax criterion does not measure departure from pure simple structure (unlike the quartimin criterion, which does), and thus is unlikely to yield a solution that is close to the bifactor structure. To demonstrate the bifactor rotation, we will apply it to a dataset consisting of 24 scores from a cognitive functioning scale. These data were described by Harman (1976). Each scale is expected to measure a primary cognitive functioning factor, as well as one of five secondary factors associated with spatial, verbal, speed, pattern recognition, or memory. Prior to using the bifactor rotation, we will fit an EFA model using maximum likelihood estimation and Promax rotation for five factors. The rotated factor loadings for this solution appear in Table 4.19.
Table 4.19 Promax Rotated Factor Loadings for the Holzinger Cognitive Functioning Data

Variable       F1      F2      F3      F4      F5
Visual        −0.01    0.66   −0.01    0.01    0.23
Cubes         −0.06    0.66   −0.06   −0.12    0.05
Paper          0.12    0.25   −0.02    0.14    0.46
Flags          0.06    0.73   −0.09   −0.09    0.17
General        0.79    0.05    0.07   −0.09    0.09
Paragraph      0.85   −0.07   −0.10    0.11    0.10
Sentence       0.93   −0.08    0.04   −0.09    0.06
WordC          0.58    0.10    0.13   −0.02    0.12
WordM          0.89    0.03   −0.13    0.03   −0.02
Addition       0.05   −0.29    0.95   −0.04   −0.04
Code           0.07   −0.15    0.50    0.31    0.22
Count         −0.15    0.13    0.79   −0.13    0.07
Straight       0.07    0.23    0.54   −0.11    0.42
WordR          0.12   −0.20   −0.08    0.71    0.13
NumberR       −0.04    0.03   −0.10    0.60    0.07
FigureR       −0.12    0.32   −0.15    0.56    0.15
Object         0.00   −0.23    0.15    0.72   −0.09
NumberF       −0.23    0.29    0.24    0.37   −0.01
FigureW        0.03    0.11    0.11    0.31   −0.07
Deduct         0.25    0.43   −0.09    0.15   −0.12
Numeric       −0.04    0.46    0.33    0.00    0.04
ProblemR       0.23    0.43   −0.05    0.13   −0.15
Series         0.21    0.55    0.06   −0.01   −0.07
Arithmetic     0.22    0.04    0.51    0.08   −0.22
The results in Table 4.19 reveal that Factor 1 is primarily associated with verbal functioning, Factor 2 with spatial functioning and certain aspects of pattern recognition, Factor 3 with processing speed, Factor 4 with recognition tasks, and Factor 5 with only two measures that appear to be disparate from one another. Next, let's examine the quartimin bifactor rotation results, which appear in Table 4.20.

Table 4.20 Bifactor (Quartimin) Rotated Factor Loadings for the Holzinger Cognitive Functioning Data

Variable     Primary     S1      S2      S3      S4      S5
Visual         0.63    −0.07   −0.31   −0.14   −0.09    0.06
Cubes          0.43    −0.10   −0.23   −0.16    0.11   −0.06
Paper          0.48     0.06   −0.25    0.08   −0.31   −0.11
Flags          0.59    −0.01   −0.32   −0.16    0.02   −0.05
General        0.59     0.55    0.03   −0.04   −0.02   −0.06
Paragraph      0.54     0.61   −0.05    0.08   −0.04    0.02
Sentence       0.54     0.66    0.06   −0.03    0.00   −0.03
WordC          0.60     0.39    0.03   −0.01   −0.03   −0.06
WordM          0.55     0.63   −0.06    0.00    0.08    0.07
Addition       0.50     0.00    0.66    0.00   −0.06    0.00
Code           0.57     0.03    0.21    0.16   −0.28    0.13
Count          0.58    −0.19    0.44   −0.08   −0.02   −0.18
Straight       0.68     0.01    0.06   −0.14   −0.38   −0.01
WordR          0.39     0.06   −0.02    0.54   −0.02   −0.06
NumberR        0.38    −0.07   −0.07    0.41    0.06   −0.03
FigureR        0.53    −0.14   −0.24    0.30    0.01    0.03
Object         0.44    −0.03    0.17    0.43    0.08    0.18
NumberF        0.57    −0.23    0.01    0.10    0.03    0.22
FigureW        0.41    −0.01   −0.05    0.00   −0.04    0.56
Deduct         0.59     0.10   −0.07    0.10    0.38   −0.16
Numeric        0.65    −0.10    0.02   −0.12    0.04    0.08
ProblemR       0.58     0.11   −0.11   −0.04    0.23    0.18
Series         0.68     0.08   −0.07   −0.08    0.24   −0.03
Arithmetic     0.61     0.12    0.34   −0.03    0.16    0.21
Perhaps the largest difference between these results and those in Table 4.19, other than the number of factors, is that all of the subtests do in fact load on a single common factor. We can term this factor general cognitive functioning. The only scores that do not have a rotated factor loading of at least 0.4 are word recall and number recall. In addition to the primary factor, there are also five secondary factors. The first of these corresponds to verbal functioning, with the second including the addition and counting variables only. The third factor corresponds to pattern recognition, with the fourth including only the straight measurement, and the final factor including only the figure memory score. Taken together, these results suggest that when we employ the bifactor rotation, only the single primary cognitive functioning factor and verbal functioning hold together in a coherent fashion. In other words, we might conclude that the Holzinger scores are really measuring a single cognitive functioning latent variable.
Example

In order to fully illustrate how factor rotation fits in with the broader practice of conducting EFA, let's consider a small example. In this case, we have four subscales that measure different aspects of achievement motivation: (1) Mastery Approach, (2) Mastery Avoidance, (3) Performance Approach, and (4) Performance Avoidance. Mastery goal orientations are focused on learning for the intrinsic desire to master the material under study, whereas performance goal orientations are centered on the outcome of the learning process, such as grades and the opinions of others. Approach goals refer to a desire to learn the material (mastery) or to be viewed in a favorable light by others (performance), whereas avoidance refers to the need to not miss any of the important material (mastery) or to avoid being seen in a negative light (performance). Scores on these scales were collected for 432 university students, and based on the design of the scales, the researcher anticipates that there should be two latent variables, one associated with mastery goals and the other with performance goals. Given this hypothesis, we will focus on the two-factor solution in this example. The correlation coefficients among the subscales, along with their means and standard deviations, appear in Table 4.21. The correlation coefficients conform to the hypothesized structure, given that the relationships within the two variable sets (mastery and performance) are larger than those between the sets. In order to more formally investigate the latent structure underlying these scales, principal axis factoring (PAF) will be used, forcing a two-factor solution, given that theory suggests the presence of factors based around mastery and performance orientations. In Chapter 5, we will discuss a number of statistical tools that can be used to help us ascertain the optimal number of factors to retain. Initially, we will use a Promax rotation and examine the interfactor correlation matrix. If the correlations among the factors are sufficiently large, we will retain the oblique rotation; otherwise, we will rerun the analysis using an orthogonal rotation before interpreting the results. The interfactor correlation matrix based on a Promax rotation with kappa = 4 appears in Table 4.22.
Table 4.21 Correlation Matrix, Mean, and Standard Deviation for Achievement Motivation Subscales

                        Mastery    Mastery     Performance  Performance
                        Approach   Avoidance   Approach     Avoidance
Mastery Approach          1          0.65        0.05         0.06
Mastery Avoidance         0.65       1           0.14         0.21
Performance Approach      0.05       0.14        1            0.86
Performance Avoidance     0.06       0.21        0.86         1
Mean                     17.59      15.80       16.17        16.67
Standard Deviation        2.67       3.45        4.05         4.05
Table 4.22 Factor Correlations for the Promax Rotation of the Achievement Motivation Scale

            Factor 1   Factor 2
Factor 1      1.00        .14
Factor 2       .14       1.00
The correlation between the pair of factors is in the small range based on Cohen's (1988) guidelines. The interfactor correlation based on the Oblimin rotation was 0.13, which is quite similar to that produced using Promax. Thus, it would appear that the correlation between the factors is relatively small in nature. For this reason, an orthogonal rotation might be appropriate to use. In order to fully investigate the factor structure underlying the scales, we will examine results for multiple rotations, including Varimax, Quartimax, Promax, and Oblimin. These loadings appear in Table 4.23.
Table 4.23 Factor Loadings for Achievement Goal Orientation Scales

Scale   Varimax1  Varimax2  Quartimax1  Quartimax2  Promax1*  Promax2  Oblimin1  Oblimin2
MAP+     −0.001     0.79      0.000       0.79       −0.06      0.79    −0.05      0.79
MAV       0.12      0.82      0.12        0.82        0.06      0.82     0.06      0.82
PAP       0.90      0.04      0.90        0.04        0.90     −0.03     0.90     −0.02
PAV       0.95      0.09      0.95        0.09        0.95      0.02     0.95      0.03

+MAP = Mastery Approach, MAV = Mastery Avoidance, PAP = Performance Approach, PAV = Performance Avoidance.
*Pattern loadings are used for the oblique rotations.
Based upon the results presented in Table 4.23, it would appear that regardless of the rotation strategy, the two-factor solution yields results that conform to the conceptual model presented by the researcher. In particular, Mastery Approach and Mastery Avoidance clearly group together on Factor 2, and Performance Approach and Performance Avoidance group on Factor 1. Thus, the results, regardless of the rotation used, appear to conform to theoretical expectations.
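For readers who wish to reproduce this example, the unrotated two-factor PAF solution can be computed directly from the printed correlation matrix. A minimal sketch (Python with NumPy) follows; the resulting loadings should resemble Table 4.2 up to sign reflection and column order, with a rotation step then applied:

```python
import numpy as np

# Observed correlation matrix from Table 4.21 (MAP, MAV, PAP, PAV)
R = np.array([[1.00, 0.65, 0.05, 0.06],
              [0.65, 1.00, 0.14, 0.21],
              [0.05, 0.14, 1.00, 0.86],
              [0.06, 0.21, 0.86, 1.00]])

def paf(R, n_factors, n_iter=100):
    """Iterative principal axis factoring: replace the diagonal with
    communality estimates, eigendecompose, and repeat until stable."""
    # Initial communalities: squared multiple correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)
        vals, vecs = np.linalg.eigh(Rr)
        idx = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
        h2 = np.sum(loadings**2, axis=1)  # updated communalities
    return loadings

print(np.round(paf(R, n_factors=2), 2))
```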
Deciding Which Rotation to Use

We have seen in this chapter that there exist a large number of options for factor rotation. Indeed, we have only touched on a few of the most popular such methods, with many more being available through statistical software. Given this plethora of possibilities when it comes to factor rotation, the reader would be excused for feeling somewhat confused and befuddled regarding what approach is best suited for a given research scenario. And indeed, there is not one single best technique to use in all situations, as highlighted in simulation work investigating the performance of factor rotation techniques (e.g., Finch, 2011; Sass & Schmitt, 2010). Nonetheless, it is possible for us to identify a few recommendations for making sense of factor rotations. Perhaps the most important recommendation in this regard was given by Browne (2001), who suggested that researchers must use their expert judgment throughout the process of conducting factor analysis, and most particularly when selecting an optimal rotation. In part, this means that the researcher needs to carefully examine the results of a particular rotation and compare those with what would be expected theoretically given prior literature in the field of study. Rotation results that make no conceptual sense cannot be taken as worthwhile, regardless of the statistical fit of the model to the data. In addition to comparing the results of the rotated solution to what is theoretically expected, the researcher must also decide on the general family of rotation to use, that is, orthogonal or oblique. As we have already discussed in this chapter, this decision should be made based upon the interfactor correlations obtained using an oblique rotation. For example, a Promax rotation could be applied to a set of initially extracted results, and the researcher would then examine the estimated correlations among the factors. If these values are nontrivial, perhaps 0.2 or larger, then an oblique rotation should be employed; otherwise an orthogonal approach can be used. Once the decision regarding which family of rotations to use has been made, it is advisable to then try multiple rotations within that family and compare the results. If the factor loadings and correlations are stable across
methods, as was the case in the examples included in this chapter, then we can have confidence in our findings and thus report results from any of them. On the other hand, if the results differ across the various rotations, then we will need to be very cautious in reporting any one set of findings as truly optimal. This last statement should be qualified by saying that if one of the solutions produces results that are closely in keeping with theoretical expectations, then we may have more confidence in them. Finally, it is important to keep in mind that EFA results are inherently mathematically indeterminate. As we have discussed earlier in the text, this means that for a given set of data there is not a single optimal mathematical solution. Thus, we can try several different combinations of extraction and rotation, and then compare the results with one another and with theory in order to ascertain what might be the best result given our data and the theory with which we are working. Such exploration is not to be frowned upon and, in fact, is a key element of working with EFA. This final point brings us back around to Browne (2001), who said, "All this [comparing results from rotations] involves human thought and judgment, which seems unavoidable if exploration is to be carried out." This would seem to be a good place to leave the discussion of how we might use the factor rotation methods.
Summary

Our focus in this chapter was on the key topic of factor rotation. As we saw in Chapter 3, the initial results of factor extraction are typically somewhat difficult to interpret because individual indicator variables will have relatively strong relationships (i.e., nontrivial factor loadings) for more than one factor. Thus, it is not easy to discern with which factors specific indicators are associated. Factor rotation is designed to resolve this problem by creating factor loading matrices that conform to what Thurstone (1947) termed simple structure. We learned that simple structure can be summarized as each indicator being primarily associated with only one factor and each factor having multiple associated indicators. Thus, though there are many approaches to the problem of factor rotation, they all share this common goal of creating a simple structure factor loading matrix, which is easier to interpret. Each of these methods has a mathematical criterion that it seeks to optimize when coming up with the rotated factor loading matrix. After discussing the overarching goal of factor rotation, we then learned about the two broad rotation families. The first of these is orthogonal rotation, which constrains the factors to be uncorrelated with one another and includes common approaches such as Varimax, Quartimax, and Equamax.
The second family of rotations, oblique, allows factors to be correlated and in fact provides estimates of these interfactor correlations. Within the oblique family are Promax, Oblimin, and Geomin. We then discussed a third strategy, known as target rotation, which takes some aspects of confirmatory factor analysis and combines them with EFA. Specifically, the researcher provides target values for some of the factor loadings, while leaving others unspecified. Most commonly the prespecified values are 0, indicating that the researcher expects there to be no relationship between an indicator and a factor. The form of this target matrix is based upon theoretical expectations that the researcher has about the latent structure, based upon prior literature in the field, as we saw in the examples presented in this chapter. Finally, we concluded the chapter with a discussion regarding the issue of selecting from among the various factor rotations available in practice. Perhaps the most important piece of advice in this regard comes from Browne (2001), who implored researchers to use their expertise and subject matter knowledge when selecting and interpreting rotation results. It is advisable for a researcher to try multiple rotations, and the final decision regarding the optimal solution should be based upon both the mathematical fit to the data and how well the results conform to theoretical expectations. In the end, if the results are not conceptually meaningful, they should not be retained by the researcher. Furthermore, it is perfectly permissible (and even advisable) for researchers to try several different rotations and compare the results with theory prior to selecting which to report.
APPENDIX

Table A.1 Exploratory Factor Analysis Rotation Criteria

Varimax:
$$f(\Lambda) = \sum_{j=1}^{m}\left[\,p\sum_{i=1}^{p}\lambda_{ij}^{4} - \left(\sum_{i=1}^{p}\lambda_{ij}^{2}\right)^{2}\right]\Big/\,p^{2}$$

Quartimax:
$$f(\Lambda) = \sum_{i=1}^{p}\sum_{j=1}^{m}\lambda_{ij}^{4} + \sum_{i=1}^{p}\sum_{j=1}^{m}\sum_{l\neq j}\lambda_{ij}^{2}\lambda_{il}^{2}$$

Equamax:
$$f(\Lambda) = \left(1-\frac{m}{2p}\right)\sum_{i=1}^{p}\sum_{j=1}^{m}\sum_{l\neq j}\lambda_{ij}^{2}\lambda_{il}^{2} + \frac{m}{2p}\sum_{j=1}^{m}\sum_{i=1}^{p}\sum_{l\neq i}\lambda_{ij}^{2}\lambda_{lj}^{2}$$

Promax: Raise loadings from Varimax to some power (e.g., 4) and rotate the resulting matrix allowing for correlated factors.

Oblimin:
$$f(\Lambda) = \sum_{j\neq k}\left(\sum_{i=1}^{p}\lambda_{ij}^{2}\lambda_{ik}^{2} - \frac{\delta}{p}\sum_{i=1}^{p}\lambda_{ij}^{2}\sum_{i=1}^{p}\lambda_{ik}^{2}\right)$$

Geomin:
$$f(\Lambda) = \sum_{i=1}^{p}\left(\prod_{j=1}^{m}\left(\lambda_{ij}^{2}+\epsilon\right)\right)^{1/m}$$

Target:
$$f(\Lambda) = \sum_{j=1}^{m}\sum_{i\in I_{j}}\left(\lambda_{ij}-b_{ij}\right)^{2}$$
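As a companion to Table A.1, the following sketch (Python with NumPy) evaluates two of the criteria for a given loading matrix, using the formulas as reconstructed above. Scaling conventions differ across software, so the absolute values matter less than comparisons between candidate rotated solutions:

```python
import numpy as np

def varimax_criterion(L):
    """Varimax criterion: summed column variances of squared loadings
    (to be maximized by the rotation algorithm)."""
    p, m = L.shape
    L2 = L ** 2
    return np.sum(p * np.sum(L2**2, axis=0) - np.sum(L2, axis=0) ** 2) / p**2

def geomin_criterion(L, eps=0.5):
    """Geomin criterion: sum over indicators of the geometric mean of
    (loading^2 + eps) across factors (to be minimized)."""
    return np.sum(np.exp(np.mean(np.log(L**2 + eps), axis=1)))

# Rotation algorithms search over admissible transformation matrices to
# optimize criteria such as these for the rotated loadings.
```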
Chapter 5

METHODS FOR DETERMINING THE NUMBER OF FACTORS TO RETAIN IN EXPLORATORY FACTOR ANALYSIS

In Chapters 3 and 4, we focused on the initial extraction and rotation of the factor loadings in exploratory factor analysis (EFA). As we discussed in those chapters, the combination of extracting and rotating factor loadings yields the primary information that researchers use to make inferences about the latent structure underpinning a set of observed variables. For example, researchers commonly use factor analysis to gain insights into latent variables that theory would suggest underlie observed item responses on a test of cognitive ability. As we have seen, the pattern of factor loadings in such a case provides information regarding which items are primarily associated with which factors. This information in turn allows researchers to gain a greater understanding regarding the nature of unobserved variables such as intelligence, depression, motivation, and the like. Of course, in order for these factor loading results to be meaningful, we need to have some sense for how many latent variables to retain. Certainly theory will play a key role in this decision. As we discussed in Chapter 1, it is important that we have some sense for how many factors to expect, given what is known about the field. However, we will also need to use some statistical tools to aid in our decision-making process, which is the focus of this chapter. There exist a number of such methods, and we should preface our discussion by acknowledging the fact that none of these approaches can be taken as optimal under all circumstances. However, it is also true that some of these approaches have been shown to work better than others in certain circumstances. In addition, we will see that these methods can be used in conjunction with one another to make informed decisions about the number of factors that we should retain. In practice, we will use several of the approaches discussed below, in conjunction with theory, in order to make a final determination as to the likely optimal number of factors to retain for a given sample.
Scree Plot and Eigenvalue Greater Than 1 Rule One of the earliest approaches for determining the number of factors is the eigenvalue greater than 1 criterion, also sometimes referred to as Kaiser’s criterion, in honor of its developer (Fabrigar & Wegener, 2011; 71
72 Exploratory Factor Analysis Kaiser, 1960; Pett, Lackey, & Sullivan, 2003). Recall from Chapter 2 that in the context of factor analysis, eigenvalues can be interpreted as the amount of variance in a set of observed variables that is accounted for by one or more latent variables. Each of the retained factors has associated with it an eigenvalue, with the first factor having the largest eigenvalue, the second factor the second largest eigenvalue, and so on. Thus, factors with larger eigenvalues account for a larger share of the variance in the observed indicators. Remember that in Chapter 2, we also discussed the fact that factor analysis is typically conducted using observed indicators that have been set to the standard normal distribution (mean of 0, standard deviation of 1). A primary reason that this standardization is done is to ensure that scale differences among the indicators (e.g., grade point average and college entrance exam scores) do not distort the factor loadings by giving variables that are on a scale with larger values (e.g., college entrance exam scores) and larger loadings than those that are on a smaller scale (e.g., college grade point average). Thus, when the observed variables are standardized, they each have a variance of 1. Given this fact, a factor that has an eigenvalue greater than 1 accounts for more variance than does any single indicator variable. Kaiser’s criterion, therefore, calls for the retention of any factor with an eigenvalue of more than 1, based on the logic that it accounts for more variance in the set of indicators than does any single indicator alone. This approach is certainly simple and somewhat intuitive, which is likely the reason that it has remained popular for many years. A researcher using Kaiser’s criterion can simply review a table of eigenvalues and retain those factors with values greater than 1. However, research has shown that this technique is relatively ineffective at accurately identifying the number of factors that should be retained. Specifically, it frequently leads to the retention of too many factors (Cliff, 1988; Pett et al., 2003; Zwick & Velicer, 1986). Given this tendency to over-factor a given solution, the eigenvalue greater than 1 rule is not recommended for use in practice. Returning to the adult temperament scale (ATS) example with which we have been working in this book, the eigenvalues appear in Table 5.1. The eigenvalue greater than 1 rule would indicate that five factors should be retained for the ATS. Recall that theoretically the ATS measures four latent traits. Thus, it would appear that Kaiser’s criterion suggests an extra factor that is not expected, given the conceptual framework upon which the scale was built. The scree plot (Cattell, 1966) is another popular approach for determining the optimal EFA solution. As with Kaiser’s criterion, the scree plot relies on the eigenvalues of the covariance (or correlation) matrix among the observed indicators. The purpose of this plot is to examine the
Table 5.1 Eigenvalues for the Exploratory Factor Analysis of the Adult Temperament Scale

Factor   Eigenvalue
1        2.575
2        2.079
3        1.932
4        1.235
5        1.003
6        0.830
7        0.684
8        0.553
9        0.480
10       0.458
11       0.430
12       0.391
13       0.349

The scree plot (Cattell, 1966) is another popular approach for determining the optimal EFA solution. As with Kaiser's criterion, the scree plot relies on the eigenvalues of the covariance (or correlation) matrix among the observed indicators. The purpose of this plot is to examine the
relationship between the number of factors and the amount of variance explained in the observed variables (as measured by the eigenvalues), with an eye toward identifying the factor number at which the amount of explained variance declines sharply. In practice, this is done using a scatterplot with the eigenvalue on the y axis, the factor number on the x axis, and a line connecting the points in the plot. As noted above, the eigenvalues decrease in value from the first through the last factor, which is reflected in the scatterplot. The researcher using this approach examines the plot, looking for the point where the line connecting the eigenvalues begins to flatten out in its rate of decline. This point corresponds to the number of factors that should be retained. The scree plot is named after the rubble (scree) that appears at the bottom of a cliff.

Figure 5.1 Scree Plot for the Exploratory Factor Analysis Solution of the Adult Temperament Scale (eigenvalues on the y axis, factor numbers 1 through 13 on the x axis)

Figure 5.1 displays the scree plot for the ATS. We can see that the eigenvalues decrease monotonically as the number of factors increases, but that they do not do so in a uniform fashion. It appears that the line flattens
to some degree at four factors (i.e., the drop in the line is greater between three and four factors than it is between four and five), and more so at nine factors. Based on the scree plot, we might conclude that four factors represent the optimal number to retain.

The scree plot has the advantage of being very straightforward to explain and to understand. Interpreting it does not require a high degree of statistical background, and it can be presented easily in the context of a research report. However, the scree plot is not without problems. The most problematic aspect of applying this tool in practice is the inherent subjectivity associated with determining when the line connecting the individual points flattens out (Raiche, Wall, Magis, Riopel, & Blais, 2012). The ATS example illustrates this issue perfectly. We may conclude that the line flattens out at four factors, but an equally plausible argument could be made for nine factors. It is even possible that a researcher could conclude that the graph really flattens for the first time between two and three factors.

In addition, prior research has shown that the scree plot is less accurate with small sample sizes and when the number of indicator variables is low (Linn, 1968). Such is also the case when the factor structure is relatively complex, meaning that some of the indicators load on more than one factor (Crawford & Koopman, 1979). Given these limitations, it is not
recommended that researchers rely heavily on the scree plot for determining the number of factors to retain. Nonetheless, it can be a useful auxiliary tool to include along with other, potentially more dependable methods such as those described below.
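Producing a scree plot of the kind shown in Figure 5.1 takes only a few lines of code. The sketch below is a minimal example (assuming matplotlib is installed, and reusing the ATS eigenvalues from the earlier sketch).

import numpy as np
import matplotlib.pyplot as plt

eigenvalues = np.array([2.575, 2.079, 1.932, 1.235, 1.003, 0.830, 0.684,
                        0.553, 0.480, 0.458, 0.430, 0.391, 0.349])
factor_numbers = np.arange(1, len(eigenvalues) + 1)

# Eigenvalue on the y axis, factor number on the x axis, points connected
plt.plot(factor_numbers, eigenvalues, marker="o")
plt.xlabel("Factor Number")
plt.ylabel("Eigenvalue")
plt.show()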
Objective Methods Based on the Scree Plot

Given the aforementioned limitations of interpreting the scree plot from visual inspection alone, more objective methods for determining the number of factors to retain based upon the scree plot have been proposed in the literature. One such approach is the optimal coordinate scree test (Raiche et al., 2012), which compares the actual eigenvalue for a given factor (e.g., Factor 3) with the eigenvalue that would be predicted for it (the optimal coordinate) using a 2-point regression model based on the set of eigenvalues obtained from the covariance matrix of the observed data. The two points used in the regression equation for predicting eigenvalue i are the (i + 1)th eigenvalue and the last eigenvalue. If the observed eigenvalue i is larger than the predicted eigenvalue associated with factor i, then factor i is retained. The researcher examines each of these comparisons in turn and retains factors up to the first one for which the observed eigenvalue is less than the predicted eigenvalue. This procedure is illustrated below.

A second alternative based upon the scree plot is the acceleration factor scree test. The acceleration factor statistic is calculated as the second derivative of the regression equation used to predict the optimal coordinate, as described above. This second derivative is evaluated at each eigenvalue in order to calculate the acceleration factor, which is simply a measure of the steepness of the line connecting the points in the scree plot. The last factor to be retained is the one that precedes the coordinate where the acceleration factor is maximized. As with the optimal coordinate approach, use of the acceleration factor is demonstrated below.

In addition to the optimal coordinate and acceleration factor methods, two other objective approaches based on the scree plot have been discussed in the literature and have proven effective in simulation studies. Both have the limitation that they are applicable only in cases where three or more factors are to be retained. Gorsuch's (1983) CNG scree test involves calculating the slope linking the first three eigenvalues, then the slope linking eigenvalues 2, 3, and 4, then the slope linking eigenvalues 3, 4, and 5, and so on. The researcher then compares these slopes with one another and selects the number of factors where the difference between the slopes is
greatest. Thus, for example, if the largest difference between slope values lies between the line for points 2, 3, and 4 and the line for points 3, 4, and 5, we would retain four factors.

Zoski and Jurs (1993) suggested a variant of the Gorsuch approach in which pairs of regression equations are estimated using all of the data points, rather than just sets of three at a time. Thus, for p indicator variables, the following pairs of equations would be considered:

Line 1 (eigenvalues 1, 2, and 3) versus Line 2 (eigenvalues 4 through p)
Line 3 (eigenvalues 1, 2, 3, and 4) versus Line 4 (eigenvalues 5 through p)
Line 5 (eigenvalues 1, 2, 3, 4, and 5) versus Line 6 (eigenvalues 6 through p)

The slopes for the lines in each pair (e.g., Line 1 versus Line 2) are then compared using a t-test, and the number of factors to be retained is associated with the maximum t value. As an example, if the maximum t statistic is associated with the comparison between Lines 3 and 4, then four factors (corresponding to the largest factor number in Line 3) would be retained. As noted above, simulation research has shown that the Zoski and Jurs approach and the CNG scree test are both very effective at determining the number of factors to retain, provided that there are at least three latent variables present in the data (Raiche et al., 2012).

As was noted above, it was difficult to determine precisely the number of factors to retain based on a subjective examination of the scree plot in Figure 5.1. Thus, we apply the objective scree plot methods described above to the ATS in order to gain more insight into its factor structure. The number of factors to retain, as indicated by each of these methods, appears in Table 5.2. The Zoski and Jurs b, which research suggests is perhaps the most reliable of these statistics (Raiche et al., 2012), indicates that four factors should be retained, whereas the Gorsuch CNG indicates five, the optimal coordinate one, and the acceleration factor three.
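To make the CNG logic concrete, the sketch below (our own reconstruction of the idea, not Gorsuch's published algorithm; NumPy assumed) fits a least-squares line through each consecutive triplet of eigenvalues and reports the factor count at the largest change in slope.

import numpy as np

eigenvalues = np.array([2.575, 2.079, 1.932, 1.235, 1.003, 0.830, 0.684,
                        0.553, 0.480, 0.458, 0.430, 0.391, 0.349])

def cng_scree_test(eigs):
    """Sketch of the CNG scree test: slopes through consecutive triplets
    of eigenvalues; retain the factor count where adjacent slopes change
    the most. Only sensible when three or more factors are present."""
    x = np.arange(1, len(eigs) + 1)
    slopes = [np.polyfit(x[i:i + 3], eigs[i:i + 3], 1)[0]
              for i in range(len(eigs) - 2)]
    j = int(np.argmax(np.abs(np.diff(slopes))))
    # Triplet j covers eigenvalues j + 1 through j + 3 (1-based), so the
    # retained count is the last point of the earlier triplet
    return j + 3

print(cng_scree_test(eigenvalues))  # 5 for the ATS, matching Table 5.2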
Table 5.2 Number of Factors to Retain for the Adult Temperament Scale Based on the Objective Scree Plot Methods

Method                Factors to Retain
Zoski and Jurs b      4
Gorsuch CNG           5
Optimal coordinate    1
Acceleration factor   3
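The optimal coordinate and acceleration factor tests can likewise be scripted. The sketch below is our own reading of the verbal descriptions above (NumPy assumed; the acceleration factor is approximated by a discrete second difference of the scree line); for the ATS eigenvalues it reproduces the values reported in Table 5.2.

import numpy as np

eigenvalues = np.array([2.575, 2.079, 1.932, 1.235, 1.003, 0.830, 0.684,
                        0.553, 0.480, 0.458, 0.430, 0.391, 0.349])

def optimal_coordinates(eigs):
    """Sketch of the optimal coordinate test: predict eigenvalue i from
    the line through the (i + 1)th and last eigenvalues; retain factor i
    while the observed eigenvalue exceeds its prediction."""
    p = len(eigs)
    retained = 0
    for i in range(p - 2):                    # factor i + 1 (1-based)
        slope = (eigs[-1] - eigs[i + 1]) / (p - (i + 2))
        predicted = eigs[i + 1] - slope       # extrapolate one step back
        if eigs[i] > predicted:
            retained += 1
        else:
            break
    return retained

def acceleration_factor(eigs):
    """Sketch of the acceleration factor: a discrete second difference
    along the scree line; retain the factors preceding its maximum."""
    af = eigs[2:] - 2 * eigs[1:-1] + eigs[:-2]
    return int(np.argmax(af)) + 1

print(optimal_coordinates(eigenvalues))   # 1, as in Table 5.2
print(acceleration_factor(eigenvalues))   # 3, as in Table 5.2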
Eigenvalues and the Proportion of Variance Explained

In addition to using the eigenvalues directly, as in the eigenvalue greater than 1 rule, or in the form of a graph, as with the scree plot and the objective methods associated with it, it is also possible to use the eigenvalues to calculate the proportion of variance associated with each factor. Recall that the eigenvalue associated with an individual factor reflects the amount of variance in the set of observed indicators that is associated with that factor. In addition, we know from Chapter 3 that there are theoretically as many factors as there are observed indicator variables in the dataset. Thus, in order to obtain the total variance present in the set of indicators, we can sum the eigenvalues. It follows that we can calculate the proportion of variance in the indicators associated with each factor by simply dividing the individual eigenvalues by this sum. For example, if the sum of the eigenvalues for a set of 10 standardized indicators is 10, and the first eigenvalue is 3.75, then the proportion of variance accounted for by the factor associated with this eigenvalue is 3.75/10 = 0.375. Put another way, the first factor accounts for 37.5% of the variance in the set of indicator variables. Similar calculations can be made for all factors, thereby yielding insight into the relative importance of each in terms of explaining the observed variability in the data. Larger proportions indicate relatively more important factors.

There are no formal guidelines for interpreting the proportion of variance explained. Rather, the researcher should look both at the cumulative proportion for a given factor solution (e.g., four factors explain 65% of the variance in the indicators) and for the point at which adding factors yields only a modest gain in the amount of observed variability that is explained. Clearly, such determinations are subjective in nature and should be made in light of prior work in the field and theoretical expectations. Thus, for example, if previous research in the motivation literature shows that conceptually meaningful latent variables explain approximately 60% of the variability in motivation scale items, then a researcher working with a new motivation scale might expect to see similar results. In addition, if theory suggests that the scale should measure four separate motivation factors, then the inclusion of a fifth factor would be warranted only if it accounted for a relatively large proportion of variance in the indicators and were conceptually meaningful.

As an example of how the proportion of variance accounted for is calculated, let's again consider the ATS data. The eigenvalues for all possible factors appear in Table 5.1. First, we need
to obtain the sum of the eigenvalues, which will be the same as the number of observed indicators, or 13 in this case. Next, we can calculate the proportion of variance accounted for by the first factor as 2.575/13 = 0.198. Thus, the first factor accounts for approximately 20% of the variance among the ATS subscales. The variance accounted for by the second factor is 2.079/13 = 0.160, or 16%. Similar calculations can be made for each factor; the results appear in Table 5.3.

Table 5.3 Eigenvalues and Proportion of Variance Explained by Each Factor for the Exploratory Factor Analysis of the Adult Temperament Scale

Factor   Eigenvalue   Proportion of Variance Explained   Cumulative Proportion
1        2.575        0.198                              0.198
2        2.079        0.160                              0.358
3        1.932        0.149                              0.507
4        1.235        0.095                              0.602
5        1.003        0.077                              0.679
6        0.830        0.064                              0.743
7        0.684        0.053                              0.795
8        0.553        0.043                              0.838
9        0.480        0.037                              0.875
10       0.458        0.035                              0.910
11       0.430        0.033                              0.943
12       0.391        0.030                              0.973
13       0.349        0.027                              1.000

The largest decline in the proportion of variance explained as the number of factors increases occurs between Factors 3 and 4. In addition, from Table 5.3 we can see that retaining four factors explains approximately 60% of the variance in the ATS subscales, whereas three factors explain roughly 51%. If we were to retain five factors, the model would account for about 68% of the variance in the observed variables. Given the theoretical expectation of four factors, coupled with an almost
10% increase in variance explained by including the fourth factor, these results would seem to support the four-factor solution. We should note, however, that the fourth factor explains only about 2% more variance than does the fifth factor, so the difference between the two solutions is modest. Given the theoretical expectation of four factors, it makes sense for us to go with four rather than five here.
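The arithmetic here is easy to script. The sketch below (NumPy assumed, reusing the ATS eigenvalues) reproduces the proportions and cumulative proportions in Table 5.3.

import numpy as np

eigenvalues = np.array([2.575, 2.079, 1.932, 1.235, 1.003, 0.830, 0.684,
                        0.553, 0.480, 0.458, 0.430, 0.391, 0.349])

# Total variance in 13 standardized indicators is 13, the sum of the
# eigenvalues; each eigenvalue divided by this sum is a proportion
proportion = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(proportion)

print(round(proportion[0], 3))   # 0.198, the first factor
print(round(cumulative[3], 3))   # 0.602, four factors together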
Residual Correlation Matrix

One of the oldest and still most commonly used methods for determining the number of factors to retain involves an examination of the residual correlation matrix. The factor model can be used to obtain the predicted covariance matrix for the observed variables using the following equation:

Σ = ΛΨΛ′ + Θ   (Equation 5.1)

where

Λ = Factor loading matrix
Ψ = Factor covariance matrix
Θ = Residual covariance matrix.

We discussed in Chapter 2 that the error terms are uncorrelated with one another, so that all of the off-diagonal elements of Θ are 0. Furthermore, when the data have been transformed to the standard normal distribution, Σ is equivalent to the correlation matrix for the indicators. The individual elements of Σ are then simply the predicted relationships between individual pairs of variables. When a factor model correctly reflects the observed data, these predicted covariances/correlations will be close to the observed covariances/correlations contained in the observed covariance/correlation matrix S. Indeed, if the factor model is perfectly accurate, then the elements of Σ will be exactly equal to the elements of S. Thus, one approach for assessing the fit of a particular factor model is to examine the differences between the elements of Σ and S, which are referred to as the residual correlations. By convention (Gorsuch, 1983; Thompson, 2004), residual correlations with an absolute value greater than 0.05 are considered too large, so a good solution is one that produces few residual correlations greater than 0.05 in absolute value.

As an example of how the residual correlations are calculated, consider the ATS variable pair Fear and Frustration. The observed correlation between these two variables is 0.292. The factor
model predicted correlation, based on a four-factor solution using principal axis factor extraction and Promax rotation, is 0.330. Thus, the residual correlation for this variable pair is 0.292 − 0.330 = −0.038. This value falls below the 0.05 threshold, meaning that we would not consider it to be very large. If we fit a one-factor model using principal axis factoring rather than the four-factor model, the predicted correlation is 0.143, meaning that the residual correlation for this pair is 0.292 − 0.143 = 0.149. This residual value exceeds the 0.05 cut-off, meaning that it is too large. In other words, for this particular variable pairing the one-factor model does not yield as good a fit to the data as does the four-factor model. In order to make a determination about the overall fit of the model, we would need to examine the residual correlations for all of the variable pairs. Given the large number of such residuals, some software packages, such as SPSS and SAS, provide a message indicating the number and percentage of variable pairs for which the residual correlation exceeds 0.05. The researcher can then use this information to make a determination regarding the optimal fitting model.
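A minimal sketch of this check appears below, assuming the loading matrix, factor correlation matrix, and observed correlation matrix have already been estimated by whatever EFA routine is in use; the function name and arguments are ours. Because Θ is diagonal, it can be ignored when examining off-diagonal residuals.

import numpy as np

def residual_exceedance(S, Lambda, Psi, threshold=0.05):
    """Proportion of off-diagonal residual correlations whose absolute
    value exceeds the threshold.

    S      : observed correlation matrix (p x p)
    Lambda : factor loading matrix (p x m)
    Psi    : factor correlation matrix (m x m)
    """
    Sigma = Lambda @ Psi @ Lambda.T           # Equation 5.1, without Theta
    residuals = S - Sigma                     # residual correlations
    off_diag = residuals[np.triu_indices_from(residuals, k=1)]
    return float(np.mean(np.abs(off_diag) > threshold))

# For the four-factor ATS solution this proportion is 0.24 (Table 5.4)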
Table 5.4 contains the proportion of residual correlations that exceed the 0.05 threshold for one to five factors. The results in Table 5.4 strongly suggest that five factors should be retained. When four factors (the theoretically appropriate number) are retained, 24% of the residual correlations exceed 0.05, whereas for five factors only 3% are larger than this cut-value.

Table 5.4 Proportion of Residual Correlations That Exceed 0.05 by Number of Factors Retained

Factors   Proportion of Residual Correlations Exceeding 0.05
1         0.74
2         0.57
3         0.47
4         0.24
5         0.03

Chi-Square Goodness of Fit Test for Maximum Likelihood

The methods for determining the number of factors to retain that we have discussed up to this point are all descriptive in nature. However, there are
also some inferential methods for determining the number of factors to retain in an EFA. One of the more widely used of these approaches is a Chi-square statistic associated with maximum likelihood (ML) estimation. Recall from Chapter 3 that ML utilizes a minimization process in which the algorithm finds the model parameter estimates (i.e., factor loadings, variances, and error variances) that yield a predicted covariance matrix (Σ) of the indicator variables that is (hopefully) very close in value to the observed covariance matrix (S). The value of the function that results from this minimization process can be converted to a Chi-square statistic that tests the null hypothesis that Σ = S, that is, that the model predicted covariance matrix among the indicator variables is equivalent to the actual covariance matrix.

It is important to note that this null hypothesis states that the fit of the model is perfect. Unfortunately, such perfect fit is rarely achieved in practice, even for factor solutions that are reasonably close to the underlying model in the population. In other words, many models that are fit in practice are not precisely correct, but they are nonetheless very close to the actual population model, and therefore quite useful for researchers trying to understand a particular area of investigation. However, such close fitting models, despite their utility, may be rejected by the Chi-square goodness of fit test, leading a researcher to conclude that the model does not fit the data when in fact it fits reasonably well. Thus, this test may not be particularly useful for assessing the fit of an individual model. This tendency to reject good factor models is exacerbated when the sample size is large (Tong & Bentler, 2013), and its end result is the extraction of too many factors (Kim & Mueller, 1978). The Chi-square goodness of fit test results for one to five factors appear in Table 5.5.
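For reference, here is a small sketch of the mechanics, assuming one already has the minimized ML discrepancy function value from an EFA fit. The conversion shown is the common uncorrected one (some software applies a Bartlett-style correction to the multiplier instead), and the function name is ours.

from scipy.stats import chi2

def ml_chisquare(f_min, n, p, m):
    """Uncorrected likelihood ratio test for an m-factor ML solution.

    f_min : minimized ML discrepancy function value
    n     : sample size
    p     : number of observed indicators
    m     : number of retained factors
    """
    statistic = (n - 1) * f_min
    df = int(((p - m) ** 2 - (p + m)) / 2)   # p = 13, m = 1 gives df = 65
    return statistic, df, chi2.sf(statistic, df)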
Table 5.5 Chi-Square, Degrees of Freedom, and p-Value From the Chi-Square Goodness of Fit Test for Various Factor Solutions

Factors   Chi-Square   Degrees of Freedom   p
1         696.642      65