Structural Equation Modeling
POCKET GUIDES TO SOCIAL WORK RESEARCH METHODS
Series Editor: Tony Tripodi, DSW, Professor Emeritus, Ohio State University

Determining Sample Size: Balancing Power, Precision, and Practicality, by Patrick Dattalo
Preparing Research Articles, by Bruce A. Thyer
Systematic Reviews and Meta-Analysis, by Julia H. Littell, Jacqueline Corcoran, and Vijayan Pillai
Historical Research, by Elizabeth Ann Danto
Confirmatory Factor Analysis, by Donna Harrington
Randomized Controlled Trials: Design and Implementation for Community-Based Psychosocial Interventions, by Phyllis Solomon, Mary M. Cavanaugh, and Jeffrey Draine
Needs Assessment, by David Royse, Michele Staton-Tindall, Karen Badger, and J. Matthew Webster
Multiple Regression with Discrete Dependent Variables, by John G. Orme and Terri Combs-Orme
Developing Cross-Cultural Measurement, by Thanh V. Tran
Intervention Research: Developing Social Programs, by Mark W. Fraser, Jack M. Richman, Maeda J. Galinsky, and Steven H. Day
Developing and Validating Rapid Assessment Instruments, by Neil Abell, David W. Springer, and Akihito Kamata
Clinical Data-Mining: Integrating Practice and Research, by Irwin Epstein
Strategies to Approximate Random Sampling and Assignment, by Patrick Dattalo
Analyzing Single System Design Data, by William R. Nugent
Survival Analysis, by Shenyang Guo
The Dissertation: From Beginning to End, by Peter Lyons and Howard J. Doueck
Cross-Cultural Research, by Jorge Delva, Paula Allen-Meares, and Sandra L. Momper
Secondary Data Analysis, by Thomas P. Vartanian
Narrative Inquiry, by Kathleen Wells
Policy Creation and Evaluation: Understanding Welfare Reform in the United States, by Richard Hoefer
Finding and Evaluating Evidence: Systematic Reviews and Evidence-Based Practice, by Denise E. Bronson and Tamara S. Davis
Structural Equation Modeling, by Natasha K. Bowen and Shenyang Guo
NATASHA K. BOWEN SHENYANG GUO
Structural Equation Modeling
1 Oxford University Press, Inc., publishes works that further Oxford University’s objective of excellence in research, scholarship, and education. Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright © 2012 by Oxford University Press, Inc. Published by Oxford University Press, Inc. 198 Madison Avenue, New York, New York 10016 www.oup.com Oxford is a registered trademark of Oxford University Press All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press. ____________________________________________ Library of Congress Cataloging-in-Publication Data Bowen, Natasha K. Structural equation modeling / Natasha K. Bowen, Shenyang Guo. p. cm. — (Pocket guides to social work research methods) Includes bibliographical references and index. ISBN 978-0-19-536762-1 (pbk. : alk. paper) 1. Social sciences—Research— Data processing. 2. Social service—Research. 3. Structural equation modeling. I. Guo, Shenyang. II. Title. III. Series. H61.3.B694 2011 300.72—dc22 2010054226 ____________________________________________ 1 3 5 7 9 8 6 4 2
Printed in the United States of America on acid-free paper
Acknowledgment
The authors thank Kristina C. Webber for her many wise and helpful contributions to this book, and the University of North Carolina’s School of Social Work for giving us the opportunity to teach PhD students about structural equation modeling.
Contents
1 Introduction 3
2 Structural Equation Modeling Concepts 16
3 Preparing for an SEM Analysis 52
4 Measurement Models 73
5 General Structural Equation Models 109
6 Evaluating and Improving CFA and General Structural Models 135
7 Advanced Topics 167
8 Become a Skillful and Critical Researcher 187
Glossary 191
Appendix 1: Guide to Notation used in SEM Equations, Illustrations, and Matrices 202
Appendix 2: Derivation of Maximum Likelihood Estimator and Fitting Function 204
References 207
Index 215
1
Introduction
RATIONALE AND HIGHLIGHTS OF THE BOOK Social work practitioners and researchers commonly measure complex patterns of cognition, affect, and behavior. Attitudes (e.g., racism), cognitions (e.g., self-perceptions), behavior patterns (e.g., aggression), social experiences (e.g., social support), and emotions (e.g., depression) are complex phenomena that can neither be observed directly nor measured accurately with only one questionnaire item. Measuring such phenomena with multiple items is necessary, therefore, in most social work contexts. Often, scores from the multiple items used to measure a construct are combined into one composite score by summing or averaging. The new composite score is then used to guide practice decisions, to evaluate change in social work clients, or in research contexts, is entered as a variable in statistical analyses. Structural equation modeling (SEM) offers a highly desirable alternative to this approach; it is arguably a mandatory tool for researchers developing new measures. In sum, SEM is highly recommended for social work researchers who use or develop multiple-item measures. Using SEM will improve the quality and rigor of research involving such measures, thereby increasing the credibility of results and strengthening the contribution of studies to the social work literature. One barrier to the use of SEM in social work has been the complexity of the literature and the software for the method. SEM software programs vary considerably, the literature is statistically intimidating to many researchers,
sources disagree on procedures and evaluation criteria, and existing books often provide more statistical information than many social workers want and too little practical information on how to conduct analyses. This book is designed to overcome these barriers. The book will provide the reader with a strong conceptual understanding of SEM, a general understanding of its basic statistical underpinnings, a clear understanding of when it should be used by social work researchers, and step-by-step guidelines for carrying out analyses. After reading the book, committed readers will be able to conduct an SEM analysis with at least one of two common software programs, interpret output, problem-solve undesirable output, and report results with confidence in peer-reviewed journal articles or conference presentations. The book is meant to be a concise practical guide for the informed and responsible use of SEM. It is designed for social work faculty, researchers, and doctoral students who view themselves more as substantive experts than statistical experts, but who need to use SEM in their research. It is designed for social workers who desire a degree of analytical skill but have neither the time for coursework nor the patience to glean from the immense SEM literature the specifics needed to carry out an SEM analysis. Although the book focuses on what the typical social work researcher needs to know to conduct his or her own SEM analyses competently, it also provides numerous references to more in-depth treatments of the topics covered. Because of this feature, readers with multiple levels of skill and statistical fortitude can be accommodated in their search for greater understanding of SEM. At a minimum, however, the book assumes that readers are familiar with basic statistical concepts, such as mean, variance, explained and unexplained variance, basic statistical distributions (e.g., normal distributions), sum of squares, standard deviation, covariance and correlation, linear regression, statistical significance, and standard error. Knowledge of exploratory factor analysis, matrix algebra, and other more advanced topics will be useful to the reader but are not required. Highlights of the book include: (a) a focus on the most common applications of SEM in research by social workers, (b) examples of SEM research from the social work literature, (c) information on “best practices” in SEM, (d) how to report SEM findings and critique SEM articles, (e) a chronological presentation of SEM steps, (f) strategies for addressing common social work data issues (e.g., ordinal and nonnormal data), (g) information
on interpreting output and problem solving undesirable output, (h) references to sources of more in-depth statistical information and information on advanced SEM topics, (i) online data and syntax for conducting SEM in Amos and Mplus, and (j) a glossary of terms. In keeping with the goals of the Pocket Guides to Social Work Research Methods series, we synthesize a vast literature into what we believe to be a concise presentation of solid, defensible practices for social work researchers.
WHAT IS STRUCTURAL EQUATION MODELING? SEM may be viewed as a general model of many commonly employed statistical models, such as analysis of variance, analysis of covariance, multiple regression, factor analysis, path analysis, econometric models of simultaneous equation and nonrecursive modeling, multilevel modeling, and latent growth curve modeling. Readers are referred to Tabachnick & Fidell (2007) for an overview of many of these methods. Through appropriate algebraic manipulations, any one of these models can be expressed as a structural equation model. Hence, SEM can be viewed as an “umbrella” encompassing a set of multivariate statistical approaches to empirical data, both conventional and recently developed approaches. Other names of structural equation modeling include covariance structural analysis, equation system analysis, and analysis of moment structures. Developers of popular software packages for SEM often refer to these terms in the naming of the programs, such as Amos, which stands for analysis of moment structures; LISREL, which stands for linear structural relations; and EQS, which stands for equation systems. A number of software programs can be used for SEM analyses. See Box 1.1 for citations and links for Amos, EQS, LISREL, and Mplus, four SEM programs commonly used by social workers. This book provides instructions and online resources for using Amos and Mplus, each of which has distinct advantages for the social work researcher. The general principles covered, however, apply to all SEM software. For social work researchers, SEM may most often be used as an approach to data analysis that combines simultaneous regression equations and factor analysis (Ecob & Cuttance, 1987). Factor analysis models test hypotheses about how well sets of observed variables in an existing dataset measure latent constructs (i.e., factors). Latent constructs represent
Box 1.1 Examples of SEM Software Programs Used by Social Work Researchers
The following four programs are widely used for SEM analyses:
Amos (Arbuckle, 1983–2007, 1995–2007). Website: http://www.spss.com/amos/
EQS (Bentler & Wu, 1995; Bentler & Wu, 2001). Website: http://www.mvsoft.com/index.htm
LISREL (Jöreskog & Sörbom, 1999; Sörbom & Jöreskog, 2006). Website: http://www.ssicentral.com/lisrel/
Mplus (Muthén & Muthén, 1998–2007; Muthén & Muthén, 2010). Website: http://www.statmodel.com/index.shtml
theoretical, abstract concepts or phenomena such as attitudes, behavior patterns, cognitions, social experiences, and emotions that cannot be observed or measured directly or with single items. Factor models are also called measurement models because they focus on how one or more latent constructs are measured, or represented, by a set of observed variables. Confirmatory factor analysis (CFA) in the SEM framework permits sophisticated tests of the factor structure and quality of social work measures. (Shortly we will provide examples and much more detail about the terms being introduced here.) Latent variables with adequate statistical properties can then be used in cross-sectional and longitudinal regression analyses. Regression models test hypotheses about the strength and direction of relationships between predictor variables and an outcome variable. Unlike standard regression models, SEM accommodates regression relationships among latent variables and between observed and latent variables. Unlike conventional regression models, SEM can estimate in a single analysis procedure models in which one or more variables are simultaneously predicted and predictor variables. Structural equation models with directional relationships among latent variables are often called general structural equation models (general SEMs). In sum, SEM is a general statistical approach with many applications. Over the past two decades, statistical theories and computing software packages for SEM have developed at an accelerated pace. Newer SEM approaches include methods for analyzing latent classes cross-sectionally and over time (mixture modeling), and latent growth curve modeling (Bollen & Curran, 2006). Consistent with the goals of the pocket guides,
this book focuses on a manageable subset of SEM topics that are relevant to social work research. Specifically, we focus on SEM's most common social work applications—confirmatory factor analysis and cross-sectional structural models with latent variables. In addition, we focus on proper methods for addressing common data concerns in social work research: ordinal-level data, nonnormal data, and missing data.
THE ROLE OF THEORY IN STRUCTURAL EQUATION MODELING The primary goal of an SEM analysis is to confirm research hypotheses about the observed means, variances, and covariances of a set of variables. The hypotheses are represented by a number of structural parameters (e.g., factor loadings, regression paths) that is smaller than the number of observed parameters. As a confirmatory approach, it is crucial for researchers using SEM to test models that have strong theoretical or empirical foundations. Nugent and Glisson (1999), for example, operationalized two ways children’s service systems might respond to children: either as responsive or reactive systems. “Responsive systems,” the ideal, were defined as “[quick] to respond appropriately or sympathetically” to each child’s specific mental health needs (p. 43). “Reactive systems” were operationalized as those that refuse to provide services, provide disruptive services, or otherwise fail to provide children with needed mental health treatments. With well-defined hypotheses based on previous research, the authors tested the nature of services provided in 28 counties in one state and the relationship between reactivity and responsiveness of the systems. Similarly, confirmatory factor analyses should be based on theory and/or the results of exploratory factor analyses and other psychometric tests. SEM models are commonly presented in path diagrams. The path diagram is a summary of theoretically suggested relationships among latent variables and indicator variables, and directional (regression) and nondirectional (i.e., correlational) relationships among latent variables. Importantly, correlated errors of measurement and prediction can also be modeled in SEM analyses. We emphasize throughout the book that having a theoretical model and/or theory-derived constructs prior to any empirical modeling is mandated for both CFA and structural modeling with latent variables.
Path diagrams are graphics with geometric figures and arrows suggesting causal influences. SEM, however, has no better ability to identify causal relationships than any other regression or factor analytic procedure. Cross-sectional SEMs reveal associations among variables (one criterion for causality), and repeated measures in SEM can model time order of variables (another criterion for causality), but SEM in and of itself cannot definitively rule out other potential explanations for relationships among variables (the third criterion for establishing causality). The arrows in SEM illustrations reflect hypothesized relationships based on theory and previous research. SEM results may or may not provide support for the theory being tested, but they cannot prove or disprove theory or causality. Reversing the direction of arrows in any SEM may yield equally significant parameter estimates and statistics on model quality. For another brief treatment of this subject, see Fabrigar, Porter, and Norris (2010). These authors point out that although SEM cannot compensate for a nonexperimental design, it can be a useful analysis technique for experimental data and can be superior to other techniques with quasi-experimental data for ruling out competing causes of intervention outcomes. Because models proposing opposite effects can yield similar statistics, it is a common and desirable practice to test alternative models in SEM. Good model statistics for an SEM model support its validity; model statistics that are superior to those obtained for a competing model provide valuable additional credibility. But neither establishes causality nor proves theory. Using experimental or quasi-experimental designs or statistical models specially developed for observational data in research studies remains the best way to identify causal effects.
WHAT KINDS OF DATA CAN OR SHOULD BE ANALYZED WITH SEM? Ideally, SEM is conducted with large sample sizes and continuous variables with multivariate normality. The number of cases needed varies substantially based on the strength of the measurement and structural relationships being modeled, and the complexity of the model being tested. CFA models and general SEM with strong relationships among variables (e.g., standardized values of 0.80), for example, with all else
being equal, can be tested with smaller samples than models with weak relationships (e.g., standardized values of 0.20) among variables. Sample size and statistical power are discussed further in Chapters 3 and 7. Social workers often work with variables that are ordinal and/or nonnormally distributed, and datasets containing missing values. SEM software provides a number of satisfactory options for handling data with these statistically undesirable characteristics. In addition to its advantages over traditional regression approaches, therefore, SEM software provides solutions to common social work methodological issues that, if ignored, reduce the quality of social work studies, and consequently, the literature used to guide social work practice.
WHAT RESEARCH QUESTIONS ARE BEST ANSWERED WITH SEM? EXAMPLES FROM SOCIAL WORK STUDIES Measurement Questions Answered with SEM Measurement questions relate to the reliability and validity of data collected with questionnaires, checklists, rating sheets, interview schedules, and so on. SEM’s ability to model sets of questions as indicators of hypothesized latent constructs (such as depression, social support, attitudes toward health care, organizational climate) provides a number of major statistical advantages, which will become evident later. Questions about the quality of multiple items as indicators of one or more dimensions of a construct are factor analysis questions. The questions answered by CFA differ from those answered by exploratory factor analysis (EFA) procedures. As implied in the title, confirmatory factor analysis is used to test the adequacy of a well-defined model. The specified model is predetermined by theory or past research. The questions asked are closed ended: Do these indicators measure the phenomenon well? Do the data support the existence of multiple dimensions of the phenomenon, each measured by prespecified items? EFA is used earlier in the scale development process to answer more open-ended questions—for example, how many dimensions of the phenomenon are represented by these items? Which items are associated with each dimension? More about the distinction between EFA and CFA and their roles in the scale development process will be presented in Chapter 4.
CFA provides answers to questions about the structure of latent phenomena (e.g., the nature and number of dimensions), and the individual and collective performance of indicators. For example, researchers in one study (Bride, Robinson, Yegidis, & Figley, 2004) used data from 287 social workers who completed the Secondary Traumatic Stress Scale (STSS) to validate the scale as a measure of indirect trauma. Items on the STSS assess dimensions of traumatic stress as defined in the diagnostic criteria for posttraumatic stress disorder in the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 1994). Therefore, the hypothesized factor structure was derived from a strong foundation in theory and previous research. The results of the researchers’ CFA provided answers to the following measurement questions: 1. Did the items measure the three hypothesized dimensions of trauma symptomatology? Yes, each of the 17 items on the scale was associated with the one dimension of trauma it was hypothesized to measure and not strongly associated with the other two dimensions it was not hypothesized to measure. 2. How well did each indicator perform? Factor loadings were moderate to high (0.58 to 0.79) and statistically significant. The size of the factor loadings indicates which items are most strongly related to each dimension. 3. How good was the model overall? The model explained 33% to 63% of the variance of each indicator, which is “reasonable” according to Bride et al. (2004). Other measures of the quality of the model met or exceeded standard criteria. 4. How highly correlated were the three dimensions of trauma symptomatology? Intercorrelations of the three dimensions ranged from 0.74 to 0.83 and were statistically significant. These correlations are consistent with theory and previous research about the components of trauma, according to the authors. Bride et al. (2004) did not report the variances of the latent variables associated with the three dimensions of trauma symptoms in their model, but CFA results do indicate the magnitude of variances and whether they are statistically significantly different from zero. Subscales with little variance are not useful in practice, so it is important to examine these variance estimates in SEM output.
Like Bride et al. (2004), social workers may use CFA as a final test in a process of developing a new scale. Another important measurement question for social workers that can be answered with CFA is “whether measures . . . have the same meaning for different groups and over time” (Maitland, Dixon, Hultsch, & Hertzog, 2001, p. 74). If scores on a measure are compared for individuals from different populations (e.g., of different ages, gender, cultural backgrounds) or for the same individuals over time, it is critical to establish that the scores obtained from different groups or at different times have the same meaning. Maitland et al. (2001) used CFA to study the measurement equivalence or invariance of the Bradburn Affect Balance Scale (Bradburn ABS) across gender and age groups and over time. The researchers found that a small number of items from the two-dimension scale performed differently across groups and time, leading them to conclude that comparisons of scores across groups and time from past and future studies needed to be interpreted cautiously. Observed group and longitudinal differences in positive and negative affect could be partly attributed to variations in item performance rather than differences in the true scores for affect. Structural Questions Answered with SEM Relationships among latent variables (or factors) and other variables in an SEM model are structural relationships. Structural questions relate to the regression and correlational relationships among latent variables and among latent and observed variables. SEM structural models can include any combination of latent variables and observed variables. Observed demographic variables can be included as covariate or control variables, for example, in a model with latent independent and dependent variables. As with CFA models, all variables and relationships in structural models should be justifiable with theory and/or previous research. SEM permits simultaneous regression equations, that is, equations in which one variable can serve as both an independent and a dependent variable. It is therefore a valuable tool for testing mediation models, that is, models in which the relationship between an independent variable and a dependent variable is hypothesized to be partially or completely explained by a third, intervening variable. It also permits tests of models in which there are multiple dependent variables. In Nugent and Glisson’s (1999) model of predictors of child service system characteristics, for example, “system reactivity” and “system responsivity” were simultaneously
predicted by all other variables in the model (either directly, indirectly, or both). SEM is also a useful framework for testing moderation (interaction) models, or models in which the effects of one variable on another vary by the values or levels of a third variable. It provides more detailed output about moderation effects than typical regression procedures. In multiple regression, for example, moderation effects are obtained by creating product terms of the variables that are expected to interact (e.g., gender × stress). The results indicate the magnitude, direction, and statistical significance of interaction terms. In an SEM analysis, in contrast, the estimate and statistical significance of each parameter for each group (e.g., boys and girls) can be obtained, and differences across groups can be tested for statistical significance. Every parameter or any subset of parameters can be allowed to vary across groups, while others are constrained to be equal. The quality of models with and without equality constraints can be compared to determine which is best. Such information is useful for determining the validity of measures across demographic or developmental groups. A study by Bowen, Bowen, and Ware (2002) provides examples of the flexibility of SEM to answer structural questions. The study examined the direct and indirect effects of neighborhood social disorganization on educational behavior using self-report data from 1,757 adolescents. Supportive parenting and parent educational support were hypothesized mediators of the relationship between neighborhood characteristics and educational behavior. Race/ethnicity and family poverty were observed control variables in the model. The rest of the variables in the structural model were latent. The authors hypothesized that the magnitude of the direct and indirect effects in the model would be different for middle and high school students—a moderation hypothesis—based on past research. Results of the analysis answered the following structural questions: 1. Did neighborhood disorganization have a direct effect on educational behavior? Yes, negative neighborhood characteristics had a statistically significant moderate and negative direct effect on adolescents’ educational behavior. 2. Was the effect of neighborhood disorganization on educational behavior mediated by parental behaviors (supportive parenting and parent educational behavior)? Yes, the effect was partially
mediated by a three-part path with statistically significant coefficients between neighborhood disorganization and supportive parenting (negative), between supportive parenting and parent educational support (positive), and between parent educational support and educational behavior (positive). 3. Were race/ethnicity and family poverty predictive of educational behavior? No. Race/ethnicity and family poverty were significantly correlated with each other and with neighborhood disorganization, but the regression path between each observed variable and the dependent variable was not statistically significant. 4. Did the structural paths differ for middle and high school students as hypothesized? No. The moderation hypothesis was not supported. The relationships among the constructs were statistically equivalent for adolescents at both school levels. 5. How good was the model overall? Multiple measures of the quality of the final model met or exceeded standard criteria. As with traditional regression analyses, SEM results indicate the percent of variance of dependent variables explained by predictor variables. In this study: 14% to 33% of the variance of the mediators was explained, and 34% to 44% of educational behavior was explained. It bears repeating that even when SEM models are grounded in theory and previous research, support for models in the form of statistically significant regression paths, factor loadings, and correlations, and good overall model fit does not “prove” that the model or the theory from which it is derived is correct. Nor does such support indicate causality. Such support, as we will discuss in more detail later, can only be interpreted as consistency with the observed data used to test the model.
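To make the mediation result in question 2 above concrete: for a chain of directional paths, the indirect effect of the initial predictor on the final outcome is the product of the standardized coefficients along that chain. The short Python sketch below illustrates only the arithmetic; the coefficient values are hypothetical placeholders, not the estimates reported by Bowen, Bowen, and Ware (2002).

```python
# Hypothetical standardized path coefficients for a three-part mediated path
# (placeholder values for illustration; not estimates from the cited study).
neigh_to_parenting = -0.30     # neighborhood disorganization -> supportive parenting
parenting_to_support = 0.40    # supportive parenting -> parent educational support
support_to_behavior = 0.25     # parent educational support -> educational behavior

# The indirect (mediated) effect is the product of the coefficients on the path.
indirect_effect = neigh_to_parenting * parenting_to_support * support_to_behavior

print(round(indirect_effect, 3))   # -0.03: a small, negative indirect effect
```

SEM programs can report such indirect effects directly, so the multiplication does not have to be done by hand.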
SEM AS A USEFUL AND EFFICIENT TOOL IN SOCIAL WORK RESEARCH Many challenging questions confronted by social work researchers can be answered efficiently, effectively, and succinctly by SEM. SEM is often the best choice for social work analyses given the nature of their measures
and data. The topics and characteristics of SEM articles in a sampling of social work journals were examined by Guo and Lee (2007). The authors reviewed all articles published during the period of January 1, 1999 to December 31, 2004 in the following eight social work or socialwork-related journals: Child Abuse & Neglect, Journal of Gerontology Series B: Psychological Sciences and Social Sciences, Journal of Social Service Research, Journal of Studies on Alcohol, Research on Social Work Practice, Social Work Research, Social Work, and Social Service Review. During the 6-year period, Social Work and Social Service Review published no studies that employed SEM. A total of 139 articles using SEM were published by the seven remaining journals that were examined. Table 1.1 summarizes the 139 SEM publications by substantive areas and types of SEM. As the table shows, the majority of SEM applications in the targeted social work journals were general structural models (54.7%). The finding is not surprising because many social work research questions concern theoretically derived relationships among concepts that are best measured with latent variables. The second most common type of SEM was CFA (33.1%). Again, this finding is reasonable because developing measures of unobservable constructs is a primary task of
Table 1.1 SEM Applications by Social Work Research Area and SEM Type

Substantive area        CFA          General structural models   Path analysis   Total
Aging                   9 (20.0%)    29 (64.4%)                  7 (15.6%)       45 (100%)
Child welfare           2 (14.3%)    11 (78.6%)                  1 (7.1%)        14 (100%)
Health/Mental health    20 (74.1%)   5 (18.5%)                   2 (7.4%)        27 (100%)
School social work      2 (33.3%)    2 (33.3%)                   2 (33.3%)       6 (100%)
Substance abuse         13 (27.7%)   29 (61.7%)                  5 (10.6%)       47 (100%)
Total                   46 (33.1%)   76 (54.7%)                  17 (12.2%)      139 (100%)
social work research. The remaining SEM articles reported on studies using path analysis (12.2%). Path analysis is useful for examining simultaneous regression equations among observed variables but does not exploit fully the advantages of SEM. In addition, it is possible (albeit more difficult) to obtain many of the results of a path analysis with more conventional analyses and software. Therefore, it makes sense that fewer social work articles used path analysis than the two SEM procedures with latent variables. Across substantive areas, the proportion of studies using different types of SEM varied, with general structural models more common in the fields of child welfare, aging, and substance abuse. CFA was the most common type of analysis used in SEM studies of health and mental health. The Guo and Lee (2007) study indicated that SEM was being used by researchers in many major topical areas of social work research. It is hoped that by the end of this book, readers will agree that SEM is the most appropriate analysis tool for much of the research done by social researchers.
2
Structural Equation Modeling Concepts
In this chapter we discuss in detail a number of theoretical and statistical concepts and principles that are central to SEM. SEM notation and equations are introduced in the context of more familiar graphics and terminology. The role of matrices in SEM analyses is explained. The material in this chapter is essential to understanding the more detailed treatment of topics in later chapters, but later chapters also reinforce and help illustrate concepts introduced here. Iacobucci (2009) also provides a complementary and instructive summary of SEM notation and its relationship to the matrices. For more in-depth information on basic statistical concepts, refer to a social science statistics text (e.g., Cohen & Cohen, 1983; Pagano, 1994; Rosenthal, 2001). More advanced treatment of the statistical foundations of SEM can be found in Bollen (1989), Long (1983), and Kaplan (2009), and among other SEM texts in the reference list.
LATENT VERSUS OBSERVED VARIABLES Latent variable is a central concept in SEM. Latent variables are measures of hidden or unobserved phenomena and theoretical constructs. In social
work, latent variables represent complex social and psychological phenomena, such as attitudes, social relationships, or emotions, which are best measured with multiple observed items. Many terms for latent variables are encountered in the SEM literature, for example, factors, constructs, measures, or dimensions. In contrast, observed variables are variables that exist in a database or spreadsheet. They are variables whose raw scores for sample members can be seen, or observed, in a dataset. Observed variables may comprise scores from survey items or interview questions, or they may have been computed from other variables (e.g., a dichotomous income variable obtained by categorizing a continuous measure of income). Individual observed variables may be called items, indicators, manifest items, variables, questionnaire items, measures, or other terms in different sources. The observed items that measure latent variables may collectively be called a scale, subscale, instrument, measure, questionnaire, etc. The use of terms is not always consistent. The main point, however, is that observed variables come from raw data in data files. We’ll see later that the actual input data for SEM is usually the covariance matrix derived from a set of indicators. We follow Bollen (1989) in making a critical distinction between the terms scale and index. Note that this distinction is not made consistently in the literature! The latent variable modeling that is the subject of this book specifically involves scales, which in our conceptualization, are used to measure unobserved phenomena that “cause” scores on multiple, correlated indicators (Bollen). An underlying workplace climate will “cause” employees to respond in a generally negative or positive way to a set of indicators on a workplace support scale. In contrast, indicators of indices “cause” scores on the index and are not necessarily highly correlated. Checking off items on a list (index or inventory) of life stressors, for example, might lead to an individual’s high score on the index, but experiencing the “death of a close family member,” “trouble with boss,” or “pregnancy” are not necessarily or on average correlated or “caused” by some underlying phenomenon (Holmes & Rahe, 1967). Scores on indices are not driven by latent phenomena so are not of interest here. The distinction made between latent and observed variables represents a fundamental difference between SEM and conventional regression modeling. In the SEM framework, latent variables are of interest but cannot be directly measured. Observed variables are modeled as functions of model-specific latent constructs and latent measurement errors.
In this framework, researchers are able to isolate “true” causes of scores and variations in scores due to irrelevant causes. Tests of relationships among the resulting latent variables are therefore superior to tests among variables containing irrelevant variance (i.e., error variance). As we have described, latent variables are measured indirectly through multiple observed variables. Researchers (Glisson, Hemmelgarn, & Post, 2002), for example, examined the quality of a 48-item instrument called the Shortform Assessment for Children (SAC) as a measure of “overall mental health and psychosocial functioning” (p. 82). The instrument includes 48 items, 24 of which are hypothesized to represent an internalizing dimension or factor, and 24 of which represent an externalizing dimension of mental health and psychosocial functioning. The internalizing items relate to affect, psychosomatic complaints, and social engagement. In this example, internalizing behavior is a latent (hidden, unobservable) phenomenon with a continuum of values. Each person is believed to have a “true” but unknowable score on a continuum of internalizing behavior. This internal personal “truth” is believed to largely determine each person’s scores on the set of direct questions about emotion, psychosomatic complaints, and social engagement. Observed scores derived from responses to the instrument’s questions are expected to be correlated with each other because they are all caused by each respondent’s true, unobservable internalizing status. Similarly, in the study by Bride et al. (2004), social workers’ differing experiences with the latent phenomenon “indirect trauma” were expected to influence their responses to the 17 items on the STSS. Scores on the items are expected to be correlated with each other and with the latent variable because they are “caused” by the same experience. If a worker’s exposure to indirect trauma has been low, responses to all 17 items are expected to reflect that level of exposure. Overall and in general, if a worker’s exposure to indirect trauma is high, his or her scores on all items should reflect that reality. Latent constructs also apply to characteristics of organizations. In a study of turnover among employees of child welfare agencies, for example, researchers (McGowan, Auerbach, & Strolin-Goltzman, 2009) describe constructs such as “clarity and coherence of practice,” “technology, training, and record keeping,” and “job supports and relationships.” In another study using SEM, Jang (2009) also used measures of workplace characteristics, for example, “perceived supervisory support,” and “perceived workplace support.” The assumption behind such measures
is that some true but unobservable characteristic of an organization will systematically affect the responses of individuals within the organization to questions related to those characteristics. In the SEM framework, the presence and nature of a latent variable such as “indirect trauma exposure” or “perceived workplace support” is inferred from relationships (correlations or covariances) among the scores for observed variables chosen to measure it. Specifically, one starts with known information—e.g., a covariance between two observed variables— and applies statistical principles to estimate the relationship of each indicator to the hypothesized latent variable. If we hypothesize the existence of the latent variable “ability,” shown in Figure 2.1, for example, and we know from the questionnaire responses of 200 subjects that the correlation between items Q1 and Q2 is 0.64, we know (from measurement theory) that the product of the standardized paths from “ability” to Q1 and Q2 equals 0.64 (DeVellis, 2003). If we assumed that the two observed variables are influenced equally by the latent variable “ability,” we would know that the path coefficients were both 0.80 (because 0.80 × 0.80 = 0.64). Squaring the path coefficients also indicates the
[Figure: path diagram of the latent variable "Ability" with paths (Path1 and Path2) to the observed variables Q1 and Q2 and error terms d1 and d2 pointing to Q1 and Q2, respectively.]
Where: "Ability" is a latent variable measured by Q1 and Q2; Q1 and Q2 are observed variables (items on a questionnaire); d1 and d2 are error terms.
Given: Correlation r between Q1 and Q2 = 0.64; the influence of "ability" on Q1 and Q2 is the same.
Results: Both path coefficients must be 0.80 (product of the paths = 0.64). 64% of Q1 is explained by "ability" (0.80²). 64% of Q2 is explained by "ability" (0.80²). 36% of Q1 must be explained by d1 (100% minus 64%). 36% of Q2 must be explained by d2 (100% minus 64%).
Figure 2.1 Calculating the Relationships of Observed Variables to a Latent Variable.
amount of variance of each indicator explained by the latent variable— 64% in example in Figure 2.1. Because the explained and unexplained variance of a variable must equal 100%, we also know how much of the variance of each indicator is error (unexplained variance) (d1 or d2; 36% in the example). The variance of the error term is the difference between 100% and the amount of variance explained by “ability” (Long, 1983). In other words, 36% of the variance of Q1 is error variance, or variance that is unrelated to the construct of interest, “ability.” Given the correlation between Q1 and Q2 and the magnitude of the relationship between the unobserved construct “ability” and observed scores on Q1 and Q2, it is possible to estimate scores for subjects on the new latent variable “ability” and the variance of those scores. This illustration is simplified, but the process of working “backward” from known relationships (usually covariances among observed variables) to estimates of unknown parameters is a central notion in SEM. In this discussion, we have illustrated an important property of SEM, that is, the product of the standardized path coefficients (i.e., 0.80 and 0.80) from one latent variable to two observed variables equals the correlation (i.e., 0.64) of the observed variables. In Box 2.1, we provide a proof of the property, which was developed by Spearman in 1904, marking the birth of SEM. In any SEM, researchers have observed data, such as a known correlation of 0.64. The known (or observed) data are used to estimate path coefficients, such as the two coefficients reflecting the net influence of “ability” on Q1 and Q2. Of course, the estimation becomes more complicated when there are multiple correlations or covariances as input data, latent variable effects are not assumed to be the same on all indicators, there are more than two indicators of a latent variable, and so on. In more complicated models, in fact, more than one solution is possible—more than one set of parameters might satisfy the multiple equations defining the model. An important component of the analysis therefore becomes determining which solution is the best. We will examine that issue more thoroughly shortly.
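The arithmetic of this simplified example can be restated in a few lines of Python. This is only a sketch of the "working backward" logic under the stated assumption that "ability" influences Q1 and Q2 equally; it is not a substitute for the estimation procedures described later.

```python
import math

r_q1_q2 = 0.64                  # known: the observed correlation between Q1 and Q2

# Assumption from Figure 2.1: "ability" influences Q1 and Q2 equally, so the two
# standardized paths are equal and their product must reproduce the correlation.
loading = math.sqrt(r_q1_q2)            # 0.80

explained = loading ** 2                # variance of each item explained by "ability"
error_variance = 1 - explained          # variance left to the error term (d1 or d2)

print(round(loading, 2), round(explained, 2), round(error_variance, 2))  # 0.8 0.64 0.36
```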
PARTS OF A MEASUREMENT MODEL We will now look more closely at the statistical and conceptual foundations of a measurement model building on the terms introduced in the
Box 2.1 Proof of an SEM Property and a First Peek at SEM Notation
In Spearman's original work, he claimed that observed intercorrelations among scores on tests of different types of mental ability could be accounted for by a general underlying ability factor. Using our current example, we can imagine that the general ability factor affecting all test scores is the latent variable "ability." Scores on Q1 and Q2 in this example represent observed scores on two mental ability subtests. Variance in Q1 and Q2 that is not explained by "ability" is captured in d1 and d2, respectively. Denoting the two path coefficients (now called factor loadings) as λ1 and λ2 (lambda 1 and lambda 2), Spearman proved that the observed correlation between Q1 and Q2 (i.e., ρ12) equals the product of the two factor loadings λ1 and λ2, or ρ12 = λ1λ2, or 0.64 = 0.80 × 0.80. To prove this, we first express our model of Figure 2.1 in the following equations:

Q1 = λ1Ability + d1
Q2 = λ2Ability + d2.

Assuming we work with standardized scores for all variables, then the correlation ρ12 is simply the covariance of Q1 and Q2, or ρ12 = Cov(Q1, Q2). Using the algebra of expectations, we can further write

Cov(Q1, Q2) = E[(λ1Ability + d1)(λ2Ability + d2)]
= E[λ1λ2Ability² + λ1Ability·d2 + λ2Ability·d1 + d1d2]
= λ1λ2E(Ability²) + λ1E(Ability·d2) + λ2E(Ability·d1) + E(d1d2).

Because E(Ability·d2) = 0 and E(Ability·d1) = 0 (because there is no correlation between the common factor Ability and each error), and E(d1d2) = 0 (because the two measurement errors are not correlated), the equation becomes ρ12 = λ1λ2E(Ability²). Because E(Ability²) is Variance(Ability) and equals 1 (because Ability is a standardized score), then ρ12 = λ1λ2. That is, the observed correlation between two variables is a product of two path coefficients.
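A quick simulation offers an informal check on the property proved in Box 2.1. The Python sketch below assumes the two-indicator model of Figure 2.1 with both loadings set to 0.80; with a large simulated sample, the correlation between the generated Q1 and Q2 scores approaches the product of the loadings.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000                        # large sample so the estimate is near the population value
lam1 = lam2 = 0.80                 # factor loadings from the Figure 2.1 example

ability = rng.standard_normal(n)   # standardized latent scores (mean 0, variance 1)

# Scale the errors so each indicator has total variance 1 (1 - 0.80**2 = 0.36).
d1 = rng.standard_normal(n) * np.sqrt(1 - lam1 ** 2)
d2 = rng.standard_normal(n) * np.sqrt(1 - lam2 ** 2)

q1 = lam1 * ability + d1
q2 = lam2 * ability + d2

print(np.corrcoef(q1, q2)[0, 1])   # approximately 0.64 = lam1 * lam2
```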
previous section. In this section, and throughout the rest of the book, we will employ the common practice of using Greek notation to refer to specific elements in the models presented. For example, using Greek notation, error terms are indicated by δ (delta), rather than the “d” used in Figure 2.1. Readers are encouraged to refer to the guide to Greek notation provided in the Appendix 1 for an explanation of all symbols used. The notation for SEM equations, illustrations, and matrices varies across sources. We present one set of notations that we believe minimizes confusion across measurement and structural examples, but readers should be aware that they will encounter other notation protocols in other sources. Figure 2.2 presents a simple CFA model using common symbols. The model has three latent variables: Risk1, Risk2, and Behavior. Latent variables are indicated by circles or ovals. Because they are latent, by definition the three variables do not exist in a dataset. They are hidden, unobservable, theoretical variables. In the model, each is hypothesized to have three indicators. Risk1 represents some risk phenomenon that influences (hence the one-way arrows) the scores individuals in the database have on three observed variables, x1, x2 and x3. Often latent variables have more than three indicators, especially when they represent complex phenomena assessed with many scale items, or items on a questionnaire. For example, 25 items assessing feelings of happiness, loneliness, and sadness make up the Generalized Contentment Scale available at the WALMYR website (http://www.walmyr.com/). It is also possible to have
[Figure: path diagram of a measurement model with three latent variables connected by double-headed arrows: Risk1 (ξ1) measured by x1, x2, and x3 (loadings λ11, λ21, λ31; error terms δ1–δ3); Risk2 (ξ2) measured by x4, x5, and x6 (loadings λ42, λ52, λ62; error terms δ4–δ6); and Behavior (ξ3) measured by x7, x8, and x9 (loadings λ73, λ83, λ93; error terms δ7–δ9).]
Figure 2.2 Measurement Model.
Box 2.2 Components of Factor Structure
The factor structure of a set of variables includes:
- the number of factors
- the number of observed items
- the pattern and magnitude of loadings of items on factors
- the correlations among the factors
- correlations among error terms
a latent variable with only two indicators, but it is best to have a minimum of three (later in this chapter, we will examine the reasons for this in more detail). Characteristics of a measurement model represent its factor structure. See Box 2.2. The common symbol for an observed variable in an SEM diagram (including CFA models) is a square or rectangle. In Figure 2.2, x1, x2 and x3 are three questionnaire items. Responses from the questionnaire items have been entered into a database for analysis. The values may be numbers corresponding to respondents’ answers to survey questions, or items on a rating scale, or values coded from administrative, observational, or interview data. Observed variables may also be recoded variables or composites based on other observed variables. Like the Risk1 variable, Risk2 and Behavior are latent variables that are hypothesized to “cause” the observed values of other questionnaire items (x4 through x9). It may seem inaccurate to call Behavior a latent variable. Aren’t behaviors observable? Many latent variables include items related to observable behaviors, such as hyperactivity or impulsivity as manifestations of an underlying attention disorder, or sleeplessness as a manifestation of depression. Even such observable phenomena are often more accurately measured with multiple items. In the latent variable framework, both measurement error and model-specific error can be removed from the observed indicators, leaving higher quality measures for use in structural analyses. The relationships among the latent and observed variables in Figure 2.2 can also be expressed in equations that are similar to regression equations. The equations relating latent variable Risk1 (ξ1, pronounced ksee) to x1, x2 and x3 are
x1 = λ11ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3

(Long, 1983). The equations state that the score for an individual on any one observed variable (x1, x2, x3) is the individual's score on the latent variable times the factor loading λ (lambda) of the observed variable on the latent variable, plus an error term δ (delta). Note that the first subscript for a path coefficient (λ in these examples) refers to the dependent variable in the equation—the variable to which an arrow is pointing in the figure, or the variable on the left side of the equation. The second subscript refers to the subscript of the independent variable. The relationship between a latent factor (ξ) and one of its indicators is similar to the regression relationship between a predictor, or independent variable, and a dependent variable. The similarity reflects the fact that scores on the indicator are "caused" by the latent variable. A critical difference, however, is that in factor analysis, the predictor variable is unobserved, theoretical, or latent. Without observed data in the dataset on the predictor, estimating its effects on observed variables requires a different process than conventional regression analysis (Long, 1983). It involves the use of matrix algebra and maximum likelihood estimation, which will be discussed later. Still, the factor loading λ that is obtained as an estimate of the strength of the effect of the latent variable (the independent variable) on an indicator (dependent variable) is interpreted the same as a regression coefficient—that is, a 1-unit change in the latent variable is associated with a change of magnitude λ in the observed dependent variable (Long). If variables are standardized, λ "is the expected shift in standard deviation units of the dependent variable that is due to a one standard deviation shift in the independent variable" (Bollen, 1989, p. 349). Another difference between the latent variable equation and standard regression equations is the lack of an intercept. Observed variables in SEM are treated as deviations (or differences) from their means; in other words, instead of using the raw scores that appear in a dataset, SEM software "centers" variables by subtracting the mean from each score. This transformation has no effect on the variances and covariances of
variables (the input data for SEM model tests) but allows important simplifications of the equations used to estimate models. Some of these simplifications were evident in the proof presented in Box 2.1. For further explanation, see Long (1983, pp. 22–23). In Figure 2.2 rectangles representing observed variables associated with latent variables have a second arrow pointing to them, coming from smaller latent variables (whose names start with delta “δ”). The second arrow suggests that scores on the observed variable are influenced by something other than the latent variable of interest. This “something other” is a combination of omitted effects, primarily measurement errors. It includes traditional measurement error and a new kind of error that is unique to latent variable models. Traditional measurement error refers to differences between an individual’s “true” (unknowable) score for an indicator and the actual observed score obtained for the individual. Differences between “true” scores and obtained scores are assumed to be due to random error. Random error is unpredictable—as when a child makes a picture by filling in the response ovals on a questionnaire, or when respondents become fatigued and stop reading items carefully. In measurement models with latent variables, a second source of measurement error is grouped with random error and partitioned out of the latent variable variance. The second type of error is variation in indicator scores that is not caused by the latent variable(s) modeled in the measurement model, but by other unobserved factors not relevant to the current model. It may include systematic measurement error, which is predictable—as when a regional difference in vocabulary causes all respondents in one region to interpret a question in the same predictable but wrong way. Or it may include legitimate but unwanted variation for the current model. An example is provided below. Measurement error terms in SEM represent variance in an observed indicator that is due to random and systematic error specific to the indicator. The latent error variables are also called residual variances or unique factors. They are “residual,” or “left over,” because they contain all variance of the observed variables that is not explained by the latent factors of interest regardless of the source of the variance. They are “unique” because each error term represents variance that is unique or specific to an observed variable, unlike the latent factors (also called common factors) which explain variance in multiple observed variables (i.e., variance that is common to multiple indicators).
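Because the common factors account for the covariation among indicators while the unique factors add only to each indicator's own variance, a factor model implies a specific covariance matrix for the observed variables, and it is this implied matrix that is compared with the observed covariance matrix during estimation. The Python sketch below assembles the implied matrix for a hypothetical one-factor model with three standardized indicators; the loading values are illustrative only.

```python
import numpy as np

# Hypothetical loadings of three standardized indicators on one latent factor.
Lambda = np.array([[0.80],
                   [0.70],
                   [0.60]])

Phi = np.array([[1.0]])                  # variance of the standardized latent factor
Theta = np.diag(1 - Lambda[:, 0] ** 2)   # unique (error) variances: 0.36, 0.51, 0.64

# Model-implied covariance matrix of the indicators: loadings times factor variance
# times loadings transposed, plus the diagonal matrix of unique variances.
Sigma = Lambda @ Phi @ Lambda.T + Theta
print(np.round(Sigma, 2))
# Off-diagonal entries are products of loadings (e.g., 0.80 * 0.70 = 0.56), echoing
# the Spearman result in Box 2.1; each diagonal entry is 1 because the common and
# unique variances of a standardized indicator sum to its total variance.
```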
As an example of a unique factor, imagine a latent model of depression (see Figure 2.3). Consistent with the American Psychiatric Association’s definition of a major depressive episode (American Psychiatric Association, 1994), the model includes cognitive, affective, and physical indicators of depression, each of which is measured with a certain amount of systematic and random error. One hypothetical cognitive indicator, “how often in the past 2 weeks have you had trouble concentrating,” is a valid indicator of the cognitive dimension of depression. We can imagine that a small amount of its variance (let’s say 5%) is due to random error due to the unpredictable responses of patients who do not understand the word “concentrating.” We might also imagine that an additional amount of variance (e.g., 12%) in the indicator is due to a latent anxiety phenomenon; the item is also reliable and valid indicator of anxiety. Individuals who are not depressed but who have anxiety respond predictably to the item, even though their anxiety-driven responses are not related to the construct of interest. Because our model does not include a latent anxiety variable, variance in the Concentrate variable that is exclusively caused by different levels of anxiety in respondents is treated as error in our depression model. SEM output provides estimates of the variances of the error terms for latent variable indicators and indicates if they are statistically significantly different from 0. Error variance is a summary measure of how much the error terms for a sample on a predicted variable differ from the mean of those scores, which is assumed to be 0. Larger error variances indicate that observed items are not well explained by latent variables or may not be good measures of those latent variables. Double-headed arrows in SEM models represent hypothesized correlational relationships—relationships in which neither variable is considered independent or dependent. Such relationships are sometimes called “unanalyzed” relationships. In Figure 2.2, there are double-headed arrows between pairs of latent factors. When more than one latent construct, or factor, is included in a measurement model in SEM, the factors are usually allowed to be correlated with one another. In traditional regression, correlations among independent variables, although common, are not desirable because they complicate the interpretation of regression coefficients. Therefore, another advantage that SEM has over conventional regression is that the correlations among independent variables can be modeled and estimated.
Error term d1 represents variance of the "concentrate" indicator that is not correlated with other indicators or with the latent variable Depression. d1 = e + s, where e is random error and s is systematic error specific to the indicator (Gerbing & Anderson, 1984). If d1 = 0.05 + 0.12 (17% of the variance of Concentrate), 83% of the variance of Concentrate is explained by Depression (Gerbing & Anderson, 1984). If the latent variable of interest were Anxiety, variance in Concentrate associated with Depression would be part of s in the error term instead.
[Figure 2.3 is a path diagram in which the latent variable Depression points to the observed indicator Concentrate, which also receives an arrow from its error term d1.]
Figure 2.3 A Closer Look at Measurement Error Variance Partitioning.
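The arithmetic behind Figure 2.3 can be written out in a few lines. The sketch below is a minimal illustration (not part of the original figure) that assumes the Concentrate indicator's total variance has been standardized to 1.0 and uses the hypothetical 5% and 12% values from the text:

```python
# Variance partition for the Concentrate indicator (hypothetical values from the text).
random_error = 0.05        # e: random measurement error
systematic_error = 0.12    # s: predictable variance due to the unmodeled anxiety factor

d1 = random_error + systematic_error     # unique (error) variance of the indicator
explained_by_depression = 1.0 - d1       # variance attributable to the Depression factor

print(round(d1, 2))                        # 0.17
print(round(explained_by_depression, 2))   # 0.83
```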
In summary, measurement models include latent factors and the correlations among them, observed indicators of those factors, and error terms for observed variables. Chapter 4 describes in more detail how to specify confirmatory factor models and interpret their results.
PARTS OF A STRUCTURAL MODEL

Whereas measurement models are concerned with how latent constructs are measured, structural models are concerned with the directional relationships among latent variables or factors, once their measurement qualities have been established. Structural models in SEM are like standard regression models, except that the independent variables, dependent variables, or both are latent factors measured with observed indicators. For example, using the three dimensions of indirect trauma established in the Bride et al. (2004) study discussed earlier, a hypothetical structural model might test the hypothesis that levels of stress affect an observed measure of annual number of sick days taken, controlling for gender and preexisting health condition. The focus in structural models is on testing the strength and direction of substantive relationships among variables with implications for theory, practice, or policy.

A major advantage of latent-variable models is that estimates of the relationships among latent variables are based only on variation in the observed indicators that is related to the latent variables. If the latent Depression variable in Figure 2.3 were used as a predictor of another variable, Parenting for example, the part of Concentrate associated with anxiety (and not depression) would not be included in the calculation of Depression's effect on Parenting. Variance in Concentrate that is associated with underlying anxiety would be contained in the error variance for the Concentrate indicator. The estimate obtained for the relationship of Depression to Parenting would be based only on the theoretically error-free variance of Depression.

Figure 2.4 presents a structural model based on Figure 2.2. Although the latent variables and their relationships to indicator variables are still present, the structural model has components that are different from the measurement model. First, there are both single-headed and double-headed arrows among the three latent variables in the model. Single-headed arrows between two latent variables indicate a hypothesized
[Figure 2.4 is a path diagram: the latent variables Risk1 (ξ1, indicators x1–x3) and Risk2 (ξ2, indicators x4–x6) and the observed variable Gender (ξ3) have single-headed arrows (γ11, γ12, γ13) pointing to the latent variable Behavior (η1, indicators y1–y3, structural error ζ1); Risk1 and Risk2 are connected by a double-headed arrow.]
Figure 2.4 General Structural Model 1: Direct Effects Only.
directional relationship. Figure 2.4 hypothesizes that the two latent risk variables are statistically predictive of the behavioral outcome. Behavior is being regressed on Risk1 and Risk2. Estimates of the effects of the two risk variables on Behavior are denoted with the symbol γ (gamma). Note that when a latent variable, such as Behavior, is a dependent variable in a structural model or equation, the notation used is η (eta) instead of ξ, which was used in the measurement model. This is because in SEM, variables are either exogenous, meaning they are not explained or predicted by any other variables in the model; or they are endogenous, meaning they are explained or predicted by one or more other variables. Every latent and observed variable in an SEM model is either exogenous or endogenous. Endogenous variables serve as dependent variables in at least one equation represented in a model.

In our simple structural model of risk and behavior, for example, Risk1, Risk2, Gender, and all of our error terms are exogenous; they have no single-headed arrows pointing to them. Behavior and all of the variables representing our questionnaire items (x1 to x9) are endogenous; they have at least one single-headed arrow pointing to them. Risk1 and Risk2 are connected by a double-headed arrow. It is important to remember that because a double-headed arrow symbolizes a correlation, not a directional relationship, the two risk variables are considered exogenous.

Note that the distinction between exogenous and endogenous variables is model specific. Exogenous variables in one study may be
endogenous variables in another study, or vice versa. Neighborhood cohesiveness might be an exogenous predictor of the success of community organizing efforts in one model, for example, but could be a dependent (endogenous) variable predicted by community organizing efforts in another.

Note also that to avoid confusion between λ's associated with exogenous (ξ) and endogenous (η) variables with the same subscripts, we follow notation used by Bollen (1989) for models containing both measurement and structural components. Instead of two subscripts indicating the observed variable number and latent variable number, respectively, λ's are simply numbered consecutively with one subscript throughout the model. The SEM equation for regressing Behavior (η) on the two risk variables (ξ1, ξ2) is

η1 = γ11ξ1 + γ12ξ2 + ζ1
The equation states that the score for an individual on the latent behavior variable (η1) is predicted by the individual's score on the Risk1 latent variable (ξ1) times the regression coefficient γ11 (gamma) plus the individual's score on the Risk2 latent variable (ξ2) times the regression coefficient γ12 plus the error term ζ1 (zeta). ζ is structural error—the variance of Behavior that is unexplained by its predictor variables. Structural error can also be thought of as the error of prediction because, as in all regression analyses, variance in a dependent variable (e.g., the endogenous Behavior variable) is likely to be influenced, or predicted, by factors other than the variables included in a model. In other words, we would not expect the risk and gender variables to predict Behavior perfectly. Box 2.3 explains the difference between this type of error and the measurement error we discussed in the previous section. Like the measurement model equations, the equations predicting latent variable scores are similar to regression equations, but with different notation and no intercepts. Latent variable scores are also treated as deviations from their means.

There is an additional observed variable in Figure 2.4: Gender. We know it is an observed variable because it is represented with a rectangle. Unlike the other rectangles in the figure, however, it does not appear to be an indicator of a latent variable. The arrow between Gender and Behavior points toward the latent variable. The scores individuals have
Box 2-3 Two Types of Error in SEM

In the discussion of measurement models starting on p. 20, we defined measurement error as "unique" and "residual" variation in scores of observed indicators that were not associated with the hypothesized factor model. An additional type of error is relevant to structural models and should not be confused with measurement error. SEM structural models, like other regression models, include structural errors. The structural error for any dependent variable in a structural model is the variance of the variable that is not explained by its predictor variables. Although the latent risk and behavior variables in Figure 2.4 are theoretically free of measurement error, we do not expect the risk and gender variables to predict Behavior perfectly. In other words, we do not expect 100% of the variance of Behavior to be explained by the two risk variables and gender. In a general structural model, any variable that is regressed on others in the model has an error term representing the structural error (this error can also be thought of as the "error of prediction"). The latent variable ζ1 represents the error in our structural model—the variation in behavior scores that is not explained by Risk1, Risk2, and Gender.
on the Gender variable are not caused by the underlying Behavior tendency of the individuals. Instead, the arrow represents a hypothesized regression, or structural, relationship. Gender is being used as a control variable or covariate in the model. With the same diagram, we could also call Gender another independent variable. By calling gender a control variable in this example, we are indicating that we are most interested in the effects of Risk1 and Risk2 on Behavior after removing the effects of gender on Behavior, that is, the effects of the two independent variables on variation in Behavior left over after the effects of gender have been accounted for.

Based on Figure 2.4, the complete regression equation for Behavior needs to include Gender (ξ3). In this example, Gender is a tenth observed variable that affects the dependent variable and is not itself predicted by any other variable in the model:

η1 = γ11ξ1 + γ12ξ2 + γ13ξ3 + ζ1

In this equation, Gender (ξ3) has been added as the third predictor of Behavior (η1). γ13 is the regression coefficient representing the effect of Gender on Behavior scores. Including the gender variable in Figure 2.4 illustrates how structural models in SEM can include a combination of latent and observed independent (and dependent) variables.
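As a minimal numerical sketch of this equation, the values below are invented for illustration only; in an actual SEM analysis the γ's are estimated from data and the latent scores are not directly observed:

```python
import numpy as np

# Hypothetical parameter values and case scores for
#   eta1 = gamma11*xi1 + gamma12*xi2 + gamma13*xi3 + zeta1
gamma = np.array([0.40, 0.25, -0.10])   # effects of Risk1, Risk2, and Gender on Behavior
xi = np.array([1.2, -0.5, 1.0])         # one case's predictor scores, as deviations from their means
zeta1 = 0.15                            # that case's structural error

eta1 = gamma @ xi + zeta1               # 0.40*1.2 + 0.25*(-0.5) + (-0.10)*1.0 + 0.15
print(round(eta1, 3))                   # 0.405
```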
The absence of double-headed arrows between Gender and the risk variables signifies that the correlations between Gender and risk are expected to be 0. It is important to remember that any possible path between two variables in an SEM model that is not explicitly pictured represents the hypothesis that the value of the directional or nondirectional relationship is 0. In the current example, Risk1 and Risk2 might be latent measures of neighborhood disorganization and danger, which we would not expect to be correlated with gender.

The equations for indicators of Behavior in the measurement part of the structural model also change from those used for the measurement-only model in Figure 2.2. The indicators of latent variables, like Behavior, that serve as dependent variables in a model are now noted as y variables (instead of x), and their error terms are noted with ε (epsilon, instead of δ). In addition, as stated earlier, the latent variable is now notated with η (instead of ξ):

y1 = λ7η1 + ε1
y2 = λ8η1 + ε2
y3 = λ9η1 + ε3.

All endogenous variables in a model are predicted (imperfectly) by one or more other variables. Therefore, they all have associated error terms. If the predicted variables are observed indicators of the measurement part of a model, the error terms represent measurement error. If the variables are substantive dependent variables (either latent or observed) being regressed on predictors, the error terms represent structural errors.

Figure 2.5 presents a slightly different structural model. Risk1, Risk2, δ1 through δ6, ε1 through ε4, and ζ1 and ζ2 are exogenous variables. Behavior, Parenting, Par10 (y4), x1 through x6, and y1 through y3 are endogenous variables. There are two structural errors, ζ1 and ζ2, and 10 measurement errors, δ1 through δ6 and ε1 through ε4. In Figure 2.5, Parenting is a new latent variable with one indicator (Par10). We can imagine that the Parenting variable is an observed composite—the sum of responses to 10 items on a parenting scale. Modeled as it is, Parenting is a second endogenous latent variable whose value is equal to its one observed indicator, which may or may not be
[Figure 2.5 is a path diagram: Risk1 (ξ1, indicators x1–x3) has a direct path (γ11) to Behavior (η1, indicators y1–y3); Risk2 (ξ2, indicators x4–x6) has a path (γ22) to Parenting (η2, single indicator Par10/y4), which in turn has a path (β12) to Behavior; ζ1 and ζ2 are the structural error terms.]
Figure 2.5 General Structural Equation Model 2: Direct and Indirect Effects.
modeled as having a positive error variance. We could fix the error term of Par10 to 0 if we believe it is a perfect measure of parenting (an unlikely claim), fix it to a value between 0 and 1 if its reliability is known from previous studies, or seek an estimate of the variance of ε4 in the current SEM analysis. This modeling technique demonstrates one way to include an observed variable of substantive interest in a latent variable model. (The modeling of Gender in Figure 2.4 illustrated another.)

In Figure 2.5, Parenting mediates the effects of Risk2 on Behavior. If Risk2 is a latent variable assessing neighborhood danger, for example, we could hypothesize that danger affects children's behavior indirectly by influencing parents' monitoring of their children's activities. Parenting serves both as a dependent variable, because it is predicted by Risk2, and as an independent variable, because it predicts Behavior. The addition of Parenting as an endogenous variable necessitates a new equation for the specification of the structural model, and a change in the equation for predicting Behavior. The predictive equation for Parenting is

η2 = γ22ξ2 + ζ2.

Behavior (η1) is now predicted directly by the exogenous variable Risk1 (ξ1, with a γ path), and by the endogenous Parenting variable with a β (beta) path. Because there is no direct path from Risk2 to
Behavior, ξ2 does not appear in the equation predicting Behavior (even though Risk2 has an indirect effect on Behavior):

η1 = β12η2 + γ11ξ1 + ζ1.
In summary, structural models in SEM with latent variables have measurement components and structural components. The structural paths hypothesize substantive relationships among variables. Paths from exogenous (ξ) to endogenous (η) latent variables are γ paths. Paths from endogenous to endogenous latent variables are β paths. Observed indicators of exogenous variables are "x" variables and have error terms labeled δ. Observed indicators of endogenous variables are "y" variables and have measurement error terms labeled ε. Structural errors, or errors of prediction, are designated with the symbol ζ.
TESTING MODELS—AN INTRODUCTION

The inclusion of latent variables in SEM models necessitates an analysis approach that is different from the approach used in regression models with observed variables. If the user specifies a raw dataset for analysis, the SEM program first generates a covariance matrix (in the default situation) from the raw data. It is also possible to provide a covariance matrix without its associated raw data. Either way, the covariance matrix provides the data analyzed in the SEM program. The data are used to estimate the parameters in the model specified by the user. Models, as we'll see later, are specified in Amos through graphics such as those presented in Figures 2.4 and 2.5. In Mplus, the user specifies the model with simple syntax dictating measurement and structural relationships.

After a CFA or general SEM is specified based on the researcher's theoretical model, the next step is to use the observed data (i.e., a covariance or correlation matrix) to estimate the parameters specified. This step is called model estimation. The maximum likelihood (ML) estimator is the most popular approach used in model estimation and is the default method of SEM programs. Additionally, weighted least squares (WLS) is a family of methods that may be especially relevant for social work data. Later, we will describe some of the options available and how to choose among them.
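The covariance matrix that an SEM program constructs from raw data is an ordinary sample covariance matrix. The minimal sketch below (with a tiny invented dataset, not from the examples in this chapter) shows the same computation outside an SEM program:

```python
import numpy as np

# A tiny invented raw dataset: rows are cases, columns are three observed items.
raw = np.array([[3.0, 2.0, 4.0],
                [4.0, 4.0, 5.0],
                [2.0, 1.0, 2.0],
                [5.0, 4.0, 4.0],
                [3.0, 3.0, 3.0]])

# The sample covariance matrix that would serve as the input data for the SEM analysis.
S = np.cov(raw, rowvar=False)
print(S.round(3))
```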
After data, a model, and an estimation procedure have been selected, the SEM program iteratively generates estimates for parameters in the model, which means the program continues to make and refine estimates that are consistent with the input covariance matrix until no improvements can be made. In a measurement model, the parameters to be estimated are the factor loadings, latent variable variances and covariances, and measurement error terms. In a general structural model, estimates of regression paths among latent variables and structural error variances are also generated. A simplified version of how the estimation process occurs was presented in the discussion of Figure 2.1. In reality, most models contain many parameters to be estimated, so the program must attempt simultaneously to find estimates consistent with numerous criteria, not just one initial covariance.

What does it mean to say "no improvements" can be made in a model? The determination of what are the best obtainable estimates is based on the minimization of a function that the SEM program uses to compare the original covariance matrix of the observed variables and a new covariance matrix that is implied by the specified model and the estimates generated in the estimation procedure. The new matrix is generated taking into account the constraints imposed by the model specified by the user. For example, in Figure 2.5, a moderate to strong covariance between observed variables x1 and x2 is suggested by their common relationship to Risk1. In contrast, the model suggests that the covariance between x1 and y1 is much smaller and occurs only through the relationship of each observed variable with its latent variable. The goal is to obtain an implied matrix that is as close to the original covariance matrix as possible. The minimization function basically assesses how close each element in the original covariance matrix is to its corresponding element in the implied covariance matrix generated by each set of estimates tried. We will return to this concept frequently because it is so key to understanding SEM.

Before we can go much further with this discussion of testing structural equation models, we need to examine the numerous roles that matrices play in SEM.
MATRICES IN SEM

A matrix is a set of elements (i.e., numbers, values, or quantities) organized in rows and columns. Most social work researchers are familiar
with matrices. An Excel spreadsheet summarizing incoming students’ test scores, grades, and demographics; a grading grid; and a proposal budget are just some examples of matrices. The simplest matrix is one number, or a scalar. Other simple matrices are vectors, which comprise only a row or column of numbers. Correlation matrices are commonly encountered in social work research. They summarize raw data collected from or about individuals and vary in size based on the number of variables included. Correlation matrices have the same number of rows and columns—one for each variable. Matrices can be multiplied, divided, added, subtracted, inverted, transposed, and otherwise manipulated following rules of matrix algebra. Matrices are used in multiple ways in SEM analyses. Analyses rely, for example, on data in the covariance or correlation matrices that summarize values in a raw dataset. Also, all specified measurement and structural models with latent variables are translated by SEM software into between three and eight matrices (some of which may be vectors or scalars). The matrices are then manipulated based on established proofs from matrix algebra and the algebra of expectations to generate estimates of unknown parameters. Because matrices have known properties and the outcomes of different operations on matrices (e.g., adding or multiplying them together) are known, they provide a shortcut way—that is, a faster, easier, less computationally demanding way—to accomplish the goals of SEM analyses. As stated earlier, matrices are also the basis of the fundamental SEM method of evaluating the quality of a model— comparing the original input matrix to the model-implied matrix of covariances. More about each of these roles of matrices in SEM is presented below. A full explanation of matrix algebra is beyond the scope of this book. Bollen (1989) provides a useful summary for interested readers. Long (1983) discusses matrix algebra as it applies to CFA. In addition to being used by SEM programs to estimate models, matrices are useful tools that researchers use to specify models in great detail. Matrix notation can be used to present and expand upon the information given in SEM equations, such as the equations presented earlier in this chapter. SEM software can be used without in-depth knowledge of matrix algebra, but understanding the basic role of matrices in SEM has practical benefits for preventing misspecification errors, interpreting output, and solving problems reported by software. It also makes users more confident and knowledgeable in the written and oral presentation of their research.
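For readers who want to see these operations concretely, the short sketch below is a minimal illustration with arbitrary numbers (not taken from the chapter's examples), using the numpy library:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # a 2 x 2 matrix
b = np.array([[5.0],
              [6.0]])           # a column vector (a 2 x 1 matrix)

print(A + A)                    # addition, element by element
print(A @ b)                    # matrix multiplication
print(A.T)                      # transpose: rows become columns
print(np.linalg.inv(A))         # inverse, the matrix counterpart of division
```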
Matrices 1: Expanding Equations into Matrix Notation

Measurement Model Equations. The measurement model pictured in Figure 2.2 contains information for the three matrices used to specify and estimate CFA analyses. The equations presented earlier contain the same information as the figure. Recall the following equations:

x1 = λ11ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3.

The equations state that the observed scores of each x in the dataset are predicted by a score on the latent factor (ξ1, Risk1) times a factor loading (λ) plus error (δ). We can add similar equations for the rest of the observed variables, which load on Risk2 (ξ2) and Behavior (ξ3) in the factor model in Figure 2.2:

x4 = λ42ξ2 + δ4
x5 = λ52ξ2 + δ5
x6 = λ62ξ2 + δ6
x7 = λ73ξ3 + δ7
x8 = λ83ξ3 + δ8
x9 = λ93ξ3 + δ9.

All of these relationships can also be compactly expressed in the following equation:

x = Λxξ + δ

where Λ (capital λ) is the matrix of λ's, or factor loadings relating latent variables to observed variables. The equation states more generally
that the vector of values for a variable x in a raw dataset is a product of the variable's factor loading (Λ) on the latent variable (ξ) and the vector of scores for cases on that latent variable, plus a vector of error terms. The matrix format corresponding to both the detailed and compact equations is

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ x_9 \end{bmatrix}
=
\begin{bmatrix}
\lambda_{11} & 0 & 0 \\
\lambda_{21} & 0 & 0 \\
\lambda_{31} & 0 & 0 \\
0 & \lambda_{42} & 0 \\
0 & \lambda_{52} & 0 \\
0 & \lambda_{62} & 0 \\
0 & 0 & \lambda_{73} \\
0 & 0 & \lambda_{83} \\
0 & 0 & \lambda_{93}
\end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{bmatrix}
+
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \\ \delta_7 \\ \delta_8 \\ \delta_9 \end{bmatrix}.
\]
Brackets are used to enclose matrices. Mathematical symbols indicate the operations specified for the matrices. The juxtaposition of the Λ and ξ matrices indicates that they are to be multiplied. The equations predicting observed variables from latent variables can be derived from this matrix expression by progressing across each line and performing the operations. For x1, the three terms in the first row of the Λx matrix are multiplied by the elements in the ξ matrix as follows:

\[
x_1 = \begin{bmatrix} \lambda_{11} & 0 & 0 \end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{bmatrix}
= \lambda_{11}(\xi_1) + 0(\xi_2) + 0(\xi_3) = \lambda_{11}\xi_1 .
\]
Then, the error term δ1 is added, resulting in the equation given earlier: x1 = λ11ξ1 + δ1. In models with endogenous latent variables (e.g., Figure 2.4), the endogenous latent variable equations have the same format but different notation, as indicated earlier:

y1 = λ7η1 + ε1
y2 = λ8η1 + ε2
y3 = λ9η1 + ε3.

These equations can be expanded into matrix notation in the same way as the exogenous latent variable equations.

Structural Model Equations. Figure 2.5 included two endogenous variables, one of which (Parenting, η2) was predicted by an exogenous latent variable (Risk2, ξ2), and one of which (Behavior, η1) was predicted by both the exogenous latent variable (Risk1, ξ1) and the endogenous observed Parenting variable (η2). The equations given earlier for these structural relationships were

η1 = β12η2 + γ11ξ1 + ζ1
η2 = γ22ξ2 + ζ2.

The compact expression for these equations is

η = Bη + Γξ + ζ

where B (capital β) is the matrix of β parameters between endogenous variables, and Γ (capital γ) is the matrix of γ parameters between exogenous and endogenous variables. The matrix format corresponding to both the detailed and compact equations is

\[
\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}
=
\begin{bmatrix} 0 & \beta_{12} \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}
+
\begin{bmatrix} \gamma_{11} & 0 \\ 0 & \gamma_{22} \end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}
+
\begin{bmatrix} \zeta_1 \\ \zeta_2 \end{bmatrix}.
\]
If you carry out the operations, you obtain

η1 = (0)η1 + β12η2 + γ11ξ1 + (0)ξ2 + ζ1,
which reduces to the original equation for η1 above. In summary, one important way that matrices are used in SEM is to convey the elements and operations of equations that define SEM models.
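A minimal numerical sketch can make the matrix expression concrete. The values assigned below to B, Γ, ξ, and ζ are invented for illustration only; solving (I − B)η = Γξ + ζ reproduces the same values as the element-by-element equations in the text:

```python
import numpy as np

# Illustrative (made-up) values for Figure 2.5's structural matrices.
B = np.array([[0.0, 0.30],        # beta12: Parenting -> Behavior
              [0.0, 0.00]])
Gamma = np.array([[0.40, 0.00],   # gamma11: Risk1 -> Behavior
                  [0.00, -0.50]]) # gamma22: Risk2 -> Parenting
xi = np.array([1.0, 2.0])         # one case's scores on Risk1 and Risk2
zeta = np.array([0.10, -0.20])    # structural errors

# eta = B @ eta + Gamma @ xi + zeta  is solved as  (I - B) @ eta = Gamma @ xi + zeta
eta = np.linalg.solve(np.eye(2) - B, Gamma @ xi + zeta)

# Element-by-element check against the equations in the text:
eta2 = Gamma[1, 1] * xi[1] + zeta[1]                    # eta2 = gamma22*xi2 + zeta2
eta1 = B[0, 1] * eta2 + Gamma[0, 0] * xi[0] + zeta[0]   # eta1 = beta12*eta2 + gamma11*xi1 + zeta1
print(eta, np.array([eta1, eta2]))                      # the two versions agree
```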
Matrices 2: Computational Matrices

SEM estimation involves the manipulation of between three (for measurement-only models) and eight matrices (for general structural models). Each of these matrices is described below. The matrices will be discussed in later chapters, so the information in this section should be viewed as reference material, not material that needs to be fully understood at this point.

Measurement-Only Model Matrices. Because all CFA latent variables are exogenous, all observed variables in a CFA model are labeled "x," all latent variables are labeled ξ, and all error terms are labeled δ. (Note, however, that other texts sometimes use x and y notations in measurement models based on the role of the latent variables in later general structural models.) CFA models include a Λ matrix containing factor loadings (λ's) specifying which observed variables load on which factors. This matrix has a row for each observed variable and a column for each hypothesized latent variable. The Λ matrix for our Figure 2.2 example with nine observed variables and three factors would be the following:

\[
\Lambda_x =
\begin{bmatrix}
\lambda_{11} & 0 & 0 \\
\lambda_{21} & 0 & 0 \\
\lambda_{31} & 0 & 0 \\
0 & \lambda_{42} & 0 \\
0 & \lambda_{52} & 0 \\
0 & \lambda_{62} & 0 \\
0 & 0 & \lambda_{73} \\
0 & 0 & \lambda_{83} \\
0 & 0 & \lambda_{93}
\end{bmatrix}.
\]
Although the rows and columns are not labeled, it is understood through the subscripts that the rows correspond to observed x variables 1 through 9, and the columns correspond to latent ξ variables 1, 2, and 3. We noted earlier that the first λ subscript in factor equations referred to the (dependent) indicator variable, and the second referred to the factor. The same rule applies for the Λ matrix entries; the first subscript refers to the number of the indicator variable or row, and the second refers to the number of the factor or column. In Figure 2.2, no observed variable
loaded on more than one factor. Consistent with the figure, the Λx matrix above specifies that one factor loading is to be estimated for each variable and the loadings for the other two factors are to be fixed at 0. In confirmatory factor analysis, it is possible, however, to have variables load on multiple factors. If, for example, observed variable 2 (x2) loaded on factors 1 and 3 (ξ1, ξ3), and variable 6 (x6) loaded on factors 1 and 2 (ξ1, ξ2), the matrix for the model would be

\[
\Lambda_x =
\begin{bmatrix}
\lambda_{11} & 0 & 0 \\
\lambda_{21} & 0 & \lambda_{23} \\
\lambda_{31} & 0 & 0 \\
0 & \lambda_{42} & 0 \\
0 & \lambda_{52} & 0 \\
\lambda_{61} & \lambda_{62} & 0 \\
0 & 0 & \lambda_{73} \\
0 & 0 & \lambda_{83} \\
0 & 0 & \lambda_{93}
\end{bmatrix}.
\]
A second matrix that is used in the analysis of a measurement model is the Φ (capital phi) matrix, containing variances and covariances of the latent variables (φ's, phis). This matrix has one row and one column for each latent variable in a model. The phi matrix for the model in Figure 2.2 with three correlated latent variables, therefore, would look like the following:

\[
\Phi =
\begin{bmatrix}
\phi_{11} & & \\
\phi_{21} & \phi_{22} & \\
\phi_{31} & \phi_{32} & \phi_{33}
\end{bmatrix}.
\]
The phi matrix is symmetrical. Values above the diagonal are not included because they are identical to those below the diagonal. The covariance of ξ1 and ξ2, for example, is the same as the covariance between ξ2 and ξ1. As with a covariance matrix of observed variables, the values on the diagonal are variances. Again, the rows and columns are not labeled, but it is understood through the subscripts that the values from left to right and from top to bottom apply, respectively, to ξ1, ξ2,
and ξ3. If any pair of factors in a model does not covary, a 0 would replace the corresponding off-diagonal φ element.

The third matrix used in the analysis of measurement models is the Θδ (theta delta) matrix, containing the error variances and covariances of the observed indicators of exogenous variables (θ's). The theta matrix has one row and one column for each observed variable in the CFA model. The diagonal of the Θδ matrix contains the variances of the error terms of observed variables, and the off diagonals contain their covariances. Usually error terms are not correlated; in CFA, however, they are allowed to be if there is theoretical justification. It is considered reasonable, for example, to allow the errors of the same measure administered at two different times to be correlated. Often, CFA models are revised to include correlated errors to improve fit. This issue will be discussed in more detail in Chapter 4. In the example of a Θ matrix following this paragraph, most of the error covariances are fixed at 0; however, the matrix specifies that the covariance between the error terms for variables 4 and 5 is expected to be different from 0:
\[
\Theta_\delta =
\begin{bmatrix}
\theta_{11} & & & & \\
0 & \theta_{22} & & & \\
0 & 0 & \theta_{33} & & \\
0 & 0 & 0 & \theta_{44} & \\
0 & 0 & 0 & \theta_{54} & \theta_{55}
\end{bmatrix}.
\]
The estimates in the Λx, Φ, and Θδ matrices are used in SEM analyses to generate an x by x matrix of estimated population variances and covariances (Σxx, sigma) using the equation presented after this paragraph. The equation is based on a sequence of algebraic proofs using matrix algebra and expectation theory, which are beyond the scope of this book. Users interested in learning how the equation was derived as a central expression in CFA are referred to Bollen (1989) and Long (1983) for more information.

Σxx = ΛxΦΛx′ + Θδ

Long (1983) emphasizes the importance of this equation. It indicates how estimated parameters of a confirmatory factor model can be
manipulated into a new implied matrix of variances and covariances that can be compared to the original matrix of observed variances and covariances. It is important to remember that because the symbols are capital Greek letters, each element of the equation represents a matrix (not just one number). In words, the equation reads as follows: (a) the multiplication of the Λ matrix of factor loadings by the Φ matrix of latent variable variances and covariances, and (b) the multiplication of the resulting matrix by the transpose of the Λ matrix, and (c) the addition to each element in the resulting matrix of the corresponding elements in the matrix of estimated error variances and covariances of the observed variables (Θδ) generates Σxx, which is a matrix of estimates of population variances and covariances. The new square matrix will have the same number of rows and columns as the original input covariance matrix. The number of rows and columns will equal the number of observed variables in the analysis. The newly estimated matrix has a central role in determining the quality of the hypothesized model, which we will discuss in more detail shortly.

Structural Model Matrices. So far, we have discussed the three matrices that are used in the analysis of a confirmatory factor model. Up to five additional matrices are used in the analyses of structural models. First, the factor loadings of the indicators of dependent latent variables are contained in the Λy matrix, which has the same properties as the previously discussed Λx matrix. The variances of the error terms for the indicators of the dependent latent variables are contained in a Θε (theta epsilon) matrix that has the same properties as the Θδ (theta delta) measurement matrix. Note that the error variance of an exogenous variable like Gender in Figure 2.4, which is assumed to be measured without error, would be fixed to 0 and included in the Θδ matrix. The error variance of Par10 in Figure 2.5 would also be set to 0 if the endogenous latent Parenting variable in that model was assumed to be measured without error by Par10. If Par10 had a known reliability, its error variance could alternatively be specified in the Θε matrix as 100% minus that reliability value.
A third new matrix encountered in general structural models is the Γ (gamma) matrix. The regression relationships between exogenous ξ and endogenous η variables are contained in the Γ matrix. The matrix has one row for each endogenous variable and one column for each exogenous variable in the model. The Γ matrix for Figure 2.5 would look as follows:

\[
\Gamma =
\begin{bmatrix}
\gamma_{11} & 0 \\
0 & \gamma_{22}
\end{bmatrix}.
\]
The γ11 parameter represents the path from Risk1 (ξ1) to Behavior (η1) that is present in Figure 2.5. The 0 to its right represents the absence of a path from Risk2 to Behavior—i.e., the fixing of the value of that path to 0. The 0 in the second row represents the absence of a hypothesized path from Risk1 to Parenting. The γ22 parameter represents the path from Risk2 to Parenting.

The fourth new matrix encountered in general structural models is the B (beta) matrix, which contains the regression paths between pairs of endogenous (i.e., η) variables. This matrix has one row and one column for each endogenous variable in a model. The B matrix for Figure 2.5 would look as follows:

\[
B =
\begin{bmatrix}
0 & \beta_{12} \\
0 & 0
\end{bmatrix}.
\]

The diagonal of a B matrix always contains 0s because a variable cannot be regressed on itself (Bollen, 1989). The term above the diagonal in the matrix presented represents the regression path from Parenting (η2) to Behavior (η1) in Figure 2.5.

The final new matrix that is used in the estimation of structural models is the Ψ (psi) matrix, which contains the variances and covariances of the structural errors (i.e., ζ's) in a model. Endogenous latent variables are not represented in the Φ matrix of variances and covariances among ξ's, and their variances are not estimated. Instead, the variances of their associated error terms are estimated. The values represent the amount of variance in the endogenous variables that is unexplained by predictors in the model, and from these values the percent of variance explained can be calculated. In Figure 2.5 there are two endogenous structural variables (Behavior and Parenting). Each has a ζ term. The Ψ matrix has one row
and one column for each endogenous variable. In most cases, no correlation between ζ terms will be modeled, so off-diagonal elements of the Ψ matrix will be 0. The diagonal of the matrix contains the variances of the error associated with each endogenous variable. For Figure 2.5, this matrix would look as follows:

\[
\Psi =
\begin{bmatrix}
\psi_{11} & 0 \\
0 & \psi_{22}
\end{bmatrix}.
\]
Some structural models with latent variables do not posit directional relationships among endogenous latent variables; they may only have directional relationships among exogenous and endogenous variables. In such cases, no B matrix is needed.

We saw earlier that one equation, Σxx = ΛxΦΛx′ + Θδ, relates CFA model estimates to the population covariance matrix of observed variables. For structural models with Λy, Θε, Γ, B, and Ψ matrices, the relationship is more complicated. A new matrix based on four matrix blocks created by four equations relates estimates of parameters in the eight SEM matrices to the new implied matrix of variances and covariances. Notation for these equations varies across sources; we use Bollen's (1989) notations:

\[
\Sigma =
\begin{pmatrix}
\Sigma_{yy} & \Sigma_{yx} \\
\Sigma_{xy} & \Sigma_{xx}
\end{pmatrix}
=
\begin{pmatrix}
\Lambda_y (I - B)^{-1} \left( \Gamma \Phi \Gamma' + \Psi \right) \left[ (I - B)^{-1} \right]' \Lambda_y' + \Theta_\varepsilon
& \Lambda_y (I - B)^{-1} \Gamma \Phi \Lambda_x' \\
\Lambda_x \Phi \Gamma' \left[ (I - B)^{-1} \right]' \Lambda_y'
& \Lambda_x \Phi \Lambda_x' + \Theta_\delta
\end{pmatrix}.
\]
Note that the lower right block is the covariance equation used in CFA models. Because CFA, or measurement-only, models have no η, β, γ, λy, or ε values, and therefore no B, Γ, Λy, Θε, or Ψ matrices, only that block is necessary to generate the comparison matrix in CFA models. For the derivation of these equations based on matrix algebra and expectancy theory, see Bollen (1989). Although it is not essential to know how these equations were derived, it is important to understand that the equations permit the all-important linking of estimated parameters to an implied matrix that can be compared to the original covariance matrix of observed variables. When parameter estimates can be used to recreate a covariance matrix of the observed variables in a model, the comparison of the new
matrix with the original matrix, which is central to the SEM analysis framework, is possible.

Matrices 3: Analyzed or Input, Implied or Reproduced, and Residual Matrices

Unlike in most other statistical analyses, the input data for SEM are usually a covariance matrix of observed variables, or a correlation matrix of observed variables plus the means and standard deviations of the variables (from which a covariance matrix can be generated). SEM programs will accept raw data, but they only use them to generate the necessary input matrix before an SEM analysis is conducted. The input variance–covariance matrix, or its corresponding correlation matrix plus standard deviation and mean vectors, not only provides the data for SEM analysis, but it also makes possible the key mechanism for testing the quality of a CFA or general structural model. The quality of SEM results is measured in terms of how well the SEM model being tested can reproduce the analyzed matrix.

An SEM model, such as the one presented in Figure 2.4, implies a set of relationships among the observed variables contained in the model. Figure 2.4, for example, implies that observed variables x1, x2, and x3 are more highly correlated with each other than with observed variables x4, x5, and x6. The variables x1, x2, and x3 are still expected to have some degree of correlation with x4, x5, and x6, due to the correlation between Risk1 and Risk2. Figure 2.4, on the other hand, does not imply a correlation between Gender and the observed indicators of Risk1 and Risk2. When two variables have no arrows linking them, the implication is that they are unrelated, uncorrelated, or have a correlation of 0. Note that although Gender, Risk1, and Risk2 all have arrows pointing to Behavior, that fact does not imply correlations between the risk variables and gender. Correlations among structural variables are not implied when pathways are going the "wrong" direction along an arrow.

As we described in the overview, each estimation method available in SEM programs uses its own unique formula to obtain estimates of model parameters that minimize the differences between the input matrix and the model-implied matrix. The implied matrix, then, is the matrix of covariances or correlations that is as close to the input matrix as possible, given the hypothesized model, the relationships it implies among the original observed variables, and the estimator's minimization function.
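The idea of a model-implied matrix can be illustrated with a small sketch. The two-factor loadings, factor covariance, and error variances below are invented for illustration; applying Σ = ΛΦΛ′ + Θδ shows that, with these values, indicators of the same factor are implied to covary more strongly than indicators of different factors:

```python
import numpy as np

# A small two-factor CFA with three indicators per factor (illustrative values).
Lam = np.array([[1.0, 0.0],
                [0.8, 0.0],
                [0.7, 0.0],
                [0.0, 1.0],
                [0.0, 0.9],
                [0.0, 0.6]])
Phi = np.array([[0.50, 0.15],      # factor variances on the diagonal,
                [0.15, 0.40]])     # factor covariance off the diagonal
Theta_delta = np.diag([0.30, 0.35, 0.40, 0.25, 0.30, 0.45])

Sigma_implied = Lam @ Phi @ Lam.T + Theta_delta

print(Sigma_implied[0, 1])   # implied cov(x1, x2) = 0.8 * 0.50 = 0.40 (same factor)
print(Sigma_implied[0, 3])   # implied cov(x1, x4) = 1.0 * 0.15 * 1.0 = 0.15 (different factors)
```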
The null hypothesis in SEM is that the population covariance matrix equals the matrix that is implied by the CFA or general structural model. The equation for this null hypothesis in the population is

H0: Σ = Σ(θ)

The equation states simply that the population variance–covariance matrix (Σ, sigma) equals the implied matrix Σ(θ) that is based on estimated parameters (contained in θ). (Note that θ here has a different meaning from the θ used to designate the measurement error matrices.) Technically, this null hypothesis invokes the inference of population values from sample statistics. Because the population matrix is rarely available to researchers, however, the sample covariance matrix (derived from the observed variables in our dataset) is substituted in the equation (Bollen, 1989). Therefore, the equation for the null hypothesis in the sample is

H0: S = Σ(θ̂)

which states that the covariance matrix Σ(θ̂) reproduced based on parameter estimates is not statistically different from the input matrix of observed covariances for the sample (S). As described in Box 2.4, the SEM researcher wants to accept the null hypothesis of no difference.
Box 2-4 The (Backward) SEM Null Hypothesis

The null hypothesis in SEM analyses is that the input or analyzed matrix of observed covariances is statistically the same as the implied or reproduced matrix obtained by estimating parameters specified in the researcher's model. Unlike in an intervention study, for example, where the researcher wants evidence that two group means are not the same, the SEM researcher wants to accept the null hypothesis of no difference:

H0: S = Σ(θ̂)

The difference between two matrices such as S and Σ(θ̂) can be presented in a third matrix, the residual matrix, in which the elements indicate the differences between corresponding elements in the input and implied matrices.
The residual matrix is the matrix containing the differences between corresponding elements in the analyzed and implied matrices. It is obtained by subtracting each element of the implied matrix from its counterpart in the input matrix. If the elements of a residual matrix are small and statistically indistinguishable from 0, then the analyzed model fits the data well.
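A residual matrix is simple to form once an input and an implied matrix are in hand. In the sketch below both matrices are invented for illustration (three indicators only); in practice the implied matrix would come from the estimated model:

```python
import numpy as np

# Hypothetical input (S) and implied covariance matrices for three indicators.
S = np.array([[1.00, 0.42, 0.38],
              [0.42, 0.90, 0.33],
              [0.38, 0.33, 0.85]])
Sigma_implied = np.array([[1.00, 0.40, 0.35],
                          [0.40, 0.90, 0.28],
                          [0.35, 0.28, 0.85]])

residual = S - Sigma_implied     # element-by-element differences
print(residual)
print(np.abs(residual).max())    # small values suggest the model reproduces S well
```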
TESTING MODELS—A CLOSER LOOK

The Key to SEM: The Discrepancy (or Fitting) Function

The hypothesis about the relationship between the analyzed and implied matrices is fundamental in SEM. Unlike in most other statistical procedures, the goal in SEM is to accept the null hypothesis. Why? Because, if the null hypothesis is true—the implied matrix is not statistically different from the original observed covariance matrix—then the researcher has evidence that his or her model and the hypotheses upon which it is based are supported by the data, consistent with the data, or not brought into question by the data.

Before the input and implied matrices are compared to determine if the null hypothesis can be accepted or must be rejected, the estimator attempts to minimize the difference between the two matrices. An iterative estimation process is used in which parameter estimates are obtained, tested, tweaked, and tested again until no more reduction in the difference between the original and implied matrices can be obtained. The determination that the two matrices are as similar as possible is made through applying a "fitting" or "discrepancy" function that quantifies the difference. The set of parameter estimates that yields the smallest value for this discrepancy function becomes the final solution for the model. When the smallest value is achieved, the estimation process has converged on a solution in which the discrepancy function has been minimized. The minimization value obtained is critical for assessing the hypothesis that the input and implied matrices are statistically equivalent.

After the discrepancy function has been minimized, various tests are run to determine just how similar the two matrices are, and whether the differences are statistically significant. One test reported by all SEM
software is the actual statistic obtained with the discrepancy function. Values obtained with the fitting functions are χ2 (chi-square) distributed, so they can be evaluated in terms of statistical significance with regard to the number of degrees of freedom (discussed in the following section on identification) of the model. A nonsignificant χ2 value indicates that the null hypothesis can be retained—the researcher's model is consistent with the data. A statistically significant χ2 value indicates that S and Σ(θ̂) are statistically different. However, due to limitations of the χ2 statistic, there are now a large number of additional tests of fit that can be used to support claims of good fit, even if the χ2 statistic is statistically significant. Specific fit indices will be examined in Chapter 6.
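For readers who want to see the discrepancy function itself, the sketch below uses the standard maximum likelihood form, F_ML = ln|Σ(θ)| − ln|S| + tr(SΣ(θ)⁻¹) − p, and simply evaluates it for a pair of given matrices. An SEM program, of course, minimizes this quantity iteratively over the free parameters rather than evaluating it once; the matrices and sample size below are invented for illustration:

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy: ln|Sigma| - ln|S| + trace(S @ inv(Sigma)) - p."""
    p = S.shape[0]
    return (np.log(np.linalg.det(Sigma)) - np.log(np.linalg.det(S))
            + np.trace(S @ np.linalg.inv(Sigma)) - p)

# Hypothetical input and implied matrices (three indicators, invented values).
S = np.array([[1.00, 0.42, 0.38],
              [0.42, 0.90, 0.33],
              [0.38, 0.33, 0.85]])
Sigma = np.array([[1.00, 0.40, 0.35],
                  [0.40, 0.90, 0.28],
                  [0.35, 0.28, 0.85]])

N = 300                          # hypothetical sample size
F_min = f_ml(S, Sigma)           # the quantity an SEM program minimizes
chi_square = (N - 1) * F_min     # conventionally evaluated against the model's df
print(round(F_min, 5), round(chi_square, 3))
```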
IDENTIFICATION

A final concept that should be introduced here is model identification. SEM models must be identified in order for the matrix manipulations they require to succeed. A statistical model is said to be identified if it is theoretically possible to derive a unique estimate of each parameter (Kline, 2005). Conceptually, model identification refers to having enough observed information to make all the estimates requested in a model. Hence, identification is a data issue concerning the number of known pieces of data and the number of parameters to be estimated in a model. Although software programs generally provide warnings when a model is not identified, it is important for researchers to understand the concept in order to avoid identification problems, to know how to solve such problems if they occur, and to perform their own identification calculations, particularly in the cases of more complicated SEM models. Kline (2005, pp. 106–110) provides a good explanation of the concept of identification that readers can use to supplement our discussion.

Structural equation models are identified (generally) when there are more covariances and variances in the input data matrix than there are parameters to be estimated, and when each latent variable has a metric or measurement scale (Kline, 2005). The amount of observed information available for an SEM model is the number of unique elements in the covariance or correlation matrix being analyzed. Model underidentification occurs when the number of parameters to be estimated in a model
exceeds the number of unique pieces of input data, or when there is too little information available for the estimation of any one parameter. In SEM analysis, “just-identified” means that the number of parameters to be estimated in a model is equal to the number of unique pieces of input data. The difference between the number of unique matrix elements and the number of parameters to be estimated is called the degrees of freedom of a model.
Illustration of Identification

Count the number of observed variables in Figure 2.2. There are nine variables represented with rectangles in the model, x1 through x9. A covariance matrix of these variables would have 9 by 9 elements. However, the full matrix would have fewer than 81 pieces of unique information because it contains redundant items. For example, the covariance of x1 and x5 would be presented in the column under x5; the covariance of x5 with x1, the same quantity, would be presented in the column under x1. Instead of 81 pieces of information in a 9 by 9 covariance matrix, there are p(p + 1)/2, or 9(10)/2 = 45, unique pieces of information, where p is the number of observed variables.

Table 2.1 illustrates a covariance matrix of three variables (p = 3). The variances of the three variables are along the diagonal and are nonredundant pieces of information. The three covariances above and below the diagonal are redundant—only one set should be counted. Using the formula p(p + 1)/2, there are 3(3 + 1)/2 = 6 unique pieces of information in the matrix—the three variances on the diagonal plus the three covariances on one side of it in Table 2.1.

Table 2.1 Illustration of Unique Elements in a Covariance Matrix

        x1                 x2                 x3
x1      Var. of x1         Cov. of x1 & x2    Cov. of x1 & x3
x2      Cov. of x2 & x1    Var. of x2         Cov. of x2 & x3
x3      Cov. of x3 & x1    Cov. of x3 & x2    Var. of x3

In any measurement model, the parameters to be estimated include elements of the Λ, Φ, and Θδ matrices (the factor loadings of observed indicators on one or more factors, the variances and covariances of the latent factors, and the error variances of the observed indicators, respectively). Count the number of parameters to be estimated in the
measurement model presented in Figure 2.2. There are nine observed variables and nine factor loadings. One loading on each factor is typically fixed at 1 for scaling and identifying the latent variable. (More will be said about this later.) Fixing one loading per factor reduces the number of parameters to be estimated. With three factor loadings (one for each factor) fixed at 1, only six factor loadings need to be estimated. There are three latent variables, so three variances will be estimated. There are three interfactor covariances. There are nine error variances, one for each observed variable. Therefore, 21 parameters need to be estimated. The covariance matrix contains 45 unique elements. 45 − 21 equals a positive number, so the model is identified. It has 45 − 21 = 24 degrees of freedom (df).

If a model has 0 degrees of freedom, that is, if it is just-identified, only one solution is possible and it will have perfect fit (the input and implied matrices will be equal). Models with 0 degrees of freedom cannot be tested for their quality in comparison to other models. Models with negative degrees of freedom (i.e., underidentified models) will not run in most SEM programs—there are too few rules guiding the analysis, and an infinite number of solutions may be possible.
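The counting rule used in this illustration is easy to automate. The short helper below simply computes p(p + 1)/2 unique matrix elements and subtracts the number of free parameters:

```python
def sem_degrees_of_freedom(n_observed, n_free_parameters):
    """Unique elements in the covariance matrix minus the free parameters."""
    unique_elements = n_observed * (n_observed + 1) // 2
    return unique_elements - n_free_parameters

# Figure 2.2: nine indicators; 6 free loadings + 3 factor variances
# + 3 factor covariances + 9 error variances = 21 free parameters.
print(sem_degrees_of_freedom(9, 21))   # 45 - 21 = 24 degrees of freedom
```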
3
Preparing for an SEM Analysis
Now that the basic concepts of SEM have been presented, we turn to preparatory steps for conducting an SEM analysis, as listed in Box 3.1. SEM programs, like all data analysis software, make assumptions about the characteristics of data. They also require that data files be in a specific format. Like other programs, SEM software will often run when assumptions and recommendations are violated, but results of such analyses are subject to bias (inaccuracies and/or inefficiencies). Studies based on questionable methods are also vulnerable to criticism in the review process. Although the lines are blurry about many cutoffs and standards in SEM research, there are also many choices that are generally recommended. Rather than overwhelm the reader with the range of options and opinions in the literature, we summarize the major issues and then present a short list of recommendations related to each issue. References for other sources with more extensive discussion of these and other practices are provided. Instructions for Amos and Mplus, syntax, and data examples are all available online on the companion website.
ASSESS SAMPLE SIZE ADEQUACY

CFA requires data on multiple indicators (usually questionnaire items) from a large number of cases. Sample size requirements vary widely
Box 3-1 Preparation Steps and Analysis Decisions

1. Assess sample size adequacy.
2. Decide how to handle missing data.
3. Choose the proper estimation method for the measurement level and distributional characteristics of variables.
4. Consider options for analyzing clustered data.
5. Finalize variables and data files for analysis.
depending on characteristics of the model tested, such as model complexity and magnitude of factor loadings. Kline (2005) gives absolute guidelines and guidelines based on the ratio of cases to estimated parameters. In absolute terms he suggests that fewer than 100 cases is a “small” sample, 100 to 200 is “medium,” and over 200 is “large.” In relative terms, Kline suggests that a 20:1 case-to-parameter ratio is desirable, 10:1 “more realistic,” and 5:1 “doubtful.” Users with small samples (e.g., fewer than 100 cases, or only 5 cases per parameter to be estimated) may be able to proceed with an SEM analysis if factor loadings are high. In an example presented in Chapter 4, researchers had a sample of 103 cases, and they tested models with 18 to 20 free parameters (Kelly & Donovan, 2001). Not all of their factor loadings were high (e.g., some were under 0.50), but there was general consistency in their results across models, so it appears the sample size was adequate. A more rigorous calculation of needed sample size should be determined by statistical power analysis, which is described in Chapter 7. However, Kline’s guidelines are helpful in many applications. In practice, even 200 cases can be inadequate for complex models or data requiring special estimators. Analyses using methods appropriate for ordinal and nonnormal data require larger sample sizes in some programs. Even if models run with a small sample, the results may be unstable (e.g., parameter estimates in one part of the model might change substantially when minor changes are made elsewhere, or when the model is run with a different sample). Studies have shown that nonconvergence of analyses (i.e., the estimation procedure is unable to converge on a minimum fitting value) is a problem when sample sizes are under 100 (Enders & Bandalos, 2001). Small samples also preclude the use of an important best practice in SEM—the development and validation
Box 3-2 Sample Size Recommendations

We recommend that researchers with data from 100 or fewer subjects use an analysis method other than SEM. For multiple-group analyses, also aim for 100 or more cases per group. For datasets with 200 or fewer cases, keep in mind Kline's recommendation of 10 or more cases per parameter to be estimated. In analyses with sample sizes near or below these cutoffs, problems with convergence may signal inadequate sample size. Ideally, researchers will conduct an SEM power analysis before choosing to analyze their data with SEM. Instructions for determining the exact sample size needed for an SEM with given effect size, statistical significance, and power are provided in Chapter 7.

When data from one unit of analysis—for example, individuals—are nested or clustered within another unit of analysis—for example, families, classrooms, or communities—sample size issues must also be considered at the higher unit of analysis.
of models on separate random subsets of cases. We provide recommendations for sample sizes in Box 3.2.
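As a trivial illustration of the relative guideline, the helper below computes the case-to-parameter ratio; the 103-case, 20-parameter example from the Kelly and Donovan (2001) study discussed above falls near Kline's "doubtful" 5:1 level:

```python
def cases_per_parameter(n_cases, n_free_parameters):
    """Kline's (2005) rough guideline: 20:1 desirable, 10:1 more realistic, 5:1 doubtful."""
    return n_cases / n_free_parameters

print(round(cases_per_parameter(103, 20), 1))   # about 5.2 cases per free parameter
```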
DECIDE HOW TO HANDLE MISSING DATA

Before deciding how to handle missing values, researchers should be aware of the extent to which their datasets have missing values and understand the mechanisms of missingness. The extent to which data are missing from variables to be analyzed can be examined by obtaining univariate descriptive data in any general statistical analysis program. Output includes the number and percentage of missing values. Users can obtain information on the extent to which individual cases are missing values on variables to be used by conducting a count of the missing value code across analysis variables, and then obtaining descriptives on the newly created count variable.

Three types of mechanisms (or causes) of missingness are frequently discussed in the literature: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). "If the cases for which the data are missing can be thought of as a random sample of all the cases, then the missingness is MCAR. This means that everything one might want to know about the data set as a whole can be estimated
from any of the missing data patterns, including the pattern in which data exist for all variables, that is, for complete cases" (Graham, 2009, p. 552). MCAR is a restrictive form of missingness, and typically, it is difficult for researchers to discern its presence in empirical settings. MAR refers to a pattern of missingness that may depend on observed data (e.g., other variables in the study), but not on values of the outcome of interest (Saunders et al., 2006). "The missing data for a variable are MAR if the likelihood of missing data on the variable is not related to the participant's score on the variable, after controlling for other variables in the study" (Acock, 2005, p. 1012). MNAR is a pattern of missingness that is systematic or that is based on values of the outcome variable of interest (Saunders et al., 2006): for example, if the level of a respondent's income is related to the likelihood that he or she will provide income information in a study of predictors of income.

Acock mentions another type of missing value that should also be noted. Data are "missing by definition of the subpopulation" (p. 1013) if the researcher has decided to study a subset of the population of cases on which data are available. If, for example, a researcher is interested in studying school success among girls using data from a secondary data source containing information on boys and girls, he or she may remove the male cases from the dataset. In this situation, missingness does not refer to the data on boys. The examination of missing data patterns and decisions about how to handle them pertains only to the data on female cases.

Prior to the development of new methods for missing data imputation, there were several common methods for dealing with missing data, including: (a) listwise deletion or analysis based on complete cases, (b) pairwise deletion, that is, calculating a correlation or covariance matrix such as that employed by SEM by using complete cases for each pair of variables, while ignoring missing values irrelevant to the pair of variables for which a correlation or covariance is constructed, (c) mean substitution by replacing missing values of a variable by its sample mean of the complete cases, (d) incorporation of a missingness dummy variable in the analysis in addition to the specially coded missing value, (e) regression-based single imputation, and (f) imputation of categorical values based on data from cases with similar response patterns on other variables. Of these older methods, only the first (i.e., listwise deletion) is still thought to be valid for statistical analysis, and only then under certain conditions, for example, when the sample size is large and the amount of missing data is small
(e.g., less than 5%, Graham, 2009; Saunders et al., 2006) and if data are missing completely at random (Acock, 2005; Enders & Bandalos, 2001). Researchers generally agree that all other methods yield biased parameter estimates and should not be used (Allison, 2002; Graham, 2009).

Since 1987, when Little and Rubin published their seminal work on missing data, statisticians have developed three “modern” missing data procedures: (a) the expectation maximization algorithm (EM or EMA), (b) multiple imputation (MI) under the normal model, and (c) full-information maximum likelihood (FIML) methods. A study that looked at both the traditional and newer methods of handling missing values found that FIML and MI were both superior to other approaches (Acock, 2005). Results from a simulation study comparing FIML with pairwise deletion, listwise deletion, and the “similar response pattern imputation” method indicated that FIML outperformed all of the comparison methods: it (a) performed well under the MAR condition as well as under MCAR, (b) worked well regardless of the amount of missing data (the authors tested samples with between 2% and 25% missing data), and (c) was the least likely of the methods to have convergence failures (Enders & Bandalos, 2001).

One of the attractive features of FIML is that the method deals with the missing data, conducts parameter estimation, and estimates standard errors all in a single step. Unlike the EM method, FIML offers good estimates of standard errors and permits researchers to perform hypothesis testing without serious bias (Graham, 2009). Unlike the MI method, FIML does not require multiple imputed files and postimputation aggregation and, therefore, provides final results in one step. In addition, FIML can be used with estimators other than maximum likelihood in some programs, such as Mplus.

Based on the missing data literature, we recommend using FIML for analyses with missing values in Amos and Mplus. Using FIML requires raw data files, so it will not be an option for users whose data are only in correlation or covariance matrix format. In addition, for Amos users, information about how to improve models is not available when FIML is used. Multiple imputation is a good alternative to FIML when users are analyzing raw data or generating an input matrix from raw data for analysis in Amos and want model improvement suggestions (Acock, 2005). A summary of best practices for handling missing data is provided in Box 3.3. To determine the impact of missing values on analysis results (and help decide between the pros and cons of alternative strategies), researchers
Box 3-3 Best Practices for Handling Missing Data

Report the extent to which data are missing from the cases and variables to be analyzed.
Determine if values are missing at random, completely at random, or not at random.
When data are not MNAR and raw data are available for analysis, use the FIML procedure offered by many SEM computing packages.
If using a correlation or covariance matrix as the input data matrix, use multiple imputation with the original raw data file before generating the matrix.
When missingness is MNAR, researchers should consider using more sophisticated procedures, or collecting new data. Propensity score matching could be used to balance the sample on observed variables between those that have missing values and those that do not (see Guo & Fraser, 2010).
At a minimum, when analyses are conducted with data that are MNAR, the limits to which findings are generalizable to the population must be acknowledged.
may consider running sensitivity analyses (Saltelli et al., 2008) with and without imputed values, FIML, and so on. For more in-depth discussions of the nature of missing data and methods of handling them, readers are referred to Allison (2002) and Graham (2009); for issues pertaining to social work applications of missing data imputation, readers are referred to Rose and Fraser (2008) and Saunders et al. (2006); and for a more technical treatment of missing data imputation, readers are referred to Little & Rubin (2002) and Schafer (1997).
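For researchers who prefer to screen their data in a scripting environment rather than a menu-driven package, the extent of missingness by variable and by case can be summarized in a few lines. The following is a minimal sketch, assuming the Python pandas library, a hypothetical data file (data.csv), and hypothetical variable names; it mirrors the descriptive and count-variable steps described at the beginning of this section.

```python
import pandas as pd

# Hypothetical raw data file; replace with your own analysis variables
df = pd.read_csv("data.csv")
analysis_vars = ["supcares", "suphelps", "income", "burnout1", "burnout2"]

# Extent of missingness by variable (count and percentage)
missing_by_var = df[analysis_vars].isna().sum()
pct_by_var = 100 * missing_by_var / len(df)
print(pd.DataFrame({"n_missing": missing_by_var, "pct_missing": pct_by_var.round(1)}))

# Extent of missingness by case: count of missing values across the analysis variables
df["n_missing_case"] = df[analysis_vars].isna().sum(axis=1)
print(df["n_missing_case"].describe())
```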
UNDERSTAND MEASUREMENT LEVEL AND DISTRIBUTIONAL CHARACTERISTICS OF VARIABLES

Social work researchers should be aware of the measurement level and distributional characteristics of their data before conducting SEM analyses. SEM programs can accommodate all measurement levels and distributions; however, special analysis properties must be selected for variables not meeting default assumptions. Ignoring these issues can lead to biased results and inaccurate conclusions, reduce the credibility of results, and increase the chances of rejection of a manuscript by reviewers
Box 3-4 Best Practices for Understanding the Measurement Level and Distributional Qualities of Data

Evaluate the measurement level of your variables.
Examine the univariate and multivariate distributional properties of analysis variables.
Transform variables, if possible, to obtain better univariate distributions.
Identify influential outliers and recode or delete as appropriate.
Confirm that the analysis variables do not have widely discrepant variances.
More recommendations are provided in Chapter 4, specifically Box 4.5.
of better journals. Applying the default maximum likelihood estimator in most SEM programs to data that are nonnormal and/or ordinal, for example, can lead to biased estimates, misleading significance testing, and erroneous conclusions about model fit. Therefore, social work researchers must examine the properties of their data and determine the most appropriate analysis strategies before undertaking SEM analyses. Box 3.4 offers best practices for understanding the measurement level and distributional qualities of data.

Measurement level and distributional characteristics are often, but not always, related. In general, continuous, interval-level or ratio-level variables are more likely to have the normal or near-normal distributions that are desirable for statistical analyses. At the other extreme, a dichotomous nominal variable cannot have a normal distribution regardless of what proportion of respondents choose each response option. Although they are often related, measurement level and distributional characteristics are separate issues. The primary problem with ordinal-level variables in SEM analyses is that the response option values in the dataset do not have true quantitative meaning. The assignment of values to Likert scale responses is arbitrary, and the distance of response option “1” from 0, or from option “1” to option “2,” is not measurable. The central problem with nonnormally distributed variables in SEM analyses is that most estimators make assumptions about the distributions of variables in the generation of parameter estimates and other model statistics. When the assumptions are not met, matrix computations may fail, or results may be inaccurate.
Measurement Level

Variables in social-behavioral and health sciences may be classified into four levels of measurement: nominal, ordinal, interval, and ratio. Identifying the measurement level of one’s variables requires no special test. Variables at each of the four levels can be distinguished simply by whether they possess three quantitative properties: (a) the zero property (values the variable can take on include a meaningful quantitative “0” value), (b) the distance property (the distance between two numerical levels has a meaningful quantitative value and can be measured), and (c) the ranking property (values the variable can take can be logically ranked or ordered). Table 3.1 illustrates the properties possessed by variables at each of the four levels of measurement.

The majority of social work variables are nominal (categorical) or ordinal. Although the values of ordinal variables can be ordered logically, they have no true “0,” and the distances between their response values are neither equal nor truly quantitative. “Strongly Disagree” on a 5-point Likert scale, for example, could arbitrarily be assigned a value of “5” or a “1” by a researcher. And, whether it is assigned a 1 or a 5, the number is meaningful only in relation to other values on the scale, not in any true quantitative sense. Like most statistical procedures, SEM assumes that variables have interval- or ratio-level properties. What this means is that the variables most commonly used in social work research are not appropriately analyzed with the default SEM methods. The researcher must employ special analysis procedures for variables not meeting the default assumptions regarding measurement levels.
Table 3.1 Measurement Levels Classified by Three Quantitative Properties

Measurement level   Ranking property   Distance property   Zero property
Nominal
Ordinal             X
Interval            X                  X
Ratio               X                  X                   X
Distributional Properties

Default SEM procedures assume that observed variables have normal distributions. Determining the distributional qualities of one’s variables is more complicated than determining measurement level, but all general statistics programs provide the necessary information. Univariate normality can be assessed by examining the skewness and kurtosis values of individual variables, which can usually be obtained as part of the frequency or descriptive procedure in statistical packages.

The skew index measures the degree and direction of asymmetry of a distribution. A symmetric distribution, such as a normal distribution, has a skewness of 0, and the mean of the distribution is equal to the median. A distribution with negative skewness is skewed to the left, and its mean is less than its median. Similarly, a distribution with positive skewness is skewed to the right, and its mean is greater than its median. There is no definite cutoff to indicate an unacceptable level of skewness. With a conservative approach, one might conclude that a skew index greater than 1 or less than –1 is problematic. Kline (2005, p. 50) indicates that some researchers consider skew values of 3 or –3 and beyond to be “extreme.”

Kurtosis is a measure of whether the distribution of the data is peaked or flat in comparison with a normal distribution. Datasets with high kurtosis (called leptokurtic distributions) tend to have a distinct peak near the mean and then decline rather rapidly. Datasets with low kurtosis (called platykurtic distributions) tend to have a flatter top near the mean rather than a sharp peak. The kurtosis value of a normal curve is 3; however, some programs, such as SPSS and SAS, transform kurtosis values so that 0 is normal for easier interpretation. Leptokurtic distributions have kurtosis values greater than 3 (when 3 represents normal kurtosis), and platykurtic distributions have kurtosis values less than 3. There is no definitive cutoff value for unacceptable levels of univariate kurtosis. Conservatively, if the kurtosis is greater than 4 or less than 2 (in programs using 3 for a normal distribution), or greater than 1 or less than –1 (in programs using 0 to represent normal), one might conclude that the distribution is problematic. According to more liberal recommendations from Kline (2005, p. 50), however, one could also adopt a cutoff of +/– 10 as indicative of “problematic” kurtosis, and +/– 20 as indicative of “more serious” kurtosis. If skewness and kurtosis values for most of the analysis variables exceed cutoff values
selected (and justified by a reference to the literature) by the researcher, then one of the SEM analysis options for nonnormal data should be chosen.

Bollen (1989) and Kline (2005) also suggest looking for influential outliers before conducting SEM analyses. Outliers, or influential cases, can lead to inadmissible solutions (Chen, Bollen, Paxton, Curran, & Kirby, 2001), among other undesirable consequences. Outliers are unusual univariate, bivariate, and multivariate values for individual cases in a sample. Influential outliers are outliers that substantially affect analysis results. Bollen (1989, pp. 24–31) describes how to identify outliers and how to determine if they are influential. He recommends examining stem-and-leaf plots of univariate distributions and scatterplots of bivariate distributions. These steps are conducted in a general statistical program. Simply reviewing frequencies for lone values that are distant from the majority of values can also be helpful at the univariate level. To determine if outliers are influential, analyses must be run with and without the cases. Defining which cases have values that are “distant” from other values, and which results are “substantially” different from each other, is a subjective process. Identifying multivariate outliers is a more complicated process reviewed by Bollen. Kline (2005, pp. 51–52) suggests examining the Mahalanobis distance statistics for variables using a p-value of 0.001 to identify cases that are outliers. The options for handling outliers include deleting problematic cases or recoding variables so that the outlying values are collapsed into a category that includes the next nearest (nonoutlying) value.

Diagnostics to assess for potential multicollinearity and multivariate nonnormality can be obtained by running a regression using the analysis variables. First, run a regression with any one of the analysis variables designated as the dependent variable and the rest of the analysis variables designated as independent variables. Ask for the variance inflation factor (VIF) to assess for multicollinearity, and the Mahalanobis distance and Cook’s distance (Cook’s D) to detect multivariate outliers (i.e., cases with extreme scores on multiple variables or an atypical pattern of scores) and influential cases (i.e., cases whose exclusion causes substantial changes in the regression estimates). Note that in SPSS, these diagnostics are options that can be requested at the same time the regression
is specified; in Stata, the diagnostics are requested after the regression has been run. VIF values greater than 10 indicate a potentially harmful multicollinearity problem (Kline, 2005; Kutner, Nachtsheim, & Neter, 2004). An individual case with a statistically significant (at the p < 0.001 level) Mahalanobis distance is likely to be an outlier (Kline, 2005; Tabachnick & Fidell, 2007). Cook’s D values greater than 1 indicate influential cases (Cook & Weisberg, 1982). For a detailed discussion of distributional assumptions, violations, detection, consequences, and remedies, see Bollen (1989, pp. 415–446).

An alternative and rigorous check of the assumption of multivariate normality is also available to researchers familiar with SAS. Users may use SAS PROC CALIS to compute a set of multivariate kurtosis values and the relative multivariate kurtosis index (SAS, 1999). If a study’s manifest variables are multivariate normal, then they have a zero relative multivariate kurtosis, and all marginal distributions have zero kurtosis (Browne, 1982). If the relative multivariate kurtosis is not equal to zero, then the assumption of multivariate normality may be violated.

Strategies for Distributional and Measurement Level Problems

In summary, many, if not most, social work variables have neither the measurement level nor the distributional qualities that are appropriate for the default maximum likelihood estimator of SEM programs. Numerous approaches have been developed to address the majority of cases when these conditions do not exist. Kline (2005, pp. 194–198) summarizes four strategies to address nonnormal distributions: (1) normalize the nonnormally distributed variables with data transformations, then analyze the transformed data with standard maximum likelihood estimation; (2) use a corrected normal theory method such as the Satorra-Bentler (Satorra & Bentler, 1994) approach to adjust estimated standard errors and perform a revised version of the model chi-square test; (3) use an estimator that does not assume a multivariate normal distribution; and (4) create a special correlation matrix that takes into account the measurement level of variables, and analyze it and an appropriate weight matrix with a weighted least squares estimation procedure. A fifth approach, described by Bollen (1989), is to use bootstrapping to obtain more accurate standard error estimates based on multiple samples of available data.
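Researchers who screen their data in a scripting environment can carry out the regression-based diagnostics described above (univariate skewness and kurtosis, VIF, Mahalanobis distance, and Cook’s D) before choosing among these strategies. The sketch below is a minimal illustration, assuming the Python pandas, scipy, and statsmodels libraries and hypothetical variable and file names; it is not a substitute for the SPSS or Stata procedures described in the text.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy import stats

df = pd.read_csv("data.csv")                                   # hypothetical file
X = df[["supcares", "suphelps", "income", "burnout1"]].dropna()  # hypothetical analysis variables

# Univariate skewness and excess kurtosis (0 = normal, as in SPSS/SAS output)
print(X.skew())
print(X.kurt())

# Multicollinearity: VIF for each predictor in a screening regression
design = sm.add_constant(X.drop(columns="burnout1"))
vif = pd.Series(
    [variance_inflation_factor(design.values, i) for i in range(1, design.shape[1])],
    index=design.columns[1:],
)
print(vif)

# Multivariate outliers: squared Mahalanobis distance against a chi-square cutoff (p < .001)
center = X.mean().values
inv_cov = np.linalg.inv(np.cov(X.values, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", X.values - center, inv_cov, X.values - center)
cutoff = stats.chi2.ppf(0.999, df=X.shape[1])
print("Potential multivariate outliers:", np.where(d2 > cutoff)[0])

# Influential cases: Cook's D from the same screening regression
ols = sm.OLS(X["burnout1"], design).fit()
cooks_d = ols.get_influence().cooks_distance[0]
print("Cases with Cook's D > 1:", np.where(cooks_d > 1)[0])
```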
The first option requires using data transformations to create new, normalized variables in the dataset before conducting SEM analyses. Transformations are conducted in a general statistics program before submitting data to SEM analysis. The transformation solution applies to the problem of univariate nonnormality. Tabachnick and Fidell (2007) provide a useful summary of transformations that can improve the distributional qualities of variables, including taking the square root of values (x^(1/2)), the log (e.g., log10(x)), or the inverse of values (1/x), or reflecting the original distribution and then transforming it. Tabachnick and Fidell recommend trying different transformations to see which best improves normality, starting with a square root transformation and moving to a log transformation and then the inverse for increasingly nonnormal distributions. If a distribution cannot be improved through transformation, they recommend dichotomizing the variable. Drawbacks of using transformations to improve normality include the difficulty of finding transformations that improve the distribution of all problematic variables, the subjective nature of decisions about what constitutes an adequately improved distribution, the fact that transformations do not address the ordinal status of variables, and the fact that data transformations make interpretation more complex, especially when different transformations are required for different variables.

The second and third options for managing nonnormal distributions and measurement-level problems also have drawbacks. Using a corrected normal theory method as proposed by Option 2 does not address the problem of biased parameter estimates. In addition, the option is not available in Amos. The option is available in Mplus, but Mplus has estimation capabilities that address biased parameter estimates as well, so the option should be used in combination with an appropriate estimator. Option 3, using an estimator that does not assume multivariate normal distributions, has limited applicability in some programs because of the lack of satisfactory alternative estimators. The asymptotically distribution-free (ADF) estimation method available in Amos, for example, requires very large samples, especially in models involving many observed variables (Flora & Curran, 2004). In addition, ADF estimation has not always fared well in simulated comparisons of estimation approaches (West, Finch, & Curran, 1995). The fifth approach, bootstrapping, corrects for biased significance tests and biases in parameter estimates by using repeated samples with replacement, but the χ2 statistic may remain biased (West, Finch, & Curran, 1995).
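As an illustration of the first option, the transformations mentioned by Tabachnick and Fidell can be applied in a few lines before the data are exported for SEM analysis. This is a minimal sketch, assuming the Python pandas and numpy libraries, a hypothetical data file, and a hypothetical, positively skewed income variable.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("data.csv")          # hypothetical file
x = df["income"]                      # hypothetical, positively skewed variable

df["income_sqrt"] = np.sqrt(x)                       # square root: x**(1/2)
df["income_log"] = np.log10(x + 1)                   # log10; +1 guards against log(0)
df["income_inv"] = 1 / (x + 1)                       # inverse; +1 guards against division by zero
df["income_refl"] = np.sqrt(x.max() + 1 - x)         # reflect, then transform (for negative skew)

# Compare skewness before and after to judge which transformation helps most
print(df[["income", "income_sqrt", "income_log", "income_inv", "income_refl"]].skew())
```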
Of the four options presented by Kline (2005), the use of a special correlation matrix and associated weight matrix is especially recommended by many SEM experts (Bollen, 1989; Jöreskog, 2005; Muthén & Muthén, 1998–2007) and is increasingly being used. Mplus provides several estimators to handle dichotomous, ordered categorical (ordinal), unordered categorical (nominal), and count variables, making it an appropriate choice for many social work researchers. Because of the availability of alternative methods for managing undesirable measurement and distributional properties of their data, researchers may want to consider running analyses with more than one alternative approach to determine the effects of disregarding analysis assumptions on results. Findings from such tests or sensitivity analyses (Saltelli et al., 2008) can help researchers choose among various strategies, each of which has its own positive and negative characteristics.

Ill-Scaling of Variances in the Analysis Matrix

One additional potential data problem should be mentioned here: the presence of greatly different variances across variables to be used in an SEM analysis. The problem is known as “ill-scaling.” Specifically, if the ratio of the greatest to the smallest variance of observed variables in a dataset is greater than 10, then the covariance matrix is said to be ill scaled. Running SEM with an ill-scaled covariance matrix often causes nonconvergence in the statistical analysis. A widely practiced strategy for remedying an ill-scaled covariance matrix is to multiply one of the variables by a constant that will make its variance more similar to those of other variables (Kline, 2005, p. 57). First, the user obtains the variances for all variables to be used in the analysis (through the univariate descriptive options in a general statistics program). If the variance ratio between one or more pairs of variables exceeds 10, the variable or variables that are problematic are multiplied by a constant to create new variables with variances more in line with others in the dataset. In some cases the choice of constant is obvious. For example, if an income variable is measured in dollars and has a much greater variance than the 5-point Likert scales used for most other variables in a dataset, changing the income metric from dollars to thousands of dollars might solve the problem. Changing the scale of a variable will change its mean and variance, and its covariance with other variables. Its correlations with other variables, however, will remain the same.
Therefore, the transformation causes no problem for SEM analysis. The interpretation of estimates involving the affected variable will simply need to be adjusted to reflect the new metric.
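The variance check and rescaling described above can be scripted before the data are exported. A minimal sketch, assuming the Python pandas library and hypothetical variable names (the income variable and the constant of 1,000 mirror the example in the text):

```python
import pandas as pd

df = pd.read_csv("data.csv")                         # hypothetical file
analysis_vars = ["supcares", "suphelps", "income"]   # hypothetical analysis variables

variances = df[analysis_vars].var()
print(variances)
print("Ratio of largest to smallest variance:", variances.max() / variances.min())

# If the ratio exceeds roughly 10, rescale the offending variable by a constant,
# e.g., express income in thousands of dollars rather than dollars
if variances.max() / variances.min() > 10:
    df["income_k"] = df["income"] / 1000
    print(df[["supcares", "suphelps", "income_k"]].var())
```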
Amos and Mplus Capabilities for Handling Violations of Measurement Level and Distributional Assumptions

The best practices described in Box 3.4 apply to preparations for the analysis of data with any SEM program. After examining their data in a general statistical program and taking any appropriate steps for improving the distributions and scaling of variables, researchers must decide which SEM programs and procedures are acceptable given the nature of their data. The default maximum likelihood (ML) estimation procedures in Amos and Mplus assume interval- or higher-level measurement of variables and normal distributions. Although ML has been shown to be robust to some degree of nonnormality, the ordinal and categorical data common in social work research often require special analysis procedures. Therefore, if users have interval or higher-level variables and distributions that fall within acceptable degrees of nonnormality (liberal and conservative ranges were discussed earlier), default procedures in either Amos or Mplus may be used. In many cases, however, social work researchers will need to address the categorical and/or ordinal nature of their variables and the associated nonnormality with special estimation options that are presented in Chapter 4.
UNDERSTAND ISSUES OF CLUSTERED DATA

Social work data often pose another analysis complication: data are nested, clustered, or hierarchical (three names for the same thing). Clustering of data occurs when subsets of the subjects are grouped within higher-level units. Examples include students in classrooms, families in the same community, workers with the same supervisor, and agencies in states with statewide child welfare policies. Longitudinal data collected from the same individuals over time are also clustered (time points within individuals). In intervention research, assignment to condition may occur at the cluster level rather than the individual level, making it imperative to
adjust for clustering when outcome analyses take place at the individual level (What Works Clearinghouse, 2008). In survey research, researchers often conduct sampling in stages to first obtain units higher than the individual unit of analysis (e.g., states sampled before agencies within those states). Researchers are usually aware that data obtained through complex sampling strategies require analysis procedures that take into account their clustered nature. However, even data obtained from subjects recruited through sampling at the level of the unit of analysis (e.g., students) are often clustered (e.g., in classrooms or schools) and need special treatment. If higher-level units or groupings are predictive of the scores of subjects at the level of analysis, then the clustered nature of data should be accounted for in analysis procedures. Although it is possible to have datasets with more than two levels of clustering, we restrict our attention to two-level models.

The problem presented by clustered data is that they violate an important analysis assumption. Most statistical procedures, such as regression, assume that observations (i.e., values for different cases on a variable) are independent, or unrelated to each other. Statistically, nonindependence is conceptualized as auto-correlated residuals (residuals are the differences between individuals’ scores and the mean, or predicted, score for the sample). The residuals of study subjects who are clustered together tend to be correlated. For example, two workers in the same child welfare agency are likely to have scores that are more similar on a measure of supervisor support than the scores of two workers in different organizations. Some of this similarity is expected to be error due to causes affecting both workers because they are in the same agency; perhaps workers in the same agency have difficulty answering a series of support questions because they refer to a type of support that is not provided by supervisors in their agency. Therefore, some information in the clustered dataset is redundant. An SEM that fails to adjust for correlated errors tends to produce standard errors for coefficients that are smaller than they should be, which increases the chances of a Type I error (or false positive finding).

Two general options exist for hierarchical modeling in SEM: one in which Level 1 error terms are adjusted for intercorrelations due to the grouping factor, and another in which these corrections are made and the effects of higher units are explicitly modeled and estimated. The choice may depend on the researcher’s theoretical interest in higher-level effects
Box 3-5 Calculating the Intraclass Correlation (ICC)

The ICC, or effects of a grouping entity on the scores of individuals in that entity, can be assessed by performing an unconditional ANOVA with random effects (Raudenbush & Bryk, 2002, p. 36; Snijders & Bosker, 1999, pp. 16–35) in a general statistics program. The dataset must contain a variable that indicates which subjects belong to which group (e.g., a state variable, or a teacher variable, etc.). The unconditional ANOVA with random effects is run with the outcome variable of interest as the dependent variable and the cluster variable as the categorical grouping variable. The output includes between-group and within-group variances. The ICC coefficient can be calculated by dividing the between-group variance by the sum of the between-group variance and the within-group variance (i.e., total variance of the dependent variable).

ICC = Between-Group Variance / (Between-Group Variance + Within-Group Variance)
on lower-level outcomes, and whether or not second-level substantive variables exist in the dataset. Before choosing between these options, however, social work researchers can determine through a simple analysis (see Box 3.5) the degree of clustering in their data, assuming a variable representing the higher-level unit is available. The intraclass correlation (ICC) coefficient measures the proportion of total variance in the outcome variable that is explained by differences between groups. If all observations are independent, the ICC will equal zero; conversely, if all of the variance lies between clusters (i.e., observations within each cluster are identical), the ICC will equal one. An ICC value other than zero implies that the observations are not fully independent and multilevel modeling may be necessary.

Deciding what the ICC implies for one’s analysis is not as straightforward as we would like. As with many of the decision points in SEM analysis, experts disagree on how much explained variance must be attributable to the higher unit before a clustered or hierarchical modeling procedure should be used. Some researchers suggest that multilevel modeling must be considered if the ICC is 0.25 or higher (Heinrich & Lynn, 2001; Kreft, 1996). Others suggest that multilevel methods must be considered when the ICC is “more than trivial”—that is, anything greater than 0.10 (Lee, 2000, p. 128). However, Kreft and de Leeuw (1998) illustrate that even smaller ICCs (e.g., 0.05) can substantially inflate the possibility of making
Box 3-6 Best Practices for Understanding the Implications of Clustered Data

Consider whether observations in your dataset are clustered or nested, even if the participants were not intentionally sampled by cluster (e.g., family, school, organization, geographic location).
Evaluate the intraclass correlation for units in which observations are clustered to determine if a hierarchical approach is recommended. ICCs should be obtained for all dependent variables of interest.
If necessary, employ special analysis options available in software programs to take into account the clustered nature of the data. Specific instructions for conducting analyses with clustered data in Amos and Mplus will be provided in the online materials.
a Type I error, depending on the study’s sample size. In addition, the judgment of what is considered a high ICC can vary depending on the area of research. For example, descriptive studies of many commonly used outcomes in neighborhood and school research show few ICC values greater than 0.25 (Cook, 2005), although the need for multilevel modeling is widely accepted in these research areas. The What Works Clearinghouse (2008) assumes that correcting for clustering in educational research is necessary with ICC values at least as low as 0.10. In general, therefore, we recommend using a clustered model technique if complex sampling was used or if subjects are grouped in clusters that research in your area has shown to influence the outcomes you study. As with missing values and variable properties, sensitivity analyses (Saltelli et al., 2008) exploring the effects of ignoring or taking into account clustering in one’s data may help researchers understand the implications of analysis choices and justify those choices. Box 3.6 offers best practices for understanding the implications of clustered data.
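The ICC calculation described in Box 3.5 can also be scripted. The sketch below is a minimal example, assuming the Python pandas and statsmodels libraries, a hypothetical data file (workers.csv), a hypothetical outcome (support), and a hypothetical cluster identifier (agency). It estimates the between- and within-group variance components from an unconditional random-intercept model rather than an ANOVA, but the ICC formula is the same.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("workers.csv")   # hypothetical file with 'support' and 'agency' columns

# Unconditional (intercept-only) model with a random intercept for the cluster variable
result = smf.mixedlm("support ~ 1", data=df, groups=df["agency"]).fit()

between_var = result.cov_re.iloc[0, 0]   # between-group variance
within_var = result.scale                # within-group (residual) variance
icc = between_var / (between_var + within_var)
print(f"ICC = {icc:.3f}")
```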
FINALIZE VARIABLES NEEDED FOR ANALYSIS AND SAVE THE DATA FILE

After addressing each of the considerations introduced up to this point, the researcher must finalize and save the data file in a format that can be read by the SEM program to be used. SEM programs can also analyze covariance or correlation matrices, which may be generated and saved from a statistical program, or entered manually into a text or Excel file. As discussed earlier, an advantage to using a raw data file is the ability to
Box 3-7 General Steps for Finalizing Variables and Data Files for Analysis

Evaluate sample size in relation to recommendations for SEM analyses.
Assign codes to missing values if required by your SEM program.
Conduct all recodes and transformations in the source program.
Evaluate the measurement level, distributional characteristics, and clustered nature of your data.
Divide the dataset into at least two random subsamples that can be used for calibration and validation of your SEM models.
Generate frequency and bivariate correlation statistics on raw data files, if they are to be used in the SEM analysis. Use the output later to confirm that the SEM program is reading the data files correctly.
use FIML for missing values in Amos and Mplus. Before saving the input data file, all data cleaning, any necessary data transformations and recodes, and the handling of missing values should be completed. Files may contain variables that will not be included in SEM analyses, but they will all have to be named and listed in order if a text file without variable names is used for the analysis (as is required in Mplus). Amos can read text files with .txt and .csv extensions, SPSS and Excel files, and files in several other formats. If your data are in one of these formats, there is no need to convert them.

We should note that the replication of SEM studies is made possible by the fact that SEM programs analyze covariance or correlation matrices. It is recommended that researchers present their analysis matrices in manuscripts reporting on SEM studies. Other researchers can then replicate analyses and/or conduct studies of alternative models, simply by entering the published matrix into Excel or another application. The general steps for finalizing variables and data files for analysis are summarized in Box 3.7 and are described in the following sections.
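For users preparing a headerless text file of the kind described above, the export can be done in a few lines. This is a minimal sketch, assuming the Python pandas library and hypothetical file and variable names; the missing-value code (-999) is likewise only an example and should match whatever code your SEM program expects.

```python
import pandas as pd

df = pd.read_csv("data.csv")                                   # hypothetical cleaned file
analysis_vars = ["supcares", "suphelps", "FRlunch2", "income_k"]

# Write a space-delimited file with no header row; keep a record of the variable order,
# because the SEM program will need the names listed in this same order
df[analysis_vars].to_csv("data.dat", sep=" ", header=False, index=False, na_rep="-999")
print(analysis_vars)
```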
Name Observed Variables

Users of SEM software will conduct analyses either with data files containing names for observed variables or text files without variable names. If an SEM program is reading a dataset with variable names, the names of observed variables in the SEM model must match those in the dataset. For example, if Amos is reading an SPSS file with variable
names Supcares and Suphelps, the user must use those variable names when specifying that the two variables are expected to load on a latent construct (e.g., supervisor support). If the SEM program is reading a text file in which variables appear in order but have no variable names, the user will provide variable names in the SEM program. In this situation, it is not necessary to use the same names the observed variables had in the source data file. The only constraint on this process is the desirability of being able to quickly and unequivocally determine which variable is which in analysis syntax and program output. In most cases, therefore, the names for observed variables provided to the program should be similar to, if not the same as, those in the original data file (unless they were always only in text format). Variable names must follow the naming protocol of the software to be used. Program-specific naming conventions are presented on the companion website.
Complete All Recodes and Transformations

It is advisable to create any transformed or recoded variables needed for an SEM analysis in the more familiar environment of your usual statistical program. If demographic variables need to be recoded into dummy variables, for example, create the new variables before saving files for analysis with your SEM program. Any variables that include “not applicable,” “don’t know,” or other response options that are not consistent with the numbering scheme of response options should be recoded. Often such options need to be recoded as missing values; other times it may be appropriate to combine them with another existing option. New variables created through transformations and recoding should be named in such a way that it is clear which original variable they are associated with and how they have been modified. For example, a new “free reduced price lunch” variable containing two categories (yes, lunch program participation and no program participation) collapsed from a three-category variable (e.g., free lunch, reduced price lunch, and no participation) might be named FRlunch2 to distinguish it from the original variable. Another strategy to avoid confusion about related variables is to save a copy of the data file and then delete all variables that have been replaced with recodes for the SEM analysis. Users are likely to develop their own preferred conventions for handling these issues.
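The kinds of recodes described in this section can be completed in any general statistics program or scripting environment before the file is saved. A minimal sketch, assuming the Python pandas library; the variable names and the response codes treated as “don’t know” (8 and 9) are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("data.csv")   # hypothetical file

# Recode "don't know" / "not applicable" codes (here assumed to be 8 and 9) to missing
df["supcares"] = df["supcares"].replace({8: np.nan, 9: np.nan})

# Collapse a three-category lunch variable (1 = free, 2 = reduced price, 3 = none)
# into a two-category version, keeping the original variable intact
df["FRlunch2"] = df["FRlunch"].map({1: 1, 2: 1, 3: 0})

# Dummy-code a nominal demographic variable before export
df = pd.get_dummies(df, columns=["race"], drop_first=True)
```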
Create Random Subsamples for Calibration and Validation of Models

As with any inferential statistics, the generalizability of SEM results depends on selection bias and the degree to which a sample represents or fails to represent its purported population. Social work researchers publishing the results of SEM analyses must adequately report on sampling procedures and the relationship of the sample to a study’s sampling frame and population. Study results are still vulnerable to unknown sampling fluctuations; researchers can only report their procedures thoroughly so that others can evaluate generalizability. Ideally, researchers can validate their findings on data from new samples. Such validation lends credibility to findings. In the absence of data from two totally separate samples, many researchers using SEM believe it valuable to replicate findings on random samples of the currently available sample. This step is especially important, in the absence of a new sample, when modifications to improve fit are made based on SEM output. Even across random subsamples, modifications may not replicate, reducing confidence in findings. Bootstrapping of estimates, in which multiple random subsamples are drawn from the current sample (which serves as the “population”), can also serve the purpose of ensuring that the results from any one sample are not statistical flukes.

We recommend the use of a “test,” “calibration,” or “development” sample for the development of an adequate model, and then the use of a “validation” or “confirmation” sample to validate the findings of the first sample. Data files containing calibration and validation samples should be generated with the random sampling procedure in a general statistical program. Creating two test files of adequate sample size requires a large original sample (see the section in this chapter on sample size). When the original sample is not large enough to create calibration and validation files, bootstrapping estimates and the testing of alternative models may become even more important for establishing credibility of results. Interpreting and comparing results obtained from calibration and validation samples will be discussed in Chapter 6.
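Generating calibration and validation files with a random sampling procedure is straightforward in most environments. A minimal sketch, assuming the Python pandas library and a hypothetical cleaned data file; the 50/50 split and the random seed are arbitrary choices:

```python
import pandas as pd

df = pd.read_csv("data.csv")   # hypothetical cleaned file

# Randomly assign roughly half of the cases to a calibration sample;
# the remaining cases form the validation sample
calibration = df.sample(frac=0.5, random_state=2012)
validation = df.drop(calibration.index)

calibration.to_csv("calibration.csv", index=False)
validation.to_csv("validation.csv", index=False)
```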
Double-Check that Data Files Are Read Correctly by the SEM Program

All SEM programs require information about the input data—at a minimum where the file is to be found, and in some cases, the variable
order, names, and format. Users should always confirm that the data are being read correctly by the SEM program. Sample size, number of clusters (if applicable), frequencies, and correlation output from the SEM program, for example, should all be compared with statistics generated with the data file’s source program.
4
Measurement Models
This chapter describes when and how to conduct a confirmatory factor analysis (CFA). CFA is a step in the scale development process, and it is also the first step in testing structural models. Therefore, all researchers using a latent variable analysis approach must have an understanding of CFA, whether or not they are developing and testing a new scale. Researchers primarily interested in testing structural models with latent variables should read this chapter before Chapter 5, which focuses on structural tests that are conducted after a measurement model is established. Before going into depth about CFA, it may be useful to contrast it to exploratory factor analysis (EFA).
EXPLORATORY VERSUS CONFIRMATORY FACTOR ANALYSIS

In general, factor analysis methods are used to analyze the relationships among measured variables to determine whether the observed variables can be grouped into a smaller set of underlying factors or theoretical constructs (Thompson, 2004; Worthington & Whittaker, 2006). Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are two approaches in the factor analysis family. Although both approaches are often used in the scale development process, EFA and CFA serve different purposes and answer different questions. Table 4.1 provides an overview of the purpose and methods of EFA and CFA.
Table 4.1 Comparison of Exploratory and Confirmatory Factor Analysis

Purpose
Exploratory factor analysis: Explore the nature of the dimensions of the latent variable and how scale items relate to dimensions: (a) How many dimensions of a phenomenon are represented by scale items? (b) Which items are associated with each dimension? Insufficient to establish psychometric properties of an instrument. Used early in the scale development process to answer preliminary questions about a measure’s factor structure and item performance.
Confirmatory factor analysis: Test stated hypotheses regarding the nature of the dimensions of a latent variable and how scale items relate to dimensions: (a) Do the data support the hypothesized dimensions of the latent variable? (b) Do the indicators measure the latent variable well? Critical in establishing psychometric properties of an instrument. Used at or near the end of scale development to test: (a) how well items measure hypothesized dimensions (based on theory/past research) of latent variables, and (b) whether measures are invariant across time and/or populations.

Methods
Exploratory factor analysis: Only the observed variables are prespecified. The number and structure of factors is determined by examining analysis output. Every variable is allowed to load on every factor. Factor rotation aims for simple structure. All parameters are freely estimated. Error variances cannot be correlated. Either all factors are allowed to correlate or none.
Confirmatory factor analysis: The researcher specifies the factor structure, including the number of factors, which variables measure which latent factors, and which factors are correlated. No factor rotation needed because a priori models generally specify simple structure. Some parameters can be estimated from data (free) while some are not (fixed). Correlations of various pairs of error variances can be estimated as appropriate. Correlation of various pairs of factors can be estimated as deemed appropriate.
Readers are referred to Thompson (2004) for a more detailed discussion of EFA and its purpose, methods, and relationship to CFA.
CONTEXTS FOR USING CONFIRMATORY FACTOR ANALYSIS

CFA tests measurement models, that is, the relationships among hypothesized latent variables and the observed variables whose scores they influence. CFA is commonly used for the following purposes:

1. To confirm the factor structure and quality of a new scale or instrument
2. To determine if a modified scale or instrument performs adequately
3. To establish that the use of observed composite variables in research and practice is justified
4. To determine if an existing scale or instrument performs adequately for a new population
5. To determine if an existing scale or instrument performs the same across two or more populations
6. To confirm the factor structure and quality of an existing scale or instrument that is being used in practice but has not undergone rigorous testing
7. To determine that a measurement model is adequate for the available sample before performing a substantive latent variable analysis

One primary use of CFA is to establish the psychometric qualities of a new or modified instrument being introduced to the social work research and practice literature. Instruments may consist of one scale or factor measuring a unidimensional construct, two or more scales (or subscales) measuring multiple dimensions of a construct, or multiple scales and subscales measuring many different constructs. CFA can help establish the construct validity of a new measure: for example, by demonstrating that observed variables are adequate indicators of proposed latent variables, that latent variables measure distinct dimensions, and that the dimensions are substantively consistent with theory and prior research. CFA can also evaluate the presence or absence of more complex relationships among factors—such as second-order factor structures (described later) and correlated error structures within or between factors. It can also be used to test criterion validity (Bollen, 1989), that is, that scores
from a new measure are highly correlated with scores from an existing measure of the same construct. When it is assumed that an existing scale will not perform adequately for a new population without modifications, the scale may be adapted before being tested. Scales may be translated into a different language and then tested using CFA (e.g., McMurtry & Torres, 2003). Others may be adapted for different age groups or for specific target groups or settings. The School Success Profile (SSP; Bowen, Richman, & Bowen, 2002; Bowen, Rose, & Bowen, 2005), designed for adolescents, for example, underwent extensive modifications to be appropriate for younger children (Bowen, 2008a; Wegmann, Thompson, & Bowen, 2011).

CFA is also used to establish the appropriateness of using composite scores in practice and research. Practitioners often sum, average, or otherwise combine the scores of a set of assessment items and use the new score for decision making. Before combining scores, however, they should have statistical evidence that it is valid to do so. Similarly, it is not uncommon for researchers to combine items in secondary datasets to test hypotheses involving constructs that were not specifically targeted in the original instrument. Before testing relationships among constructs, researchers must establish that the proposed “scales” have adequate statistical qualities. One example of this use of CFA comes from the child welfare literature. Researchers (Yoo & Brooks, 2005) studying how organizational characteristics affect service effectiveness, for example, established the adequacy of their measures using CFA (in a separate study) before conducting hierarchical linear modeling with composite scores. Eight dimensions of organizational context were measured with an instrument that combined existing scales, adapted scales, and newly developed scales.

Identifying valid composites is a data reduction goal—reducing a set of items to a more parsimonious subset of items that can be used in practice or research. Confirming that three components of posttraumatic stress disorder are well represented by the 17 items in the Secondary Traumatic Stress Scale introduced in Chapter 1 (Bride et al., 2004), for example, may indicate that social workers can make practice decisions based on composite scores for those dimensions instead of trying to consider all 17 scores simultaneously. Similarly, a researcher examining the implications of indirect trauma for providing mental health services to practitioners could conduct analyses using the three observed composite
scores (or latent variables) instead of 17 individual item scores. In addition to providing evidence that multiple items collectively measure the same phenomenon well, CFA can provide appropriate weights (factor loadings) for the calculation of composite scores, indicating how important each item is to the overall measure. This weighting occurs automatically when latent variables are used in general structural models, but it is ignored when observed composite scores are created by simply averaging or summing scores on a set of items.

CFA is also used to determine if an existing scale can be used with a population different from the one for which it was intended. Often scales are tested with groups who differ in terms of gender, age, or culture from the group for which they were developed. Similarly, scales meant for measuring worker competence or organizational capacity in one type of human service organization (e.g., a public child welfare agency) might be used in a different type of service agency (e.g., a nonprofit housing agency). The research question behind these analyses is: “Is the scale an adequate measure of the construct in a new population (of individuals or organizations)?” Kelly and Donovan (2001), for example, tested an alcohol screening tool used with adults to see if it performed adequately among adolescents admitted to emergency rooms. The goal of such tests is to determine whether a scale is appropriate for a new population or if modifications are needed. A related research question that requires a different CFA approach is: “Does the scale perform differently for one population than for another, and if so, how?” Multiple-group CFA allows simultaneous tests of data from two or more populations to see if and how their measurement models differ, and if the differences are statistically significant. These tests are tests of measurement invariance or partial measurement invariance.

Sometimes CFA is performed in order to more rigorously test an existing instrument that has been in use in research or practice. The widely used Professional Opinion Scale, for example, was retested with CFA years after its factor structure had been tested with EFA (Abbott, 2003). As more social workers become statistically savvy and the software for sophisticated procedures becomes more accessible, this use of CFA is likely to increase.

The final common use of CFA is for the establishment of good fit of measurement models before researchers proceed to substantive hypotheses testing. The value of establishing measurement model adequacy
before testing structural relationships in SEM (Anderson & Gerbing, 1988) is widely considered a best practice (Bollen, 2000), although here, as in other areas of practice, there are multiple perspectives. The rationale for using this approach will be presented in Chapter 5. In a study of the effects of supervisory communication on social workers in health care settings (Kim & Lee, 2009), researchers identified inadequacies with their measure of burnout, even though they were using an established tool to assess the construct. Based on their findings, the researchers modified the measurement model before proceeding to their structural tests.
CONFIRMATORY FACTOR ANALYSIS IN THE INSTRUMENT DEVELOPMENT PROCESS

Confirmatory factor analysis is a highly recommended component of the scale or instrument development process. Table 4.2 illustrates three possible scale development paths a social work researcher might take to arrive at the point of using CFA. The quantitative process, described by DeVellis (2003), is more commonly reported in the literature and may be appropriate when the constructs to be assessed are relatively well understood. The mixed-methods process may often be appropriate in social work because social workers often work with social problems, populations, and constructs that are understudied, for example, victims of intimate partner violence, the homeless, older adults with Alzheimer’s, traumatized children, community capacity, and neighborhood organization. Interviewing individuals to gain understanding of their perceptions of the nature and scope of newly studied constructs may be a critical and time-saving step in the development of appropriate scales.

Cognitive testing is a qualitative scale development method in which individuals with the characteristics of intended respondents of a new scale are interviewed while responding to questionnaire items (Willis, 2005). For example, respondents may be asked to read each questionnaire item aloud, to explain their understanding of what the item is asking, and to explain why a particular response was selected rather than other response options (Bowen, 2008a). This process allows the researcher to judge if questionnaire items and response options are being interpreted as intended. Cognitive testing may be a critical step in the development of any scale targeting a population whose experiences or cognitive
Table 4.2 Possible Paths to CFA in the Evaluation of Social Work Measures

Quantitative scale development: Conduct literature review. Create initial item pool. Solicit expert feedback. Pilot test revised item pool. Examine distributions. Collect data from a large sample. Conduct exploratory factor analysis. Conduct confirmatory factor analysis. Conduct reliability and additional construct and criterion validity tests.

Mixed-methods scale development: Conduct literature review. Interview intended respondents about construct. Create initial item pool. Solicit feedback from academic experts and experts from intended setting. Cognitively test items and response options with intended respondents. Solicit expert feedback on revised item pool and format. Pilot test revised item pool. Examine distributions. Collect data from a large sample. Conduct exploratory factor analysis. Conduct confirmatory factor analysis. Conduct reliability and additional construct and criterion validity tests.

Tests of existing measures: Test an existing or adapted measure for use with a new population. Collect data from a large sample. Conduct exploratory factor analysis (optional). Conduct confirmatory factor analysis. Conduct reliability and additional construct and criterion validity tests.

Note. The quantitative scale development steps are detailed by DeVellis (2003). Most of the mixed-methods steps were used by Bowen and colleagues (Bowen, 2006, 2008a; Bowen et al., 2004) in the development of the Elementary School Success Profile.
processes may differ substantially from the researchers’ knowledge base. For example, one group of researchers (Bowen, 2008a; Bowen, Bowen, & Woolley, 2004) describes the use of cognitive testing in the development of a computerized social environmental assessment for children in grades 3 through 5. Cognitive testing with children as they read and responded to questions led to substantial revisions in items, indicating that the “expert” scale development team had not accurately predicted the effects of concrete thinking on the responses to questionnaire items of children in middle childhood. As illustrated in Table 4.2, CFA is often used after EFA results suggest a factor structure for a set of items. Exploratory factor analysis is a useful preliminary step in the testing of a measure’s factor structure and item performance. Exploratory factor analysis results, however, are not generally sufficient for fully establishing the psychometric properties of measures to be offered as high-quality research or practice tools. Models that meet all criteria for adequacy in an exploratory framework may not “pass” all confirmatory factor analysis tests. Abbott (2003), for example, based her CFA of data from the Professional Opinion Scale on results of EFAs conducted 10 years earlier. This researcher was interested in more rigorous tests of the quality of the Professional Opinion Scale after concluding that it was likely to continue being used in practice in spite of mixed results of past exploratory analyses. Also, as illustrated in the third column of Table 4.2, some social workers arrive at CFA without starting the scale development process from scratch. Social workers may use CFA to determine if an existing scale is appropriate for use with a different population than the one for which it was developed. Kelly and Donovan (2001), for example, used CFA to test the factor structure of data collected from adolescents with an existing substance abuse assessment tool for adults. Researchers who have collected data with a translated or adapted version of an existing measure may use CFA to confirm that the factor structure of the original instrument is also supported in data collected from a new population. McMurtry and Torres (2003) used recommended instrument translation procedures, EFA, and then CFA to validate the factor structure of the Spanish version of the Client Satisfaction Inventory. Whether a researcher uses a purely quantitative scale development process or a mixed-methods approach, confirmatory factor analysis is a valuable part of the process. CFA provides evidence of the overall quality
of a measure, the dimensionality of the assessed construct, the quality of the observed variables as indicators of the construct, and information about the relationships among factors. In addition, CFA provides information on a number of less familiar aspects of scales that cannot be obtained through EFA, including error structures, invariance across groups, second-order factor structures, and statistical comparisons of alternative models.
STEPS IN CFA/SEM
Four major steps of SEM analyses (both CFA and general SEMs) are discussed in the next three chapters. The chapters provide examples that can serve as instructions for social workers conducting their own analyses. Box 4.1 lists the major steps of SEM analysis and where each step is discussed in the text. Additional resources, examples, and specific software instructions can be found at the book's website.
SPECIFICATION OF MEASUREMENT MODELS
Overview of CFA Specification
Specifying a measurement model involves defining how many factors are expected to be represented by data collected with the observed indicators, which variables are related to each factor, which latent variables
Box 4-1 Four Major Steps of SEM Analyses and Where They Are Discussed
1. Model Specification (CFA: starting on p. 81 in Chapter 4; General SEM: starting on p. 111 in Chapter 5)
2. Estimation (CFA: starting on p. 100 in Chapter 4; General SEM: starting on p. 123 in Chapter 5)
3. Evaluation of Results (CFA and General SEM: Chapter 6)
4. Model Modifications (CFA and General SEM: Chapter 6)
are correlated, and which error terms, if any, are correlated. Only observed variables and the latent variables they are hypothesized to measure are included in a measurement model. Single-indicator observed variables, such as “gender” or “income,” are not included, even if they are expected to be included later in a structural model. Unlike in the common factor model of EFA, in which every variable is allowed to load on every factor, CFA models typically allow each observed variable to load on only one factor. When theoretically justified, however, one or more observed variables may load on multiple factors. Decisions about which variables load on which factor must be justified in CFA, but what qualifies as justification is subjective. As with the reporting of most statistical procedures, researchers must be explicit about their choices. Readers can then make their own judgments about the procedures. Ideally, theory and previous analyses with similar or different populations can offer support for factor structure hypotheses. For example, Wegmann, Thompson, and Bowen (2011) hypothesized a multifactor structure for items on the Elementary School Success Profile (ESSP) for Parents based on the theory-based ecological domains and dimensions assessed by the ESSP and the adolescent questionnaire upon which the ESSP was based. EFA results can also be used to justify a hypothesized factor structure. However, because EFA estimation procedures are different and modeling options more limited, they are not necessarily the best or only source of factor structure hypotheses. Because CFA is a confirmatory method, models should be based on theory, EFA results, and/or performance of the measure in prior analyses. Among the models of the Alcohol Use Disorders Identification Test examined by Kelly and Donovan (2001), for example, was a 1-factor model and a 2-factor model, each of which was supported by previous analyses. Abbott (2003) based her CFA model on prior EFA studies. A study of the Spanish version of the Client Satisfaction Inventory (McMurtry & Torres, 2003) refers to prior research and theory to justify the models tested. In specifying a model, the user indicates which parameters are to be fixed, constrained, or freely estimated. Fixed parameters are those for which the user designates a value. Fixed parameters are not estimated by the software and therefore do not count against the degrees of freedom available for estimating a model. In CFA, the values to which parameters are most likely to be fixed are 0 or 1. Parameters are typically fixed to 0 by the deliberate failure to specify a relationship between two variables.
As discussed in Chapter 2, one loading per factor is usually fixed at 1 to set a metric for the factor. In most SEM programs, the regression lines between error variances and observed variables are also fixed at 1. Fixing these parameters to 1 is necessary for the scaling of the error variance and for identification purposes. If instead the error variances were fixed to 1, the regression coefficients would be estimated. In SEM models, however, error variances are of substantive interest. In certain circumstances, parameters may be fixed at other values. Freely estimated parameters are those that the software will estimate using information from the input matrix or matrices. Factor and error variances, correlations, and factor loadings that have been specified as part of the measurement model, and have not been fixed or constrained, will be freely estimated. The more parameters that are estimated (and the fewer degrees of freedom), the better the fit of a model will be; if every parameter in a model is freely estimated, model fit will be perfect. Estimating fewer parameters is rewarded in some fit indices because it is more difficult to obtain adequate fit for such models. Constrained parameters are parameters that are specified by the researcher to have the same value, even though that value must be estimated. Constrained parameters are commonly encountered in multiplegroup CFAs. To determine if a latent variable has the same relationship to hypothesized indicators for two or more populations, such as boys and girls, the quality of the model with and without factor loadings constrained to be equal across groups is compared. In this example, a single value for the factor loading will be estimated and applied to both groups. We will discuss multiple-group CFAs further in Chapter 7. Naming Latent Variables. Because latent variables (factors and error variances) do not exist in the observed dataset, the user provides names for these variables when specifying the model. Regardless of the program being used, it is helpful to name latent variables with an abbreviated form of the construct being hypothesized. Output is much easier to interpret when variable names are descriptive. This rule is true for latent error terms as well. Use brief labels that make it clear to which observed variable an error term belongs. SEM output is complicated and lengthy; having a logical and consistent variable-naming plan makes it much easier to determine which parameters are adequate or problematic. The naming of latent factors and errors is most often guided by substantive and theoretical considerations and the meaning of indicators.
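To make these distinctions concrete, the following fragment sketches how fixed, freely estimated, and constrained loadings, and descriptive latent variable names, might look in Mplus. The factor and item names (BELONG, SAFETY, bel1 through saf3) are hypothetical and are not drawn from any of the studies discussed in this chapter.

MODEL:
  BELONG BY bel1        ! first loading fixed at 1 by default, setting the factor's metric
            bel2*       ! an asterisk marks a freely estimated parameter
            bel3*;
  SAFETY BY saf1
            saf2 (L2)   ! loadings given the same label in parentheses are constrained
            saf3 (L2);  ! to be estimated at a single common value
  BELONG WITH SAFETY;   ! interfactor covariance (estimated by default in Mplus)

Descriptive names such as BELONG and SAFETY, rather than F1 and F2, carry through to the output and make it easier to see which parameters belong to which construct.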
Box 4-2 Four Specification Steps
1. Specify how many latent variables there are and which observed variables load on each one (regardless of software choice, drawing a path diagram as part of this step is helpful in visualizing the model you intend to specify).
2. Set the scale of each latent variable.
3. Specify that each observed indicator has measurement error and indicate if any of the error terms are correlated.
4. Specify which latent factors are correlated.
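For readers working in Mplus, the skeleton below maps each of the four steps in Box 4-2 onto syntax; the factor and item names are hypothetical, and Amos users would accomplish the same steps by drawing the corresponding diagram elements.

MODEL:
  FACTOR1 BY item1 item2 item3;   ! Step 1: which indicators load on which factor
  FACTOR2 BY item4 item5 item6;   ! Step 2: the first loading of each factor is fixed
                                  !         at 1 by default, setting its scale
                                  ! Step 3: error terms are added automatically for each
                                  !         indicator, with their paths fixed at 1
  item3 WITH item6;               !         an optional correlated error, if justified
  FACTOR1 WITH FACTOR2;           ! Step 4: interfactor covariance (estimated by default)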
As illustrated in Chapter 2, the details of CFA model specification can be conveyed graphically, with equations, or with matrices. All of these formats convey the same information. Amos users create graphics of their models, and Mplus users specify equations in simple verbal format. It is valuable to become familiar with all of these formats because the literature includes them all and they collectively reinforce understanding of the underlying mechanisms of SEM. In the following section we provide an example of CFA specification that follows four steps that can be applied by social work researchers specifying their own models. These steps are summarized in Box 4.2. CFA Specification Example The CFA reported in Kelly and Donovan (2001) will be used to illustrate Specification Steps 1 through 4. The three models tested by the authors will be specified graphically, in equations, and in matrix format. The graphic representations were developed in Amos and illustrate how models are specified using that program. Mplus model specification syntax is also presented to illustrate its relationship to the equations format. The online materials give more detail about how to use the two programs to specify and estimate CFA models. The authors tested the factor structure and adequacy of data from the Alcohol Use Disorders Identification Test (AUDIT, Kelly and Donovan, 2001), which was collected from adolescents in an emergency room. This 10-item assessment, according to the authors, was designed by the World Health Organization. The authors indicated that previous studies of the
AUDIT with adults supported both a 1-factor and 2-factor structure. They undertook their study of the instrument because few studies had examined the AUDIT with adolescents and none had evaluated the factor structure with CFA. Because the instrument is widely used, they wanted to evaluate its validity with adolescents. Although the authors used only one sample to test the instrument, they tested multiple models to determine which fit the data best. They tested 1- and 2-factor models based on previous research with adult samples. They also tested a 3-factor model because the instrument was originally designed to assess three constructs: "alcohol consumption, drinking related problems and alcohol dependence" (p. 838). Following this paragraph is the verbatim brief description of each of the 10 items on the AUDIT (Kelly & Donovan, p. 840). We have invented 8-letter names for the observed variables based on the descriptions provided in the article. The authors provided names for the latent variables in their 2- and 3-factor models: Consumption, Problems, and Dependence. We have also named the latent variable in the 1-factor model and the latent error terms in all three models. If the authors used a data file containing variable names, the observed variable names specified in the models would have had to match those names. If they used a text file, they could have entered the original or new variable names in the SEM program.
Variable 1: Frequency of drinking (FREQ)
Variable 2: How many drinks do you have (NUMDRINK)
Variable 3: How often do you have six or more drinks (SIXPLUS)
Variable 4: How often could you not stop drinking (CANTSTOP)
Variable 5: How often did you not do what you were supposed to do (FAILTODO)
Variable 6: How often did you need a drink to get going (GETGOING)
Variable 7: How often did you feel guilt or remorse about drinking (GUILT)
Variable 8: How often could you not remember (MEMLOSS)
Variable 9: Injury as a result of alcohol use (INJURY)
Variable 10: How often did a friend, family member or health care worker show concern (OTHCARE)
Researchers conducting the CFA study of the AUDIT (Kelly & Donovan, 2001) used the best practice of testing alternative models.
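If the analysis were run in Mplus, the invented variable names could be declared along the following lines. The file name (audit.dat) and the layout of the raw data are hypothetical, supplied only to show where the names enter the syntax.

DATA: FILE IS audit.dat;    ! hypothetical text file of raw item responses
VARIABLE: NAMES ARE FREQ NUMDRINK SIXPLUS CANTSTOP FAILTODO
                    GETGOING GUILT MEMLOSS INJURY OTHCARE;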
The authors compared models with one, two, and three latent variables. Figures 4.1 through 4.9 present graphic, equation, and matrix representations of the three models tested in the study. The path diagrams in Figures 4.1, 4.4, and 4.7 were created using Amos and indicate exactly how each model would be specified in that program. The graphics use the convention of circles or ovals for latent variables and squares or rectangles for observed variables. Figures 4.2, 4.5, and 4.8 present how the models would be specified with generic SEM equations and with Mplus model specification syntax. (Details on other components of the syntax necessary for running a model in Mplus are presented in the online materials available to readers.) Figures 4.3, 4.6, and 4.9 illustrate each of the three models in matrix notation. Specification Step 1. Specify how many latent variables there are and which observed variables load on each latent variable. The CFA study of the AUDIT (Kelly & Donovan, 2001) also demonstrated the best practice of basing models on previous research and theory. The 1-factor model was based on findings from a previous study of AUDIT data collected from adults. In the graphical representation of the model in Figure 4.1, the 10 arrows coming from AUDIT to the observed variables indicate that all 10 of the variables load on one factor. In the factor equations presented in Figure 4.2, each observed variable is written as a function of the same factor (ξ1, AUDIT). And the Λ matrix shown in Figure 4.3 for the 1-factor model is a vector (i.e., has only one column) because all 10 observed variables load on the same factor. One variable (x9) was removed from subsequent models in the Kelly and Donovan (2001)
[Figure 4.1 Path Diagram of the 1-Factor Model of the AUDIT* (Using Amos). The diagram shows a single latent factor, AUDIT (ξ1), with arrows pointing to all 10 observed variables, FREQ (x1) through OTHCARE (x10). The loading of FREQ is fixed at 1; the loadings λ21 through λ10,1 are freely estimated. Each observed variable has an associated error term, δ1 through δ10, with the path from each error term to its indicator fixed at 1. *Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001)]
study based on a problematic pattern of covariances with other variables in the model. In the 2-factor model presented in Figures 4.4 through 4.6, the nine remaining observed variables load on either the Consumption factor or the Problems factor. In the path diagram presented in Figure 4.4, three arrows emerge from the Consumption factor and point to three observed variables. The items loading on Consumption relate to how often and how much alcohol is consumed. Six other arrows emerge from the Problems factor and point to the remaining observed variables, which appear to represent undesirable consequences of drinking, making the Problems label for ξ2 appropriate. The first three variables in the factor equations in Figure 4.5 are written as functions of ξ1, and the last six are written as functions of ξ2. In the matrix representation of the model, shown in Figure 4.6, the Λx matrix now has two columns, one for each factor. All of these methods of specifying the model provide the same information. Figures 4.7 through 4.9 illustrate specifications of a 3-factor AUDIT model. The diagram in Figure 4.7 includes three latent factors, each with
a. Generic SEM equation specification
x1 = (1)ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3
x4 = λ41ξ1 + δ4
x5 = λ51ξ1 + δ5
x6 = λ61ξ1 + δ6
x7 = λ71ξ1 + δ7
x8 = λ81ξ1 + δ8
x9 = λ91ξ1 + δ9
x10 = λ10,1ξ1 + δ10
Notes: x1 through x10 are the observed variables FREQ through OTHCARE. ξ1 is AUDIT, a latent variable representing the true score on the construct. The factor loading of x1 on AUDIT is fixed at 1; λ21 through λ10,1 are freely estimated.
b. Mplus Equation Specification of the 1-Factor Model of the AUDIT*
MODEL: AUDIT by FREQ NUMDRINK SIXPLUS CANTSTOP FAILTODO GETGOING GUILT MEMLOSS INJURY OTHCARE;
Note: Mplus automatically fixes the loading of the first indicator of the factor (x1) at 1.0 and assumes that all 10 observed variables have associated measurement errors with paths fixed at 1.0 and variances freely estimated.
*Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001)
Figure 4.2 Equation Specification of the 1-Factor Model of the AUDIT*.
Λx = [1, λ21, λ31, λ41, λ51, λ61, λ71, λ81, λ91, λ10,1]′
The Λx matrix contains the loadings of the 10 observed variables on ξ1.
Φ = [φ11]
The Φ matrix contains only the variance of ξ1 because there is only one latent variable. In this example, φ11 is a scalar. When more than one latent variable is specified, the Φ matrix contains additional elements: variances for the additional latent variables and covariances between latent variables.
Θδ = a 10 × 10 diagonal matrix with the error variances θ11, θ22, . . . , θ10,10 on the diagonal. Error covariances (the off-diagonal elements) are fixed at 0.
*Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001)
Figure 4.3 Matrix Specification of the 1-Factor Model of the AUDIT*.
three hypothesized indicators. Three latent ξ variables now appear in the set of equations representing the 3-factor model in Figure 4.8. And in Figure 4.9, the Λx matrix now contains three columns. In the 3-factor model, three items that loaded on Problems in the previous model load on a new factor called Dependence: CANTSTOP, FAILTODO, and GETGOING.
[Figure 4.4 Path Diagram of the 2-Factor Model for the AUDIT* (Using Amos). The diagram shows two correlated latent factors: Consumption (ξ1), with arrows to FREQ (x1), NUMDRINK (x2), and SIXPLUS (x3), and Problems (ξ2), with arrows to CANTSTOP (x4), FAILTODO (x5), GETGOING (x6), GUILT (x7), MEMLOSS (x8), and OTHCARE (x10). The loadings of FREQ and CANTSTOP are fixed at 1; the remaining loadings are freely estimated. Each observed variable has an error term with its path fixed at 1, and the covariance between the two factors is freely estimated. *Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001). Note: One variable (x9) was removed from the 2-factor and 3-factor models in the Kelly and Donovan (2001) study based on a problematic pattern of covariances with other variables in the model.]
a. Generic SEM Equation Specification
x1 = (1)ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3
x4 = (1)ξ2 + δ4
x5 = λ52ξ2 + δ5
x6 = λ62ξ2 + δ6
x7 = λ72ξ2 + δ7
x8 = λ82ξ2 + δ8
x10 = λ10,2ξ2 + δ10
Notes: x1 through x10 are the observed variables FREQ through OTHCARE. x9 was removed from the analysis by the authors after testing the first model. ξ1 is Consumption; ξ2 is Problems. The loadings of x1 on Consumption and x4 on Problems are fixed at 1.
b. Mplus Equation Specification of the 2-Factor Model of the AUDIT*
MODEL: Consumption by FREQ NUMDRINK SIXPLUS;
Problems by CANTSTOP FAILTODO GETGOING GUILT MEMLOSS OTHCARE;
Note: Mplus automatically fixes the loading of the first indicator of each factor (x1 and x4) at 1.0, assumes that all observed variables in the model have associated measurement errors with paths fixed at 1.0 and variances freely estimated, and assumes that the covariance between the two factors is to be freely estimated.
*Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001)
Note: One variable (x9) was removed from the 2-factor and 3-factor models in the Kelly and Donovan (2001) study based on a problematic pattern of covariances with other variables in the model.
Figure 4.5 Equation Specification of the 2-Factor Model of the AUDIT*.
Λx =
[ 1      0     ]
[ λ21    0     ]
[ λ31    0     ]
[ 0      1     ]
[ 0      λ52   ]
[ 0      λ62   ]
[ 0      λ72   ]
[ 0      λ82   ]
[ 0      λ10,2 ]
Φ =
[ φ11         ]
[ φ21    φ22  ]
Θδ = a 9 × 9 diagonal matrix with the error variances θ11, θ22, θ33, θ44, θ55, θ66, θ77, θ88, and θ10,10 on the diagonal; all off-diagonal elements are fixed at 0.
Note: To reflect the authors' respecified model allowing two correlated errors to be estimated, the off-diagonal terms θ84 and θ10,6 would be freely estimated rather than fixed at 0.
*Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001)
Note: One variable (x9) was removed from the 2-factor and 3-factor models in the Kelly and Donovan (2001) study based on a problematic pattern of covariances with other variables in the model.
Figure 4.6 Matrix Specification of the 2-Factor Model of the AUDIT*.
[Figure 4.7 Path Diagram of the 3-Factor Model of the AUDIT* (Using Amos). The diagram shows three correlated latent factors: Consumption (ξ1), with arrows to FREQ (x1), NUMDRINK (x2), and SIXPLUS (x3); Dependence (ξ2), with arrows to CANTSTOP (x4), FAILTODO (x5), and GETGOING (x6); and Problems (ξ3), with arrows to GUILT (x7), MEMLOSS (x8), and OTHCARE (x10). One loading per factor is fixed at 1, the remaining loadings are freely estimated, each observed variable has an error term with its path fixed at 1, and the three interfactor covariances are freely estimated. *Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001). Note: One variable (x9) was removed from the 2-factor and 3-factor models in the Kelly and Donovan (2001) study based on a problematic pattern of covariances with other variables in the model.]
These items are consistent with existing definitions of psychological and physical dependence on substances (e.g., American Psychiatric Association, 1994). The 3-factor model did not have previous empirical support, but the authors report that the AUDIT was originally designed to assess three theoretical constructs. As with the 2-factor model, no cross-loadings of items were hypothesized in the 3-factor model, and at least three items were hypothesized to load on each factor. Cross-loadings are typically undesirable because they indicate that factors may not be adequately distinct from each other. In addition, the variance of a cross-loading item is divided among two or more latent variables; therefore, each loading often (but not always) tends to be smaller than a single significant loading would be. Values of cross-loadings are also affected by interfactor correlations. Cross-loadings also complicate the use of composite scores. For example, if Problems and Consumption composite subscores were going to be used as the basis for clinical decisions, an item that loaded on both would be included in the computation of both composite scores, even though its loadings might be low relative to the loadings of other items on each factor. Unless composite scores were based on item weights derived from the factor analysis results, the item would be overrepresented in the
a. Generic SEM Equation Specification
x1 = (1)ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3
x4 = (1)ξ2 + δ4
x5 = λ52ξ2 + δ5
x6 = λ62ξ2 + δ6
x7 = (1)ξ3 + δ7
x8 = λ83ξ3 + δ8
x10 = λ10,3ξ3 + δ10
Notes: x1 through x10 are the observed variables FREQ through OTHCARE. x9 was removed from the analysis by the authors after testing the first model. ξ1 is Consumption; ξ2 is Dependence; ξ3 is Problems. The loadings of x1 on Consumption, x4 on Dependence, and x7 on Problems are fixed at 1.
b. Mplus Equation Specification of the 3-Factor Model of the AUDIT*
MODEL: Consumption by FREQ NUMDRINK SIXPLUS;
Dependence by CANTSTOP FAILTODO GETGOING;
Problems by GUILT MEMLOSS OTHCARE;
Note: Mplus automatically fixes the loading of the first indicator of each factor (x1, x4, and x7) at 1.0, assumes that all observed variables in the model have associated measurement errors with paths fixed at 1.0 and variances freely estimated, and assumes that the covariance between each pair of factors is to be freely estimated.
*Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001)
Note: One variable (x9) was removed from the 2-factor and 3-factor models in the Kelly and Donovan (2001) study based on a problematic pattern of covariances with other variables in the model.
Figure 4.8 Equation Specification of the 3-Factor Model of the AUDIT*.
scores used to make practice decisions. In the 3-factor model tested by Kelly and Donovan (2001), one item, FAILTODO, actually did load on both Consumption and Dependence. The loadings for the item were substantially lower than other items on the two factors, and the loading on Consumption was hard to interpret (i.e., the item content did not seem related to other items loading on the factor). The cross-loading contributed to evidence that the 3-factor model was misspecified and that the 2-factor alternative was a better model. Specification Step 2. Set the scale of each latent factor. Fixing either the variance of a latent factor or one of the factor loadings equal to 1.0 identifies the factor and sets the metric for the latent variable. If the scale, or metric, of the variable is not established, an infinite number of values are
Λx =
[ 1      0      0      ]
[ λ21    0      0      ]
[ λ31    0      0      ]
[ 0      1      0      ]
[ 0      λ52    0      ]
[ 0      λ62    0      ]
[ 0      0      1      ]
[ 0      0      λ83    ]
[ 0      0      λ10,3  ]
Each factor has one loading fixed at 1.
Φ =
[ φ11                ]
[ φ21    φ22         ]
[ φ31    φ32    φ33  ]
The variances and covariances of all three latent variables will be estimated.
Θδ = a 9 × 9 diagonal matrix with the error variances θ11 through θ10,10 on the diagonal. The variances of all measurement error terms will be estimated. No correlated errors are hypothesized.
*Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001)
Note: One variable (x9) was removed from the 2-factor and 3-factor models in the Kelly and Donovan (2001) study based on a problematic pattern of covariances with other variables in the model.
Figure 4.9 Matrix Specification of the 3-Factor Model of the AUDIT*.
possible for the factor variance and factor loadings, and the model is unidentified. Fixing one factor loading equal to 1.0 provides a reference point to which other values can be tied. Kline (2005) calls the variable with the fixed path the “reference variable” for the latent variable (p. 170). Fixing one unstandardized loading to a value of 1.0 “assigns to a factor a scale related to that of the explained (common) variance of the reference variable” (Kline, p. 170). In other words, the scale, or metric, of the latent variable is set equal to the scale of the indicator variable. Some programs automatically fix the loading of the first specified indicator, but users can override that default. Some users prefer to fix the variable with the highest loading (e.g., as indicated in EFA), the highest reliability, or the variable with the most response options (if indicators differ in the number of response options) equal to 1.0. In practice, however, it usually makes little difference in the parameter estimates. Unstandardized loadings will consistently be relatively larger or smaller than each other regardless of which loading is fixed at 1.0. A standardized loading (correlation or regression coefficient) is provided for the reference variable in SEM output even though its unstandardized value
has been fixed at 1.0. The standardized loading corresponding to an unstandardized loading that has been fixed equal to 1.0 equals the square root of the ratio of the variances of the latent and observed variable (Bollen, 1989, p. 199). A second option for identifying the factor and setting the metric is fixing the variance of the latent variable to 1.0. Unstandardized loadings and their critical ratios can be obtained for all observed variables if this option is chosen. This method is not appropriate for multiple-group analyses, however, and may apply only to exogenous factors, according to Kline (2005), because the variances of endogenous variables are not computed in SEM. Instead, the variances of the structural errors of endogenous variables are estimated. Fixing one factor loading may more often be the appropriate choice, although in single-group CFA, either option is viable. If the user fails to either set one loading or the latent variable variance to 1.0, the model will not be identified and will not run. In the 1-factor model presented in Figure 4.1, the “1” below the arrow going from AUDIT to FREQ indicates that the unstandardized factor loading for FREQ has been fixed at 1. This parameter will not be estimated. The equation for x1 in Figure 4.2 indicates that the coefficient for ξ1 is 1.0, not a freely estimated λ. Consistent with the graphic and equation specifications, the Λ matrix specification for the 1-factor model in Figure 4.3 contains a 1 in place of λ11, indicating the coefficient has been fixed. In the unstandardized output for the model, the loading for FREQ on AUDIT will be 1.0. The loadings for the rest of the variables will be more or less than 1.0, depending on the magnitude of their loadings relative to the loading of FREQ. Depending on which Amos tools are used in creating a path diagram, the first drawn indicator of a factor may automatically be specified as equal to 1.0, even if other indicators are drawn above or to the left of it. Users can manually delete the 1.0 and enter a 1.0 as the value of another loading. Mplus automatically fixes the loading of the first observed variable listed after the word “by” for a factor. Users with a preference for which loading is fixed should list the appropriate variable first when specifying the model structure. It is also possible in Mplus to specify after each variable that its value should be set to a specific value (e.g., 1.0) or freely estimated. More detail on these specifications is provided in the online materials. In the 2- and 3-factor models, one loading for each additional factor has also been set to 1. The fixed values can be seen in the path diagrams
in Figure 4.4 and 4.7, equations in Figures 4.5 and 4.8, and matrix specifications presented in Figures 4.6 and 4.9. Specification Step 3. Specify that each observed indicator has measurement error and indicate if any of the error terms are correlated. As an endogenous variable influenced by a latent variable, each observed indicator has a latent error term. These variables predict all of the variance in the observed indicator that is not explained by the latent construct it helps measure. The arrows from latent error terms to observed variables represent regression paths. In Amos (if the indicator drawing tool is used) and Mplus, all of these paths are automatically fixed at 1.0. As discussed earlier, when a latent variable has only one indicator (as in the case of the error variables), either the path or the latent variable’s variance will be estimated. Because the value of one dictates the value of the other, they do not both need to be estimated. Error variances are useful for interpreting how well a factor model explains variance in observed variables. Therefore, SEM programs by default fix the path and estimate the variance. Fixing the path to 1.0 “has the consequence of assigning to a measurement error a scale related to that of the unexplained (unique) variance of its indicator” (Kline, 2005, p. 171). Typically researchers first specify a measurement model without correlated error terms. An exception to this practice occurs when data are collected using the same scale at two or more points in time. Based on the theory that unique factors (i.e., error terms) contain systematic error due to latent factors not examined in the current measurement model, it is likely that the same sources of error affect an observed variable’s scores each time subjects respond to the item. Therefore, specifying freely estimated correlations between matching items administered at two or more points in time is reasonable. The graphical representations of the three AUDIT models in Figures 4.1, 4.4, and 4.7 contain unique factors (i.e., error terms) for all observed variables. Each parameter going from an error term to its associated indicator has a “1” next to it, indicating that the path relating the error variance to the observed variable has been fixed at 1.0. In Figures 4.2, 4.5, and 4.8, the equations defining the 9 or 10 indicator variables contain δ terms indicating that the variables are not perfectly predicted by the latent variable. The matrix representations of the three AUDIT models include the same information about the error terms given in the path diagram and equation formats. The Θδ matrices include the variances of the errors, or
unique factors, along the diagonal. The original specification of each model contained no correlated errors. In Figure 4.6, however, the note for the matrix representation of the model indicates that correlated errors were added to the 2-factor model in order to improve fit. More about this modification follows. Specification Step 4. Specify which latent factors are correlated. Because the magnitudes of standardized covariances, or correlations, are easier to interpret, discussions of SEM models often refer to correlations instead of covariances. The two terms are used here interchangeably because one can be calculated from the other. Latent variables in CFA are typically expected to be correlated. In some programs, latent variables are allowed to covary by default. In others, the user has to specify that the latent variables covary. Whether an SEM program considers exogenous latent variables to be correlated by default or not, the user can manually fix, constrain, or allow freely estimated interfactor relationships. The most common value to which interfactor covariances or correlations are fixed is 0. An example of possibly uncorrelated factors is a risk scale related to the neighborhood environment and one related to certain biological risks. Uncorrelated factors in both EFA and CFA are described as "orthogonal" (i.e., statistically independent). Constraints on interfactor correlations may be imposed in multiple-group CFA, which will be discussed later. Usually there is theoretical justification for factors in social work measurement models to covary. For example, in her study of the Professional Opinion Scale, Abbott (2003) specified that all six possible covariances among the four factors modeled should be estimated. Because each factor represented a dimension of social work values, such as social responsibility and self-determination, covariances among them would be expected. In the 2- and 3-factor models of the AUDIT tested by Kelly and Donovan (2001), covariances among exogenous latent variables are also expected because the factors are aspects of the same problem behavior. In the 2-factor model that their analyses supported, the correlation between the two factors was 0.75: substantial, but probably indicative of two distinct dimensions. In contrast, in their 3-factor model, which they ultimately rejected, one interfactor correlation was 1.0, meaning that the two factors are statistically indistinguishable and should not be modeled separately. Kline (2005) suggests that correlations over 0.85 may indicate that one rather than two latent variables underlie scores on a set of
Box 4-3 Counting Correlations
In a 2-factor model, there is one covariance between the two latent variables. In a 3-factor model, there are three (2 + 1) interfactor covariances. In a 4-factor model, there are six (3 + 2 + 1), and in a 5-factor model there would be 10 (4 + 3 + 2 + 1), and so on.
observed indicators. To answer the question definitively, however, researchers should run both models and determine which has the best fit. In the Amos path diagrams of the AUDIT models, interfactor correlations are represented with double-headed arrows. In the Mplus syntax presented in Figures 4.5 and 4.8, interfactor covariances/correlations are assumed by default. If a user wants to fix an interfactor correlation to 0, the “with” command can be used on a new line (e.g., Consumption with Problems @ 0;). In the matrix presentations (Figures 4.6 and 4.9), interfactor covariances or correlations are represented in the off-diagonal elements, while variances are presented on the diagonal. The small φ’s in the off-diagonal elements of Φ matrices associated with the 2- and 3-factor models indicate that the interfactor covariances are to be freely estimated. Off-diagonal elements that contain 0s or blanks are assumed to be 0 (indicating two factors are uncorrelated). The Φ matrix of the 1-factor model (Figure 4.3) is a scalar. It contains only the variance of the one modeled latent variable because there are no other factors with which it can covary. Box 4.3 contains a tip for quickly counting the number of inter-factor covariances to be estimated in a model. Specification of Alternative CFA Models The credibility of CFA results is enhanced when more than one model is tested. After completing the steps discussed up to this point and obtaining a satisfactory model consistent with previous research or theory, one or more alternative models are compared with the hypothesized model. In CFA the following types of models may be reasonable alternative models (assuming they are different from the originally hypothesized model): 1. 1-factor models 2. Models with one more and one fewer factors than the original model
3. First-order factor models 4. Second-order factor models 5. Models in which selected indicators load on different factors than originally hypothesized 6. Models in which measurement parameters differ for individuals from different populations. Ideally, alternative models have their own empirical or theoretical rationales. When this is true, the CFA study becomes much more rigorous than a one-shot test—it becomes a comparison of competing conceptualizations or operationalizations of social science constructs. Often researchers conducting CFA first test a first-order factor model with multiple factors. (In first-order measurement models, all latent variables are measured directly with observed variables. All models we have discussed thus far are first-order models.) A common comparison model for any hypothesized first-order factor model with multiple factors is a 1-factor model. In a 1-factor model (such as the model in Figure 4.1), all observed indicators load on one factor. Testing a 1-factor model establishes whether a set of indicators actually represents one overarching construct rather than multiple constructs or multiple dimensions of a construct. If researchers are testing a set of indicators that have been shown in EFA to load on multiple factors, the 1-factor model is unlikely to offer serious competition to the multiple-factor model, but it is worth confirming that fact. Comparing the fit of models with one more and one fewer factors than the originally hypothesized model strengthens the case in favor of the hypothesized model. Kelly and Donovan (2001), for example, ruled out 1-factor and 3-factor models as alternatives to a 2-factor model. Such an approach, if successful, also strongly suggests that models with two more or two fewer factors than the hypothesized model are unlikely. If researchers hypothesize a second-order factor as their first model, based on previous research and theory, then a logical comparison model is the corresponding first-order model. Figure 4.10 presents the path diagram of a second-order factor model from a confirmatory factor analysis establishing that scales from an existing instrument could be combined as indicators of an overarching construct called Family Faculty Trust (Bower, Bowen, & Powers, in press).
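In Mplus, a second-order structure of this kind is specified by listing the first-order factors after BY for the higher-order factor. The fragment below is a rough sketch of that approach using abbreviated, hypothetical factor and indicator names; it is not the authors' actual syntax.

MODEL:
  CARING  BY care1 care2 care3;   ! four first-order factors, each measured by
  HOMEINV BY home1 home2;         ! its own observed indicators
  SCHINV  BY sch1 sch2;
  ABILITY BY abil1 abil2;
  TRUST   BY CARING HOMEINV SCHINV ABILITY;  ! second-order factor measured by the
                                             ! first-order factors; the first loading
                                             ! is fixed at 1 by default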
[Figure 4.10 Example of a Second-Order Factor Model (from Bower, Bowen, & Powers, in press). The diagram shows a second-order factor, Family-Faculty Trust, with paths to four first-order factors: Parent Perceptions of Teacher Caring, Parent Involvement at Home, Parent Involvement at School, and Teacher Perception of Ability. The observed indicators include Teacher Cares, T. Wants S. Success, T. Would Contact, Home Environment, Home Activities, Parent Report, Teacher Report, and Potential. Each first-order factor has a structural error term (ζ1 through ζ4), and each observed indicator has a measurement error term with its path fixed at 1. Reprinted with permission from NASW Press.]
In this model, the Family Faculty Trust is a latent variable that is not directly measured by observed variables. Instead, it influences scores on the observed indicators indirectly through four first-order latent variables, Parent Perceptions of Teacher Caring, Parent Involvement at Home, Parent Involvement at School, and Teacher Perceptions of Student Ability. The model was derived from educational literature about dimensions of trusting relationships between home and schools. The first-order factor structure was based on previous empirical studies of those scales. Unlike in a model with only first-order factors, the four first-order latent variables have structural error terms (ζ1, ζ2, ζ3, ζ4). We will examine this type of error in the next chapter on general SEMs. Interestingly, ζ1, ζ2, ζ3, and ζ4 can be thought of as error that is shared by or common to the indicators of each first-order factor (Gerbing & Anderson, 1984). The second-order structure, therefore, allows for the partitioning of a different kind of error than that observed in first-order models, where only error that is specific to one indicator is represented. Readers are referred to Gerbing and Anderson (1984) for further discussion of this interesting issue. As before, each indicator has an error term, and the path to the indicator from the error term is fixed equal to 1.0. The loading of one observed indicator of each first-order factor is fixed at 1.0
to set the metric of those latent variables. Similarly, the loading of one of the first order factors (School Performance) on the second-order factor is also fixed to 1.0. Second-order factor models should be considered when factors in first-order factor models have high interfactor correlations. Bollen (1989) suggests that second-order models are a hybrid of measurement and structural models (pp. 314–315), perhaps because the higher-order factor could just as easily be considered a latent variable that has structural rather than measurement relationships with the first-order factors. Developing a second-order factor model may be also motivated by a substantive interest in combining multiple subscales into an omnibus total scale, in which case the researcher aims to test whether the first-order model (i.e., the model that shows multiple factors or subscales) fits data better than a second-order model. Researchers may have competing theory-based hypotheses about the nature of latent constructs, specifically, which indicators are associated with different latent variables. In a measurement model of child disruptive behavior problems, for example, there may be a theoretical rationale (e.g., the DSM-IV TR definition) to hypothesize that observed indicators of impulsive behavior would load on a latent inattentive/hyperactivity variable. An alternative theory might suggest that the indicators would load on a latent aggressive behavior construct. Testing alternative models in which the indicators load on one factor and not the other, and in which the indicators load on both factors, could offer support for one perspective versus the other. Multiple-group analysis is a version of alternative model testing that requires special attention. It will be discussed and illustrated in Chapter 7. Before proceeding to that discussion, it should be noted that testing alternative models is not the same as modifying models based on results and retesting them. Testing alternative models is a best practice in
Box 4-4 Summary of Best Practices for CFA Model Specification
1. Test models supported by theory, previous research, and/or EFA results.
2. Specify and test two or more competing models.
3. Allow factors to be correlated unless there is a compelling theoretical reason not to.
CFA that involves the prespecification of theoretically and/or empirically justified competing models. The Kelly and Donovan (2001) study we profiled earlier prespecified three models to test and compare. Modifying models, although common and acceptable, is a post hoc procedure accompanied by post hoc justification. It occurs when a hypothesized model does not meet preestablished criteria for model fit. The modifications made to the Professional Opinion Scale in the Abbott (2003) study we presented earlier, for example, were rationalized at each stage based on examination of the statistical output and the content of the items.
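In Mplus, post hoc modifications of this kind are typically expressed by adding WITH statements to the respecified model. For example, the two correlated errors that Kelly and Donovan (2001) added to their 2-factor AUDIT model (see the note to Figure 4.6) might be written roughly as shown below; the pairing of items is our reading of the θ84 and θ10,6 terms, not syntax reported by the authors.

MODEL:
  Consumption BY FREQ NUMDRINK SIXPLUS;
  Problems BY CANTSTOP FAILTODO GETGOING GUILT MEMLOSS OTHCARE;
  MEMLOSS WITH CANTSTOP;   ! corresponds to theta 8,4: correlated errors for x8 and x4
  OTHCARE WITH GETGOING;   ! corresponds to theta 10,6: correlated errors for x10 and x6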
ESTIMATION OF CFA MODELS
After specifying a model and before running a CFA, the researcher selects an estimation procedure and related analysis options. Estimators are statistical functions used to identify and evaluate values of the parameters associated with a specified model. The goal of estimation is the same for all estimators; however, the specific formula that is used varies. The goal is to find values for elements in the matrices, equations, and path diagrams presented in Chapter 2 that minimize the "fitting function" of the chosen estimator. Starting values for estimates may be chosen in different ways, but are related to values in the input matrix (as described in a simplified manner in Figure 2.1). "Minimizing the fitting function" means finding a solution with the lowest value possible. The lowest possible value is found when estimated parameter values produce an implied covariance matrix as similar as possible to the observed input covariance matrix. When this happens, using the model estimated parameters will have the highest likelihood of reproducing the observed sample. "Similar" means that the difference between each element in the input matrix and its counterpart in the implied matrix is small. SEM estimators in general perform better with CFA models when the sample size is large and factor loadings are high (Long, 1983). "Perform better" means they produce more accurate parameter estimates, standard errors, and χ2 statistics (and other model fit indices because many are based on the χ2 statistic). Undesirable analysis characteristics, such as nonnormal data, models with few degrees of freedom, and small sample sizes, have implications for the accuracy of the three components of the results obtained (parameter estimates, standard errors, χ2 statistic). Therefore, as stressed in
Chapter 3, it is critical to understand the nature of one’s data and to choose an appropriate estimator. Estimation proceeds through a series of iterations. A starting set of values based on observed relationships among the input variables is used to generate parameter estimates and an initial model-implied covariance matrix. The estimation algorithm then refines parameter estimates and generates a second model-implied matrix. The discrepancy function associated with the second implied matrix is compared with that associated with the first matrix. Adjustments are made to parameter estimates, within the constraints of the specified model, and a new implied matrix is produced and compared with the previously generated model-implied matrix. This process continues until parameter adjustments no longer result in smaller minimization values, that is, the difference between the discrepancy function associated with the current model-implied matrix and that associated with the previous model-implied matrix is below a convergence criterion. The final set of parameter estimates, model evaluation statistics, and other requested output are then presented to the user. The basic output obtained from a CFA analysis includes: (a) factor loadings (λ), (b) latent variable variances (φii), (c) covariances between pairs of latent variables (φij), and (d) error variances for the observed indicators of latent variables (θδ). Additional output derived from these estimates, such as standardized variances, covariances, and loadings, and squared multiple correlations for observed indicators, can also be requested. Basic output also includes the minimization function statistic (χ2) and its statistical significance, and a variety of other fit indices. Modification indices are either provided by default or requested by the user. Modification indices are statistics indicating how much model fit can be improved (how much the minimization statistic can be reduced) by allowing additional parameters to be estimated. They are discussed in detail in the section called Improving Model Fit in Chapter 6. With this general introduction, we now turn to a discussion of the four main steps of estimating CFA models: 1. Use the appropriate estimation procedure for the nature of your variables. 2. Use the appropriate estimation options for clustered data. 3. Develop the model with a calibration sample.
4. Confirm the final model with a validation sample. (Also see p. 71 in Chapter 3.) Box 4.5 provides a summary of best practices for estimating CFA models. Estimation Step 1: Use the Appropriate Estimation Procedure for the Nature of Your Data There are numerous estimators available in most SEM programs. Maximum likelihood (ML), generalized least squares (GLS), unweighted least squares (ULS), weighted least squares (WLS), and two-stage least squares (2SLS) are examples. Some of these estimators are specially developed to address data problems such as nonnormal distributions, complex sampling, and other data characteristics. Characteristics of the sample and variables dictate which estimation procedure is most appropriate. We focus on the two estimators that are most likely to be of interest to social work researchers: maximum likelihood (ML) and weighted least squares (WLS). ML is the most commonly used estimator and the default in SEM programs; however, a WLS estimator may be the most appropriate estimator for many social work datasets. The most common and basic estimation algorithm in SEM is the maximum likelihood estimator (ML), which is appropriate for interval, ratio level, or continuous data with normal distributions and large sample sizes. ML is also the default estimation procedure in SEM packages. The function minimized by the ML estimator (FML) is the following:
FML = log|Σ(θ)| + tr[SΣ⁻¹(θ)] − log|S| − p, with p equal to the number of observed variables,
where Σ(θ) is the model implied matrix, and S is the sample observed covariance matrix. The appendix contains the mathematical derivation of the minimization function used in the ML estimator. For more in-depth statistical information on the ML and other estimators, see Bollen (1989). For a concrete example of the estimation procedure in a simple model, see Ferron and Hess (2007). As stated earlier, maximum likelihood is the default estimator in SEM programs. Bollen (1989) lists a number of characteristics and
advantages of the estimator, including its efficiency, consistency, scale invariance, and scale "freeness" (pp. 108–109). Others have noted its advantages as well, such as its "computational simplicity, accuracy, and correctness of statistical results," but note that its performance declines under conditions of nonnormality and small sample sizes (Chou & Bentler, 1995, p. 54). Researchers do not agree on how robust ML is to nonnormality, however. Mplus offers ML options that include the conventional estimation of parameters but formulas for the estimation of standard errors and the χ2 statistic that are robust to nonnormality (Muthén & Muthén, 1998–2007). Because characteristics of the data, the model, and sample size all combine to affect estimator performance, it is difficult to give specific guidelines for when ML can be used or when another estimator should be chosen, in spite of the many tests that have been conducted of the robustness of estimators under various conditions. Readers are referred to Bollen as a starting point for investigating this issue. Weighted Least Squares Estimation. In this section we discuss two distinct uses of WLS of which social work researchers should be aware: first, its use with the sample covariance matrix as an appropriate estimator when data are nonnormal but continuous, and second, its use with a special type of correlation matrix when data are ordinal. Given the prevalence of nonnormal and/or ordinal variables in social work research, it is critical that researchers be aware of these two estimation options. WLS is one recommended estimator for nonnormal data (Bollen, 1989), although as discussed in the section on ML, determining the nature and degree of nonnormality that warrants its use is not a straightforward process. According to Bollen (p. 432), "the problem is knowing when the nonnormality is severe enough to require FWLS." The asymptotically distribution-free (ADF) estimator available in Amos is a WLS estimator. The minimization function associated with WLS (FWLS) is the following (Bollen, p. 425):
FWLS = [s − σ(Θ)]′ W⁻¹ [s − σ(Θ)]
In this function, s is a vector of the elements in the sample (input) covariance matrix, σ is a vector of the estimates in the implied matrix, and Θ is the vector of free parameters in the model. W is a weight matrix, often the covariance matrix of s by s (Bollen, 1989). The closer the input
and implied matrices are, the closer their product is to the s by s weight matrix in the denominator. As with the ML minimization function, therefore, the more similar the input and implied matrices are, the closer the minimization function is to a convergence criterion. With nonnormal data, WLS can be used in conjunction with a weight matrix to analyze the covariance matrix of observed variables. FWLS “makes minimal assumptions about the distribution of the observed variables” (Bollen, p. 432), making it a viable option when social work researchers have nonnormal data. As discussed in Chapter 3, however, measurement level is as likely to be a problem for social work researchers as nonnormality. The recommended estimation option for ordinal data is also weighted least squares (WLS) (Bollen, 1989; Jöreskog, 2005; Muthén & Muthén, 1998–2007). However, when data are ordinal the recommended analysis matrix is a polychoric correlation matrix. The creation and analysis of a special correlation matrix with WLS estimation addresses both the measurement level and nonnormality problems frequently found in social work data. Jöreskog (2005) is unequivocal about the impropriety of treating ordinal variables as continuous: Ordinal variables are not continuous variables and should not be treated as if they are. It is common practice to treat scores 1, 2, 3, . . . assigned to categories as if they have metric properties, but this is wrong. Ordinal variables do not have origins or units of measurements. Means, variances, and covariances of ordinal variables have no meaning. The only information we have are counts of cases in each cell of a multiway contingency table (p.1).
It is the information in the “multiway contingency table” referred to by Jöreskog (2005) that is used to create the polychoric correlation matrix. In this special type of correlation matrix, the usual Pearson moment correlation is calculated only when both variables are continuous; a polyserial correlation is calculated when one is ordinal and the other continuous; a biserial correlation is calculated when one is continuous and one is dichotomous; a polychoric correlation is calculated when both are ordinal; and a tetrachoric correlation is calculated when both are dichotomous (Jöreskog & Sörbom, 1999). Mplus creates the analysis and weight
matrices automatically when the user indicates in the input syntax that one or more variables are categorical (Muthén & Muthén, 1998–2007). It should be noted that the theory behind the creation of polychoric correlation matrices also makes assumptions, which can be violated. Specifically, it assumes that “a continuous, normal latent process determines each observed variable” (Flora & Curran, 2004, p. 466). In other words, it assumes that behind the ordinal categories used to measure a phenomenon on an assessment instrument lies a continuous, normally distributed phenomenon. The information in the multiway contingency table of all ordinal variables in an analysis is used to recreate the theoretical correlations that would be obtained if the underlying continuous data were available instead of the ordinal data (Flora & Curran). Detailed discussions of the statistical theory behind and derivation of polychoric correlation matrices can be found in Bollen (1989); Flora and Curran; and Jöreskog (2005). As mentioned in Chapter 3, the asymptotically distribution-free (ADF) estimator that appears to be the only option for categorical data in Amos requires a large sample size due to potential problems in the computation of the weight matrix. The developers of Mplus (and other software programs, such as LISREL) have identified alternative weight matrices for the WLS fitting function that reduce the likelihood of two problems associated with the conventional matrix (Flora & Curran, 2004; Jöreskog & Sörbom, 1999; Muthén & Muthén, 1998–2007): large sample size requirements and determinants of 0 for the weight matrix. For example, Mplus offers two robust WLS options, mean-adjusted weighted least squares (WLSM) and mean and variance-adjusted weighted least squares (WLSMV). In a simulation test of the performance of conventional and robust WLS, Flora and Curran (2004) concluded that the robust options were superior. Of the two robust options in Mplus, WLSMV is recommended (Muthén, du Toit, & Spisic, 1997). Estimation Step 2: Use the Appropriate Estimation Options for Clustered Data Another common characteristic of the data that social work researchers analyze is the clustering of observations. When data have been sampled at multiple levels, for example, at the school then the classroom level, or at the state then neighborhood level, they should be analyzed using
a procedure that will take into account the nonindependence of observations. For SEM analyses, Mplus has two options for clustered data. One option allows standard errors at the lower level of analysis to be corrected based on clustering into higher units. The other option provides estimates of the effects of variables at the higher level of analysis (e.g., classroom, state) on dependent variables to be assessed. When the effects of second- or higher-level variables are of interest, it is important to sample enough higher-level units to have the power to detect expected effect sizes. For more in-depth treatment of multilevel research designs and analysis, readers are referred to sources focusing on that topic (e.g., Cook, 2005; Snijders & Bosker, 1999). Estimation Step 3: Develop the Model with a Calibration Sample When the researcher’s sample size is large enough, it is desirable to develop the CFA model with a random subsample of the available cases. Sample size recommendations from Chapter 3, pp. 53–54, should be followed. The calibration sample is used to test alternative models and identify the best fitting measurement model. Following procedures described in Chapter 6, estimation output may be used to make minor modifications. If a final CFA model with adequate fit and substantively valid parameter estimates is obtained, the researcher then proceeds to Estimation Step 4. Estimation Step 4: Confirm the Final Model with a Validation Sample In this step, the researcher retests the final model obtained with the calibration sample using one (or more) validation samples. As discussed in Chapter 3, researchers often do not have access to data from a totally new or separate sample. In this case, dividing the available sample into random subsets is still valuable for identifying unstable findings or untenable modifications from a calibration sample. In the validation analyses, no further modifications or refinements to the model are made. The purpose of the validation step is solely to determine if the results of the model development process can be replicated with an additional sample. We know of no clear-cut guidelines on how similar the results of the calibration and validation samples must be for researchers to claim that the model has been adequately replicated. When modifications to
a model have been made based on CFA output, such as the addition of correlated errors, it is not uncommon for the refinements not to replicate. If, however, adequate fit is replicated and all other parameters remain stable, researchers might claim that the model has been validated. In an example of the use of a validation sample, Bowen, Bowen, and Ware (2002) reestimated a multiple-group model of neighborhood social disorganization and educational outcomes of adolescents. The original model had achieved adequate fit after the addition of three correlated errors. Model fit improved in the validation test, but two of the correlated errors were not replicated (i.e., only one of the three correlated errors from the original model was needed in the validation sample). In addition, a new substantive path became significant in the validation sample. The model differences obtained in this study illustrate the sample dependence of SEM analyses, the importance of replication, as well as the potential interpretive complications of conducting a validation analysis. The overall model in this example was validated by the reestimation with a second sample. Because the hypothesized structural model (but not the multiple-group hypothesis) was actually supported more fully in the validation sample, the authors report the results of that sample in detail. Still, the authors should claim the greatest support only for the paths that were significant in both models. Having adequate but slightly different findings across two tests requires careful explication. Chapter 6 discusses the interpretation of output in more detail.
Box 4-5 Summary of Best Practices for Estimating CFA Models
1. Use the appropriate estimation procedure for the nature of your data. If a researcher's data are interval level or continuous, and adequate univariate and multivariate normal distributions exist or can be obtained through transformations, the data can be analyzed in Amos with maximum likelihood. If data are nonnormally distributed, the ADF estimator in Amos may be used (if a large sample is available), or a robust ML estimator in Mplus. In the presence of nonnormality and ordinal and/or categorical variables (including dichotomous variables), we recommend the use of Mplus WLSMV estimation along with the specification of variables as categorical (as appropriate).
2. Use the appropriate estimation options for clustered data. Grouping (or class) variables can be specified in Mplus to take into account correlated errors and to accommodate estimation of the ICC and effects of second-level predictors on Level 1 outcomes (see the sketch following this box and the companion website). The manual for Amos 16.0 (Arbuckle, 1995–2007) does not include information on multilevel modeling. See Byrne (2010) and Li and Acock (1999) for information on using Amos for latent growth curve modeling, a type of multilevel model.
3. Develop the model with a calibration sample.
4. Confirm the final model with a validation sample.
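As a minimal sketch of the option described in item 2 of the box, the standard-error correction for clustering can be requested in Mplus with statements such as the following; the cluster variable name is hypothetical.

VARIABLE: CLUSTER IS school;       ! identifier for the higher-level unit (e.g., school)
ANALYSIS: TYPE = COMPLEX;          ! corrects standard errors and fit statistics for clustering
          ESTIMATOR = WLSMV;

Estimating the effects of predictors measured at the higher level would instead require a multilevel specification (TYPE = TWOLEVEL in Mplus), which is beyond the scope of this sketch.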
Both Amos and Mplus have FIML options for analyzing datasets containing missing values. When analyzing raw data files with Amos, users must select the "analyze means and intercepts" option in the estimation dialogue box to invoke FIML. Modification indices are not available in Amos output when datasets with missing values are analyzed. In Amos, researchers can use an ADF estimator with nonnormally distributed variables if their sample sizes are large enough. They can also request bootstrapped estimates of standard errors when nonnormal data are used to see if values have been underestimated severely enough to affect conclusions of significance tests. In Mplus, FIML can be used with estimators other than maximum likelihood, and modification indices are produced when FIML is used. Mplus offers a range of options for analyzing both categorical/ordinal variables and nonnormally distributed data. WLSMV estimation is recommended by the program's developers and can be combined with FIML and multilevel modeling. Details on using these options in both programs are presented in the online materials associated with the book.
5
General Structural Equation Models
This chapter describes how social workers use general structural equation models (general SEMs) and explains how to specify and test them. Readers should read Chapter 4 before conducting analyses of general structural models. Many of the specification and estimation steps and decisions are the same for CFA and general SEMs; this chapter does not repeat material that applies equally to CFA and general structural models. General SEMs include the measurement model of latent variables and their indicators, as well as the structural model of directional relationships among latent variables. A measurement model becomes a general SEM when some or all of the correlational relationships among latent variables in the measurement model are respecified as directional relationships based on the researcher's substantive knowledge of the topic (i.e., theory and past research). Structural modeling allows the testing of complex relationships among latent variables. General SEM can accommodate a combination of latent and observed variables, which can serve as independent, control, or dependent variables. Mediation models, for example, in which the effects of one variable on an outcome are exerted via the influence of another intervening variable, are easily modeled in SEM. In this context, the total effect of an exogenous variable on an endogenous variable is
decomposed into two parts: direct and indirect, where the indirect effect is exerted via another endogenous variable, called the mediator. Models with moderation, or interaction effects, can be estimated in SEM with multiple-group modeling; this type of model is discussed in more detail in Chapter 7. In sum, general structural modeling is an appropriate and superior analysis choice for much social work research because of its ability to accommodate the theoretical and measurement complexities present in many social work research questions.
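As a simple sketch of how such a decomposition might be requested in Mplus, suppose x, m, and y are hypothetical stand-ins for a predictor, a mediator, and an outcome. The indirect effect of x on y is the product of the two component paths, and the total effect is the sum of the direct and indirect effects.

MODEL:
y ON m x;          ! direct effect of x on y, plus the effect of the mediator m
m ON x;            ! path from x to the mediator
MODEL INDIRECT:
y IND m x;         ! requests the indirect effect of x on y through m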
CONTEXTS FOR USING GENERAL STRUCTURAL EQUATION MODELING
General structural models are used by social workers to test relationships among constructs measured with multiple items. Social work researchers in numerous substantive areas are using general SEM to test theoretically derived relationships among concepts that are best measured with latent variables. Among the articles reviewed by Guo and Lee (2007), for example, the general SEM approach was used in studies related to aging, child welfare, health/mental health, school social work, and substance abuse (refer to Table 1.1). Social workers test general SEMs in order to advance basic understanding of social and developmental phenomena, and to inform prevention, intervention, and policy. Often basic research and research that informs practice and policy are intertwined in social work research using SEM. Crouch, Milner, and Thomsen (2001), for example, examined hypothesized relationships among childhood physical abuse, early support, social support in adulthood, and the risk of physically abusing a child in adulthood. The results increased understanding of mechanisms of intergenerational child abuse and had tentative implications for practice. General SEM was also used to further basic understanding of childhood and adult sexual abuse in another study (Conway, Mendelson, Giannopoulos, Csank, & Holm, 2004) that aimed to identify pathways by which sexual abuse leads to depression. This study also provided tentative implications for practice because understanding pathways helps social workers identify time points and risk factors to target in prevention and intervention efforts. Another study applied theoretical concepts about the interrelationship of workplace policies, work–life balance, and the well-being of working parents (Jang, 2009).
Findings about indirect effects of workplace culture and supervisor support on employee well-being had implications for social workers’ direct practice with employees as well as for their advocacy of supportive workplace policies and programs.
SPECIFICATION OF GENERAL STRUCTURAL EQUATION MODELS
Before delving into the details of structural models, we reiterate two themes of this book: (a) models tested with SEM must have strong theoretical and/or empirical foundations, and (b) even when they have strong rationales, desirable statistical findings do not on their own establish causality. The value of theory in informing intervention is underscored by Benda and Corwyn (2000), who note that theory-based interventions are five times more effective than approaches lacking theoretical foundations. In their study, these researchers also demonstrate the importance of theory-based structural models. They combine elements of control theory and social learning theory to improve upon prior atheoretical studies of causes of drug use among adolescents. Although the prominent placement of theory in SEM models enhances the quality of studies, implications for practice are always constrained by the degree to which a study's design permits claims of causality. A cross-sectional SEM study, for example, may suggest that a certain risk factor is associated with a poor outcome, but it cannot determine that targeting the risk factor in an intervention will improve outcomes. Even a longitudinal SEM, in which the risk factor predates a poor outcome, cannot determine that the risk factor caused the outcome or that targeting the risk factor will change the outcome. The danger of using SEM without strong theoretical guidance lies in the fact that the underlying test of model quality—the test of the null hypothesis that the model-implied covariance matrix is statistically equivalent to the input covariance matrix—can yield identical results with contradictory models. A structural model with an arrow hypothesizing an effect of latent variable A on latent variable B, for example, may have the same fit as a model with that directional influence reversed. Similarly, Bollen (2000) presented two measurement models, one with one factor and the other with two, that had identical fit. The vast modeling flexibility of SEM makes the best practice of testing theoretically derived models even more imperative. Models must be derived
from well-established theory and prior research because similar models (e.g., two models that differ only by the hypothesized direction of one path) may be statistically identical.

Overview of General SEM Specification
The basic concepts of specifying parameters as free, constrained, or fixed that apply to CFA models pertain to general models as well. Three additional specification steps apply to general SEMs: adding observed structural variables to the model, adding directional influences among structural latent and observed variables, and adding structural error terms for endogenous latent variables. Box 5.1 lists all the specification steps for a general SEM, starting with the CFA specification steps discussed in Chapter 4. In this chapter, we focus on the last three steps. Chapter 4 presented examples of how CFA models could be specified with path diagrams, equations, and matrices. Three matrices were needed to specify the parameters of a CFA model: the ΛX (lambda-x), the Φ (phi), and the Θδ (theta-delta) matrices. As described in Chapter 2, the specification of general structural models requires up to five additional matrices. We now illustrate the new matrices and specification steps associated with general SEMs.
Box 5-1 Full List of Specification Steps for a General SEM
Measurement Model Specification Steps
1. Specify how many latent variables there are and which observed variables load on each latent variable.
2. Set the scale of each latent variable.
3. Specify that each observed indicator has measurement error and indicate if any of the error terms are correlated.
4. Specify which exogenous latent factors are correlated (in CFA models all latent factors are exogenous; in general structural models some are endogenous).
Structural Model Specification Steps
5. Add observed structural variables to the model (if applicable).
6. Specify the directional and nondirectional relationships among latent and observed structural variables.
7. Specify structural error terms for endogenous variables.
General SEM Specification Example
We adapt a model presented in a study of alcohol abuse (Whitbeck, Chen, Hoyt, & Adams, 2004) to illustrate Specification Steps 5, 6, and 7 for a general SEM. Adaptations are made to maintain consistency in the notation and graphics used throughout the book. The authors studied "the effects of discrimination, historical loss and enculturation on meeting diagnostic criteria for 12-month alcohol abuse" (p. 409) among a sample of male and female American Indians. We focus our example on one of the models they tested with women as subjects. In the list that follows, we describe the four structural variables in their model and indicate the names we have assigned them for this discussion. The information comes from pages 412–413 of Whitbeck et al.

Variable 1: Perceived discrimination (DISCRIM). DISCRIM was treated analytically as an observed variable in the original model, but it was based on 11 items assessing the frequency with which individuals had experienced different types of discriminatory treatment (such as being treated unfairly, ignored, threatened physically). For our purposes, we treat DISCRIM as a latent variable (and this is how the authors pictured it in their path diagram).

Variable 2: Age (AGE). AGE appears to have been measured continuously and was used as an observed control variable in the analysis, according to the authors.

Variable 3: Historical Loss (HLOSS). HLOSS is a latent variable that is assessed with two observed indicators. The observed indicators were themselves observed composites, each of which was based on a 12-item scale. The first scale, the Historical Loss Scale, assessed the frequency with which respondents had experienced 12 different types of loss (such as loss of land, people, language). The second scale, the Historical Loss Associated Symptom Scale, assessed the frequency with which respondents had experienced 12 different emotions or feelings related to historical loss.

Variable 4: Alcohol Abuse (ALABUSE). In the original article, ALABUSE was a dichotomous observed variable indicating a diagnostic category based on the University of Michigan Composite International Diagnostic Interview. For our purposes, we treat ALABUSE as a latent variable with multiple indicators from the Interview (and this is how the authors pictured it in their path diagram).
Whitbeck et al. (2004) used the best practice of testing a model with a strong theoretical and empirical foundation. The modeled variables are derived from studies about alcohol abuse among American Indians. The authors provide support for the hypothesized relationships between discrimination and alcohol abuse, and discrimination and historical loss. The authors also support the use of age as a control variable because of its association with alcohol abuse. They suggest that their study is the first to relate historical loss to alcohol abuse, meaning they are advancing knowledge about a potential mediator (or explanatory mechanism) of the effects of discrimination on alcohol abuse in their sample. Although this component of the model may sound “exploratory,” the nature of the latent historical loss variable suggests that its inclusion as a mediator is consistent with the existing knowledge base. This example of a general SEM illustrates the ability of SEM to include direct and indirect (mediated) effects, and observed and latent variables in the same model. In the study reported by Whitbeck et al., models were estimated with unweighted least squares estimation using Mplus. Figures 5.1 through 5.3 specify the structural model using an Amos path diagram, SEM equations and Mplus model syntax, and matrix specification, respectively. Our path diagram is not identical to the one provided in the original article because we include the exogenous variable correlation omitted in the article’s diagram but included in its analysis, and we focus on the structural components only (i.e., indicators of latent variables are not illustrated). The text that follows describes the specification steps and explains the path diagram, equations, and matrices associated with the historical loss model. More detail on Amos and Mplus specification is provided in the online materials.
[Path diagram: DISCRIM (ξ1) and AGE (ξ2) are correlated exogenous variables (φ12); paths γ11 (from DISCRIM) and γ12 (from AGE) lead to ALABUSE (η1), paths γ21 (from DISCRIM) and γ22 (from AGE) lead to HLOSS (η2), and β12 leads from HLOSS to ALABUSE; ζ1 and ζ2 are the structural error terms of ALABUSE and HLOSS.]
Figure 5.1 General Structural Equation Model of Historical Loss for American Indian Women (Whitbeck et al., 2004).
Specification Step 5. Add observed structural variables to the model. In our adaptation of the historical loss model (Whitbeck et al., 2004), an exogenous, observed, control variable (AGE) is part of the structural model. In Figure 5.1, we know AGE is an observed variable because it is represented by a rectangle, not a circle. Had we illustrated the measurement model first, AGE would not have been included in the path diagram, equations, or matrix specifications. Although AGE is an "x" (observed) variable in terms of our notation so far, we will treat it as a second ξ variable in our equations and matrices for purposes of this discussion. This specification is consistent with one of the modeling options presented by Bollen (1989) for structural variables in which measurement error is being ignored. See the Parenting variable in Figure 2.5 for another modeling option. Amos modeling allows either of these modeling options. Obtained results are the same regardless of which of these two specification approaches is used.

Specification Step 6. Specify the directional and nondirectional relationships among latent and observed variables. The historical loss model for women that is presented by Whitbeck et al. (2004) hypothesizes that perceived discrimination affects the diagnosis of alcohol abuse among women both directly and indirectly through the historical loss construct. The mediated or indirect effect of discrimination on alcohol abuse through its effect on HLOSS is of primary interest in the test of the theoretical model. The mediated effect is represented in Figure 5.1 by the γ21 and β12 paths. As before, the subscripts indicate the number of the variable to which a path is pointing and the number of the variable from which it originates, respectively. The use of γ in a path name indicates that the path travels from an exogenous (ξ) variable to an endogenous (η) variable. Therefore, the γ21 path label refers to the path going to HLOSS (η2) from DISCRIM (ξ1). Based on previous research, DISCRIM is also hypothesized to have a direct effect (γ11) on the abuse of alcohol by American Indian women in the sample. Because AGE is called a control variable, the γ12 and γ22 paths indicate that the researchers are interested in the effects of predictors on ALABUSE and HLOSS after the variance in those two variables associated with age has been removed. As modeled, however, age has an indirect effect on ALABUSE through historical loss, as well as its direct effect. Calling AGE a control variable, covariate, or predictor does not change the statistical estimates of its effects. Because DISCRIM and AGE are exogenous variables,
all paths leading from them to other variables are γ paths. Because HLOSS and ALABUSE are endogenous variables, the path between them is a β path. Women's age and perceived discrimination are expected to covary, but no directional relationship is specified. The covariance is to be freely estimated and is labeled φ12 in Figure 5.1. Amos "expects" all exogenous variables (latent and observed) to be correlated but does not make them correlated by default. If the user omits any exogenous correlation, Amos confirms that the omission was deliberate before running a model.

Specification Step 7. Specify structural error terms for endogenous variables. Because they are endogenous structural variables, ALABUSE and HLOSS have a new type of variable associated with them. Dependent variables in social work research are rarely if ever perfectly predicted by independent variables. The ζ terms associated with the two endogenous variables are latent structural error terms. In structural equation models, the variances of dependent variables are not estimated; instead, the variances of their error terms are estimated. These terms are analogous to 1.0 minus the R2 value obtained in a traditional regression analysis. What is different in SEM, however, is that more than one dependent variable can be predicted at one time, and variables (such as HLOSS) can serve as both independent and dependent variables in the same model. In Amos and Mplus, the paths from structural errors to dependent variables are automatically fixed equal to 1.0 when default specification steps are used. As with measurement errors, it is not necessary to estimate both the structural error variances and their paths. The variances are the parameters of interest, so they are estimated and the paths are fixed at 1.0.

Figure 5.2 presents the equation specification of the path diagram in Figure 5.1. As with the path diagram, we present only the structural components; factor equations for the latent variable indicators would resemble those presented for models in Chapter 4. There is a structural equation for each endogenous variable in the model. The first equation indicates that ALABUSE (η1) is predicted by HLOSS (η2), DISCRIM (ξ1), AGE (ξ2), and a structural error term (ζ1). The second structural equation indicates that HLOSS (η2) is predicted by DISCRIM (ξ1), AGE (ξ2), and its own structural error term (ζ2). Amos allows structural errors to be correlated. Specifying a correlation between errors of prediction is appropriate when endogenous latent variables are hypothesized to have associations that are not captured by their relationships with independent variables predicting them.
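No such covariance is specified in the historical loss model, but where one is justified the specification is simple; in Mplus, for example, a single "with" statement between two endogenous latent variables is interpreted as the covariance of their structural errors. A minimal sketch using the variable names from this example:

ALABUSE WITH HLOSS;    ! estimates the covariance of the two structural error terms (ζ1 and ζ2)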
a. Generic SEM Equation Specification of Structural Components of the Historical Loss Model for American Indian Women
η1 = β12η2 + γ11ξ1 + γ12ξ2 + ζ1
η2 = γ21ξ1 + γ22ξ2 + ζ2
Notes: ξ1 is DISCRIM; ξ2 is AGE. η1 is ALABUSE; η2 is HLOSS. AGE is an observed exogenous variable that is correlated with DISCRIM, a latent variable. ALABUSE is predicted by DISCRIM, AGE, HLOSS, and a structural error term (ζ1). HLOSS is predicted by DISCRIM, AGE, and a structural error term (ζ2).

b. Mplus Equation Specification of Structural Components of the Historical Loss Model for American Indian Women (Whitbeck et al., 2004)
MODEL: (structural components only; these statements would be preceded by measurement model "by" statements as illustrated in Chapter 4)
ALABUSE on DISCRIM AGE HLOSS;
HLOSS on DISCRIM AGE;
AGE with DISCRIM;

Figure 5.2 Equation Specification of General Structural Model for Historical Loss Among American Indian Women (Whitbeck et al., 2004).
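For readers who want to see how the measurement and structural statements fit together in one input file, a minimal sketch of the full MODEL command follows. The indicator names, and the number of indicators for ALABUSE, are hypothetical stand-ins; additional commands (e.g., declaring categorical indicators) would be added as appropriate for the data.

MODEL:
DISCRIM BY disc1-disc11;          ! measurement model ("by" statements)
HLOSS BY hloss1 hloss2;
ALABUSE BY alab1-alab4;
ALABUSE ON DISCRIM AGE HLOSS;     ! structural model ("on" statements)
HLOSS ON DISCRIM AGE;
AGE WITH DISCRIM;                 ! exogenous covariance ("with" statement)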
Part b of Figure 5.2 indicates how the model would be specified in Mplus syntax. The "on" statements indicate the directional relationships of the model. Again, we present structural components only; these statements would be preceded by measurement model "by" statements as illustrated in Chapter 4. Structural errors are assumed and estimated without user specification. They can be constrained or fixed, however, if the user has justification to do so. Mplus also assumes that exogenous variables are correlated unless otherwise specified. We have included the syntax for specifying an exogenous variable covariance for illustrative purposes. Covariances are specified using the term "with." Readers have now seen "by" statements used to indicate items that load on a latent factor, "on" statements to indicate regression relationships, and "with" statements to specify covariances between variables.

Figure 5.3 presents the matrix specification of the model. Instead of the three matrices necessary to specify a confirmatory factor model, there are now eight matrices. Two new measurement-related matrices contain the factor loadings (Λy) and error terms (Θε) for the endogenous variables in the structural model.
3 matrices used in CFA models (models with only measurement components):
Λx contains the factor loadings for indicators of DISCRIM, the only exogenous latent variable.
Φ contains the variances and covariances of the exogenous variables in the model—DISCRIM and AGE.
Θδ contains the variances of the measurement errors associated with indicators of DISCRIM.

2 matrices associated with the measurement component of a general SEM:
Λy contains the factor loadings for ALABUSE and HLOSS, two endogenous latent variables.
Θε contains the variances of the measurement errors associated with indicators of ALABUSE and HLOSS.

3 matrices associated with the structural component of the general SEM:
Γ =
⎡ γ11  γ12 ⎤
⎣ γ21  γ22 ⎦
The Γ (gamma) matrix contains the paths from DISCRIM and AGE (ξ's) to ALABUSE and HLOSS (η's). The model specifies that all four paths are to be freely estimated.
B =
⎡ 0  β12 ⎤
⎣ 0   0  ⎦
The B (beta) matrix contains the directional paths between ALABUSE and HLOSS, the two endogenous latent variables (η's) in the model. The path to ALABUSE is freely estimated. The reverse path is fixed at 0. The diagonal contains 0s because variables cannot predict themselves.
Ψ =
⎡ ψ11      ⎤
⎣ 0    ψ22 ⎦
The Ψ (psi) matrix contains the variances and covariances of the structural errors (ζ's). There is one structural error for each endogenous variable. No covariance is specified, so the lower left element is 0. The upper right element is redundant, so it is left blank.
Figure 5.3 Matrix Specification of General Structural Model with Direct Effects of Risk and Gender on Behavior.
These two matrices are structured like their counterparts for exogenous variables (see Chapters 2 and 4 to review). Three new matrices related solely to structural components are detailed in Figure 5.3. The Γ (gamma) matrix has a row for each endogenous variable in the model and a column for each exogenous variable. The first element in the matrix refers to the path from DISCRIM (ξ1) to ALABUSE (η1)—therefore its subscript is "11." The element in the first column of the second row refers to the path from DISCRIM (ξ1)
to HLOSS (η2), hence the subscript "21." Similarly, the elements in the second column refer to the path from AGE (ξ2) to ALABUSE (η1), hence the subscript "12," and the path from AGE (ξ2) to HLOSS (η2), hence the subscript "22." The B (beta) matrix contains the potential paths between pairs of endogenous variables. Given that there are only two endogenous variables, only two paths are possible: (1) a directional effect from HLOSS to ALABUSE and (2) a directional effect from ALABUSE to HLOSS. As illustrated in Figure 5.1, the hypothesized model specifies only one path: a directional effect from HLOSS to ALABUSE. Because there is one column and one row for each η variable in the model, each diagonal element of the B matrix refers to the prediction of a variable by itself. Because this is not meaningful, diagonal elements of the B matrix always equal 0. The final structural matrix is the Ψ (psi) matrix containing the variances and covariances of the structural error terms for ALABUSE and HLOSS. No covariance between the errors is specified in the model, so the off diagonals are fixed at 0. It is possible, when theoretically justified, for the off diagonals (correlations among structural errors) to be estimated, constrained, or fixed.

Specification of Recursive and Nonrecursive Models
In SEM parlance, a "recursive" structural model is one that has no paths that create a "feedback loop" or "reciprocal causation" (Bollen, 1989, p. 83) between two latent variables. Like Figure 5.1, the majority of models in the social work literature are recursive. In contrast, "nonrecursive" models have one or more feedback loops in the structural part of the model. We recognize that at first, this terminology may be confusing, especially for readers who are familiar with fields (e.g., mathematics, computer science) in which "recursive" has the opposite meaning from that used in SEM applications. Structural feedback loops include mutually reciprocal direct paths between two endogenous variables, paths by which the effects of an endogenous variable make their way back to the variable through one or more other endogenous variables, and feedback paths through correlated structural errors. Figure 5.4 presents the structural components of a nonrecursive model from Nugent and Glisson (1999).
[Path diagram: Age (ξ1) and Gender (ξ2) are uncorrelated exogenous variables with paths γ11, γ12, γ21, γ22, and γ42 to the endogenous variables Internalizing Problems (η1), Externalizing Problems (η2), System Responsive (η3), and System Reactive (η4); reciprocal paths link Internalizing and Externalizing Problems (β21, β12) and System Reactive and System Responsive (β43, β34), with additional paths β41 and β42 leading to System Reactive; ζ1 through ζ4 are the structural error terms.]
Adapted from model presented in Nugent, W. R., & Glisson, C. (1999). Reactivity and responsiveness in children's service systems. Journal of Social Service Research, 25, 41–60.
Figure 5.4 Path Diagram of a General SEM with Nonrecursive Structural Components.
We created the graphic in Amos based on Nugent and Glisson's hypotheses and a simpler graphic presented in the original article (p. 46). Nonrecursive relationships are hypothesized for two pairs of variables: internalizing and externalizing problems, and system responsiveness and reactivity. Note that unlike hypothesized correlational relationships, which are represented with two-headed arrows, reciprocal directional relationships are represented with two directional arrows. The authors justify the modeled relationship between internalizing and externalizing behavior problems by referring to evidence that the two types of problems "frequently coexist in children and adolescents" (p. 46). The term "coexist" suggests a covariance more than a reciprocal influence.
Terms more consistent with reciprocal effects include "having mutual effects" or "predicting each other." The authors indicate that they tested alternative modeling strategies and found the same results. Presumably one of their alternative models represented the relationship between the two types of behavior as a two-headed arrow between the error terms for the two variables (ζ1 and ζ2). Based on the justification offered, the correlational model is more accurate, even if results appear similar in both approaches. The authors provide a more sound rationale for reciprocal effects between system reactivity and responsiveness. Specifically, they hypothesize that the more a child welfare system reacts to children's mental health problems with placement disruptions and service refusals, the less it is responsive to children's needs, the greater children's needs will be, and the less "acceptable" children will be to the system in the future (therefore leading to more reactivity). The contrasting positive cycle of effects includes appropriate system responses to children's mental health problems, improvements in children's well-being, decreased reactivity (negative responses) to children, and corresponding increases in responsiveness.

Kline (2005) points out that feedback effects can be direct or can be mediated through one or more other variables. In Figure 5.4, however, there are no mediated feedback loops, just the direct reciprocal effects. For example, directional arrows indicate that Internalizing Problems is hypothesized to predict Externalizing Problems, and Externalizing Problems predicts System Reactivity, which in turn predicts System Responsiveness. Yet, there is no continuation of this path back to Internalizing Problems (for example, through an arrow pointing to Internalizing Problems from either System Reactivity or System Responsiveness). Although SEM's accommodation of reciprocal and transactional effects is superior to most other approaches, it should be noted that cross-sectional SEM is still limited in its ability to capture the complex dynamics of true person–environment transactions. (Growth curve modeling, in which longitudinal processes of change are modeled, captures more complexity over time than cross-sectional models, but this type of SEM modeling is beyond the scope of this book. Interested readers are referred to Bollen & Curran, 2006, for further reading.)

One other specification detail is worth noting in Figure 5.4. The two exogenous variables, Age and Gender, are not expected to be correlated. Although exogenous variables are typically expected to be correlated in SEM, the
proper specification in this case is no correlation because gender is not differentially associated with age.

The two paths that make the model in Figure 5.4 nonrecursive affect the B matrix, which contains paths between pairs of endogenous variables. Table 5.1 compares how the B matrix for Figure 5.4 would be specified with and without the reciprocal paths shown in Figure 5.4. Each matrix contains four columns and four rows, one for each endogenous latent variable (η). The diagonal of the B matrix always contains 0s, because variables cannot be regressed on themselves. In the matrix on the left, which pictures the B matrix for the model without reciprocal effects, paths from Internalizing Problems to Externalizing Problems (β21), Externalizing to System Reactivity (β42), and System Responsiveness to System Reactivity (β43) are labeled, indicating they are to be freely estimated. All other paths between endogenous variables are fixed at 0. In the column on the right, two additional off-diagonal paths are specified, the path from Externalizing to Internalizing Problems (β12) and from System Reactivity to System Responsiveness (β34).

Table 5.1 Specification of a B Matrix with and without Reciprocal Effects

Recursive Model (not pictured)        Nonrecursive Model (as illustrated in Figure 5.4)

B =                                   B =
⎡ 0    0    0    0 ⎤                  ⎡ 0    β12  0    0   ⎤
⎢ β21  0    0    0 ⎥                  ⎢ β21  0    0    0   ⎥
⎢ 0    0    0    0 ⎥                  ⎢ 0    0    0    β34 ⎥
⎣ 0    β42  β43  0 ⎦                  ⎣ 0    β42  β43  0   ⎦

Specification of Alternative Structural Models
As with CFA, the findings of general SEM analyses are strengthened when they involve the testing of competing models. Because multiple models may have adequate fit, demonstrating that one theoretical model not only fits the data well but also has superior fit to an alternative increases
confidence in the findings. Alternative models in general SEM may include the same predictors of an outcome but different pathways among them, or a different combination of predictors. Nested as well as nonnested models can be compared (but the comparison criteria are different). As in all types of SEM, the specification of only those models for which substantial theoretical and/or previous empirical support exists is recommended. Evaluating alternative models will be discussed in Chapter 6. Social work researchers are often interested in identifying the mechanisms by which environmental or individual characteristics influence outcomes. Understanding the processes by which outcomes are produced is critical for the development of effective interventions. Therefore, competing models in social work SEM research may frequently involve the modeling of alternative mediators or mediational relationships leading to outcomes. For example, the authors of a study predicting caregiver burden and depression among caregivers of individuals with Alzheimer’s disease (Clyburn, Stones, Hadjistavropoulos, & Tuokko, 2000) tested four different models. Three of the models contained four predictor variables—disturbing behavior, activity limitation, informal help, and institutionalization. In two of these models, the effects of the four predictors on one of the outcomes were mediated by the other outcome. The third model contained only direct effects. A fourth model included an additional variable, distress, which was modeled as a latent variable measured by depression and burden. The researchers provided evidence justifying the inclusion of each of the variables in their models as well as previous research supporting each hypothesized model. They chose structural equation modeling as an analytic approach because it “would permit a more comprehensive analysis” (p. S4) of the relationships that had been studied in isolation in previous studies. When alternative models are tested, criteria for determining which model is the best must be applied. In Chapter 6, in addition to explaining how to evaluate models, we present specific guidelines for comparing the fit of multiple models. Box 5.2 summarizes best practices for specifying general structural equation models.
Box 5-2 Summary of Best Practices for Specifying General Structural Equation Models
1. Test models supported by theory and/or previous research. Relationships among constructs should accurately reflect previous research or theory. Relationships may be nonexistent, recursive, nonrecursive, or correlational, each of which is specified differently.
2. Specify two or more competing models. Competing theories, inconsistent empirical findings, or null hypothesis models can be used to justify model comparisons.

ESTIMATION OF GENERAL STRUCTURAL EQUATION MODELS
Box 5.3 lists all the estimation steps for a general SEM, starting with the CFA estimation steps discussed in Chapter 4. The estimation issues and
recommendations presented throughout Chapter 4 on CFA also pertain to the estimation of general structural models. As in CFA, researchers should choose the estimator that is most appropriate for the nature of their data, and when possible, should develop the structural model with one random sample and validate it on a second. The use of two random samples is especially important if the model has been modified in the process of obtaining adequate fit. Readers are referred back to Chapter 4 for detail on these two steps. As shown in Box 5.3, two additional estimation steps are recommended for general SEM with latent variables, beyond those used for estimating CFA models. First, before estimating the general SEM, the fit of the measurement model should be ascertained. Second, after specifying but before estimating the structural model, the identification of the structural part of the model should be confirmed. These steps are explained in detail in the sections that follow (Estimation Steps 3 and 4).
Box 5-3 Steps for Estimating General Structural Equation Models
1. Use the estimator that is most appropriate for the data.
2. Use the appropriate estimation options for clustered data.
3. Establish the fit of the measurement model before testing the structural model.
4. Determine that the structural model is identified before estimating the model.
5. Develop the model with a calibration sample.
6. Confirm the final model with a validation sample.
Estimation Step 1: Use the Estimator That Is Most Appropriate for the Data
See Chapter 4, Estimation Step 1, p. 102.
Estimation Step 2: Use the Appropriate Estimation Options for Clustered Data
See Chapter 4, Estimation Step 2, p. 105.

Estimation Step 3: Establish the Fit of the Measurement Model Before Testing the Structural Model
In general SEM with latent variables, of primary interest is the test of relationships among latent variables. There are two important reasons for first paying attention to the measurement model, however. First, the test of theory will be compromised if the scores from measures used to test the theoretical constructs have low reliability or low validity. Second, without confirmation that the measurement model has adequate fit, it is possible in some situations to conclude erroneously that the theoretical model does or does not have good fit. Our recommendation is to determine that the measurement model is adequate before proceeding to the structural test. Because there is disagreement about the best way to test a general structural model with latent variables (Hayduk & Glaser, 2000), however, we describe several of the suggested strategies. Some researchers advocate a single-step approach; others recommend a multiple-step procedure. Readers are encouraged to study the arguments for and against the different approaches in the sources cited.

One commonly used method for testing a general SEM is to simply test the full model in one step (Hayduk & Glaser, 2000). A second approach involves establishing the quality of the measurement model first and then testing the full (general structural equation) model (Anderson & Gerbing, 1988). A third strategy proposes following a four-step sequence of tests, starting with what is basically a common factor model (EFA), followed by a CFA, a full model test, and a modified model test (proposed by Mulaik, described in Hayduk & Glaser, 2000). Bollen (2000), in an article suggesting that no approach can guarantee that the correct number of factors is discovered, proposed the use of a "jigsaw piecewise" technique. This technique involves testing pieces of a measurement model for adequate fit, then combining pieces until a complete measurement model
with adequate fit is obtained. The purpose of the jigsaw piecewise technique is to locate specific sources of poor measurement fit, even though it cannot ascertain the correct number of factors. It should be noted that the number of “steps” involved in multiple-step procedures depends on how one defines steps. Steps in the testing sequences may actually involve multiple analyses, each with multiple steps. Social work researchers who are developing new measures or using them for the first time would do well to spend time on the exploratory first step proposed by Mulaik (as described in Hayduk & Glaser, 2000), whether they use SEM software or less specialized statistical software. EFA results are useful in the early assessment of new measures and should be considered as a process distinct from SEM hypothesis testing. Finding a desirable common factor (EFA) model, it should be noted, does not guarantee that the CFA step (second step) of Mulaik’s four-step approach will be successful. Many factor structures cannot be pretested in the EFA framework. For measurement models with poor fit, Bollen’s (2000) jigsaw piecewise technique can be used to isolate the factor and items that are problematic. Depending on the number of factors and the researcher’s decisions about how many factors to combine at a time, the strategy could involve a sequence of dozens of separate analyses. In models with many factors, many highly correlated indicators, and moderate to high interfactor correlations, we have found the jigsaw piecewise technique to be useful for identifying problematic items and refining factors before combining them all in one model (Bowen, 2011; Wegmann, Thompson, & Bowen, 2011). The value of using a multiple-step approach seems clear: “Testing the whole model in a single step makes locating the source of the poor fit extremely difficult” (Bollen, 2000, p. 78). Specifically, it makes sense to establish through CFA that one’s measures of constructs are valid and consistent with the data from a current sample before using those measures to test theory. In addition, measurement models often have more parameters and more degrees of freedom than the structural component of a general SEM. Poor fit or good fit of the measurement model may obscure structural model fit. In fact, if the structural model is created by simply replacing each interfactor correlation in the measurement model with a directional path, the fit of the full model will be the same as the fit obtained in the measurement model analysis. If fit is poor for a general SEM that is tested in a one-step approach, it is quite possible that the
poor fit is due to measurement inadequacy. In any case, it would be unreasonable to assume that poor fit in this situation unequivocally indicated rejection of the structural (i.e., theoretical) hypothesis. If it is first established that the measurement model is adequate, poor fit statistics obtained in a test of the full general model are more reasonably assumed to be related to the structural hypothesis. The four-step approach proposed by Mulaik (described in Hayduk & Glaser, 2000) explicitly accommodates the model revision and retest stage that occurs in virtually all SEM model-testing sequences. Mulaik's Step 4 is used to test revisions to structural models that are suggested by the confirmatory tests of Step 3. In spite of the putative confirmatory nature of SEM modeling and whether one is following a one-step, two-step, or four-step procedure, a variety of revisions are typically attempted before a suitable model with acceptable fit is obtained.

In summary, in our Estimation Step 3, we recommend a multiple-step procedure for testing general structural models, in which fit of the measurement model is established before proceeding to structural tests. An exceptionally well-specified model tested with constructs whose scores are known to be valid and reliable may literally require only two steps—one to demonstrate measurement quality and one to provide support for a theory-derived structural hypothesis. Most social work researchers, however, should expect to spend additional time on steps for testing and refining their measurement and/or structural models.

Estimation Step 4: Determine that the Structural Model Is Identified before Estimating the Model
In the previous section, we recommended that the measurement model be established before researchers proceed to testing structural models. Some issues of identification were also introduced in that discussion. We will now look more closely at the importance of determining that the structural model is identified, in addition to the model as a whole. To illustrate, we use an example included in the Amos program files (Arbuckle, 1983–2007). The CFA model shown in Figure 5.5 illustrates four latent variables, each measured by two observed variables (e.g., 1knowledge and 2knowledge are items measuring the latent construct of knowledge). The model contains eight observed variables and, therefore, 36 unique pieces of information (8 × 9 divided by 2). Twenty-two parameters will be estimated (eight measurement error variances, four latent variable variances, four factor loadings, and six interfactor correlations).
[Path diagram: four latent variables (knowledge, value, satisfaction, and performance), each measured by two observed indicators (e.g., 1knowledge and 2knowledge) with its own measurement error term (error1 through error8); all latent variables are intercorrelated.]
Figure 5.5 CFA Model for Amos General SEM Example (Ex05) (Arbuckle, 1983–2007).
With 36 pieces of information and 22 parameters to be estimated, the model is overidentified; it has 14 degrees of freedom. When this model is run on a sample with 98 cases (also provided in the Amos program file folder), the obtained χ2 value is 10.3, p = 0.737. Given this good fit (good fit will be discussed in detail in Chapter 6), the researcher could well proceed to a structural test. In Figure 5.6, the three correlations between performance and each of the other latent variables in the model have been replaced with directional paths. Performance is now an endogenous variable predicted by knowledge, value, and satisfaction. The number of unique pieces of information and parameters to be estimated remains the same; the degrees of freedom remain 14. When this model is run, the obtained fit statistics are identical to those obtained with the measurement model. This result is obtained because the structural part of the model is just-identified. Therefore, the fit of the structural part has not actually been tested.

Determining the Identification Status of the Structural Model
In Figure 5.6, the structural part of the model contains four latent variables. In evaluating whether a structural model is identified, the latent variables (and any observed structural variables) are counted the way observed indicators were counted in a CFA. Therefore, we calculate that there are 10 pieces of information for estimating the structural component (4 × 5 divided by 2).
[Path diagram: the same measurement model as in Figure 5.5, but the three correlations between performance and the other latent variables are replaced with directional paths from knowledge, value, and satisfaction to performance, and performance has a structural error term (error9).]
Figure 5.6 Amos General SEM Example (Ex05) (Arbuckle, 1983–2007).
Ten structural parameters are being estimated (variances for the three exogenous latent variables, Knowledge, Value, and Satisfaction; three covariances among the exogenous variables; one structural error variance for the endogenous latent variable, Performance; and three structural paths to Performance). With 10 pieces of information in the structural model and 10 structural parameters to be estimated, there are no degrees of freedom for testing the fit of the structural component. Bollen (1989) discusses identification in great detail for interested readers. When a CFA or a full model is underidentified, SEM programs will alert users to the fact—either with a message or by not running. As seen in the example in Figure 5.6, however, SEM programs will run when the full model is identified, even if the structural component is not identified and therefore is not being tested. Parameter estimates will be obtained, but they will not represent the results of a search for the best solution in the theoretical part of the model. Therefore, it is up to the researcher to ascertain that the structural component is identified. You do not want to publish a paper claiming support for a theoretical model that has not actually been statistically tested!

In summary, when structural models are just-identified or underidentified, obtained fit statistics do not necessarily provide useful information about the fit of the theoretical model. Social work researchers, therefore, must always check the identification of the structural component of their models to ensure that fit statistics actually represent the test of theory that they claim to be testing.
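Written out as the kind of hand calculation recommended in this step, with s equal to the number of structural variables and t equal to the number of structural parameters to be freely estimated:

pieces of information = s(s + 1)/2 = 4(5)/2 = 10
structural parameters (t) = 3 factor variances + 3 covariances + 3 paths + 1 structural error variance = 10
degrees of freedom for the structural component = 10 − 10 = 0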
The check must be done manually, because SEM programs will proceed with an analysis of an underidentified or just-identified structural model if the full model is identified.

Strategies for Making the Structural Model Identified
If the structural part of the model is just-identified or underidentified, researchers will need to modify the model, within the constraints of their theory. Several options may be considered, but the key is to add a constraint that is statistically correct and theoretically meaningful. First, adding an observed covariate or control variable will increase the degrees of freedom available, as long as its hypothesized relationships are not too numerous. In Figure 5.6, for example, if there were literature to support a relationship between gender and performance, gender could be added to the structural model. The model would then have five structural variables, 15 unique pieces of information (5 × 6 divided by 2), and just one additional path to estimate. The difference between structural pieces of information and parameters to be estimated would be 15 − 11, resulting in 4 degrees of freedom. Another option is to remove one or more exogenous variable covariances, if low values are expected. Alternatively, based on CFA results, one or more of the covariances could be fixed. Finally, alternative models with fewer gamma paths could be estimated and compared. If any path is found to be small or nonsignificant, it could be removed. Such modifications require empirical or theoretical guidance. Chapters 6 and 7 have more information on identification.

Estimation Step 5: Develop the Model with a Calibration Sample
See p. 71 in Chapter 3 and p. 106 in Chapter 4.

Estimation Step 6: Confirm the Final Model with a Validation Sample
See p. 71 in Chapter 3 and pp. 106–107 in Chapter 4. Box 5.4 summarizes the additional best practices that apply to general structural models.
Box 5-4 Summary of Best Practices for Estimating General SEMs
In addition to the best practices for estimating CFA models, best practices for estimating general structural models include:
1. Establishing the fit of the measurement model before testing the structural model. Use a two- or more step procedure for determining that the measurement model has adequate fit before proceeding to the test of the structural model. Describe procedures for evaluating the measurement model and present fit statistics for the final model.
2. Determining that the structural model is identified before estimating the model. Calculate by hand the degrees of freedom available for testing the structural model. Modify the model if the structural component is just-identified or underidentified.

USING CROSS-SECTIONAL SEM TECHNIQUES FOR REPEATED MEASURES DATA
This book focuses on the modeling of cross-sectional measurement and structural models in the SEM framework. Because social work researchers
often have pretest and posttest data, or pretest, posttest, and follow-up data, we briefly describe an extension of the techniques presented earlier to repeated measures data. Repeated measures data may also be called panel data, and such data can be analyzed in autoregressive, cross-lagged (ARCL) models. We distinguish here between repeated measures data containing perhaps two to four time points and longitudinal data with many more time points. For longitudinal data comprising many data points, latent growth curve (LGC) modeling is often the appropriate analysis method. LGC modeling is analogous to the hierarchical linear modeling (HLM) used with clustered data. LGC models can portray more sophisticated change processes than ARCL models, such as changes in means and slopes over time, and differences between individual and group trajectories (McArdle & Bell, 2000). Such complexity is more likely as the number of waves of data increases. Both ARCL and LGC models are useful for the same reason: they address the problem posed by observations (data points) that are nested within individuals over time. Like data from individuals or organizations that are nested in higher-level units, repeated measures or longitudinal data are nested within individuals; they therefore violate the regression assumption of independent observations. Statistically the violation takes
the form of correlated errors among observations over time for the same individual. ARCL models can easily address this violation in ways that should by now be familiar to the reader. Recall the structure of the Θ matrices used to represent the variances and covariances of error terms associated with observed indicators of latent variables (see, for example, the Θ matrix pictured on p. 42). The diagonal of a Θ matrix contains the variances of the error terms for indicators of a latent variable. The off diagonals often contain 0s, signifying that error terms are not correlated. However, it is possible to model and estimate covariances among error terms. Note that in the Θ matrix pictured on p. 42, although most off-diagonal elements are fixed at 0, one is labeled (θ45). The symbol indicates that the covariance between the error terms of observed variables 4 and 5 will be estimated. The "problem" of correlated errors can be addressed with this simple model specification step.

Figure 5.7 illustrates how correlated errors across time points are specified graphically. The example comes from a study of the well-being of children between the ages of six and ten, 18 months after they entered the child welfare system. The data were analyzed with Mplus because of their complex structure. Figure 5.7 was created with Amos. The latent variable, Well-Being, is measured with the same three indicators at the baseline time point and 18 months later. Math achievement, social skills, and behavior are the three observed indicators of the latent variables. Because the same measures are used at each time point, we expect that the same sources of error contribute to the variance of corresponding indicators across time. Related sources of unreliability in the test used to measure math achievement, for example, contribute to the error terms for Math1 and Math2 (δ1 and ε1). The expected relationship between each pair of well-being indicators is indicated by the double-headed arrows between them. The "autoregressive" component of the model is indicated here by the extra thick arrow between baseline Well-Being and Well-Being 18 months later. This modeling is similar to the modeling of a Time 2 outcome in conventional regression, while controlling for Time 1 scores on the same outcome. The effects of caregiver characteristics and service use are estimated for the variance of Time 2 Well-Being (after 18 months in the service system) that remains after removing the variance explained by Time 1 Well-Being.
Figure 5.7 Example of an Autoregressive, Cross-Lagged Model. (Path diagram: Well-Being Baseline and Well-Being 18 mos., each measured by Math, Social Skill, and Behavior indicators with correlated error terms across time; Caregiver Characteristics at Baseline and Service Use Over 18 mos. predict Well-Being at 18 months.)
A model with an additional measure of well-being from a third time point would allow for more exploration of the reciprocal effects of service use and well-being. While still not demonstrating causality, such a model might provide evidence that both the associational and time order criteria necessary for demonstrating causality are met. If data from additional time points are available on the same measures, such as the indicators of well-being, their error terms may also be specified as being correlated with corresponding indicators from the previous time point. Similarly, the corresponding latent variable from the previous time point is specified as a predictor of the later variable. Note that even though error variances across time may be allowed to covary, output might indicate that some or all freely estimated covariances
are nonsignificant. In this situation, the covariances can be fixed at 0. For more in-depth discussion of the use of autoregressive, cross-lagged models to test mediational hypotheses, see Cole and Maxwell (2003). For a recent application and illustration of modeling options in cross-lagged SEM, see Kiesner, Dishion, Poulin, & Pastore (2009).
6
Evaluating and Improving CFA and General Structural Models
Sometimes instead of getting results when they run an SEM analysis, researchers are confronted with discouraging messages about programming errors, data problems, or other causes of estimation failures. In this chapter, we first summarize possible causes of estimation failures. We then provide guidelines for interpreting the results of successful estimation procedures both statistically and substantively. Finally, we discuss strategies for improving fit when model test results are valid (i.e., the model ran and converged, all parameter estimates are within valid ranges, and no errors are reported by the program) but unsatisfactory (i.e., fit criteria are not met).
ESTIMATION FAILURES Before model fit and parameter estimates can be evaluated, the SEM program must successfully run and converge upon a solution through the iterative estimation process described in earlier chapters. In addition, all parameter estimates must be within valid ranges. Several common causes
of estimation failure are relatively easy to detect, prevent, or address. Others are more difficult to detect or to address. The following paragraphs will help with many, but not all, of the problems social work researchers are likely to encounter in their SEM careers.

Identification Problems

A common SEM problem is that the model to be tested is not identified or is just-identified. Model identification was discussed at the end of Chapter 2. When a model is underidentified, it will not run, and the user will receive a message indicating the problem. In Amos, for example, in the text output section called "Notes for Model," users will see a calculation of degrees of freedom showing a negative number and a message such as: "The model is probably unidentified. In order to achieve identifiability, it will probably be necessary to impose 1 additional constraint" (Arbuckle, 1983–2007). In Mplus (Muthén & Muthén, 2006), a similar message is received if the proposed model is not identified: "THE DEGREES OF FREEDOM FOR THIS MODEL ARE NEGATIVE. THE MODEL IS NOT IDENTIFIED. NO CHI-SQUARE TEST IS AVAILABLE. CHECK YOUR MODEL." Output should be checked carefully for warnings and messages because partial results may be obtained even for an underidentified model. Partial results should be ignored when these messages appear. If a model is just-identified, it may run, but no tests of fit will be computed. In Amos, the reported degrees of freedom will be 0. In both Amos and Mplus, the reported χ2 statistic will be 0. There is only one solution to the model equations, and the implied matrix will be the same as the input matrix (i.e., perfect fit will be obtained). The list in Box 6.1 may help you troubleshoot and solve identification problems. See also pp. 175–199 of Chapter 7 for a more in-depth discussion of underidentification and how to solve the problem. An example of addressing an identification problem comes from a study using a latent variable with only one indicator (Bower, Bowen, & Powers, in press). An observed composite called Potential was used to measure a first-order factor called Teacher Perceptions of Ability. To identify the factor, the error variance of the observed variable was fixed at 1 minus its reliability, which was obtained in a general statistics program.
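For readers who want to compute the value at which to fix the error variance, a minimal sketch in Python follows. The reliability and variance figures are hypothetical; the general rule, (1 − reliability) × observed variance, reduces to 1 minus the reliability when the observed variable is standardized, as in the example just described.

# Minimal sketch of the single-indicator identification strategy described above.
# The reliability and variance values are hypothetical, not those from the
# Bower, Bowen, & Powers study.
reliability = 0.80          # e.g., coefficient alpha from a general statistics program
observed_variance = 1.00    # variance of the observed composite (1.0 if standardized)

# Fix the indicator's error variance to the unreliable portion of its variance;
# with a standardized indicator this is simply 1 minus the reliability.
fixed_error_variance = (1 - reliability) * observed_variance
print(fixed_error_variance)  # 0.20, entered as a fixed parameter in Amos or Mplus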
Box 6-1 Troubleshooting and Solving Model Identification Problems
1. Check to be sure all latent variables have one factor loading fixed to 1.0 (or that the latent variables' variances are fixed at 1.0).
2. If any latent variable has only two indicators, try to find an additional indicator, or, if justifiable, constrain one loading to a value derived from previous analyses (e.g., in published studies or from the researchers' own exploratory factor analyses). The other loading will be constrained to a value of 1.0 in order to establish the metric for and identification of the latent variable.
3. Be sure all paths from latent error terms to observed indicators are fixed at 1.0.
4. Be sure that all paths from structural error terms to endogenous variables are fixed equal to 1.0.
5. If the model is just-identified, constrain one more parameter in order to gain one degree of freedom so the model will run. Your options include deleting a path (constraining it to 0), constraining two or more paths to be equal, or constraining a path to a certain value based on your knowledge of the variable.
6. Add an observed variable to your model—an additional latent variable indicator if you are testing a measurement model; an observed demographic variable, or other structural variable, if you are testing a general SEM.
Note: In Mplus, Items 1, 3, and 4 on this list are default settings that generally need not be specified by the user. In Amos, users must take care to verify that these items are accurately reflected in the graphic representation of their model.
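The degrees-of-freedom count on which Item 5 of Box 6.1 turns can be checked by hand, or with a few lines of code, before estimation. A minimal sketch follows; the counts are hypothetical, and the formula assumes means and intercepts are not being estimated.

# Minimal identification check based on counting information and free parameters.
# Replace p and t with the values for your own model.
p = 5    # number of observed variables
t = 13   # number of freely estimated parameters

unique_elements = p * (p + 1) // 2   # unique variances and covariances in the input matrix
df = unique_elements - t

if df < 0:
    print("Underidentified: constrain at least", -df, "more parameter(s).")
elif df == 0:
    print("Just-identified: perfect fit is guaranteed and no fit tests are available.")
else:
    print("Overidentified with", df, "degrees of freedom.")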
Ill-Scaling and Multicollinearity

An estimation failure may also occur if the variances of two or more observed variables are too divergent (e.g., if one variance is 10 times the magnitude of another). This problem and solutions were discussed on pp. 64–65 in Chapter 3. Basically, when observed variances are highly divergent (i.e., differ by a factor of 10 or more), users need to transform one or more variables to bring their observed variances into the range of other variables (e.g., dividing income in dollars by 1,000 to reduce its variance). Also, models may not run if two or more observed indicators of latent variables are too highly correlated. Suspect this problem if the computer program warning refers to a "nonpositive definite matrix."
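The input matrix can also be screened directly before estimation. The following minimal numpy sketch flags highly correlated pairs and checks whether the correlation matrix is positive definite; the data array is simulated, and the 0.85 cutoff anticipates Kline's guideline discussed in the next paragraph.

# Minimal screen of an input correlation matrix for multicollinearity and
# nonpositive definiteness. The data here are simulated for illustration.
import numpy as np

data = np.random.default_rng(0).normal(size=(200, 6))   # rows = cases, columns = observed variables
corr = np.corrcoef(data, rowvar=False)

# Flag variable pairs whose absolute correlation exceeds 0.85.
high_pairs = [(i, j, round(corr[i, j], 3))
              for i in range(corr.shape[0])
              for j in range(i + 1, corr.shape[1])
              if abs(corr[i, j]) > 0.85]
print("Pairs over |0.85|:", high_pairs)

# A correlation matrix with any eigenvalue at or below 0 is nonpositive definite.
eigenvalues = np.linalg.eigvalsh(corr)
print("Nonpositive definite:", bool(eigenvalues.min() <= 0))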
Examine the correlation matrix of your variables (in your SEM output or output from a general statistics program). Are the highest correlations over 0.90? Even correlations in the high 0.80s may cause problems. Kline (2005, p. 319) suggests that values over 0.85 may contribute to estimation problems, including the model not running, inadmissible solutions, or unstable results. To prevent the problem of highly correlated observed variables, multicollinearity diagnostics can be run on all input variables before they are selected for the SEM analysis. See Kline (2005) or Tabachnick and Fidell (2007) for instructions on obtaining and interpreting statistics on multicollinearity. Kline (p. 57) suggests two options for addressing the presence of a high correlation between two observed variables—combining them into one variable or eliminating one of them. Delete only one variable at a time to see if it solves the problem. Consider domain sampling and the strength of factor loadings in your choice of which variable to delete. Kline also indicates that excessively high correlations between latent variables may cause estimation problems. A high correlation between two latent variables may indicate that their underlying constructs are better modeled as one construct. What constitutes "too high" may be model specific; the best way to troubleshoot this potential problem is to rerun the analysis with the highly correlated factors modeled as one factor.

Software Preferences

In Amos, the model will not run if you have missing values in your data and have not checked the analysis option "Estimate means and intercepts" in the Estimation tab. Amos will also ask you to confirm that it should proceed with the analysis if you have not specified that all exogenous variables are correlated. If you have deliberately left any pairs of exogenous variables uncorrelated in your path diagram, just tell the program to continue. If you inadvertently missed a correlation, cancel the analysis, add the path, and proceed.

Convergence Failures

If an analysis fails to converge, there are a number of potential causes. Nonconvergence may be due to a sample that is too small, a default
number of iterations that is too low, problematic start values for the iterative estimation process (Kline, 2005), or misspecification of the model. Note that even without convergence, SEM programs may generate all or some output, including parameter estimates. The estimates provided will be from the final unsuccessful iteration. Always check the analysis summary to confirm that the model converged. In Amos, convergence success is reported in the Notes for Model part of the output. "Minimum achieved" indicates the model ran successfully. Other messages indicate what problems occurred. In Mplus, the message THE MODEL ESTIMATION TERMINATED NORMALLY indicates that the analysis was successful, and NO CONVERGENCE, NUMBER OF ITERATIONS EXCEEDED conveys the opposite. The messages appear before model estimates in the output but after a substantial amount of summary information about the data and analysis. With both programs, therefore, the user must look specifically for confirmation that the analysis proceeded successfully. If the output indicates the model did not converge, the output should be ignored; it is not acceptable. Suggestions for solving nonconvergence problems are presented in Box 6.2. More detail on some of these options in Amos and Mplus is provided on the companion website, including how to change convergence criteria.

Inadmissible Solutions

There are times when a model will appear to have run successfully but the solution is inadmissible because some parameter estimates are unacceptable. These unacceptable values, called improper solutions, are most commonly correlations (standardized covariances) of 1.0 or higher and variances that are 0 or negative (Heywood cases) (Chen, Bollen, Paxton, Curran, & Kirby, 2001). Chen et al. suggest that low sample size and model misspecification contribute to invalid parameter estimates, the former more consistently or predictably than the latter. In-depth treatment of this issue is beyond the scope of this book, but the Chen et al. (2001) article corroborates our warning about small sample sizes and misspecified models. The authors suggest a variety of strategies for addressing negative variances, the less technical of which include checking for outliers, which was discussed on p. 61 in Chapter 3, and "constraining the error variances to zero or a small positive number"
Box 6-2 Troubleshooting and Solving Nonconvergence Problems
1. If you are at the low end of the sample size guidelines presented in Chapter 3, Box 3.2 on p. 54, you may not be able to analyze your model with SEM. If it is an option, simplify your model to get closer to the desirable ratio of cases to parameters to be estimated.
2. Most software programs allow you to change the default value for the number of iterations that will be tried before the program reports that the model is not converging. Increase the default substantially (e.g., double it) and try running the model again.
3. Bollen (1989) suggests that the parameter estimates reported from the final iteration of a nonconvergent solution be entered as start values for a new attempt to run the model. He provides other suggestions related to start values (pp. 254–256). Kline (2005) also provides suggestions for start values.
4. Major misspecification of a model can lead to nonconvergence. This book has emphasized the importance of testing models that have strong theoretical and/or empirical foundations. If your model is too "exploratory," you should consider a different analysis approach.
5. Nonconvergence is more likely with certain estimators, such as those in the WLS family (Flora & Curran, 2004). If no other sources of nonconvergence seem likely, try a different estimator. Flora and Curran (2004) found that nonconvergence was less of a problem with Mplus' WLSMV than with other WLS options in that program. If the convergence problem remains, the estimator is not likely the source.
in cases where the absolute value of the negative variance was “not far from zero” (p. 504). More technical strategies include determining if the model is empirically underidentified or if the negative estimates are due to sampling fluctuation. Readers are referred to Chen et al. and the sources they cite for more information. One take-home message of this discussion is that researchers should always check their SEM output carefully to be sure the analyzed model is identified, that the analysis converged, and that no unacceptable parameter estimates are present. Once assured that the model has indeed run and converged successfully, the researcher may turn to evaluating model fit and interpreting parameter estimates.
EVALUATING MODEL FIT

The first thing many researchers look for upon obtaining the results of an SEM analysis is the output related to "goodness of fit." If the results suggest that the model "fits" the data well, they then proceed to interpret other output statistically and substantively. Therefore, we begin our discussion with strategies for evaluating goodness of fit. The fundamental SEM hypothesis is S = Σ(θ̂)—that is, that the covariance matrix reproduced based on parameter estimates is statistically identical to the input matrix of observed covariances for the sample (S). The real question in practice, however, is not whether the input and implied matrices are truly identical (an unlikely outcome), but how similar they are to each other. The term "goodness of fit" refers to how similar the two matrices are and whether they are similar enough that the researcher can claim support for the hypothesized model. Many indices to test model fit have been developed during the past two decades. Figures 6.1 and 6.2 display some of the fit indices reported in Amos and Mplus. In this section, we highlight only a small number of fit indices that we recommend social workers use in their SEM reports. Social work researchers will find this subset more than adequate for reporting their SEM results in most scholarly journals.

Model Chi-Square (χ2)

Model chi-square (χ2) is the most basic and common fit statistic used to evaluate structural equation models and should always be provided in reports on SEM analyses. The statistic is sometimes denoted as χ2M (Kline, 2005). It is the product of (a) the sample size minus 1 and (b) the minimization value obtained for the discrepancy function used by the estimator (Kline). For example, the value obtained from either the ML or WLS estimator fitting function presented in Chapter 4, pp. 102–103, is multiplied by the sample size minus 1 to produce the statistic. The χ2 statistic is distributed as a Pearson χ2 with degrees of freedom equal to that of the user's model: df = p(p + 1)/2 − t, where p is the number of observed variables in the model, and t is the number of free parameters estimated by the model. Because the distributional characteristics of χ2 are known, it is possible to determine the statistical probability of the obtained value. The test of significance is a direct test of the fundamental SEM null hypothesis.
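Because the reference distribution is known, the p-value of an obtained model χ2 can be computed directly; a minimal scipy sketch follows, with illustrative values rather than results from a particular study.

# Minimal sketch: p-value of an obtained model chi-square (values illustrative).
from scipy.stats import chi2

chi_square = 5.08   # obtained model chi-square
df = 2              # df = p(p + 1)/2 - t for the user's model

p_value = chi2.sf(chi_square, df)   # probability of a chi-square this large or larger
print(round(p_value, 3))            # a value above .05 supports the hypothesized model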
Figure 6.1 Examples of Selected Fit Statistics from Amos. (Output panels include CMIN, RMR/GFI, baseline comparisons, parsimony-adjusted measures, RMSEA, and AIC.)
Figure 6.2 Examples of Selected Fit Statistics from Mplus. (Output includes the chi-square tests of model fit for the target and baseline models, CFI/TLI, loglikelihood, information criteria, and RMSEA with its 90% confidence interval.)

Commonly cited criteria for the fit indices we recommend include a nonsignificant χ2 (p > 0.05); RMSEA values indicating close fit (≤ 0.05), reasonable fit (0.05–0.08), or poor fit (≥ 0.10) (Browne & Cudeck, 1993); CFI and TLI values of 0.95 or greater (Hu & Bentler, 1999); GFI values of 0.90 or greater (Hoyle & Panter, 1995); and WRMR values of 0.90 or less (Muthén & Muthén, 1998–2007).
EVALUATING PARAMETER ESTIMATES

After determining that a model meets preestablished fit criteria, it is imperative to examine other features of the model both statistically and substantively. Good fit does not guarantee that all parameters in a hypothesized model are statistically significant or of the magnitude or in the direction expected. In the measurement model, the magnitude and statistical significance of loadings (λ's) and factor variances (φ's) should be examined. In addition to seeking statistical significance, researchers may apply a cutoff for the magnitude of standardized loadings. Acceptable loadings are not as clearly defined in the literature for CFA as they are for EFA. Researchers may accept all statistically significant loadings, reject loadings below one of the common EFA cutoffs (e.g., 0.40), or reject loadings that are dramatically lower than other loadings on a factor. Cutoffs may also be determined by results from previous studies of the same or similar measures. Latent variables with nonsignificant variances (i.e., variances that are not significantly different from 0) are not useful measures because they do not capture meaningful differences among individuals, at least among individuals like those in the studied sample. It should be noted that there are varying opinions among researchers about whether model components should be removed due to nonsignificance. Some argue that theoretically justified elements should remain in final models. The statistical goal of controlling for the effects of common demographic characteristics (or other relevant factors) also justifies retention of nonsignificant variables because they still explain some variance. Researchers in most topical areas will have latitude in such decisions because neither theory nor previous empirical work is likely to offer definitive conclusions. Still, theoretical and/or empirical considerations should always have a role in decisions. Related to the magnitude of loadings in the measurement model is the percent of variance in each observed indicator that is explained by a model (i.e., 1 minus the ratio of the indicator's error variance to its total variance). These values are reported as squared multiple correlations (SMCs) in Amos and as R2 values in Mplus. There is no generally agreed upon cutoff for what is an unacceptable SMC, but higher values signify that more of an indicator's variance is associated with the latent variable(s) it is hypothesized to help measure. Better indicators are more closely associated with the latent variable they measure. According to Bollen
(1989, p. 288), "[i]n general, the goal is to find and use measures with high [R2's]." In an example he presents, values of 0.59 to 0.87 are described as "moderate to large" (p. 288). Substantively, the interpretation of measurement model parameter estimates is guided by the theory and prior research that dictated specification of the model. Model output may indicate that one or more observed variables do not load significantly on the factors they were hypothesized to represent. Items may load on two factors instead of one, or on an unexpected factor. The pattern of highest and lowest loadings on a latent factor may suggest that the underlying construct is somewhat different than hypothesized. Covariances among factors (φ's) may be higher or lower than expected. In addition to interpreting the statistical significance of loadings and interfactor covariances, therefore, researchers should also interpret the patterns of loadings and interfactor relationships in the context of analysis hypotheses and the theoretical and empirical support behind those hypotheses. Whether parameter estimates support or deviate from expectations, they have implications for theory and/or prior empirical findings.

In a general SEM, the path coefficients for directional relationships between exogenous and endogenous factors (γ's) and between endogenous factors (β's) should be examined for size and statistical significance. In addition, the SEM output includes the percent of variance in endogenous variables that is explained by their modeled predictors. The amount of variance in an endogenous variable that is explained by the model is the sum of the direct and indirect effects of its predictors in the structural model. As with the interpretation of measurement model parameter estimates, the substantive interpretation of structural model estimates is guided by theory and past research. Paths that were predicted by previous research and theory may be statistically nonsignificant or may be significant but smaller than hypothesized. Output related to direct and indirect pathways may not support hypotheses being tested. The percent of variance explained in the endogenous variables may be more or less than hypothesized. The relevance and importance of structural findings can only be made meaningful by interpreting them in relation to expected findings. In social work research, clinical significance should also be considered and reported. A model may meet fit standards but explain little variance in
an outcome (e.g., 3%), in which case claims that new interventions should be developed based on SEM findings may be overstated. In a study of the relationship of workplace characteristics to employee well-being, Jang (2009) found strong enough relationships between flexible work schedule and work–life balance (standardized path coefficient = 0.68) and between work–life balance and well-being (standardized path coefficient = 0.55) to support a variety of practice and policy implications. Social work researchers should interpret and compare the magnitude of path coefficients in terms of the clinical meaning of the relationships (e.g., how many units of change are expected in an outcome variable for a one-point or one standard deviation change in a predictor that can be influenced by policy or intervention). It should be noted that the magnitude of factor loadings and/or structural paths does not have to be large in order for a model to have good fit. A model may have good fit and yet contain paths that are nonsignificant or smaller than expected. Similarly, the theory-based expectation may be that a coefficient is small—that the effect of one latent variable upon another is modest. It is the closeness of the implied matrix to the input matrix that determines fit, not the presence of large effects. Model-implied covariances may be small or large; what "counts" in the evaluation of fit in SEM analysis is how similar they are to their counterparts in the input matrix. In sum, the statistical and substantive evaluation of a CFA or general SEM requires careful comparison of what a study's guiding theory or past research suggested with what was actually found. This thoughtful exercise includes much more than a simple assessment of fit statistics. Box 6.3 offers a summary of best practices in evaluating structural equation models.
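For translating a standardized path coefficient into the kind of clinical statement described above, a minimal sketch follows. The standardized coefficient echoes the Jang (2009) example, but the standard deviations are hypothetical; the conversion uses the usual relation b = β(SDy/SDx) between standardized and unstandardized coefficients.

# Minimal sketch: expressing a standardized path coefficient in outcome units.
# The standard deviations below are hypothetical.
beta_std = 0.55   # standardized path coefficient (e.g., work-life balance -> well-being)
sd_x = 4.0        # standard deviation of the predictor in its original metric
sd_y = 6.0        # standard deviation of the outcome in its original metric

b_unstd = beta_std * (sd_y / sd_x)   # expected change in the outcome per one-point change in the predictor
per_sd_change = beta_std * sd_y      # expected change in the outcome per one-SD change in the predictor
print(round(b_unstd, 2), round(per_sd_change, 2))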
COMPARING ALTERNATIVE MODELS

A best practice in the testing of CFA and general structural models is the comparison of alternative models. Chapter 4 provided suggestions for choosing alternative measurement models. Chapter 5 gave examples of alternative structural models that were tested in a social work study predicting caregiver burden and depression among caregivers of individuals with Alzheimer's disease (Clyburn, Stones, Hadjistavropoulos, & Tuokko, 2000).
Box 6-3 Best Practices for Evaluating Structural Equation Models
1. Establish which fit criteria you will use before running analyses. Justify your choices in manuscripts reporting your findings.
2. Before evaluating fit, confirm that the program reports that the analysis has run successfully and that all parameter estimates are within acceptable ranges.
3. Report χ2, RMSEA (with 90% confidence interval), CFI, TLI, and GFI when conducting analyses with maximum likelihood estimation.
4. Report χ2, RMSEA, CFI, TLI, and WRMR when conducting analyses in Mplus with WLSMV estimation.
5. Evaluate parameter estimates statistically, theoretically, and in terms of practice implications.
Box 6.4 offers a summary of best practices for comparing alternative models. The purpose of specifying and testing alternative models is to provide additional support for the primary theory-based model being posited. Because more than one model may have good fit and be consistent with the data (i.e., may yield an implied covariance matrix that is statistically similar to the input matrix), a more compelling argument for the validity of one's preferred model can be made when its fit is superior to the fit of rival models. There are two possible model comparison scenarios: (a) one of the models being compared is "nested" within the other model, and (b) neither of the models is "nested" within the other. Kline (2005) also refers to nested models as "hierarchical" models, but they should not be confused with models in which some respondents are nested within sampling units.

Comparing Nested Models

In the first model comparison scenario, one of the models being compared is "nested" within the other model. Suppose we have a model, m1, with 10 paths that are freely estimated. A second model, m2, is the same as m1 except that one of the 10 paths is constrained to some value (e.g., 0 or 1). In this case, m2 is nested in m1. The simplest type of nested model (m2) is one in which the freely estimated parameters are a subset of those in the first model (m1; Bollen, 1989). For example, if a freely estimated
β or γ path in one model is fixed to 0 in a second, the second model is nested in the first. Nested models, therefore, may be indicated by the absence of a path in the nested model, which signifies that the path coefficient is fixed to 0. A type of nesting that is more common in multiple-group analyses is when a path, such as a factor loading (λ), is constrained to be equal for two groups. Instead of estimating two loadings, one for each group, the program estimates only one loading that applies to each group. It should be noted that a model with fewer observed variables than another is not nested. Nested models contain all the same observed variables as the models in which they are nested. (There is at least one exception to this statement: a second-order factor model may be nested in a first-order model if the number of loadings on second-order factors is less than the number of interfactor correlations in the first-order model. See Bower, Bowen, & Powers, in press, for an example.) When fit is compared across two models and one is nested in the other, the χ2 statistic has a central role in identifying the preferred model. "Fit" in the context of comparing nested models refers to the value of the χ2 statistic (not its p-value). Unfortunately, identifying the better model is not as simple as finding which has a lower (better) χ2 statistic. The change in χ2 must be evaluated in relation to the change in degrees of freedom. An example and some additional background information will help illustrate how to use χ2 to identify the better of two models. Figures 6.3 and 6.4 present two alternative models of the historic loss study presented in Chapter 5. Figure 6.3 is an adaptation of Figure 5.1;
Figure 6.3 Adapted Model of Historical Loss (Whitbeck et al., 2004).
Figure 6.4 Fictitious Nested Alternative Model to Model in 6.3.
we have used conventional notation to indicate which variables in the model were observed and latent. Figure 6.4 presents a fictitious nested alternative model. The model in Figure 6.4 is nested in the model in Figure 6.3 because the γ path from DISCRIM to HLOSS has been constrained to 0 (i.e., the path has been removed from the path diagram). Table 6.2 provides information about the two models. The χ2 value and degrees of freedom for the model in Figure 6.3 are from the article on historic loss (Whitbeck et al., 2004); the χ2 value for the model shown in 6.4 is fictitious.

Table 6.2 Comparison of Two Models

                                                 Model 6.3 (Figure 6.3)    Model 6.4 (Figure 6.4)
Observed variables                               5                         5
Unique elements in input covariance
  matrix [(5 × 6)/2]                             15                        15
Freely estimated parameters                      13                        12
Degrees of freedom                               2                         3
More parsimonious and more restrictive                                     ✓
Less parsimonious and less restrictive           ✓
χ2                                               5.08 (2 df)               7.50 (3 df) (fictitious)
With one fewer parameter to be estimated, model 6.4 is more restrictive, and therefore more parsimonious, than model 6.3. It also has more degrees of freedom. Models that are more restrictive and more parsimonious, with more degrees of freedom, and with fewer paths to be freely estimated will virtually always have worse fit compared with the models in which they are nested. One way to understand why this is true is to consider how fixing one additional parameter to 0 in the nested model makes it harder to fully reproduce all the correlations between observed variables that could be traced along the omitted path. In fact, as stated before, the more parameters that are freely estimated in a model, the closer to perfect fit the model is likely to be. A just-identified model, in which the number of unique pieces of information in the input matrix equals the number of parameters to be estimated, will have perfect fit—that is, elements of the implied matrix will be identical to the elements in the input matrix. (See the end of Chapter 2 for more on identification.) As expected, therefore, the (fictitious) χ2 value of model 6.4 is higher (worse) than the value obtained by Whitbeck, Chen, Hoyt, and Adams (2004) for model 6.3. The question for the nested comparison is: "Is the change in χ2 statistically significant, given the corresponding change in degrees of freedom?" The null hypothesis is that the two models are identical. Therefore, a p-value of 0.05 or less indicates that they are not identical and the null hypothesis is rejected. From Table 6.2, the change in χ2 is 2.42 (the difference between 7.50 and 5.08). The change in the number of degrees of freedom is 1 (the difference between 3 and 2). From a chi-square distribution table (which can be found in most basic statistics books or online), we can see that a χ2 change of 3.84 or more is statistically significant at the 0.05 level for a change of 1 degree of freedom. Related information can also be obtained in Excel by typing "=chidist" in any cell and providing the change in χ2 and the change in degrees of freedom in parentheses, separated by a comma. In the current example: =CHIDIST(2.42,1). The returned value is the p-value of the change in fit across the two models. The information from the chi-square distribution table indicates that although the nested model has a higher χ2 than model 6.3, the fit did not get statistically significantly worse (because 2.42 is less than 3.84). The information obtained from the "chidist" function in Excel corroborates this finding. The p-value returned for the function is 0.119, which is greater than the 0.05 significance level that would indicate a statistically significant worsening of model fit. In this case, we would
retain model 6.4 as the better of the two models—it is more parsimonious, and its fit is not statistically worse than the less restrictive model. This example illustrates how parsimony is favored in SEM. Because it is more difficult to obtain good fit with more parsimonious models, parsimony is considered a virtue in SEM analyses. According to Kline (2005, p. 136), the "parsimony principle" states that "given two different models with similar explanatory power for the same data, the simpler model is to be preferred." If the change in χ2 per change in degrees of freedom between two alternative models exceeds the critical value given in a chi-square distribution table (e.g., greater than 3.84 for 1 df), or returns a p-value below 0.05 in Excel's "chidist" function, the researcher retains the less parsimonious model. The statistically better fit outweighs the improvement in parsimony.

In summary, when the best practice of comparing alternative CFA or general structural equation models is used and when the models being compared are nested, the change in χ2 per change in degrees of freedom is evaluated to determine which of two models is better. This simple statistical test indicates which model is most consistent with the data. The test involves calculating the difference in the χ2 values obtained from estimating the models, calculating the change in degrees of freedom, and determining if the change in χ2 for the given number of degrees of freedom is statistically significant. It is important to note, however, that when certain estimators are used in Mplus (including robust ML and WLS estimators; Muthén & Muthén, 1998–2007), the obtained χ2 and df values cannot be used in the calculations we have described. The estimators generate χ2 and df values that are corrected to take into account nonnormal or complex data (the Satorra-Bentler scaled χ2). Amos does not offer the correction factor. Mplus output clearly states next to the reported χ2 and df values that they cannot be used in difference tests. Mplus provides another mechanism for comparing nested models. Examples of that function are provided in the online resources for the book. Readers are also referred to the print or online Mplus User's Guide (Muthén & Muthén, 1998–2007) and technical appendices (Muthén, 1998–2004) for a complete list of estimators and associated mechanisms for comparing nested models.
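The same difference test can be computed directly rather than with a distribution table or Excel. A minimal scipy sketch follows, using the values from Table 6.2; as noted above, it applies only when the estimator produces χ2 values that are appropriate for difference testing.

# Chi-square difference test for two nested models, using the Table 6.2 values.
from scipy.stats import chi2

chi2_nested, df_nested = 7.50, 3   # model 6.4 (more restrictive, more parsimonious)
chi2_full, df_full = 5.08, 2       # model 6.3 (less restrictive)

delta_chi2 = chi2_nested - chi2_full   # 2.42
delta_df = df_nested - df_full         # 1

p_value = chi2.sf(delta_chi2, delta_df)     # about 0.12, matching Excel's CHIDIST(2.42,1)
critical_value = chi2.ppf(0.95, delta_df)   # about 3.84 for 1 df

# Because delta_chi2 is below the critical value (p > .05), fit did not worsen
# significantly and the more parsimonious model is retained.
print(round(delta_chi2, 2), round(p_value, 3), round(critical_value, 2))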
Comparing Nonnested Models

In the second model comparison scenario, neither of the models being compared is "nested" within the other model. One model may have more observed variables or an additional latent variable. Or the two models may have different combinations of fixed, constrained, and free parameters such that neither one tests a subset of the other's free parameters. The comparison of two alternative models that are not nested to determine which has better fit is more straightforward than the test of nested models. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are provided in both Amos and Mplus with default estimation settings, may be used to identify the better model. Smaller AIC and BIC values indicate better fit. Table 6.3 provides an example of AIC and BIC information from Amos fit output; the top row contains the values pertaining to the tested model. Amos output contains embedded information on the formulas used to calculate each index; readers can click on the name of the index and see the formula used. The AIC value reported in Amos is simply the χ2 value for the model plus two times the number of free parameters. In the model for which the values in Table 6.3 were generated, χ2 was 63.235 and 10 parameters were estimated. The AIC in the table = 63.235 + 2(10). The BIC formula (which can be found by clicking on "BIC" in the Amos output) is also based on χ2 but takes into account sample size and model complexity and rewards models that are more parsimonious. Because smaller χ2 values are desirable, it is clear why smaller values for AIC and the related BIC are desirable—χ2 is central to Amos' calculation of these fit indices. Table 6.4 provides an example of AIC and BIC information from Mplus fit output.
Table 6.3 Example of Amos AIC and BIC Output

Model                   AIC        BCC        BIC        CAIC
Default model           83.235     83.529     123.518    133.518
Saturated model         30.000     30.441     90.424     105.424
Independence model      690.637    690.784    710.778    715.778
Table 6.4 Example of Mplus AIC and BIC Output

Information Criteria
Number of Free Parameters                      19
Akaike (AIC)                                   13208.354
Bayesian (BIC)                                 13288.432
Sample-Size Adjusted BIC (n* = (n + 2)/24)     13228.124
Mplus uses a different formula for calculating the AIC (Muthén, 1998–2004): AIC = −2 log L + 2r, where log L is the log of the likelihood function and r is the number of free parameters in the model. Mplus output includes the log likelihood value for the model. In this example, the log likelihood value for the null hypothesis was −6585.177, and the number of free parameters was 19. Therefore, AIC = −2(−6585.177) + 2(19) = 13208.354. The Mplus online technical appendices contain more information on the AIC formula and the formula used to calculate BIC.
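A minimal sketch reproducing both calculations follows; the Amos-style value uses the Table 6.3 model and the Mplus-style value uses the Table 6.4 model.

# Reproducing the two AIC calculations described in the text.

# Amos: AIC = model chi-square + 2 * (number of free parameters)
chi_square, free_parameters = 63.235, 10
aic_amos = chi_square + 2 * free_parameters
print(round(aic_amos, 3))    # 83.235, as in Table 6.3

# Mplus: AIC = -2 * logL + 2 * r
log_likelihood, r = -6585.177, 19
aic_mplus = -2 * log_likelihood + 2 * r
print(round(aic_mplus, 3))   # 13208.354, as in Table 6.4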
Box 6-4 Best Practices for Comparing Alternative Models
1. Determine if the models to be compared are nested or not. Are the free parameters in one model a subset of the free parameters in the other?
For nested models:
2. Compare the difference in χ2 per difference in df between the two models. If the χ2 difference is less than the critical threshold given in a χ2 distribution table for the change in df, or if the p-value obtained in Excel for the change per df is greater than 0.05, retain the more restrictive model.
For nonnested models:
3. Compare the AIC or BIC values presented in the fit statistics for the model. Smaller values indicate the better model.
In summary, it is the comparison of AICs or BICs across models that allows the researcher to determine which of two alternative nonnested models is preferable. The BIC penalizes models more for complexity, that is, for containing many paths. In general, conclusions will be the same for the two indices, so either can be used.
IMPROVING MODEL FIT

It is not uncommon to obtain inadequate fit statistics for an SEM that has been carefully specified and appropriately estimated. In a strictly confirmatory analysis, a researcher would test a model, determine if it had adequate fit, and report that the model passed the test or not. Most confirmatory factor analyses, however, are not "strictly confirmatory" (Jöreskog, 1993, p. 295). In what Jöreskog calls a "model generating" process, researchers use feedback from SEM output to make model improvements. Model generating, building, or modifying is common in general structural models as well as CFA models. The fit statistics provided by SEM programs provide an overall assessment of the degree to which the input and implied matrices are similar. Other output that is either requested by the researcher or provided by default provides more specific insights into which variables or relationships might be problematic in a model. Long (1983) describes the repeated use of SEM output to modify models until fit indices fall within desirable thresholds as a "blatantly exploratory approach" (p. 69). Improving upon models, however, is generally acceptable in the literature under the following conditions: (a) changes are minor, (b) changes can be theoretically justified, and (c) the improvements do not cause significant changes in other model parameters (Byrne, Shavelson, & Muthén, 1989). In addition, models resulting from post hoc modifications should be validated with a second sample (Long, 1983), and authors should acknowledge the exploratory aspects of their investigations. Validation with an independent sample is important for determining that modifications are not "sample specific," that is, that they do not apply only to the data from which the modification suggestions originated. Modifications that are robust across samples are more credible than those that are not replicated. This section provides guidelines for identifying sources of poor fit and addressing them.
Identifying and Addressing Sources of Poor Fit

SEM output provides many clues about possible sources of poor fit. Different components of the output should be used in combination to determine the source of problems and which modifications are likely to be most helpful. Clues about poor fit may be found in the following components of the output:
1. Modification indices
2. Residual correlation matrix
3. R2 or SMC values for indicators of latent variables

Modification indices. Modification indices (MIs) indicate parameters that can be added to a model to improve fit. "Adding" parameters means allowing additional parameters to be freely estimated. The reader may recognize how MI information relates to nested model comparisons—each MI indicates how decreasing the model's df by 1 will affect overall model fit and the value of the parameter that is being freed. The current model is "nested" within the model that would result from each proposed change. All MI information is based on making only one change at a time—freeing one fixed parameter so it can be estimated. For each suggested MI, information is given about how much freeing the parameter will reduce χ2 and what the value of the new parameter will be. (Remember, lower χ2 values are desirable and more likely to be nonsignificant.) Table 6.5 shows an example of four MIs from output obtained with Mplus 4.2 (Muthén & Muthén, 2006). The user has requested that only modifications that will reduce χ2 by 0.5 or more be listed in the output. The two "BY Statements" refer to suggested loadings of two observed variables (T12A, T12B) on the latent variable "SKILL1." If T12A were allowed to load on SKILL1, the obtained χ2 for the model would decrease by 0.606, according to the MI column. The expected unstandardized parameter change (EPC) that would result from adding the path is −0.185. In the model from which these MIs were generated, T12A already loads on a different latent variable. Its loading on SKILL1 is fixed at 0. The EPC column indicates that if T12A were allowed to load on SKILL1—and no other model changes were made—the parameter estimate for the loading would be −0.185.
Table 6.5 Example of Modification Index Output from Mplus 4.2

Model Modification Indices
Minimum MI value for printing the modification index: 0.500

                     MI       EPC      Std. EPC    Std YX EPC
BY Statements
SKILL1 BY T12A       0.606    −0.185   −0.178      −0.178
SKILL1 BY T12B       1.953    0.280    0.270       0.270
WITH Statements
T12B WITH T11B       1.136    0.082    0.082       0.082
T12B WITH T12A       0.618    −0.087   −0.087      −0.087
The EPC, therefore, indicates that T12A “wants” to double load on a second factor. The last two columns of the output present two versions of the standardized parameter estimate that correspond to the unstandardized estimate. The two “WITH statements” in the MI list indicate covariances that could be added to the model to improve its fit. In Mplus, WITH statements involving observed indicators of latent variables like T12A and T12B actually refer to the error terms for the observed variables. The first “WITH” MI indicates that the model’s χ2 could be reduced by 1.136 if the error variances of T12B and T11B were allowed to correlate. The new correlation coefficient or expected parameter change (from the original fixed value of 0) would be 0.082. In Amos, the MI information would refer specifically to the names the user has given the latent error terms. In practice, MIs may be reviewed one at a time to see which, if any, suggested new parameters are theoretically acceptable. Often users will examine the MIs that offer the biggest improvement to χ2. Another strategy is to identify variables for which multiple MIs are listed. In the sample output, for example, three of the four MIs involve T12B. Although the χ2 and parameter change estimates for each additional parameter suggested in the MI grid are accurate only when one change is made at a time (i.e., χ2 and parameter change estimates are not additive), it is reasonable to assume that T12B may be a key to model improvement. Nevertheless, no changes should be made related to T12B unless they can be theoretically justified.
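Because each MI approximates the drop in χ2 that would result from freeing a single parameter (a 1-df change), the MIs can be compared with the same 3.84 critical value used in nested-model comparisons. A minimal sketch using the Table 6.5 values:

# Screening the Table 6.5 modification indices against the .05 critical value
# for a 1-df chi-square change.
from scipy.stats import chi2

mod_indices = {
    "SKILL1 BY T12A": 0.606,
    "SKILL1 BY T12B": 1.953,
    "T12B WITH T11B": 1.136,
    "T12B WITH T12A": 0.618,
}

critical_value = chi2.ppf(0.95, 1)   # about 3.84
for parameter, mi in mod_indices.items():
    verdict = "exceeds" if mi > critical_value else "does not exceed"
    print(f"{parameter}: MI = {mi} {verdict} {critical_value:.2f}")
# None of these MIs exceeds 3.84, so none of the suggested changes would, by
# itself, produce a statistically significant improvement in fit.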
In a poor-fitting model, the list of MIs may be long. Many of the MIs may make no sense statistically (e.g., suggestions to regress observed indicators of latent variables on each other) and can be immediately ruled out. Many others may be undesirable statistically (e.g., suggestions to allow latent variable indicators to load on more than one factor). Most others will be theoretically untenable. Only a handful at best may be appropriately considered as both theoretically and statistically defensible. Modifications should be made reluctantly, sparingly, carefully, and one at a time. As illustrated in the examples on pp. 162–165, correlations among the error terms of latent variable indicators are among the more common modifications made to models. It should be reiterated here that model modifications must be statistically and theoretically justifiable. In addition, it should be noted that not all models can be modified to an acceptable level of fit. The purpose of SEM analyses is to determine if a proposed model has adequate fit. Many models do not, and no amount of tinkering can or should be used to improve them. Box 6.5 provides a tip about the availability of modification indices in Amos.

Residual Correlation Matrix. The residual correlation matrix is another potentially rich source of information about sources of poor model fit. Each element in the residual correlation matrix represents the difference between the corresponding elements in the standardized input matrix and the implied matrix. If the observed correlation between variables x1 and x2 is 0.62, for example, and the reproduced correlation obtained from an SEM analysis is 0.59, the value in the residual matrix for that correlation would be 0.03. The smaller the absolute value in the residual matrix, the better the proposed model was able to reproduce the original correlation. Values in the residual matrix, therefore, are explicit indications of the location of poor fit.
Box 6-5 Modification Indices and Analyses with Missing Values
Although both Amos and Mplus have options to handle datasets with missing values, modification indices are not provided in Amos output when there are missing data. Mplus provides MIs even when dependent variables have missing values; therefore, they will be reported when observed indicators of latent variables have missing values.
Note that a model may lead to an overestimated or underestimated reproduced covariance/correlation. In the former case, the residual value is negative; in the latter, it is positive. The absolute value is what should be evaluated. If a model reproduced the original observed variable relationships perfectly, the reproduced matrix would be the same as the input matrix, all values in the residual matrix would be 0, and the χ2 statistic would be 0. When examining the residual correlation matrix, the researcher is looking for correlations that were not reproduced well by the estimated model. One suggested cutoff for evaluating correlation residuals is an absolute value of 0.10 (Kline, 2005). When a residual correlation is 0.10 or higher, one or both of the variables represented in the correlation may be problematic in the analysis. We have found this cutoff to be useful for identifying model problems, especially if individual observed variables have multiple high residuals. However, in some models higher or lower values may be more appropriate. If a model has poor fit according to fit indices but no residual correlations over 0.10, a lower threshold might be needed to identify sources of poor fit. Strategies for addressing poorly reproduced correlations include adding correlations among the error terms for the variables with high residual correlations, adding other paths that will improve the reproduction of their correlation, or, more drastically, deleting one of the variables. This last strategy will not necessarily improve model fit because it affects numerous aspects of the model, but it may improve fit if the deleted variable has a number of high residual correlations and/or performs poorly in other ways as indicated by MIs or its SMC. As with all modifications, these strategies should be theoretically justifiable. The deletion of observed variables, especially if they are established indicators of factors, should be approached with reluctance and caution.

R2 and SMC Values for Indicators of Latent Variables. In conjunction with other output, R2 or SMC values for indicators of latent variables can help researchers identify observed variables that are not performing well in the measurement part of the model. If MIs indicate a variable "wants" to load on two factors, or the residual matrix reveals that its relationships with several other variables are not being reproduced well, a low R2/SMC may help the researcher decide that the measurement model would be stronger without the variable. Again, item deletions are not a desirable solution to poor fit. The decision to delete a variable should be based on more than one criterion, and the implications for both the scale to which
the item belongs and the latent construct being measured should be carefully considered. And it should be noted that some researchers believe the removal of items from a theoretically justified model is generally inappropriate. The removal of a nonsignificant structural path is another type of model modification. The removal of a path will not directly reduce a residual correlation or the model χ2, but it may improve the model by making it more parsimonious. In the following section we present some examples of model modifications from the social work literature.

Examples of Model Modifications

Researchers frequently respecify CFA models to include correlated measurement errors based on modification suggestions provided by SEM output. The addition of correlations among the error terms for observed variables can improve model fit substantially and make the difference between an adequate and inadequate model. Allowing selected error terms to correlate is therefore tempting. It is considered acceptable if the model improvements meet the criteria stated earlier—the modifications are minor and selective, they can be justified theoretically, they do not cause substantial change in other model parameters, and the model is tested on an independent sample (Byrne, Shavelson, & Muthén, 1989; Kline, 2005). However, there are no hard and fast rules about what is "minor" and what constitutes "theoretical justification." In a multiple-group study of the direct and indirect effects of neighborhood characteristics on adolescent educational behavior (Bowen et al., 2002), three correlated errors were added to the measurement model based on SEM output. The fit of the model improved from inadequate to adequate according to four commonly cited fit indices. The rationale provided for allowing correlated errors among three of nine items associated with one latent variable was that the items were drawn from the same scale on the original instrument and shared the same question stem, which was different from the other six items that loaded on the latent variable. The authors tested the modified model with a separate random sample, which improved confidence in the results, but judgment on the acceptability of this common type of respecification remains subjective. Other potential modifications include the deletion of nonsignificant items, the deletion of problematic items (e.g., cross-loading items), the
allowance of additional loadings, and the movement of an observed indicator of one factor to another factor. Kelly and Donovan (2001) removed one item from their 10-item scale because it did not covary as expected with one item and it had a negative covariance with another on the scale they were testing. Observed variables that do not load at a statistically significant level on any proposed factor may be removed from an analysis. Some researchers impose cutoffs for loadings that are independent of statistical significance. Abbott (2003), for example, removed eight items from the Professional Opinion Scale (POS) with loadings below 0.30. Given the sample size of her study (N = 1,433), these loadings were almost certainly statistically significant. Items that load significantly on more than one factor also may be targeted for removal unless double loadings were expected and deemed acceptable. Decisions about double loaders may be subjective or statistical. Often researchers prefer simple structure (one criterion of which is that each indicator load on only one factor) or prefer not to have indicators contribute to multiple factor scores, regardless of statistical performance. Often double loaders do not load at a desired magnitude on two factors when both are in the same model, even if both loadings are statistically significant. A double-loading item may be targeted for deletion from the model if fixing one loading to 0 leads to poor model fit. A measurement model may not meet fit standards if an observed variable that loads on two factors has one of those loadings fixed at 0. The estimation procedure chosen, sample size, and characteristics of the particular model will affect how large unwanted secondary loadings have to be before they jeopardize fit. In some models, even a secondary loading of less than 0.20 will cause a model to have inadequate fit. In such cases, the researcher will have to choose between allowing double loadings and removing problematic items. Neither of these remedies will necessarily solve the fit problem and neither may be theoretically defensible. In other cases, model output may indicate that an item that was expected to load on one factor actually loads on another factor. If the item can be justified as an indicator of the new latent variable, the model could be respecified to accommodate the change in affiliation. If not, the item could be deleted for failing to measure adequately its hypothesized construct. An example of the use of item deletions comes from an article by Wegmann, Thompson, & Bowen (2011). The researchers analyzed scales
assessing the home environment and children’s home behavior from the Elementary School Success Profile (ESSP) for Families. Because the items analyzed in the CFA were ordinal and nonnormally distributed, Mplus’ WLSMV estimation was used with a polychoric correlation matrix. Partly because polychoric correlations are almost always higher than corresponding Pearson correlations (Bollen, 1989), the loadings of many variables on hypothesized factors were high (e.g., standardized values over 0.85) and many had substantial (e.g., > 0.20) secondary loadings, according to modification indices. In conjunction with results of reliability tests on reduced scales, information in the residual matrix, and theoretical knowledge of the target constructs, the MI information was used to guide the selection of items to remove from the scales. Because the ESSP for Families was longer than desirable, the option of removing items while retaining reliable and valid constructs was welcome, albeit unexpected. The high interitem correlations suggested that a small number of indicators would measure the constructs as well as a high number. The fact that the CFA became more exploratory than planned was explicitly acknowledged, and the quality of the respecified model was validated with a separate random subsample. Still, some would argue that the number of modifications made in the course of a “confirmatory” study was excessive. In Chapter 4, we discussed a study of the Alcohol Use Disorders Identification Test (Kelly & Donovan, 2001). The researchers respecified their model at least twice in their study. First, the researchers removed an item (Item 9) after determining that it created a problem with the analysis matrix in their 1-factor model. Before deleting the item, the analysis produced a “nonpositive definite matrix” message. The authors attributed the problem to Item 9 because it had an unusual pattern of observed correlations with other indicators—a high negative covariance with one indicator and no covariance with another. The model ran after the variable was deleted. Later, the researchers respecified the 2-factor model to include two correlated errors. This respecification led to adequate model fit. However, the researchers provided no theoretical rationale for the correlated errors and did not retest the model on an independent sample; therefore, they did not follow two of the best practices for model modifications. With the small sample size and model modifications that characterize their study, caution should be exercised in interpreting or generalizing their findings.
Other examples of model modifications are easy to find, especially in reports of CFA models. Ten years after developing the Professional Opinion Scale (POS) to evaluate social workers' commitment to social work values, for example, Abbott (2003) collected data with the POS using a new sample. The purpose was to "confirm the continued strength of the POS" (p. 650) using CFA. One model based on studies of the scale with the original sample was tested. As a result of findings from a sequence of analyses and from examination of item content, four items were moved from being indicators of one factor to being indicators of another, eight items were deleted, and four correlated errors were added.
Examples of structural model modifications are also not hard to find. One of the exogenous workplace variables in Jang's (2009) original model of workplace support, work–life balance, and employee well-being became a mediator variable. The author stated "Although the five-factor model met the standard of criteria for adequacy of fit, the significant chi-square statistic and inspection of the matrix of standardized residuals suggested that a modification to the model was needed" (p. 98). No additional information is given to justify the change that was made, and even with the change, the χ2 remained significant. Fit statistics are not given for the original model, and no χ2 difference test results are provided.
Box 6-6 Best Practices for Improving Model Fit
1. Make changes only if they (a) are minor, (b) can be theoretically justified, and (c) do not cause significant changes in other model parameters (Byrne et al., 1989).
2. Use statistical information in the modification indices, residual matrix, and R2/SMC output for observed indicators to guide modifications. Making changes supported by multiple sources is preferable to relying on only one source of information on poor fit.
3. Testing prestated alternative models, which are described in the article's literature review, is preferable to making modifications in response to inadequate fit indicators.
4. Because modifications are not desirable, avoid making modifications to improve fit statistics that have already met preestablished criteria.
5. Make changes only if your sample is large enough to allow validation on an independent subsample.
In general, we do not recommend modifying theory-based models (a) when prestated fit criteria are met, and (b) without presentation of the statistical reasons. In summary, model testing with CFA and general structural modeling often involves modifying or respecifying the model being tested. Best practices for improving model fit are summarized in Box 6.6. When modifications are minor, theoretically justifiable, and do not affect other parameter estimates, they may be acceptable. In addition, all but the most minor modifications that are made based on output from SEM analyses, such as parameter estimates, MIs, or residual correlations, should be made only if they can be validated with data from an independent sample. Social work researchers should be explicit about the role of modifications in their model tests. Most importantly, researchers should understand that while resorting to modifications may be appropriate under certain conditions, the results of SEM analyses indicating that a model is not consistent with the data should be respected. Whether a model is consistent with the data is, after all, the fundamental SEM question, and sometimes the answer is "no, the model is not supported."
7
Advanced Topics
This chapter discusses three advanced SEM topics: (a) how to conduct a power analysis for SEM, (b) how to prevent and solve problems of underidentification, and (c) how to conduct a multiple-group analysis.
STATISTICAL POWER ANALYSIS FOR SEM
Statistical power analysis examines the balance of the four interrelated components shown in Box 7.1: the probability (α) of rejecting a null hypothesis that in fact is true; the statistical power to correctly reject a false null hypothesis (1 – β, where β is the probability of accepting a false null hypothesis); sample size; and effect size (Cohen, 1988). By fixing any three of the four components, a researcher can obtain the value of the fourth component. Most frequently, researchers fix the probability of making a Type I error at a small level, such as α = 0.05, and the statistical power (i.e., the ability to reject a false hypothesis) at a large level such as 1 – β = 0.80. Power analysis is most commonly used for two purposes: (a) to determine the sample size needed to detect a given effect size, and (b) to determine a study's statistical power when the sample size and effect size are known. The first type of power analysis is often conducted at the planning stage of a research project, while the second is often conducted after an analysis has been completed.
Box 7-1 Components of Power Analysis
1. Alpha (α). The level of statistical significance chosen by a researcher to test a hypothesis. Most commonly, social work researchers specify α = 0.05. A low α makes it less likely that the researcher will incorrectly claim an alternative hypothesis to be true, which is called a Type I error.
2. Power (1 – β). The probability of rejecting the null hypothesis when it is false. The probability of accepting a false null hypothesis, or making a Type II error, is β. Social work researchers often specify β = 0.20, resulting in a power of 0.80.
3. Sample size. The number of study subjects in the analysis.
4. Effect size. The magnitude of the hypothesized relationship.
Effect size is defined differently for different types of statistical analysis. In mean comparisons (t-test, ANOVA, or HLM), for example, effect size is the standardized mean difference between groups. In regression analysis, effect size is R2, or the explanatory strength of the regression model. In survival analysis, effect size is the hazard ratio between groups. The framework for SEM power analysis described here was developed by MacCallum and colleagues (MacCallum, Browne, & Cai, 2006; MacCallum, Browne, & Sugawara, 1996). In this framework, the effect size of an SEM analysis is defined by the root mean square error of approximation (RMSEA). Readers may recall that RMSEA is a measure of fit; here it is used in a different way, to reflect effect size rather than model fit. Readers are referred to MacCallum et al. (1996) for a more in-depth discussion of the rationale behind this approach to power and the choice of values used in the analyses described below.
Power Analysis for Test of Overall Model Fit in One Model
When only one model is being tested, the researcher typically employs the model χ2 to assess goodness of fit. Recall that the model χ2 directly tests the fundamental null hypothesis that the observed (or input) covariance matrix equals the model-implied matrix: H0: Σ = Σ(θ). The test statistic χ2 is a product of the overall degrees of freedom in the
sample and the estimated discrepancy function, (N – 1)Fml. In SEM, and unlike in most other analysis approaches, when the p-value associated with the model χ2 is nonsignificant at a given level of significance (usually α = 0.05), the researcher accepts the null hypothesis and concludes that the model has good fit to the data. Power analysis in this context is concerned with whether the failure to reject the null hypothesis is due to inadequate statistical power caused by a small sample size. To perform the power analysis, MacCallum, Browne, and Sugawara (1996) define RMSEA, denoted as ε, as an effect size. They then employ ε to test model fit under two scenarios: close fit of the model (H0: ε ≤ 0.05), and not-close fit of the model (H0: ε ≥ 0.05). (Readers may recall that 0.05 is a cutoff for RMSEA used by some researchers to indicate acceptable or not acceptable fit.) A SAS program to implement the test was developed by MacCallum et al.; the program is available in their 1996 publication and in the online materials associated with this book. When testing close fit, the user sets ε0 = 0.05 and εa = 0.08 (i.e., the user specifies rmsea0 = 0.05 and rmseaa = 0.08 in the SAS syntax). When testing for not-close fit, the user sets ε0 = 0.05 and εa = 0.01 (i.e., the user specifies rmsea0 = 0.05 and rmseaa = 0.01 in the SAS syntax). Running the SAS program, the user obtains the statistical power of an analysis given the sample size, the model's degrees of freedom, and the user-specified level of statistical significance (usually 0.05). Likewise, the user may obtain an estimate of the sample size needed for analysis given a desired level of statistical power (usually 0.80), the model's degrees of freedom, and the level of statistical significance. The SAS syntax file to perform the first power analysis (i.e., obtaining statistical power for a given study) is named "OneModelPower.sas," and the file to perform the second power analysis (i.e., obtaining needed sample size) is named "OneModelMinimumN.sas."
To demonstrate the use of the MacCallum et al. test of power, we will use a study conducted by Colarossi and Eccles (2003). In the study, the researchers developed a model using SEM to test the hypothesis that providers' social support at Time 1 affects adolescents' depression and self-esteem at Time 2. The study reported that their model fit the data well: χ2(df = 6, N = 217) = 8.44, p = 0.49. With a sample of 217, the question was whether the study had adequate power to reject the null hypothesis.
First we test the hypothesis of close fit (H0: ε ≤ 0.05) using the SAS syntax and the model characteristics presented above. We type in the following values (i.e., the effect size values MacCallum et al. suggested for testing close fit) in the syntax file named OneModelPower.sas: rmsea0 = 0.05, rmseaa = 0.08, df = 6, n = 217; we obtain a power value of 0.233. The study's power is well below the generally accepted level of 0.80. Second, using the same values in the syntax file OneModelMinimumN.sas, we learn that the study would have needed a sample size of 1,238 to achieve the desirable power level of 0.80. Next, we test the hypothesis of not-close fit (H0: ε ≥ 0.05). This time we only need to revise the values of rmsea0 and rmseaa in the two syntax files. Setting the following values in the syntax file OneModelPower.sas: rmsea0 = 0.05, rmseaa = 0.01, df = 6, n = 217, we find that the study's power is 0.149, again below the required level of 0.80. Setting the same values in the syntax file OneModelMinimumN.sas, we learn that the study would have needed a sample size of 1,069 to achieve a power level of 0.80. Note that the conclusion from the not-close fit test is very similar to that from the close fit test. In sum, the power analysis indicates that Colarossi and Eccles's (2003) study is underpowered; to reach the same study conclusion with statistical power at the 0.80 level, the researchers would need a sample size ranging from 1,069 to 1,238.
Statistical Power for Comparing Nested Models
The power of a test comparing two nested models can also be evaluated. If Model A is nested in Model B, the null hypothesis being tested is H0: (FA* – FB*) = 0, where FA* and FB* denote the obtained values of the fitting (or minimization) function for Model A and Model B, respectively. The null hypothesis states that the fitting function values are not statistically significantly different; therefore, the fit of the nested model is not significantly worse than the fit of the less restrictive model. The power analysis is concerned with whether a failure to reject the null hypothesis (i.e., concluding that the two models are identical) is due to inadequate statistical power. To perform the power analysis for comparing nested models, we can again use a SAS program developed by MacCallum, Browne, and Cai (2006). The SAS syntax file to obtain the statistical power for a given study is named "NestPower.sas," and the file to obtain the needed sample size
for a given power of 0.80 is named "NestMinimumN.sas." Users specify two RMSEA values in advance, based on an assumption about effect size, which in the case of comparing nested models is: δ = (FA* – FB*) = (dAεA² – dBεB²). That is, they need to specify the df for Model A (i.e., dA) and for Model B (i.e., dB), and to specify two RMSEA values, εA and εB (the suggested values are εA = 0.06 and εB = 0.04). We use a study by Conway, Mendelson, Giannopoulous, Csank, and Holm (2004) to illustrate how to conduct a power analysis for nested models. In the study, Conway et al. addressed the hypothesis that adults reporting sexual abuse are more likely to exhibit a general tendency to ruminate on sadness. The authors claimed that a model using rumination as a mediator (N = 201, df = 8) represented a significantly better fit to the data than the null model of no relationships between constructs (df = 15). We want to know whether the study has adequate power with a sample size of 201 and, if not, how many subjects would be needed. Setting the following values in the syntax file NestPower.sas: rmseaa = 0.06, rmseab = 0.04, dA = 15, dB = 8, and n = 201, we determine that the study's power is 0.516, which is below the required level of 0.80. Setting the same values in the syntax file NestMinimumN.sas, we learn that the study would have required a sample size of 350 to achieve a power of 0.80.
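The calculations performed by these SAS programs can also be sketched in other software. The following Python fragment is a minimal sketch, assuming SciPy is available; the function names are ours and are not part of the MacCallum et al. programs. It implements the same logic: the test statistic is referred to a noncentral chi-square distribution whose noncentrality parameter depends on N, df, and the RMSEA-based effect size. Run with the values from the two examples above, it should produce power estimates close to those reported (about 0.23 and 0.15 for Colarossi and Eccles, and about 0.52 for Conway et al.).

from scipy.stats import chi2, ncx2

def power_one_model(n, df, rmsea0=0.05, rmseaa=0.08, alpha=0.05):
    """Power of the overall-fit test for one model, following MacCallum et al. (1996).

    rmseaa > rmsea0 corresponds to the test of close fit (H0: epsilon <= .05);
    rmseaa < rmsea0 corresponds to the test of not-close fit (H0: epsilon >= .05).
    """
    ncp0 = (n - 1) * df * rmsea0 ** 2      # noncentrality under the null RMSEA value
    ncpa = (n - 1) * df * rmseaa ** 2      # noncentrality under the alternative value
    if rmseaa > rmsea0:                    # close fit: reject for large chi-square
        crit = ncx2.ppf(1 - alpha, df, ncp0)
        return 1 - ncx2.cdf(crit, df, ncpa)
    crit = ncx2.ppf(alpha, df, ncp0)       # not-close fit: reject for small chi-square
    return ncx2.cdf(crit, df, ncpa)

def min_n_one_model(df, target=0.80, **kwargs):
    """Smallest sample size reaching the target power (simple incremental search)."""
    n = 10
    while power_one_model(n, df, **kwargs) < target:
        n += 1
    return n

def power_nested(n, df_a, df_b, rmsea_a=0.06, rmsea_b=0.04, alpha=0.05):
    """Power of the chi-square difference test when Model A is nested in Model B."""
    df_diff = df_a - df_b
    ncp = (n - 1) * (df_a * rmsea_a ** 2 - df_b * rmsea_b ** 2)   # effect size delta
    crit = chi2.ppf(1 - alpha, df_diff)    # central chi-square critical value
    return 1 - ncx2.cdf(crit, df_diff, ncp)

# Colarossi and Eccles (2003): df = 6, N = 217
print(power_one_model(217, 6))                 # close fit, roughly 0.23
print(power_one_model(217, 6, rmseaa=0.01))    # not-close fit, roughly 0.15
print(min_n_one_model(6))                      # roughly 1,238 for power of 0.80

# Conway et al. (2004): df_A = 15, df_B = 8, N = 201
print(power_nested(201, 15, 8))                # roughly 0.52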
Key Relations among Factors Affecting Power
MacCallum and colleagues (MacCallum, Browne, & Cai, 2006; MacCallum, Browne, & Sugawara, 1996) emphasize a key finding with regard to relationships among factors affecting power: the crucial factor affecting a study's power is not sample size alone, but sample size in relation to a model's degrees of freedom. A more complex model (i.e., a model with more estimated parameters and fewer df) needs a larger sample in order to achieve the same level of power as a less complex model (i.e., a model with fewer estimated parameters and more df). Let's take a closer look at what determines df:
df = p(p + 1)/2 – t
where p is the number of observed variables, and t is the number of free parameters to be estimated. For example, a model with p = 10 observed variables has 10(11)/2 = 55 unique variances and covariances; if t = 20 parameters are estimated, df = 55 – 20 = 35. From this definition, we see that df
indicates the complexity of a model: when t increases, df decreases, and the model becomes more complex; when t decreases, df increases, and the model becomes simpler. To provide readers with a sense of the power requirements of SEM models, and the importance of sample size and df in the estimation of power, we include two tables adapted from MacCallum et al. (1996). Table 7.1 presents power estimates for testing overall fit in models with selected numbers of degrees of freedom and sample sizes, and Table 7.2 presents the minimum sample size needed to achieve power of 0.80 for selected levels of df.
Table 7.1 shows that when a study has a df of 5 and a sample size of 200, its statistical power is 0.199 for a "close fit" test and 0.124 for a "not close fit" test. The study is underpowered. One explanation is that the model is complex because of its small df of 5. With the same sample size of 200 but a less complex model with a df of 100, statistical power increases to 0.955 for a "close fit" test and 0.870 for a "not close fit" test. In other words, with a less complex model, the study can meet the required statistical power of 0.80. Table 7.2 presents similar information but shows the sample size required to achieve statistical power of 0.80. For a complex model such as one with df = 2, a study needs a sample of 3,488 subjects (for a "close fit" test) or 2,382 subjects (for a "not close fit" test) in order to meet the required statistical power of 0.80. When the analytic model becomes less complex, such that df = 100, the study needs only 132 (for a "close fit" test) or 178 (for a "not close fit" test) subjects.
Statistical Power in a Sample of Social Work Applications of SEM
Guo and Lee (2007) evaluated the statistical power of SEM analyses reported in 139 articles published in the field of social work research between 1999 and 2004 (see Chapter 1 for more information about these articles). Sample sizes for the studies ranged from 103 to 383, and degrees of freedom for testing overall model fit ranged from 6 to 27. Applying MacCallum et al.'s framework (MacCallum, Browne, & Cai, 2006; MacCallum, Browne, & Sugawara, 1996), Guo and Lee found 32 studies (23%) whose power was below the required 0.80 level.
Table 7.1 Power Estimates for Selected Levels of Degrees of Freedom (df) and Sample Size

                          Sample size
df and test        100      200      300      400      500
5    Close        0.127    0.199    0.269    0.335    0.397
     Not close    0.081    0.124    0.181    0.248    0.324
10   Close        0.169    0.294    0.413    0.520    0.612
     Not close    0.105    0.191    0.304    0.429    0.555
15   Close        0.206    0.378    0.533    0.661    0.760
     Not close    0.127    0.254    0.414    0.578    0.720
20   Close        0.241    0.454    0.633    0.766    0.855
     Not close    0.148    0.314    0.513    0.695    0.830
30   Close        0.307    0.585    0.780    0.893    0.951
     Not close    0.187    0.424    0.673    0.850    0.943
40   Close        0.368    0.688    0.872    0.954    0.985
     Not close    0.224    0.523    0.788    0.930    0.982
50   Close        0.424    0.769    0.928    0.981    0.995
     Not close    0.261    0.608    0.866    0.969    0.995
60   Close        0.477    0.831    0.960    0.992    0.999
     Not close    0.296    0.681    0.917    0.987    0.999
70   Close        0.525    0.877    0.978    0.997    1.000
     Not close    0.330    0.743    0.949    0.994    1.000
80   Close        0.570    0.911    0.988    0.999    1.000
     Not close    0.363    0.794    0.970    0.998    1.000
90   Close        0.612    0.937    0.994    1.000    1.000
     Not close    0.395    0.836    0.982    0.999    1.000
100  Close        0.650    0.955    0.997    1.000    1.000
     Not close    0.426    0.870    0.990    1.000    1.000

Source: MacCallum et al. (1996).
Table 7.2 Minimum Sample Size to Achieve Power of 0.80 for Selected Levels of Degrees of Freedom (df)

df      Minimum N for test of close fit     Minimum N for test of not-close fit
2       3488                                2382
4       1807                                1426
6       1238                                1069
8       954                                 875
10      782                                 750
12      666                                 663
14      585                                 598
16      522                                 547
18      472                                 508
20      435                                 474
25      363                                 411
30      314                                 366
35      279                                 333
40      252                                 307
45      231                                 286
50      214                                 268
55      200                                 253
60      187                                 240
65      177                                 229
70      168                                 219
75      161                                 210
80      154                                 202
85      147                                 195
90      142                                 189
95      136                                 183
100     132                                 178

Source: MacCallum et al. (1996).
Of the 32 underpowered studies, 7 (21.9%) did not show adequate power for testing nested models. The review showed that a study with n = 383 and df = 20 had only a marginal power level of 0.746. The worst case scenario was a study with n = 217 and df = 6, which had a power level of only 0.233. For testing nested models, the worst case scenario was a study with n = 201 and a difference in degrees of freedom of 1, which had a power level of only 0.506. The review also found seven excellent studies that used small samples (ranging from 169 to 290) but relatively large degrees of freedom (ranging from 34 to 181). As a result, they all had adequate power. The review of statistical power of SEM applications in social work research underscores the crucial finding described earlier: the key factor affecting a study's power is not sample size alone, but sample size in relation to degrees of freedom.
Based on their review and analysis of statistical power, Guo and Lee (2007) made several recommendations to social work researchers. The most important recommendation is that researchers, manuscript reviewers, and editors pay attention to statistical power in SEM, especially when they encounter studies with sample sizes around 200 and df around 20. Under these conditions, an examination of statistical power is warranted. Suggested strategies to address the problem of small sample sizes include: (a) keeping SEM models simple by focusing on specific components of a theoretical model instead of the full theoretical model; (b) using national data to conduct secondary analyses because such databases typically have sufficient sample sizes; and (c) engaging in collaborative research involving multiple sites and agencies in order to obtain sufficiently large samples.
ADDITIONAL STRATEGIES FOR PREVENTING AND SOLVING THE PROBLEM OF UNDERIDENTIFICATION
Identification is a data issue that may be encountered in all kinds of statistical analyses, not just in SEM. Its relevance in SEM is evidenced by our inclusion of discussions of identification in almost every chapter of this book. As a reminder: a statistical model is said to be identified if it is theoretically possible to derive a unique estimate of each parameter (Kline, 2005). Hence, identification is a data issue concerning the number of known pieces of data and the number of parameters to be estimated in
a model. In SEM analysis, underidentification most often occurs when the number of parameters in a model exceeds the number of unique pieces of input data (the number of unique elements in the input data, or the sample variance–covariance matrix) or when a latent variable has not been assigned a metric. The former is an example of overall model underidentification; the latter is an example of a "local" identification problem, in which one part of a model is not identified. Several statistical principles have been developed to evaluate the identification status of a model. For instance, Bollen (1989) summarizes a few rules to check identification status for a given model, including the t-rule, null B rule, recursive rule, and rank and order conditions. Satisfying some of these rules is necessary but not sufficient to establish identification, while others are sufficient but not necessary. In practice, researchers can apply these rules to check the identification of the model being analyzed. However, users of SEM can also take advantage of statistical analysis programs to discern identification status. They can empirically test the identification status of a proposed model by attempting to test the model with SEM software. In the planning stages of a study, before researchers have data, they can even use fabricated data and run the SEM program to empirically check identification status. An illustration of how to conduct this type of test is presented shortly.
Although identification is a statistical and technical problem, the solution to underidentification is not technical. It requires theoretical work, substantive knowledge, and a sound rationale. To illustrate this point, consider a simple example. Suppose we want to evaluate a model that specifies Var(y) = θ1 + θ2, where Var(y) is the variance of y and is the only piece of input data, and θ1 and θ2 are two unknown parameters to be estimated. With one piece of input data (i.e., the variance of y) and two unknown parameters (i.e., θ1 and θ2), the model is underidentified. To solve the problem, the researcher needs to impose one statistical constraint on the model. One, and only one, constraint is needed, and any one constraint will make the model identified. Suppose the known datum is Var(y) = 10. We can respecify the model by setting, for example, θ2 to zero. We then have only one parameter to estimate (θ1) for the one piece of known information (the variance of y). With this constraint, θ1 is
identified and the estimate of θ1 is known; it is 10. Alternatively, if we impose the constraint that θ1 = θ2, then the model is also identified (i.e., θ1 = θ2 = 5). We quickly recognize that the number of potential solutions to this underidentification problem is virtually unlimited. For instance, setting θ1 to any value will result in the identification of the model and determination of the value of θ2. Given 10 = θ1 + θ2:
If θ1 = 1, θ2 = 9
If θ1 = 1.5, θ2 = 8.5
If θ1 = 2, θ2 = 8
From here, we see that the decision about solving an underidentified model is not a technical one, but one that requires the researcher's deliberation based on the theoretical model, evidence from prior studies, and other knowledge. What constitutes the best estimate depends on the soundness of the imposed constraint. In summary, when a model is underidentified, one faces an unlimited number of choices to make the model identified. Therefore, it is crucial to assess the pros and cons associated with each possible solution. The decision the researcher makes should be substantively as well as statistically sound.
[Figure 7.1 is a path diagram in which the exogenous variables x1 and x2 predict the endogenous variables y1 and y2; y1 and y2 are connected by reciprocal paths (a feedback loop), and d1 and d2 are the disturbances of the endogenous variables.]
x1 – seriousness of the civil disobedience committed by the protesters; x2 – availability of police riot gear; y1 – violence of the protesters; y2 – violence of the police.
Figure 7.1 Hypothetical Model with Nonrecursive Relations. Adapted from diagram in Kline, 2005, p. 248.
We use an example originally described by Kline (2005, pp. 247–249) to illustrate these points. The theoretical model of this example (Figure 7.1) concerns a nonrecursive (i.e., feedback loop) relationship between two endogenous variables: violence of protesters and violence of police. Suppose we are at the beginning stages of a research project and have derived a model completely on the basis of theoretical work. To check the identification status of the proposed model, we create a fictitious covariance matrix for the study variables and then submit the fabricated data to SEM analysis. We used Amos to draw and estimate this example. Table 7.3 shows the data file; its only purpose is to allow us to run Amos and determine whether our theory-based model will be identified when we run it with real data. With the fabricated data, we run the analysis with a software package such as Amos and learn the identification status of the original model, as well as three alternative models. Results are shown in Figure 7.2. As the figure shows, our original model (Model 1) with a nonrecursive relationship between the two endogenous variables is underidentified. There are many constraints we could impose to make the model identified, and we need to select just one. We consider three possibilities, the results of which are also shown in Figure 7.2. If we delete one path (i.e., fix the path from x1 to y2 to be zero, as shown by Model 2), the model will be identified. If we are able to collect data on a new, substantively meaningful variable (as shown in Model 3, which adds x3), the model will be identified.
Table 7.3 Fabricated Data (in Excel format) Submitted to Amos for an Empirical Test of the Identification Status of a Proposed Model

rowtype_   varname_    y1      y2      x1     x2     x3
n                      100     100     100    100    100
cov        y1          6.89
cov        y2          6.25    15.58
cov        x1          0.73    0.62    0.54
cov        x2          1.27    1.49    0.99   2.28
cov        x3          0.91    1.17    0.82   1.81   1.98
[Figure 7.2 presents the four models in a table of advantages and disadvantages, followed by path diagrams of Models 2, 3, and 4.]
1. Original (Fig 7.1). Advantage: derived from theory. Disadvantage: underidentified.
2. Delete one path. Advantage: identified. Disadvantage: the model may be misspecified if it does not reflect current theoretical and empirical knowledge of relationships among variables.
3. Add an exogenous variable x3. Advantage: identified. Disadvantage: data on x3 need to be collected; potentially difficult or not possible if data collection is already complete.
4. Constrain two paths to be equal. Advantage: identified. Disadvantage: precludes the detection of potentially unequal mutual influences.
[In the diagrams, Model 2 drops one path from the original model; Model 3 adds the exogenous variable x3; Model 4 assigns the same label (p1) to both paths between y1 and y2, constraining them to be equal.]
Figure 7.2 Solutions and Considerations in Resolving an Identification Problem.
Finally, if we constrain the two path coefficients between y1 and y2 to be equal (as shown in Model 4), the model will also be identified. Advantages and disadvantages for each choice are described in the figure. This exercise illustrates how each solution to model underidentification implies a different theoretical model. The choice of solutions is not a technical question, but a theoretical one. Like any subjective decision a researcher may make in statistical analysis, the selection of a solution to underidentification requires a thorough understanding of the phenomenon being studied, and sound theoretical and empirical backing.
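As a side note, the fabricated input data in Table 7.3 are easy to assemble outside of Excel as well. The short Python fragment below is a minimal sketch, assuming NumPy is available; it is not part of the Amos example. It builds the full symmetric covariance matrix from the lower-triangle entries in Table 7.3, which could then be supplied to whatever SEM program one uses for the empirical identification check, and it counts the unique variances and covariances available as input data, the quantity p(p + 1)/2 used in the df formula earlier in this chapter.

import numpy as np

names = ["y1", "y2", "x1", "x2", "x3"]
lower = [
    [6.89],
    [6.25, 15.58],
    [0.73, 0.62, 0.54],
    [1.27, 1.49, 0.99, 2.28],
    [0.91, 1.17, 0.82, 1.81, 1.98],
]

p = len(names)
cov = np.zeros((p, p))
for i, row in enumerate(lower):
    cov[i, : i + 1] = row                # fill the lower triangle, row by row
cov = cov + np.tril(cov, -1).T           # mirror it to obtain a symmetric matrix

print(cov)
print(p * (p + 1) // 2)                  # 15 unique variances and covariances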
CONDUCTING MULTIPLE-GROUP COMPARISONS: MODERATION IN SEM
Mediation versus Moderation
When conceptualizing mediational effects, researchers hypothesize that the influence of an exogenous variable (x) on an endogenous variable (y) is either both direct and indirect, or solely indirect. The indirect effect is exerted through the effects of another endogenous variable called the "mediator." One of the most attractive features of SEM is its ability to analyze mediational effects elegantly and in one step instead of analyzing them in multiple steps as required with conventional regression analysis. Conceptual models focusing on mediational effects should be categorically distinguished from those concerned with moderating effects (Baron & Kenny, 1986). Researchers are interested in moderating effects if they hypothesize that the impact of an exogenous variable (x) on the endogenous variable (y) varies by the level of another exogenous variable called the moderator (M). SEM is also an attractive option for researchers testing moderation (interaction) effects when the moderator is a categorical, or grouping, variable. The diagrams in Box 7.2 illustrate the difference in how mediation and moderation hypotheses are modeled.
Conceptualizing a moderating effect implies that the researchers are interested in the joint impact (i.e., interaction effect) of two exogenous variables (x and M) on an endogenous variable (y), that is, whether M moderates (i.e., reduces or increases) the impact of x on y. Theoretically, an example comes from the risk and resilience framework, in which a protective factor, such as caring adults, is defined as a factor that reduces (or buffers) the negative impact of a risk factor, such as poverty, disability, or institutional discrimination, on the outcomes of a child. In this example, the presence of caring adults (M) moderates the impact of a risk (x) on an outcome (y).
Distinguishing differences between mediation and moderation models is credited to the seminal work of Baron and Kenny (1986). An important point made by these authors is that questions of mediation and moderation require different analytic methods, and researchers should be cautious in choosing the method that is appropriate for their research questions. In earlier chapters, we have illustrated the robustness and effectiveness of using SEM to answer research questions concerning mediational effects. SEM also offers valuable flexibility for examining moderation
Box 7-2 Illustrative Examples of Mediation vs. Moderation
[Mediation diagram: Supervisor Support has a direct path (1) to Work-Life Balance and a path (2) to Flexible Work Schedule; Flexible Work Schedule has a path (3) to Work-Life Balance.]
Mediation. Using variables from Jang (2009), sub-parts of the diagram above contain (a) the hypothesis that Supervisor support and Flexible work schedule have direct effects on Work-life balance (paths 1 & 3); (b) the hypothesis that Supervisor support has only an indirect or mediated effect on Work-life balance through its effect on Flexible work schedule (paths 2 & 3); and (c) the hypothesis that Supervisor support has both a mediated and direct effect on Work-life balance (paths 1, 2, & 3, also called a partially mediated effect).
[Moderation diagram: Supervisor Support has a path to Work-Life Balance, and Flexible Work Schedule is shown moderating that path.]
Moderation. The above diagram illustrates one common way of representing moderation effects. The hypothesis inherent in the diagram is that the effect of Supervisor support on employees' Work-life balance is moderated by the degree to which their work schedule is flexible. For example, if agency policy results in low levels of work schedule flexibility, supervisor support may contribute little to employees' Work-life balance. In agencies with high scheduling flexibility, Supervisor support may have a strong association with Work-life balance.
single- and multiple-group alternative models, separate parameter estimates for each group, the ability to test for differences across more than two groups, and the ability to pinpoint significantly different model parameters. This section focuses on models that can be employed to analyze moderating effects within the SEM framework.
Overview of Moderation in SEM
The most popular method for analyzing moderating effects in SEM is the multiple-group comparison approach. Suppose variable M is a moderator indicating group membership (e.g., male/female; urban/rural; renter/homeowner). We aim to find out if and how patterns of the effects of x on y differ by group. To analyze differential patterns of effects of x on y between groups represented by M, we run a series of SEM analyses for the groups. The procedure requires testing a series, or a hierarchy, of hypotheses. Pairs of models are compared using the χ2 difference test to establish which model is most consistent with the data.
Many research questions can be tested with multiple-group comparisons. For instance, social workers developing instruments to measure characteristics of individuals, groups, organizations, and systems are interested in testing measurement invariance. Specifically, in the context of scale development, researchers want to know: "… do the items comprising a particular measuring instrument operate equivalently across different populations (e.g., gender, ability, workplace, and cultural groups)? In other words, is the measurement model group-invariant?" (Byrne, 2010, p. 197). The task of testing measurement invariance is to test the same CFA model with several groups and determine whether the factor structure and parameter estimates are statistically the same for the groups. If results indicate that the model "works" differently for different groups, then the model is not invariant across groups. In the psychometric literature, the situation in which the measurement model is different across subpopulations is also known as construct bias, meaning that an instrument or scale measures different constructs in different groups, or measures the same constructs differently. If the researcher is interested in more than a yes or no answer to the question of measurement invariance, SEM permits the examination of how a measurement model operates differently across groups. For example, is the factor structure different across groups (e.g., with more
or fewer factors or different indicators for factors across groups), do items load significantly lower or higher on a factor in one group or another, or are factors more or less highly correlated in one group or another? Multiple-group comparisons in CFA also allow researchers to determine if there is partial measurement invariance. In this case, some factor loadings may vary appreciably across groups, but the values of other loadings do not (Kline, 2005, p. 295). Such analyses may provide important information about the validity of measures used in social work practice and research for different groups. Similar research questions can also be examined for general SEMs, with implications for policy, research, and practice. The relevance of a theoretical model for different groups can be tested, for example, with multiple-group comparisons of structural path coefficients. Does age moderate the pathways by which maltreated children achieve permanency? Does mentoring improve outcomes of high-risk youth to the same extent that it helps low-risk youth? Are the effects of federal healthcare policies on the health outcomes of individuals moderated by state service delivery approaches? Does the relationship between falls and health outcomes in the aging population differ by the type of assisted living facility in which individuals reside?
Conducting Multiple-Group Comparisons
To understand what happens in the series of tests used in multiple-group analysis, it is useful to first consider what happens in a single-group analysis. In a sample that includes boys and girls, for example, a single-group analysis generates estimates that apply to both groups. The assumption is that the measurement and structural components of the model operate the same for both boys and girls. In a multiple-group analysis, this assumption is tested to determine if better fit can be attained by allowing some or all parameter estimates to vary across groups. Maybe boys are more or less likely than girls to endorse some indicators of depression, or the relationship between social skills and academic performance is different for girls than for boys. A model may be more consistent with the data (i.e., have better fit) if these differences are accommodated. If subgroup sample sizes are adequate, more than two groups can be analyzed simultaneously. To perform a multiple-group comparison, researchers first run a baseline model that specifies the "same form" or model for all groups.
Using a separate input covariance matrix for each group, the program generates parameter estimates for each group, but a χ2 value that applies to the entire multiple-group model. Because there are no cross-group constraints in the baseline model—it is the least restrictive model—it has the best possible χ2 statistic. The researchers then constrain one part of the model at a time (for example, the structural paths from exogenous latent variables to the endogenous latent variables, γ's) to have equal coefficients across the groups. One "part" of the model is usually one matrix, such as the Λ, the Φ, or the Γ matrix, but it can also be a subset of elements in a matrix, such as the loadings on only one factor. Because a model in which constraints on parameters have been added is nested in the previous less restrictive model, the fit of the two models can be compared using the same χ2 difference test used to compare the nested CFA models discussed in Chapter 4. The expectation is that a model with additional constraints will have a higher χ2 value, indicating worse fit, than a less restrictive model. The question is: Has the fit become statistically significantly worse with added constraints? As described in Chapter 6, pp. 150–154, the statistical significance of the change in χ2 per the change in degrees of freedom indicates if the model with added constraints can be retained. If the first χ2 difference test leads to acceptance of the null hypothesis that the two models are identical (i.e., the more constrained model is not significantly worse than the first), researchers continue by testing a third model. Parameters in another part of the model (for example, the structural paths between endogenous latent variables, β's) are constrained to be equal across the groups. Another χ2 difference test is performed to see whether the third model is statistically different from the second one. If at any point the fit of the model with additional constraints is statistically significantly worse according to the χ2 comparison test, the previous model is retained—that is, the most recently tested parameters are again allowed to be unconstrained. Alternatively, a new test with constraints imposed on only a subset of the parameters in question could be run. Finding that model fit is statistically better when some parameters are allowed to vary across groups indicates that group membership moderates the relationships represented by those parameters. The moderation hypothesis is rejected only when the null hypothesis that models being compared are identical is accepted at every step.
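The χ2 difference test used at each step can be computed by hand or with a few lines of code. The fragment below is a minimal sketch, assuming SciPy is available; the χ2 and df values shown are hypothetical, not taken from a published study. It shows how a model with added cross-group equality constraints is compared to the less restrictive model in which it is nested.

from scipy.stats import chi2

def chi_square_difference(chi2_constrained, df_constrained, chi2_free, df_free, alpha=0.05):
    """Compare a model with added cross-group equality constraints to the freer model."""
    delta_chi2 = chi2_constrained - chi2_free   # added constraints can only raise chi-square
    delta_df = df_constrained - df_free
    p_value = chi2.sf(delta_chi2, delta_df)
    # A significant difference means the constrained parameters are not equal across
    # groups, i.e., group membership moderates those relationships.
    return delta_chi2, delta_df, p_value, p_value < alpha

# Hypothetical example: baseline ("same form") model vs. equal factor loadings
print(chi_square_difference(112.4, 52, 98.7, 46))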
One recommended hierarchy of hypotheses for testing group differences in CFA is as follows (Bollen, 1989, pp. 360–369):
1. Hform. The same number of factors, pattern of loadings (not values), and pattern of covariances between factors are specified for all groups.
2. HΛx(Fn), for n = 1 . . . Q. In addition to Hform, factor loadings (λ's, or measurement weights) within one factor (or one additional factor at a time) are constrained to be equal across groups.
3. HΛx. In addition to HΛx(Fn), all factor loadings are constrained to be equal across groups.
4. HΛxΦ. In addition to HΛx, the variance–covariance matrix of factors (Φ) is constrained to be equal across groups.
5. HΛxΦΘδ. In addition to HΛxΦ, the matrix of measurement errors (Θδ) is constrained to be equal across groups.
It should be noted that some researchers will stop their tests of invariance at Step 4, given the difficulty of finding identical error matrices. They may conclude that measurement invariance has been achieved, based on support for the null hypotheses up to this point. As long as researchers are transparent about their testing procedures and definition of “invariance,” this conclusion may be acceptable. One hierarchy of hypotheses for testing group differences in general SEM is as follows (Bollen, 1989, p. 357):
1. Hform. The same pattern (not values) of fixed, free, and constrained structural paths (elements in the B and Γ matrices), relationships between exogenous factors (elements in the Φ matrix), and relationships between endogenous factor errors (elements in the Ψ matrix) are specified across groups.
2. HΓ. In addition to Hform, all paths from exogenous to endogenous factors (γ's) are constrained to be equal across groups. This model is compared to the Hform model.
3. HB. In addition to Hform, all paths from endogenous to endogenous factors (β's) are constrained to be equal across groups. This model is compared to the Hform model.
4. HBΓ. In addition to Hform, all paths from exogenous to endogenous factors (γ's) and from endogenous to endogenous factors (β's) are constrained to be equal. This model is compared to the Hform model.
5. HBΓΨ. In addition to HBΓ, the variance–covariance matrix of structural errors (Ψ) is constrained to be equal across groups. This model is compared to HBΓ.
6. HBΓΨΦ. In addition to HBΓΨ, elements of the variance–covariance matrix of the exogenous factors (Φ) are all constrained to be equal across groups. This model is compared to HBΓΨ.
For in-depth discussions and examples of multiple-group comparisons with CFA and structural models, readers are referred to Bollen (1989), Byrne (2010), and Byrne, Shavelson, and Muthén (1989). Examples of multiple-group analyses in Amos and in Mplus are presented in the online materials associated with this book.
8
Become a Skillful and Critical Researcher
This book describes structural equation modeling, a robust and useful tool for answering a variety of social work research questions in a comprehensive yet succinct fashion. The book highlights features in SEM permitting analysis of ordinal and categorical observed variables, variables with nonnormal distributions, and datasets with missing and nested data, all of which are common in social work research. The book has emphasized basic foundations of SEM analysis and practical application issues, such as interpreting output and addressing poor fit. The previous chapter was designed to reinforce and hone readers’ skill in identifying and applying rigorous practices in SEM scholarship. This final chapter distills elements of the best practices highlighted in previous chapters and provides concluding remarks about how to become a skillful and critical researcher with SEM. The past two decades have witnessed a proliferation in social work applications of SEM. With the rapid development of computing technology and wide availability of software packages, running SEM analyses is increasingly feasible for social work researchers. However, using SEM properly to answer research questions is challenging. SEM is a useful tool, but one that must be used carefully. Criteria for high-quality statistical
analysis are less clear than the criteria for research design (Guo & Fraser, 2010). In a rapidly developing field such as SEM, criteria may be murky because we are just beginning to understand the sensitivity of models to violations of the assumptions on which they rest. To promote the appropriate use of SEM, we offer the following principles that we believe to be fundamental and paramount for studies using SEM: 1. A sound SEM analysis should be guided by a theoretical model, and any subjective decision made by a user should have theoretical justification and rationale. The importance of having a theoretical model is underscored throughout this book. We have seen that, like other analytic methods, SEM involves a series of subjective decisions. For instance, to make an underidentified model identified, users must decide where to impose a constraint; to improve the fit of a model, users need to free parameters such as correlations among measurement errors. At these junctures, a statistical problem may be resolved, but a research question may also be altered. To what extent do these decisions make sense? How plausible are the rationales provided for analysis decisions? Prior knowledge and theory must be guiding forces in the decisions. 2. When facing multiple choices in modeling, other things being equal, the user should choose the model that is most parsimonious. When formulating a research question, it is important to keep the research question simple—no one can solve all research problems in one study. When the same research question can be answered by several models with different levels of complexity, it is important to choose the simpler model. For instance, when one can use either a recursive or a nonrecursive model to answer a question, the recursive model is preferable. 3. Conduct sensitivity analyses to check violations of assumptions embedded in a model. Because there are so many decision points in SEM analyses, and so many analysis options, determining the sensitivity of results to analysis decisions is a recommended procedure (Saltelli et al., 2008). Possible sensitivity analyses include comparing models run with FIML handling of missing
values versus multiple imputation of missing values; comparing models run with ML versus an estimator suitable for nonnormally distributed data (if the researcher’s data are not normally distributed); and, comparing models run with and without taking into account data clustering (for multilevel models with low ICCs). To confirm that findings are robust, users want to obtain stable and consistent results across variant input conditions. 4. Seek alternative explanations to a final model and run equivalent or competing models. Closely related to sensitivity analysis is the practice of running equivalent and competing models. This principle is often practiced after a final model is selected from among hierarchical or nonhierarchical alternatives. In SEM, equivalent models are those that yield the same fit (Kline, 2005), but with a different configuration of paths among variables. Competing models hypothesize different relationships among the same set of variables based on competing theory or inconsistent past empirical findings. Unlike equivalent models, alternative models yield different model fit indexes than those of the original model. The objective of testing equivalent and competing models is to explore and rule out alternative explanations and to increase the credibility of the final substantive conclusions. 5. Be transparent and comprehensive in reporting procedures and results. Fully report a priori analysis decisions and model evaluation criteria. Provide a rationale and citations for analysis procedures that are not agreed upon in the literature (i.e., most SEM practices). Indicate what output was used to guide statistical decisions about modifications; as always, provide theoretical justification for modifications. Consult sources on how to write up SEM results (e.g., Bollen, 1989; Hayduk, 1987; Hoyle & Panter, 1995). 6. Continue to learn about SEM and developments in best practices for conducting analyses. The SEM literature is rich and dynamic. Simulation studies, statistical theory advances, new program capabilities, and findings from applied research all contribute to
the ongoing development of the SEM knowledge base. It is important for social work researchers to remain aware of important developments that affect best practices and journal expectations for SEM publications. Looking periodically at the journal Structural Equation Modeling and the SEMnet listserv and attending methodological workshops are useful strategies to stay informed. The online companion website contains a lightly annotated list of the readings that have been referred to throughout this book as good sources of further information on SEM topics. As social work researchers encounter data and analysis issues in their own structural equation models that have not been covered in depth in this Pocket Guide to Social Work Research Methods, we hope these sources serve as a useful starting point for developing expertise in using SEM. In addition to specific instructions and code for using Amos and Mplus, the online resources associated with this book include guidance on how to report SEM findings, and how to replicate and critique reports of SEM studies presented by other researchers. These resources should also be valuable to readers who themselves want to become resources to other social work researchers interested in structural equation modeling.
Glossary
alternative models Alternative models are models that might statistically explain the data as well as (or better than) the model hypothesized by the researcher, but that do so with a different arrangement of relationships among the same variables. Alternative models offer a competing explanation of the data. Researchers should propose and estimate alternative models and justify why their preferred model should be retained over an explanation offered by a statistically equivalent alternative model.
comparative fit index (CFI) CFI is one of several indices available to assess model fit. A value between 0.90 and 0.95 indicates acceptable fit, and above 0.95 indicates good fit.
chi-square (χ2) The most basic and common fit statistic used to evaluate structural equation models; chi-square should always be provided in reports on SEM analyses. Chi-square values resulting in a nonsignificant p-value (i.e., p > 0.05) indicate good model fit. The chi-square statistic is directly affected by the size of the sample being used to test the model. With smaller samples, this is a reasonable measure of fit. For models with more cases, the chi-square is more frequently statistically significant. Chi-square is also affected by the size of the correlations in the model: the larger the correlations, the poorer the fit. For these reasons alternative measures of fit have been developed. Both the chi-square and alternative fit indices should be reported for SEM analyses.
constrained parameter Constrained parameters are those where the value of one parameter is set (or constrained) to equal some function of other parameters in the model. The most basic constraint is to set one parameter equal to another parameter. In this example, the value of the constrained parameter is not
estimated by the analysis software; rather, the unconstrained (free) parameter will be estimated by the analysis software, and this value will be applied to both parameters. A parameter is not considered constrained when its value is freely estimated (see free parameters) or when it is set to a specific value (e.g., zero) that is not dependent on the value of any other parameters in the model (see fixed parameters).
control variables Control variables, also known as covariates, are variables included in an analysis because they are known to have some relationship to the outcome variable. The parameter estimates of control variables are not explicitly of interest in the current analysis, but in order to obtain the most accurate estimates of a model's substantive relationships, it is necessary to "remove" or "control" for the control variables' effects. Gender and race/ethnicity are common control variables. They are often included in models because they are known to be related to outcomes, even if the mechanisms of their effects are unclear.
convergence Convergence is a term that describes the status of estimating model parameters using a maximum likelihood estimator and typically refers to obtaining a stable solution during the modeling process. In model estimation, the program obtains an initial solution and then attempts to improve these estimates through an iterative process of successive calculations. Iterations of model parameter estimation continue until discrepancies between the observed covariances (i.e., the covariances of the sample data) and the covariances predicted, or implied, by the researcher's model are minimized. Convergence occurs when the incremental amount of improvement in model fit resulting from an iteration falls below a predefined (often default) minimum value. When a model converges, the software provides estimates for each of the model parameters and a residual matrix.
Cook's distance (Cook's D) A statistic that reflects how much each of the estimated regression coefficients change when the ith case is removed. A case having a large Cook's D (i.e., greater than 1) indicates that the case strongly influences the estimated coefficients. Cook's D is used as a multivariate nonnormality diagnostic to detect influential cases.
correlation Correlation is a standardized measure of the covariance of two variables. The correlation of two variables can be obtained by dividing their covariance by the product of their standard deviations. Correlation values range from –1 to 1. A value of 0 indicates no correlation. Values of –1 and 1 are equal in terms of magnitude, but a value of –1 indicates that scores on one variable always go up when values on the other variable go down. A positive correlation indicates that when scores on one variable increase, scores on the other tend to increase.
covariance Covariance is a measure of how much the values of two variables vary together across sample members. The formula for the covariance between
two variables looks a little like the formula for variance except that, instead of multiplying the difference from the mean of each sample member's score on a variable by itself (i.e., squaring the difference), the differences of sample members' scores from the mean on one variable are multiplied by their difference scores from the mean on the other variable. Covariance is the basic statistic of SEM analyses. Cov is an abbreviation for covariance.
covariates Covariates, also known as control variables, are variables included in an analysis whose parameter estimates are of interest but are not the major substantive variables.
cross-sectional data Cross-sectional data are data collected on or about only one point in time. These data can be used to identify associations between variables but do not permit claims about time order of variables.
degrees of freedom The degrees of freedom (df) of an SEM model is the difference between the number of data points and the number of parameters to be estimated. The number of data points (i.e., unique matrix elements or known pieces of data) for an SEM model is the number of unique variances and covariances in the observed data being analyzed.
direct effect A variable has a direct effect on another variable when the variable's influence is not exerted through another endogenous variable. That is, the effect is not mediated by an intervening variable. In the example below, discrimination (variable A) has a direct effect on historical loss (variable B) and a direct effect on alcohol abuse (variable C), represented by paths BA and CA, respectively. Historical loss (variable B) has a direct effect on alcohol abuse (variable C), which is represented by path CB. In this example, discrimination also has an indirect effect on alcohol abuse.
[Path diagram: Discrimination (varA) → Historical loss (varB) via path BA; Discrimination (varA) → Alcohol abuse (varC) via path CA; Historical loss (varB) → Alcohol abuse (varC) via path CB.]
endogenous variable Endogenous variables are variables in a model that are explained or predicted by one or more other variables within the model. If a variable serves as a dependent variable in at least one equation represented in a model, it is considered endogenous and is notated by the Greek symbol η (eta). It is important to remember that an endogenous variable may also explain or predict another endogenous variable in the model (i.e., it may also be the independent variable in one or more equations represented in the model).
Glossary estimation Estimation is the process of analyzing the model by using the known information (e.g., covariances of the sample data) to estimate values for the unknown model parameters. In SEM, the goal of estimation is to obtain the parameter estimates that minimize the discrepancies between the covariance matrix implied by the researcher’s model and the covariance matrix of the observed (i.e., input) data. In SEM, estimation is both simultaneous (i.e., all model parameters are calculated at once) and iterative (i.e., the program obtains an initial solution and then attempts to improve these estimates through successive calculations). Many different estimation procedures are available (e.g., maximum likelihood, WLSMV), and the choice of estimation method is guided by characteristics of the data including sample size, measurement level, and distribution. exogenous variable Exogenous variables are variables in a model that are not explained or predicted by any other variables in the model. That is, the variable does not serve as a dependent variable in any equations represented in the model. By defining variables as exogenous, the researchers claim that these variables are predetermined, and examining causes or correlates of these variables is not the interest of the current study. Exogenous variables are represented in SEM notation by the Greek symbol ξ (ksee). factor loading A factor loading is a statistical estimate of the path coefficient depicting the effect of a factor on an item or manifest variable. Factor loadings may be in standardized or unstandardized form and are usually interpreted as regression coefficients. fixed parameter Fixed parameters are parameters represented in a model that the researcher does not allow to be estimated from the observed data. Rather, the value specified by the researcher is used by the analysis software as the obtained value of the parameter. Fixed parameters may be set at any value, but the most common are zero (i.e., to indicate no relationship between variables) and unity (or 1.0, e.g., when establishing the metric for a latent variable by fixing an indicator’s loading). free parameter Free parameters are those parameters represented in a model that the researcher allows to be estimated from the observed data by the analysis software. That is, the estimated value is not set or constrained to any particular value by the researcher but is left “free” to vary. Free parameters allow hypothesized relationships between variables to be tested. indicator Indicators, also known as manifest variables or items, are observed variables. In CFA, indicators are the observed variables that are used to infer or indirectly measure latent constructs. identification See model identification. implied matrix The implied matrix is the matrix of variances and covariances suggested (i.e., implied) from the relationships represented in a hypothesized
SEM model. Model fit is determined by the extent to which the model-implied variance–covariance matrix reproduces the matrix from the observed data (i.e., the input matrix). indirect effect A variable has an indirect effect on another variable when the effect is partially or fully exerted through at least one intervening variable. This intervening variable is called a mediator. In the example below, the effect of the social environment on children's school success is mediated by, or explained by, the social environment's effect on psychological well-being. In this example, the social environment has an indirect effect on school success.
[Path diagram: Social Environment → Psychological Well-Being → School Success.]
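The implied, input, and residual matrix entries can be made concrete with a small numerical sketch. For a one-factor measurement model, the model-implied matrix takes the form Σ(θ) = ΛΦΛ′ + Θ; the parameter values and the "observed" matrix below are invented for illustration only:

```python
import numpy as np

# Hypothetical one-factor model with three indicators
lam = np.array([[1.0], [0.8], [0.6]])   # factor loadings (first loading fixed to 1.0 to set the metric)
phi = np.array([[0.49]])                # factor variance
theta = np.diag([0.30, 0.35, 0.40])     # measurement-error variances

# Model-implied variance-covariance matrix: Sigma(theta) = Lambda * Phi * Lambda' + Theta
implied = lam @ phi @ lam.T + theta

# Hypothetical input (observed) variance-covariance matrix
observed = np.array([[0.80, 0.41, 0.28],
                     [0.41, 0.67, 0.25],
                     [0.28, 0.25, 0.58]])

# Residual matrix: input matrix minus implied matrix; small values suggest good fit
residual = observed - implied
print(residual)
```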
input matrix The input matrix is the variance–covariance or correlation matrix of the observed (i.e., input) variables. Model fit is determined by the extent to which the model-implied variance–covariance matrix reproduces the matrix from the observed data (i.e., the input matrix). just-identified model An identified model in which the number of free parameters exactly equals the number of known values. This model will have zero degrees of freedom. The number of “knowns” exactly equals the number of “unknowns.” latent variable An important distinction in SEM is between observed variables and latent variables. Latent variables are theoretical, abstract constructs or phenomena of interest in a study, such as attitudes, cognitions, social experiences, and emotions. These variables cannot be observed or measured directly and must be inferred from measured variables. They are also known as factors, constructs, or unobserved variables. Constructs such as intelligence, motivation, neighborhood engagement, depression, math ability, parenting style, organizational culture, and socioeconomic status can all be thought of as latent variables. linear regression Linear regression is a statistical procedure in which there is a hypothesis about the direction of the relationship between one or more independent variables and a dependent variable. If a dependent variable is regressed on only one independent variable, the standardized regression coefficient (beta) that is obtained will be the same as the correlation between
the two variables. The unstandardized regression coefficient for a variable in a linear regression equation indicates the amount of change in the dependent variable that is expected for a one-unit change in the independent variable using the independent variable's original metric. If variables are standardized, λ "is the expected shift in standard deviation units of the dependent variable that is due to a one-standard deviation shift in the independent variable" (Bollen, 1989, p. 349). longitudinal data Longitudinal data are data that measure people or phenomena over time. Cross-sectional data (data collected on or about only one point in time) can be used to identify associations between variables; longitudinal data also permit claims about the time order of variables. Short-term longitudinal studies may include pretest, posttest, and follow-up observations. More traditional longitudinal studies may include data collected at many time points over weeks, months, or years. Both types of longitudinal study can be accommodated in the SEM framework, albeit with different strategies. Mahalanobis distance A statistic that indicates (in standard deviation units) the distance between a set of scores for an individual case and the sample means for all variables. It is used as a diagnostic to assess for multivariate nonnormality. Mahalanobis distance follows a chi-square distribution with degrees of freedom equal to the number of predictor variables used in the calculation. Individual cases with a significant Mahalanobis distance (at the p < 0.001 level) are likely to be outliers. matrix A matrix is a set of elements (i.e., numbers, values, or quantities) organized in rows and columns. Matrices vary in size based on the number of variables included and summarize raw data collected from or about individuals. The simplest matrix is one number, or a scalar. Other simple matrices are vectors, which comprise only a row or column of numbers. Analysis of SEM models relies on the covariance or correlation matrices. measurement error Measurement error refers to the difference between the actual observed score obtained for an individual and the individual's "true" (unknowable) score for an indicator. In SEM, measurement error represents two sources of variance in an observed indicator: (a) random variance and (b) systematic error specific to the indicator (i.e., variation in indicator scores that is not caused by the latent variable(s) modeled in the measurement model, but by other unobserved factors not relevant to the current model). mediation Mediation occurs when one variable explains the effect of an independent variable on a dependent variable. In the example below, the effect of the social environment on children's school success is mediated by, or explained by, psychological well-being.
[Path diagram: Social Environment → Psychological Well-Being → School Success.]
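The Mahalanobis distance screening described in the entry above can be carried out in a few lines. The sketch below uses randomly generated data purely for illustration and flags cases whose squared distance is extreme on the chi-square distribution at the p < .001 level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))   # 200 hypothetical cases on 4 variables

# Deviations from the variable means and the inverse of the sample covariance matrix
centered = data - data.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))

# Squared Mahalanobis distance for each case
d2 = np.einsum('ij,jk,ik->i', centered, inv_cov, centered)

# Compare to a chi-square distribution with df equal to the number of variables
cutoff = stats.chi2.ppf(1 - 0.001, df=data.shape[1])
print(np.where(d2 > cutoff)[0])   # indices of cases flagged as potential multivariate outliers
```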
model fit Model fit refers to how well the hypothesized model explains the data (i.e., how well the model reproduces the covariance relationships in the observed data). Many indices to assess model fit are available. It is good practice to use and report multiple fit measures to evaluate the fit of a model because each statistic/index is developed on its own assumptions about data, aims to address different features of model fit, and has both advantages and disadvantages. "Good fit" does not guarantee that a model is valid or that all parameters in a hypothesized model are statistically significant or of the magnitude expected. model identification Model identification concerns whether a unique estimate for each parameter can be obtained from the observed data. General requirements for identification are that every latent variable is assigned a scale (e.g., the metric of the variable is set by fixing the factor loading of one of its indicators to 1.0) and that there are enough known pieces of data (observed information) to estimate all the parameters requested in a model. Models may be described as underidentified, just-identified, or overidentified. SEM models must be overidentified in order to test hypotheses about relationships among variables. model specification Model specification involves expressing the hypothesized relationships between variables in a structural model format. Models should be based on theory and previous research. Models are commonly expressed in a diagram but can also be expressed in a series of equations or in matrix notation. During model specification, the researcher specifies which parameters are to be fixed to predetermined values and which are to be freely estimated from the observed data. moderation Moderation occurs when the magnitude or direction of the effect of one variable on another is different for different values of a third variable. Gender, for example, would be a moderator of the relationship between social environment and psychological well-being if the regression coefficient for social environment were significantly higher for boys than for girls. In standard multiple regression models, moderation is identified through significant interaction terms. In the SEM framework, moderation is tested using multiple group analyses.
Multiple-group analyses not only indicate if a variable moderates the effects of one or more independent variables on a dependent variable, but they also provide regression coefficients for each level of the moderator (e.g., for boys and girls). Multiple-group analyses in confirmatory factor analysis indicate if a measurement model differs significantly for one group versus another and, if so, which parameters differ. modification indices Modification indices are statistics indicating how much model fit can be improved by changing the model to allow additional parameters to be estimated. Modification indices are either provided by default by the analysis software or requested by the user. Changes to hypothesized models should not be made based solely on modification indices; changes must be substantively and theoretically justifiable, not just statistically justifiable. multiple-group analysis Multiple-group analysis is a technique to test whether the measurement and structural components of the model operate the same for different groups. In a single-group SEM analysis, the assumption is that parameter estimates are the same for all groups in the sample (e.g., males and females, doctors and nurses, renters and homeowners). In a multiple-group analysis, this assumption is tested to determine if better fit can be attained by allowing some or all parameter estimates to vary across groups. Multiple-group analysis can be used in CFA to assess whether a scale performs equally well for different groups (e.g., high school versus college students). nested model A nested model is a subset of another model; that is, a nested model contains a subset of the parameters but all the same observed variables as the model in which it is nested. Nested models are commonly used to test alternative explanations of the data; they differ from the models in which they are nested only in their parameter configurations (e.g., omitting a path between two latent variables; constraining a path, such as a factor loading, to be equal for two groups). nonconvergence Nonconvergence of a model occurs when the iterative estimation process is unsuccessful in obtaining a stable solution of parameter estimates. nonrecursive model A nonrecursive model has one or more feedback loops in the structural part of the model or has correlated structural errors. That is, effects between variables may be bidirectional, or there are correlated errors between endogenous variables that have a direct effect between them. observed variable (manifest variable) An important distinction in SEM is between observed variables and latent variables. Observed variables are variables that are actually measured for a sample of subjects during data collection. Observed variables, which are sometimes referred to as manifest variables, may come from a number of sources, such as answers to items on a questionnaire, performance on a test or assessment, or ratings provided by an observer.
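Nested models such as those described above are usually compared with a chi-square difference test: the difference between the two models' chi-square values is itself distributed as chi-square, with degrees of freedom equal to the difference in the models' df. The fit values below are invented for illustration; note also that some estimators (e.g., WLSMV) require adjusted difference-testing procedures rather than this simple subtraction:

```python
from scipy import stats

# Hypothetical fit results for a constrained (nested) model and the freer model it is nested in
chisq_constrained, df_constrained = 68.4, 26
chisq_free, df_free = 51.2, 24

delta_chisq = chisq_constrained - chisq_free   # 17.2
delta_df = df_constrained - df_free            # 2
p_value = stats.chi2.sf(delta_chisq, df=delta_df)

# A small p-value means the added constraints significantly worsen fit,
# so the freer (less constrained) model is preferred
print(delta_chisq, delta_df, p_value)
```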
Glossary overidentified model An overidentified model is a model for which the number of parameters to be estimated is lower than the number of unique pieces of input data. An overidentified model places constraints on the model, allowing for testing of hypotheses about relationships among variables. parameter A parameter is a property of a population; population parameters are estimated using statistics obtained from sample data. The primary parameters of interest in an SEM are the variances, regression coefficients, and covariances among variables. When specifying a model, the researcher must choose whether a parameter represented in the model will be free, fixed, or constrained based on an a priori hypothesis about the relationships between variables. power Power refers to the statistical ability to reject a false hypothesis. Power is affected by the probability of making a Type I error (α; i.e., rejecting a hypothesis that in fact is true and should not be rejected), sample size, and effect size. Researchers generally desire a large level of power, such as 0.80. recursive model A recursive model is a structural model that has no paths that create a feedback loop or reciprocal causation. That is, all effects between variables are one directional, and there are no correlated errors between endogenous variables that have a direct effect between them. residual matrix The residual matrix is the matrix containing the differences between corresponding elements in the analyzed and implied matrices. It is obtained by subtracting each element of the implied matrix from its counterpart in the input matrix. If the elements of a residual matrix are small and statistically indistinguishable from zero, then the analyzed model fits the data well. RMSEA RMSEA, or the root mean square error of approximation, is one of many model fit indices available to assess how close the implied matrix is to the observed variance–covariance matrix. It is a per-degree-of-freedom measure of discrepancy. RMSEA values ≤ 0.05 indicate close fit, values between 0.05 and 0.08 indicate reasonable fit, and values ≥ 0.10 indicate poor fit. simultaneous regression equations Equations in which one variable can serve as both an independent and a dependent variable. The ability to estimate simultaneous regression equations is a critical feature of SEM and one of the key advantages of SEM over other methods. standard deviation The standard deviation of a variable is the square root of its variance and is a summary measure of how much scores obtained from a sample vary around (or deviate from) their mean. Unlike variance, it is in the same metric as the variable. SD or s may be used to denote standard deviation. specification error Specification error occurs when an assumption made in a structural model is false. For example, if a path in a model is set equal to zero
(e.g., no line connecting two variables indicates a correlation of zero), but the true value of that path is not exactly zero (e.g., there is in fact some correlation, however small, between the variables), then there is specification error in the model. It is reasonable to expect that all models contain some amount of specification error. One goal of model specification is to propose a model with the least specification error. standard error A standard error is the standard deviation of the sampling distribution of a statistic. In statistical analysis, researchers use a statistic's standard error to construct a 95% confidence interval or to conduct statistical significance tests. structural error The structural error for any dependent variable in a structural model is the variance of the variable that is not explained by its predictor variables. In a general structural model, any variable that is regressed on others in the model has an error term representing structural error. Structural error can also be thought of as the error of prediction because, as in all regression analyses, variance in a dependent variable is unlikely to be completely explained by the variables in the model; rather, it is likely to be influenced, or predicted, by something other than the variables included in a model. TLI TLI, or the Tucker-Lewis index, is one of many indices available to assess model fit. TLI values above 0.95 indicate good fit. total effect Total effect refers to the sum of all effects, both direct and indirect, of one variable on another variable. Direct effects + indirect effects = total effect. underidentified model An underidentified model is one in which the number of parameters to be estimated exceeds the number of unique pieces of observed data. unobserved variable See "Latent Variable." variance Variance is a summary measure of how much the scores on a variable from a set of individuals (a sample) vary around the mean of the scores. Mathematically, variance is the sum of the squared differences from the mean of all of a sample's scores on a variable, divided by the number of scores (for population data) or the number of scores minus 1 (for sample data). A common symbol for the variance of a variable in a sample is σ² or s². variance–covariance matrix A variance–covariance matrix contains the variances of each variable along the main diagonal and the covariances between each pair of variables in the other matrix positions. This matrix (or its corresponding correlation matrix plus standard deviation and mean vectors) is central to SEM analysis: it provides the data for the SEM analysis, and it is the foundation for testing the quality of a model. The quality of a model is measured in terms of how closely the variance–covariance matrix implied by the researcher's model reproduces the observed (i.e., input) variance–covariance matrix of the sample data.
Glossary variance inflation factor (VIF) A statistic that is widely used as a diagnostic to detect multicollinearity. VIF measures how much the variance of an estimated regression coefficient is increased (inflated) because of collinearity. A maximum VIF greater than 10 indicates a potentially harmful multicollinearity problem. WRMR Weighted root mean square residual is one of several fit indices available to assess model fit. WRMR is provided in Mplus output only (not Amos). WRMR values ≤ 0.90 are suggestive of good model fit.
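Several entries above (degrees of freedom, free and fixed parameters, and the identification entries) reduce to a counting exercise: with k observed variables there are k(k + 1)/2 unique variances and covariances, and model df is that number minus the number of freely estimated parameters. The helper function and the example counts below are illustrative only:

```python
def sem_degrees_of_freedom(n_observed_vars, n_free_parameters):
    """Unique variances and covariances minus freely estimated parameters."""
    data_points = n_observed_vars * (n_observed_vars + 1) // 2
    return data_points - n_free_parameters

# Hypothetical one-factor CFA with 6 indicators, metric set by fixing one loading to 1.0:
# free parameters = 5 loadings + 6 error variances + 1 factor variance = 12
print(sem_degrees_of_freedom(6, 12))   # 21 - 12 = 9 > 0, so the model is overidentified
```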
Appendix 1
Guide to Notation used in SEM Equations, Illustrations, and Matrices
Symbol   Name            Definition

Measurement model notation

x        x               Observed indicators of latent exogenous variables (ξ)
y        y               Observed indicators of latent endogenous variables (η)
δ        delta           Measurement errors for x-indicators
ε        epsilon         Measurement errors for y-indicators
λ        lambda          Factor loadings (coefficients) of observed indicators on latent variables
Λx       lambda x        Matrix of coefficients (factor loadings) for x-indicators; individual matrix elements are indicated by lowercase lambda (λ)
Λy       lambda y        Matrix of coefficients (factor loadings) for y-indicators; individual matrix elements are indicated by lowercase lambda (λ)
Θδ       theta-delta     Covariance matrix of δ (measurement errors for x-indicators); individual matrix elements are indicated by lowercase delta (δ)
Θε       theta-epsilon   Covariance matrix of ε (measurement errors for y-indicators); individual matrix elements are indicated by lowercase epsilon (ε)

Structural model notation

η        eta             Latent endogenous variables
ξ        xi              Latent exogenous variables
ζ        zeta            Structural error associated with latent endogenous variables (error of prediction)
B        beta            Matrix of regression coefficients for paths between latent endogenous variables; individual matrix elements are indicated by lowercase beta (β)
Γ        gamma           Matrix of regression coefficients for paths between latent exogenous variables and latent endogenous variables; individual matrix elements are indicated by lowercase gamma (γ)
Φ        phi             Covariance matrix of latent exogenous variables; individual matrix elements are indicated by lowercase phi (φ)
Ψ        psi             Covariance matrix of latent errors; individual matrix elements are indicated by lowercase psi (ψ)
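Read together, the notation above combines into the core equations of the general structural equation model in its conventional LISREL-style form (see, e.g., Bollen, 1989): the two measurement equations and the structural equation,

x = \Lambda_x \xi + \delta, \qquad y = \Lambda_y \eta + \varepsilon, \qquad \eta = B\eta + \Gamma\xi + \zeta,

with \Phi = \mathrm{Cov}(\xi), \Psi = \mathrm{Cov}(\zeta), \Theta_\delta = \mathrm{Cov}(\delta), and \Theta_\varepsilon = \mathrm{Cov}(\varepsilon).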
Appendix 2
Derivation of Maximum Likelihood Estimator and Fitting Function
Because fundamental assumptions are embedded in the derivation of the ML and because the discrepancy function ML aims to minimize is so crucial to the evaluation of model fit, we provide a brief description of the derivation of the ML. This description follows Kaplan (2000). Assuming that the observations are derived from a population that follows a multivariate normal distribution, we have the multivariate normal density function for each individual, as:

f(z) = (2\pi)^{-(p+q)/2} \, |\Sigma|^{-1/2} \exp\left[ -\tfrac{1}{2} z' \Sigma^{-1} z \right],

where z is a vector of values of the observed variables for a single individual, Σ is the population variance-covariance matrix of the observed variables, p is the number of exogenous variables, and q is the number of endogenous variables. Assuming that the N observations are independent of one another, the joint density function can be written as the product of the individual densities:

f(z_1, z_2, \ldots, z_N) = f(z_1)\, f(z_2) \cdots f(z_N).

Based on these stated assumptions, we can derive a likelihood function for ML:

L(\theta) = (2\pi)^{-N(p+q)/2} \, |\Sigma(\theta)|^{-N/2} \exp\left[ -\tfrac{1}{2} \sum_{i=1}^{N} z_i' \, \Sigma^{-1}(\theta) \, z_i \right].

As a convention, we work with the log likelihood function by taking the logarithm on both sides of the equation. Maximizing the log likelihood requires obtaining the first derivatives with respect to the parameters of the model, setting the derivatives equal to 0, and solving the equation. That is, setting

\frac{\partial \log L(\theta)}{\partial \theta} = 0

and solving the equation for θ. In practice, this procedure has been transformed into a fitting function, or more generally a "discrepancy function," which ML aims to minimize. The fitting or discrepancy function associated with ML (F_ML) can be expressed as:

F_{ML} = \log|\Sigma(\theta)| + \mathrm{tr}\left[ S \, \Sigma^{-1}(\theta) \right] - \log|S| - (p+q),

where Σ(θ) is the model-implied matrix, and S is the sample observed covariance matrix. The asymptotic covariance matrix of the ML estimator of θ is obtained from the second-order derivatives of the log likelihood function log L(θ):

\mathrm{acov}(\hat{\theta}) = \left\{ -E\left[ \frac{\partial^2 \log L(\theta)}{\partial \theta \, \partial \theta'} \right] \right\}^{-1}.

These second-order derivatives relate to the fitting function F_ML in the following way:

-\frac{\partial^2 \log L(\theta)}{\partial \theta \, \partial \theta'} = \left( \frac{N-1}{2} \right) \frac{\partial^2 F_{ML}}{\partial \theta \, \partial \theta'}.

The above derivation shows that the ratio of an estimated parameter to its standard error should approximate a standard normal distribution for large samples. Therefore, we can perform a Z (or t) test to determine the statistical significance of parameters. In summary, the MLE algorithm assumes that the sample data follow a multivariate normal distribution and that observations in the sample are independent from one another. These assumptions are prone to violation in practice. For instance, variables based on Likert scales are ordinal and often will not follow a multivariate normal distribution.
When study observations are nested (e.g., students are nested within classrooms), the sample data violate the independent-observations assumption. Researchers should take remedial measures when these assumptions are violated. Finally, the discrepancy function is the key function of SEM, from which many statistics and procedures are derived to assess the goodness-of-fit of a model to empirical data.
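A small numerical sketch may help make the discrepancy function concrete. The function below simply evaluates F_ML as written above for a sample covariance matrix S and a model-implied matrix Σ(θ); the two matrices are made-up values used only for illustration:

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy: log|Sigma(theta)| + tr[S Sigma(theta)^-1] - log|S| - (p + q)."""
    k = S.shape[0]   # number of observed variables (p + q)
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    return logdet_Sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_S - k

# Hypothetical sample and model-implied covariance matrices for three observed variables
S = np.array([[1.00, 0.45, 0.30],
              [0.45, 1.00, 0.35],
              [0.30, 0.35, 1.00]])
Sigma = np.array([[1.00, 0.40, 0.32],
                  [0.40, 1.00, 0.33],
                  [0.32, 0.33, 1.00]])

print(f_ml(S, Sigma))   # small positive value; the worse the fit, the larger F_ML
print(f_ml(S, S))       # exactly 0 when the model reproduces the data perfectly
```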
References
Abbott, A. A. (2003). A confirmatory factor analysis of the Professional Opinion Scale: A values assessment instrument. Research on Social Work Practice, 13(5), 641–666. Acock, A. C. (2005). Working with missing values. Journal of Marriage and the Family, 67, 1012–1028. Allison, P. D. (2002). Missing data. Thousand Oaks, CA: Sage. American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed., text revision). Washington, DC: Author. Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103(3), 411–423. Arbuckle, J. L. (1983–2007). Amos (Version 16.0.1). Spring House, PA: Amos Development Corporation. Arbuckle, J. L. (1995–2007). Amos 16.0 User's Guide. Spring House, PA: Amos Development Corporation. Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. Benda, B. B., & Corwyn, R. F. (2000). A theoretical model of religiosity and drug use with reciprocal relationships: A test using structural equation modeling. Journal of Social Service Research, 26(4), 43–67. Bentler, P. M., & Wu, E. J. C. (1995). EQS for Windows. Encino, CA: Multivariate Software, Inc.
References Bentler, P. M., & Wu, E. J. C. (2001). EQS for Windows User’s Guide. Encino, CA: Multivariate Software, Inc. Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley & Sons. Bollen, K. A. (2000). Modeling strategies: In search of the Holy Grail. Structural Equation Modeling, 7(1), 74–81. Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: John Wiley & Sons. Bowen, G. L., Richman, J. M., & Bowen, N. K. (2002). The School Success Profile: A results management approach to assessment and intervention planning. In A. R. Roberts & G. J. Greene (Eds.), Social workers’ desk reference (pp. 787–793). New York: Oxford University Press. Bowen, G. L., Rose, R. A., & Bowen, N. K. (2005). Reliability and validity of the School Success Profile. Philadelphia, PA: Xlibris. Bowen, N. K. (2011). Child-report data and assessment of the social environment in schools. Research on Social Work Practice, 21, 476–486. Bowen, N. K. (2008a). Cognitive testing and the validity of child-report data from the Elementary School Success Profile. Social Work Research 32, 18–28. Bowen, N. K. (2006). Psychometric properties of the Elementary School Success Profile for Children [Instrument Development]. Social Work Research, 30(1), 51–63. Bowen, N. K. (2008b). Validation. In W. A. Darity Jr. (Ed.), International encyclopedia of the social sciences (2nd ed., Vol. 8, pp. 569–572). Detroit: Macmillan Reference. Bowen, N. K., Bowen, G. L., & Ware, W. B. (2002). Neighborhood social disorganization, families, and the educational behavior of adolescents. Journal of Adolescent Research, 17(5), 468–490. Bowen, N. K., Bowen, G. L., & Woolley, M. E. (2004). Constructing and validating assessment tools for school-based practitioners: The Elementary School Success Profile. In A. R. Roberts & K. R. Yeager (Eds.), Evidence-based practice manual: Research and outcome measures in health and human services (pp. 509–517). New York: Oxford University Press. Bower, H. A., Bowen, N. K., & Powers, J. D. (in press). Family-faculty trust as measured with the ESSP. Children & Schools. Bride, B. E., Robinson, M. M., Yegidis, B., & Figley, C. R. (2004). Development and validation of the Secondary Traumatic Stress Scale. Research on Social Work Practice, 14(1), 27–35. Browne, M. W. (1982). Covariance structures. In D. M. Hawkins (Ed.), Topics in applied multivariate analysis (pp. 72–141). Cambridge, UK: Cambridge University.
References Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit, in K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage. Byrne, B. M. (2010). Structural equation modeling with Amos: Basic concepts, applications, and programming (2nd ed.). New York: Taylor and Francis Group. Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105(3), 456–466. Chen, C., Bollen, K. A., Paxton, P., Curran, P. J., & Kirby, J. B. (2001). Improper solutions in structural equation models. Sociological Methods and Research, 29(4), 468–508. Chou, C.-P., & Bentler, P. M. (1995). Estimates and tests in structural equation modeling. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications (pp. 37–55). Thousand Oaks, CA: Sage Publications. Clyburn, L. D., Stones, M. J., Hadjistavropoulos, T., & Tuokko, H. (2000). Predicting caregiver burden and depression in Alzheimer’s disease. Journals of Gerontology. Series B, Social Sciences, 55B(1), S2–S13. Cohen, J. (1988). Statistical power analysis for the behavioral sciences, second edition, Hillsdale, New Jersey: Lawrence Erlbaum Associates. Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates. Colarossi, L. G., & Eccles, J. S. (2003). Differential effects of support providers on adolescents’ mental health, Social Work Research, 27(1): 19–30. Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112(4), 558–577. Conway, M., Mendelson, M., Giannopoulos, C., Csank, P. A. R., & Holm, S. L. (2004). Childhood and adult sexual abuse, rumination on sadness, and dysphoria. Child Abuse & Neglect, 28(4), 393–410. Cook, T. D. (2005). Emergent principles for the design, implementation, and analysis of cluster-based experiments in social science. Annals of the Academy of Political and Social Sciences, 599, 176–198. Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall. Crouch, J. L., Milner, J. S., & Thomsen, C. (2001). Childhood physical abuse, early social support, and risk for maltreatment: Current social support as a mediator of risk for child physical abuse. Child Abuse & Neglect, 25(1), 93–107. DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed.). Thousand Oaks, CA: Sage.
References Ecob, R., & Cuttance, P. (1987). An overview of structural equation modeling. In P. Cuttance & R. Ecob (Eds.), Structural equation modeling by example: Applications in educational, sociological, and behavioral research (pp. 9–23). New York: Cambridge University Press. Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8(3), 430–457. Fabrigar, L. R., Porter, R. D., & Norris, M. E. (2010). Some things you should know about structural equation modeling but never thought to ask. Journal of Consumer Psychology, 20, 221–225. Ferron, J. M., & Hess, M. R. (2007). Estimation in SEM: A concrete example. Journal of Educational and Behavioral Statistics, 32(1), 110–120. Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9(4), 466–491. Gerbing, D. W., & Anderson, J. C. (1984). On the meaning of within-factor correlated measurement errors. Journal of Consumer Research, 11, 572–580. Glisson, C., Hemmelgarn, A. L., & Post, J. A. (2002). The Shortform Assessment for Children: An assessment and outcome measure for child welfare and juvenile justice. Research on Social Work Practice, 12(1), 82–106. Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576. Guo, S., & Fraser, M. W. (2010). Propensity score analysis: Statistical methods and applications. Thousand Oaks: CA: Sage. Guo, S., & Lee, C. K. (January 2007). Statistical power of SEM in social work research: Challenges and strategies. Paper presented at the Eleventh Annual Conference of the Society of Social Work Research. San Francisco. Hayduk, L. A. (1987). Structural equation modeling with LISREL: Essentials and advances. Baltimore, MD: The Johns Hopkins University Press. Hayduk, L. A., & Glaser, D. N. (2000). Jiving the four-step, waltzing around factor analysis, and other serious fun. Structural Equation Modeling, 7(1), 1–35. Heinrich, C. J., & Lynn Jr., L. E. (2001). Means and ends: A comparative study of empirical methods for investigating governance and performance. Journal of Public Administration Research and Theory, 11, 109–138. Holmes, T. H., & Rahe, R. H. (1967). The social readjustment rating scale. Journal of Psychosomatic Research, 11, 213–218. Hoyle, R. H., & Panter, A. (1995). Writing about structural equation models. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 158–176). Thousand Oaks, CA: Sage Publications.
References Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. Iacobucci, D. (2009). Everything you always wanted to know about SEM (structural equation modeling) but were afraid to ask. Journal of Consumer Psychology, 19, 673–680. Jang, S. J. (2009). The relationships of flexible work schedules, workplace support, supervisory support, work-life balance, and the well-being of working parents. Journal of Social Service Research, 35(2), 93–104. Jöreskog, K. G. (1993). Testing structural equation models. In K. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 294–316). Newbury Park, CA: Sage Publications. Jöreskog, K. G. (2005). Structural equation modeling with ordinal variables using LISREL. Retrieved March 9, 2007, from www.ssicentral.com/lisrel/techdocs/ordinal.pdf. Jöreskog, K. G., & Sörbom, D. (1999). LISREL 8: User’s reference guide. Chicago: Scientific Software International. Kaplan, D. (2009). Structural equation modeling: Foundations and extensions (2nd ed.). Thousand Oaks, CA: Sage Publications, Inc. Kelly, T. M., & Donovan, J. E. (2001). Confirmatory factor analyses of the Alcohol Use Disorders Identification Test. Journal of Studies on Alcohol, 62(6), 838–842. Kiesner, J., Dishion, T. J., Poulin, F., & Pastore, M. (2009). Temporal dynamics linking aspects of parent monitoring with early adolescent antisocial behavior. Social Development, 18(4), 765–784. Kim, H., & Lee, S. Y. (2009). Supervisory communication, burnout, and turnover intention among social workers in health care settings. Social Work in Health Care, 48(4), 364–385. Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford. Kreft, I. G. (1996). Are multilevel techniques necessary? An overview including simulation studies. Unpublished manuscript. Los Angeles, CA: California State University. Kreft, I. G. & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA: Sage Publications, Inc. Kutner, M. H., Nachtsheim, C. J., & Neter, J. (2004). Applied linear regression models (4th ed.). New York, NY: McGraw-Hill Irwin. Lee, V. E. (2000). Using hierarchical linear modeling to study social contexts: The case of school effects. Educational Psychologist, 35, 125–141.
References Li, F., & Acock, A. C. (1999). Latent curve analysis: A manual for research data analysts. Eugene, OR: Authors. Available online at: http://oregonstate.edu/ dept/hdfs/papers/lgcmanual.pdf. Little, R. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley & Sons. Long, J. S. (1983). Confirmatory factor analysis. New York: Sage. MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between covariance structure models: Power analysis and null hypothesis. Psychological Methods, 11(1), 19–35. MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149. McArdle, J., & Bell, R. Q. (2000). An introduction to latent growth models for developmental data analysis. In T. D. Little, K. U. Schnabel & J. Baumert (Eds.), Modeling longitudinal and multilevel data (pp. 69–107). Mahwah, NJ: Lawrence Erlbaum Associates. Maitland, S. B., Dixon, R. A., Hultsch, D. F., & Hertzog, C. (2001). Well-being as a moving target: Measurement equivalence of the Bradburn Affect Balance Scale. Journal of Gerontology B, 56(2), 69–77. McGowan, B. G., Auerbach, C., & Strolin-Goltzman, J. S. (2009). Turnover in the child welfare workforce: A different perspective. Journal of Social Service Research, 35(3), 228–235. McMurtry, S. L., & Torres, J. B. (2003). Initial validation of a Spanish-language version of the Client Satisfaction Inventory. Research on Social Work Practice, 12(1), 124–142. Muthén, B. O. (1998–2004). Mplus Technical Appendices. Los Angeles, CA: Muthén & Muthén. Online at: http://www.statmodel.com/techappen.shtml. Muthén, B., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report. Muthén, L. K., & Muthén, B. O. (1998–2007). Mplus user’s guide (5th ed.). Los Angeles: Muthén & Muthén. Muthén, L. K., & Muthén, B. O. (2006). Mplus (Version 4.2). Los Angeles: Muthén & Muthén. Muthén, L. K., & Muthén, B. O. (2010). Mplus (Version 6.1). Los Angeles: Muthén & Muthén. Nugent, W. R., & Glisson, C. (1999). Reactivity and responsiveness in children’s service systems. Journal of Social Service Research, 25(3), 41–60. Pagano, R. R. (1994). Understanding statistics in the behavioral sciences (4th ed.). New York: West Publishing Company.
References Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models; Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage. Rose, R. A., & Fraser, M. W. (2008). A simplified framework for using multiple imputation in social work research. Social Work Research, 32, 171–178. Rosenthal, J. A. (2001). Statistics and data interpretation for the helping professions. Belmont, CA: Brooks/Cole Publishing Co. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., et al. (2008). Global sensitivity analysis: The primer. New York: John Wiley & Sons. SAS Institute Inc. (1999–2000). SAS System for Windows (Version 8.01). Cary, NC: author. Satorra, A., & Bentler, P. M. (1994). Corrections to test statistic and standard errors on covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis (pp. 399–419). Thousand Oaks, CA: Sage Publications. Saunders, J. A., Morrow-Howell, N., Spitznagel, E., Dore, P., Proctor, E. K., & Pescarino, R. (2006). Imputing missing data: A comparison of methods for social work researchers. Social Work Research, 30(1), 19–31. Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall. Snijders, T., & Bosker, R. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage Publications. Sörbom, D., & Jöreskog, K. G. (2006). LISREL 8.8 for Windows. Chicago, IL: Scientific Software International. Spearman, C. (1904). “General intelligence” objectively determined and measured. American Journal of Psychology, 15, 201–293. Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Allyn & Bacon. Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association. Wegmann, K. M., Thompson, A. M., & Bowen, N. K. (2011). A Confirmatory factor analysis of home environment and home social behavior data from the ESSP for Families. Social Work Research, 35, 117–127. West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56–75). Thousand Oaks, CA: Sage Publications. What Works Clearinghouse. (2008). WWC Procedures and Standards Handbook, Version 2.0. Retrieved February 6, 2011, from http://ies.ed.gov/ncee/wwc/references/idocviewer/doc.aspx?docid=19&tocid=1/. Whitbeck, L. B., Chen, X., Hoyt, D. R., & Adams, G. W. (2004). Discrimination, historical loss and enculturation: Culturally specific risk and resiliency factors
References for alcohol abuse among American Indians. Journal of Studies on Alcohol, 65(4), 409–418. Willis, G. B. (2005). Cognitive interviewing: A tool for improving questionnaire design. Thousand Oaks, CA: Sage Publications. Worthington, R. L., & Whittaker, T. A. (2006). Scale development research: A content analysis and recommendations for best practices. The Counseling Psychologist, 34, 806–839. Yoo, J., & Brooks, D. (2005). The role of organizational variables in predicting service effectiveness: An analysis of a multilevel model. Research on Social Work Practice, 15, 267–277.
Index
AIC (Akaike Information Criterion), 142–143, 155–157 Alternative models (also called competing models), 71, 85–86, 106, 121, 130, 165, 178–179, 189, 191 Best practices for comparing, 156 Comparing, 149–157 For measurement models, 96–100 For general structural models, 122–124 Amos, 147, 178, 190 Estimation, 63, 65, 103, 105, 107, 138–139 Fit indices, 142, 154–155 Missing values, 56, 69, 107, 138, 160 Model identification, 136 Modification indices, 107, 159–160 Software information, 5–6 Specification of models, 34, 84, 86, 93–94, 96, 114–116, 137 Analysis/analyzed matrix, see Input matrix Best practices For comparing alternative models, 156 For estimating models, 107, 131 For evaluating models, 150 For handling missing data, 57 For improving model fit, 165 For specifying models, 99, 124
For understanding data characteristics, 58 For understanding implications of clustered data, 68 Beta matrix, see Computational matrices BIC (Bayesian Information Criterion), 142–143, 155–157 CFI (Comparative Fit Index), 135, 142–143, 145–146, 150, 191 Chi-square (χ2, CMIN in Amos), 48–49, 62, 141–144, 146, 165, 191 Change in, 151–154 CMIN, see Chi-Square Computational matrices (beta, gamma, lambda, phi, psi, theta), 40–46, 50, 84–96, 112, 115–119, 122, 132 Constrained parameter, 12, 82–83, 112, 117, 119, 150–151, 155, 184–186, 191 Control variables (also called covariates), 11–12, 192 Convergence, 53–54, 56, 64, 101, 104, 192 Resolving problems of, 138–140 Covariance matrix, also called variance-covariance matrix, see Computational, Input, Implied, and Residual matrix entries Cook’s D (Cook’s distance), 61–62, 192
Index Correlated measurement errors, 7, 42, 74–75, 88–89, 92, 94–95, 107, 162, 164–165, 196 In hierarchical or longitudinal models, 66, 107, 131–134 Correlated structural errors, 119 Covariates, see Control variables Direct effects, 12, 29, 33, 118, 148, 180–181, 193 Error of prediction, see Structural error Estimation, also see subheading in Amos and Mplus entries, 19–20, 34–35, 40, 46, 48, 53, 81, 100–108, 123, 125–131, 194 Best practices, 107, 131 Problems with, 135–140 Steps, 101–102 With clustered data, 65–68, 105–107 With missing values, 54–57, 108 With non-normal data, 62–65, 102–103, 107–108 With ordinal data, 62–65, 102–105, 107–108 Estimators, 46, 48 Maximum Likelihood, 34, 102–103, 150, 204–206 Weighted Least Squares, 34, 102–105, 108, 150 Exploratory factor analysis (EFA), 9, 73–74, 77, 80–82, 92, 97, 99, 125–126, 147 Factor loading, 10, 13, 23–24, 35, 37–38, 40–43, 50–53, 77, 83, 87–93, 100–101, 117–118, 137–138, 151, 158, 163–164, 185, 194 Evaluating, 147–149 Factor variance, 10, 91–93, 96, 147 Fit, see Model fit Fit Indices, see AIC, BIC, CFI, GFI, TLI, Chi-square (χ2), RMSEA, WRMR, Model fit, and see subheading in Amos and Mplus entries Fixed parameter, 82, 194 Examples, 87–89, 91–92, 118
Free parameter, 74, 82–83, 94–95, 103, 112, 141, 143–144, 150, 152–153, 155–156, 158, 171, 188, 194 Examples, 53, 87, 89, 91, 96, 116, 118, 122 Gamma matrix, see Computational matrices GFI (Goodness of Fit Index), 142–143, 145, 146 Identification, see Model identification Implied Matrix (also called Reproduced matrix), 35, 46–48, 100–104, 136, 141, 144, 149, 194 Indirect effects (also called mediation), 11–13, 33–34, 109–111, 114–115, 121, 123, 148, 180–181, 193, 195, 200 Input matrix (also called analysis or analyzed matrix), 34–35, 46, 51, 56–57, 64, 68, 100, 103–105, 136, 138, 141, 144, 149, 164, 195, 200 Interactions, see Moderation and Multiple group analysis Just-identified, see Model identification Lambda matrix, see Computational matrices Matrices, also see Computational, Implied, Input, Polychoric, and Residual matrix entries, 35–36, 196 In multiple group tests, 183–186 Relation to equations, 37–39 Problems with, 64–65, 137–138, 164 Measurement error (also see Correlated measurement errors), 17–29, 31–34, 37–38, 40, 42, 43, 50–51, 83–84, 87, 94–95, 115, 132–133, 137, 139, 147, 159, 161, 185–186, 196 Measurement invariance, see Multiple group analysis Mediation, see Indirect effects Missing data, 9, 53–57, 68–70, 108, 138, 160, 188–189 Best practices for handling, 57
Index Model fit, also see AIC, BIC, CFI, GFI, TLI, Chi-square (χ2), RMSEA, WRMR, and subheading in Amos and Mplus entries, 48–49, 125–127, 197 Comparing models, 149–157 Evaluating, 141–146, 150 Improving, 157–166 Power for testing, 168–174 Moderation, see also Multiple group analysis, 12–13, 110, 180–186, 197 Model identification, also see subheading in Amos and Mplus entries, 49–51, 83, 91–93, 124, 127–131, 153, 188, 197 Solutions for problems of, 136–137, 175–179 Model specification, see Specification of Models, and subheading in Amos and Mplus entries Modification indices, 101, 108, 158–160, 164–165, 198 Mplus, 147, 190 Estimation, 63–65, 103–107, 139–140, 164 Fit indices, 143–146, 150, 154–156 Missing values, 56, 69, 107, 160 Model identification, 136 Modification indices, 107, 158–159, 160 Software information, 5–6 Specification of models, 34, 84, 86–87, 89, 91, 93–94, 96, 116–117, 137 Multiple group analysis, 54, 77, 93, 95, 110, 180–186, 198 Nested model, 123, 150–154, 158, 170–175, 198 Nonconvergence, see Convergence Nonnested model, 155–157 Nonpositive definite matrix, 137–138 Nonrecursive model, 119–122, 198 Overidentified, see Model identification Phi matrix, see Computational matrices Psi matrix, see Computational matrices Polychoric correlation matrix, 62, 64, 104–105, 164 Power analysis, 53–54, 106, 167–175, 199
Recursive model, 119, 199 Residual matrix, 48, 158, 160–161, 165, 199 Reproduced matrix, see Implied matrix R2, see SMC RMSEA (Root Mean Square Error of Approximation), 142–146, 199 Sample size, 72, 102–103 Recommendations for, 52–54, 140, 164, 167–175 Role in SEM, 8, 71, 100, 105–106, 139, 141, 144, 155, 163, 183, 194 Scale development, 9, 73–81 SMC (squared multiple correlation), 147, 158, 161, 165 Specification of models, also see subheading in Amos and Mplus entries Best practices, 99, 124 CFA, 81–100 Structural, 111–123 Steps, 84 Steps in SEM, 81 In preparing for analyses, 53, 69 In specification, 84, 112 In estimation, 101–102, 124 Structural error (also called error of prediction), 30–34, 44–45, 93, 98, 112, 116–117, 119, 121, 137, 185, 198, 200 Theta matrix, see Computational matrices TLI (Tucker-Lewis Index), 142–143, 145–147, 200 Total effect, 109–110, 200 Underidentified, see Model identification Variance-covariance matrix, also called covariance matrix, see Computational, Input, Implied, and Residual matrix entries WRMR (Weighted Root Mean Square Residual), 145–146, 201