English Pages 306 [308] Year 1996
Analysis of Change
Advanced Techniques in Panel Data Analysis
Editors
Uwe Engel and Jost Reinecke
Walter de Gruyter · Berlin · New York
1996
Dr. Uwe Engel, Professor für Sozialwissenschaften, Universität Potsdam, Germany
Dr. Jost Reinecke, Wissenschaftlicher Assistent, Institut für Soziologie/Sozialpädagogik, Westfälische Wilhelms-Universität Münster, Germany
With 78 figures and 33 tables
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
Analysis of change : advanced techniques in panel data analysis / edited by Uwe Engel and Jost Reinecke. p. cm. Selected papers presented at two scientific meetings organized by the editors. Includes bibliographical references and index. ISBN 3-11-014936-2 (acid-free paper) 1. Panel analysis — Congresses. 2. Social sciences — Statistical methods — Congresses. I. Engel, Uwe, 1954— II. Reinecke, Jost. H61.26.A53 1996 300'.1'5195—dc20 95-47022 CIP
Die Deutsche Bibliothek — Cataloging-in-Publication Data
Analysis of Change : advanced techniques in panel data analysis / ed. by Uwe Engel and Jost Reinecke. — Berlin ; New York : de Gruyter, 1996 ISBN 3-11-014936-2 NE: Engel, Uwe [Hrsg.]
© Copyright 1996 by Walter de Gruyter & Co., D-10785 Berlin All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording or any information storage and retrieval system, without permission in writing from the publisher. — Printed in Germany. — Printing: WB-Druck, Rieden. — Binding: Dieter Mikolai, Berlin. — Cover Design: Johannes Rother, Berlin.
Preface
The present volume contains selected papers on advanced techniques in panel data analysis. Most of these papers were presented at two scientific meetings organized by the editors. First, the working group "Structural Equation Modeling" held its 15th annual meeting on May 6-7, 1994 in Chemnitz, Germany, to devote itself to recent developments in panel data analysis. Second, at the XIIIth World Congress of Sociology (July 18-23, 1994 in Bielefeld, Germany) a session on "Panel Data Analysis" was held within the Research Committee on Logic and Methodology (RC33) of the International Sociological Association (ISA). Three contributions complete this collection of recent research. Two of them were presented in other methodological sessions at the World Congress of Sociology, whereas one paper was presented at the 16th annual meeting of the working group "Structural Equation Modeling" held on May 12-13, 1995 in Zürich, Switzerland. We are grateful to the authors for responding in a very professional way to our request for contributing to the present volume and its underlying ideas. Without their efforts this volume would not have been realized. We would also like to thank our colleagues from the Department of Sociology at the Technical University Chemnitz-Zwickau and the former president of RC33, Karl van Meter, for their support in organizing the conference and session mentioned above. Special thanks go to Ulrich Attermeyer, Joachim Bentz, Peggy Bublak and Markus Sailer for their technical assistance in preparing the volume. We are also thankful to Hetty Heier for her language assistance as a native speaker. Last but not least we would like to mention the publisher, and in particular Bianka Ralle and her colleagues, whose support in making the present publication possible is gratefully acknowledged.
Potsdam/Münster, August 1995 Uwe Engel, Jost Reinecke
Contents

1 Introduction  1
Uwe Engel and Jost Reinecke
  1.1 Panel Data Analysis  3
  1.2 Complicating Factors  4
    1.2.1 Measurement Error  4
    1.2.2 Missing Data  5
    1.2.3 Heterogeneous Populations  6
    1.2.4 Causal Inference  8
  1.3 Modeling Complicated Panel Data  9

2 Models for the Analysis of Categorically-Scored Panel Data  15
Allan L. McCutcheon
  2.1 Introduction  15
  2.2 Analysis of Turnover Tables for Manifest Variables  17
    2.2.1 Independence and Quasi-independence  18
    2.2.2 Symmetry, Quasi-symmetry, and Marginal Homogeneity  18
    2.2.3 Distance and Diagonal Models  21
  2.3 The Analysis of Turnover Tables as Latent Variables  22
  2.4 Models for Multi-Wave Panels  26
  2.5 Conclusion  32

3 Causal Log-Linear Modeling with Latent Variables and Missing Data  35
Jeroen Vermunt
  3.1 Introduction  35
  3.2 Log-linear Path Models  36
    3.2.1 Probability Structure  37
    3.2.2 Logit Models for Probabilities  38
    3.2.3 Estimation  40
  3.3 Log-linear Models with Latent Variables  41
    3.3.1 Latent Class Models  41
    3.3.2 Modified LISREL Models  42
    3.3.3 Estimation  42
  3.4 Log-linear Models for Nonresponse  44
    3.4.1 Fuchs' Approach  45
    3.4.2 Fay's Approach  46
    3.4.3 Ignorable Versus Nonignorable Response Mechanisms  47
    3.4.4 Estimation  48
  3.5 Application  49
    3.5.1 Using Only Complete Cases  50
    3.5.2 Fuchs' Procedure for Nonresponse  53
    3.5.3 Models for Patterns of Nonresponse  54
  3.6 Discussion  57

4 The Analysis of Panel Data with Nonmetric Variables: Probit Models and a Heckman Correction for Selectivity Bias  61
Gerhard Arminger
  4.1 Introduction  61
  4.2 Model Specification  62
    4.2.1 Heckman's Model for Dichotomous Variables  62
    4.2.2 Extension to General Threshold Models  64
  4.3 Estimation Method  67
    4.3.1 Sequential Marginal ML Estimation under Strict Exogeneity  67
    4.3.2 ML Estimation for a Model with State Dependence and Unobserved Heterogeneity  70
    4.3.3 The Problem of Initial States  71
  4.4 Random Effect Models under Sample Selection Bias  74
    4.4.1 Metric Response Variables  74
    4.4.2 Non-metric Response Variables  78
  4.5 Analysis of Production Output from German Business Test Data  79

5 Linear Panel Analysis of Ordinal Data Using LISREL: Reality or Science Fiction?  87
Steffen Kühnel
  5.1 Introduction  87
  5.2 Linear Panel Models  88
  5.3 Coping with Ordinality in Linear Models  92
  5.4 A Simulation Study  99
  5.5 Results  102
  5.6 Conclusions  110

6 Continuous-Time Dynamic Models for Panel Data  113
Hermann Singer
  6.1 Introduction  113
  6.2 Stochastic Differential Equations  114
  6.3 ML Estimation of Linear State Space Models  115
  6.4 EM Algorithm  118
  6.5 Examples  119
    6.5.1 Irregularly Sampled AR(2) Process  119
    6.5.2 Learned Helplessness as a Dynamical System  121
  6.6 Systems with Multiplicative Noise Terms  123
  6.7 Nonlinear Systems  125
  6.8 Conclusion  130

7 Nonstationary Longitudinal LISREL Model Estimation from Incomplete Panel Data Using EM and the Kalman Smoother  135
Johan H.L. Oud and Robert A.R.G. Jansen
  7.1 Introduction  135
  7.2 Advantages of EM  136
  7.3 From State Space Model to LISREL Model  139
  7.4 Expressing Σ in Terms of the Kalman Smoother  144
  7.5 Applying the EM Algorithm  147
  7.6 Educational Research Example  150
  7.7 Discussion  157

8 Model Specification and Missing Value Treatment in Panel Data: Testing the Theory of Planned Behavior in a Three-wave Panel Study  161
Jost Reinecke and Peter Schmidt
  8.1 Introduction and Overview  161
  8.2 Methods for Dealing With Incomplete Longitudinal Data  163
  8.3 Substantive Model, Sampling and Measurement Instruments  172
  8.4 Structure of Missing Values  175
  8.5 Results of the Structural Equation Models  179
  8.6 Summary and Conclusion  186

9 Multilevel Models for Longitudinal Data  191
Min Yang and Harvey Goldstein
  9.1 Introduction  191
  9.2 Two-level Growth Models  192
    9.2.1 An Example Using Growth Data  192
    9.2.2 A Basic Two-level Model for Change over Time  194
    9.2.3 Parameter Estimation Procedure - Iterative Generalised Least Squares  195
    9.2.4 Hypothesis Tests on Parameters  196
    9.2.5 Residuals  197
    9.2.6 Example  197
  9.3 Three-level Models  206
    9.3.1 An Example of Higher Level Repeated Measures Data  206
    9.3.2 A Variance Component Model and Complex Level 1 Variance  208
    9.3.3 Random Coefficients at Higher Levels  215
  9.4 Multilevel Time Series Models  217
  9.5 Discussion  219

10 Structural Analysis in the Study of Social Change  221
Uwe Engel and Wolfgang Meyer
  10.1 Introduction  221
  10.2 The Effects of Social Structure  221
  10.3 Structural Analysis of Social Change  227
  10.4 Handling Missing Data  237
  10.5 Illustrating Example: The Conflict-preventing Function of Social Structure  246

11 Latent Difference Models as Alternatives to Modeling Residuals of Cross-lagged Effects  253
Kai Schnabel
  11.1 Introduction  253
  11.2 Residual Score as a Measure of Change  254
  11.3 The "Fate" of the Change-pretest Covariance in Autoregressive Models  257
  11.4 Why be Aware of the Covariance?  261
  11.5 What are the Consequences of this Argumentation?  262
  11.6 Analysing Change in Structural Equation Models  265
  11.7 Discussion and Practical Recommendations  274
  11.8 What does the Residual Correlation mean?  276

12 Specification Choices in Longitudinal Stability Estimation: The Case of Values and Macrosocial Stress  279
Klaus Boehnke
  12.1 Introduction  279
  12.2 Methods  282
  12.3 Results  283
  12.4 Summary and Discussion  290

Contributors  295
1 Introduction
Uwe Engel and Jost Reinecke
In contemporary society the growing recognition of longitudinal data comes as no surprise. The less traditional ties make for static and replicable living conditions, the greater the need for techniques capable of analyzing both an increasingly dynamic world and the stabilizing or destabilizing effects that the continuing structural differentiation of society brings about. "Stability" and "change" represent ubiquitous experiences in modern society. Change occurs, first of all, within individuals' life courses. When people grow up and age, they usually undergo continuing processes of status transitions, developmental stages and sequences (of states) in a more or less predictable way. Starting out in early childhood, for instance, people pass through a sequence of age-related stages - youth, adulthood and old age - each involving specific social statuses, roles and social relationships. People experience change as part of the societal status attainment process when they pass through the educational system and afterwards compete for jobs and associated opportunities in the labor market. Social mobility is a related case in point when looking at status attainment in individuals' life courses. In modern society, marriage, divorce and widowhood, for instance, are common "critical" life events, just as likely as positional changes in the labor market. There is, of course, a multiplicity of additional areas in which diachronic change takes place. People may change their attitudes, opinions, feelings, moods or whatever state in which they find themselves more or less temporarily. Change is obviously an integral part of the life course. However, the same is true for its counterpart, stability. Insofar as change occurs in a systematic way, it produces a pattern that is itself either stable or changing. Life-course developments can certainly take place in a uniform manner. Change also occurs at the system level.
In Germany as well as other European countries, for instance, post-war society underwent considerable social change. Increasing prosperity for major parts of the population and improved opportunities for social mobility, a changing value system, continued individualization and a changing cleavage structure of society are apparent indicators of this still ongoing social change. There remains, however, much structural continuity at the same time. As has always been the case, continued status competition makes society an only moderately changing structure. The rather stable structure of social inequality is a case in point. To understand the simultaneous occurrence of stability and change, Coleman's (1990) "internal analysis of system behavior" proves useful. Rooted in a liberal methodological individualism, Coleman's approach starts from the premise that explanations of system behavior should be based on knowledge about its component parts below the system level. "Since the system's behavior is in fact a resultant of the actions of its component parts, knowledge of how the actions of these parts combine to produce systemic behavior can be expected to give greater predictability than will explanations based on statistical relations of surface characteristics of the system." (Coleman, 1990: 3). This premise is directly applicable to the analysis of change because how much the system itself changes depends mostly on the way change takes place below the system level. Whether society undergoes social, economic and political change depends strongly on how individuals' life-course developments unfold in modern society. There can be no aggregate (system-level) change over time if all individual members of society stay where they are or if opposing individual changes cancel each other out. Whether or not the percentage of persons who vote for leftist parties changes over time, for instance, simply depends on the number of "stayers" as well as the number of voters whose political alignments change from left to right and in the opposite direction. Furthermore, a high degree of system stability may originate in more or less change at the individual level. Social systems differ in the extent to which structural stability depends on state persistence. Systems may remain in aggregate equilibrium because people stay where they are or because there is an equal number of persons who move in opposite directions and thus bring about an aggregate "net" change equal to zero. In view of the effects social (ex-)change exerts on people and society, however, it makes a difference whether a given degree of structural stability depends on little or much individual mobility, competition and social exchange in the system.
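The difference between aggregate ("net") change and individual ("gross") change can be made concrete with a two-wave turnover table. The following is a minimal sketch: the counts are hypothetical and the helper name `net_and_gross_change` is an illustrative choice, not something taken from the text.

```python
def net_and_gross_change(turnover):
    """turnover[i][j] = number of persons in state i at wave 1 and state j at wave 2."""
    k = len(turnover)
    wave1 = [sum(turnover[i]) for i in range(k)]                        # row margins
    wave2 = [sum(turnover[i][j] for i in range(k)) for j in range(k)]   # column margins
    net = [wave2[s] - wave1[s] for s in range(k)]                       # aggregate change per state
    total = sum(wave1)
    movers = total - sum(turnover[i][i] for i in range(k))              # off-diagonal persons
    return net, movers

# Hypothetical electorate: states are "left" (0) and "right" (1).
static_society = [[500, 0], [0, 500]]        # everybody stays
dynamic_society = [[400, 100], [100, 400]]   # opposing moves cancel out

print(net_and_gross_change(static_society))
print(net_and_gross_change(dynamic_society))
```

Both tables show zero net change in the margins, yet the second contains 200 movers. Only data that follow the same persons over time can tell the two situations apart.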
Since structural stability (including moderate social change) is a possible outcome in both static and dynamic societies, social systems analysis has to look for the social and economic forces that operate below the system level to understand how structural "surface" phenomena are brought about. The more society is characterized by social change rather than stability and the more aggregate stability is based on individual change rather than stability, the greater the societal need for longitudinal data analysis, simply because of the increased importance of (social, economic and psychological) processes going on within society and the limited possibility of inferring them from cross-sectional data. There is even an identification problem that cannot be solved until longitudinal data are available. It would be impossible, for instance, to safely conclude whether cross-sectional age differences reflect change over the life course or change over periods. It would be impossible to distinguish to what extent such differences reflect diachronic change or differences between generations: Young people may differ from older ones irrespective of whether they are born, say, in the fifties, seventies or nineties; we would have to conclude that people change when they grow into adulthood and old age. However, older generations ("cohorts") may differ from younger generations irrespective of whether the individuals themselves are young or old. Generations having grown up in poverty, for instance, may later value material goods more than generations that never had comparable experiences of material deprivation. The youth of the fifties, for instance, might hold more materialistic values than the youth of the nineties. The crux is simply that we cannot decide to what extent cross-sectional age differences reflect the former and the latter source of variation respectively until corresponding longitudinal data are available.
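The identification problem can be sketched numerically. In a single cross-section, age and year of birth are perfectly collinear (age = survey year - birth year), so a pure age effect and a pure cohort effect predict exactly the same values. The effect size, the reference year 1995 and all function names below are hypothetical choices made for illustration only.

```python
REFERENCE_YEAR = 1995  # hypothetical year of the cross-sectional survey

def age_effect_model(age, cohort):
    # Score depends only on how old a person currently is.
    return 2 * age

def cohort_effect_model(age, cohort):
    # Score depends only on the year of birth and never changes afterwards.
    return 2 * (REFERENCE_YEAR - cohort)

def observe(model, cohorts, year):
    # Survey the given birth cohorts in the given calendar year.
    return [model(year - c, c) for c in cohorts]

cohorts = [1930, 1950, 1970]

cross_section_age = observe(age_effect_model, cohorts, 1995)
cross_section_cohort = observe(cohort_effect_model, cohorts, 1995)
later_age = observe(age_effect_model, cohorts, 2005)
later_cohort = observe(cohort_effect_model, cohorts, 2005)

print(cross_section_age == cross_section_cohort)  # identical in one cross-section
print(later_age == later_cohort)                  # distinguishable once re-observed
```

The two explanations only come apart when the same cohorts are observed again at a later point in time.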
1.1 Panel Data Analysis
Although both "trend data" and "panel data" can straightforwardly be embedded in a "cohort design", there is good reason to base longitudinal analysis in particular on panel data and associated statistical techniques: Panel data analysis is preferable whenever pure aggregate research proves inadequate in view of the research questions posed. This is supposed to be the case whenever research is done for reasons of "causal analysis" or "process analysis". Aggregate analysis simply falls short of providing information about individual-level processes and how these combine to produce emergent (e.g. structural) results at the system level. Such information is necessary, however, whenever causal analysis or process analysis is to be employed. Trend analysis might provide sufficient descriptions of change, but it falls short of providing empirical explanations of the aggregate developments observed. However, understanding rather than only describing social, economic and political change represents an essential part of any "internal analysis of system behavior". The panel design represents one of the major types of longitudinal data designs. It is similar to the popular trend design in that both involve repeated (usually large) samples drawn from the same statistical population. However, trend and panel analysis differ from each other in one basic way: While trend data originate in repeated draws of independent samples, panel data originate in dependent ones. Panel data are obtained by drawing a probability sample of size n from a well-defined statistical population at a given point in time, [...]

$$\pi_{s|fr} = \frac{\exp\left(u^{S}_{s} + u^{FS}_{fs} + u^{RS}_{rs}\right)}{\sum_{s} \exp\left(u^{S}_{s} + u^{FS}_{fs} + u^{RS}_{rs}\right)} \qquad (3.7)$$
This is equivalent to assuming log-linear model {EFX, XR, EFR} for marginal table EFXR and model {FR, FS, RS} for table FRS. The effect RS is included in the last model to reproduce exactly the number of persons in every particular subgroup.

3.4.3 Ignorable Versus Nonignorable Response Mechanisms
Because much of the theoretical work on nonresponse is based on the distinction between ignorable and nonignorable response mechanisms, I will pay some attention to the link between the approach presented here and these two types of response mechanisms (see also Vermunt, 1995). As we saw above, the nonresponse is MAR if the probability of belonging to a particular subgroup depends only on the observed variables for every sample unit. So in terms of the response indicators R and S, the response mechanism is MAR if $\pi_{rs|defxghabc}$ equals either $\pi_{11|defghabc}$, $\pi_{21|efghabc}$, $\pi_{12|defabc}$ or $\pi_{22|efabc}$. This is the least restrictive assumption about the distribution of R and S under which the response mechanism is ignorable for likelihood-based inference. Note that the values on R and S depend on different variables for the different subgroups. More precisely, in every subgroup, the probability of observing just those variables depends only on the observed variables. In other words, the probability of belonging to a particular subgroup is independent of the missing variables for the subgroup concerned, including the latent variable X. These rather strange restrictions on the probabilities of R and S cannot, however, be imposed by means of the approach presented here. So there is no direct link between the log-linear models for nonresponse and the distinction between ignorable and nonignorable nonresponse. Using the models for nonresponse, the least restrictive model which fulfills the requisites of an ignorable response model is a model in which the values of the response indicators depend on all variables which are observed for all persons, that is, $\pi_{rs|defxghabc} = \pi_{rs|efabc}$. On the other hand, in the most restrictive ignorable model, the MCAR model, $\pi_{rs|defxghabc} = \pi_{rs}$, that is, the response indicators are assumed to be independent of all other variables. The MCAR model is the response model which is actually tested by Fuchs (1982).
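The three situations distinguished above can be illustrated with a deliberately reduced setting: a single response indicator R for a variable D that is subject to nonresponse, and one variable E that is observed for all persons. The sketch below is an assumption-laden toy, not the chapter's model; the function name `classify_response` and all probabilities are hypothetical, and exact equality of probabilities is used only because the inputs are hand-picked.

```python
def classify_response(prob):
    """prob[(d, e)] = P(R = observed | D = d, E = e).

    D is the variable with missing values, E is observed for everybody.
    """
    if len(set(prob.values())) == 1:
        return "MCAR"                       # independent of all other variables
    by_e = {}
    for (d, e), p in prob.items():
        by_e.setdefault(e, set()).add(p)
    if all(len(ps) == 1 for ps in by_e.values()):
        return "ignorable"                  # depends only on the always-observed E
    return "nonignorable"                   # depends on D, the variable with missing data

mcar = {(1, 1): 0.8, (2, 1): 0.8, (1, 2): 0.8, (2, 2): 0.8}
depends_on_e = {(1, 1): 0.9, (2, 1): 0.9, (1, 2): 0.6, (2, 2): 0.6}
depends_on_d = {(1, 1): 0.9, (2, 1): 0.5, (1, 2): 0.9, (2, 2): 0.5}

for table in (mcar, depends_on_e, depends_on_d):
    print(classify_response(table))
```

With several response indicators and a latent variable, as in the chapter, the classification is more subtle, since a response indicator may depend on a variable that is missing for only some persons.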
The only situation in which there exists a log-linear path model which is equivalent to a MAR response mechanism occurs in the case of nested or monotone patterns of nonresponse (Vermunt, 1995). A pattern of nonresponse is nested when particular variables are missing more often than other ones, and for all persons with a particular missing variable all variables which are missing equally or more often are missing as well. Nested patterns of nonresponse occur often in panel studies: nonparticipation in one panel wave generally leads to nonparticipation in the subsequent waves too. In the case of a nested pattern of nonresponse, a MAR model can be obtained by specifying a log-linear path model in which every response indicator is assumed to depend on the variables which are observed more often and on the response indicators belonging to these variables. All response models which do not fulfill the above-mentioned conditions for ignorability are nonignorable. If R depends on D, it is clear that the response mechanism is nonignorable since the variable with missing data is directly related to its own response indicator. In other words, the probability of nonresponse depends on the variable with missing data. But also when S depends on D, the response mechanism is nonignorable. Although S does not indicate missingness on D, the mechanism is nonignorable because D is missing for some persons. The response model proposed in Equations (3.7) is nonignorable as well because both R and S depend on X, which is missing for all persons. It will be clear that, although the distinction between ignorable and nonignorable response mechanisms is valuable, it is just a theoretical distinction based on whether or not it is necessary to specify the response mechanism for likelihood inference about the model parameters. Ignorability has no substantive meaning like the log-linear models for nonresponse discussed in this section.
Therefore, one must be cautious when labeling a particular log-linear response model as ignorable or nonignorable. In the context of log-linear models for nonresponse, in my opinion, it makes more sense to use another type of classification of response mechanisms: the probability of responding on a particular variable either depends also on the variable concerned or depends only on other variables. In the former case, the response mechanism is always nonignorable. Baker and Laird (1988) gave a nice example of an application in which the probability of nonresponse is allowed to depend on the variable with nonresponse. In the latter case, the response mechanism can be either ignorable or nonignorable.
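Whether an observed pattern of nonresponse is nested (monotone) in the sense defined above can be checked mechanically. The following sketch assumes that the data are available as one row of observed/missing flags per person; the function name `is_monotone` and the example rows are hypothetical.

```python
def is_monotone(observed):
    """observed[i][v] is True if variable v was observed for person i."""
    n_vars = len(observed[0])
    missing_counts = [sum(not row[v] for row in observed) for v in range(n_vars)]
    # Order variables from least often missing to most often missing.
    order = sorted(range(n_vars), key=lambda v: missing_counts[v])
    for row in observed:
        seen_missing = False
        for v in order:
            if not row[v]:
                seen_missing = True
            elif seen_missing:
                # A variable that is missing more often is observed although
                # a variable that is missing less often is missing.
                return False
    return True

# Three panel waves; dropouts never return, so the pattern is nested.
panel_dropout = [
    [True, True, True],
    [True, True, False],
    [True, False, False],
]
# One person skips wave 2 but takes part in wave 3 again.
with_reentry = panel_dropout + [[True, False, True]]

print(is_monotone(panel_dropout))
print(is_monotone(with_reentry))
```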
3.4.4 Estimation
Fay (1986) proposed to estimate his causal models for patterns of nonresponse using the EM algorithm. However, he did not consider log-linear models with latent variables. Vermunt (1988, 1995) demonstrated how to adapt the E step of the EM algorithm proposed by Fay to situations in which latent variables are also included in the modified path model (see also Hagenaars, 1990). In fact, it is the same kind of solution as Hagenaars applied to generalize Fuchs' approach (Hagenaars, 1990). In the E step, the unobserved frequencies of the table DEFXGHABCRS are computed via

$$\begin{aligned}
\hat{n}_{defxghabc11} &= n_{defghabc}\,\hat{\pi}_{x|defghabc11},\\
\hat{n}_{defxghabc12} &= n_{defabc}\,\hat{\pi}_{xgh|defabc12},\\
\hat{n}_{defxghabc21} &= n_{efghabc}\,\hat{\pi}_{dx|efghabc21},\\
\hat{n}_{defxghabc22} &= n_{efabc}\,\hat{\pi}_{dxgh|efabc22}.
\end{aligned}$$
It can be seen that in every subgroup, or in every level of the joint response indicators, the expectation of the complete data given the observed data and the parameter estimates from the last iteration is obtained in a slightly different manner. The above-mentioned procedure is the standard way of handling partially observed data in the program ℓEM (Vermunt, 1993). So estimation of log-linear models for nonresponse, including models with latent variables, can easily be performed using ℓEM. After specifying which variables are manifest variables, latent variables and response indicators, all variables can be used in the same way when specifying the various submodels of a modified path model.
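The logic of this E step can be sketched for a much smaller problem: two manifest variables A and B, one subgroup with both observed and one subgroup with only A observed. This is a simplified analogue under stated assumptions, not the ℓEM implementation: there is no latent variable, the same probabilities are used in both subgroups, and the name `e_step` and all counts are hypothetical.

```python
def e_step(n_full, n_a_only, pi):
    """Expected complete-data counts per response subgroup.

    n_full[a][b] : observed counts for persons with A and B both observed
    n_a_only[a]  : observed counts for persons with only A observed
    pi[a][b]     : current estimate of the joint probability P(A=a, B=b)
    """
    expected_partial = []
    for a, row in enumerate(pi):
        row_total = sum(row)  # current estimate of P(A=a)
        # Spread the n_a_only[a] persons over B in proportion to P(B=b | A=a).
        expected_partial.append([n_a_only[a] * p / row_total for p in row])
    # Fully observed cases need no completion.
    expected_full = [[float(n) for n in row] for row in n_full]
    return expected_full, expected_partial

pi = [[0.4, 0.1], [0.2, 0.3]]      # hypothetical estimates from the last iteration
n_full = [[40, 10], [20, 30]]
n_a_only = [10, 20]

full, partial = e_step(n_full, n_a_only, pi)
print(partial)
```

As in the equations above, each subgroup is completed in its own way: observed margins are multiplied by the conditional probability of the missing variables given the observed ones.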
3.5 Application
In this section, an application of the models discussed in the previous sections will be presented. For that purpose, the already mentioned 'Mathijssen-Sonnemans data' will be used. The observed variables are again: three ability tests (A, B and C), father's educational level (D), father's occupation (E), sex (F), educational level (G) and occupation (H). All variables are dichotomized, except for father's occupation, which has the following 4 categories: employees (1), independents (2), workers (3) and others (4). The dichotomous variables all have the categories low (1) and high (2), except for the variable sex (F), which has the categories male (1) and female (2). The reason why most variables are dichotomized is that when preparing the frequency tables, the models had to be estimated by means of LCAG (Vermunt, 1988). Even then, days of computer time were needed to estimate the models presented in this section. Using ℓEM 0.11, estimation of each of the models to be presented below took less than two minutes. First, I will present the analysis performed using only complete cases. Then, incomplete cases will be used in the analysis by means of Fuchs' (1982) procedure, that is, assuming MCAR nonresponse. And finally, Fay's models for nonresponse will be used to specify and test different kinds of models for the response mechanism.
3.5.1 Using Only Complete Cases
The model which serves as starting point is the modified LISREL model given in Equation (3.5), where the aim is, of course, to restrict the marginal and conditional probabilities using a logit parameterization. As recommended by Hagenaars (1993), when specifying a modified LISREL model, one can best start by investigating the measurement part of the model, or in other words, the relationships among the latent variables and their indicators. In this case, a latent class model has to be specified for the relationships among the latent variable ability X and the three ability tests A, B, and C. The latent variable ability X is assumed to have two categories.
Table 3.1: Test results for some models using only complete cases

                                        Model fit              Conditional tests
Model                                   L²      df    p        Models     L²     df   p
1  {DEFXGH, XA, XB, XC}                 363.6   378   .70
2  (1) + {DE, DF}                       374.2   385   .64      (2)-(1)    10.6    7   .18
3  (1) + {DX, EFX}                      366.6   385   .74      (3)-(1)     3.0    7   .89
4  (1) + {DG, EG, XG}                   388.5   404   .70      (4)-(1)    24.9   26   .52
5  (1) + {EH, FH, XH, GH}               417.7   435   .72      (5)-(1)    54.1   57   .58
6  (1) + (2) + (3) + (4) + (5)          458.8   474   .68      (6)-(1)    95.2   96   .50
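The conditional tests in Table 3.1 compare a restricted model with a more general one by taking the difference in L² and in degrees of freedom and referring it to a chi-square distribution. The sketch below reproduces this arithmetic for Model 3 against Model 1. The helper names are hypothetical, and the chi-square tail probability is computed with a simple series expansion from the standard library rather than with the software used in the chapter.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (series for the incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def conditional_test(l2_restricted, df_restricted, l2_general, df_general):
    """Likelihood-ratio difference test of a restricted model against a general one."""
    delta_l2 = l2_restricted - l2_general
    delta_df = df_restricted - df_general
    return delta_l2, delta_df, chi2_sf(delta_l2, delta_df)

# Model 3 against Model 1, values taken from Table 3.1.
delta_l2, delta_df, p = conditional_test(366.6, 385, 363.6, 378)
print(round(delta_l2, 1), delta_df, round(p, 2))
```

A large p-value means that the restricted model does not fit significantly worse than the general one.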
To test the fit of the measurement model, one can start by assuming the relationships among the other variables to be saturated. So no restrictions need to be imposed yet on the relationships among the structural variables D, E, F, X, G and H. The simplest way to accomplish this is by specifying log-linear model {DEFXGH, XA, XB, XC} for the complete table DEFXGHABC. Note that, given X, the variables A, B and C are not only assumed to be mutually independent, but also to be independent of the joint variable DEFGH. As can be seen from the test results given in Table 3.1, the measurement model (Model 1) fits very well (L² = 363.6, df = 378, p = .70). Next, an attempt was made to find a more parsimonious specification for the structural relationships among the variables D, E, F, X, G and H. For that purpose, a step-wise model selection procedure was used per subtable, leaving the other submodels saturated. By testing these models against each other and against Model 1, it could be seen whether or not the more parsimonious specification for the marginal or conditional probability concerned deteriorated the fit. Here, only the test results for the best fitting subtable-specific models will be presented.

For the relationships among the exogenous variables father's education (D), father's occupation (E) and sex (F), model {DE, DF} (Model 2) provides a good description of the data. Model 2 does not fit worse than Model 1 (L² = 10.6, df = 7, p = .18). Since it is not plausible that in the population sex is related to the father's educational level, the significance of the effect DF is almost certainly the result of the selectivity of the nonresponse. The latent variable ability (X) was found to depend on D, E, F and on the interaction of E and F. Restricting $\pi_{x|def}$ via log-linear model {DEF, EFX, DX} for marginal table DEFX (Model 3) does not deteriorate the fit compared to Model 1 (L² = 3.0, df = 7, p = .89). A good fitting parsimonious model for $\pi_{g|defx}$ (educational level) is obtained via model {DEFX, DG, EG, XG} for subtable DEFXG. So G depends on D, E and X. The conditional test of this model (Model 4) against Model 1 gives a nonsignificant result (L² = 24.9, df = 26, p = .52). Variable H (occupational level) seemed to depend only on E, F, X and G. Model {DEFXG, EH, FH, XH, GH} for table DEFXGH (Model 5) does not fit worse than Model 1 (L² = 54.1, df = 57, p = .58). The final model in which all best fitting submodels were combined (Model 6) fits the data very well (L² = 458.8, df = 474, p = .68). Moreover, like all submodels presented above, it does not fit significantly worse than Model 1 (L² = 95.2, df = 96, p = .50). The final model, which is depicted in Figure 3.1, is very parsimonious. It contains only 34 independent log-linear parameters.

Table 3.2 presents the estimates for the two-variable and three-variable interaction terms according to Model 6. The log-linear parameters for the measurement model show that the indicators are strongly related to the latent variable X: $u^{XA}_{11} = .9225$, $u^{XB}_{11} = .5307$ and $u^{XC}_{11} = .6829$. The parameters for the relationships among the exogenous variables indicate that children of employees have more highly educated fathers than other children ($u^{DE}_{11} = -.8441$). The value .1233 for $u^{DF}_{11}$ indicates that males have lower educated fathers than females. It can be expected that this artificial effect disappears when the partially observed data are used in the analysis. Children of lower educated fathers have a much lower school ability than children of higher educated fathers ($u^{DX}_{11} = .4125$), males have a lower school ability than females ($u^{FX}_{11} = .1089$), and children of employees have a much higher school ability than other children ($u^{EX}_{11} = -.4970$). Moreover, the three-variable parameters show that the relationship between father's occupation and school ability is stronger for males than for females. The educational level of the father has a positive effect on the educational level of the respondent ($u^{DG}_{11} = .1848$). Moreover, children of employees are relatively highly educated ($u^{EG}_{11} = -.3988$) and children of workers are relatively low educated ($u^{EG}_{31} = .3893$). School ability has a strong positive effect on the final educational level ($u^{XG}_{11} = .3724$). The most important determinant of the occupational status of the respondent is the respondent's own educational level ($u^{GH}_{11} = .5426$). Moreover, holding other factors constant, males more often have an occupation with a high status than females ($u^{FH}_{11} = -.3387$). Also, there exists a small positive effect of X on H ($u^{XH}_{11} = .1316$). And finally, children of workers have a higher probability of having an occupation with a low status ($u^{EH}_{31} = .1605$) than others.
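To get a feeling for the size of these effects, a two-variable log-linear parameter can be translated into an odds ratio. The sketch below assumes effect coding (the u-terms sum to zero over each subscript), which is an assumption made here and not stated in the passage. Under that coding, for two dichotomous variables, $u_{11} = u_{22} = -u_{12} = -u_{21}$, so the log odds ratio equals $4u_{11}$. The function name is hypothetical.

```python
import math

def odds_ratio_from_effect(u11):
    """Odds ratio in a 2 x 2 table implied by an effect-coded two-variable term.

    log(OR) = u11 + u22 - u12 - u21 = 4 * u11 when the terms sum to zero
    over each subscript (assumed coding).
    """
    return math.exp(4.0 * u11)

# Effect of own educational level (G) on occupational status (H), Model 6.
or_gh = odds_ratio_from_effect(0.5426)
# Effect of ability (X) on occupational status (H), Model 6.
or_xh = odds_ratio_from_effect(0.1316)

print(round(or_gh, 2), round(or_xh, 2))
```

Read this way, the GH parameter corresponds to odds of a low-status occupation that are several times higher for respondents with a low educational level, whereas the XH parameter implies a much weaker association.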
Table 3.2: Log-linear effects among the survey variables according to Model 6 and Model 11a

Parameter          Model 6    Model 11a
u^{DE}_{11}        -0.8441    -0.8461
u^{DE}_{12}         0.3724     0.2908
u^{DE}_{13}         0.5061     0.4940
u^{DF}_{11}         0.1233
u^{DX}_{11}         0.4125     0.3047
u^{EX}_{11}        -0.4970    -0.5370
u^{EX}_{21}         0.2137     0.1041
u^{EX}_{31}         0.2708     0.1478
u^{FX}_{11}         0.1089    -0.0680
u^{EFX}_{111}      -0.2039
u^{EFX}_{211}       0.2346
u^{EFX}_{311}       0.2944
u^{DG}_{11}         0.1848     0.2062
u^{EG}_{11}        -0.3988    -0.3587
u^{EG}_{21}        -0.0576    -0.0813
u^{EG}_{31}         0.3893     0.3929
u^{XG}_{11}         0.3724     0.4216
u^{EH}_{11}        -0.1846    -0.2460
u^{EH}_{21}        -0.1045    -0.1166
u^{EH}_{31}         0.1605     0.1541
u^{FH}_{11}        -0.3387    -0.3353
u^{XH}_{11}         0.1316     0.1736
u^{GH}_{11}         0.5426     0.5484
u^{EXH}_{111}                 -0.0837
u^{EXH}_{211}                 -0.1204
u^{EXH}_{311}                  0.0311
u^{XA}_{11}         0.9225     1.0489
u^{XB}_{11}         0.5307     0.5289
u^{XC}_{11}         0.6829     0.6975
3.5.2 Fuchs' Procedure for Nonresponse
Fuchs' procedure to handle partially observed tables (Fuchs, 1982) can be implemented in LEM by specifying a model for the response mechanism which is equivalent to the MCAR assumption. In other words, the probability of R = r and S = s is assumed to be independent of the other variables included in the model. Table 3.3 presents the test results for some models which were estimated using all the available data. The best way to start the analysis when some data are missing is to test the MCAR assumption itself. This can be accomplished by specifying model {DEFGHABC, RS} for the table DEFGHABCRS (Model 7). By assuming a saturated model for the relationships among the variables and, moreover, R and S to be independent of the other variables, one has a direct test for the MCAR assumption. As could be expected on the basis of the available information on the response mechanism, this response model must be rejected ($L^2 = 2419.6$, $df = 445$, $p = .00$). Nevertheless, more parsimonious models for the relationships among the research variables may be specified, that is, for the structural model and the measurement model. The fit of such models can be tested by comparing them with Model 7. Because any ignorable response model gives the same parameter estimates for the structural and the measurement model, a conditional test against Model 7 is a test of the model concerned, given that the response mechanism is MAR (Fuchs, 1982).
Table 3.3: Test results for some models using Fuchs' approach

                      Model fit                Conditional tests
Model                 L²       df    p         models       L²      df    p
7   MCAR              2419.6   445   .00
8   (1) + MCAR        2858.2   823   .00       (8)-(7)      438.6   378   .02
9   (6) + MCAR        2950.9   919   .00       (9)-(8)       92.7    96   .58
9a  (9) - DF          2951.7   920   .00       (9a)-(9)       0.8     1   .37
9b  (9a) - EFX        2957.8   923   .00       (9b)-(9a)      6.9     3   .08
9c  (9b) + EXH        2949.8   920   .00       (9b)-(9c)      8.0     3   .05
                                               (9c)-(7)     530.2   475   .04
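The conditional tests in Table 3.3 are differences in $L^2$ between nested models, referred to a chi-square distribution with the difference in degrees of freedom. The following Python sketch, which is not part of the original analysis, recomputes such p-values from tabulated $L^2$ and $df$ values. The function name `chi2_sf` and the numerical method (incomplete-gamma series and continued fraction) are choices made for this illustration only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower regularized incomplete gamma function P(a, z).
        n = a
        term = 1.0 / a
        total = term
        for _ in range(10000):
            n += 1.0
            term *= z / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper function Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


# (label, L2 of restricted model, df, L2 of reference model, df), values taken from Table 3.3
conditional_tests = [
    ("(8)-(7)", 2858.2, 823, 2419.6, 445),
    ("(9)-(8)", 2950.9, 919, 2858.2, 823),
    ("(9c)-(7)", 2949.8, 920, 2419.6, 445),
]

for label, l2_restricted, df_restricted, l2_reference, df_reference in conditional_tests:
    delta_l2 = l2_restricted - l2_reference
    delta_df = df_restricted - df_reference
    print(label, round(delta_l2, 1), delta_df, round(chi2_sf(delta_l2, delta_df), 3))
```

For the test (8)-(7), the difference is 438.6 with 378 degrees of freedom, which gives a p-value of roughly .02.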
Model 8 is equivalent to Model 1, that is, it can be used to test the measurement part of the model separately. On the basis of the conditional test against Model 7, it must be concluded that the measurement model fits less well when all available data are used ($L^2 = 438.6$, $df = 378$, $p = .02$). This may be the result of the increased power of the significance test. However, it may also be an indication that the measurement model is only appropriate for the selective group without missing data. Although the fit of Model 8 can be improved by supplying it with some additional parameters, here the measurement model will be left just as it is. In other words, priority will be given to the principle of parsimony. Like Model 1, Model 8 will be used as a reference point for the more restricted structural models.

Model 9, which is equivalent to the final model for the complete data (Model 6), does not fit worse than Model 8 ($L^2 = 92.7$, $df = 96$, $p = .58$). In Model 9a, the effect DF was excluded from the model to test whether the significance of the effect DF was caused by the selectivity of the nonresponse. Since leaving out this effect does not lead to a worse fit ($L^2 = 0.8$, $df = 1$, $p = .37$), it can be concluded that the significance of the effect DF resulted from analyzing complete cases only. Starting with Model 9a, an attempt was made to find more parsimonious models for the other submodels as well. The only effect which was not significant anymore was the three-variable interaction among E, F and X. In Model 9b, this effect is set equal to zero ($L^2 = 6.9$, $df = 3$, $p = .08$). It is plausible that this effect is also caused by the nonresponse because particular combinations of X, E and F have higher probabilities of responding on D.

Because of the greater power of the significance tests when using all available data, particular effects which were not significant when using only the complete data can become significant now. This was checked for effects with significance levels between 5 and 10 percent in the analysis performed using only complete cases. There is only one effect which becomes significant now: the interaction between E and X with respect to their effect on H. Model 9c containing the effect EXH fits better than Model 9b ($L^2 = 8.0$, $df = 3$, $p < .05$).

3.5.3 Models for Patterns of Nonresponse
When modeling the response mechanism, Model 9c will be taken as starting point. An attempt will be made to make this model better fitting by adding parameters describing the response mechanism. From Model 7, it is known that at most 2419.6 in $L^2$ can be gained by using all 445 degrees of freedom for the specification of the response model. In that case, the $L^2$-value of the model would be 530.2 with $df = 475$ (see (9c)-(7) in Table 3.3), which is the test result for Model 9c in case the nonresponse is MAR. The MAR model can be seen as a saturated ignorable response model. Of course, here we are interested in much more parsimonious specifications of the response mechanism.
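As a worked check, the value quoted for Model 9c under MAR follows from the model fit columns of Table 3.3 by subtraction (only values reported there are used):

```latex
L^2_{(9c)} - L^2_{(7)} = 2949.8 - 2419.6 = 530.2,
\qquad
df_{(9c)} - df_{(7)} = 920 - 445 = 475 .
```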
Table 3.4: Test results for some models for nonresponse

                                            Model fit               Conditional tests
Model                                       L²       df    p        models        L²       df    p
10   (9c) + {EFXR} + {DEFXS, RS}            1032.0   874   .00      (9c)-(10)     1917.8    46   .00
10a  (9c) + {EFABCR} + {DEFABCS, RS}         735.4   730   .44      (9c)-(10a)    2214.4   190   .00
11   (9c) + {EFR, XR} + {RS, FS}            1150.5   911   .00      (9c)-(11)     1799.3     9   .00
                                                                    (11)-(10)      118.5    37   .00
11a  (11) + FXR + XS + FXS                  1097.5   908   .00      (11)-(11a)      53.0     3   .00
                                                                    (11a)-(10)      65.5    34   .00
12   (11a) + DR                             1097.3   907   .00      (11a)-(12)       0.2     1   .65
13   (11a) + GS + HS                        1093.5   906   .00      (11a)-(13)       4.0     2   .14
First, log-linear models {EFXR} and {DEFXS, RS} were specified for the probability that R = r and S = s respectively (Model 10). The other part of the model is equivalent to Model 9c. Actually, Model 10 is the most extended plausible model in which the response indicators are not influenced by the variables whose response probability they indicate. In Model 10, R is assumed to depend on E, F and X, including all their interaction terms. Of course, it is not possible that R depends on the variables G and H, which are measured many years later. Furthermore, S is assumed to depend on all variables, except for G and H, the variables whose missingness it indicates. The effect RS is included in the model to fix the sample sizes of the four subgroups.

Comparison of Model 10 with Model 9c shows that most of the information on the response mechanism is captured by Model 10 (see Table 3.4): the $L^2$-value improves by 1917.8, using only 46 degrees of freedom. However, there are still 399 degrees of freedom left to gain 501.8 in $L^2$. The reason why Model 10 does not fit as well as could be expected is that R and S are regressed on the latent variable X instead of the indicators A, B and C. This can be seen from the fact that Model 10a, where X is replaced by A, B and C, does not fit significantly worse than the saturated MAR model ($L^2 = 735.4 - 530.2 = 205.2$, $df = 730 - 475 = 245$, $p = .97$). Apparently, the assumption that A, B, C and R are mutually independent given X is a bit too strong. This is the same kind of problem that was encountered when testing the measurement model (Model 8). On the basis of the same arguments that we used above, we will continue assuming that, given X, the indicators are conditionally independent of all other variables, including the response indicators. Therefore, X will be used as the regressor when specifying the response mechanism and not the indicators A, B and C.
Model 11 is the response model of Equations (3.7), the model that was formulated on the basis of the a priori knowledge about the response mechanism. Using only 9 additional parameters compared to Model 9c, Model 11 captures a very large part of the mechanism causing the nonresponse: $L^2 = 1799.3$. So the a priori information on the mechanism causing the nonresponse is confirmed by the analysis. Omitting any of the parameters describing the response mechanism of Model 11 deteriorates the fit a lot. However, in terms of fit, Model 11 is inferior to Model 10 ($L^2 = 118.5$, $df = 37$, $p = .00$). Therefore, it was attempted to improve the fit of Model 11 by adding some extra parameters. This resulted in Model 11a, which contains 3 additional parameters, namely: the interaction of F and X with respect to their effect on R, the effect of X on S and the interaction effect of F and X on S. Although Model 11a still differs significantly from Model 10 ($L^2 = 65.5$, $df = 34$, $p = .00$), no other single parameter could considerably improve the fit anymore.

The parameter estimates for Model 6 and Model 11a are given in Table 3.2. It can be seen that, apart from the fact that particular parameters are not significant anymore and that one effect becomes significant, the parameter estimates for the relationships among the research variables do not change very much when partially observed cases are used in the analysis. Perhaps the most interesting difference between the two models is that according to Model 6, males have a slightly lower school ability than females ($u^{FX}_{11} = .1089$), while according to Model 11a, males have a slightly higher school ability than females ($u^{FX}_{11} = -.0680$).
Table 3.5: Log-linear effects for the response model according to Model 11a

Parameter          Model 11a
$u^{ER}_{11}$       0.0929
$u^{ER}_{21}$       0.2356
$u^{ER}_{31}$       0.3747
$u^{FR}_{11}$       0.2037
$u^{XR}_{11}$      -0.7070
$u^{EFR}_{111}$    -0.0968
$u^{EFR}_{211}$     0.1427
$u^{EFR}_{311}$     0.1133
$u^{FXR}_{111}$     0.1814
$u^{FS}_{11}$       0.2182
$u^{XS}_{11}$      -0.0672
$u^{RS}_{11}$       0.1424
$u^{FXS}_{111}$     0.0603
Table 3.5 contains the parameter estimates for the response model according to Model 11a. The parameter estimates show that school ability very strongly determines the probability of observing D: $u^{XR}_{11} = -.7070$. Moreover, children of independents and workers have higher probabilities of D being observed: $u^{ER}_{21} = .2356$ and $u^{ER}_{31} = .3747$. The three-variable interaction term shows that this effect is stronger for males than for females. And, for males D is observed more often than for females ($u^{FR}_{11} = .2037$). The parameter $u^{FS}_{11} = .2182$ shows that for males there is a higher probability of observing G and H than for females. Moreover, females with a low school ability have a lower probability of responding on G and H ($u^{XS}_{11} - u^{FXS}_{111} = -.0672 - .0603 = -.1275$), while males with a high school ability have a higher probability of responding on G and H.

Finally, two models were estimated in which a direct effect between the variables with missing values and their response indicators was included. Model 12 is equivalent to Model 11a, except that it contains a direct effect of D on R. This effect is clearly not significant. The same applies to Model 13: neither the effect of G on S nor the effect of H on S is significant.

Summarizing, for this application, the modified LISREL model extended with Fay's approach to partially observed data gave a very parsimonious description of both the causal relationships among the research variables and the response mechanism. The 960 observed frequencies were described using only 52 parameters: 35 for the causal log-linear model and 17 for the response model. Using partially observed data made it possible to detect some artificial effects. Moreover, because of the greater power of the tests, one effect became significant. The models for nonresponse confirmed the a priori knowledge about the response mechanism. However, the overall fit of the final model is not perfect. This is caused by the fact that the measurement model does not fit very well when all available data are used. By means of a more elaborate analysis of this data set, it would, of course, be possible to find the additional parameters to get an even better description of the observed data.
3.6 Discussion
A general approach for specifying and testing causal models for categorical variables was presented. It can be used to specify log-linear models with observed, partially observed and unobserved variables. The general log-linear model combines a structural model in which the causal relations among the structural variables are specified, a measurement model in which the relations between the latent variables and their indicators are specified, and a model for the response mechanism. The application demonstrated the value of this model. The relationships among the research variables could be described using a small number of log-linear parameters. When using the partially observed data, particular effects became significant as a result of the increased power of the statistical tests, and artificial effects were discovered which resulted from selective nonresponse. Although not demonstrated by the example, when the probability of nonresponse on a particular variable depends strongly on the value of the variable concerned, the parameter estimates of the model for the survey variables may change a lot as well (Baker and Laird, 1988).

In the application, only hierarchical log-linear models were used. However, it is also possible to impose all kinds of linear restrictions on the log-linear parameters, that is, to specify non-hierarchical log-linear models per subtable. This becomes more important when the model contains several polytomous variables with ordered or equidistant categories. The possibility to restrict the log-linear parameters can, for instance, be used to specify discrete approximations of latent trait models (Heinen, 1992; Vermunt and Georg, 1995). Because of the equivalence of the more general log-linear or logit model to the logistic regression model, it can also be used to incorporate continuous exogenous variables into a modified LISREL model (Vermunt, 1995).

The way of specifying the causal order among the structural variables in modified LISREL models is similar to the specification of (latent) Markov models (Van de Pol and Langeheine, 1990). Actually, the (latent) Markov model is a special case of the modified LISREL model. The approach presented here is more general since the more flexible way of specifying the conditional probability structure makes it possible to relax the basic assumptions of the (latent) Markov model. The modified LISREL model can, for instance, be used to specify multivariate Markov models (Vermunt, 1995) and latent Markov models with correlated errors (Bassi, Croon, Hagenaars and Vermunt, 1995). Furthermore, the possibility to parameterize the conditional probabilities by means of a logit model can, among other things, be used to specify regression models for the transition probabilities which are similar to discrete-time event history models (Vermunt, 1995).
Although in the example the latent class model was used as a measurement model, it can also be used for unmixing populations having different structural parameters (Titterington, Smith and Makov, 1985). Examples of the use of the latent class model as a finite mixture or discrete compound model are the mixed logit model (Formann, 1992), the mixed Markov model (Van de Pol and Langeheine, 1990), and the mixed Rasch model (Rost, 1990). Specifying these kinds of mixed models within the modified LISREL approach involves including a latent variable without indicators into the model. And finally, the causal log-linear models that were presented are recursive models, that is, models in which the causal relationships among the variables are uni-directional. Recently Mare and Winship (1991) proposed 'non-recursive' log-linear models with reciprocal effects. Although not demonstrated here, the rather complicated 'non-recursive' log-linear models proposed by Mare and Winship can also be handled within the modified LISREL approach.
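The latent class model referred to above, whether used as a measurement model or as a finite mixture, rests on the assumption that the indicators are mutually independent given the latent variable. The following Python sketch illustrates this structure for three dichotomous indicators A, B and C and a latent variable X with two classes. All probabilities are hypothetical values chosen for the illustration; they are not estimates from the application, and the function names are not taken from any program mentioned in the text.

```python
from itertools import product

# Hypothetical latent class proportions P(X = x) for two classes.
class_probs = [0.6, 0.4]

# Hypothetical conditional probabilities P(indicator = 1 | X = x).
item_probs = {"A": [0.9, 0.2], "B": [0.8, 0.3], "C": [0.7, 0.1]}


def joint(pattern, x):
    """P(A = a, B = b, C = c, X = x) under local independence."""
    p = class_probs[x]
    for name, value in zip("ABC", pattern):
        p_one = item_probs[name][x]
        p *= p_one if value == 1 else 1.0 - p_one
    return p


def manifest(pattern):
    """P(A = a, B = b, C = c): a mixture over the latent classes."""
    return sum(joint(pattern, x) for x in range(len(class_probs)))


def posterior(pattern):
    """P(X = x | A = a, B = b, C = c) for every latent class x."""
    total = manifest(pattern)
    return [joint(pattern, x) / total for x in range(len(class_probs))]


patterns = list(product([0, 1], repeat=3))
table = {pattern: manifest(pattern) for pattern in patterns}

for pattern in patterns:
    print(pattern, round(table[pattern], 4), [round(p, 3) for p in posterior(pattern)])
```

The manifest probabilities sum to one over the eight response patterns, and the posterior probabilities show how strongly each pattern points to one of the two classes.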
References
Agresti, A. (1990): Categorical data analysis. New York: Wiley.
Baker, S. G.; Laird, N. M. (1988): Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association, 83, 62-69.
Bassi, F.; Croon, M.; Hagenaars, J.; Vermunt, J. (1995): Estimating latent turn-over tables when the data are affected by classification errors. Statistica Neerlandica.
Clogg, C. C. (1981): New developments in latent structure analysis. In: D. J. Jackson; E. F. Borgatta (Eds.): Factor analysis and measurement in sociological research. Beverly Hills: Sage Publications, 215-246.
Clogg, C. C. (1982): Some models for the analysis of association in multiway cross-classifications having ordered categories. Journal of the American Statistical Association, 77, 803-815.
Clogg, C. C.; Goodman, L. A. (1984): Latent structure analysis of a set of multidimensional contingency tables. Journal of the American Statistical Association, 79, 762-771.
Dempster, A. P.; Laird, N. M.; Rubin, D. B. (1977): Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Ser. B, 39, 1-38.
Fay, R. E. (1986): Causal models for patterns of nonresponse. Journal of the American Statistical Association, 81, 354-365.
Formann, A. K. (1992): Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476-486.
Fuchs, C. (1982): Maximum likelihood estimation and model selection in contingency tables with missing data. Journal of the American Statistical Association, 77, 270-278.
Goodman, L. A. (1972): A modified multiple regression approach for the analysis of dichotomous variables. American Sociological Review, 37, 28-46.
Goodman, L. A. (1973): The analysis of multidimensional contingency tables when some variables are posterior to others: a modified path analysis approach. Biometrika, 60, 179-192.
Goodman, L. A. (1974): Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215-231.
Goodman, L. A. (1979): Simple models for the analysis of association in cross-classifications having ordered categories. Journal of the American Statistical Association, 74, 537-552.
Haberman, S. J. (1979): Analysis of qualitative data, Vol 2, New developments. New York: Academic Press.
Hagenaars, J. A. (1990): Categorical longitudinal data - loglinear analysis of panel, trend and cohort data. Newbury Park: Sage.
Hagenaars, J. A. (1993): Loglinear models with latent variables. Newbury Park, CA: Sage.
Hagenaars, J.; Luijkx, R. (1990): LCAG: A program to estimate latent class models and other loglinear models with latent variables with and without missing data. Working Paper 17, Tilburg University, Department of Sociology, Tilburg.
Hartog, J. (1986): Survey nonresponse in relation to ability and family background: structure and effects on estimated earning functions. Research Memorandum no. 8620, University of Amsterdam.
Heinen, A. (1993): Discrete latent variables models. Tilburg: Tilburg University Press.
Jöreskog, K. G.; Sörbom, D. (1988): Lisrel 7: a guide to the program and applications.
Lazarsfeld, P. F.; Henry, N. W. (1968): Latent structure analysis. Boston: Houghton Mifflin.
Little, R. J. (1982): Models for nonresponse in sample surveys. Journal of the American Statistical Association, 77, 237-250.
Little, R. J.; Rubin, D. B. (1987): Statistical analysis with missing data. New York: Wiley.
Mare, R. D.; Winship, C. (1991): Loglinear models for reciprocal and other simultaneous effects. In: C. C. Clogg (Ed.): Sociological Methodology 1991. Oxford: Basil Blackwell, 199-234.
McCutcheon, A. L. (1988): Sexual morality, pro-life values and attitudes toward abortion. Sociological Methods and Research, 16, 256-275.
Meng, X. L.; Rubin, D. B. (1993): Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80, 267-278.
Van de Pol, F.; Langeheine, R. (1990): Mixed Markov latent class models. In: C. C. Clogg (Ed.): Sociological Methodology 1990. Oxford: Basil Blackwell.
Rost, J. (1990): Rasch models in latent classes: an integration of two approaches to item analysis. Journal of Applied Psychological Measurement, 14, 271-282.
Rubin, D. B. (1976): Inference and missing data. Biometrika, 63, 581-592.
Titterington, D. M.; Smith, A. F.; Makov, U. E. (1985): Statistical analysis of finite mixture distributions. Chichester: John Wiley & Sons.
Vermunt, J. K. (1988): Loglineaire modellen met latente variabelen en missing data. Tilburg: Thesis.
Vermunt, J. K. (1993): LEM: log-linear and event history analysis with missing data using the EM algorithm. WORC PAPER 93.09.015/7, Tilburg University.
Vermunt, J. K. (1995): Log-linear event history analysis; a general approach with missing data, latent variables, and unobserved heterogeneity. Tilburg: Tilburg University Press.
Vermunt, J. K.; Georg, W. (1995): Die Analyse kategorialer Panel-Daten mit Hilfe von log-linearen Kausalmodellen: eine Anwendung am Beispiel der Skala 'Jugendzentrismus' [Analyzing categorical panel data by means of causal log-linear models with latent variables: An application to the change in youth-centrism]. ZA-Information, 36, 61-90.
Xie, Y. (1992): The log-multiplicative layer effects model for comparing mobility tables. American Sociological Review, 57, 380-395.
4 The Analysis of Panel Data with Nonmetric Variables: Probit Models and a Heckman Correction for Selectivity Bias¹
Gerhard Arminger
4.1 Introduction
Panel data are observations from a random sample of $n$ elements from a population collected over a fixed number $T$ of time points. Usually, $n$ is fairly large (for instance approximately 5000 households in the German Socio-Economic Panel (GSOEP, Wagner, Schupp and Rendtel, 1991) or approximately 4000 firms in the 1993 panel of the German bureau of labor) and $T$ is fairly small ($T = 10$ panel waves in the GSOEP). Usually, the time points $t = 1, \ldots, T$ are equidistant. In psychology, panel data are often referred to as repeated measurements; in epidemiology they are referred to as cohort data. In analyzing panel data using regression models, one often finds that the dependent variables of interest are non-metric, that is, dichotomous, censored metric, or ordered or unordered categorical. Typical examples are found in the following areas:
• The analysis of individual behavior in the labor market. Flaig, Licht and Steiner (1993) model and test hypotheses on whether a person is unemployed or not at time $t$ depending on variables such as unemployment at $t - 1$, length of former unemployment, education, professional experience and other personal characteristics using data from the GSOEP. Here, the dependent variable is dichotomous with the categories "employed" vs. "unemployed".
• The analysis of the behaviour of individual firms. Arminger and Ronning (1991) analyze the simultaneous dependence of changes in output, prices and stock using data from the German business test conducted quarterly by the IFO Institute in Munich. Here, the dependent variables change in output (less, equal, more than at time $t - 1$) and change in price are ordered trichotomous and the variable stock is metric since it is measured in production months. The special problem in specifying a model for these data is that trichotomous variables appear on the left and on the right side of regression equations.
¹ I am grateful to the IFO Institute Munich for providing the business test data and to Prof. G. Ronning of the University of Tübingen and to R. Jung of the University of Konstanz for preparing the data.
• The analysis of ordered categorical data from the social sciences. Social science data often come in the form of ratings measured on a Likert scale with five or seven ordered categorical outcomes such as "1 = I agree very much with statement A" to "5 = I don't agree at all with statement A". Questionnaire items are usually ordered categorical and often a great number of variables must be reduced to a smaller set of variables using factor analytic models for ordered categorical data. Ordered categorical variables may again appear on the left and the right side of regression equations.

In the following section the construction principles of Heckman (1981a, 1981b) on the specification and estimation of dichotomous outcomes in panel data are extended to include ordered categorical and censored dependent variables as well as simultaneous equation systems of non-metric dependent variables. The parameters of many of these models can be estimated by assuming multivariate normality of the error terms in threshold models and using conditional polychoric and polyserial covariance coefficients in the framework of mean- and covariance structure models for non-metric dependent variables. These models and estimation techniques have been introduced by Muthen (1984) and extended by Küsters (1987) and Schepers and Arminger (1992). As an example, the trichotomous output variable of a four wave panel of 656 firms from the German business test conducted by the IFO Institute, Munich, is analyzed.

4.2 Model Specification
4.2.1 Heckman's Model for Dichotomous Variables
Heckman (1981a, chap. 3.3) considers the following model for an unobserved variable $y^*_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, where $i$ denotes the individual and $t$ denotes a sequence of equispaced time points:

$$y^*_{it} = \mu_{it} + \epsilon^*_{it} \qquad (4.1)$$

$$\mu_{it} = x_{it}\beta + \sum_{j=1}^{\infty} \gamma_{t-j,t}\, y_{i,t-j} + \sum_{j=1}^{\infty} \lambda_{j,t-j} \prod_{l=1}^{j} y_{i,t-l} + \sum_{k=1}^{K} \delta_k\, y^*_{i,t-k} \qquad (4.2)$$

$$\epsilon^*_i = (\epsilon^*_{i1}, \ldots, \epsilon^*_{iT})' \qquad (4.3)$$

$$y_{it} = \begin{cases} 1 & \text{if } y^*_{it} > 0 \\ 0 & \text{if } y^*_{it} \leq 0 \end{cases} \qquad (4.4)$$

The unobserved variable $y^*_{it}$ is considered as a disposition or utility which is connected to the observed dependent variable $y_{it}$ through a dichotomous threshold model with threshold 0. The values $y_{it}$, $y^*_{it}$ and the $1 \times P$ vectors $x_{it}$ are collected in $T \times 1$ vectors $y_i$, $y^*_i$ and the $T \times P$ matrix $X_i$. The random variables $\{y_i, X_i\}$ are identically and independently distributed, which corresponds to a simple random sample from a population. The model specification consists in determining the structure of the systematic part $\mu_{it}$ and of the stochastic part of the model. A discussion of the different parts of the model specification for $y^*_{it}$ is found in Heckman (1981a) and in Hamerle and Ronning (1995). Here, only the most important elements of the specification are presented.

The first component of $\mu_{it}$ is the variation induced by possibly time varying explanatory variables $x_{it}$. The $P \times 1$ parameter vector $\beta$ is time constant in this model, which may be changed to $P \times 1$ parameter vectors $\beta_t$, $t = 1, \ldots, T$, that vary over time.

The second component of $\mu_{it}$ captures the influence of former states of the observed dependent variable $y_{i,t-j}$, $j \geq 1$, which is called true state dependence. If $\gamma_{t-1,t} = \gamma_1 \neq 0$ and $\gamma_{t-j,t} = 0$ for all $j > 1$ and $t = 1, \ldots, T$, we have a simple Markov model. Note that the inclusion of former states requires the knowledge of initial states $y_{i0}, y_{i,-1}, \ldots$ depending on the specification of the parameters $\gamma_{t-j,t}$. If the initial states are known and non-stochastic they can be included in the vectors $x_{it}$ as additional explanatory variables. If the initial states are themselves outcomes of the process that generates $y^*_{it}$, the distribution of the initial states must be taken into account as discussed by Heckman (1981b) and in Section 4.3 of this paper. Note that the effects of the former states $y_{i,t-j}$ may change for each time point. This is captured by $\gamma_{t-j,t}$. In most applications, $\gamma_{t-j,t}$ is set to $\gamma_{t-j}$ and almost all parameters are set to 0.

The third component of $\mu_{it}$ models the dependence of $y^*_{it}$ on the duration of the state $y_{it} = 1$. Again, the effects of duration may be different for each time length. These different effects are parameterized in $\lambda_{j,t-j}$. If $\lambda_{j,t-j} = \lambda$ we have the simple case of a linear duration effect.

The fourth component of $\mu_{it}$ models the dependence of $y^*_{it}$ on former values $y^*_{i,t-j}$ of the unobserved endogenous variables. Structures that incorporate this component are called models with habit formation or habit persistence. The idea behind such models is that $y^*_{it}$ is not dependent on the actual former state of the observed variable but on the former disposition or habit, identified with $y^*_{i,t-j}$ rather than $y_{i,t-j}$. If the initial dispositions $y^*_{i0}, y^*_{i,-1}, \ldots, y^*_{i,-(K-1)}$ are known and non-stochastic they may be included in the list of explanatory variables. Otherwise, assumptions about the distribution of the initial variables $y^*_{i0}, y^*_{i,-1}, \ldots, y^*_{i,-(K-1)}$ must be made.

Now we turn to the specification of the error term $\epsilon^*_{it}$. The error term is often decomposed in the form
$$\epsilon^*_{it} = \alpha_i + \epsilon_{it} \qquad (4.5)$$
where $\alpha_i$ denotes the error term which varies across individuals but not over time and may be considered as unobserved heterogeneity, just as in the metric case (cf. Hsiao, 1986). The values of $\alpha_i$ may be considered as fixed effects for each $i$ or may be considered as random effects such that $\alpha_i \sim N(0, \sigma^2)$. In the first case, $\alpha_i$ is an individual specific parameter. If $y^*_{it}$ is modelled by $y^*_{it} = x_{it}\beta + \alpha_i + \epsilon^*_{it}$ and is actually observed as in the metric case, then $\alpha_i$ may be eliminated by taking the first differences $y^*_{it} - y^*_{i,t-1} = (x_{it} - x_{i,t-1})\beta + \epsilon^*_{it} - \epsilon^*_{i,t-1}$ [...]

• $K(\psi, \psi_k) = E_{\psi_k}[\, l_\psi(z, y) \mid z \,]$ (pseudo likelihood)
• M step: $\psi_{k+1} = \arg\max_\psi K(\psi, \psi_k)$

Defining the gradient of pseudo likelihood

$$g(\psi, \psi_k) := \partial_\psi K(\psi, \psi_k) = E_{\psi_k}[\, \partial_\psi l_\psi(z, y) \mid z \,] \qquad (6.22)$$
we find a relation between the exact score and the gradient of pseudo likelihood at $\psi_k$:

$$g(\psi_k, \psi_k) = E_{\psi_k}[\, s_{\psi_k}(z, y) \mid z \,] = s_{\psi_k}(z) \qquad (6.23)$$
Thus it is seen that at $\psi = \psi_k$ likelihood and pseudo likelihood are locally equivalent (cf. Campillo and LeGland, 1989). The EM algorithm generates a sequence of approximating functions to $l_\psi(z)$ which are simpler to compute and maximize, but it is well known that in a vicinity of the maximum the convergence is slow (cf. Watson and Engle, 1983). Therefore I prefer the quasi Newton algorithm with BFGS update, which is very efficient near the maximum of $l_\psi(z)$. It is usually initialized with a choice of $F_0 = I$ (steepest descent) but other starting values such as the Fisher information with inserted conditional moments can be used (see Singer, 1990).
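The local equivalence of likelihood and pseudo likelihood can be checked numerically on a toy problem. The following Python sketch does not use the continuous-time state space model of this chapter; it assumes, purely for illustration, a two-component normal mixture with equal weights, unit variances, one component fixed at mean 0 and the other at an unknown mean `mu`, so that the component membership plays the role of the unobserved variable $y$. All function names and data values are hypothetical.

```python
import math


def normal_pdf(x, mean):
    """Density of a normal distribution with the given mean and unit variance."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def log_likelihood(mu, data):
    """Exact (observed data) log-likelihood of the toy mixture."""
    return sum(math.log(0.5 * normal_pdf(z, mu) + 0.5 * normal_pdf(z, 0.0)) for z in data)


def responsibilities(mu, data):
    """Conditional probability that each observation stems from the component with mean mu."""
    result = []
    for z in data:
        shifted = 0.5 * normal_pdf(z, mu)
        fixed = 0.5 * normal_pdf(z, 0.0)
        result.append(shifted / (shifted + fixed))
    return result


def pseudo_gradient(mu, mu_k, data):
    """Derivative of K(mu, mu_k) with respect to mu, with expectations taken at mu_k."""
    weights = responsibilities(mu_k, data)
    return sum(w * (z - mu) for w, z in zip(weights, data))


def m_step(mu_k, data):
    """Maximizer of K(mu, mu_k) over mu."""
    weights = responsibilities(mu_k, data)
    return sum(w * z for w, z in zip(weights, data)) / sum(weights)


def numerical_score(mu, data, h=1e-6):
    """Central finite-difference approximation of the exact score."""
    return (log_likelihood(mu + h, data) - log_likelihood(mu - h, data)) / (2.0 * h)


data = [-0.4, 0.1, 0.3, 1.8, 2.2, 2.9, 3.1]  # hypothetical observations
mu_k = 1.0

print("exact score at mu_k:       ", numerical_score(mu_k, data))
print("pseudo gradient at mu_k:   ", pseudo_gradient(mu_k, mu_k, data))
print("log-likelihood before/after:", log_likelihood(mu_k, data),
      log_likelihood(m_step(mu_k, data), data))
```

In this toy setting the two gradients agree at `mu_k` up to the finite-difference error, and one M step does not decrease the exact log-likelihood.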
6.5 Examples
6.5.1 Irregularly Sampled AR(2) Process
In Figure 6.1 an irregular measurement scheme was shown, where both components $y_1(t)$ and $y_2(t)$ of a stochastically oscillating phenomenon were sampled. It may represent a model for endogenous depression etc. where
Hermann Singer
cyclic episodes occur (cf. Singer, 1986, 1988). The AR(2) model can be written in state space form as 0 ,-WQ
yin(t) .V2n(t).
1 ' -7.
yin(t) dt + V2n(t)_
Xn(t)dt + (6.24)
+
W2n{t) (6.25)
n
J
= 1 ,...,N,j
1 0 = \,...,J
0' 1 n
yin(tj) y2n(tj)
i = (tT-t0)/6t,
nj,
(6.26)
where Ν = 10,J = 100,t T =
10, to = 0,i< = 0.1. The parameters were chosen as Σ
,μ
=
Ό.01 0 Data are 0 0.01 assumed to be measured at times τχ = {0,0.5,1,2,4,5,7,8,8.5,9,9.1,9.2, 9.3,10} (first component yi(t)) and r? = {0,1.5,7,9} (second component 2/2(0)· The exogenous (control) variables representing external interventions are measured at T3 = {0,1.5,5.5,9,10} and interpolated by step functions. Other interpolation methods are discussed in Singer (1995a). Data were simulated with these true values φ for all times, but only the measured time points (τι, 7"2,73) were used for ML parameter estimation. Between times, data were declared as missing (see Figure 6.1). Simulation results are reported in Singer (1995a) . Figure 6.2 shows data for one unit n, the true trajectory (time path) together with filtered (left) and smoothed (right) estimates. Furthermore, the filtering and smoothing error is shown in the form of a 95 % HPD interval E[yn(t)\z] ± 1.96x/Var[i/„( 1 (Risken, 1989; W o n g and Hajek, 1985) . In the context of M L estimation we have to maximize the likelihood w.r.t the parameter φ, which requires the solution of (6.40) for several fi(y,ip,t)
and $[\ldots](y, \psi, t)$, leading to densities $p_\psi(y, t \mid x, s)$. Instead we discuss a comparatively simple estimation method based on extended Kalman filtering, which is basically well known (Jazwinski, 1970). Consider the unknown parameters $\psi$ as unobserved stochastic processes and extend the state $y(t)$ to a vector $\eta(t) = [y_1(t), \ldots, y_N(t), \psi]$ with dynamics and measurement model (setting $\eta_n(t) = [y_n(t), \psi]$, $\eta_{nj} = \eta_n(t_j)$)

$$
\begin{aligned}
dy_n(t) &= f(\eta_n(t), t)\, dt + g(t)\, dW_n(t), \qquad n = 1, \ldots, N && (6.41) \\
d\psi(t) &= 0 \\
z_{nj} &= h_{nj}(\eta_{nj}) + \epsilon_{nj}, \qquad j = 0, \ldots, J = (t_T - t_0)/\delta t. && (6.42)
\end{aligned}
$$
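The idea of treating the parameter as an additional, constant state component can be sketched for a toy case. Everything specific here is an assumption for illustration and not taken from the chapter: a scalar drift f = -ψy, an Euler discretization in place of an exact discrete model, a single panel unit, and the names `ekf_augmented`, `psi_true` and so on. Missing measurements are coded as `None` and simply skip the measurement update.

```python
import random

def ekf_augmented(z, dt, g, r, eta0, P0):
    """Extended Kalman filter for eta = [y, psi] with
    dy = -psi*y dt + g dW,  d psi = 0,  z_j = y(t_j) + eps_j, Var(eps_j) = r."""
    y, psi = eta0
    P = [row[:] for row in P0]
    for j, zj in enumerate(z):
        if j > 0:
            # Time update: Jacobian of the Euler step with respect to [y, psi],
            # evaluated at the current estimate.
            F = [[1.0 - psi * dt, -y * dt], [0.0, 1.0]]
            y = y - psi * y * dt              # psi itself does not move
            FP = [[sum(F[i][k] * P[k][l] for k in range(2)) for l in range(2)]
                  for i in range(2)]
            P = [[sum(FP[i][k] * F[l][k] for k in range(2)) for l in range(2)]
                 for i in range(2)]
            P[0][0] += g * g * dt             # process noise enters y only
        if zj is None:                        # missing datum: no measurement update
            continue
        s = P[0][0] + r                       # innovation variance, H = [1, 0]
        k_y, k_psi = P[0][0] / s, P[1][0] / s
        innov = zj - y
        y, psi = y + k_y * innov, psi + k_psi * innov
        P = [[P[0][0] - k_y * P[0][0], P[0][1] - k_y * P[0][1]],
             [P[1][0] - k_psi * P[0][0], P[1][1] - k_psi * P[0][1]]]
    return (y, psi), P

# Simulated data on a grid with dt = 0.1; every third time point is missing.
rng = random.Random(1)
dt, g, r, psi_true = 0.1, 0.2, 0.01, 1.0
y_true, z = 5.0, []
for j in range(101):
    if j > 0:
        y_true += -psi_true * y_true * dt + g * dt ** 0.5 * rng.gauss(0.0, 1.0)
    z.append(None if j % 3 == 1 else y_true + r ** 0.5 * rng.gauss(0.0, 1.0))

prior_var = 1.0  # a large prior variance mimics a diffuse prior for psi
(y_hat, psi_hat), P = ekf_augmented(z, dt, g, r, eta0=(z[0], 0.5),
                                    P0=[[1.0, 0.0], [0.0, prior_var]])
```

The filtered value `psi_hat` plays the role of the approximate conditional mean of the parameter, and `P[1][1]` is its approximate posterior variance.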
The dynamics $d\psi(t) = 0$ mean that $\psi$ is constant over time. The output function $h_{nj}(\cdot)$ is obtained from $h(\eta_n(t), t): k \times 1$ by dropping components if the datum $z_{njk}$ is unobserved (the same applies to $\epsilon_{nj}$). The conditional mean $\hat\psi := E[\psi \mid z]$ then represents the EAP (expected a posteriori) estimator of $\psi$ given the data. The complicated task, of course, is the computation of the conditional mean, which requires nonlinear filter methods. A simple approximation method is the extended Kalman filter (EKF) (Jazwinski, 1970), which replaces the exact dynamics of the conditional density (Stratonovich-Kushner equation) by locally linearized equations. At times of measurement, the system is linearized again around the new optimal (a posteriori) estimate. This means that linear Kalman filter methods can be adapted to nonlinear problems (Jazwinski, 1970, Chapter 8). Linearizing the vector field $f(\cdot)$ around the estimate $\hat\eta_{nj}$, i.e.

$$
f(\eta, t_j) \approx f(\hat\eta_{nj}, t_j) + A_{nj}\,(\eta - \hat\eta_{nj}), \qquad (6.43)
$$

where $A_{nj} = \partial f([y, \psi], t_j)/\partial [y, \psi]$, evaluated at $\hat\eta_{nj}$, is the Jacobian matrix, and setting $f_{nj} = f(\hat\eta_{nj}, t_j)$, we have an approximate linear system for panel unit $n$ near $\hat\eta_{nj}$:

$$
dy_n(t) = f_{nj}\, dt + f_y(\hat\eta_{nj})\,(y_n(t) - \hat y_{nj})\, dt + f_\psi(\hat\eta_{nj})\,(\psi(t) - \hat\psi)\, dt + g(t)\, dW_n(t) \qquad (6.44)
$$
Figure 6.8: Trajectories of the Lorenz equations involving process noise
$$
d\psi(t) = 0 \qquad (6.45)
$$
where $f_y(\eta) := \partial f(\eta)/\partial y$ and $f_\psi(\eta) := \partial f(\eta)/\partial \psi$. This piecewise linear system can be transformed to an exact discrete model in the usual way and the linear Kalman filter can be applied (see, e.g., Jazwinski, 1970; Liptser and Shiryayev, 1977; Singer, 1992a; Oud et al., 1990). The filter must be initialized with the quantities $\hat\eta(t_0) := E[\eta(t_0) \mid z_{n0}, n = 1, \ldots, N]$ and $P(t_0) := \mathrm{Var}[\eta(t_0) \mid z_{n0}, n = 1, \ldots, N]$ involving the a priori quantities $E[\eta(t_0)]$ and $\mathrm{Var}[\eta(t_0)]$. In order to obtain ML estimates of $\psi$, a diffuse prior $\mathrm{Var}(\psi) \to \infty$ has to be inserted.

Example: The Lorenz model

The Lorenz model (Lorenz, 1963) is the best known simple system exhibiting chaos and has been used for (too) many purposes. Among them are the Navier-Stokes equations in meteorology (the original application), laser theory (Haken, 1977; Graham, 1989), psychology (Singer, 1986, 1992b) and sociology (Troitzsch, 1990). Usually the parameters are given and solutions are simulated. Here we attempt to estimate them from data using the methods just developed. We assume that a vector of variables $[x_n, y_n, z_n]$ […]

[…] 2 are, respectively, the initial time point and the total number of time points considered. The output or measurement equation (Equation 7.2) is an extended form of the factor model equation of factor analysis with $C_t$ the factor pattern matrix. The terms $B_{t-1}u_{t-1}$ and $D_t u_t$ represent effects of fixed so-called input-variables in $u_{t-1}$ and $u_t$. Notice that because of the time subscripts all model matrices are chosen time-varying. Equations 7.1 and 7.2 represent, in fact, a very general nonstationary dynamic factor model. Instead of Equation 7.1, many econometric and social science models choose a so-called structural equation, which adds an instantaneous effects term $K_t x_t$ to the right hand side of Equation 7.1. $K_t$ may be estimated together with the other parameter matrices via specification in the LISREL model.
However, before applying the Kalman smoother in the EM algorithm, the estimated structural equation should be reduced to the so-called reduced form estimate (which is in the form of Equation 7.1), on which the Kalman smoother is to be based (Oud, van den Bercken, and Essers, 1990). The process errors in successive vectors $w_t$ and the measurement errors in successive vectors $v_t$ are assumed to have (a) zero expectations: $E(w_t) = E(v_t) = 0$ for all $t$, (b) zero covariances between vectors: $E(w_t v'_{t'}) = 0$ for all $t$ and $t'$, $E(w_t w'_{t'}) = E(v_t v'_{t'}) = 0$ for all $t \neq t'$ (nonzero variances and, possibly, nonzero covariances for errors within vectors are in $Q_t$ and $R_t$ […] $t'$, can be made. These are essential in Kalman smoothing and will give rise to a block-recursive LISREL model.
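The reduction of a structural equation to its reduced form can be written out as a short worked derivation. Equation 7.1 itself is not reproduced in this part of the text, so the form used below (state matrix $A_{t-1}$, process error $w_{t-1}$ with covariance $Q_{t-1}$) is an assumption inferred from the terms named above, and $I - K_t$ is assumed to be nonsingular.

$$
\begin{aligned}
x_t &= K_t x_t + A_{t-1} x_{t-1} + B_{t-1} u_{t-1} + w_{t-1} \\
(I - K_t)\, x_t &= A_{t-1} x_{t-1} + B_{t-1} u_{t-1} + w_{t-1} \\
x_t &= (I - K_t)^{-1} A_{t-1}\, x_{t-1} + (I - K_t)^{-1} B_{t-1}\, u_{t-1} + (I - K_t)^{-1} w_{t-1}
\end{aligned}
$$

Under these assumptions the reduced form again has the structure of a state equation without instantaneous effects, with transformed parameter matrices and with process error covariance $(I - K_t)^{-1} Q_{t-1} \,[(I - K_t)^{-1}]'$.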
Johan H.L. Oud and Robert A.R.G. Jansen
Zero input-effects: $B_{t-1}u_{t-1}$ […]

$$
[\ldots]_{t,t-k} = F_t\, G'_{t+1,t-k} + (I - F_t A_t)\, P_{t,t-k}, \qquad (7.26)
$$
starting with $t + 1 = s$, $k = 1, \ldots, T - 2$, and continuing with $t + 1 = s - 1$, $k = 1, \ldots, T - 3$, etc., until all off-diagonal blocks have been computed ($t + 1 =$ […]

E-step:

$$
K(\theta \mid \theta_r) = E[\ell(\theta \mid Y_{obs}, Y_{mis}) \mid Y_{obs}, \theta_r]. \qquad (7.27)
$$

The expectation is taken over the distribution of the missing data $Y_{mis}$ given the observed data $Y_{obs}$ and the current estimate $\theta_r$.

M-step:

$$
\theta_{r+1} = \arg\max_\theta K(\theta \mid \theta_r). \qquad (7.28)
$$
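The two steps can be made concrete with a minimal sketch for a much simpler model than the one treated in this chapter: a bivariate normal sample in which the second variable is missing for some cases. This toy setting, the data values and the function names are assumptions for illustration. The E-step replaces the first and second moments involving missing values by their conditional expectations, and the M-step computes the ML estimates from the completed moments.

```python
import math

def observed_loglik(data, mu1, mu2, s11, s12, s22):
    """Log-likelihood of the observed data only (y2 may be None)."""
    beta = s12 / s11
    resid_var = s22 - s12 * beta
    ll = 0.0
    for y1, y2 in data:
        ll += -0.5 * (math.log(2 * math.pi * s11) + (y1 - mu1) ** 2 / s11)
        if y2 is not None:
            e2 = mu2 + beta * (y1 - mu1)
            ll += -0.5 * (math.log(2 * math.pi * resid_var)
                          + (y2 - e2) ** 2 / resid_var)
    return ll

def em_bivariate(data, n_iter=25):
    """EM iterations; returns final estimates and the log-likelihood path."""
    n = len(data)
    mu1, mu2, s11, s12, s22 = 0.0, 0.0, 1.0, 0.0, 1.0  # crude starting values
    path = []
    for _ in range(n_iter):
        path.append(observed_loglik(data, mu1, mu2, s11, s12, s22))
        # E-step: conditionally expected moments given y1 and current estimates.
        beta = s12 / s11
        resid_var = s22 - s12 * beta
        t1 = t2 = t11 = t12 = t22 = 0.0
        for y1, y2 in data:
            if y2 is None:
                e2 = mu2 + beta * (y1 - mu1)       # E[y2 | y1]
                e22 = e2 * e2 + resid_var          # E[y2^2 | y1]
            else:
                e2, e22 = y2, y2 * y2
            t1 += y1
            t2 += e2
            t11 += y1 * y1
            t12 += y1 * e2
            t22 += e22
        # M-step: ML estimates from the completed moment matrix.
        mu1, mu2 = t1 / n, t2 / n
        s11 = t11 / n - mu1 * mu1
        s12 = t12 / n - mu1 * mu2
        s22 = t22 / n - mu2 * mu2
    return (mu1, mu2, s11, s12, s22), path

data = [(1.0, 2.5), (2.0, 3.1), (3.0, 6.4), (4.0, 7.0),
        (5.0, None), (6.0, None), (2.5, 5.9), (3.5, None)]  # made-up values
estimates, loglik_path = em_bivariate(data)
```

The list `loglik_path` records the observed-data log-likelihood before each iteration, so the non-decreasing behaviour of the EM sequence can be inspected directly.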
It can be proven that iteration of these steps yields successive pairs of estimates $\theta_r$ and $\theta_{r+1}$ with non-decreasing loglikelihoods $\ell(\theta_r \mid Y_{obs})$ and $\ell(\theta_{r+1} \mid Y_{obs})$, converging to $\hat\theta = \arg\max \ell(\theta \mid Y_{obs})$ (see Dempster et al., 1977; Little and Rubin, 1987; Singer, 1990, 1992). For implementation of the EM algorithm in the missing value problem, the conditionally expected moment matrix $S_{r+1} = E_{Y_{mis}}(S \mid Y_{obs}, \theta_r)$ is to be calculated in the E-step and inserted for $S$ in Equation 7.10. This is due to the fact that the loglikelihood function in Equation 7.10 is linear in $S$. Except for the replacement of $S$ by $S_{r+1}$, the pseudo-likelihood $K(\theta \mid \theta_r)$ does not differ from Equation 7.10 (which is handled in the LISREL program by means of Equation 7.11, where again $S$ is to be replaced by $S_{r+1}$). Note that in the case of no missing data $E_{Y_{mis}}(S \mid Y_{obs}, \theta_r) = S$ and $K(\theta \mid \theta_r)$ becomes equal to $\ell(\theta \mid Y)$ of Equation 7.10. Because missing data occur in subject-specific patterns, dependent on the observed data vector $y_{i,obs}$ […] with missing $y_{i,t-k}$, missing $y_{it}$ with observed $y_{i,t-k,obs}$, and fixed observed $u_i$ with missing $y_{it}$ become, respectively:

$$
E(y_{it,obs}\, y'_{i,t-k} \mid y_{i,obs}, \theta_r) = y_{it,obs}\, E(y'_{i,t-k} \mid y_{i,obs}, \theta_r) = y_{it,obs}\, \hat y'_{i,t-k,r}
$$

$$
E(y_{it}\, y'_{i,t-k,obs} \mid y_{i,obs}, \theta_r) = E(y_{it} \mid y_{i,obs}, \theta_r)\, y'_{i,t-k,obs} \; [\ldots]
$$

$$
[\ldots] = u_i\, \hat y'_{it,r} \qquad (7.33)
$$

Above we tacitly assumed that the observations within vectors $y_t$ are either completely missing or not missing at all. However, it can happen that the observations at $t$, $t - k$ or both are only partially missing. Partial missingness is processed in the forward recursion of the Kalman smoother (Kalman filter computation: see Equations 7.16 and 7.17) by fixing not the entire matrix $C_t$ at zero but only the factor loading rows corresponding to missing observations in $y_t$, and then using this $C_t$ to compute $H_t$, $M_t$, and the Kalman filter estimates with missing observations in $y_t$ fixed at zero (Shumway and Stoffer, 1982: 259). Next, let each partially observed vector $y_t$ fall apart in two subvectors $y_{(1),t}$ and $y_{(2),t}$ […] $R_{(12)t}$, and $R_{(21)t}$. If $R_{(12)t} = R_{(21)t} \neq 0$ (correlated measurement errors), the observed $y_{(1),it,obs}$ contains additional information about missing $y_{(2),it}$, which should be processed by using the following conditional expectation (Shumway and Stoffer, 1982: 257):

$$
\hat y_{(2)itr} = E(y_{(2)it} \mid y_{i,obs}, \theta_r) = C_{(2)tr}\, \hat x_{itr} + D_{(2)tr}\, u_{it} + R_{(21)tr} R^{-1}_{(11)tr}\, (y_{(1)it,obs} - [\ldots]
$$

[…] the latent initial mean $\mu_{x,0}$, the latent initial moment $\phi_{x,0}$ (from which the latent initial variance is obtained), and the variance 1 of the unit input-variable. Altogether 16 parameters were to be estimated on the basis of a moment matrix of 9 observed variables (8 decoding speed variables and the unit input-variable), leaving 29 degrees of freedom for the LISREL model as a whole. In the data set of 838 pupils and 8 decoding speed variables, 3.94% of the data were missing. The subsample of complete cases consisted of 740 pupils, which is only 88.3% of the total number. Application of the EM missing data procedure using the LISMIS and LISREL programs led to the results in Tables 7.1a and 7.1b. For initialisation of the EM algorithm, the complete cases (CC) LISREL solution was used. Four iterations were needed for the LISREL parameter estimates to converge to the EM solution given in Tables 7.1a and 7.1b. By convergence is meant that further iterations did not change any of the LISREL parameter estimates. The results show that in both the EM and the CC solution the estimates of the disattenuated autoregressive parameters are very high, indicating
that the developmental curves of individual pupils are almost parallel and regression toward the population mean negligible: pupils with a high or low level of decoding speed at one point in time tend to keep almost the same distance to other pupils at subsequent time points. Autoregression is stronger, however, going from the first to the second and from the second to the third time point than from time point three to the last time point (EM solution values are respectively: 0.952, 0.958, and 0.916). This reduction in autoregression across time is less in the CC solution (0.948, 0.952, and 0.921 respectively).
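The figures quoted in this passage can be checked with a few lines of arithmetic. The degrees of freedom and the share of complete cases follow directly from the numbers in the text. The product of the three autoregressive estimates is an added illustration: it is the implied carry-over from the first to the last time point only under the assumption of a first-order autoregressive chain, and the variable names are hypothetical.

```python
# Degrees of freedom: distinct elements of a 9 x 9 moment matrix minus 16 parameters.
n_observed = 9
n_moments = n_observed * (n_observed + 1) // 2
degrees_of_freedom = n_moments - 16

# Share of complete cases among all pupils, in percent.
complete_share = 100.0 * 740 / 838

def carry_over(alphas):
    """Product of successive autoregressive parameters (first-order chain assumed)."""
    result = 1.0
    for a in alphas:
        result *= a
    return result

em_carry = carry_over([0.952, 0.958, 0.916])   # EM solution
cc_carry = carry_over([0.948, 0.952, 0.921])   # CC solution

print(degrees_of_freedom, round(complete_share, 1))
print(round(em_carry, 3), round(cc_carry, 3))
```

Under the stated assumption, a pupil's deviation from the population mean at the first time point is expected to shrink by only about one sixth over the whole observation period in both solutions.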
Table 7.1: a) State equation parameter estimates for the model in Figure 7.1 from the EM and CC solution
Latent autoregressive parameters
          N       α […]
  EM      (838)   0.952 (0.015)   […]
          (805)         (0.015)
          (740)         (0.016)
  CC      (740)   0.948 (0.016)   […]

Latent initial mean and intercepts
          N
  EM      (838)   32.12 (0.48)    […]
          (805)         (0.49)
          (740)         (0.52)
  CC      (740)   31.71 (0.50)    […]

Latent initial variance and unexplained variances
          N
  EM      (838)   […]
          (805)
          (740)
  CC      (740)   […]