Estimation in regression models for longitudinal binary data with outcome-dependent follow-up

Biostatistics (2006), 7, 3, pp. 469–485 doi:10.1093/biostatistics/kxj019 Advance Access publication on January 20, 2006

GARRETT M. FITZMAURICE∗
Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue and Brigham and Women’s Hospital, Boston, MA, USA
[email protected]

STUART R. LIPSITZ
Medical University of South Carolina, Charleston, SC, USA

JOSEPH G. IBRAHIM
School of Public Health, University of North Carolina, Chapel Hill, NC, USA

RICHARD GELBER
Dana Farber Cancer Institute, Boston, MA, USA

STEVEN LIPSHULTZ
University of Miami School of Medicine, Miami, FL, USA

SUMMARY

In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of pre-specified occasions. Instead, individuals may be measured at irregular intervals, with those having a history of poorer health outcomes being measured with somewhat greater frequency and regularity. In this paper, we consider likelihood-based estimation of the regression parameters in marginal models for longitudinal binary data when the follow-up times are not fixed by design, but can depend on previous outcomes. In particular, we consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome measurement process. The practical implication of this separation is that the follow-up time process can be ignored when making likelihood-based inferences about the marginal regression model parameters. That is, maximum likelihood (ML) estimation of the regression parameters relating the probability of success at a given time to covariates does not require that a model for the distribution of follow-up times be specified. However, to obtain consistent parameter estimates, the multinomial distribution for the vector of repeated binary outcomes must be correctly specified. In general, ML estimation requires specification of all higher-order moments and the likelihood for a marginal model can be intractable except in cases where the number of repeated measurements is relatively small. To circumvent these difficulties, we propose a pseudolikelihood for estimation of the marginal model parameters. The pseudolikelihood uses a linear approximation for the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only. When the follow-up times depend only on the previous outcome, the pseudolikelihood requires correct specification of the conditional distribution of the current outcome given the outcome at the previous occasion only. Results from a simulation study and a study of asymptotic bias are presented. Finally, we illustrate the main results using data from a longitudinal observational study that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children.

Keywords: Follow-up time process; Generalized estimating equations; Maximum likelihood; Multinomial distribution; Pseudolikelihood.

∗ To whom correspondence should be addressed.

© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].

1. INTRODUCTION

In many observational studies, individuals are followed over time, and the outcome variable of interest is measured repeatedly at different follow-up times. Unlike designed longitudinal studies, individuals are not necessarily measured at a set of common time points specified by the study protocol, but instead are measured at irregular intervals. Furthermore, the follow-up times may depend on the previous outcome measures. For example, consider a recent longitudinal observational study (Lipshultz et al., 1995) that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in childhood. In this study, 111 children with acute lymphoblastic leukemia were treated according to one of five protocols that included doxorubicin in total doses ranging from 45 to 550 mg per square meter of body-surface area. Although doxorubicin has proven successful in curing the leukemia (Schorin et al., 1994), long-term survivors of childhood cancer often develop progressive abnormalities of the heart (Lipshultz et al., 1991, 1995). In this study, the median time between the initial and final follow-up visit was 6 years. The primary outcome of interest is a binary response denoting normal or abnormal ‘left ventricular mass’, as determined by echocardiogram. Table 1 provides illustrative data from 10 of the 111 patients enrolled in the study. The majority of children (64%) had at least one follow-up echocardiogram; however, there was no explicit design for the patients to receive echocardiograms at regular intervals. Instead, the decision to have a follow-up measurement taken was at the discretion of each patient’s physician. Consequently, the follow-up times are unequally spaced and the number and timing of follow-up measurements varied from individual to individual.
Table 1. Selected data on 10 patients from the longitudinal study of cardiotoxicity

Patient   Time (years)      Wall        Dose      Age at end     Gender
          since treatment   thickness   (in mg)   of treatment
1          7.281            0           405        3.3           M
1          8.849            0           405        3.3           M
1          9.703            0           405        3.3           M
1         10.112            0           405        3.3           M
1         11.144            0           405        3.3           M
1         11.770            0           405        3.3           M
2          7.514            0           440        4.8           F
2          8.400            0           440        4.8           F
2          9.862            0           440        4.8           F
2         10.997            1           440        4.8           F
2         11.906            1           440        4.8           F
2         13.003            1           440        4.8           F
2         13.256            0           440        4.8           F
3          2.064            1           280       15.7           M
3          2.175            1           280       15.7           M
4          5.270            0           294        2.7           F
4          7.863            1           294        2.7           F
4         11.886            1           294        2.7           F
5          4.300            0           335        7.6           M
5         11.447            1           335        7.6           M
6          9.122            1           450        3.2           F
6         10.501            0           450        3.2           F
6         12.410            1           450        3.2           F
7          9.892            1           446        3.8           M
7         12.236            0           446        3.8           M
8          6.755            0           330        5.2           F
8          8.197            0           330        5.2           F
8          9.966            0           330        5.2           F
8         10.929            1           330        5.2           F
8         12.199            1           330        5.2           F
8         13.203            0           330        5.2           F
9          3.012            0           323       15.5           F
9          4.592            0           323       15.5           F
9          6.799            1           323       15.5           F
10         2.229            0           337        9.4           M
10         8.715            1           337        9.4           M
10        10.643            0           337        9.4           M

In particular, including the initial visit, 3 patients (2.7%) were assessed seven times, 1 patient was assessed six times, 9 patients (8.1%) were assessed five times, 11 patients (9.9%) were assessed four times, 16 patients (14.4%) were assessed three times, 31 patients (27.9%) were assessed twice, and 40 patients (36%) were assessed only at the initial visit. The maximum follow-up time for an echocardiogram was 13.9 years from the initial visit. In this particular study, each individual has a set of random follow-up times and it is reasonable to conjecture that the follow-up times might depend on previous outcomes. That is, the physician of a patient with an abnormal left ventricular mass measurement may demand more frequent echocardiograms and at shorter follow-up intervals. For example, in this study, the median time to the first follow-up visit was 2.6 years for patients who initially had normal left ventricular mass, and 1.7 years for patients who initially had abnormal left ventricular mass (log-rank test, p-value < 0.025).

The focus of this paper is on estimation of regression parameters in marginal models for longitudinal binary data where the follow-up times are not fixed by design, but can depend on previous outcomes. We discuss assumptions about the distribution of follow-up times that result in the likelihood function for the observed data separating into two components: one for the follow-up time process, the other for the outcome measurement process. The practical implication of this separation of the likelihood function is that the former process can be ignored when making likelihood-based inferences about the latter. That is, maximum likelihood (ML) estimation of the regression parameters does not require that a model for the distribution of follow-up times be specified. If the follow-up time process is considered to be a form of selection, then there are very close parallels with ‘ignorable’ missing data mechanisms (Little and Rubin, 1987). However, to obtain consistent regression parameter estimates, the multinomial distribution for the vector of repeated binary outcomes must be correctly specified. ML estimation requires specification of all higher-order moments and it is prohibitively difficult to compute the likelihood function for marginal models except in cases where the number of repeated measurements is small. When the likelihood is intractable, one alternative is the generalized estimating equations (GEE) approach (Liang and Zeger, 1986).


However, when the follow-up times depend on previous outcomes, the standard GEE approach yields biased parameter estimates (Lipsitz et al., 2002). To circumvent problems with intractable likelihoods for marginal models and the potential bias of standard GEE methods, we propose a pseudolikelihood to estimate the regression parameters when the follow-up times depend on previous outcomes. The pseudolikelihood uses a linear approximation to the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only.

Recently, Lin et al. (2004) have proposed a weighted GEE approach for handling outcome-dependent follow-up that also avoids specification of the joint distribution for the vector of repeated outcomes. In their weighted GEE, contributions to the estimating equations are weighted by the inverse of the intensity-of-visit process. The weighted GEE approach makes weaker assumptions about the follow-up time process but requires correct specification of a model for the follow-up time process. In addition, it can incorporate auxiliary, internal time-dependent covariates into the model for the follow-up time process. In contrast, the proposed pseudolikelihood estimator makes stronger assumptions about the follow-up time process but does not require specification of a model for the follow-up time process.

In the following, we refer to the model for the outcome of interest as the ‘measurement model’. Furthermore, given that the follow-up times are generated from some underlying distribution, we also refer to the ‘follow-up time’ process. In Section 2, we introduce some notation and consider models for the measurement and follow-up time processes. We discuss assumptions regarding the follow-up time process and consider likelihood-based inferences about the measurement model. A pseudolikelihood approach is proposed to circumvent difficulties with intractable likelihoods for marginal models for longitudinal binary data. In Section 3, we demonstrate how misspecification of the model for the within-subject association among the repeated binary responses will, in general, result in regression parameter estimates that are asymptotically biased. The results of a simulation study, assessing the potential magnitude of the bias in finite samples, and also comparing estimators in terms of variance and coverage probabilities, are presented in Section 3. Finally, we illustrate the main results using data from the longitudinal study of the cardiotoxic effects of doxorubicin chemotherapy introduced earlier.

2. MODELS FOR MEASUREMENT AND FOLLOW-UP PROCESSES

2.1 Notation

Suppose that $n$ independent individuals are to be followed up intermittently over an interval $[0, \tau]$, where the constant $\tau > 0$ is determined with the knowledge that outcome measurements could potentially be observed up to time $\tau$. It is assumed that the outcome at time $t$, denoted by $Y_i(t)$, is binary, i.e. $Y_i(t)$ equals 1 if the ith individual has response 1 (say ‘success’) at time $t$, and 0 otherwise. Associated with $Y_i(t)$, each individual has a $(J \times 1)$ covariate vector $X_i(t)$ which can include both time-stationary and time-varying covariates. A logistic regression model is assumed for the marginal mean of $Y_i(t)$,

\[
p_i(t) = E\{Y_i(t) \mid X_i(t)\} = \mathrm{pr}\{Y_i(t) = 1 \mid X_i(t)\} = \frac{\exp\{X_i(t)'\beta\}}{1 + \exp\{X_i(t)'\beta\}}, \tag{2.1}
\]

where $\beta$ is the vector of logistic regression parameters. Although a logit link function is adopted, in principle any suitable link function can be used. Note that if $X_i(t)$ includes time-varying covariates, they are assumed to be ‘external’ covariates in the sense described by Kalbfleisch and Prentice (1980). Specifically, it is assumed that any time-varying covariate process at time $t$ is conditionally independent of all previous responses, given the past history of the covariate process. When this assumption is violated, the validity and interpretability of the model given by (2.1) is questionable (see, for example, Fitzmaurice et al., 1993; Pepe and Anderson, 1994; Robins et al., 1999). Since $Y_i(t)$ is binary, the variance function over time is determined completely by the marginal mean, i.e.

\[
\mathrm{Var}\{Y_i(t)\} = p_i(t)\{1 - p_i(t)\}. \tag{2.2}
\]

We denote the correlation function by

\[
\rho_i(u, t) = \mathrm{Corr}\{Y_i(t), Y_i(t - u)\}, \tag{2.3}
\]

and it depends upon the higher-order moments; this is specified in terms of an additional set of parameters, $\alpha$.

Next, let $N_i(t)$ denote a counting process for the number of follow-up measurements on the ith individual by continuous time point $t \ge 0$. The indicator random variable $dN_i(t)$ equals 1 if a follow-up measurement is obtained on the ith individual at follow-up time $t$, and equals 0 otherwise. Note that all the information about the times of follow-up is contained in the counting process $N_i(t)$, or equivalently, in $dN_i(t)$. Thus, the ith individual ($i = 1, \ldots, n$) has a set of binary responses at $M_i = N_i(\tau)$ random follow-up times.

Finally, we consider the distribution of the follow-up times and their dependence on the outcome measurement function $\{Y_i(t)\}$. Let $\bar{Y}_i(t) = \{Y_i(s): 0 \le s \le t\}$ and $\bar{N}_i(t) = \{N_i(s): 0 \le s \le t\}$ denote the history of the outcome measurement and the follow-up time process through time $t$; further, let $\bar{X}_i(t) = \{X_i(s): 0 \le s \le t\}$. Let $\bar{Y}_i^O(t) = \{Y_i(s): dN_i(s) = 1, 0 \le s \le t\}$ denote the history of the ‘observed’ outcome measurement process through time $t$. Also, let $\underline{Y}_i(t) = \{Y_i(s): t \le s \le \tau\}$. Next, we assume that the conditional distribution of $dN_i(t)$, given $\bar{N}_i(t-)$ and $\underline{Y}_i(t)$, depends only on the history of the observed outcome measurements at the previous times, and perhaps on $\bar{X}_i(t)$. For example, patients with a history of poor health outcomes on previous visits may be expected to have shorter times of follow-up. Specifically, we assume that

\[
dN_i(t) \perp \underline{Y}_i(t) \mid \bar{N}_i(t-), \bar{Y}_i^O(t-), \bar{X}_i(t). \tag{2.4}
\]
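To make the notation concrete, the following is a minimal simulation sketch, not the authors' procedure. It assumes a hypothetical visit rule in which the waiting time to the next visit is exponential with a mean that depends only on the most recently observed outcome, which is one simple mechanism in the spirit of (2.4). The function names, the scale values, and the independent draws of the outcomes are all illustrative assumptions.

```python
import numpy as np


def marginal_mean(x, beta):
    """Logistic marginal mean p_i(t) of (2.1) for a covariate vector x = X_i(t)."""
    eta = float(np.dot(x, beta))
    return 1.0 / (1.0 + np.exp(-eta))


def next_gap(last_observed_y, rng, scale_normal=3.0, scale_abnormal=1.5):
    """Draw the waiting time to the next visit (hypothetical rule).

    The mean gap depends only on the most recently OBSERVED outcome, so the
    visit process never looks at current or future outcomes.
    """
    scale = scale_abnormal if last_observed_y == 1 else scale_normal
    return rng.exponential(scale)


def simulate_subject(x_of_t, beta, tau, rng):
    """Return (visit times, outcomes) for one subject followed on [0, tau].

    Simplifying assumption: outcomes are drawn independently given the
    covariates, so no within-subject correlation is built in here.
    """
    times = [0.0]
    ys = [rng.binomial(1, marginal_mean(x_of_t(0.0), beta))]
    while True:
        t = times[-1] + next_gap(ys[-1], rng)
        if t > tau:
            break
        times.append(t)
        ys.append(rng.binomial(1, marginal_mean(x_of_t(t), beta)))
    return np.array(times), np.array(ys)
```

Under this rule, subjects whose last observed response was 1 return sooner on average, which mirrors the pattern described for the cardiotoxicity study while keeping the visit times a function of observed data only.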

Thus, (2.4) implies that follow-up at time $t$ depends only on the previously observed data (i.e. the previously observed outcomes and the times at which they were observed). Equation (2.4) is identifiable only to the extent that it asserts other possible endogenous covariates, say $W$, do not affect the timing of the measurements. In principle, the conditional distribution of the follow-up times can follow any time-to-event distribution. For reasons that will become apparent in Section 2.2, we do not consider particular families of distributions for $dN_i(t)$.

2.2 ML estimation

Recall that in most longitudinal studies the parameters of primary interest are $\beta$, and sometimes $\alpha$; the parameters of the follow-up time distribution, say $\gamma$, are usually regarded as nuisance parameters. Indeed, in many longitudinal studies, both $\alpha$ and $\gamma$ might be regarded as nuisance parameters. In this section, we discuss estimation of $(\beta, \alpha)$.

The assumption about the conditional distribution of the follow-up times given by (2.4) implies the following conditional independence relationship for the distribution of the outcome measurement process at time $t$,

\[
Y_i(t) \perp \bar{N}_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t), \tag{2.5}
\]

which follows by expressing

\[
f(Y(t) \mid \bar{Y}^O(t-), \bar{N}(t), \bar{X}(t)) = \frac{f(\bar{N}(t) \mid Y(t), \bar{Y}^O(t-), \bar{X}(t)) \, f(Y(t), \bar{Y}^O(t-) \mid \bar{X}(t))}{f(\bar{N}(t) \mid \bar{Y}^O(t-), \bar{X}(t)) \, f(\bar{Y}^O(t-) \mid \bar{X}(t))}
\]

and substituting from (2.4). That is, the conditional distribution of the outcome measurement process at time $t$, given the history of the observed outcome measurement process and the visit history, is the same for those individuals who were followed up as for those who were not. Note, however, that $f(Y(t) \mid \bar{N}(t), \bar{X}(t)) \ne f(Y(t) \mid \bar{X}(t))$ due to the assumed dependence of visit times on the history of the observed outcome measurement process. Since (2.5) holds under the assumption about the conditional distribution of the follow-up time process given by (2.4), the log-likelihood for the ‘observed’ data is

\[
\ell(\gamma, \beta, \alpha) = \sum_{i=1}^{n} \int_0^{\tau} \log\{f(dN_i(t) \mid \bar{N}_i(t-), \bar{Y}_i^O(t-), \bar{X}_i(t); \gamma)\} \, dN_i(t) + \sum_{i=1}^{n} \int_0^{\tau} \log\{f(Y_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t); \beta, \alpha)\} \, dN_i(t). \tag{2.6}
\]

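Assuming that $\gamma$ and $(\beta, \alpha)$ are separable, the ML estimate of $(\beta, \alpha)$ is obtained by maximizing the second term on the right-hand side of (2.6) and solving

\[
s(\hat{\beta}, \hat{\alpha}) = \left. \frac{\partial \ell(\gamma, \beta, \alpha)}{\partial(\beta, \alpha)} \right|_{\beta = \hat{\beta}, \alpha = \hat{\alpha}} = \left. \sum_{i=1}^{n} \int_0^{\tau} \frac{\partial}{\partial(\beta, \alpha)} \log\{f(Y_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t); \beta, \alpha)\} \, dN_i(t) \right|_{\beta = \hat{\beta}, \alpha = \hat{\alpha}} = 0, \tag{2.7}
\]

for $(\hat{\beta}, \hat{\alpha})$. Thus, given (2.4) and the assumption that $\gamma$ and $(\beta, \alpha)$ are functionally distinct sets of parameters, $(\beta, \alpha)$ can be estimated ignoring the distribution of the follow-up times by maximizing

\[
\ell(\beta, \alpha) = \sum_{i=1}^{n} \int_0^{\tau} \log\{f(Y_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t); \beta, \alpha)\} \, dN_i(t). \tag{2.8}
\]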

Using standard results from likelihood theory, and under suitable regularity conditions, it can be shown that the ML estimate $(\hat{\beta}, \hat{\alpha})$ is consistent and asymptotically multivariate normal, with asymptotic covariance matrix approximated by the inverse of the Fisher information matrix. To formulate the likelihood, we must specify the conditional distributions

\[
f(Y_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t); \beta, \alpha). \tag{2.9}
\]

In general, this requires specification of the success probabilities (2.1) and correlations (2.3), in addition to higher-order moments. In principle, given $(\beta, \alpha)$, specification of these higher-order moments is relatively straightforward (e.g. using the Bahadur, 1961, representation for the multinomial distribution). However, because the number of higher-order moments increases rapidly with the number of repeated measurements, in practice, it is prohibitively difficult to compute the likelihood function for marginal models except in cases where the number of repeated measurements is relatively small.

2.3 Pseudo ML estimation

Although ML estimation yields consistent parameter estimates when the follow-up times depend on previous outcomes, it does require that the multinomial distribution for the vector of repeated binary outcomes be correctly specified. Because the likelihood is intractable except when $M_i$ is small, we propose a pseudolikelihood (PL) approach to circumvent these difficulties. The proposed pseudolikelihood is a modification of the likelihood to reduce its complexity and is based on approximating the conditional distributions in (2.8). Specifically, the proposed pseudo log-likelihood is

\[
\ell_p(\beta, \alpha) = \sum_{i=1}^{n} \int_0^{\tau} [Y_i(t) \log\{\pi_i(t)\} + \{1 - Y_i(t)\} \log\{1 - \pi_i(t)\}] \, dN_i(t), \tag{2.10}
\]

where $\pi_i(t) = \mathrm{pr}(Y_i(t) = 1 \mid \bar{Y}_i^O(t-), \bar{X}_i(t); \beta, \alpha)$ is approximated by

\[
\pi_i(t) \approx p_i(t) + \mathrm{Cov}\{Y_i(t), \bar{Y}_i^O(t-)\} [\mathrm{Var}\{\bar{Y}_i^O(t-)\}]^{-1} [\bar{Y}_i^O(t-) - E\{\bar{Y}_i^O(t-) \mid \bar{X}_i(t)\}], \tag{2.11}
\]
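The linear approximation in (2.11) can be sketched numerically as follows. This is an illustrative helper rather than the authors' implementation; the function name and argument layout are assumptions, and the marginal probabilities and covariances are taken as already computed from $(\beta, \alpha)$.

```python
import numpy as np


def linear_conditional_prob(p_t, p_prev, cov_t_prev, var_prev, y_prev):
    """Approximate pi_i(t) by the linear form in (2.11).

    p_t        : marginal probability at the current time, p_i(t)
    p_prev     : marginal probabilities at the previously observed times, shape (k,)
    cov_t_prev : covariances between Y_i(t) and the previous responses, shape (k,)
    var_prev   : covariance matrix of the previous responses, shape (k, k)
    y_prev     : previously observed binary responses, shape (k,)
    """
    resid = np.asarray(y_prev, dtype=float) - np.asarray(p_prev, dtype=float)
    # Solve var_prev @ w = resid instead of forming an explicit inverse.
    w = np.linalg.solve(np.asarray(var_prev, dtype=float), resid)
    return float(p_t + np.asarray(cov_t_prev, dtype=float) @ w)
```

Because the form is linear in the previous responses, the returned value is not guaranteed to lie in (0, 1) when more than one previous response enters, so a practical implementation would need to guard the logarithms in (2.10).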

where $p_i(t)$ is the marginal probability at time $t$ given by (2.1). This is a Gaussian approximation for $\pi_i(t)$, with $\pi_i(t)$ taking the same form as the conditional mean from the multivariate normal distribution. Thus, the pseudolikelihood uses a linear approximation for the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only.

An interesting special case of this pseudolikelihood arises when an exponential correlation, $\rho_i(u, t) = \alpha^{|u|}$, is assumed. Then, under (2.11), the conditional distribution of $Y_i(t)$, given the history of previous responses, $\bar{Y}_i(t-)$, depends only on the most recently observed value of the response prior to $t$, $Y_i^O(t-) = Y_i(\max\{s: 0 \le s < t, dN_i(s) = 1\})$. This is a first-order Markov-type dependence, and it can be shown that the pseudo log-likelihood,

\[
\sum_{i=1}^{n} \int_0^{\tau} \log\{f(Y_i(t) \mid Y_i^O(t-), \bar{X}_i(t); \beta, \alpha)\} \, dN_i(t), \tag{2.12}
\]

yields consistent estimators of $(\beta, \alpha)$ under the following conditional independence assumption,

\[
dN_i(t) \perp Y_i(t) \mid Y_i^O(t-), \bar{X}_i(t). \tag{2.13}
\]

Assumption (2.13) is stronger than assumption (2.4) and is not verifiable from the data at hand (see Appendix) unless further parametric modeling assumptions are made [e.g. with additional modeling assumptions of the kind discussed in Diggle and Kenward, 1994, assumption (2.13) becomes weakly testable]. It implies that the conditional distribution of the outcome measurement process at time $t$, given the previous observed outcome, is the same for those individuals who were followed up at time $t$ as for those who were not. Thus, when assumption (2.13) holds, maximization of (2.12) yields pseudo maximum likelihood estimates (PMLEs), $(\hat{\beta}, \hat{\alpha})$, which are consistent (see Appendix) provided the true conditional distribution $f(Y_i(t) \mid Y_i^O(t-), \bar{X}_i(t); \beta, \alpha)$ has been correctly specified in (2.12), even though

\[
f(Y_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t); \beta, \alpha) \ne f(Y_i(t) \mid Y_i^O(t-), \bar{X}_i(t); \beta, \alpha).
\]
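The first-order special case can be written in closed form: with correlation $\alpha^{|u|}$, the linear approximation reduces to the marginal probability plus a single correction from the most recently observed response. The sketch below is illustrative only. The function names are assumptions, the sum runs over follow-up visits only (each has an earlier observed response), and the clipping of $\pi_i(t)$ is a numerical safeguard added here, not part of the method as described.

```python
import numpy as np


def first_order_prob(p_t, p_s, y_s, alpha, gap):
    """pi_i(t) given only the latest observed response y_s, using corr = alpha**|gap|."""
    rho = alpha ** abs(gap)
    # Ratio of standard deviations converts the correlation to a regression slope.
    scale = np.sqrt(p_t * (1.0 - p_t) / (p_s * (1.0 - p_s)))
    return p_t + rho * scale * (y_s - p_s)


def pseudo_loglik_subject(times, y, p, alpha, eps=1e-8):
    """One subject's contribution to (2.12), summed over follow-up visits.

    times, y, p : visit times, observed responses, and marginal probabilities
                  at those times (index 0 is the initial visit).
    """
    total = 0.0
    for j in range(1, len(times)):
        pi = first_order_prob(p[j], p[j - 1], y[j - 1], alpha, times[j] - times[j - 1])
        pi = min(max(pi, eps), 1.0 - eps)  # safeguard so both logarithms are finite
        total += y[j] * np.log(pi) + (1 - y[j]) * np.log(1.0 - pi)
    return float(total)
```

In a fit, `p` would be recomputed from (2.1) at each trial value of $\beta$, and the subject-level contributions would be summed and maximized over $(\beta, \alpha)$.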

When the true conditional distribution does satisfy the first-order Markov-type assumption,

\[
f(Y_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t); \beta, \alpha) = f(Y_i(t) \mid Y_i^O(t-), \bar{X}_i(t); \beta, \alpha),
\]

then the PMLE is the ML estimate of $(\beta, \alpha)$.

Next, for the more general case given by (2.10) and (2.11), we derive the pseudoscore vector denoted by

\[
s_p(\beta, \alpha) = \frac{\partial \ell_p}{\partial(\beta, \alpha)} = \sum_{i=1}^{n} s_{pi}(\beta, \alpha) = \sum_{i=1}^{n} \int_0^{\tau} D_i(t)' \phi_i(t)^{-1} \{Y_i(t) - \pi_i(t)\} \, dN_i(t),
\]

where $\pi_i(t) = \pi_i(t; \beta, \alpha) = E[Y_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t); \beta, \alpha]$ is specified by (2.11), $D_i(t) = D_i(t; \beta, \alpha) = \partial \pi_i(t) / \partial(\beta, \alpha)$, and $\phi_i(t) = \phi_i(t; \beta, \alpha) = \mathrm{Var}[Y_i(t) \mid \bar{Y}_i^O(t-), \bar{X}_i(t), \beta, \alpha] = \pi_i(t)(1 - \pi_i(t))$. We note that PL estimators can be derived from (2.10) and (2.11) under more general assumptions about the correlation structure, e.g. second-order antedependence. The PMLE $(\hat{\beta}, \hat{\alpha})$ is obtained as the solution to $s_p(\hat{\beta}, \hat{\alpha}) = 0$, e.g. using the Newton–Raphson algorithm. However, in general, the inverse of the pseudo-Fisher information matrix does not yield a valid estimate of the asymptotic variance. Instead, a so-called empirical or robust estimator of variance is required. The appropriate adjustment takes the usual form of the ‘sandwich estimator’,

\[
n^{1/2} [(\hat{\beta}, \hat{\alpha}) - (\beta, \alpha)] \xrightarrow{L} N(0, \Sigma),
\]

where

\[
\Sigma = \left\{ E\left[ \frac{1}{n} \frac{\partial s_p(\beta, \alpha)}{\partial(\beta, \alpha)} \right] \right\}^{-1} E\left[ \frac{1}{n} \sum_{i=1}^{n} s_{pi}(\beta, \alpha) \, s_{pi}(\beta, \alpha)' \right] \left\{ E\left[ \frac{1}{n} \frac{\partial s_p(\beta, \alpha)}{\partial(\beta, \alpha)} \right] \right\}^{-1}. \tag{2.14}
\]
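The sandwich form in (2.14) can be sketched generically as follows, assuming the per-subject pseudoscores and the derivative of the summed pseudoscore have already been evaluated at the estimate. The function name and inputs are illustrative assumptions. The second factor is written with a transpose, which coincides with (2.14) whenever the derivative matrix is symmetric.

```python
import numpy as np


def sandwich_covariance(scores, jacobian):
    """Robust covariance of the estimator, following the form of (2.14).

    scores   : (n, q) array; row i is subject i's pseudoscore at the estimate
    jacobian : (q, q) array; derivative of the summed pseudoscore with respect
               to the parameters, evaluated at the estimate
    Returns an estimate of Sigma / n, the covariance of the estimator itself.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    bread = np.linalg.inv(np.asarray(jacobian, dtype=float) / n)
    meat = scores.T @ scores / n  # empirical average of s_pi s_pi'
    sigma = bread @ meat @ bread.T
    return sigma / n
```

As a sanity check on a toy problem, estimating a single Bernoulli mean with score $y_i - \theta$ gives a derivative of $-n$, and the function returns $\sum_i (y_i - \bar{y})^2 / n^2$.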

The variance estimate is obtained by replacing $(\beta, \alpha)$ with $(\hat{\beta}, \hat{\alpha})$ in (2.14).

It must be recognized that the PL estimator of $\beta$ (and $\alpha$) is consistent only if it can be shown that $E[s_p(\beta, \alpha)] = 0$ at the true $\beta$ and $\alpha$. Under the assumption about the conditional distribution of the follow-up times given by (2.4), $s_p(\beta, \alpha)$ has mean equal to 0 at the true $\beta$ and $\alpha$ provided $\pi_i(t; \beta, \alpha)$ is a linear function of the previous responses. In the bivariate case, the linear approximation of $\pi_i(t)$ can be shown to be exact provided that the covariance matrix has been correctly specified. However, in the multivariate case this no longer holds and the adequacy of the approximation will depend on two factors: (i) how well $\pi_i(t)$ is approximated by a linear function of the previous responses and (ii) how closely $\mathrm{Cov}(Y_i(t), \bar{Y}_i^O(t-))$ approximates the true underlying covariance of the responses. We examine the adequacy of the approximation in Section 3 with studies of asymptotic and finite-sample bias.

Finally, we note that although the proposed PL estimator requires a Gaussian approximation of $\pi_i(t)$, $\beta$ and $\alpha$ are estimated jointly by maximizing (2.10). An alternative approach is to make a Gaussian approximation for $\pi_i(t)$ and, in addition, ‘decouple’ estimation of $\beta$ from the estimation of $\alpha$ (see, for example, Hand and Crowder, 1996). This is precisely the approach taken in the modified GEE based on Gaussian estimation, as suggested by Lipsitz et al. (1992), Lee et al. (1999, unpublished manuscript), and Lipsitz et al. (2000). The modified GEE uses the multivariate normal estimating equations for estimation of the correlation parameters, $\alpha$, combined with the standard GEE for the marginal mean regression parameters, $\beta$, thereby ‘decoupling’ estimation of $\beta$ from estimation of $\alpha$. The use of the multivariate normal estimating equations for $\alpha$ ensures that the estimated correlation matrix is non-negative definite, while avoiding full specification of the multinomial joint distribution of the responses. Lipsitz et al. (2000) have shown that this modified GEE yields little bias when there are missing data that are assumed to be missing at random. As mentioned earlier, if the follow-up time process is considered to be a form of selection, then there are very close parallels with ‘missing at random’ missing data mechanisms (Little and Rubin, 1987). In Sections 3 and 4, we explore the properties of the proposed PL estimator and compare it to the standard and modified (or ‘decoupled’) GEE estimators of the regression parameters in marginal models.

3. STUDIES OF BIAS

In this section, we consider the asymptotic and finite-sample bias resulting from the different methods when follow-up times are random and depend on previous outcomes. Our main concern is with the potential bias in estimating $\beta$. We consider the PL estimator and compare it to both the standard and modified (or ‘decoupled’) GEE estimators (Lipsitz et al., 2000), with possible bias arising because of the nature of the follow-up mechanism.

3.1 Study of asymptotic bias

We consider a two-group longitudinal design configuration with a binary response measured on four occasions. We assume that half of the individuals are assigned to each of the two groups, i.e. the group indicator variable, $x_i$, equals 0 or 1 with $\mathrm{pr}(x_i = 1) = 0.5$. The marginal model for the mean of $Y_i(t)$ is

\[
p_i(t) = E[Y_i(t) \mid x_i] = \frac{\exp[\beta_0 + \beta_1 x_i + \beta_2 t + \beta_3 x_i t]}{1 + \exp[\beta_0 + \beta_1 x_i + \beta_2 t + \beta_3 x_i t]}, \tag{3.15}
\]

and an autoregressive correlation is assumed,

\[
\rho_i(u, t) = \mathrm{Corr}[Y_i(t), Y_i(t - u) \mid x_i] = \alpha^{|u|}, \tag{3.16}
\]

for $\alpha = 0.2$ or $0.4$. It is assumed that each individual is seen four times, with the first time point set equal to $T_{i1} = 0$. A detailed description of the specification of the joint distribution of the responses and the follow-up times can be found as supplementary material available at Biostatistics online (www.biostatistics.oupjournals.org).

We specify the conditional distributions of the follow-up times as ‘first-order’, ‘second-order’, or ‘third-order’, depending on how many of the previous outcomes the follow-up time depends on. Finally, the distribution for the ‘measurement model’ is specified through a series of conditional distributions, given the past history of responses, via the Bahadur representation (Bahadur, 1961); the Bahadur representation of those distributions is specified such that (3.15) and (3.16) hold. We formulate two sets of conditional distributions. The first set is the ‘first-order’ dependence model, in which the distribution of the outcome at a given time depends only on the previous outcome; a Markov-type dependence. The second set is the ‘full Bahadur’ model, in which the distribution of the outcome at a given time depends on all previous outcomes. A more detailed description can be found as supplementary material available at Biostatistics online.

In the study of asymptotic bias, we fix $(\beta_0, \beta_1, \beta_2, \beta_3) = (1, -1, 1, -1)$ and set $\alpha = 0.2$ or $0.4$. Thus, we specify two models for the outcome distribution, ‘full Bahadur’ and ‘first-order’ dependence, and three models for the follow-up time distribution, ‘first-order’, ‘second-order’, and ‘third-order’. Letting $\hat{\beta}$ denote an estimate from one of the methods described in Section 2.3, $\hat{\beta} \xrightarrow{P} \beta^*$, where $\beta^*$ is not necessarily equal to $\beta$. Here we are primarily interested in assessing the asymptotic biases, $(\beta^* - \beta)$. Following Rotnitzky and Wypij (1994), the asymptotic bias can be ascertained by simply considering an artificial sample comprised of one suitably weighted observation for each possible realization of the outcomes and follow-up times. Then, we can solve for $\beta^*$ in the usual way, except that each pseudoindividual’s contribution to the estimating equations is weighted by its respective probability.

We examine the asymptotic bias of the various estimators of $\beta$, and explore how different specifications of the ‘working’ correlation with GEE can affect the asymptotic bias. The three ‘working’ correlation structures considered are ‘independence’ [$\rho_i(u, t) = 0$], ‘exchangeable’ [$\rho_i(u, t) = \rho$], and the correct model, ‘first-order autoregressive’. Table 2 displays the asymptotic bias for the group–time interaction effect ($\beta_3$), the effect usually of most interest in a longitudinal study. It can be seen that the PL estimator is unbiased under a ‘first-order’ dependence measurement model, regardless of the nature of the follow-up time process. The PL estimator has relatively small bias (