Analysis of clustered recurrent event data with application to hospitalization rates among renal failure patients



English Pages 16 Year 2005


Biostatistics (2005), 6, 3, pp. 404–419 doi:10.1093/biostatistics/kxi018 Advance Access publication on April 14, 2005

Analysis of clustered recurrent event data with application to hospitalization rates among renal failure patients

DOUGLAS E. SCHAUBEL∗
Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, USA
[email protected]

JIANWEN CAI
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7420, USA

SUMMARY

End-stage renal disease (commonly referred to as renal failure) is of increasing concern in the United States and many countries worldwide. Incidence rates have increased, while the supply of donor organs has not kept pace with the demand. Although renal transplantation has generally been shown to be superior to dialysis with respect to mortality, very little research has been directed towards comparing transplant and wait-list patients with respect to morbidity. Using national data from the Scientific Registry of Transplant Recipients, we compare transplant and wait-list hospitalization rates. Hospitalizations are subject to two levels of dependence. In addition to the dependence among within-patient events, patients are also clustered by listing center. We propose two marginal methods to analyze such clustered recurrent event data; the first model postulates a common baseline event rate, while the second features cluster-specific baseline rates. Our results indicate that kidney transplantation offers a significant decrease in hospitalization, but that the effect is negated by a waiting time (until transplant) of more than 2 years. Moreover, graft failure (GF) results in a significant increase in the hospitalization rate which is greatest in the first month post-GF, but remains significantly elevated up to 4 years later. We also compare results from the proposed models to those based on a frailty model, with the various methods compared and contrasted.

Keywords: Clustered failure time data; Frailty model; Proportional means model; Recurrent events; Semiparametric model; Transplant.

∗ To whom correspondence should be addressed.
Published by Oxford University Press 2005.

1. INTRODUCTION

End-stage renal disease (ESRD) is of increasing public health concern in the United States and many other countries worldwide. Patients who reach ESRD (commonly referred to as renal failure) must either undergo kidney transplantation or receive dialysis in order to remain alive. Although renal transplantation is generally the preferred therapeutic modality, patients usually begin on dialysis and many remain there due to the unavailability of donor organs. For the United States, although organ donation rates have increased, they have not nearly kept pace with the increase in ESRD incidence. As such, organs are in increasingly short supply, causing health care providers and public health officials to pay even closer attention to the distribution of a life-saving therapy. Indeed, the algorithms by which patients are sequenced on the transplant wait-list are under review for several organs. Correspondingly, quantifying the benefit of organ transplantation has recently received increased interest. Although renal transplantation has been definitively shown to be superior to dialysis with respect to patient survival (Wolfe et al., 1999; Rabbat et al., 2000; Ojo et al., 2001), few studies have compared morbidity between the two modalities. Therefore, our analytical objective was to compare hospitalization rates between renal transplant patients and comparable patients on dialysis.

In the comparison of renal transplantation and dialysis, many issues merit consideration. First, results from single-center studies are often not generalizable to the underlying population of interest, since patient characteristics and practice patterns may differ greatly among centers. Second, patients who receive a transplant are a select group since patients must generally be deemed medically suitable for transplantation to be placed on the transplant wait-list. In evaluating the risk or benefit associated with transplantation, it is preferable to compare transplanted patients to comparable patients on dialysis; hence, patients receiving dialysis, but not on the wait-list, should not be included. Few previous studies have had access to reliable wait-list data. Third, standard errors may be greatly underestimated if the clustering of patients within center is not accounted for in the analysis, which could lead to distorted inference and incorrect conclusions.
Using data obtained from the Scientific Registry of Transplant Recipients (SRTR) and collected by the Organ Procurement and Transplantation Network, a retrospective cohort study was conducted to compare hospitalization rates between transplanted renal failure patients and those on the wait-list. The study population consisted of all patients wait-listed in the United States during 1999. The data structure consists of two levels of clustering. Naturally, hospitalizations for a given patient will not likely be independent. In addition, patients are clustered with respect to listing center (i.e. facility at which they would, or did, receive a transplant). Patients with the same listing center are more similar than patients from different listing centers, with respect to unmeasured factors which could impact morbidity rates, such as baseline health, disease severity, access to health care, propensity to use health services, medical coverage, education level, and socioeconomic status. In addition to patient-specific factors, center-specific issues, such as quality of care and comorbidity management strategies, may impact patient morbidity. Since the estimation of center-specific differences is not the chief objective in the current investigation, a marginal approach is indicated and it is necessary to account for clustering in the analysis. Marginal models have great appeal for modeling clustered failure time data when the dependence structure is not of direct interest. Examples include the marginal hazard models proposed by Wei et al. (1989), the common baseline hazard model of Lee et al. (1992), and the mixed baseline hazard models of Spiekerman and Lin (1998) and Clegg et al. (1999). Several authors have advocated modeling covariate effects on the mean or rate function. Pepe and Cai (1993) developed semiparametric methods for modeling the rate function, wherein each numbered recurrent event has a distinct baseline rate. 
Lawless and Nadeau (1995) considered modeling the mean number of events, and developed the theory for the discrete-time case. Subsequently, Lin et al. (2000) developed the asymptotic theory through empirical processes for the continuous time setting. Cook and Lawless (2002) have provided a comprehensive review of methods for the analysis of recurrent events. Each of the above-listed methods assumes independence among individuals and, therefore, cannot be directly applied to studies with clustered subjects.

We propose two semiparametric methods for the analysis of clustered recurrent event data; i.e. multiple events per subject with subjects clustered. The event times may be right censored. The methods are based on marginal proportional rates models; one model features a baseline event rate which is common across clusters, while the second allows for cluster-specific baselines. Due to the frequency of clustered recurrent events in clinical and epidemiologic studies, the proposed methods are widely applicable. In addition to hospitalizations, examples include events such as infections, acute myocardial infarctions, and tumor metastases. Often, study subjects will be correlated, resulting in clustered data. For example, in familial studies, individuals within a family may be correlated due to shared genetic factors; in a childhood school asthma study, children from the same neighborhood may share certain environmental risk factors (e.g. air particulate levels); or in a multicenter study of technique failures among patients on dialysis, patients from the same center may be correlated due to center-specific characteristics with respect to practice patterns. Factors through which patients may be clustered are often unmeasured.

The remainder of this article is organized as follows. In Section 2, we introduce the models and methods for estimating their parameters. Asymptotic results are provided in Section 3, along with a summary of the finite-sample properties studied through simulation. In Section 4, we analyze the aforedescribed renal failure data using both proposed methods. For comparison, we also analyze the data using a frailty model. In addition, we compare results based on the proposed method with those from methods which incorrectly assume various levels of independence. In Section 5, a discussion of the study results and a comparison of the competing approaches to analyzing clustered recurrent event data are provided.

2. MODELS AND METHODS

We first establish the required notation. Let $n$ represent the number of independent clusters, with the number of subjects in cluster $j$ denoted by $n_j$. Let $N^*_{ij}(t)$ represent the number of events experienced by the $i$th subject from the $j$th cluster as of time $t$. We consider the following models:

$$E[dN^*_{ij}(s) \mid Z_{ij}(s)] = \exp\{\beta_0^T Z_{ij}(s)\}\, d\mu_0(s), \tag{2.1}$$

$$E[dN^*_{ij}(s) \mid Z_{ij}(s)] = \exp\{\beta_0^T Z_{ij}(s)\}\, d\mu_{0j}(s), \tag{2.2}$$

where $d\mu_0(t)$ and $d\mu_{0j}(t)$ are unspecified baseline rate functions, $\beta_0$ is a $p \times 1$ parameter vector, and $Z_{ij}(s)$ is a $p \times 1$ vector of possibly time-dependent covariates of interest. An alternative to model (2.1) could be a clustered data analog of the familiar Andersen–Gill (1982) model, which would have the same model equation as (2.1), but would contain the implicit assumption that

$$E[dN^*_{ij}(s) \mid \mathcal{F}_{ij}(s)] = E[dN^*_{ij}(s) \mid Z_{ij}(s)], \tag{2.3}$$

where $\mathcal{F}_{ij}(t)$ is a filtration containing the event and covariate history, $\mathcal{F}_{ij}(t) = \{N_{ij}(s-), Z_{ij}(s);\ s \in [0, t]\}$. Since the relationship between future events and the event history may be very complicated to specify through a covariate vector, provided a reasonably specifiable model even exists, (2.3) represents a strong and unverifiable assumption. Moreover, the marginal interpretation of the rate model parameters may be preferred over the conditional interpretation of the intensity model parameters, with respect to describing covariate effects. When $Z_{ij}(s) = Z_{ij}$, models (2.1) and (2.2) can be written as

$$E[N^*_{ij}(t) \mid Z_{ij}] = \exp\{\beta_0^T Z_{ij}\}\, \mu_0(t), \tag{2.4}$$

$$E[N^*_{ij}(t) \mid Z_{ij}] = \exp\{\beta_0^T Z_{ij}\}\, \mu_{0j}(t), \tag{2.5}$$

respectively. Models (2.1) and (2.2) are proportional rates models, while (2.4) and (2.5) are proportional means models, following terminology in Lin et al. (2000). The quantity $E[N^*_{ij}(t)]$ has the interpretation of a mean function if all time-dependent covariates are external in the sense of Kalbfleisch and Prentice (2002, pp. 196–200), e.g. if the covariate pathways are known at baseline, either because $Z_{ij}(s) = Z_{ij}$ or because any time-dependent elements vary in a way which is known at time 0. Since models (2.4)–(2.5) are special cases of (2.1)–(2.2), respectively, we hereafter focus on the latter.
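The step from the rate model (2.1) to the mean model (2.4) can be written out explicitly. The following is a short worked derivation, assuming time-fixed covariates and the usual convention $\mu_0(0) = 0$ (the convention is an assumption made here for illustration):

$$E[N^*_{ij}(t) \mid Z_{ij}] = \int_0^t E[dN^*_{ij}(s) \mid Z_{ij}] = \int_0^t \exp\{\beta_0^T Z_{ij}\}\, d\mu_0(s) = \exp\{\beta_0^T Z_{ij}\}\, \mu_0(t).$$

Under this form, the ratio of mean functions for two covariate values $z_1$ and $z_2$ equals $\exp\{\beta_0^T (z_1 - z_2)\}$ at every $t$, which is the sense in which the means are proportional. The same argument with $\mu_{0j}$ in place of $\mu_0$ gives (2.5) from (2.2).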


Since the study is of finite duration, events will be censored. Let the observed number of events for subject $i$ from cluster $j$ be represented by $N_{ij}(t) = N^*_{ij}(t \wedge C_{ij})$, where $C_{ij}$ denotes censoring time. It is assumed that the censoring mechanism is independent in the sense that $E[dN^*_{ij}(s) \mid Z_{ij}(s), C_{ij} > s] = E[dN^*_{ij}(s) \mid Z_{ij}(s)]$. We first consider model (2.1). To derive estimation methods for the model parameters, let

$$M^*_{ij}(t; \beta) = N^*_{ij}(t) - \int_0^t \exp\{\beta^T Z_{ij}(s)\}\, d\mu_0(s). \tag{2.6}$$

Under the assumed model, $M^*_{ij}(t; \beta)$ does not have the familiar martingale structure, since the model specifies the rate, not the intensity or hazard function. Clearly, the quantity in (2.6) has expectation 0 at $\beta_0$, since $E[dM^*_{ij}(t; \beta_0) \mid Z_{ij}(t)] = 0$ under the assumed model. We define the observed analog of (2.6),

$$M_{ij}(t; \beta) = N_{ij}(t) - \int_0^t Y_{ij}(s) \exp\{\beta^T Z_{ij}(s)\}\, d\mu_0(s), \tag{2.7}$$

where $Y_{ij}(s) = I(C_{ij} > s)$ and $I(A)$ takes the value 1 when event $A$ occurs and 0 otherwise. Under the independent censoring assumption, $E[dM_{ij}(t; \beta_0) \mid Z_{ij}(t)] = 0$, suggesting the following estimating equations,

$$\sum_{j=1}^{n} \sum_{i=1}^{n_j} \int_0^{\tau} Z_{ij}(s)\, dM_{ij}(s; \beta) = 0_{p \times 1}, \tag{2.8}$$

$$\sum_{j=1}^{n} \sum_{i=1}^{n_j} \int_0^{t} dM_{ij}(s; \beta) = 0, \tag{2.9}$$

for $\beta_0$ and $\mu_0(t)$, respectively, where $0_{p \times 1}$ is a $p \times 1$ vector of zeros and $\tau$ satisfies $P(C_{ij} > \tau) > 0$ and is typically set to $\max_{i,j}\{C_{ij}\}$ such that all observed data contribute to the estimation procedure. Based on (2.9), for fixed $\beta$, an estimator of the baseline mean is given by

$$\hat\mu_0(t; \beta) = n^{-1} \sum_{j=1}^{n} \sum_{i=1}^{n_j} \int_0^{t} S^{(0)}(s; \beta)^{-1}\, dN_{ij}(s), \tag{2.10}$$

where $S^{(d)}(s; \beta) = n^{-1} \sum_{j=1}^{n} \sum_{i=1}^{n_j} Y_{ij}(s) Z_{ij}(s)^{\otimes d} \exp\{\beta^T Z_{ij}(s)\}$ for $d = 0, 1, 2$, where, for a vector $a$, $a^{\otimes 0} = 1$, $a^{\otimes 1} = a$, $a^{\otimes 2} = a a^T$. Substituting (2.10) into (2.8), we obtain an estimating equation for $\beta_0$, $U_c(\beta) = 0_{p \times 1}$, where

$$U_c(\beta) = \sum_{j=1}^{n} \sum_{i=1}^{n_j} \int_0^{\tau} \{Z_{ij}(s) - E(s; \beta)\}\, dN_{ij}(s), \tag{2.11}$$

with $E(s; \beta) = S^{(1)}(s; \beta) / S^{(0)}(s; \beta)$. Let $\hat\beta_c$ denote the solution to $U_c(\beta) = 0_{p \times 1}$; the baseline mean function can be estimated by $\hat\mu_0(t; \hat\beta_c)$.

For model (2.2), since the baseline rate functions are cluster-specific, a given $\mu_{0j}(t)$ can only be estimated using information from the $j$th cluster. Since subjects within a cluster are correlated, it is not possible to estimate $\mu_{0j}(t)$ consistently. That is, we are not assuming that subjects within a cluster are independent, even after allowing for distinct cluster-specific baseline rates. Reasoning similar to that which leads to (2.11) yields the following estimating function for $\beta_0$:

$$U_d(\beta) = \sum_{j=1}^{n} \sum_{i=1}^{n_j} \int_0^{\tau} \{Z_{ij}(s) - E_j(s; \beta)\}\, dN_{ij}(s), \tag{2.12}$$

where $E_j(s; \beta) = S_j^{(1)}(s; \beta) / S_j^{(0)}(s; \beta)$, with $S_j^{(d)}(s; \beta) = n_j^{-1} \sum_{i=1}^{n_j} Y_{ij}(s) Z_{ij}(s)^{\otimes d} \exp\{\beta^T Z_{ij}(s)\}$ for $d = 0, 1, 2$. We denote the solution to $U_d(\beta) = 0_{p \times 1}$ by $\hat\beta_d$.
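The estimating functions (2.11) and (2.12) and the baseline mean estimator (2.10) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes time-fixed covariates, no tied event times, and treats a subject as at risk through the censoring time inclusive; the function names (`score_info`, `fit`, `baseline_mean`) and the toy data are hypothetical. Passing `cluster=None` pools all subjects in each risk set (common baseline), while passing cluster labels restricts each risk set to the event's own cluster (cluster-specific baselines).

```python
import numpy as np

def score_info(beta, Z, C, ev_time, ev_subj, cluster=None):
    """Return U(beta) and the summed V matrix used for Newton steps."""
    p = Z.shape[1]
    U, A = np.zeros(p), np.zeros((p, p))
    w = np.exp(Z @ beta)                      # exp(beta' Z) per subject
    for t, i in zip(ev_time, ev_subj):
        risk = C >= t                         # assumption: at risk through C inclusive
        if cluster is not None:               # model (2.2): same-cluster risk set only
            risk = risk & (cluster == cluster[i])
        s0 = w[risk].sum()
        s1 = Z[risk].T @ w[risk]
        s2 = (Z[risk] * w[risk, None]).T @ Z[risk]
        e = s1 / s0                           # E(t; beta); the n^{-1} factors cancel
        U += Z[i] - e
        A += s2 / s0 - np.outer(e, e)
    return U, A

def fit(Z, C, ev_time, ev_subj, cluster=None, n_iter=25):
    """Solve U(beta) = 0 by Newton iterations started at beta = 0."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        U, A = score_info(beta, Z, C, ev_time, ev_subj, cluster)
        beta = beta + np.linalg.solve(A, U)
    return beta

def baseline_mean(t, beta, Z, C, ev_time):
    """Estimator (2.10): sum over events up to t of 1 / sum_risk exp(beta' Z)."""
    w = np.exp(Z @ beta)
    return sum(1.0 / w[C >= s].sum() for s in ev_time if s <= t)

# Toy data (hypothetical): six subjects, one binary covariate, seven events.
Z = np.array([[1.0], [0.0], [1.0], [0.0], [1.0], [0.0]])
C = np.array([5.0, 5.0, 4.0, 4.0, 3.0, 3.0])
ev_time = np.array([0.5, 1.0, 1.2, 1.5, 2.0, 2.5, 3.5])
ev_subj = np.array([2, 0, 5, 1, 0, 3, 2])

beta_hat = fit(Z, C, ev_time, ev_subj)
mu_hat = baseline_mean(5.0, beta_hat, Z, C, ev_time)
```

In this toy configuration every risk set is balanced between the two covariate groups, so the estimating equation reduces to $4 - 7 e^{\beta}/(1 + e^{\beta}) = 0$, giving $\hat\beta = \log(4/3)$ and $\hat\mu_0(5; \hat\beta) = 15/14$. Point estimates obtained this way ignore clustering entirely; the dependence enters only through the variance estimators of Section 3.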

3. ASYMPTOTIC AND FINITE-SAMPLE PROPERTIES

We summarize the essential asymptotic behavior of the regression parameter estimator for the common baseline model in the following theorem.

THEOREM 1 Under certain regularity conditions, $\hat\beta_c$ converges to $\beta_0$ almost surely and $n^{1/2}(\hat\beta_c - \beta_0)$ is asymptotically normally distributed with mean $0_{p \times 1}$ and a covariance matrix which can be consistently estimated by $\hat A_c^{-1} \hat\Sigma_c(\hat\beta_c) \hat A_c^{-1}$, where

$$\hat A_c = \int_0^{\tau} V(s; \hat\beta_c)\, S^{(0)}(s; \hat\beta_c)\, d\hat\mu_0(s; \hat\beta_c),$$

$$V(s; \beta) = \frac{S^{(2)}(s; \beta)}{S^{(0)}(s; \beta)} - E(s; \beta)^{\otimes 2},$$

$$\hat\Sigma_c(\beta) = n^{-1} \sum_{j=1}^{n} \hat U^c_j(\beta)^{\otimes 2},$$

$$\hat U^c_j(\beta) = \sum_{i=1}^{n_j} \int_0^{\tau} \{Z_{ij}(s) - E(s; \beta)\}\, d\hat M_{ij}(s; \beta),$$

$$\hat M_{ij}(t; \beta) = N_{ij}(t) - \int_0^{t} Y_{ij}(s) \exp\{\beta^T Z_{ij}(s)\}\, d\hat\mu_0(s; \beta).$$

The consistency proof in Theorem 1 essentially follows several applications of the Strong Law of Large Numbers, followed by results from convex function theory. The proof of asymptotic normality combines the Multivariate Central Limit Theorem and properties from empirical processes.

We next consider the baseline mean estimator for model (2.1) as a process over $[0, \tau]$. Define $\hat\phi_c(t) = \{\hat\mu_0(t; \hat\beta_c) - \mu_0(t)\}$. We describe the asymptotic behavior of $\{\hat\phi_c(t);\ t \in [0, \tau]\}$ by the following theorem.

THEOREM 2 Under certain regularity conditions, $\hat\phi_c(t)$ converges almost surely to 0, uniformly in $t \in [0, \tau]$; in addition, $n^{1/2} \hat\phi_c(t)$ converges weakly to a zero-mean Gaussian process with a covariance function which can be estimated consistently by $\hat\sigma_c(s, t)$, where

$$\hat\sigma_c(s, t) = n^{-1} \sum_{j=1}^{n} \hat\xi_j(s)\, \hat\xi_j(t),$$

$$\hat\xi_j(t) = \hat h(t)^T \hat A_c^{-1} \hat U^c_j(\hat\beta_c) + \sum_{i=1}^{n_j} \int_0^{t} S^{(0)}(s; \hat\beta_c)^{-1}\, d\hat M_{ij}(s; \hat\beta_c),$$

with $\hat h(t) = -\int_0^{t} E(s; \hat\beta_c)\, d\hat\mu_0(s; \hat\beta_c)$.


With respect to model (2.2), the asymptotic properties of $\hat\beta_d$ are summarized in Theorem 3.

THEOREM 3 Under certain regularity conditions, $\hat\beta_d$ converges to $\beta_0$ almost surely, and $n^{1/2}(\hat\beta_d - \beta_0)$ converges in distribution to a mean-zero normal random vector with covariance consistently estimated by $\hat A_d^{-1} \hat\Sigma_d(\hat\beta_d) \hat A_d^{-1}$, where

$$\hat A_d = n^{-1} \sum_{j=1}^{n} \sum_{i=1}^{n_j} \int_0^{\tau} V_j(s; \hat\beta_d)\, dN_{ij}(s), \tag{3.1}$$

with $V_j(s; \beta) = S_j^{(2)}(s; \beta) / S_j^{(0)}(s; \beta) - E_j(s; \beta)^{\otimes 2}$ and

$$\hat\Sigma_d(\beta) = n^{-1} \sum_{j=1}^{n} \hat U^d_j(\beta)^{\otimes 2},$$

$$\hat U^d_j(\beta) = \sum_{i=1}^{n_j} \int_0^{\tau} \{Z_{ij}(s) - E_j(s; \beta)\}\, d\hat M_{ij}(s; \beta).$$

Hence, under the setup we consider, a robust variance estimator is required for (2.2). The allowance for cluster-specific baseline rates does not accommodate the lack of independence among subjects within a cluster. As an example, if there is an unmeasured cluster-specific covariate (say, $R_j$) which affects the event rate but is independent of a covariate, $Z_{ij}$, then the regression parameter corresponding to $Z_{ij}$ can be estimated consistently even though $R_j$ is not included in the model; however, a variance estimator which assumed independence of subjects would be inconsistent.

Both $\hat\beta_c$ and $\hat\beta_d$ were found to perform well in finite samples. Both were approximately unbiased for all data configurations examined. When $\mu_{0j}(t) \neq \mu_0(t)$, $\hat\beta_c$ remained unbiased when the covariate distribution was equal across clusters; if the covariate distribution was cluster-dependent, then $\hat\beta_c$ was considerably biased. Even with a small number of clusters ($n = 50$) and a small number of subjects per cluster ($n_j = 5$), $\hat\beta_d$ was virtually unbiased. The variance estimators, for both methods, were reasonably accurate, although a tendency to underestimate their empirical counterparts was observed. Empirical significance levels, for testing $H_0: \beta_0 = 0$, were shown to be greater than the nominal 0.05 when based on a variance estimator which incorrectly assumed within-subject and/or within-cluster independence, with the degree of inaccuracy increasing as the correlation and the number of subjects per cluster increased. Overall, the simulation study revealed that the asymptotic approximations were accurate even in moderate-sized samples and, given the large number of clusters, could be expected to be accurate for application to the renal failure data.

4. ANALYSIS OF RENAL FAILURE DATA

We applied the proposed model and methods to the analysis of hospitalizations among renal failure patients. For the purpose of comparison, we also fitted a frailty model. The study population consisted of 15 784 U.S. patients wait-listed for transplantation between January 1, 1999, and December 31, 1999. Patients were clustered by listing center (i.e. facility at which they were transplanted and/or wait-listed), since listing center is strongly associated with residence, which is correlated with several unmeasured factors (e.g. baseline health, socioeconomic status, propensity to use health services) which could affect hospitalization rates. In total, there were 240 listing centers; cluster size ranged from 3 to 644 patients, with a mean of 65.8.


Patients began follow-up at the date of initial wait-listing, and they were followed until the earliest of death, loss to follow-up, or the conclusion of the observation period, which was December 31, 2002. Patients who received a transplant without appearing on the wait-list began follow-up at the date of transplantation. In total, 29 353 hospitalizations were observed, for a mean of approximately 1.9 per patient. The number of hospitalizations per patient ranged from 0 to 56. There were 7773 (49%) patients who received a kidney transplant, 5380 (69%) of which were cadaveric. Graft failure (GF) was experienced by 451 (6%) of the transplanted patients. Although there is no universally applied definition of GF in the SRTR database, GF is generally considered to occur when the transplanted kidney fails to function sufficiently to preserve life, necessitating a return to dialysis. Approximately 44% of transplanted patients waited more than 24 months on dialysis, prior to receiving a transplant. There were 2894 deaths observed among the cohort members.

In terms of morbidity and mortality, the joint experience of the cohort, at any point in time, can be expressed as the product of the probability of survival and the conditional rate of hospitalization given survival. Although it has already been shown that kidney transplant patients have reduced mortality relative to wait-listed patients, the effect of transplantation on morbidity has not been well studied, mostly due to lack of pertinent data. The fact that we wish to describe the hospitalization rate among patients currently alive suggests the following proportional rates model:

$$E[dN^*_{ij}(t) \mid t < D_{ij}, Z_{ij}(t)] = \exp\{\beta_0^T Z_{ij}(t)\}\, d\mu_0(t), \tag{4.1}$$

where $D_{ij}$ is the time of death for patient $i$ in center $j$. Model (4.1) represents a recurrent event analog of the cause-specific hazards model in the study of competing risks and has been discussed by Cook and Lawless (1997) and briefly in the discussion of Lin et al. (2000). We also fitted the cluster-specific baseline rate model:

$$E[dN^*_{ij}(t) \mid t < D_{ij}, Z_{ij}(t)] = \exp\{\beta_0^T Z_{ij}(t)\}\, d\mu_{0j}(t). \tag{4.2}$$

In fitting models (4.1) and (4.2), we assume an independent censoring mechanism under which $E[dN^*_{ij}(s) \mid Z_{ij}(s), s < D_{ij} \wedge C_{ij}] = E[dN^*_{ij}(s) \mid Z_{ij}(s), s < D_{ij}]$. The risk set indicators for models (4.1) and (4.2) are $Y_{ij}(s) = I(C_{ij} \wedge D_{ij} > s)$.

Transplantation was represented as a time-dependent covariate, as was GF. There are three levels: wait-list, functioning transplant, and GF, with wait-list serving as the reference to which transplant (i.e. functioning transplant) and GF are compared. The effects of transplantation and GF were allowed to vary for different periods posttransplant and post-GF, respectively. Functioning transplants were classified based on donor source (living or cadaveric) and duration of dialysis prior to transplant. For model (4.1), adjustment covariates included age, gender, race, underlying disease leading to renal failure, and region. Adjustment covariates were the same for model (4.2), except that region was not included, since listing centers are nested within regions. For each model, ignoring the adjustment covariates, the linear predictor can be expressed as

$$\begin{aligned}
\beta^T Z(t) = {} & \mathrm{FT}(t)\, \{ \beta_1 \mathrm{LIV}(t) + \beta_2 \mathrm{CAD}(t) + \beta_3 I\{(T_X - T_W) < 6\} + \beta_4 I\{6 \le (T_X - T_W) < 12\} \\
& + \beta_5 I\{24 \le (T_X - T_W) < 48\} + \beta_6 I\{(t - T_X) < 1\} + \beta_7 I\{1 \le (t - T_X) < 12\} + \beta_8 I\{12 \le (t - T_X) < 24\} \} \\
& + \mathrm{GF}(t)\, \{ \beta_9 I\{(t - T_{GF}) < 1\} + \beta_{10} I\{1 \le (t - T_{GF}) < 48\} \},
\end{aligned} \tag{4.3}$$

where $T_W$, $T_X$, and $T_{GF}$ denote the times of wait-listing, transplant, and GF, respectively; $\mathrm{FT}(t)$, $\mathrm{LIV}(t)$, $\mathrm{CAD}(t)$, and $\mathrm{GF}(t)$ are indicator covariates for functioning transplant, living-donor transplant, cadaveric transplant, and GF, respectively, as of time $t$.
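The coding of the time-dependent covariates in (4.3) can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' code: all times are taken to be in months on a common scale, a transplant or GF occurring exactly at time t is counted as already in effect at t, and the function name `covariates_43` and its arguments are hypothetical.

```python
def covariates_43(t, t_w, t_x=None, t_gf=None, donor=None):
    """Return the 10 indicator covariates multiplying beta_1..beta_10 in (4.3).

    t     : current time (months)
    t_w   : time of wait-listing
    t_x   : time of transplant, or None if never transplanted
    t_gf  : time of graft failure, or None if no graft failure
    donor : "living" or "cadaveric" (only used once transplanted)
    """
    z = [0] * 10
    gf = t_gf is not None and t >= t_gf            # GF(t)
    ft = t_x is not None and t >= t_x and not gf   # FT(t): functioning transplant
    if ft:
        wait = t_x - t_w        # pretransplant time since wait-listing
        since_tx = t - t_x      # time since transplant
        z[0] = int(donor == "living")
        z[1] = int(donor == "cadaveric")
        z[2] = int(wait < 6)
        z[3] = int(6 <= wait < 12)
        z[4] = int(24 <= wait < 48)
        z[5] = int(since_tx < 1)
        z[6] = int(1 <= since_tx < 12)
        z[7] = int(12 <= since_tx < 24)
    if gf:
        since_gf = t - t_gf     # time since graft failure
        z[8] = int(since_gf < 1)
        z[9] = int(1 <= since_gf < 48)
    return z

# Hypothetical patient: wait-listed at month 0, cadaveric transplant at
# month 6, graft failure at month 12.
on_waitlist = covariates_43(3, 0, 6, 12, "cadaveric")
early_post_tx = covariates_43(6.5, 0, 6, 12, "cadaveric")
early_post_gf = covariates_43(12.5, 0, 6, 12, "cadaveric")
```

For this patient, every indicator is zero while on the wait-list (the reference level); shortly after transplant the cadaveric, 6-to-12-month wait, and first-month-posttransplant indicators are switched on; and after GF only the post-GF indicators remain. The 12-to-24-month wait and the period beyond 24 months posttransplant have no indicator of their own, so they act as the reference levels within the transplant block.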


The chief objective was to compare hospitalization rates between transplanted patients and patients on the wait-list. Since transplants are not assigned through randomization, we are examining the difference in the hospitalization rate associated with transplantation, as opposed to estimating a causal effect. Moreover, since model (4.3) includes time-dependent covariates for GF, the transplant parameters represent the effect of transplantation, given that the transplanted organ continues to function, as opposed to simply the effect of transplantation. To measure the latter, the GF covariates would be removed. In addition, among those transplanted, it is of interest to examine the effect of time on dialysis, since this is a modifiable risk factor. That is, time on dialysis would decrease, particularly among cadaveric transplant patients, if organ donation rates were increased.

Parameter estimates based on the common baseline model are presented in Table 1, for all covariates of interest. For the time interval beyond 24 months posttransplant, for patients who received a cadaveric transplant after 12–24 months of dialysis, functioning transplant is associated with significantly decreased hospitalization rates, relative to wait-list, with estimated rate ratio (RR) of exp{−0.429} = 0.58 and [...] 24 months was estimated at RR = 0.58 × 1.50 = 0.98.

Table 1. Analysis of hospitalization rates among renal failure patients: common baseline model

Covariate                        $\hat\beta_c$   SE($\hat\beta_c$)   $\exp\{\hat\beta_c\}$   95% CI for $\exp\{\beta_c\}$
Wait-list (reference)            -               -                   1                       -
Living-donor transplant          −0.551          0.085               0.49                    (0.42, 0.57)
Cadaveric transplant             −0.429          0.078               0.58                    (0.49, 0.68)
Pretransplant time on dialysis   [...]

The effect of transplant depends strongly on time posttransplant. For example, for patients who receive a cadaveric transplant after 12–24 months on dialysis, the RR equals 5.12 × 0.58 = 2.97 during the first month posttransplant, 1.58 × 0.58 = 0.92 during posttransplant months 1–12, and 0.58 × 0.98 = 0.57 during months 12–24. GF was associated with a dramatic increase in hospitalization rates, the effect being concentrated in the first month post-GF (RR = 4.80 relative to wait-list) but sustained through the remainder of the post-GF follow-up (RR = 1.60 for months 1–48 post-GF). Among the five nonreferent regions, two demonstrated significantly increased hospitalization rates relative to Region ‘A’, arbitrarily selected as the reference, specifically, Region ‘B’ (RR = 1.16) and Region ‘F’ (RR = 1.17). The United Network for Organ Sharing (UNOS) has defined 11 regions in the United States. The regions defined in this analysis represent groupings of those defined by UNOS, based on geographic proximity.

Results based on the cluster-specific baseline rate model (Table 2) were very similar to those of the common baseline model. There was little difference in the regression parameter estimates and the estimated standard errors were approximately equal. As stated, region effects are not identifiable in this model since region is constant within any center.

Plots of the conditional cumulative rate are presented in Figure 1 for three hypothetical patients: one patient who remains on the wait-list (solid line); one who is transplanted (cadaveric organ) 6 months after being put on the wait-list (dashed line); and a third who is transplanted (cadaveric organ) at 6 months post-wait-listing and experiences GF 6 months posttransplant (dotted line). All three patients are at the reference category of the adjustment covariates, i.e. age 50–59, male, Caucasian, nondiabetic, from Region ‘A’.
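The period-specific rate ratios quoted for cadaveric transplant recipients arise because the linear predictor is additive on the log scale, so rate ratios for simultaneously active indicators multiply. A minimal sketch of that arithmetic, using only the three posttransplant products stated in the text (the helper name `combined_rr` is hypothetical):

```python
def combined_rr(*factors):
    """Multiply rate ratios for indicators that are active at the same time."""
    rr = 1.0
    for f in factors:
        rr *= f
    return rr

# Cadaveric transplant after 12-24 months on dialysis (base RR 0.58),
# combined with the period-specific factors quoted in the text.
rr_first_month = round(combined_rr(5.12, 0.58), 2)    # first month posttransplant
rr_months_1_12 = round(combined_rr(1.58, 0.58), 2)    # posttransplant months 1-12
rr_months_12_24 = round(combined_rr(0.58, 0.98), 2)   # posttransplant months 12-24
```

The three rounded products reproduce the values 2.97, 0.92, and 0.57 reported for the respective periods.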
The plot is of an estimator of the conditional cumulative rate function, $\int_0^t E[dN^*_{ij}(s) \mid s < D_{ij}]$, representing the average experience of such patients, conditional on their survival.

In Table 3, we compare three variance estimates, each based on different independence assumptions. The first variance estimator is based on the assumption that hospitalizations are independent, within-patient and between patients within the same center. For the cluster-specific baseline model, this is given by

$$\hat A_d^{-1} \left( \sum_{j=1}^{n} \sum_{i=1}^{n_j} N_{ij}(\tau) \right)^{-1} \sum_{j=1}^{n} \sum_{i=1}^{n_j} \int_0^{\tau} \left\{ \{Z_{ij}(s) - E_j(s; \hat\beta_d)\}\, d\hat M_{ij}(s; \hat\beta_d) \right\}^{\otimes 2} \hat A_d^{-1}. \tag{4.4}$$

The second variance estimator accounts for the dependence of events within-patient, but not that among patients within-cluster:

$$\hat A_d^{-1} \left( \sum_{j=1}^{n} n_j \right)^{-1} \sum_{j=1}^{n} \sum_{i=1}^{n_j} \left\{ \int_0^{\tau} \{Z_{ij}(s) - E_j(s; \hat\beta_d)\}\, d\hat M_{ij}(s; \hat\beta_d) \right\}^{\otimes 2} \hat A_d^{-1}, \tag{4.5}$$

which is the estimator proposed by Lin et al. (2000), but intended for independent subjects. Finally, the third variance estimate is computed in accordance with the proposed method, and accounts for within-patient and within-cluster dependence. Without exception, the variance estimates decrease in magnitude, as a function of the degree of independence assumed. Noteworthy differences are observed between the proposed variance estimates and those which assume various degrees of independence. Results were very similar when the common baseline analogs of (4.4) and (4.5) were computed (data not shown).
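The three variance estimators differ only in the level at which score-type contributions are summed before the outer product is taken: per event, per patient, or per cluster. The sketch below illustrates that idea on toy numbers; it is not the authors' computation, it omits the $\hat A_d^{-1}$ factors and normalizing constants, and the function name `meat` and the contribution values are hypothetical.

```python
import numpy as np

def meat(contribs, groups):
    """Sum within each group, then add up the outer products of the group sums."""
    contribs = np.asarray(contribs, dtype=float)
    groups = np.asarray(groups)
    p = contribs.shape[1]
    total = np.zeros((p, p))
    for g in np.unique(groups):
        u = contribs[groups == g].sum(axis=0)
        total += np.outer(u, u)
    return total

# Toy event-level contributions (scalar covariate): six events from four
# patients in two clusters, positively correlated within patient and cluster.
event_contrib = np.array([[0.5], [0.5], [1.0], [-0.5], [-0.5], [-1.0]])
event_id = np.arange(6)                  # every event treated as independent
patient_id = np.array([0, 0, 1, 2, 2, 3])
cluster_id = np.array([0, 0, 0, 1, 1, 1])

by_event = meat(event_contrib, event_id)[0, 0]
by_patient = meat(event_contrib, patient_id)[0, 0]
by_cluster = meat(event_contrib, cluster_id)[0, 0]
```

With these toy values the middle term grows from 3.0 (events independent) to 4.0 (patients independent) to 8.0 (clusters independent), mirroring the ordering described above: when contributions are positively correlated within patient and within cluster, assuming more independence yields a smaller variance estimate.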


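Table 2. Analysis of hospitalization rates among renal failure patients: distinct baseline model

Covariate                        $\hat\beta_d$   SE($\hat\beta_d$)   $\exp\{\hat\beta_d\}$   95% CI for $\exp\{\beta_d\}$
Wait-list (reference)            -               -                   1                       -
Living-donor transplant          −0.544          0.083               0.58                    (0.49, 0.68)
Cadaveric transplant             −0.440          0.078               0.64                    (0.55, 0.75)
Pretransplant time on dialysis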