SMALL AREA ESTIMATION Second Edition
J.N.K. RAO AND ISABEL MOLINA Wiley Series in Survey Methodology
Copyright © 2015 by John Wiley & Sons, Inc. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
Library of Congress Cataloging-in-Publication Data: Rao, J. N. K., 1937- author. Small area estimation / J.N.K. Rao and Isabel Molina. – Second edition. pages cm – (Wiley series in survey methodology) Includes bibliographical references and index. ISBN 978-1-118-73578-7 (cloth) 1. Small area statistics. 2. Sampling (Statistics) 3. Estimation theory. I. Molina, Isabel, 1975- author. II. Title. III. Series: Wiley series in survey methodology. QA276.6.R344 2015 519.5′ 2–dc23 2015012610 Printed in the United States of America
CONTENTS
List of Figures, xv
List of Tables, xvii
Foreword to the First Edition, xix
Preface to the Second Edition, xxiii
Preface to the First Edition, xxvii

1 *Introduction, 1
   1.1 What is a Small Area?, 1
   1.2 Demand for Small Area Statistics, 3
   1.3 Traditional Indirect Estimators, 4
   1.4 Small Area Models, 4
   1.5 Model-Based Estimation, 5
   1.6 Some Examples, 6
       1.6.1 Health, 6
       1.6.2 Agriculture, 7
       1.6.3 Income for Small Places, 8
       1.6.4 Poverty Counts, 8
       1.6.5 Median Income of Four-Person Families, 8
       1.6.6 Poverty Mapping, 8

2 Direct Domain Estimation, 9
   2.1 Introduction, 9
   2.2 Design-Based Approach, 10
   2.3 Estimation of Totals, 11
       2.3.1 Design-Unbiased Estimator, 11
       2.3.2 Generalized Regression Estimator, 13
   2.4 Domain Estimation, 16
       2.4.1 Case of No Auxiliary Information, 16
       2.4.2 GREG Domain Estimation, 17
       2.4.3 Domain-Specific Auxiliary Information, 18
   2.5 Modified GREG Estimator, 21
   2.6 Design Issues, 23
       2.6.1 Minimization of Clustering, 24
       2.6.2 Stratification, 24
       2.6.3 Sample Allocation, 24
       2.6.4 Integration of Surveys, 25
       2.6.5 Dual-Frame Surveys, 25
       2.6.6 Repeated Surveys, 26
   2.7 *Optimal Sample Allocation for Planned Domains, 26
       2.7.1 Case (i), 26
       2.7.2 Case (ii), 29
       2.7.3 Two-Way Stratification: Balanced Sampling, 31
   2.8 Proofs, 32
       2.8.1 Proof of Ŷ_GR(x) = X, 32
       2.8.2 Derivation of Calibration Weights w*_j, 32
       2.8.3 Proof of Ŷ = X̂ᵀB̂ when c_j = νᵀx_j, 32

3 Indirect Domain Estimation, 35
   3.1 Introduction, 35
   3.2 Synthetic Estimation, 36
       3.2.1 No Auxiliary Information, 36
       3.2.2 *Area Level Auxiliary Information, 36
       3.2.3 *Unit Level Auxiliary Information, 37
       3.2.4 Regression-Adjusted Synthetic Estimator, 42
       3.2.5 Estimation of MSE, 43
       3.2.6 Structure Preserving Estimation, 45
       3.2.7 *Generalized SPREE, 49
       3.2.8 *Weight-Sharing Methods, 53
   3.3 Composite Estimation, 57
       3.3.1 Optimal Estimator, 57
       3.3.2 Sample-Size-Dependent Estimators, 59
   3.4 James–Stein Method, 63
       3.4.1 Common Weight, 63
       3.4.2 Equal Variances ψ_i = ψ, 64
       3.4.3 Estimation of Component MSE, 68
       3.4.4 Unequal Variances ψ_i, 70
       3.4.5 Extensions, 71
   3.5 Proofs, 71

4 Small Area Models, 75
   4.1 Introduction, 75
   4.2 Basic Area Level Model, 76
   4.3 Basic Unit Level Model, 78
   4.4 Extensions: Area Level Models, 81
       4.4.1 Multivariate Fay–Herriot Model, 81
       4.4.2 Model with Correlated Sampling Errors, 82
       4.4.3 Time Series and Cross-Sectional Models, 83
       4.4.4 *Spatial Models, 86
       4.4.5 Two-Fold Subarea Level Models, 88
   4.5 Extensions: Unit Level Models, 88
       4.5.1 Multivariate Nested Error Regression Model, 88
       4.5.2 Two-Fold Nested Error Regression Model, 89
       4.5.3 Two-Level Model, 90
       4.5.4 General Linear Mixed Model, 91
   4.6 Generalized Linear Mixed Models, 92
       4.6.1 Logistic Mixed Models, 92
       4.6.2 *Models for Multinomial Counts, 93
       4.6.3 Models for Mortality and Disease Rates, 93
       4.6.4 Natural Exponential Family Models, 94
       4.6.5 *Semi-parametric Mixed Models, 95

5 Empirical Best Linear Unbiased Prediction (EBLUP): Theory, 97
   5.1 Introduction, 97
   5.2 General Linear Mixed Model, 98
       5.2.1 BLUP Estimator, 98
       5.2.2 MSE of BLUP, 100
       5.2.3 EBLUP Estimator, 101
       5.2.4 ML and REML Estimators, 102
       5.2.5 MSE of EBLUP, 105
       5.2.6 Estimation of MSE of EBLUP, 106
   5.3 Block Diagonal Covariance Structure, 108
       5.3.1 EBLUP Estimator, 108
       5.3.2 Estimation of MSE, 109
       5.3.3 Extension to Multidimensional Area Parameters, 110
   5.4 *Model Identification and Checking, 111
       5.4.1 Variable Selection, 111
       5.4.2 Model Diagnostics, 114
   5.5 *Software, 118
   5.6 Proofs, 119
       5.6.1 Derivation of BLUP, 119
       5.6.2 Equivalence of BLUP and Best Predictor E(mᵀv | Aᵀy), 120
       5.6.3 Derivation of MSE Decomposition (5.2.29), 121

6 Empirical Best Linear Unbiased Prediction (EBLUP): Basic Area Level Model, 123
   6.1 EBLUP Estimation, 123
       6.1.1 BLUP Estimator, 124
       6.1.2 Estimation of σ_v^2, 126
       6.1.3 Relative Efficiency of Estimators of σ_v^2, 128
       6.1.4 *Applications, 129
   6.2 MSE Estimation, 136
       6.2.1 Unconditional MSE of EBLUP, 136
       6.2.2 MSE for Nonsampled Areas, 139
       6.2.3 *MSE Estimation for Small Area Means, 140
       6.2.4 *Bootstrap MSE Estimation, 141
       6.2.5 *MSE of a Weighted Estimator, 143
       6.2.6 Mean Cross Product Error of Two Estimators, 144
       6.2.7 *Conditional MSE, 144
   6.3 *Robust Estimation in the Presence of Outliers, 146
   6.4 *Practical Issues, 148
       6.4.1 Unknown Sampling Error Variances, 148
       6.4.2 Strictly Positive Estimators of σ_v^2, 151
       6.4.3 Preliminary Test Estimation, 154
       6.4.4 Covariates Subject to Sampling Errors, 156
       6.4.5 Big Data Covariates, 159
       6.4.6 Benchmarking Methods, 159
       6.4.7 Misspecified Linking Model, 165
   6.5 *Software, 169

7 Basic Unit Level Model, 173
   7.1 EBLUP Estimation, 173
       7.1.1 BLUP Estimator, 174
       7.1.2 Estimation of σ_v^2 and σ_e^2, 177
       7.1.3 *Nonnegligible Sampling Fractions, 178
   7.2 MSE Estimation, 179
       7.2.1 Unconditional MSE of EBLUP, 179
       7.2.2 Unconditional MSE Estimators, 181
       7.2.3 *MSE Estimation: Nonnegligible Sampling Fractions, 182
       7.2.4 *Bootstrap MSE Estimation, 183
   7.3 *Applications, 186
   7.4 *Outlier Robust EBLUP Estimation, 193
       7.4.1 Estimation of Area Means, 193
       7.4.2 MSE Estimation, 198
       7.4.3 Simulation Results, 199
   7.5 *M-Quantile Regression, 200
   7.6 *Practical Issues, 205
       7.6.1 Unknown Heteroscedastic Error Variances, 205
       7.6.2 Pseudo-EBLUP Estimation, 206
       7.6.3 Informative Sampling, 211
       7.6.4 Measurement Error in Area-Level Covariate, 216
       7.6.5 Model Misspecification, 218
       7.6.6 Semi-parametric Nested Error Model: EBLUP, 220
       7.6.7 Semi-parametric Nested Error Model: REBLUP, 224
   7.7 *Software, 227
   7.8 *Proofs, 231
       7.8.1 Derivation of (7.6.17), 231
       7.8.2 Proof of (7.6.20), 232

8 EBLUP: Extensions, 235
   8.1 *Multivariate Fay–Herriot Model, 235
   8.2 Correlated Sampling Errors, 237
   8.3 Time Series and Cross-Sectional Models, 240
       8.3.1 *Rao–Yu Model, 240
       8.3.2 State-Space Models, 243
   8.4 *Spatial Models, 248
   8.5 *Two-Fold Subarea Level Models, 251
   8.6 *Multivariate Nested Error Regression Model, 253
   8.7 Two-Fold Nested Error Regression Model, 254
   8.8 *Two-Level Model, 259
   8.9 *Models for Multinomial Counts, 261
   8.10 *EBLUP for Vectors of Area Proportions, 262
   8.11 *Software, 264

9 Empirical Bayes (EB) Method, 269
   9.1 Introduction, 269
   9.2 Basic Area Level Model, 270
       9.2.1 EB Estimator, 271
       9.2.2 MSE Estimation, 273
       9.2.3 Approximation to Posterior Variance, 275
       9.2.4 *EB Confidence Intervals, 281
   9.3 Linear Mixed Models, 287
       9.3.1 EB Estimation of μ_i = l_iᵀβ + m_iᵀv_i, 287
       9.3.2 MSE Estimation, 288
       9.3.3 Approximations to the Posterior Variance, 288
   9.4 *EB Estimation of General Finite Population Parameters, 289
       9.4.1 BP Estimator Under a Finite Population, 290
       9.4.2 EB Estimation Under the Basic Unit Level Model, 290
       9.4.3 FGT Poverty Measures, 293
       9.4.4 Parametric Bootstrap for MSE Estimation, 294
       9.4.5 ELL Estimation, 295
       9.4.6 Simulation Experiments, 296
   9.5 Binary Data, 298
       9.5.1 *Case of No Covariates, 299
       9.5.2 Models with Covariates, 304
   9.6 Disease Mapping, 308
       9.6.1 Poisson–Gamma Model, 309
       9.6.2 Log-Normal Models, 310
       9.6.3 Extensions, 312
   9.7 *Design-Weighted EB Estimation: Exponential Family Models, 313
   9.8 Triple-Goal Estimation, 315
       9.8.1 Constrained EB, 316
       9.8.2 Histogram, 318
       9.8.3 Ranks, 318
   9.9 Empirical Linear Bayes, 319
       9.9.1 LB Estimation, 319
       9.9.2 Posterior Linearity, 322
   9.10 Constrained LB, 324
   9.11 *Software, 325
   9.12 Proofs, 330
       9.12.1 Proof of (9.2.11), 330
       9.12.2 Proof of (9.2.30), 330
       9.12.3 Proof of (9.8.6), 331
       9.12.4 Proof of (9.9.1), 331

10 Hierarchical Bayes (HB) Method, 333
   10.1 Introduction, 333
   10.2 MCMC Methods, 335
       10.2.1 Markov Chain, 335
       10.2.2 Gibbs Sampler, 336
       10.2.3 M–H Within Gibbs, 336
       10.2.4 Posterior Quantities, 337
       10.2.5 Practical Issues, 339
       10.2.6 Model Determination, 342
   10.3 Basic Area Level Model, 347
       10.3.1 Known σ_v^2, 347
       10.3.2 *Unknown σ_v^2: Numerical Integration, 348
       10.3.3 Unknown σ_v^2: Gibbs Sampling, 351
       10.3.4 *Unknown Sampling Variances ψ_i, 354
       10.3.5 *Spatial Model, 355
   10.4 *Unmatched Sampling and Linking Area Level Models, 356
   10.5 Basic Unit Level Model, 362
       10.5.1 Known σ_v^2 and σ_e^2, 362
       10.5.2 Unknown σ_v^2 and σ_e^2: Numerical Integration, 363
       10.5.3 Unknown σ_v^2 and σ_e^2: Gibbs Sampling, 364
       10.5.4 Pseudo-HB Estimation, 365
   10.6 General ANOVA Model, 368
   10.7 *HB Estimation of General Finite Population Parameters, 369
       10.7.1 HB Estimator under a Finite Population, 370
       10.7.2 Reparameterized Basic Unit Level Model, 370
       10.7.3 HB Estimator of a General Area Parameter, 372
   10.8 Two-Level Models, 374
   10.9 Time Series and Cross-Sectional Models, 377
   10.10 Multivariate Models, 381
       10.10.1 Area Level Model, 381
       10.10.2 Unit Level Model, 382
   10.11 Disease Mapping Models, 383
       10.11.1 Poisson–Gamma Model, 383
       10.11.2 Log-Normal Model, 384
       10.11.3 Two-Level Models, 386
   10.12 *Two-Part Nested Error Model, 388
   10.13 Binary Data, 389
       10.13.1 Beta-Binomial Model, 389
       10.13.2 Logit-Normal Model, 390
       10.13.3 Logistic Linear Mixed Models, 393
   10.14 *Missing Binary Data, 397
   10.15 Natural Exponential Family Models, 398
   10.16 Constrained HB, 399
   10.17 *Approximate HB Inference and Data Cloning, 400
   10.18 Proofs, 402
       10.18.1 Proof of (10.2.26), 402
       10.18.2 Proof of (10.2.32), 402
       10.18.3 Proof of (10.3.13)–(10.3.15), 402

References, 405
Author Index, 431
Subject Index, 437
FIGURES
3.1 Direct, Census, Composite SPREE, and GLSM Estimates of Row Profiles θ_i^M = (θ_i1^M, …, θ_iA^M)ᵀ for Canadian Provinces Newfoundland and Labrador (a) and Quebec (b), for Two-Digit Occupation Class A1. 53
3.2 Direct, Census, Composite SPREE, and GLSM Estimates of Row Profiles θ_i^M = (θ_i1^M, …, θ_iA^M)ᵀ for Canadian Provinces Newfoundland and Labrador (a) and Nova Scotia (b), for Two-Digit Occupation Class B5. 54
6.1 EBLUP and Direct Area Estimates of Average Expenditure on Fresh Milk for Each Small Area (a). CVs of EBLUP and Direct Estimators for Each Small Area (b). Areas are Sorted by Decreasing Sample Size. 171
7.1 Leverage Measures s_ii versus Scaled Squared Residuals. 231
8.1 Naive Nonparametric Bootstrap MSE Estimates Against Analytical MSE Estimates (a). Bias-Corrected Nonparametric Bootstrap MSE Estimates Against Analytical MSE Estimates (b). 267
8.2 EBLUP Estimates, Based on the Spatial FH Model with SAR Random Effects, and Direct Estimates of Mean Surface Area Used for Production of Grapes for Each Municipality (a). CVs of EBLUP Estimates and of Direct Estimates for Each Municipality (b). Municipalities are Sorted by Increasing CVs of Direct Estimates. 268
9.1 Bias (a) and MSE (b) over Simulated Populations of EB, Direct, and ELL Estimates of Percent Poverty Gap 100 F_1i for Each Area i. 297
9.2 True MSEs of EB Estimators of Percent Poverty Gap and Average of Bootstrap MSE Estimators Obtained with B = 500 for Each Area i. 298
9.3 Bias (a) and MSE (b) of EB, Direct, and ELL Estimators of the Percent Poverty Gap 100 F_1i for Each Area i under Design-Based Simulations. 299
9.4 Index Plot of Residuals (a) and Histogram of Residuals (b) from the Fitting of the Basic Unit Level Model with Response Variable log(income + constant). 327
10.1 Coefficient of Variation (CV) of Direct and HB Estimates. 354
10.2 CPO Comparison Plot for Models 1–3. 376
10.3 Direct, Cross-Sectional HB (HB2), and Cross-Sectional and Time Series HB (HB1) Estimates. 379
10.4 Coefficient of Variation of Cross-Sectional HB (HB2) and Cross-Sectional and Time Series HB (HB1) Estimates. 379
TABLES
3.1 True State Proportions, Direct and Synthetic Estimates, and Associated Estimates of RRMSE, 41
3.2 Medians of Percent ARE of SPREE Estimates, 48
3.3 Percent Average Absolute Relative Bias (ARB%) and Percent Average RRMSE (RRMSE%) of Estimators, 62
3.4 Batting Averages for 18 Baseball Players, 68
6.1 Values of σ̂_vm^2 for States with More Than 500 Small Places, 131
6.2 Values of Percentage Absolute Relative Error of Estimates from True Values: Places with Population Less Than 500, 132
6.3 Average MSE of EBLUP Estimators Based on REML, LL, LLM, YL, and YLM Methods of Estimating σ_v^2, 153
6.4 % Relative Bias (RB) of Estimators of MSE(θ̂_i^YL), 154
7.1 EBLUP Estimates of County Means and Estimated Standard Errors of EBLUP and Survey Regression Estimates, 188
7.2 Unconditional Comparisons of Estimators: Real and Synthetic Population, 190
7.3 Effect of Between-Area Homogeneity on the Performance of SSD and EBLUP, 191
7.4 EBLUP and Pseudo-EBLUP Estimates and Associated Standard Errors (s.e.): County Corn Crop Areas, 210
7.5 Average Absolute Bias (AB), Average Root Mean Squared Error (RMSE) of Estimators, and Percent Average Absolute Relative Bias (ARB) of MSE Estimators, 215
8.1 Distribution of Coefficient of Variation (%), 243
8.2 Average Absolute Relative Bias (ARB) and Average Relative Root MSE (RRMSE) of SYN, SSD, FH, and EBLUP (State-Space), 249
9.1 Percent Average Relative Bias (RB) of MSE Estimators, 281
10.1 MSE Estimates and Posterior Variance for Four States, 350
10.2 1991 Canadian Census Undercount Estimates and Associated CVs, 359
10.3 EBLUP and HB Estimates and Associated Standard Errors: County Corn Areas, 366
10.4 Pseudo-HB and Pseudo-EBLUP Estimates and Associated Standard Errors: County Corn Areas, 367
10.5 Estimated % CVs of the Direct, EB, and HB Estimators of Poverty Incidence for Selected Provinces by Gender, 373
10.6 Average Absolute Relative Error (ARE%): Median Income of Four-Person Families, 382
10.7 Comparison of Models 1–3: Mortality Rates, 387
FOREWORD TO THE FIRST EDITION
The history of modern sample surveys dates back to the nineteenth century, but the field did not fully emerge until the 1930s. It grew considerably during World War II, and has been expanding at a tremendous rate ever since. Over time, the range of topics investigated using survey methods has broadened enormously as policy makers and researchers have learned to appreciate the value of quantitative data and as survey researchers—in response to policy makers’ demands—have tackled topics previously considered unsuitable for study using survey methods. The range of analyses of survey data has also expanded, as users of survey data have become more sophisticated and as major developments in computing power and software have simplified the computations involved.

In the early days, users were mostly satisfied with national estimates and estimates for major geographic regions and other large domains. The situation is very different today: more and more policy makers are demanding estimates for small domains for use in making policy decisions. For example, population surveys are often required to provide estimates of adequate precision for domains defined in terms of some combination of factors such as age, sex, race/ethnicity, and poverty status. A particularly widespread demand from policy makers is for estimates at a finer level of geographic detail than the broad regions that were commonly used in the past. Thus, estimates are frequently needed for such entities as states, provinces, counties, school districts, and health service areas.

The need to provide estimates for small domains has led to developments in two directions. One direction is toward the use of sample designs that can produce domain estimates of adequate precision within the standard design-based mode of inference used in survey analysis (i.e., “direct estimates”). Many sample surveys are now designed to yield sufficient sample sizes for key domains to satisfy the precision requirements for those domains. This approach is generally used for socio-economic
domains and for some larger geographic domains. However, the increase in overall sample size that this approach entails may well exceed the survey’s funding resources and capabilities, particularly so when estimates are required for many geographic areas. In the United States, for example, few surveys are large enough to be capable of providing reliable subpopulation estimates for all 50 states, even if the sample is optimally allocated across states for this purpose. For very small geographic areas such as school districts, either a complete census or a sample of at least the size of the census long-form sample (on average about 1 in 6 households nationwide) is required. Even censuses, however, although valuable, cannot be the complete solution for the production of small area estimates. In most countries, censuses are conducted only once a decade. They cannot, therefore, provide satisfactory small area estimates for intermediate time points during a decade for population characteristics that change markedly over time. Furthermore, census content is inherently severely restricted, so a census cannot provide small area estimates for all the characteristics that are of interest. Hence, another approach is needed.

The other direction for producing small area estimates is to turn away from conventional direct estimates toward the use of indirect model-dependent estimates. The model-dependent approach employs a statistical model that “borrows strength” in making an estimate for one small area from sample survey data collected in other small areas or at other time periods. This approach moves away from the design-based estimation of conventional direct estimates to indirect model-dependent estimates. Naturally, concerns are raised about the reliance on models for the production of such small area estimates. However, the demand for small area estimates is strong and increasing, and models are needed to satisfy that demand in many cases. As a result, many survey statisticians have come to accept the model-dependent approach in the right circumstances, and the approach is being used in a number of important cases. Examples of major small area estimation programs in the United States include the following: the Census Bureau’s Small Area Income and Poverty Estimates program, which regularly produces estimates of income and poverty measures for various population subgroups for states, counties, and school districts; the Bureau of Labor Statistics’ Local Area Unemployment Statistics program, which produces monthly estimates of employment and unemployment for states, metropolitan areas, counties, and certain subcounty areas; the National Agricultural Statistics Service’s County Estimates Program, which produces county estimates of crop yield; and the estimates of substance abuse in states and metropolitan areas, which are produced by the Substance Abuse and Mental Health Services Administration (see Chapter 1).

The essence of all small area methods is the use of auxiliary data available at the small area level, such as administrative data or data from the last census. These data are used to construct predictor variables for use in a statistical model that can be used to predict the estimate of interest for all small areas. The effectiveness of small area estimation depends initially on the availability of good predictor variables that are uniformly measured over the total area. It next depends on the choice of a good prediction model.
Effective use of small area estimation methods further depends on a careful, thorough evaluation of the quality of the model. Finally, when small
area estimates are produced, they should be accompanied by valid measures of their precision.

Early applications of small area estimation methods employed only simple methods. At that time, the choice of the method for use in a particular case was relatively simple, being limited by the computable methods then in existence. However, the situation has changed enormously in recent years, and particularly in the last decade. There now exists a wide range of different, often complex, models that can be used, depending on the nature of the measurement of the small area estimate (e.g., a binary or continuous variable) and on the auxiliary data available. One key distinction in model construction is between situations where the auxiliary data are available for the individual units in the population and those where they are available only at the aggregate level for each small area. In the former case, the data can be used in unit level models, whereas in the latter they can be used only in area level models. Another feature involved in the choice of model is whether the model borrows strength cross-sectionally, over time, or both. There are also now a number of different approaches, such as empirical best linear unbiased prediction (EBLUP), empirical Bayes (EB), and hierarchical Bayes (HB), which can be used to estimate the models and the variability of the model-dependent small area estimates. Moreover, complex procedures that would have been extremely difficult to apply a few years ago can now be implemented fairly straightforwardly, taking advantage of the continuing increases in computing power and the latest developments in software.

The wide range of possible models and approaches now available for use can be confusing to those working in this area. J.N.K. Rao’s book is therefore a timely contribution, coming at a point in the subject’s development when an integrated, systematic treatment is needed. Rao has done a great service in producing this authoritative and comprehensive account of the subject. This book will help to advance the subject and be a valuable resource for practitioners and theorists alike.

Graham Kalton
PREFACE TO THE SECOND EDITION
Small area estimation (SAE) deals with the problem of producing reliable estimates of parameters of interest and the associated measures of uncertainty for subpopulations (areas or domains) of a finite population for which samples of inadequate sizes or no samples are available. Traditional “direct estimates,” based only on the area-specific sample data, are not suitable for SAE, and it is necessary to “borrow strength” across related small areas through supplementary information to produce reliable “indirect” estimates for small areas. Indirect model-based estimation methods, based on explicit linking models, are now widely used. The first edition of Small Area Estimation (Rao 2003a) provided a comprehensive account of model-based methods for SAE up to the end of 2002. It is gratifying to see the enthusiastic reception it has received, as judged by the significant number of citations and the rapid growth in SAE literature over the past 12 years. Demand for reliable small area estimates has also greatly increased worldwide. As an example, the estimation of complex poverty measures at the municipality level is of current interest, and World Bank uses a model-based method, based on simulating multiple censuses, in more than 50 countries worldwide to produce poverty statistics for small areas. The main aim of the present second edition is to update the first edition by providing a comprehensive account of important theoretical developments from 2003 to 2014. New SAE literature is quite extensive and often involves complex theory to handle model misspecifications and other complexities. We have retained a large portion of the material from the first edition to make the book self-contained, and supplemented it with selected new developments in theory and methods of SAE. Notations and terminology used in the first edition are largely retained. As in the first edition, applications are included throughout the chapters. An added feature of
the second edition is the inclusion of sections (Sections 5.5, 6.5, 7.7, 8.11, and 9.11) describing specific R software for SAE, concretely the R package sae (Molina and Marhuenda 2013; Molina and Marhuenda 2015). These sections include examples of SAE applications using data sets included in the package and provide all the necessary R codes, so that the user can exactly replicate the applications. New sections and old sections with significant changes are indicated by an asterisk in the book. Chapter 3 on “Traditional Demographic Methods” from the first edition is deleted, partly due to page constraints and partly because the material is somewhat unrelated to mainstream model-based methods. Also, we have not been able to keep up to date with the new developments in demographic methods.

Chapter 1 introduces basic terminology related to SAE and presents selected important applications as motivating examples. Chapter 2, as in the first edition, presents a concise account of direct estimation of totals or means for small areas and addresses survey design issues that have a bearing on SAE. New Section 2.7 deals with optimal sample allocation for planned domains and the estimation of marginal row and column strata means in the presence of two-way stratification.

Chapter 3 gives a fairly detailed account of traditional indirect estimation based on implicit linking models. The well-known James–Stein method of composite estimation is also studied in the context of sample survey data. New Section 3.2.7 studies generalized structure preserving estimation (GSPREE) based on relaxing some interaction assumptions made in the traditional SPREE, which is often used in practice because it makes fuller use of reliable direct estimates at a higher level to produce synthetic estimates. Another important addition is weight sharing (or splitting) methods studied in Section 3.2.8. The weight-sharing methods produce a two-way table of weights with rows as the units in the full sample and columns as the areas such that the cell weights in each row add up to the original sample weight. Such methods are especially useful in micro-simulation modeling that can involve a large number of variables of interest.

Explicit small area models that account for between-area variability are introduced in Chapter 4 (previous Chapter 5), including linear mixed models and generalized linear mixed models such as logistic linear mixed models with random area effects. The models are classified into two broad categories: (i) area level models that relate the small area means or totals to area level covariates; and (ii) unit level models that relate the unit values of a study variable to unit-specific auxiliary variables. Extensions of the models to handle complex data structures, such as spatial dependence and time series structures, are also considered. New Section 4.6.5 introduces semi-parametric mixed models, which are studied later.

Chapter 5 (previous Chapter 6) studies linear mixed models involving fixed and random effects. It gives general results on empirical best linear unbiased prediction (EBLUP) and the estimation of mean squared error (MSE) of the EBLUP. A detailed account of model identification and checking for linear mixed models is presented in the new Section 5.4. Available SAS software and R statistical software for linear mixed models are summarized in the new Section 5.5. The R package sae specifically designed for SAE is also described.
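To give a flavor of these software sections, the following minimal sketch (not reproduced from the book) fits the basic area level model with the sae package using its bundled milk data set; the function mseFH and the variable names yi, SD, and MajorArea are taken from the package documentation, and Section 6.5 presents the full worked application.

# Minimal sketch: Fay-Herriot fit with the R package sae (not the book's code).
# Uses the milk data set shipped with the package: yi are direct estimates of
# average expenditure on fresh milk, SD their estimated standard deviations,
# and MajorArea a grouping covariate.
library(sae)
data(milk)

# Fit the area level model by REML and compute EBLUPs together with
# analytical MSE estimates; sampling variances are the squared SDs.
fh <- mseFH(milk$yi ~ as.factor(milk$MajorArea), vardir = milk$SD^2)

# EBLUP estimates and their estimated coefficients of variation (%).
eblup <- fh$est$eblup
cv <- 100 * sqrt(fh$mse) / eblup
head(data.frame(direct = milk$yi, eblup = eblup, cv = cv))

The same pattern, a model-fitting call paired with a companion MSE function, carries over to the unit level and spatial models treated in Sections 7.7 and 8.11.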
Chapter 6 of the First Edition provided a detailed account of EBLUP estimation of small area means or totals for the basic area level and unit level models, using
the general theory given in Chapter 5. In the past 10 years or so, researchers have done extensive work on those two models, especially addressing problems related to model misspecification and other practical issues. As a result, we decided to split the old Chapter 6 into two new chapters, with Chapter 6 focusing on area level models and Chapter 7 addressing unit level models. New topics covered in Chapter 6 include bootstrap MSE estimation (Section 6.2.4) and robust estimation in the presence of outliers (Section 6.3). Section 6.4 deals with practical issues related to the basic area level model. It includes important topics such as covariates subject to sampling errors (Section 6.4.4), misspecification of linking models (Section 6.4.7), benchmarking of model-based area estimators to ensure agreement with a reliable direct estimate when aggregated (Section 6.4.6), and the use of “big data” as possible covariates in area level models (Section 6.4.5). Functions of the R package sae designed for estimation under the area level model are described in Section 6.5. An example illustrating the use of these functions is provided. New topics introduced in Chapter 7 include bootstrap MSE estimation (Section 7.2.4), outlier robust EBLUP estimation (Section 7.4), and M-quantile regression (Section 7.5). Section 7.6 deals with practical issues related to the basic unit level model. It presents methods to deal with important topics, including measurement errors in covariates (Section 7.6.4), model misspecification (Section 7.6.5), and semi-parametric nested error models (Sections 7.6.6 and 7.6.7). Most of the published literature assumes that the assumed model for the population values also holds for the sample. However, in many applications, this assumption may not be true due to informative sampling leading to sample selection bias. Section 7.6.3 gives a detailed treatment of methods to make valid inferences under informative sampling. Functions of R package sae dealing with the basic unit level model are described in Section 7.7. The use of these functions is illustrated through an application to the County Crop Areas data of Battese, Harter, and Fuller (1988). This application includes calculation of model diagnostics and drawing residual plots. Several important applications are also presented in Chapters 6 and 7. New chapters 8, 9, and 10 cover the same material as the corresponding chapters in the first edition. Chapter 8 contains EBLUP theory for various extensions of the basic area level and unit level models, providing updates to the sections in the first edition, in particular a more detailed account of spatial and two-level models. Section 8.4 on spatial models is updated, and functions of the R package sae dealing with spatial area level models are described in Section 8.11. An example illustrating the use of these functions is provided. Section 8.5 presents theory for two-fold subarea level models, which are natural extensions of the basic area level models. Chapter 9 presents empirical Bayes (EB) estimation. The EB method (also called empirical best) is more generally applicable than the EBLUP method. New Section 9.2.4 gives an account of methods for constructing confidence intervals in the case of basic area level model. EB estimation of general area parameters is the theme of Section 9.4, in particular complex poverty indicators studied by the World Bank. EB method is compared to the World Bank method in simulation experiments (Section 9.4.6). 
R software for EB estimation of general area parameters is described in Section 9.11, which includes an example on estimation of poverty indicators. Binary data and disease mapping from count data are studied in Sections 9.5 and 9.6,
respectively. An important addition is Section 9.7 dealing with design-weighted EB estimation under exponential family models. Previous sections on constrained EB estimation and empirical linear Bayes estimation are retained. Finally, Chapter 10 presents a self-contained account of the Hierarchical Bayes (HB) approach based on specifying prior distributions on the model parameters. Basic Markov chain Monte Carlo (MCMC) methods for HB inference, including model determination, are presented in Section 10.2. Several new developments are presented, including HB estimation of complex general area parameters, in particular poverty indicators (Section 10.7), two-part nested error models (Section 10.12), missing binary data (Section 10.14), and approximate HB inference (Section 10.17). Other sections in Chapter 10 more or less cover the material in the previous edition with some updates. Chapters 8–10 include brief descriptions of applications with real data sets.

As in the first edition, we discuss the advantages and limitations of different SAE methods throughout the book. We also emphasize the need for both internal and external evaluations. To this end, we have provided various methods for model selection from the data, and comparison of estimates derived from models to reliable values obtained from external sources, such as previous census or administrative data. Proofs of some basic results are provided, but proofs of results that are technically involved or lengthy are omitted, as in the first edition. We have provided fairly self-contained accounts of direct estimation (Chapter 2), EBLUP and EB estimation (Chapters 5 and 9), and HB estimation (Chapter 10). However, prior exposure to a standard text in mathematical statistics, such as the 2001 Brooks/Cole book Statistical Inference (second edition) by G. Casella and R. L. Berger, is essential. Also, a basic course in regression and mixed models, such as the 2001 Wiley book Generalized, Linear and Mixed Models by C. E. McCulloch and S. E. Searle, would be helpful in understanding model-based SAE. A basic course in survey sampling techniques, such as the 1977 Wiley book Sampling Techniques (third edition) by W.G. Cochran, is also useful but not essential.

This book is intended primarily as a research monograph, but it is also suitable for a graduate level course on SAE, as in the case of the first edition. Practitioners interested in learning SAE methods may also find portions of this text useful, in particular Chapters 3, 6, 7 and Sections 10.1–10.3 and 10.5, as well as the examples and applications presented throughout the book.

We are thankful to Emily Berg, Yves Berger, Ansu Chatterjee, Gauri Datta, Laura Dumitrescu, Wayne Fuller, Malay Ghosh, David Haziza, Jiming Jiang, Partha Lahiri, Bal Nandram, Jean Opsomer, and Mikhail Sverchkov for reading portions of the book and providing helpful comments and suggestions, to Domingo Morales for providing a very helpful list of publications in SAE, and to Pedro Dulce for providing us with tailor-made software for making author and subject indices.

J. N. K. Rao
Isabel Molina

January, 2015
PREFACE TO THE FIRST EDITION
Sample surveys are widely used to provide estimates of totals, means, and other parameters not only for the total population of interest but also for subpopulations (or domains) such as geographic areas and socio-demographic groups. Direct estimates of a domain parameter are based only on the domain-specific sample data. In particular, direct estimates are generally “design-based” in the sense that they make use of “survey weights,” and the associated inferences (standard errors, confidence intervals, etc.) are based on the probability distribution induced by the sample design, with the population values held fixed. Standard sampling texts (e.g., the 1977 Wiley book Sampling Techniques by W.G. Cochran) provide extensive accounts of design-based direct estimation. Models that treat the population values as random may also be used to obtain model-dependent direct estimates. Such estimates in general do not depend on survey weights, and the associated inferences are based on the probability distribution induced by the assumed model (e.g., the 2001 Wiley book Finite Population Sampling and Inference: A Prediction Approach by R. Valliant, A.H. Dorfman, and R.M. Royall).

We regard a domain as large if the domain sample size is large enough to yield direct estimates of adequate precision; otherwise, the domain is regarded as small. In this text, we generally use the term “small area” to denote any subpopulation for which direct estimates of adequate precision cannot be produced. Typically, domain sample sizes tend to increase with the population size of the domains, but this is not always the case. For example, due to oversampling of certain domains in the US Third Health and Nutrition Examination Survey, sample sizes in many states were small (or even zero). It is seldom possible to have a large enough overall sample size to support reliable direct estimates for all the domains of interest. Therefore, it is often necessary
to use indirect estimates that “borrow strength” by using values of the variables of interest from related areas, thus increasing the “effective” sample size. These values are brought into the estimation process through a model (either implicit or explicit) that provides a link to related areas (domains) through the use of supplementary information related to the variables of interest, such as recent census counts and current administrative records. Availability of good auxiliary data and determination of suitable linking models are crucial to the formation of indirect estimates. In recent years, the demand for reliable small area estimates has greatly increased worldwide. This is due, among other things, to their growing use in formulating policies and programs, the allocation of government funds, and in regional planning. Demand from the private sector has also increased because business decisions, particularly those related to small businesses, rely heavily on the local conditions. Small area estimation (SAE) is particularly important for studying the economies in transition in central and eastern European countries and the former Soviet Union countries because these countries are moving away from centralized decision making. The main aim of this text is to provide a comprehensive account of the methods and theory of SAE, particularly indirect estimation based on explicit small area linking models. The model-based approach to SAE offers several advantages, most importantly increased precision. Other advantages include the derivation of “optimal” estimates and associated measures of variability under an assumed model, and the validation of models from the sample data. Chapter 1 introduces some basic terminology related to SAE and presents some important applications as motivating examples. Chapter 2 contains a brief account of direct estimation, which provides a background for later chapters. It also addresses survey design issues that have a bearing on SAE. Traditional demographic methods that employ indirect estimates based on implicit linking models are studied in Chapter 3. Typically, demographic methods only use administrative and census data and sampling is not involved, whereas indirect estimation methods studied in later chapters are largely based on sample survey data in conjunction with auxiliary population information. Chapter 4 gives a detailed account of traditional indirect estimation based on implicit linking models. The well-known James–Stein method of composite estimation is also studied in the context of sample surveys. Explicit small area models that account for between-area variation are presented in Chapter 5, including linear mixed models and generalized linear mixed models, such as logistic models with random area effects. The models are classified into two broad groups: (i) area level models that relate the small area means to area-specific auxiliary variables; (ii) unit level models that relate the unit values of study variables to unit-specific auxiliary variables. Several extensions to handle complex data structures, such as spatial dependence and time series structures, are also presented. Chapters 6–8 study in more detail linear mixed models involving fixed and random effects. General results on empirical best linear unbiased prediction (EBLUP) under the frequentist approach are presented in Chapter 6. The more difficult problem of estimating the mean squared error (MSE) of EBLUP estimators is also considered. 
A basic area level model and a basic unit level model are studied thoroughly in
Chapter 7 by applying the EBLUP results developed in Chapter 6. Several important applications are also presented in this chapter. Various extensions of the basic models are considered in Chapter 8. Chapter 9 presents empirical Bayes (EB) estimation. This method is more generally applicable than the EBLUP method. Various approaches to measuring the variability of EB estimators are presented. Finally, Chapter 10 presents a self-contained account of hierarchical Bayes (HB) estimation, by assuming prior distributions on model parameters. Both chapters include actual applications with real data sets.

Throughout the text, we discuss the advantages and limitations of the different methods for SAE. We also emphasize the need for both internal and external evaluations for model selection. To this end, we provide various methods of model validation, including comparisons of estimates derived from a model with reliable values obtained from external sources, such as previous census values. Proofs of basic results are given in Sections 2.7, 3.5, 4.4, 6.4, 9.9, and 10.14, but proofs of results that are technically involved or lengthy are omitted. The reader is referred to relevant papers for details of omitted proofs. We provide self-contained accounts of direct estimation (Chapter 2), linear mixed models (Chapter 6), EB estimation (Chapter 9), and HB estimation (Chapter 10). But prior exposure to a standard course in mathematical statistics, such as the 1990 Wadsworth & Brooks/Cole book Statistical Inference by G. Casella and R.L. Berger, is essential. Also, a course in linear mixed models, such as the 1992 Wiley book Variance Components by S.R. Searle, G. Casella, and C.E. McCulloch, would be helpful in understanding model-based SAE. A basic course in survey sampling methods, such as the 1977 Wiley book Sampling Techniques by W.G. Cochran, is also useful but not essential.

This book is intended primarily as a research monograph, but it is also suitable for a graduate level course on SAE. Practitioners interested in learning SAE methods may also find portions of this text useful; in particular, Chapters 4, 7, 9, and Sections 10.1–10.3 and 10.5, as well as the applications presented throughout the book.

Special thanks are due to Gauri Datta, Sharon Lohr, Danny Pfeffermann, Graham Kalton, M.P. Singh, Jack Gambino, and Fred Smith for providing many helpful comments and constructive suggestions. I am also thankful to Yong You, Ming Yu, and Wesley Yung for typing portions of this text; to Gill Murray for the final typesetting and preparation of the text; and to Roberto Guido of Statistics Canada for designing the logo on the cover page. Finally, I am grateful to my wife Neela for her long enduring patience and encouragement and to my son, Sunil, and daughter, Supriya, for their understanding and support.

J. N. K. Rao

Ottawa, Canada
January, 2003
1 *INTRODUCTION
1.1 WHAT IS A SMALL AREA?
Sample surveys have long been recognized as cost-effective means of obtaining information on wide-ranging topics of interest at frequent intervals over time. They are widely used in practice to provide estimates not only for the total population of interest but also for a variety of subpopulations (domains). Domains may be defined by geographic areas or socio-demographic groups or other subpopulations. Examples of a geographic domain (area) include a state/province, county, municipality, school district, unemployment insurance (UI) region, metropolitan area, and health service area. On the other hand, a socio-demographic domain may refer to a specific age-sex-race group within a large geographic area. An example of “other domains” is the set of business firms belonging to a census division by industry group. In the context of sample surveys, we refer to a domain estimator as “direct” if it is based only on the domain-specific sample data. A direct estimator may also use the known auxiliary information, such as the total of an auxiliary variable, x, related to the variable of interest, y. A direct estimator is typically “design based,” but it can also be motivated by and justified under models (see Section 2.1). Design-based estimators make use of survey weights, and the associated inferences are based on the probability distribution induced by the sampling design with the population values held fixed (see Chapter 2). “Model-assisted” direct estimators that make use of “working” models are also design based, aiming at making the inferences “robust” to possible model misspecification (see Chapter 2).
A domain (area) is regarded as large (or major) if the domain-specific sample is large enough to yield “direct estimates” of adequate precision. A domain is regarded as “small” if the domain-specific sample is not large enough to support direct estimates of adequate precision. Some other terms used to denote a domain with small sample size include “local area,” “subdomain,” “small subgroup,” “subprovince,” and “minor domain.” In some applications, many domains of interest (such as counties) may have zero sample size. In this text, we generally use the term “small area” to denote any domain for which direct estimates of adequate precision cannot be produced. Typically, domain sample size tends to increase with the population size of the domain, but this is not always the case. Sometimes, the sampling fraction is made larger than the average fraction in small domains in order to increase the domain sample sizes and thereby increase the precision of domain estimates. Such oversampling was, for example, used in the US Third Health and Nutrition Examination Survey (NHANES III) for certain domains in the cross-classification of sex, race/ethnicity, and age, in order that direct estimates of acceptable precision could be produced for those domains. This oversampling led to a greater concentration of the sample in certain states (e.g., California and Texas) than normal, and thereby exacerbated the common problem in national surveys that sample sizes in many states are small (or even zero). Thus, while direct estimates may be used to estimate characteristics of demographic domains with NHANES III, they cannot be used to estimate characteristics of many states. States may therefore be regarded as small areas in this survey. Even when a survey has large enough state sample sizes to support the production of direct estimates for the total state populations, these sample sizes may well not be large enough to support direct estimates for subgroups of the state populations, such as school-age children or persons in poverty. Due to cost considerations, it is often not possible to have a large enough overall sample size to support reliable direct estimates for all domains. Furthermore, in practice, it is not possible to anticipate all uses of the survey data, and “the client will always require more than is specified at the design stage” (Fuller 1999, p. 344). In making estimates for small areas with adequate level of precision, it is often necessary to use “indirect” estimators that “borrow strength” by using values of the variable of interest, y, from related areas and/or time periods and thus increase the “effective” sample size. These values are brought into the estimation process through a model (either implicit or explicit) that provides a link to related areas and/or time periods through the use of supplementary information related to y, such as recent census counts and current administrative records. Three types of indirect estimators can be identified (Schaible 1996, Chapter 1): “domain indirect,” “time indirect,” and “domain and time indirect.” A domain indirect estimator makes use of y-values from another domain but not from another time period. A time indirect estimator uses y-values from another time period for the domain of interest but not from another domain. On the other hand, a domain and time indirect estimator uses y-values from another domain as well as from another time period. 
Some other terms used to denote an indirect estimator include “non-traditional,” “small area,” “model dependent,” and “synthetic.”
Availability of good auxiliary data and determination of suitable linking models are crucial to the formation of indirect estimators. As noted by Schaible (1996, Chapter 10), expanded access to auxiliary information through coordination and cooperation among different agencies is needed.
1.2 DEMAND FOR SMALL AREA STATISTICS
Historically, small area statistics have long been used. For example, such statistics existed in eleventh-century England and seventeenth-century Canada based on either census or administrative records (Brackstone 1987). Demographers have long been using a variety of indirect methods for small area estimation (SAE) of population and other characteristics of interest in postcensal years. Typically, sampling is not involved in the traditional demographic methods (see Chapter 3 of Rao 2003a). In recent years, the demand for small area statistics has greatly increased worldwide. This is due, among other things, to their growing use in formulating policies and programs, in the allocation of government funds and in regional planning. Legislative acts by national governments have increasingly created a need for small area statistics, and this trend has accelerated in recent years. Demand from the private sector has also increased significantly because business decisions, particularly those related to small businesses, rely heavily on the local socio-economic, environmental, and other conditions. Schaible (1996) provides an excellent account of the use of traditional and model-based indirect estimators in US Federal Statistical Programs. SAE is of particular interest for the economies in transition in central and eastern European countries and the former Soviet Union countries. In the 1990s, these countries have moved away from centralized decision making. As a result, sample surveys are now used to produce estimates for large areas as well as small areas. Prompted by the demand for small area statistics, an International Scientific Conference on Small Area Statistics and Survey Designs was held in Warsaw, Poland, in 1992 and an International Satellite Conference on SAE was held in Riga, Latvia, in 1999 to disseminate knowledge on SAE (see Kalton, Kordos, and Platek (1993) and IASS Satellite Conference (1999) for the published conference proceedings). Some other proceedings of conferences on SAE include National Institute on Drug Abuse (1979), Platek and Singh (1986), and Platek, Rao, Särndal, and Singh (1987). Rapid growth in SAE research in recent years, both theoretical and applied, led to a series of international conferences starting in 2005: Jyvaskyla (Finland, 2005), Pisa (Italy, 2007), Elche (Spain, 2009), Trier (Germany, 2011), Bangkok (Thailand, 2013), and Poznan (Poland, 2014). Three European projects dealing with SAE, namely EURAREA, SAMPLE and AMELI, have been funded by the European Commission. Many research institutions and National Statistical Offices spread across Europe have participated in these projects. Centers for SAE research have been established in the Statistical Office in Poznan (Poland) and in the Statistical Research Division of the US Census Bureau. Review papers on SAE include Rao (1986, 1999, 2001b, 2003b, 2005, 2008), Chaudhuri (1994), Ghosh and Rao (1994), Marker (1999), Pfeffermann (2002,
2013), Jiang and Lahiri (2006), Datta (2009), and Lehtonen and Veijanen (2009). Textbooks on SAE have also appeared (Mukhopadhyay 1998, Rao 2003a, Longford 2005, Chaudhuri 2012). Good accounts of SAE theory are also given in the books by Fuller (2009) and Chambers and Clark (2012).
1.3 TRADITIONAL INDIRECT ESTIMATORS
Traditional indirect estimators, based on implicit linking models, include synthetic and composite estimators (Chapter 3). These estimators are generally design based, and their design variances (i.e., variances with respect to the probability distribution induced by the sampling design) are usually small relative to the design variances of direct estimators. However, the indirect estimators will be generally design biased, and the design bias will not decrease as the overall sample size increases. If the implicit linking model is approximately true, then the design bias is likely to be small, leading to significantly smaller design mean-squared error (MSE) compared to the MSE of a direct estimator. Reduction in MSE is the main reason for using indirect estimators.
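The bias-variance trade-off behind this statement can be written down directly; the notation below is generic (not the book's), with E_p, V_p, and B_p denoting design expectation, variance, and bias of an estimator of the area parameter θ_i:

\mathrm{MSE}_p(\hat{\theta}_i) \;=\; V_p(\hat{\theta}_i) + \bigl[B_p(\hat{\theta}_i)\bigr]^2,
\qquad B_p(\hat{\theta}_i) = E_p(\hat{\theta}_i) - \theta_i .

A direct estimator is (approximately) design unbiased, so its MSE is essentially its variance, which is large when the area sample is small; a synthetic or composite estimator trades a small amount of squared bias, kept small when the implicit linking model nearly holds, for a much larger reduction in variance.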
1.4 SMALL AREA MODELS
Explicit linking models with random area-specific effects accounting for the between-area variation that is not explained by auxiliary variables will be called “small area models” (Chapter 4). Indirect estimators based on small area models will be called “model-based estimators.” We classify small area models into two broad types. (i) Aggregate (or area) level models are the models that relate small area direct estimators to area-specific covariates. Such models are necessary if unit (or element) level data are not available. (ii) Unit level models are the models that relate the unit values of a study variable to unit-specific covariates. A basic area level model and a basic unit level model are introduced in Sections 4.2 and 4.3, respectively. Various extensions of the basic area level and unit level models are outlined in Sections 4.4 and 4.5, respectively. Sections 4.2–4.5 are relevant for continuous responses y and may be regarded as special cases of a general linear mixed model (Section 5.2). However, for binary or count variables y, generalized linear mixed models (GLMMs) are often used (Section 4.6): in particular, logistic linear mixed models for the binary case and loglinear mixed models for the count case. A critical assumption for the unit level models is that the sample values within an area obey the assumed population model, that is, sample selection bias is absent (see Section 4.3). For area level models, we assume the absence of informative sampling of the areas in situations where only some of the areas are selected to the sample, that is, the sample area values (the direct estimates) obey the assumed population model. Inferences from model-based estimators refer to the distribution implied by the assumed model. Model selection and validation, therefore, play a vital role in model-based estimation. If the assumed models do not provide a good fit to the
data, the model-based estimators will be model biased which, in turn, can lead to erroneous inferences. Several methods of model selection and validation are presented throughout the book. It is also useful to conduct external evaluations by comparing indirect estimates (both traditional and model-based) to more reliable estimates or census values based on past data (see Examples 6.1.1 and 6.1.2 for both internal and external evaluations).
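As a preview of the basic models just mentioned (Sections 4.2 and 4.3), the two types can be sketched in slightly simplified form; the precise assumptions, including the known sampling variances ψ_i and the scaling constants used in the book, are spelled out in Chapter 4. The basic area level (Fay–Herriot) model is

\hat{\theta}_i = \mathbf{z}_i^T \boldsymbol{\beta} + v_i + e_i, \qquad i = 1, \ldots, m,

with area effects v_i iid (0, σ_v^2) independent of sampling errors e_i ~ (0, ψ_i), and the basic unit level (nested error regression) model is

y_{ij} = \mathbf{x}_{ij}^T \boldsymbol{\beta} + v_i + e_{ij}, \qquad j = 1, \ldots, N_i, \; i = 1, \ldots, m,

with v_i iid (0, σ_v^2) independent of unit errors e_{ij} iid (0, σ_e^2).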
1.5 MODEL-BASED ESTIMATION
It is now generally accepted that, when indirect estimators are to be used, they should be based on explicit small area models. Such models define the way that the related data are incorporated in the estimation process. The model-based approach to SAE offers several advantages: (i) “Optimal” estimators can be derived under the assumed model. (ii) Area-specific measures of variability can be associated with each estimator unlike global measures (averaged over small areas) often used with traditional indirect estimators. (iii) Models can be validated from the sample data. (iv) A variety of models can be entertained depending on the nature of the response variables and the complexity of data structures (such as spatial dependence and time series structures). In this text, we focus on empirical best linear unbiased prediction (EBLUP) (Chapters 5–8), parametric empirical Bayes (EB) (Chapter 9), and parametric hierarchical Bayes (HB) estimators (Chapter 10) derived from small area models. For the HB method, a further assumption on the prior distribution of model parameters is also needed. EBLUP is designed for estimating linear small area characteristics under linear mixed models, whereas EB and HB are more generally applicable. The EBLUP method for general linear mixed models has been extensively used in animal breeding and other applications to estimate realized values of linear combinations of fixed and random effects. An EBLUP estimator is obtained in two steps: (i) The best linear unbiased predictor (BLUP), which minimizes the model MSE in the class of linear model unbiased estimators of the quantity of interest is first obtained. It depends on the variances (and covariances) of random effects in the model. (ii) An EBLUP estimator is obtained from the BLUP by substituting suitable estimators of the variance and covariance parameters. Chapter 5 presents some unified theory of the EBLUP method for the general linear mixed model, which covers many specific small area models considered in the literature (Chapters 6 and 8). Estimation of model MSE of EBLUP estimators is studied in detail in Chapters 6–8. Illustration of methods using specific R software for SAE is also provided. Under squared error loss, the best predictor (BP) of a (random) small area quantity of interest such as mean, proportion, or more complex parameter is the conditional expectation of the quantity given the data and the model parameters. Distributional assumptions are needed for calculating the BP. The empirical BP (or EB) estimator is obtained from BP by substituting suitable estimators of model parameters (Chapter 9). On the other hand, the HB estimator under squared error loss is obtained by integrating the BP with respect to the (Bayes) posterior distribution
derived from an assumed prior distribution of model parameters. The HB estimator is equal to the posterior mean of the estimand, where the expectation is with respect to the posterior distribution of the quantity of interest given the data. The HB method uses the posterior variance as a measure of uncertainty associated with the HB estimator. Posterior (or credible) intervals for the quantity of interest can also be constructed from the posterior distribution of the quantity of interest. The HB method is being extensively used for SAE because it is straightforward, inferences are "exact," and complex problems can be handled using Markov chain Monte Carlo (MCMC) methods. Software for implementing the HB method is also available (Section 10.2.4). Chapter 10 gives a self-contained account of the HB method and its applications to SAE.

"Optimal" model-based estimates of small area totals or means may not be suitable if the objective is to produce an ensemble of estimates whose distribution is in some sense close enough to the distribution of the corresponding estimands. We are also often interested in the ranks (e.g., ranks of schools, hospitals, or geographical areas) or in identifying domains (areas) with extreme values. Ideally, it is desirable to construct a set of "triple-goal" estimates that can produce good ranks, a good histogram, and good area-specific estimates. However, simultaneous optimization is not feasible, and it is necessary to seek a compromise set that can strike an effective balance between the three goals. Triple-goal EB estimation and constrained EB estimation that preserves the ensemble variance are studied in Section 9.8.
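As a concrete illustration of the two-step EBLUP construction described above, the following minimal R sketch applies it to the basic area level (Fay-Herriot) model of Section 4.2. The simulated data and the simple moment estimator of the random-effect variance used here are illustrative assumptions only, not the estimators developed later in the book.

```r
# Basic area level model: y_i = x_i' beta + v_i + e_i,
# with v_i ~ (0, A) and e_i ~ (0, psi_i), psi_i known.
fh_eblup <- function(y, X, psi) {
  m <- length(y); p <- ncol(X)
  # Step (i) ingredient: a simple moment estimator of A based on OLS residuals.
  ols   <- lm.fit(X, y)
  h     <- diag(X %*% solve(crossprod(X)) %*% t(X))          # leverages
  A.hat <- max(0, (sum(ols$residuals^2) - sum(psi * (1 - h))) / (m - p))
  # Weighted least squares estimator of beta given A.hat
  w    <- 1 / (A.hat + psi)
  beta <- solve(crossprod(X, w * X), crossprod(X, w * y))
  # Step (ii): plug A.hat into the BLUP to obtain the EBLUP
  gamma <- A.hat / (A.hat + psi)
  theta <- gamma * y + (1 - gamma) * drop(X %*% beta)
  list(A.hat = A.hat, beta = drop(beta), gamma = gamma, eblup = theta)
}

# Toy illustration with simulated areas
set.seed(1)
m   <- 30
X   <- cbind(1, runif(m))
psi <- runif(m, 0.5, 2)                       # known sampling variances
y   <- drop(X %*% c(2, 1)) + rnorm(m, 0, 1) + rnorm(m, 0, sqrt(psi))
fit <- fh_eblup(y, X, psi)
```

The sketch makes visible how the EBLUP shrinks each direct estimate toward the regression synthetic part through the weight gamma_i = A.hat/(A.hat + psi_i).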
1.6 SOME EXAMPLES
We conclude the introduction by presenting some important applications of SAE as motivating examples. Details of some of these applications, including auxiliary information used, are given in Chapters 6-10.

1.6.1 Health
SAE of health-related characteristics has attracted a lot of attention in the United States because of a continuing need to assess health status, health practices, and health resources at both the national and subnational levels. Reliable estimates of health-related characteristics help in evaluating the demand for health care and the access that individuals have to it. Healthcare planning often takes place at the state and substate levels because health characteristics are known to vary geographically. Health System Agencies in the United States, mandated by the National Health Planning and Resources Development Act of 1974, are required to collect and analyze data related to the health status of the residents and to the health delivery systems in their health service areas (Nandram, Sedransk, and Pickle 1999).

(i) The US National Center for Health Statistics (NCHS) pioneered the use of synthetic estimation based on implicit linking models. NCHS produced state synthetic estimates of disability and other health characteristics for different
groups from the National Health Interview Survey (NHIS). Examples 3.2.2 and 10.13.3 give health applications from national surveys. Malec, Davis, and Cao (1999) studied HB estimation of overweight prevalence for adults by states, using data from NHANES III. Folsom, Shah, and Vaish (1999) produced survey-weighted HB estimates of small area prevalence rates for states and age groups, for up to 20 binary variables related to drug use, using data from pooled National Household Surveys on Drug Abuse. Chattopadhyay et al. (1999) studied EB estimates of state-wide prevalences of the use of alcohol and drugs (e.g., marijuana) among civilian non-institutionalized adults and adolescents in the United States. These estimates are used for planning and resource allocation and to project the treatment needs of dependent users.

(ii) Mapping of small area mortality (or incidence) rates of diseases, such as cancer, is a widely used tool in public health research. Such maps permit the analysis of geographical variation that may be useful for formulating and assessing etiological hypotheses, resource allocation, and the identification of areas of unusually high risk warranting intervention (see Section 9.6). Direct (or crude) estimates of rates, called standardized mortality ratios (SMRs), can be very unreliable, and a map of crude rates can badly distort the geographical distribution of disease incidence or mortality because the map tends to be dominated by areas of low population. Disease mapping, using model-based estimators, has received considerable attention. We give several examples of disease mapping in this text (see Examples 9.6.1, 9.9.1, 10.11.1, and 10.11.3). Typically, sampling is not involved in disease mapping applications.
1.6.2 Agriculture
The US National Agricultural Statistics Service (NASS) publishes model-based county estimates of crop acreage using remote sensing satellite data as auxiliary information (see Example 7.3.1 for an application). County estimates assist the agricultural authorities in local agricultural decision making. Also, county crop yield estimates are used to administer federal programs involving payments to farmers if crop yields fall below certain levels. Another application, similar to Example 7.3.1, to estimate crop acreage in small areas using ground survey and remote sensing data, is reported in Ambrosio Flores and Iglesias Martínez (2000). Remote sensing satellite data and crop surveys are used in India to produce direct estimates of crop yield at the district level (Singh and Goel 2000). SAE methods are also used in India to obtain estimates of crop production at lower administrative units such as “tehsil” or block, using remote sensing satellite data as auxiliary information (Singh et al. 2002). An application of synthetic estimation to produce county estimates of wheat production in the state of Kansas based on a non-probability sample of farms is presented in Example 3.2.4. Chapters 6 and 7 of Schaible (1996) provide details of traditional and model-based indirect estimation methods used by NASS for county crop acreage and production.
1.6.3 Income for Small Places
Example 6.1.1 gives details of an application of the EB (EBLUP) method of estimation of small area incomes, based on a basic area level linking model (see Section 6.1). The US Census Bureau adopted this method, proposed originally by Fay and Herriot (1979), to produce updated estimates of per capita income (PCI) for small places. This was the largest application (prior to 1990) of model-based estimators in a US Federal Statistical Program. The PCI estimates are used to determine fund allocations to local government units (places) under the General Revenue Sharing Program.

1.6.4 Poverty Counts
The Fay-Herriot (FH) method is also used to produce model-based county estimates of poor school-age children in the United States (National Research Council 2000). Using these estimates, the US Department of Education annually allocates several billion dollars, known as Title I funds, to counties, and states then distribute the funds among school districts. The allocated funds support compensatory education programs to meet the needs of educationally disadvantaged children. In the past, funds were allocated on the basis of updated counts from the previous census, but this allocation system had to be changed because poverty counts vary significantly over time. EBLUP county estimates in this application are currently obtained from the American Community Survey (ACS), using administrative data as auxiliary information. Example 6.1.2 presents details of this application.

1.6.5 Median Income of Four-Person Families
Estimates of the current median income of four-person families in each of the states of the United States are used to determine the eligibility for a program of energy assistance to low-income families administered by the US Department of Health and Human Services. Current Population Survey (CPS) data and administrative information are used to produce model-based estimates, using extensions of the FH area level model (see Examples 8.1.1 and 8.3.1).

1.6.6 Poverty Mapping
Poverty measures are typically complex non-linear parameters, for example, the poverty measures used by the World Bank to produce poverty maps in many countries around the world. EB and HB methods for estimating poverty measures in Spanish provinces are illustrated in Example 10.7.1.
2 DIRECT DOMAIN ESTIMATION
2.1 INTRODUCTION
Sample survey data are extensively used to provide reliable direct estimates of totals and means for the whole population and large areas or domains. As noted in Chapter 1, a direct estimator for a domain uses values of the variable of interest, y, only from the sample units in the domain. Sections 2.2–2.5 provide a brief account of direct estimation under the design-based or repeated sampling framework. We refer the reader to standard textbooks on sampling theory (e.g., Cochran 1977, Hedayat and Sinha 1991, Särndal, Swensson, and Wretman 1992, Thompson 1997, Lohr 2010, Fuller 2009) for a more extensive treatment of direct estimation. Model-based methods have also been used to develop direct estimators and the associated inferences. Such methods provide valid conditional inferences referring to the particular sample that has been drawn, regardless of the sampling design (see Brewer 1963, Royall 1970, Valliant, Dorfman, and Royall 2001). But unfortunately, model-based strategies can perform poorly under model misspecification as the sample size in the domain increases. For instance, Hansen, Madow, and Tepping (1983) introduced a model misspecification that is not detectable through tests of significance from sample sizes as large as 400, and then showed that the repeated sampling coverage probabilities of model-based confidence intervals on the population mean, Y, are substantially less than the desired level, and that the understatement becomes worse as the sample size increases. This poor performance is largely due to asymptotic design-inconsistency of the model-based estimator with respect to the
stratified random sampling design employed by Hansen et al. (1983). We consider model-based direct estimators only briefly in this book, but model-based methods will be extensively used in the context of indirect estimators based on small sample sizes in the domains of interest. As noted in Chapter 1, an indirect estimator "borrows strength" by using values of the study variable, y, from sample units outside the domain of interest. The main intention of Chapter 2 is to provide some background material for later chapters and to indicate that direct estimation may sometimes suffice, particularly after addressing survey design issues that have a bearing on small area estimation (see Section 2.6). Effective use of auxiliary information through ratio and regression estimation is also useful in reducing the variability of direct estimators (Sections 2.3-2.5).
2.2 DESIGN-BASED APPROACH
We assume the following somewhat idealized set-up, and consider the estimation of a population total or mean in Section 2.3. Direct estimators for domains are obtained in Section 2.4 using the results for population totals. The population U consists of N distinct elements (or ultimate units) identified through the labels $j = 1, \ldots, N$. We assume that the characteristic of interest, y, associated with element j can be measured exactly by observing element j; thus, measurement errors are assumed to be absent. The parameter of interest is the population total $Y=\sum_U y_j$ or the population mean $\bar{Y}=Y/N$, where $\sum_U$ denotes summation over the population elements j. A sampling design is used to select a sample (subset) s from U with probability $p(s)$. The sample selection probability $p(s)$ can depend on known design variables such as stratum indicator variables and size measures of clusters. A sampling scheme is used to implement a sampling design; for example, a simple random sample of size n can be obtained by drawing n random numbers from 1 to N without replacement. Commonly used sampling designs include stratified simple random sampling (e.g., establishment surveys) and stratified multistage sampling (e.g., large-scale socio-economic surveys such as the Canadian Labour Force Survey (LFS) and the Current Population Survey (CPS) of the United States).

To make inferences on Y, we observe the y values associated with the units selected in the sample s. For simplicity, we assume that all the elements $j\in s$ can be observed, that is, complete response. An estimator $\hat{Y}$ of Y is said to be design-unbiased (or p-unbiased) if the design expectation of $\hat{Y}$ equals Y; that is,
\[
E_p(\hat{Y})=\sum_s p(s)\,\hat{Y}_s = Y, \qquad (2.2.1)
\]
where the summation is over all possible samples s under the specified design and $\hat{Y}_s$ is the value of $\hat{Y}$ for the sample s. The design variance of $\hat{Y}$ is denoted as $V_p(\hat{Y})=E_p[\hat{Y}-E_p(\hat{Y})]^2$. An estimator of $V_p(\hat{Y})$ is denoted as $v(\hat{Y})=s^2(\hat{Y})$, and the variance estimator $v(\hat{Y})$ is unbiased for $V_p(\hat{Y})$ if $E_p[v(\hat{Y})]=V_p(\hat{Y})$. An estimator $\hat{Y}$ is called design-consistent (or p-consistent) for Y if the p-bias of $\hat{Y}/N$ tends to zero and $N^{-2}V_p(\hat{Y})$ tends to zero as the sample size increases. Strictly speaking, we need to consider consistency in the context of a sequence of populations $U_\nu$ such that both the sample size $n_\nu$ and the population size $N_\nu$ tend to $\infty$ as $\nu\to\infty$. p-consistency of the variance estimator $v(\hat{Y})$ is similarly defined in terms of $N^{-2}v(\hat{Y})$. If the estimator $\hat{Y}$ and the variance estimator $v(\hat{Y})$ are both p-consistent, then the design-based approach provides valid inferences on Y, regardless of the population values, in the sense that the pivotal quantity $t=(\hat{Y}-Y)/s(\hat{Y})$ converges in distribution ($\to_d$) to a N(0,1) variable as the sample size increases. Thus, about $100(1-\alpha)\%$ of the confidence intervals $[\hat{Y}-z_{\alpha/2}s(\hat{Y}),\ \hat{Y}+z_{\alpha/2}s(\hat{Y})]$ will contain the true value Y as the sample size increases, where $z_{\alpha/2}$ is the upper $(\alpha/2)$-point of a N(0,1) variable. In practice, one often reports only the estimate $\hat{Y}$ and the associated standard error $s(\hat{Y})$ or coefficient of variation $\mathrm{cv}(\hat{Y})=s(\hat{Y})/\hat{Y}$ (more precisely, their realized values). The coefficient of variation or the standard error is used as a measure of variability associated with the estimate.

The design-based (or probability sampling) approach has been criticized on the grounds that the associated inferences, although assumption-free, refer to repeated sampling instead of just the particular sample s that has been drawn. A conditional design-based approach that allows us to restrict the set of samples used for inference to a "relevant" subset has also been proposed; this approach leads to conditionally valid inferences. For example, in the context of poststratification (stratification after selection of the sample), it makes sense to make inferences conditional on the realized poststrata sample sizes (Holt and Smith 1979). Similarly, when the population total X of an auxiliary variable x is known, conditioning on the estimator $\hat{X}$ of X is justified because the distance $|\hat{X}-X|/X$ provides a measure of imbalance in the realized sample (Robinson 1987; Casady and Valliant 1993).
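To make the repeated-sampling interpretation concrete, here is a small R simulation sketch (the artificial population and parameter values are assumptions for illustration only) that draws repeated simple random samples, computes the expansion estimator of Y, its standard error and CV, and checks the empirical coverage of the normal-theory interval.

```r
# Design-based repeated-sampling check under SRS without replacement.
set.seed(123)
N <- 5000; y <- rgamma(N, shape = 2, scale = 50)     # artificial population
Y <- sum(y); n <- 200

one_sample <- function() {
  s    <- sample.int(N, n)                    # SRSWOR
  Yhat <- N * mean(y[s])                      # expansion estimator with w_j = N/n
  vhat <- N^2 * (1 - n / N) * var(y[s]) / n   # unbiased variance estimator under SRS
  c(Yhat = Yhat, se = sqrt(vhat))
}

R   <- 2000
out <- t(replicate(R, one_sample()))
cv  <- out[, "se"] / out[, "Yhat"]            # coefficient of variation
lo  <- out[, "Yhat"] - 1.96 * out[, "se"]
hi  <- out[, "Yhat"] + 1.96 * out[, "se"]
mean(lo <= Y & Y <= hi)                       # empirical coverage, close to nominal 0.95
```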
2.3 ESTIMATION OF TOTALS

2.3.1 Design-Unbiased Estimator
Design weights $w_j(s)$ play an important role in constructing direct estimators $\hat{Y}$ of Y. These basic weights may depend on both s and the element $j\ (j\in s)$. An important choice is $w_j(s)=1/\pi_j$, where $\pi_j=\sum_{\{s:\,j\in s\}}p(s)$, $j=1,2,\ldots,N$, are the inclusion probabilities and $\sum_{\{s:\,j\in s\}}$ denotes summation over all samples s containing the element j. To simplify the notation, we write $w_j(s)=w_j$ except when the full notation $w_j(s)$ is needed. The weight $w_j$ may be interpreted as the number of elements in the population represented by the sample element j. In the absence of auxiliary population information, we use the expansion estimator
\[
\hat{Y}=\sum_s w_j y_j, \qquad (2.3.1)
\]
where $\sum_s$ denotes summation over $j\in s$. In this case, the design-unbiasedness condition (2.2.1) reduces to
\[
\sum_{\{s:\,j\in s\}} p(s)\,w_j(s)=1;\quad j=1,\ldots,N. \qquad (2.3.2)
\]
The choice $w_j(s)=1/\pi_j$ satisfies the unbiasedness condition (2.3.2), and the resulting estimator is the well-known Horvitz-Thompson (H-T) estimator. It is convenient to denote the estimator $\hat{Y}=\sum_s w_j y_j$ in an operator notation as $\hat{Y}=\hat{Y}(y)$. Using this notation, for another variable x with values $x_j\ (j=1,\ldots,N)$ we write $\hat{Y}(x)=\sum_s w_j x_j$, whereas the traditional notation is to denote $\sum_s w_j x_j$ as $\hat{X}$. Similarly, we denote a variance estimator of $\hat{Y}$ as $v(\hat{Y})=v(y)$, so that a variance estimator of $\hat{X}$ is $v(\hat{X})=v[\hat{Y}(x)]=v(x)$. Note that the formulae for $\hat{Y}(x)$ and $v(x)$ are obtained by attaching the subscript j to the character in brackets and replacing $y_j$ by $x_j$ in the formulae for $\hat{Y}(y)$ and $v(y)$, respectively. Hartley (1959) introduced the operator notation. We refer the reader to Cochran (1977), Särndal et al. (1992), and Wolter (2007) for details of variance estimation. Rao (1979) has shown that a nonnegative unbiased quadratic estimator of the variance of $\hat{Y}$ is necessarily of the form
\[
v(\hat{Y})=v(y)=-\sum_{j<k}\sum w_{jk}(s)\,b_j b_k\left(\frac{y_j}{b_j}-\frac{y_k}{b_k}\right)^2,
\]
for suitably chosen weights $w_{jk}(s)$ and constants $b_j$.

2.3.2 Generalized Regression Estimator

Suppose now that auxiliary information in the form of known population totals $\mathbf{X}=(X_1,\ldots,X_p)^T$ is available and that the vector of auxiliary variables $\mathbf{x}_j=(x_{1j},\ldots,x_{pj})^T$ is observed for each $j\in s$. The generalized regression (GREG) estimator of Y is given by
\[
\hat{Y}_{GR}=\hat{Y}+(\mathbf{X}-\hat{\mathbf{X}})^T\hat{\mathbf{B}}, \qquad (2.3.6)
\]
where $\hat{\mathbf{X}}=\sum_s w_j\mathbf{x}_j$ and
\[
\hat{\mathbf{B}}=\Big(\sum_s w_j\mathbf{x}_j\mathbf{x}_j^T/c_j\Big)^{-1}\sum_s w_j\mathbf{x}_j y_j/c_j \qquad (2.3.7)
\]
is a weighted regression coefficient with specified constants $c_j\ (>0)$. It is also useful to write $\hat{Y}_{GR}$ in the expansion form with the design weights $w_j$ changed to "revised" weights $w_j^*$. We have
\[
\hat{Y}_{GR}=\sum_s w_j^* y_j =: \hat{Y}_{GR}(y) \qquad (2.3.8)
\]
in operator notation, where $w_j^*=w_j^*(s)$ is the product of the design weight $w_j(s)$ and the estimation weight $g_j=g_j(s)$, that is, $w_j^*=w_j g_j$, with
\[
g_j=1+(\mathbf{X}-\hat{\mathbf{X}})^T\Big(\sum_s w_j\mathbf{x}_j\mathbf{x}_j^T/c_j\Big)^{-1}\mathbf{x}_j/c_j. \qquad (2.3.9)
\]
Note that $w_j^*$ in (2.3.8) does not depend on the y-values. Thus, the same weight $w_j^*$ is applied to all variables of interest y, as in the case of the expansion estimator $\hat{Y}$. This ensures consistency of results when aggregated over variables $y_1,\ldots,y_r$ attached to each unit, that is, $\hat{Y}_{GR}(y_1)+\cdots+\hat{Y}_{GR}(y_r)=\hat{Y}_{GR}(y_1+\cdots+y_r)$.
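The following R sketch (simulated data, an assumption for illustration only) computes the basic weights $w_j=1/\pi_j$ under simple random sampling, the g-weights of (2.3.9), and the revised weights $w_j^*=w_jg_j$ of (2.3.8), and evaluates the expansion and GREG estimators.

```r
# Expansion (H-T) estimator and GREG revised weights under SRSWOR.
set.seed(42)
N <- 2000
x <- runif(N, 1, 10)                     # auxiliary variable with known totals
y <- 5 + 2 * x + rnorm(N, 0, 2)
X <- cbind(1, x)                         # population auxiliary matrix
Xtot <- colSums(X)                       # known totals (N and the total of x)

n  <- 100
s  <- sample.int(N, n)
w  <- rep(N / n, n)                      # design weights w_j = 1/pi_j
cj <- rep(1, n)

Yhat <- sum(w * y[s])                    # expansion estimator (2.3.1)
Xhat <- colSums(w * X[s, ])              # H-T estimators of the auxiliary totals

Ainv  <- solve(t(X[s, ]) %*% (w / cj * X[s, ]))               # (sum_s w_j x_j x_j'/c_j)^{-1}
g     <- 1 + drop((Xtot - Xhat) %*% Ainv %*% t(X[s, ] / cj))  # g-weights (2.3.9)
wstar <- w * g                                                # revised weights (2.3.8)

Ygr <- sum(wstar * y[s])                 # GREG estimate of Y
colSums(wstar * X[s, ])                  # reproduces Xtot (the calibration property discussed next)
```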
An important property of the GREG estimator is that it reproduces the known auxiliary totals $\mathbf{X}$ in the sense that
\[
\hat{Y}_{GR}(\mathbf{x})=\sum_s w_j^*\mathbf{x}_j=\mathbf{X}. \qquad (2.3.10)
\]
A proof of (2.3.10) is given in Section 2.8.1. This property does not hold for the basic expansion estimator $\hat{Y}$. Many statistical agencies regard this property as desirable from the user's viewpoint. Because of the property (2.3.10), $\hat{Y}_{GR}$ is also called a calibration estimator (Deville and Särndal 1992). In fact, among all calibration estimators $\sum_s h_j y_j$ with weights $h_j$ satisfying the calibration constraints $\sum_s h_j\mathbf{x}_j=\mathbf{X}$, the GREG weights $w_j^*$ minimize a chi-squared distance, $\sum_s c_j(w_j-h_j)^2/w_j$, between the basic design weights $w_j$ and the calibration weights $h_j$ (see Section 2.8.2). Thus, the GREG weights modify the design weights as little as possible subject to the calibration constraints. However, $w_j^*$ may take negative or very large values for some units $j\in s$, especially when the number of calibration constraints is not small. Alternative methods have been proposed to deal with this issue of negative or too large calibration weights (e.g., Huang and Fuller 1978, Rao and Singh 2009).

The GREG estimator takes a simpler form when the constants $c_j$ in (2.3.7) (or in the chi-squared distance) are taken as a linear combination of the x-variables, that is, $c_j=\boldsymbol{\nu}^T\mathbf{x}_j$ for all $j\in U$ and for a vector $\boldsymbol{\nu}$ of specified constants. For this choice of $c_j$, we have
\[
\hat{Y}_{GR}=\mathbf{X}^T\hat{\mathbf{B}}=\sum_s\tilde{w}_j y_j \qquad (2.3.11)
\]
because in this case $\hat{Y}-\hat{\mathbf{X}}^T\hat{\mathbf{B}}=\sum_s w_j e_j(s)=0$ (see Section 2.8.3 for a proof), where $e_j(s)=e_j=y_j-\mathbf{x}_j^T\hat{\mathbf{B}}$ are the sample residuals and $\tilde{w}_j=\tilde{w}_j(s)=w_j(s)\tilde{g}_j(s)$ with
\[
\tilde{g}_j(s)=\mathbf{X}^T\Big(\sum_s w_j\mathbf{x}_j\mathbf{x}_j^T/c_j\Big)^{-1}\mathbf{x}_j/c_j. \qquad (2.3.12)
\]
The simplified GREG estimator (2.3.11) is known as the "projection" GREG estimator.

The GREG estimator (2.3.11) covers many practically useful estimators as special cases. For example, in the case of a single auxiliary variable x with values $x_j\ (j=1,\ldots,N)$, setting $c_j=x_j$ in (2.3.12) and noting that $\tilde{g}_j(s)=X/\hat{X}$, we get the well-known ratio estimator
\[
\hat{Y}_R=\frac{\hat{Y}}{\hat{X}}X. \qquad (2.3.13)
\]
The ratio estimator $\hat{Y}_R$ uses the weights $\tilde{w}_j=w_j(X/\hat{X})$. If only the population size N is known, we set $x_j=1$ in (2.3.13) so that $X=N$ and $\hat{X}=\hat{N}=\sum_s w_j$. If we set $\mathbf{x}_j=(1,x_j)^T$ and $c_j=1$, then $\boldsymbol{\nu}=(1,0)^T$ and (2.3.11) reduces to the familiar linear regression estimator
\[
\hat{Y}_{LR}=\hat{Y}+\hat{B}_{LR}(X-\hat{X}), \qquad (2.3.14)
\]
where
\[
\hat{B}_{LR}=\sum_s w_j(x_j-\hat{\bar{X}})(y_j-\hat{\bar{Y}})\Big/\sum_s w_j(x_j-\hat{\bar{X}})^2,
\]
with $\hat{\bar{Y}}=\hat{Y}/\hat{N}$ and $\hat{\bar{X}}=\hat{X}/\hat{N}$.

The GREG estimator (2.3.11) also covers the familiar poststratified estimator as a special case. Suppose the population U is partitioned into G poststrata $U_{\cdot g}$ (e.g., age/sex groups) with known counts $N_{\cdot g}\ (g=1,\ldots,G)$. Then we set $\mathbf{x}_j=(x_{1j},\ldots,x_{Gj})^T$ with $x_{gj}=1$ if $j\in U_{\cdot g}$ and $x_{gj}=0$ otherwise, so that $\mathbf{X}=(N_{\cdot 1},\ldots,N_{\cdot G})^T$. Since the poststrata are mutually exclusive, $\mathbf{1}^T\mathbf{x}_j=1$ for all $j\in U$. Taking $c_j=1$ and $\boldsymbol{\nu}=\mathbf{1}$, the GREG estimator (2.3.11) reduces to the poststratified estimator
\[
\hat{Y}_{PS}=\sum_{g=1}^G\frac{N_{\cdot g}}{\hat{N}_{\cdot g}}\hat{Y}_{\cdot g}, \qquad (2.3.15)
\]
where $\hat{N}_{\cdot g}=\sum_{s_{\cdot g}}w_j$ and $\hat{Y}_{\cdot g}=\sum_{s_{\cdot g}}w_j y_j$, with $s_{\cdot g}$ denoting the sample of elements belonging to poststratum g.

Turning to variance estimation, the traditional Taylor linearization variance estimator is
\[
v_L(\hat{Y}_{GR})=v(e), \qquad (2.3.16)
\]
which is obtained by substituting the residuals $e_j$ for $y_j$ in $v(y)$. Simulation studies have indicated that $v_L(\hat{Y}_{GR})$ may lead to slight underestimation, whereas the alternative variance estimator
\[
v(\hat{Y}_{GR})=v(ge), \qquad (2.3.17)
\]
obtained by substituting $g_j e_j$ for $y_j$ in $v(y)$, reduces this underestimation (Särndal, Swensson, and Wretman 1989; Estevao, Hidiroglou, and Särndal 1995). The alternative variance estimator $v(\hat{Y}_{GR})$ also performs better for conditional inference, in the sense that it is approximately unbiased for the model variance of $\hat{Y}_{GR}$ conditionally as well as unconditionally for several designs, under the following linear regression model (or GREG "working" model):
\[
y_j=\mathbf{x}_j^T\boldsymbol{\beta}+\epsilon_j,\quad j\in U, \qquad (2.3.18)
\]
where $E_m(\epsilon_j)=0$, $V_m(\epsilon_j)=c_j\sigma^2$, and $\mathrm{Cov}_m(\epsilon_j,\epsilon_k)=0$ for $j\neq k$; here $E_m$, $V_m$, and $\mathrm{Cov}_m$, respectively, denote the model expectation, variance, and covariance (Särndal, Swensson, and Wretman 1989, Rao 1994). In the model (2.3.18), $y_j$ is a random variable and $\mathbf{x}_j$ is fixed. The GREG estimator is also model-unbiased under (2.3.18) in the sense that $E_m(\hat{Y}_{GR})=E_m(Y)$ for every s. In the design-based framework, $\hat{Y}_{GR}$, $v_L(\hat{Y}_{GR})$, and $v(\hat{Y}_{GR})$ are p-consistent.

The GREG estimator (2.3.6) may be expressed as
\[
\hat{Y}_{GR}=\sum_{j\in U}\hat{y}_j+\sum_{j\in s}w_j(y_j-\hat{y}_j), \qquad (2.3.19)
\]
where $\hat{y}_j=\mathbf{x}_j^T\hat{\mathbf{B}}$ is the predictor of $y_j$ under the "working" model (2.3.18). Thus, the estimator $\hat{Y}_{GR}$ is the sum of all predicted values $\hat{y}_j,\ j\in U$, and the p-unbiased expansion estimator of the total prediction error $\sum_{j\in U}(y_j-\hat{y}_j)$. The above prediction approach of deriving a p-consistent estimator of a total using a working model is called a model-assisted approach. The choice of the working model affects the efficiency of the estimator, although the estimator remains p-consistent regardless of the validity of the model. It may be noted that the projection GREG estimator for $c_j=\boldsymbol{\nu}^T\mathbf{x}_j$ for all j may be written as the sum of all predicted values:
\[
\hat{Y}_{GR}=\sum_{j\in U}\hat{y}_j.
\]
We refer the reader to Särndal et al. (1992) for a detailed account of GREG estimation and to Estevao et al. (1995) for the highlights of a Generalized Estimation System at Statistics Canada based on GREG estimation.
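Continuing the R sketch given after (2.3.9) and assuming, for illustration only, simple random sampling, the two linearization variance estimators (2.3.16) and (2.3.17) are obtained by plugging the residuals $e_j$, or $g_je_j$, into the standard SRS variance formula for an expansion estimator.

```r
# Linearization variance estimators for the GREG estimator under SRSWOR.
# Reuses y, X, s, w, cj, Ainv, g, N, n from the sketch in Section 2.3.2.
v_srs <- function(z, N, n) N^2 * (1 - n / N) * var(z) / n   # v(z) under SRS

B     <- Ainv %*% t(X[s, ] / cj) %*% (w * y[s])   # B-hat of (2.3.7)
e     <- y[s] - drop(X[s, ] %*% B)                # sample residuals e_j
v_L   <- v_srs(e, N, n)                           # v(e), equation (2.3.16)
v_alt <- v_srs(g * e, N, n)                       # v(ge), equation (2.3.17)
```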
2.4 DOMAIN ESTIMATION
2.4.1 Case of No Auxiliary Information
Suppose $U_i$ denotes a domain (or subpopulation) of interest of size $N_i$. We may want to estimate the domain total $Y_i=\sum_{U_i}y_j$ or the domain mean $\bar{Y}_i=Y_i/N_i$, where $N_i$, the number of elements in $U_i$, may or may not be known. If $y_j$ is binary (1 or 0), then $\bar{Y}_i$ reduces to the domain proportion $P_i$; for example, the proportion in poverty in the ith domain. Much of the theory in Section 2.3 for a total Y can be adapted to domain estimation by using the following relationships. Writing $Y=\sum_{j\in U}y_j$ in the operator notation $Y(y)$ and defining
\[
y_{ij}=\begin{cases}y_j & \text{if } j\in U_i,\\ 0 & \text{otherwise,}\end{cases}\qquad
a_{ij}=\begin{cases}1 & \text{if } j\in U_i,\\ 0 & \text{otherwise,}\end{cases} \qquad (2.4.1)
\]
we can express the domain total and the domain size as
\[
Y(y_i)=\sum_{j\in U}y_{ij}=\sum_{j\in U_i}y_j=Y_i
\]
and
\[
Y(a_i)=\sum_{j\in U}a_{ij}=\sum_{j\in U_i}1=N_i. \qquad (2.4.2)
\]
Therefore, the formulae of Section 2.3 for estimating a total Y can be applied to estimating a domain total $Y_i$ by changing $y_j$ to $y_{ij}$. If the domains of interest, say $U_1,\ldots,U_m$, form a partition of U (or of a larger domain), it is desirable from a user's standpoint that the estimates of the domain totals add up to the estimate of the total for U (or for the larger domain).

In the absence of auxiliary population information, we use the expansion estimator
\[
\hat{Y}_i=\hat{Y}(y_i)=\sum_{j\in s}w_j y_{ij}=\sum_{j\in s}w_j a_{ij}y_j=\sum_{j\in s_i}w_j y_j, \qquad (2.4.3)
\]
where $s_i$ denotes the sample of elements belonging to domain $U_i$. It readily follows from (2.4.1) and (2.4.3) that $\hat{Y}_i$ is p-unbiased for $Y_i$ if $\hat{Y}$ is p-unbiased for Y. $\hat{Y}_i$ is also p-consistent if the expected domain sample size is large. Similarly, $\hat{N}_i=\hat{Y}(a_i)$ is p-unbiased for $N_i$, using (2.4.2). We note from (2.4.3) that the additivity property is satisfied: $\hat{Y}_1+\cdots+\hat{Y}_m=\hat{Y}$. Noting that $v(\hat{Y})=v(y)$, an estimator of the variance of $\hat{Y}_i$ is simply obtained from $v(y)$ by changing $y_j$ to $y_{ij}$, that is,
\[
v(\hat{Y}_i)=v(y_i). \qquad (2.4.4)
\]
It follows from (2.4.3) and (2.4.4) that no new theory is needed for estimating a domain total. The domain mean $\bar{Y}_i=Y(y_i)/Y(a_i)$ is estimated by
\[
\hat{\bar{Y}}_i=\frac{\hat{Y}(y_i)}{\hat{Y}(a_i)}=\frac{\hat{Y}_i}{\hat{N}_i}. \qquad (2.4.5)
\]
For the special case of a binary variable $y_j\in\{0,1\}$, $\hat{\bar{Y}}_i$ reduces to $\hat{P}_i$, an estimator of the domain proportion $P_i$. If the expected domain sample size is large, the ratio estimator (2.4.5) is approximately p-unbiased and p-consistent in the sense that its bias and variance go to zero as the sample size increases. A Taylor linearization variance estimator is given by
\[
v_L(\hat{\bar{Y}}_i)=v(\tilde{e}_i)/\hat{N}_i^2, \qquad (2.4.6)
\]
where $\tilde{e}_{ij}=y_{ij}-\hat{\bar{Y}}_i a_{ij}=(y_j-\hat{\bar{Y}}_i)a_{ij}$. Note that $v(\tilde{e}_i)$ is obtained from $v(y)$ by changing $y_j$ to $\tilde{e}_{ij}$. It follows from (2.4.5) and (2.4.6) that no new theory is needed for estimating domain means as well. Note that $\tilde{e}_{ij}=0$ for a unit j not belonging to $U_i$. We refer the reader to Hartley (1959) for details on domain estimation.
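A short R sketch (simulated data, for illustration only) of the direct domain quantities in (2.4.3)-(2.4.6) under simple random sampling:

```r
# Direct domain estimation: expansion estimator, domain mean, and
# linearization variance, under SRSWOR with w_j = N/n.
set.seed(7)
N   <- 3000
dom <- sample(1:6, N, replace = TRUE)         # domain labels U_1, ..., U_6
y   <- rnorm(N, mean = 10 + dom, sd = 3)

n <- 150
s <- sample.int(N, n)
w <- rep(N / n, n)

i   <- 3                                      # domain of interest
aij <- as.numeric(dom[s] == i)                # domain indicator a_ij
yij <- y[s] * aij                             # y_ij of (2.4.1)

Yi_hat    <- sum(w * yij)                     # domain total, (2.4.3)
Ni_hat    <- sum(w * aij)                     # estimated domain size
Ybari_hat <- Yi_hat / Ni_hat                  # domain mean, (2.4.5)

v_srs   <- function(z, N, n) N^2 * (1 - n / N) * var(z) / n
v_Yi    <- v_srs(yij, N, n)                   # v(y_i), equation (2.4.4)
e_tilde <- (y[s] - Ybari_hat) * aij           # residuals for the ratio (2.4.5)
v_Ybari <- v_srs(e_tilde, N, n) / Ni_hat^2    # equation (2.4.6)
```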
2.4.2 GREG Domain Estimation
GREG estimation of Y is also easily adapted to the estimation of a domain total $Y_i$. It follows from (2.3.8) that the GREG estimator of the domain total $Y_i$ is given by
\[
\hat{Y}_{iGR}=\hat{Y}_{GR}(y_i)=\sum_{j\in s_i}w_j^* y_j, \qquad (2.4.7)
\]
when the population total of the auxiliary vector x is known. It follows from (2.3.10) that the GREG estimator also satisfies the additivity property $\hat{Y}_{1GR}+\cdots+\hat{Y}_{mGR}=\hat{Y}_{GR}$. The estimator $\hat{Y}_{iGR}$ is approximately p-unbiased if the overall sample size is large, but p-consistency requires a large expected domain sample size as well. The GREG domain estimator (2.4.7) may be expressed as
\[
\hat{Y}_{iGR}=\sum_{j\in U_i}\hat{y}_j+\sum_{j\in s_i}w_j(y_j-\hat{y}_j), \qquad (2.4.8)
\]
with predicted values $\hat{y}_j=\mathbf{x}_j^T\hat{\mathbf{B}}(y_i)$ under the working model (2.3.18), where $\hat{\mathbf{B}}(y_i)$ is obtained from $\hat{\mathbf{B}}=\hat{\mathbf{B}}(y)$ by changing $y_j$ to $y_{ij}$. It may be noted that the last term of (2.4.8) is not zero when $c_j=\boldsymbol{\nu}^T\mathbf{x}_j$, unlike the term $\sum_{j\in s}w_j e_j(s)$ in the estimator of the population total (2.3.11). The projection-GREG domain estimator $\sum_{j\in U_i}\hat{y}_j$ is not asymptotically p-unbiased, unlike the bias-adjusted estimator (2.4.8).

For the special case of the ratio estimator (2.3.13), changing $y_j$ to $y_{ij}$ gives
\[
\hat{Y}_{iR}=\frac{\hat{Y}_i}{\hat{X}}X. \qquad (2.4.9)
\]
Similarly, a poststratified domain estimator is obtained by changing $y_j$ to $a_{ij}y_j$:
\[
\hat{Y}_{iPS}=\sum_{g=1}^G\frac{N_{\cdot g}}{\hat{N}_{\cdot g}}\sum_{j\in s_{ig}}w_j y_j, \qquad (2.4.10)
\]
where $s_{ig}$ is the sample falling in the cell (ig) of the cross-classification of domains and poststrata.

A Taylor linearization variance estimator of $\hat{Y}_{iGR}$ is simply obtained from $v(y)$ by changing $y_j$ to $e_{ij}=y_{ij}-\mathbf{x}_j^T\hat{\mathbf{B}}(y_i)$. Note that $e_{ij}=-\mathbf{x}_j^T\hat{\mathbf{B}}(y_i)$ for a unit $j\in s$ not belonging to $U_i$. The large negative residuals for all sample units outside $U_i$ can lead to inefficiency, unlike in the case of $\hat{Y}_{GR}$, where the variability of the residuals $e_j$ will be small relative to the variability of the $y_j$'s. This inefficiency of the GREG domain estimator $\hat{Y}_{iGR}$ is due to the fact that the auxiliary population information used is not domain-specific. But $\hat{Y}_{iGR}$ has the advantage that it is approximately p-unbiased even if the expected domain sample size is small, whereas estimators based on domain-specific auxiliary population information can be p-biased unless the expected domain sample size is also large.
2.4.3 Domain-Specific Auxiliary Information
We now turn to GREG estimation of a domain total $Y_i$ under domain-specific auxiliary information. We assume that the domain totals $\mathbf{X}_i=\sum_{j\in U_i}\mathbf{x}_j=Y(\mathbf{x}_i)$ are known, where $\mathbf{x}_{ij}=\mathbf{x}_j$ if $j\in U_i$ and $\mathbf{x}_{ij}=\mathbf{0}$ otherwise; that is, $\mathbf{x}_{ij}=a_{ij}\mathbf{x}_j$. In this case, a GREG estimator of $Y_i$ is given by
\[
\hat{Y}_{iGR}^*=\hat{Y}_i+(\mathbf{X}_i-\hat{\mathbf{X}}_i)^T\hat{\mathbf{B}}_i, \qquad (2.4.11)
\]
where
\[
\hat{\mathbf{B}}_i=\Big(\sum_{j\in s}w_j\mathbf{x}_{ij}\mathbf{x}_{ij}^T/c_j\Big)^{-1}\sum_{j\in s}w_j\mathbf{x}_{ij}y_{ij}/c_j \qquad (2.4.12)
\]
and $\hat{\mathbf{X}}_i=\sum_{s_i}w_j\mathbf{x}_j=\hat{Y}(\mathbf{x}_i)$. We may also write (2.4.11) as
\[
\hat{Y}_{iGR}^*=\sum_{j\in s}w_{ij}^* y_{ij}, \qquad (2.4.13)
\]
where $w_{ij}^*=w_j g_{ij}^*$ with
\[
g_{ij}^*=1+(\mathbf{X}_i-\hat{\mathbf{X}}_i)^T\Big(\sum_{j\in s}w_j\mathbf{x}_{ij}\mathbf{x}_{ij}^T/c_j\Big)^{-1}\mathbf{x}_{ij}/c_j.
\]
Note that the weights $w_{ij}^*$ now depend on i, unlike the weights $w_j^*$. Therefore, the estimators $\hat{Y}_{iGR}^*$ do not add up to $\hat{Y}_{GR}$. Also, $\hat{Y}_{iGR}^*$ is not approximately p-unbiased unless the domain sample size is large. If the condition $c_j=\boldsymbol{\nu}^T\mathbf{x}_j$ holds, $\hat{Y}_{iGR}^*$ takes the projection form
\[
\hat{Y}_{iGR}^*=\mathbf{X}_i^T\hat{\mathbf{B}}_i. \qquad (2.4.14)
\]
Estimators of this form were studied by Lehtonen, Särndal, and Veijanen (2003).

We now give three special cases of the projection estimator (2.4.14). In the first case, we consider a single auxiliary variable x with known domain total $X_i$ and set $c_j=x_j$ in (2.4.14), with $\hat{B}_i$ given in (2.4.12). This leads to the ratio estimator
\[
\hat{Y}_{iR}^*=\frac{\hat{Y}_i}{\hat{X}_i}X_i. \qquad (2.4.15)
\]
In the second case, we consider known domain-specific poststrata counts $N_{ig}$ and set $c_j=1$ and $\mathbf{x}_j=(x_{1j},\ldots,x_{Gj})^T$ with $x_{gj}=1$ if $j\in U_{\cdot g}$ and $x_{gj}=0$ otherwise. This leads to the poststratified count (PS/C) estimator
\[
\hat{Y}_{iPS/C}^*=\sum_{g=1}^G\frac{N_{ig}}{\hat{N}_{ig}}\hat{Y}_{ig}, \qquad (2.4.16)
\]
where $\hat{N}_{ig}=\sum_{s_{ig}}w_j$ and $\hat{Y}_{ig}=\sum_{s_{ig}}w_j y_j$. In the third case, we consider known cell totals $X_{ig}$ of an auxiliary variable x and set $c_j=1$ and $\mathbf{x}_j=(x_{1j},\ldots,x_{Gj})^T$ with $x_{gj}=x_j$ if $j\in U_{\cdot g}$ and $x_{gj}=0$ otherwise. This leads to the poststratified ratio (PS/R) estimator
\[
\hat{Y}_{iPS/R}^*=\sum_{g=1}^G\frac{X_{ig}}{\hat{X}_{ig}}\hat{Y}_{ig}, \qquad (2.4.17)
\]
where $\hat{X}_{ig}=\sum_{s_{ig}}w_j x_j$.

If the expected domain sample size is large, then a Taylor linearization variance estimator of $\hat{Y}_{iGR}^*$ is obtained from $v(y)$ by changing $y_j$ to $e_{ij}^*=y_{ij}-\mathbf{x}_{ij}^T\hat{\mathbf{B}}_i$. Note that $e_{ij}^*=0$ if $j\in s$ and $j\notin U_i$, unlike the large negative residuals $e_{ij}$ in the case of $\hat{Y}_{iGR}$. Thus, the domain-specific GREG estimator $\hat{Y}_{iGR}^*$ will be more efficient than $\hat{Y}_{iGR}$, provided the expected domain-specific sample size is large.
Example 2.4.1. Wages and Salaries. Särndal and Hidiroglou (1989) and Rao and Choudhry (1995) considered a population of N = 1,678 unincorporated tax filers (units) from the province of Nova Scotia, Canada, divided into 18 census divisions. This population is actually a simple random sample, but it is treated as a population for simulation purposes. The population is also classified into four mutually exclusive industry groups: retail (515 units), construction (496 units), accommodation (114 units), and others (553 units). Domains (small areas) of interest are formed by the cross-classification of the four industry types with the 18 census divisions, leading to 70 nonempty domains out of 72 possible domains. The objective is to estimate the domain totals $Y_i$ of the y variable (wages and salaries), utilizing the auxiliary variable x (gross business income), assumed to be known for all the N units in the population. A simple random sample s, of size n, is drawn from U, and the y and x values are observed for the sampled units. The sample data consist of $(y_j,x_j)$ for $j\in s$ and the auxiliary population information. The sample $s_i$ falling in domain i has $n_i\ (\ge 0)$ units.

Under the above set-up, $\pi_j=n/N$, $w_j=N/n$, and the expansion estimator (2.4.3) reduces to
\[
\hat{Y}_i=\begin{cases}(N/n)\sum_{s_i}y_j & \text{if } n_i\ge 1,\\ 0 & \text{if } n_i=0.\end{cases} \qquad (2.4.18)
\]
The estimator $\hat{Y}_i$ is p-unbiased for $Y_i$ because $E_p(\hat{Y}_i)=E_p[\hat{Y}(y_i)]=Y(y_i)=Y_i$. However, it is p-biased conditional on the realized domain sample size $n_i$. In fact, conditional on $n_i$, the sample $s_i$ is a simple random sample of size $n_i$ from $U_i$, and the conditional p-bias of $\hat{Y}_i$ is
\[
B_2(\hat{Y}_i)=E_2(\hat{Y}_i)-Y_i=N\Big(\frac{n_i}{n}-\frac{N_i}{N}\Big)\bar{Y}_i,\quad n_i\ge 1,
\]
where $E_2$ denotes conditional expectation. Thus, the conditional bias is zero only if the sample proportion $n_i/n$ equals the population proportion $N_i/N$.

Suppose we form G poststrata based on the x-values known for all the N units. Then (2.4.16) and (2.4.17) reduce to
\[
\hat{Y}_{iPS/C}^*=\sum_{g=1}^G N_{ig}\bar{y}_{ig} \qquad (2.4.19)
\]
and
\[
\hat{Y}_{iPS/R}^*=\sum_{g=1}^G X_{ig}\frac{\bar{y}_{ig}}{\bar{x}_{ig}}, \qquad (2.4.20)
\]
where $\bar{y}_{ig}$ and $\bar{x}_{ig}$ are the sample means of y and x for the $n_{ig}$ units falling in the cell (ig), and the cell counts $N_{ig}$ and the cell totals $X_{ig}$ are assumed to be known. If $n_{ig}=0$, we set $\bar{y}_{ig}=0$ in (2.4.19) and $\bar{y}_{ig}/\bar{x}_{ig}=0$ in (2.4.20). The estimator $\hat{Y}_{iPS/C}^*$ is p-unbiased only conditional on the realized cell sample sizes $n_{ig}\ (\ge 1$ for all $g)$, whereas $\hat{Y}_{iPS/R}^*$ is approximately p-unbiased conditional on the $n_{ig}$'s, provided all the expected cell sample sizes $E(n_{ig})$ are large. It is desirable to make inferences conditional on the realized cell sample sizes, but this may not be possible under designs more complex than simple random sampling (Rao 1985). Even under simple random sampling, we require $n_{ig}\ge 1$ for all g, which limits the use of poststratification when $n_i$ is small.

If poststratification is not used, we can use the ratio estimator (2.4.15), which here reduces to
\[
\hat{Y}_{iR}^*=X_i\frac{\bar{y}_i}{\bar{x}_i} \qquad (2.4.21)
\]
under simple random sampling, where $\bar{y}_i$ and $\bar{x}_i$ are the sample means for the $n_i$ units falling in the sample $s_i$ from domain i. If the x-variable is not observed but $N_i$ is known, we can use an alternative estimator
\[
\hat{Y}_{iC}^*=N_i\bar{y}_i. \qquad (2.4.22)
\]
This estimator is p-unbiased conditional on the realized sample size $n_i\ (\ge 1)$.
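To illustrate the estimators (2.4.18), (2.4.21), and (2.4.22), here is a minimal R sketch on a synthetic population with domains and an auxiliary variable; the simulated data are an assumption for illustration and are not the Nova Scotia population of Example 2.4.1.

```r
# Expansion, ratio, and count-adjusted estimators for one domain under SRSWOR.
set.seed(2015)
N   <- 1678
dom <- sample(1:10, N, replace = TRUE)          # synthetic domains
x   <- rgamma(N, shape = 2, scale = 20)         # auxiliary variable
y   <- 0.4 * x + rnorm(N, 0, 5)                 # study variable

n  <- 120
s  <- sample.int(N, n)
i  <- 4
si <- s[dom[s] == i]
ni <- length(si)

Ni <- sum(dom == i); Xi <- sum(x[dom == i])     # known domain count and x-total

Yi_exp   <- if (ni >= 1) (N / n) * sum(y[si]) else 0              # (2.4.18)
Yi_ratio <- if (ni >= 1) Xi * mean(y[si]) / mean(x[si]) else 0    # (2.4.21)
Yi_count <- if (ni >= 1) Ni * mean(y[si]) else 0                  # (2.4.22)
c(expansion = Yi_exp, ratio = Yi_ratio, count = Yi_count,
  true = sum(y[dom == i]))
```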
2.5 MODIFIED GREG ESTIMATOR
We now consider modified GREG estimators that use y-values from outside the domain but remain p-unbiased or approximately p-unbiased as the overall sample size increases. In particular, we replace $\hat{\mathbf{B}}_i$ in (2.4.11) by the overall regression coefficient $\hat{\mathbf{B}}$, given by (2.3.7), to get
\[
\tilde{Y}_{iGR}=\hat{Y}_i+(\mathbf{X}_i-\hat{\mathbf{X}}_i)^T\hat{\mathbf{B}}=\sum_{j\in s}\tilde{w}_{ij}y_j \qquad (2.5.1)
\]
with
\[
\tilde{w}_{ij}=w_j a_{ij}+(\mathbf{X}_i-\hat{\mathbf{X}}_i)^T\Big(\sum_s w_j\mathbf{x}_j\mathbf{x}_j^T/c_j\Big)^{-1}(w_j\mathbf{x}_j/c_j),
\]
where $a_{ij}$ is the domain indicator variable. The estimator $\tilde{Y}_{iGR}$ given by (2.5.1) is approximately p-unbiased as the overall sample size increases, even if the domain sample size is small. This estimator is also called the "survey regression" estimator (Battese, Harter, and Fuller 1988; Woodruff 1966). The estimator may also be viewed as a calibration estimator $\sum_s\tilde{w}_{ij}y_j$, with the weights $\tilde{w}_{ij}$ minimizing the chi-squared distance $\sum_s c_j(w_ja_{ij}-h_{ij})^2/w_j$ over $h_{ij}$ subject to the constraints $\sum_s h_{ij}\mathbf{x}_j=\mathbf{X}_i$ (Singh and Mian 1995). The survey regression estimator satisfies the additivity property in the sense that
\[
\sum_{i=1}^m\tilde{Y}_{iGR}=\hat{Y}+(\mathbf{X}-\hat{\mathbf{X}})^T\hat{\mathbf{B}}=\hat{Y}_{GR}.
\]
In the case of a single auxiliary variable x with known domain total $X_i$, the ratio form of (2.5.1) is given by
\[
\tilde{Y}_{iR}=\hat{Y}_i+\frac{\hat{Y}}{\hat{X}}(X_i-\hat{X}_i). \qquad (2.5.2)
\]
Although the modified GREG estimator borrows strength for estimating the regression coefficient, it does not increase the "effective sample size," unlike the indirect estimators studied in Chapter 3. To illustrate this point, consider the ratio form $\tilde{Y}_{iR}$ in the set-up of Example 2.4.1. In this case,
\[
\tilde{Y}_{iR}=N_i\Big[\bar{y}_i+\frac{\bar{y}}{\bar{x}}(\bar{X}_i-\bar{x}_i)\Big], \qquad (2.5.3)
\]
where $\bar{y}$ and $\bar{x}$ are the overall sample means and $\bar{X}_i=X_i/N_i$. For large n, we can replace the sample ratio $\bar{y}/\bar{x}$ by the population ratio $R=Y/X$, and the conditional variance given $n_i$ becomes
\[
V_2(\tilde{Y}_{iR})\approx N_i^2\Big(\frac{1}{n_i}-\frac{1}{N_i}\Big)S_{Ei}^2, \qquad (2.5.4)
\]
where $S_{Ei}^2=\sum_{j\in U_i}(E_j-\bar{E}_i)^2/(N_i-1)$ is the domain variance of the residuals $E_j=y_j-Rx_j$ and $\bar{E}_i$ is the domain mean of the $E_j$'s. It follows from (2.5.4) that $V_2(\tilde{Y}_{iR})/N_i^2$ is of order $n_i^{-1}$, so that the effective sample size is not increased, although the variability of the $E_j$'s may be smaller than the variability of the $y_j$'s for $j\in U_i$. Note that the variability of the $E_j$'s will be larger than the variability of the domain-specific residuals $y_j-R_i x_j$ for $j\in U_i$, where $R_i=Y_i/X_i$, unless $R_i\approx R$.

The modified GREG estimator (2.5.1) may be expressed as
\[
\tilde{Y}_{iGR}=\mathbf{X}_i^T\hat{\mathbf{B}}+\sum_{j\in s_i}w_j e_j. \qquad (2.5.5)
\]
The first term $\mathbf{X}_i^T\hat{\mathbf{B}}$ is the synthetic regression estimator (see Chapter 3), and the second term $\sum_{s_i}w_j e_j$ approximately corrects the p-bias of the synthetic regression estimator. We can improve on $\tilde{Y}_{iGR}$ by replacing the expansion estimator $\sum_{s_i}w_j e_j$ in (2.5.5) with a ratio estimator (Särndal and Hidiroglou 1989):
\[
\hat{E}_{iR}=N_i\Big(\sum_{s_i}w_j\Big)^{-1}\sum_{s_i}w_j e_j; \qquad (2.5.6)
\]
note that $\hat{N}_i=\sum_{s_i}w_j$. The resulting estimator
\[
\tilde{Y}_{iSH}=\mathbf{X}_i^T\hat{\mathbf{B}}+\hat{E}_{iR}, \qquad (2.5.7)
\]
however, suffers from the ratio bias when the domain sample size is small, unlike $\tilde{Y}_{iGR}$. A Taylor linearization variance estimator of $\tilde{Y}_{iGR}$ is obtained from $v(y)$ by changing $y_j$ to $a_{ij}e_j$, that is,
\[
v_L(\tilde{Y}_{iGR})=v(a_i e). \qquad (2.5.8)
\]
This variance estimator is valid even when the domain sample size is small, provided the overall sample size is large.
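A minimal R sketch (simulated data, single auxiliary variable with $c_j=x_j$; an illustrative assumption) of the modified GREG (survey regression) estimator in its ratio form (2.5.2), together with its decomposition (2.5.5) into a synthetic part plus a bias correction:

```r
# Modified GREG (survey regression) estimator for one domain, ratio form (2.5.2),
# and its synthetic + bias-correction decomposition (2.5.5), under SRSWOR.
set.seed(99)
N   <- 2500
dom <- sample(1:8, N, replace = TRUE)
x   <- runif(N, 1, 20)
y   <- 3 * x + rnorm(N, 0, 4)

n <- 100
s <- sample.int(N, n)
w <- rep(N / n, n)
i <- 2
Xi <- sum(x[dom == i])                        # known domain x-total

Yhat <- sum(w * y[s]);  Xhat <- sum(w * x[s])
si   <- dom[s] == i
Yi_hat <- sum(w[si] * y[s][si]); Xi_hat <- sum(w[si] * x[s][si])

B <- Yhat / Xhat                              # overall coefficient when c_j = x_j
Yi_modGREG <- Yi_hat + B * (Xi - Xi_hat)      # equation (2.5.2)

e <- y[s] - B * x[s]                          # overall residuals e_j
Yi_decomp <- Xi * B + sum(w[si] * e[si])      # synthetic part + correction, (2.5.5)
all.equal(Yi_modGREG, Yi_decomp)              # identical by construction
```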
2.6 DESIGN ISSUES
"Optimal" design of samples for use with direct estimators for large areas has received a lot of attention over the past 60 years or so. Various design issues, such as the number of strata, construction of strata, sample allocation, and selection probabilities, have been addressed (see, e.g., Cochran 1977). The aim is to find an "optimal" design that minimizes the MSE of a direct estimator subject to a given cost. This goal is seldom achieved in practice because of operational and other constraints. As a result, survey practitioners often settle for a design that is "close" to the optimal design.

In practice, it is not possible to anticipate and plan for all possible areas (or domains) and uses of survey data, as "the client will always require more than is specified at the design stage" (Fuller 1999, p. 344). As a result, indirect estimators will always be needed in practice, given the growing demand for small area statistics. However, it is important to consider design issues that have an impact on small area estimation, particularly in the context of planning and designing large-scale surveys. In this section, we present a brief discussion of some of these design issues. A proper resolution of these issues can enhance the reliability of direct (and also indirect) estimates for both planned and unplanned domains. For a more detailed discussion of design issues, we refer the reader to Singh, Gambino, and Mantel (1994) and Marker (2001).

2.6.1 Minimization of Clustering

Most large-scale surveys use clustering to a varying degree in order to reduce survey costs. Clustering, however, results in a decrease in the effective sample size. It can also adversely affect the estimation for unplanned domains, leading to situations where some domains become sample-rich while others may have no sample at all. It is therefore useful to minimize the clustering of the sample. The choice of sampling frame plays an important role in this respect; for example, the use of a list frame replacing clusters as sampling units (Business Registers for business surveys and Address Registers for household surveys) is an effective tool. Also, the choice of sampling units and the number of sampling stages have a significant impact on the effective sample sizes for small areas.

2.6.2 Stratification

One method of providing a better sample size distribution at the small area level is to replace large strata by many small strata from which samples are drawn. With this approach, it may be possible to make an unplanned small area consist mostly of complete strata. For example, each Canadian province is partitioned into both Economic Regions (ERs) and Unemployment Insurance Regions (UIRs); there are 71 ERs and 61 UIRs in Canada. In this case, the number of strata may be increased by treating all the areas created by the intersections of the two partitions as strata; this strategy leads to 133 intersections (strata). As another example, the United States National Health Interview Survey (NHIS) used a stratification based on metropolitan area status, labor force data, income, and other variables. The resulting sample sizes for individual states did not support reliable direct estimates for several states; in fact, two of the states did not receive any sample. The NHIS stratification scheme was replaced by state-based stratification in 1995, thus enabling state-level direct estimation for all the states (see Marker 2001 for details).

2.6.3 Sample Allocation

By adopting compromise sample allocations, it may be possible to satisfy reliability requirements at the small area level as well as the large area level with direct estimates. Singh et al. (1994) presented an excellent illustration of compromise sample allocation in the Canadian LFS to satisfy reliability requirements at the provincial level as well as the subprovincial level. For the monthly LFS sample of households, "optimizing" at the provincial level yields a CV of the direct estimate for "unemployed" as high as 17% for some UIRs. They adopted a two-step compromise allocation, with most of the sample allocated at the first step to get reliable provincial estimates and the remainder allocated at the second step to produce the best possible UIR estimates. This compromise allocation reduced the worst CV of 17% for a UIR to 9.4% at the expense of a small increase in CV at the provincial and national levels: the CV for Ontario increased slightly and the CV for Canada increased from 1.36% to 1.51%. Thus, by oversampling the smaller areas, it is possible to decrease the CV of direct estimates for these areas significantly at the expense of a small increase in CV at the national level. The U.S. National Household Survey on Drug Abuse used stratification and oversampling to produce direct estimates for every state. The 2000 Danish Health and Morbidity Survey, based on a national sample of 6,000 respondents, distributed an additional sample so as to guarantee at least 1,000 respondents in each county (small area). The Canadian Community Health Survey (CCHS) conducted by Statistics Canada performs sample allocation in two steps: first, it allocates 500 units to each of the Health Regions, and then the remaining sample is allocated to maximize the efficiency of provincial estimates. Section 2.7 gives some methods of compromise sample allocation for planned domains, including an "optimal" method that uses nonlinear programming.

2.6.4 Integration of Surveys

Harmonizing questions across surveys of the same population increases the effective sample sizes for the harmonized items. The increased effective sample sizes, in turn, lead to improved direct estimates for small areas. However, caution should be exercised because the data may not be comparable across surveys even when the question wording is consistent. As noted by Groves (1989), differences in data collection mode, survey context, and the placement of questions can cause differences in the responses. A number of current surveys in Europe are harmonized both within and between countries. For example, the European Community Household Panel (ECHP) collects consistent data across member countries, using a common procedure to collect basic information.

2.6.5 Dual-Frame Surveys

Dual-frame surveys can be used to increase the effective sample size in a small area. In a dual-frame survey, samples are drawn independently from two overlapping frames that together cover the population of interest. For example, frame A may be a complete area frame with data collected by personal interview, while frame B is an incomplete list frame with data collected by telephone; in this case, the dual-frame design augments the expensive frame A sample with the inexpensive additional information from B. There are many surveys that use dual-frame designs. For example, the Dutch Housing Demand Survey collects data by personal interview, but uses telephone supplementation in over 100 municipalities to produce reliable estimates for those municipalities (small areas). Statistics Canada's CCHS provides a more recent example, where an area sample is augmented with telephone samples in selected health regions. Hartley (1974) discussed dual-frame designs and developed a unified theory of dual-frame estimation of totals by combining information from the two samples. Skinner and Rao (1996) developed dual-frame estimators that use the same weights for all the variables. Lohr and Rao (2000) obtained variance estimators for dual-frame estimators.

2.6.6 Repeated Surveys

Many surveys are repeated over time, and effective sample sizes can be increased by combining data from two or more consecutive surveys. For example, the United States National Health Interview Survey (NHIS) is an annual survey with non-overlapping samples across years. Combining consecutive annual NHIS samples can improve estimates, although the correlation between samples within the same PSUs reduces the effective sample size. Such aggregation can also lead to significant bias if the characteristic of interest is not stable over time. Marker (2001) studied the level of accuracy achieved for state estimates by combining the 1995 NHIS sample with the previous year sample or samples. He showed that aggregation helps achieve CVs of 30% or less for many health variables, but a 10% CV cannot be achieved for many states even by aggregating samples across 3 years.

Kish (1999) recommended "rolling samples" as a method of cumulating data over time. Unlike the customary periodic surveys, such as the Canadian LFS and the United States CPS with partial overlap of sample elements over time, rolling samples are designed to facilitate maximal spatial range for cumulation over time, leading to improved small area estimates when the periodic samples are cumulated. The American Community Survey (ACS), which started in 2005, is an excellent example of a rolling sample (RS) design (http://www.census.gov/acs/www/). It aims at a monthly sample of 250,000 households and detailed annual statistics based on the sample spread across all counties in the United States. It will also permit cumulation over time to replace the detailed data previously collected in the decennial census samples.
2.7 *OPTIMAL SAMPLE ALLOCATION FOR PLANNED DOMAINS
We now study optimal sample allocation in the context of estimating the domain means $\bar{Y}_i$ as well as the aggregate population mean $\bar{Y}$ under stratified simple random sampling. We assume that the domains are specified in advance at the design stage (or planned) and consider two cases: (i) the domains are the strata and simple random samples are drawn independently in different strata; (ii) the domains cut across strata, and domain frames are not available for sampling independently in different domains.

2.7.1 Case (i)
Suppose that the population U is partitioned into L strata $U_{(h)}\ (h=1,\ldots,L)$ of sizes $N_{(h)}$, and that samples of sizes $n_{(h)}$ are to be selected from the strata subject to a fixed total cost $C=c_0+\sum_{h=1}^L c_{(h)}n_{(h)}$, where $c_0$ is the overhead cost and $c_{(h)}$ is the cost per unit in stratum h; for simplicity, we take $c_{(h)}=1$ for all h and let $C-c_0=\sum_{h=1}^L n_{(h)}=n$. The strata means $\bar{Y}_{(h)}$ and the overall mean $\bar{Y}$ are estimated, respectively, by the sample means $\bar{y}_{(h)}$ and the weighted mean $\bar{y}_{st}=\sum_{h=1}^L W_{(h)}\bar{y}_{(h)}$, where $W_{(h)}=N_{(h)}/N$ is the relative size of stratum h. The sampling literature (see, e.g., Cochran 1977, Chapters 5 and 5A) has largely focused on the allocation $\{n_{(h)}\}$ for estimating the population mean $\bar{Y}$. Neyman allocation minimizes $V_p(\bar{y}_{st})$ subject to $\sum_{h=1}^L n_{(h)}=n$, leading to the well-known allocation
\[
n_{(h)}^N=n\,\frac{N_{(h)}S_{y(h)}}{\sum_{h=1}^L N_{(h)}S_{y(h)}}, \qquad (2.7.1)
\]
where $S_{y(h)}^2=\sum_{j\in U_{(h)}}(y_{hj}-\bar{Y}_{(h)})^2/(N_{(h)}-1)$ is the stratum population variance. Neyman allocation depends on the unknown stratum variances $S_{y(h)}^2$ of the variable y; in practice, a proxy variable z with known stratum variances $S_{z(h)}^2$ is used in place of y. On the other hand, proportional allocation $n_{(h)}=nW_{(h)}$, equal allocation $n_{(h)}=n/L$, and square root allocation $n_{(h)}=n\sqrt{N_{(h)}}\big/\sum_{h=1}^L\sqrt{N_{(h)}}$ (Bankier 1988) do not depend on the variability of y, but they are then suboptimal for estimating $\bar{Y}$.

Neyman and proportional allocations may cause some strata sample sizes to be very small. On the other hand, equal allocation controls the coefficients of variation $\mathrm{CV}(\bar{y}_{(h)})$ for estimating strata means, but it may lead to a much larger CV for $\bar{y}_{st}$ compared to Neyman allocation. A compromise allocation that is good for both the strata means and the aggregate population mean is therefore needed. Costa, Satorra, and Ventura (2004) proposed a compromise allocation given by
\[
n_{(h)}^C=d\,(nW_{(h)})+(1-d)(n/L), \qquad (2.7.2)
\]
for a specified $d\ (0\le d\le 1)$, assuming $n/L\le N_{(h)}$ for all h. This allocation reduces to equal allocation when $d=0$ and to proportional allocation when $d=1$. Longford (2006) attempted to simultaneously control the strata means $\bar{y}_{(h)}$ and the weighted mean $\bar{y}_{st}$ by minimizing the objective function
\[
\phi(n_{(1)},\ldots,n_{(L)})=\sum_{h=1}^L P_{(h)}V_p(\bar{y}_{(h)})+G\,P_+V_p(\bar{y}_{st}) \qquad (2.7.3)
\]
with respect to the strata sample sizes $n_{(h)}$, subject to $\sum_{h=1}^L n_{(h)}=n$, where $P_+=\sum_{h=1}^L P_{(h)}$. The first component of $\phi$ specifies the relative importances $P_{(h)}$ of the strata, while the second component attaches relative importance to $\bar{y}_{st}$ through the term G; the term $P_+$ offsets the effect of the strata importances $P_{(h)}$ and of the number of strata, L, on the weight G. Longford (2006) proposed to consider $P_{(h)}=N_{(h)}^q$ for some $q\ (0\le q\le 2)$. Longford's allocation, obtained by minimizing (2.7.3), is
\[
n_{(h)}=n\,\frac{S_{y(h)}\sqrt{P_{(h)}^*}}{\sum_{h=1}^L S_{y(h)}\sqrt{P_{(h)}^*}}, \qquad (2.7.4)
\]
where $P_{(h)}^*=P_{(h)}+G\,P_+W_{(h)}^2$. If $P_{(h)}=N_{(h)}^q$ and $q=2$, then it reduces to the Neyman allocation $\{n_{(h)}^N\}$ given by (2.7.1).

None of the above compromise allocations are optimal in terms of controlling both the CVs of the estimators of the strata means and the CV of the estimator of the population mean. Choudhry, Rao, and Hidiroglou (2012) proposed a nonlinear programming (NLP) method of sample allocation, minimizing the total sample size $\sum_{h=1}^L n_{(h)}$ subject to $\mathrm{CV}_p(\bar{y}_{(h)})\le \mathrm{CV}_{0(h)}$, $\mathrm{CV}_p(\bar{y}_{st})\le \mathrm{CV}_0$, and $0\le n_{(h)}\le N_{(h)}$, $h=1,\ldots,L$, where $\mathrm{CV}_{0(h)}$ and $\mathrm{CV}_0$ are specified tolerances on the CVs of $\bar{y}_{(h)}$ and $\bar{y}_{st}$, respectively. It is easy to see that the optimization problem reduces to minimizing the separable convex function
\[
\psi(k_{(1)},\ldots,k_{(L)})=\sum_{h=1}^L N_{(h)}k_{(h)}^{-1} \qquad (2.7.5)
\]
with respect to the variables $k_{(h)}=N_{(h)}/n_{(h)}$, subject to the following linear constraints in $k_{(h)}$:
\[
\mathrm{RV}(\bar{y}_{(h)})=\frac{k_{(h)}-1}{N_{(h)}}\,C_{y(h)}^2\le \mathrm{RV}_{0(h)},\quad h=1,\ldots,L, \qquad (2.7.6)
\]
\[
\mathrm{RV}(\bar{y}_{st})=\bar{Y}^{-2}\sum_{h=1}^L W_{(h)}^2\,\frac{k_{(h)}-1}{N_{(h)}}\,S_{y(h)}^2\le \mathrm{RV}_0, \qquad (2.7.7)
\]
\[
k_{(h)}\ge 1,\quad h=1,\ldots,L, \qquad (2.7.8)
\]
where RV denotes relative variance (or squared CV) and $C_{y(h)}=S_{y(h)}/\bar{Y}_{(h)}$ is the within-stratum CV. The constraint (2.7.8) can be modified to $1\le k_{(h)}\le N_{(h)}/2$ to ensure that the optimal sample sizes satisfy $n_{(h)}^0\ge 2$ for all h, which permits unbiased variance estimation. Choudhry et al. (2012) used a nonlinear programming solver with the Newton-Raphson option to determine the optimal $k_{(h)}^0$ (or, equivalently, the optimal $n_{(h)}^0=N_{(h)}/k_{(h)}^0$). The NLP method can be readily extended to handle allocations for multiple variables $y_1,\ldots,y_p\ (p\ge 2)$ of interest.
Example 2.7.1. Monthly Retail Trade Survey (MRTS). Ch o u d h r y e t a l . ( 2012) s tu d ie d th e r e la tiv e p e r f o r m a n c e o f th e th r e e c o m p r e t a l . , NLP a n d Lo n g f o r d ) u s i n g p o p u l a t i o n d a t a f r o m Re t a i l T r a d e Su r v e y ( M RT S) o f s i n g l e e s t a b l i s h m e n t s w e r e t r e a t e d a s s t r a t a , a n d t h e t o l e r a n c e s f 0(h) o r =t h e NLP m 15% f o r a l l t h e p hr ao nv di n0CV c= e6%s f o r Ca n a d a . T a b l e 1 i n Ch o u d h r y e t a l . ( 2012) r e p o rNt(h) e , dY (h)t ,hSy(h) e ,p o p u l a a n Cdy(h) . Us i n g t h o s e v a l u e s , t h e o p {n t i 0(h) m} a nl NLP d t ha el l ao sc sa ot ic oi an tse CV CV(y(h) ) a n d (y st ) w e r e c o m p u t e d a n d r e p o r t e d i n T a b l e 2. t h e NLP a l l o c a t i o n r e s p e c t s t h e 0s=p 6%, e c gi fii ve ed s t CV o l e sr as m n c ae l lCV e t h a n t h e s p e c i fi e d 15 % f o r t w o o f t h e l a r g e r p r o v i n c e
*OPT IM A L SA M PLE A LLOCA T ION FOR PLA NNED D OM A INS
29
11. 0%) , a n d a t t a i n s a 15 % CV f o r t h e r e m a i n i n g p r o v i n c ∑ o v e r a l l s a m np0 = l e sLh=1 i zn0(h) e = i s3, 4 4 6 . Us i n g t h e o p t i m a nl0 fs oa n,rm Ch p loe u sdi zh er y e t a l . ( 2012) (yc(h)o) m p u t e
a n d (y CV oC(h)n, a, Co n ds tt ha ee tLoa nl . ga -l l st ) u n d e r t h e s qu a r e r o o t a l l o c a t i n L f o r d a l l onc(h) awt iiot hn s e lq.e Tc the ed s t u d y i n d i c a t e d t h a t n o s u i t a o fq a n G d i n Lo n g f o r d ’s m e t h o d c a n b e f o u n d t h a t e n s u r e i t y r e qu i r e m e n t s a r e s a t i s fi e d e v e n a p p r o x i m a t e l y . O a l l o c a t i od n= 0.5 w ipt he r f o r m e d qu i t e w e(ystl )l = , l6.4% e a da ni nd g (y t(h)o) CV CV a r o u n d 15 % e x c e p t f o r t h e p r o v i n c e s o f No v a Sc o t i a a n a n d 16 . 5 %. Squ a r e r o o t a l l o c a t i o n p e r f o r m e d s o m e w h 2.7.2
Case (ii)
W e n o w t u r n t o t h e ci ca us et t oi nf gd ao cmr oa si ns sdh. e Su s i pg pn os st er a t tha a t t h e p o p u l a t i o n i s p a r t i tUi io(i = n 1, e d ,im), n t ao nddo tmh aa ti nt hs e e s t i m a t d o m a i n m e a n s n e e d t o s a t i s f y r e 0il ,ai t=i 1, v e ,vm.a If{n r i a0(h)n} c e t o l e i s t h e c u r r e n t o p t i m a l s a m p l e a l l o c a t i o n t h a t s a t i s fi t h e s t r a t a , w e n e e d t o fi n d t h e o p t i m a l{̃na(h)d−dn0(h) i t} i o n a l s t t h a t a r e n e e d e d t o s a t i s f y a l s o t h e ñ d(h) odme na oi nt e tso lt eh re a n r e v i s e d t o t a l s a m p l eh. s Fo i z re t ihn i ss tpr ua tr up m o se , w e u se a g a i m e th o d . ∑ T o e s tim a te th e Y d i o= m n i ymj , w e ae n u s e t h e r a t i o e s t i m a t o Ni−1a ij∈U L ∑ h=1
Yi =
N(h) ñ −1 (h)
∑ aij yj j∈s(h)
, L ∑ h=1
N(h) ñ −1 (h)
( 2. 7 . 9 )
∑ aij j∈s(h)
w h eaijr = e 1 i fj ∈ Ui a n adij = 0 o t h e r w i s(h)e i, sa tnh de s a m p l e f rh.o m
s tra
T h e v a r i aY in i cs eo ob ft a i n e d b y t h e T a y l o r l i n e a r i z a t i o n f o 2
RV(Y i ) = Vp (Y i )∕Y i . b ya lma il nl oi m W e t h e n o b t a i n t h e o {̃ pn(h) t i} m c a i tzi io nn g t h e t o t a l s a in c re a s e L L ∑ ∑ 0 ̃ f̃(1) , , f̃(L) ) − 𝜙( n0(h) = (f̃(h) − f(h) )N(h) , ( 2. 7 . 10) h=1
h=1
0 w i t h r e s p e c t t o f̃t(h)h =e ñv(h)a∕Nr(h) i a, w b l he fes(h) r= e n0(h) ∕N(h) , s u b je c t t o
RV(Y i ) ≤ RV0i , 0 f(h) ≤ f̃(h) ≤ 1,
i = 1, h = 1,
, m, , L,
( 2. 7 . 11) ( 2. 7 . 12)
30
D IRECT D OM A IN EST IM A T ION
w h e r 0ie i RV s t h e s p e c i fi e d t o l e r a n c e o n t hY ie. TRVh eo cf ot hn es tdr oa m i n at ( 2. 7 . 12) e n s u nr0(h)e ≤s ñt(h)h ≤ aN t (h) f o r ah. l l It i s e a s y t o s e e t h a t t h e a b o v e o p t i m i z a t i o n p r o b l e m a ra b le c o n v e x fu n c tio n L ∑
𝜓(k̃ (1) ,
−1 N(h) k̃ (h)
, k̃ (L) ) =
( 2. 7 . 13 )
h=1
w i t h r e s p e c t t o k̃t(h) h= e Nv(h)a∕̃nr (h) i as bu lbe jes c t t o t h e f o l l o w i n g l i n e a i nk̃ (h) : )2 L ( −2 ∑ N(h) RV(Y i ) ≈ Y i Ni h=1 0 , 1 ≤ k̃ (h) ≤ k(h)
h = 1,
k̃ (h) − 1 N(h)
2 Se(h),i ≤ RV0i ,
, m, ( 2. 7 . 14 ) ( 2. 7 . 15 )
, L.
2, He r eSe(h),i is th e s tra tu m
i = 1,
v a r i a n ceije= oaijf(ytj − h Ye i ) rf eo sjr∈ i dUiu. No a l st e
t h a t t h e f o r m (Yui ) lua s feodr iRV n ( 2. 7 . 14 ) i s b a s e d o n T a y l o r l i n e a 0 t h e r e s u l t i n k̃g(h) oa np ñdt(h)i m l a n ñd0(h) , r e s p e c t i v e l y , t h e o p t i m a l s b yk̃a(h) i n c r e a s e i nh is s̃tnr0(h)a−t nu0(h)m. In p r a c t i c e , d o m a i n r e l i a b i l i t y r e qu i r e m e n t s a r e o f t ñ 0(h) a n d i n t h a t c a s e t h e p r o p o s e d t w o - s t e p m e t h o d i s r e l i a b i l i t y r e qu i r e m e n t s a r e s i m u l t a n e o u s l y s p e c i fi e ∑ s a m p l e Lh=1 s i nz (h)e s u b je c t t o a l l t h e r e l i a b i l i t y c o n s t r a i n t s : RV(y(h) ) ≤ RV0(h) , RV(Y i ) ≤ RV0i ,
h = 1, i = 1,
, L,
, m,
RV(yst ) ≤ RV0 , 0 < f(h) = n(h) ∕N(h) ≤ 1,
h = 1,
, L.
T h e r e s u l t i n g o p ñt0(h)i m i s ai dl se on lt ui ct iaol nt o t h e p r e v i o u s t w o - s t
Example 2.7.2. Ch o u d h r y e t a l . ( 2012) a p p l i e d t h e d o m a i n d e s c r i b e d a b o v e t o t h e M RT S p o p u l a t i o n d a t a o n s i n tra d e g ro u p s th a t c u t a c ro s s p ro v in c e s a s d o m a in s o f ( 2. 7 . 13 ) – ( 2. 7 . 15 ) , t h e y o b t a i n e d t h e f o l l o w i n g r e s u l t s ∑ t o CV 0 f o r ai, lt lh e n t h e o p t i m a l r e v iñ s0 = e d Lh=1 s añ 0(h) m =p nl0 ,e s i z e i s 0i = 3 % 0 t h a t i s , n o i n nc ri es an s ee e idn e d ; ( i i ) i f d o m a0i i an r et o rl ee dr au nc ce ed s t CV o 25%, t h e n t h e o p t i m a l i n c r e a s e i n t h e t o t a l s a m p l e s i z e s a m p l e ñs0 i=z 4, e 06 i s 8 , n o t i nn0g= t3,h4 a4t 6 ; ( i i i ) i f t o l 0ie ar ar en cf ue rs t CV h e r
*OPT IM A L SA M PLE A LLOCA T ION FOR PLA NNED D OM A INS
31
r e d u c e d t o 20%, ñ 0 = t5,h5e 4n 6 s a t i s fi e s r e l i a b i l i t y r e qu i r e m e n n o t e d t h a t a s t h e t o t a l s a m p l e sy(h) i za en iydstn dc er ec ar es ae s e, t. h e CV s 2.7.3
Two-Way Stratification: Balanced Sampling
In s o m e a p p l i c a t i o n s , i t m a y b e o f i n t e r e s t t o c o n s t o t w o d i f f e r e n t p a r t i t i o nU.s W o f e t hd ee np oo tpe u t lha et i town o p a r t i U(1h) , h = 1, , L1 , a n Ud(2h) , h = 1, , L2 . Fo r e x a m p l e , Fa l o r s i a n d Ri g c o n s i d e r e d a p o p u l a t i o n o f e n t e r p r i s e s i n It a l y p a r t i t r e g i o n Lw t hm a r g i n a l s t r a t a a n d e c o n o m i c a c t i v i t y g 1 =i20 L2 = 24 m a r g i n a l s t r a t a . T h e p a r a mL e(=t Le1 r+sL2o) fmi na tr eg r ien sat l a r s t r a t a t o t a l s o r m e a n s . T h n, e oi sv ae lrl ao l cl as taemd pt ol e t hs iez tew, o p s e p a r a t e l y , l e a d i n g t o fi x e d m na(th) r ,gh i=n 1,a l ,sLtt ,r sa ut ac hs at m h apt l e ∑Lt n = n f o r p a r t t = i t 1, i o 2. n Fo r e x a m p l e , t h e Co s t a e t a l . ( 2004 ) h=1 (th) a l l o c a t i o n m e t h o d , m e n t i o n e d i n Se c t i o n 2. 7 . 1, l e a d s b y ( 2. 7 . 16 ) nC(th) = dt (nW(th) ) + (1 − dt )(n∕Lt ),
f o r s p e c i fi e d dtc(0o ≤ndts ≤t a1),n ta ss s u mn∕L i nt ≤gN(th) , w h eN(th) re is th e k n o w n s i z e oh ifn s pt raa rt tut.i tmi o n Gi v e n t h e a l l no(th) c , a t thi oe nn se , x t s t e p i s t o o b t a i n c a l i b r a t e d ∑ t a G(𝜋 n cj∗ ,e𝜋j )f su un bc jet i co tn t o a b i l i t𝜋ij∗ e, j s∈ U, b y m i n i m i z i n g a d i s j∈U ∑ ∑ ∗ ∗ , Lt − 1, t = 1, 2, w h 𝜋e j r=en∕N f o r a l l j∈U 𝜋j = n a n d j∈U(th) 𝜋j = n(th) , h = 1, j ∈ U a r e p r e l i m i n a r y i n c l u s i o n pG(𝜋 r oj∗ ,b𝜋j )a =b 𝜋ij∗l li ot i(g𝜋ej∗s∕𝜋. j )T− h e c h o ∗ ∗ (𝜋j + 𝜋j ) a v o i d s n e g a t i v𝜋je. v a l u e s f o r T h e fi n a l s t e p u s e s b a l a n c e d s a m p l i n g ( T i l l é 2006 , C t h a t l e a d s t o r e a l i z e d sñ (th) t r ae txa a sc at m e as il zt eo st h e d e s i r e d l y pe lqu ∑ s j∗t ihs a t t h e H n(th) . In g e n e r a l , b a l a n c e d s a m p l i n g e n s uj∈srzTje ∕𝜋 ∑ T T e x a c t l y e qu a l t oj∈Ut zhj fe ot ro st ap le c i fi e zdj . vT eh ce t o“c r us b e m e t h o d ” a l th e s e le c tio n o f b a la n c e d s a m p le s fo r a la rg e s e t o f a zTj . Fo r t h e p r o b l e m a t h a n d , w e l e t zTj = 𝜋j∗ (a(11)j ,
, a(1L1 )j ; a(21)j ,
, a(2L2 )j ),
( 2. 7 . 17 )
w h ea(th)j r e = 1 i f u nj bi te l o n U g s at on ad = 0 o t h e r w i s e . Su b s t i t u t i n g ∑ (th) T ∗(th)j i n t o t h e H– T e s t i m a j∈s t ozjr ∕𝜋 , w n 1 ) ; ñ (21) , , ñ (2L2 ) ). On =e(̃no(11)b, t a, nĩ (1L j ∑ T .o17zt Tja)=l i fs o r t h e o t h e r h a n d , t h e c o r r e s pzj og ni vd ei nn g i nt r (u 2.e 7 tj∈U ∑ ∑ ∗ ∗ h 𝜋aj at (th)j = j∈U(th) 𝜋j = n(th) . T h u s , (n(11) , , n(1L1 ) ; n(21) , , n(2L2 ) ) n o t i n g tj∈U b a l a n c e d s a m p l i n g l e a d s t o nt(th) h .e s p e c i fi e d s t r a t a s i z e A d i r e c t e x p a n s i o n e s t i m a t i o n oY(th) f ti hs eg di vo em Yn(th)ab=iyn ( s t r a t ∑ ∗ . A GREG e s t i m Ya t o a y ∕𝜋 m r a o y f b e u s e d i f a n a x uj wx i tl hi a r y v e (th) j∈s (th)j j j k n o w n Xtios taa vl a i l a b l e . A v a r i a Yn(th)c ies eo sbt ti am i na et od r bf yo rv i e w i n a n c e d s a m p l i n g a s a “c o n d i t i o n a l Po i s s o n ” s a m p l i n g
32
D IRECT D OM A IN EST IM A T ION
a c o n d i t i o n a l Po i s s o n s a m p l i n g a p p r𝜋oj bx yi mm a i tni oi mn it zo - o ∑ i n g j∈U 𝜋j s u b je c t t o s p e c i fi e d t o l e r a n c e s o n t h e v a r i a n c e u s in g m a th e m a tic a l p ro g ra m m in g , s im ila r to th e m
2.8 2.8.1
PROOFS Proof of Ŷ GR (x) = X
W r i t Yi GR n (x g T ) i n t h e e x p a n s i o n f o r𝑤m∗j = w 𝑤j gij tah n w gdj ge i vg eh nt s b y ( 2. 3 . 9 ) , w e h a v e ∑ YGR (xT ) =
𝑤j gj xTj s
∑ = s
( )−1 ⎡ ⎤ ∑ T T T 𝑤j ⎢xj + (X − X) 𝑤j xj xj ∕cj xj xTj ∕cj ⎥ ⎢ ⎥ s ⎣ ⎦
= XT + (X − X)T = XT .
Derivation of Calibration Weights 𝒘∗j ∑ W e m i n i m i z e t h e c h i - s squ cj (𝑤aj − r ehjd)2 ∕𝑤 dj i w s t iat nh c re e s p e c t t o t h ∑ c ot hn as t r ias i, n wt s e m i n i m i z e hj ’s s u b je c t t o t h e c a l i b r a ts hi joxj n= X, ∑ ∑ T La g r a n g i a n f𝜙u=n cs ctj (𝑤 i o j n− hj )2 ∕𝑤j − 2𝝀 ( s hj xj − X), w h e 𝝀r e i s t h e v e c t o r o f La g r a n g e m u l t i p l i𝜙ewr si .t hT ar ek si phnj ega cndt𝝀det or i v a t i v a n d e qu a t i n g t o hzj e= r𝑤oj (1, +wxTje𝝀∕c g j ), e tw h e r e 2.8.2
∑ (𝑤j xj xTj ∕cj )−1 (X − X).
𝝀= s
e ∗jr = e 𝑤j gj a n gdj i s g i v e n b y ( 2. 3 . 9 ) . T h uhjs=, 𝑤∗j , w h 𝑤 ̂ TB ̂ when cj = 𝝂 T xj Proof of Ŷ = X ∑ Si n cXeT = s 𝑤j xTj , m u l t i p l y i n g a n cdj =d𝝂 Ti xvj w i d iitnh gi nb tyh e p r e v i o u s a n d u s i n g t h e dBei nfi n( 2.i t3i o. 7n ) ,o wf e g e t 2.8.3
)
( ∑ T
𝑤j xTj
X B=
B
s
( ∑ =𝝂
T
) 𝑤j xj xTj ∕cj
s
B
33
PROOFS
( ∑ =𝝂
T
) 𝑤j xj yj ∕cj
s
∑ 𝑤j (𝝂 T xj )yj ∕cj
= s
∑ 𝑤j yj = Y.
= s
∑ T h i s e s t a b l i s h e ss𝑤jtejh(s)e =r 0ens ou tl et d b e l o w ( 2. 3 . 11) .
3 INDIRECT DOMAIN ESTIMATION
3.1
INTRODUCTION
In Chapter 2, we studied direct estimators for domains with sufficiently large sample sizes. We also introduced a “survey regression” estimator of a domain total that borrows strength for estimating the regression coefficient, but it is essentially a direct estimator. In the context of small area estimation, direct estimators lead to unacceptably large standard errors for areas with unduly small sample sizes; in fact, no sample units may be selected from some small domains. This makes it necessary to find indirect estimators that increase the “effective” sample size and thus decrease the standard error. In this chapter, we study indirect domain estimators based on implicit models that provide a link to related small areas. Such estimators include synthetic estimators (Section 3.2), composite estimators (Section 3.3), and James–Stein (JS) (or shrinkage) estimators (Section 3.4); JS estimators have attracted much attention in mainstream statistics. We study their statistical properties in the design-based framework outlined in Chapter 2. Explicit small area models that account for local variation will be presented in Chapter 4, and model-based indirect estimators and associated inferences will be studied in subsequent chapters. Indirect estimators studied in Chapter 3 and later chapters are largely based on sample survey data in conjunction with auxiliary population data, such as a census or an administrative register.
36
3.2
INDIRECT DOMAIN ESTIMATION
SYNTHETIC ESTIMATION
An estimator is called a synthetic estimator if a reliable direct estimator for a large area, covering several small areas, is used to derive an indirect estimator for a small area under the assumption that the small areas have the same characteristics as the large area (Gonzalez 1973). Hansen, Hurwitz, and Madow (1953, pp. 483–486) described the first application of a synthetic regression estimator in the context of a radio listening survey (see Section 3.2.2). The National Center for Health Statistics (1968) in the United States pioneered the use of synthetic estimation for developing state estimates of disability and other health characteristics from the National Health Interview Survey (NHIS). Sample sizes in many states were too small to provide reliable direct state estimates. 3.2.1
No Auxiliary Information
Suppose that auxiliary population information is not available and that we are interested in estimating the small area mean Y i , for example, the proportion Pi of persons in poverty in small area i (Y i = Pi ). In this case, a synthetic estimator of Y i is given by Ŷ ̂ ̂ Y iS = Y = , (3.2.1) N̂ ∑ ̂ where Y is the direct estimator of the overall population mean Y, Ŷ = s 𝑤j yj , and ∑ ̂ N̂ = s 𝑤j . The p-bias of Y iS is approximately equal to Y − Y i , which is small relative to Y i if Y i ≈ Y. If the latter implicit model that the small area mean is approximately equal to the overall mean is satisfied, then the synthetic estimator will be very efficient because its mean squared error (MSE) will be small. On the other hand, it can be heavily biased for areas exhibiting strong individual effects which, in turn, can lead to large MSE. The condition Y i ≈ Y may be relaxed to Y i ≈ Y(r) where Y(r) is the ̂ ̂ mean of a larger area (region) covering the small area. In this case, we use Y iS = Y(r) ̂ ̂ where Y(r) is the direct estimator of the regional mean Y(r). The p-bias of Y(r) is approximately equal to Y(r) − Y i , which is negligible relative to Y i under the weaker ̂ condition Y i ≈ Y(r), and the MSE of Y(r) will be small provided that the regional sample size is large. 3.2.2
*Area Level Auxiliary Information
Suppose survey estimates Ŷ i of area totals Yi and related area level auxiliary variables xi1 , … , xip are available for m out of M local areas (i = 1, … , m). We can then fit by least squares a linear regression to the data (Ŷ i , xi1 , … , xip ) from the m sampled areas. Resulting estimators 𝛽̂0 , 𝛽̂1 , … , 𝛽̂p of the regression coefficients lead to the regression-synthetic predictors (estimators) for all the M areas given by Ỹ i = 𝛽̂0 + 𝛽̂1 xi1 + · · · + 𝛽̂p xip ,
i = 1, … , M.
(3.2.2)
37
SYNTHETIC ESTIMATION
The estimators (3.2.2) can be heavily biased if the underlying model assumptions are not valid. Example 3.2.1. Radio Listening. Hansen, Hurwitz, and Madow (1953, pp. 483–486) applied (3.2.2) to estimate the median number of stations heard during the day in each of the more than 500 county areas in the United States. In this application, the direct estimate, yi , of the true median y0i , obtained from a radio listening survey based on personal interviews, plays the role of Ŷ i . The estimate xi of the true median y0i , obtained from a mail survey, was used as a single covariate in the linear regression of yi on xi . The mail survey was first conducted by sampling 1,000 families from each county area and mailing questionnaires. Incomplete list frames were used for this purpose and the response rate was about 20%. The estimates, xi , are biased due to nonresponse and incomplete coverage but are expected to have high correlation with the true median values, y0i . Direct estimates yi for a sample of 85 county areas were obtained through an intensive interview survey. The sample county areas were selected by first grouping the population county areas into 85 primary strata on the basis of geographical region and type of radio service available, and then selecting one county area from each stratum with probability proportional to the estimated number of families in the county. A subsample of area segments was selected from each of the sampled county areas, and the families within the selected area segments were interviewed. The correlation between yi and xi was 0.70. Regression-synthetic estimates were calculated for the nonsampled counties as ỹ i = 0.52 + 0.74xi , using the estimates xi obtained from the mail survey. Ericksen (1974) used the fitted regression equation (3.2.2) for both nonsampled and sampled counties.
3.2.3
*Unit Level Auxiliary Information
Suppose that unit level sample data {yj , xj ; j ∈ s} and domain-specific auxiliary information are available in the form of known totals Xi . Then, we can estimate the domain total Yi using the GREG-synthetic estimator ̂ Ŷ iGRS = XTi B,
(3.2.3)
where B̂ is given by (2.3.7). We can express (3.2.3) as ∑ Ŷ iGRS =
̃ ij yj ,
(3.2.4)
j∈s
where
( ̃ ij =
)−1
XTi
T j xj xj ∕cj j∈s
j xj ∕cj
(3.2.5)
38
INDIRECT DOMAIN ESTIMATION
and cj = 𝝂 T xj for a vector of constants 𝝂. The above form (3.2.4) shows that the same weight ̃ ij is attached to all variables of interest treated as y. Furthermore, we have the weight-sharing (WS) property ( )−1 m ̃ ij = XT
T j xj xj ∕cj
j xj ∕cj
= ̃j
j∈s
i=1
∑ and the calibration property j∈s ̃ ij xTj = XTi . The WS property implies that the GREG-synthetic estimators ∑ (3.2.3) add up to the projection-GREG estimator of the population total Ŷ GR = j∈s ̃ j yj , obtained as a special case of the GREG estimator with cj = 𝝂 T xj . Note that the projection-GREG estimator at a large area level is considered to be reliable. The p-bias of Ŷ iGRS is approximately equal to XTi B − Yi , where B = ∑ ∑ ( U xj xTj ∕cj )−1 U xj yj ∕cj is the population regression coefficient. This p-bias will be ∑ to Yi if the domain-specific regression coefficient ∑ small relative Bi = ( Ui xj xTj ∕cj )−1 Ui xj yj ∕cj is close to B and Yi = XTi Bi . The last condition Yi = XTi Bi is satisfied if cj = 𝝂 T xj . Thus, the GREG-synthetic estimator will be very efficient when the small area i does not exhibit strong individual effect with respect to the regression coefficient. A special case of (3.2.3) is the ratio-synthetic estimator in the case of a single auxiliary variable x. It is obtained by letting cj = xj in (3.2.3) and it is given by Ŷ Ŷ iRS = Xi . X̂
(3.2.6)
The p-bias of Ŷ iRS relative to Yi is approximately equal to Xi (R − Ri )∕Yi , which will be small if the area-specific ratio Ri = Yi ∕Xi is close to the overall ratio R = Y∕X. The ̂ X)X. ̂ ratio-synthetic estimators (3.2.6) add up to the direct ratio estimator Ŷ R = (Y∕ We now consider poststratified synthetic estimators when the cell counts Nig are known for poststrata g = 1, … , G. In this case, a count-synthetic poststratified estimator is obtained as a special case of the GREG-synthetic estimator (3.2.3) by letting xj = (x1j , … , xGj )T with xgj = 1 if j ∈ Ug and xgj = 0 otherwise. It is given by G
Ŷ iS/C =
Nig g=1
Ŷ ⋅g N̂ ⋅g
(3.2.7)
,
where Ŷ ⋅g and N̂ ⋅g are estimators of the poststratum total Y⋅g and size N⋅g (National Center for Health Statistics 1968). In the special case of a binary variable yj ∈ {0, 1}, a count-synthetic estimator of the domain proportion Pi is obtained from (3.2.7) as (
)−1 (
G
P̂ iS/C =
Nig P̂ ⋅g
Nig g=1
)
G
,
g=1
where P̂ ⋅g is the direct estimator of the gth poststratum proportion, P⋅g .
(3.2.8)
39
SYNTHETIC ESTIMATION
More generally, a ratio-synthetic poststratified estimator is obtained if the cell totals Xig of an auxiliary variable x are known. By setting xgj = xj if j ∈ Ug and xgj = 0 otherwise in (3.2.3), we get the ratio-synthetic poststratified estimator G
Ŷ iS/R =
Xig g=1
Ŷ ⋅g
,
X̂ ⋅g
(3.2.9)
∑
(Ghangurde and Singh 1977). The p-bias of Ŷ iS/R is approxi∑ mately equal to g=1 Xig (Y⋅g ∕X⋅g − Yig ∕Xig ) = G g=1 Xig (R⋅g − Rig ). Thus, the p-bias relative to Yi will be small if the area-specific ratio Rig is close to the poststratum ratio R⋅g for each g. In the special case of counts, the latter implicit model is equivalent to saying that the small area mean Y ig is close to the poststratum mean Y ⋅g for each g. If poststrata can be formed to satisfy this model, then the count-synthetic estimator will be very efficient, provided the direct poststrata estimators Ŷ ⋅g ∕N̂ ⋅g are reliable. Furthermore, note that the estimators Ŷ iS/C and Ŷ iS/R add up to the ∑G ∑G ̂ ̂ ̂ ̂ direct poststratified estimators g=1 (N⋅g ∕N⋅g )Y⋅g and g=1 (X⋅g ∕X⋅g )Y⋅g of Y, respectively. In the poststratification context, changing N̂ ⋅g to N⋅g in (3.2.7) and X̂ ⋅g to X⋅g in (3.2.9), we obtain the following alternative synthetic estimators: where X̂ ⋅g =
s⋅g
j xj
∑G
G
Ỹ iS/C =
Nig g=1
and
G
Ỹ iS/R =
Xig g=1
Ŷ ⋅g N⋅g
Ŷ ⋅g X⋅g
;
(3.2.10)
(3.2.11)
see Purcell and Linacre (1976) and Singh and Tessier (1976). In large samples, the alternative synthetic estimators (3.2.10) and (3.2.11) are less efficient than the estimators (3.2.7) and (3.2.9), respectively, when Ŷ ⋅g and X̂ ⋅g (or N̂ ⋅g ) are positively cor̂ X̂ compared to the expansion estimator related, as in the case of the ratio estimator Y∕ ̂Y. The p-bias of the alternative estimator Ỹ iS/R (Ỹ iS/C ) in large samples remains the same as the p-bias of Ŷ iS/R (Ŷ iS/C ), but in moderate samples the p-bias will be smaller because the ratio bias is not present. Moreover, the alternative synthetic estimators ∑ ∑ ̂ add up to the direct estimator Ŷ = G s j yj . g=1 Y⋅g = It should be noted that all synthetic estimators given earlier can be used also for nonsampled areas. This is an attractive property of synthetic estimation. The synthetic method can also be used even when sampling is not involved. For example, suppose that Y, but not Yi , is known from some administrative source, and that Xi and X are also known. Then a synthetic estimator of Yi may be taken as (Xi ∕X)Y, whose bias relative to Yi will be small when Ri ≈ R. Note that (Xi ∕X)Y is not an estimator in the usual sense of a random quantity.
40
INDIRECT DOMAIN ESTIMATION
Example 3.2.2. Health Variables. The count-synthetic estimator (3.2.7) has been used to produce state estimates of proportions for certain health variables from the 1980 U.S. National Natality Survey (NNS). This survey was based on a probability sample of 9,941 live births with a fourfold oversampling of low-birth-weight infants. Information was collected from birth certificates and questionnaires sent to married mothers and hospitals. G = 25 poststrata were formed according to mother’s race (white, all others), mother’s age group (6 groups), and live birth order (1, 1–2, 1–3, 2, 2+, 3, 3+, 4+). In this application (Gonzalez, Placek, and Scott 1996), a state is a small area. To illustrate the calculation of the count-synthetic estimate (3.2.7), suppose i denotes Pennsylvania and yj is a binary variable taking the value 1 if the jth live birth is jaundiced and 0 otherwise. Direct estimates of percent jaundiced, P̂ ⋅g , were obtained from the NNS for each of the 25 poststrata. The number of hospital births in each cell, Nig , were obtained from State Vital Registration data. Multiplying Nig by P̂ ⋅g and summing over the poststrata g, we get the numerator of (3.2.8) as 33,806. Now ∑ dividing 33,806 by the total hospital births G g=1 Nig = 156,799 in Pennsylvania, the count-synthetic estimate of percent jaundiced live births in Pennsylvania is given by (33,806∕156,799) × 100 = 21.6%. External Evaluation: Gonzalez, Placek, and Scott (1996) also evaluated the accuracy of NNS synthetic estimates by comparing the estimates with “true” state values, Pi , of selected health variables: percent low birth weight, percent late or no prenatal care, and percent low 1-minute “Apgar” scores. Five states covering a wide range of annual number of births (15,000–160,000) were selected for this purpose. True values Pi were ascertained from the State Vital Registration System. Direct state estimates were also calculated from the NNS data. Standard errors (SE) of the direct estimates were estimated using balanced repeated replication method with 20 replicate half-samples; see Rust and Rao (1996) for an overview of replication methods for estimating SE. The MSE of the synthetic estimator P̂ iS/C was estimated as (P̂ iS/C − Pi )2 . This MSE estimator is unbiased but very unstable. Table 3.1 reports true values, direct estimates, and synthetic estimates of state proportions, as well as√the estimated values of relative root mean squared error (RRMSE), where RRMSE= MSE∕(true value). It is clear from Table 3.1 that the synthetic estimates performed better than the direct estimates in terms of estimated RRMSE, especially for states with small numbers of sample cases (e.g., Montana). The values of estimated RRMSE ranged from 0.14 (Pennsylvania) to 0.62 (Montana) for the direct estimates, whereas those for the synthetic estimates ranged from 0.000 (Pennsylvania) to 0.32 (Kansas). The National Center for Health Statistics (NCHS) used a maximum estimated RRMSE of 25% as the standard for reliability of estimates, and most of the synthetic estimates met this criterion for reliability, unlike the direct estimates. But this conclusion should be interpreted with caution due to the instability of the MSE estimator of P̂ iS/C that was used. Example 3.2.3. *Labor Force Counts in Lombardy, Italy. Bartoloni (2008) studied the performance of the direct poststratified count estimator (2.4.16) and the alternative count-synthetic poststratified estimator (3.2.10) for the industrial districts in
41
SYNTHETIC ESTIMATION
TABLE 3.1 True State Proportions, Direct and Synthetic Estimates, and Associated Estimates of RRMSE
Variable/State
True (%)
Direct Estimate Estimates(%) RRMSE(%)
Synthetic Estimate Estimates(%) RRMSE(%)
Low birth: Pennsylvania Indiana Tennessee Kansas Montana
6.5 6.3 8.0 5.8 5.6
6.6 6.8 8.5 6.8 9.2
15 22 23 36 71
6.5 6.5 7.2 6.4 6.3
0 3 10 10 13
Prenatal care: Pennsylvania Indiana Tennessee Kansas Montana
3.9 3.8 5.4 3.4 3.7
4.3 2.0 4.7 2.1 3.0
21 21 26 35 62
4.3 4.7 5.0 4.5 4.3
10 24 7 32 16
Apgar score: Pennsylvania Indiana Tennessee Kansas Montana
7.9 10.9 9.6 11.1 11.6
7.7 9.5 7.3 12.3 12.9
14 16 18 25 40
9.4 9.4 9.7 9.4 9.4
19 14 1 15 19
Source: Adapted from Tables 4 and 5 in Gonzalez et al. (1996).
Lombardy (Italy). Those districts are underrepresented in the Italian Labor Force Survey (LFS), thus resulting in small sample sizes. Bartoloni (2008) conducted a simulation study using Lombardy’s 1991 census data. Repeated samples were selected from this census applying the LFS design used in Lombardy. Poststratification was done based on age–sex grouping. Results of the simulation study indicated superior performance of the alternative count-synthetic poststratified estimator in terms of RRMSE. This is largely due to the low values of absolute relative bias (ARB) averaged over the areas. Example 3.2.4. County Crop Production. Stasny, Goel, and Rumsey (1991) used a regression-synthetic estimator to produce county estimates of wheat production in the state of Kansas. County estimates of farm production are often used in local decision making and by companies selling fertilizers, pesticides, crop insurance, and farm equipment. Stasny et al. (1991) used a nonprobability sample of farms, assuming a linear regression model relating wheat production of the jth farm in the ith county, yij , to a vector of predictors xij = (1, xij1 , … , xijp )T . The predictor variables xijk (k = 1, … , p) selected in the model have known county totals Xik and include a measure of farm size, which might account for the fact that the sample is not a probability sample.
42
INDIRECT DOMAIN ESTIMATION
The regression-synthetic estimator of the ith county total Yi is obtained as Ỹ iS = ∑Ni ŷ , where ŷ ij = 𝛽̂0 + 𝛽̂1 xij1 + · · · + 𝛽̂p xijp is the least squares predictor of yij , j = j=1 ij 1, … , Ni , and Ni is the total number of farms in the ith county. The least squares estimators, 𝛽̂k , are obtained from the linear regression model yij = 𝛽0 + 𝛽1 xij1 + · · · + 𝛽p xijp + 𝜀ij with independent and identically distributed (iid) errors 𝜀ij , using the sample data {(yij , xij ); j = 1, … , ni ; i = 1, … , m}, where ni is the number of sample farms from the ith county. The estimator Ỹ iS reduces to Ỹ iS = Ni 𝛽̂0 + Xi1 𝛽̂1 + · · · + Xip 𝛽̂p , which requires only the known county totals Xik , k = 1, … , p. It is not necessary to know the individual values xij for the nonsampled farms in the ith county. The synthetic estimates Ỹ iS do not add up to the direct state estimate Ŷ of wheat production obtained from a large probability sample. The state estimate Ŷ is regarded ∑ ̃ as more accurate than the total m i=1 YiS . A simple ratio benchmarking of the form Ỹ Ỹ iS (a) = ∑m iS Ŷ ̃ i=1 YiS was therefore used to ensure that the adjusted estimates Ỹ iS (a) add up to the reliable ̂ direct estimate Y. The predictor variables xijk chosen for this application consist of acres planted in wheat and district indicators. A more complex model involving the interaction between acres planted and district indicators was also studied, but the two models gave similar fits. If the sampling fractions fi = ni ∕Ni are not negligible, a more efficient estimator of Yi is given by ∗ Ŷ iS = yij + ŷ ij , j∈si
j∈ri
where ri is the set of nonsampled units from area i (Holt, Smith, and Tomberlin 1979). ∗ reduces to The estimator Ŷ iS ∗ Ŷ iS =
∗ ̂ ∗ ̂ 𝛽1 + · · · + Xip 𝛽p , yij + (Ni − ni )𝛽̂0 + Xi1 j∈si
∑ where Xik∗ = Xik − j∈si xijk is the total of xijk for the nonsampled units ri . This estimator also requires only the county totals Xik . 3.2.4
Regression-Adjusted Synthetic Estimator
Levy (1971) proposed a regression-adjusted synthetic estimator that attempts to account for local variation by combining area-specific covariates with the synthetic
43
SYNTHETIC ESTIMATION
̂ ̂ estimator. Covariates zi are used to model the relative bias (RB) Bi = (Y i − Y iS )∕Y iS ̂ associated with the synthetic estimator Y iS of the mean Y i : Bi = 𝛾0 + 𝜸 T zi + 𝜀i , where 𝛾0 and 𝜸 are the regression parameters and 𝜀i is a random error. Since Bi is not observable, the regression model is fitted by least squares to estimated bias values ̂ ̂ ̂ ̂ B̂ a = (Y a − Y aS )∕Y aS for large areas a using reliable direct estimators Y a and syn̂ thetic estimators Y aS , a = 1, … , A. Denoting the resulting least squares estimates as ̂ Bi is estimated as 𝛾̂0 + 𝜸̂ T zi , assuming that the above area level model holds 𝛾̂0 and 𝜸, for the large areas. This in turn leads to a regression-adjusted synthetic estimator of Y i given by ̂ ̂ Y iS (a) = Y iS (1 + 𝛾̂0 + 𝜸̂ T zi ). (3.2.12) Levy (1971) obtained state level regression-adjusted synthetic estimates by fitting the model to the estimated bias values B̂ i at the regional level a. 3.2.5
Estimation of MSE
The p-variance of a synthetic estimator Ŷ iS will be small relative to the p-variance of a direct estimator Ŷ i because it depends only on the precision of direct estimators at a large area level. The p-variance is readily estimated using standard design-based methods, but it is more difficult to estimate the MSE of Ŷ iS . For example, the p-variance of the ratio-synthetic estimator (3.2.9) or of the count-synthetic estimator (3.2.7) can be estimated using Taylor linearization. Similarly, the p-variance of the GREG-synthetic estimator (3.2.3) can be estimated using the results of Fuller (1975) on the large sample covariance matrix of B̂ or by using a resampling method such as the jackknife. We refer the readers to Wolter (2007), Shao and Tu (1995, Chapter 6), and Rust and Rao (1996) for a detailed account of resampling methods for sample surveys. An approximately p-unbiased estimator of MSE of Ŷ iS can be obtained using a p-unbiased direct estimator Ŷ i for sampled area i. We have MSEp (Ŷ iS ) = Ep (Ŷ iS − Yi )2 = Ep (Ŷ iS − Ŷ i + Ŷ i − Yi )2 = Ep (Ŷ iS − Ŷ i )2 − Vp (Ŷ i ) + 2 Covp (Ŷ iS , Ŷ i ) = Ep (Ŷ iS − Ŷ i )2 − Vp (Ŷ iS − Ŷ i ) + Vp (Ŷ iS ).
(3.2.13)
It now follows from (3.2.13) that an approximately p-unbiased estimator of MSEp (Ŷ iS ) is mse(Ŷ iS ) = (Ŷ iS − Ŷ i )2 − 𝑣(Ŷ iS − Ŷ i ) + 𝑣(Ŷ iS ), (3.2.14)
44
INDIRECT DOMAIN ESTIMATION
where 𝑣(⋅) is a p-based estimator of Vp (⋅); for example, a jackknife estimator. The estimator (3.2.14), however, can be very unstable and it can take negative values. Consequently, it is customary to average the MSE estimator over small areas i(= 1, … , m) belonging to a large area to get a stable estimator (Gonzalez and Waksberg ̂ ̂ 1973). Let Y iS = Ŷ iS ∕Ni be the estimated area mean, so that mse(Y iS ) = mse(Ŷ iS )∕Ni2 . ∑ ̂ ̂ ̂ We take m−1 m 𝓁=1 mse(Y 𝓁S ) as an estimator of MSE(Y iS ) so that we get msea (YiS ) = ̂ 2 ̂ Ni msea (Y iS ) as an estimator of MSE(YiS ), where m
m
m
1 1 1 1 ̂ 1 1 ̂ msea (Y iS ) = (Y − Ŷ 𝓁 )2 − 𝑣(Ŷ 𝓁S − Ŷ 𝓁 ) + 𝑣(Ŷ 𝓁S ). m 𝓁=1 N 2 𝓁S m 𝓁=1 N 2 m 𝓁=1 N 2 𝓁 𝓁 𝓁 (3.2.15) But such a global measure of uncertainty can be misleading since it refers to the average MSE rather than to the area-specific MSE. A good approximation to (3.2.14) is given by mse(Ŷ iS ) ≈ (Ŷ iS − Ŷ i )2 − 𝑣(Ŷ i ),
(3.2.16)
noting that the variance of the synthetic estimator Ŷ iS is small relative to the variance of the direct estimator Ŷ i . Using the approximation (3.2.16) and applying the above averaging idea, we obtain m
m
1 1 1 ̂ 1 ̂ (Y − Ŷ 𝓁 )2 − 𝑣(Ŷ 𝓁 ). msea (Y iS ) ≈ m 𝓁=1 N 2 𝓁S m 𝓁=1 N 2 𝓁
(3.2.17)
𝓁
Marker (1995) proposed a simple method of getting a more stable area-specific estimator of MSE of Ŷ iS than (3.2.14) and (3.2.16). It uses the assumption that the squared ̂ p-bias B2p (Y iS ) is approximately equal to the average squared bias: m
1 ̂ ̂ ̂ B2p (Y iS ) ≈ B2 (Y ) =∶ B2a (Y iS ). m 𝓁=1 p 𝓁S
(3.2.18)
The average squared bias (3.2.18) can be estimated as ̂ b2a (Y iS )
m
1 ̂ ̂ = msea (Y iS ) − 𝑣(Y 𝓁S ), m 𝓁=1
(3.2.19)
noting that average MSE = average variance + average (bias)2 . The variance estî mator 𝑣(Y 𝓁S ) = 𝑣(Ŷ 𝓁S )∕N𝓁2 is readily obtained using traditional methods, as noted earlier. It now follows under the assumption (3.2.18) that MSEp (Ŷ iS ) can be estimated as ̂ mseM (Ŷ iS ) = 𝑣(Ŷ iS ) + Ni2 b2a (Y iS ), (3.2.20)
SYNTHETIC ESTIMATION
45
which is area-specific if 𝑣(Ŷ iS ) depends on the area. However, the assumption (3.2.18) may not be satisfied for areas exhibiting strong individual effects. Nevertheless, (3.2.20) is an improvement over the global measure (3.2.17), provided the variance term dominates the bias term in (3.2.20). Note that both mseM (Ŷ iS ) and msea (Ŷ iS ) require the knowledge of the domain sizes, Ni . Furthermore, averaging the ̂ area-specific estimator mseM (Y iS ) = Ni−2 mseM (Ŷ iS ) over i = 1, … , m leads exactly ̂ to the average MSE estimator msea (Y iS ) given by (3.2.17), as noted by Lahiri and Pramanik (2013). It is important to note that the average MSE estimator (3.2.17), as well as (3.2.15), may take negative values in practice. Lahiri and Pramanik (2013) ̂ proposed modifications to msea (Y iS ) that always lead to positive average MSE estimates. To illustrate the calculation of (3.2.20), suppose Ŷ iS∑is the ratio-synthetic estî i and Ŷ i is the expansion estimator ̂ ̂ X)X ̂ i = RX mator (Y∕ si j yj . Then 𝑣(Yi ) = 𝑣(yi ) 2 ̂ ̂ and 𝑣(YiS ) ≈ (Xi ∕X) 𝑣(e) in the operator notation introduced in Section 2.4, where ̂ j , j ∈ s. Using these variance estimators we can compute mseM (Ŷ iS ) from ej = yj − Rx (3.2.17), (3.2.19), and (3.2.20). Note that 𝑣(yi ) is obtained from 𝑣(y) by changing yj to yij , and 𝑣(e) is obtained by changing yj to ej . 3.2.6
Structure Preserving Estimation
Structure preserving estimation (SPREE) is a generalization of synthetic estimation in the sense that it makes a fuller use of reliable direct estimates. The parameter of interest is a count such as the number of employed in a small area. SPREE uses the well-known method of iterative proportional fitting (IPF) (Deming and Stephan 1940) to adjust the cell counts of a multiway table such that the adjusted counts satisfy specified margins. The cell counts are obtained from the last census while the specified margins represent reliable direct survey estimates of current margins. Thus, SPREE provides intercensal estimates of small area totals of characteristics that are also measured in the census. We illustrate SPREE in the context of a three-way table of census counts {Niab }, where i(= 1, … , m) denotes the small areas, a(= 1, … , A) the categories of the variable of interest y (e.g., employed/unemployed) and b(= 1, … , B) the categories of some variable closely related to y (e.g., white/nonwhite). The unknown current counts are denoted by {Miab }, and the parameters of interest are the marginal counts Mia⋅ = ∑B b=1 Miab . We assume that reliable survey estimates of some of the margins are avail̂ able. In particular, we consider two cases: ∑m 1) Survey estimates {M⋅ab } of the margins {M⋅ab } are available, where M⋅ab = i=1 Miab . Note that the margins correspond to a ̂ i⋅⋅ } of the ̂ ⋅ab }, estimates {M larger area covering the small areas. 2) In addition to {M ∑A margins {Mi⋅⋅ } are also available, where Mi⋅⋅ = a=1 Mia⋅ . Such estimates of current small area population counts Mi⋅⋅ may be obtained using demographic methods such as the sample regression method considered in Section 3.3 of Rao (2003a). SPREE is similar to calibration estimation studied in Section 2.3. We seek values {xiab } of Miab , which minimize a distance measure to {Niab } subject to the constraints ∑m ∑m ∑A ∑B ̂ ̂ ̂ i=1 xiab = M⋅ab in case 1, and i=1 xiab = M⋅ab and a=1 b=1 xiab = Mi⋅⋅ in case 2.
46
INDIRECT DOMAIN ESTIMATION
̃ iab as the resulting values, the estimate of Mia⋅ is then obtained as M ̃ ia⋅ = Denoting M ∑B ̃ b=1 Miab . Case 1 One-Step SPREE Using a chi-squared distance, we minimize m
A
A
B
D =
(Niab − xiab ) ∕ciab − i=1 a=1 b=1
(
B
2
)
m
̂ ⋅ab xiab − M
𝜆ab a=1 b=1
i=1
with respect to {xiab }, where ciab are some prespecified weights and 𝜆ab are the Lagrange multipliers. If we choose ciab = Niab , then the “optimal” value of xiab is given by N ̃ iab = iab M ̂ . M N⋅ab ⋅ab The resulting estimate of Mia⋅ is B
̃ ia⋅ = M b=1
Niab ̂ , M N⋅ab ⋅ab
(3.2.21)
which has the same form as the alternative count-synthetic estimator (3.2.10). Note ̃ satisfy the additive property when that the structure preserving estimators M ∑m ̃ ∑B ̂ ia⋅ ̂ ⋅a⋅ . summed over i, that is, i=1 Mia⋅ = b=1 M⋅ab = M ̃ The same optimal estimates Miab are obtained minimizing the discrimination information measure m
A
B
Niab log
D ({Niab }, {xiab }) = i=1 a=1 b=1
Niab xiab
(3.2.22)
instead of the chi-squared distance. ̃ iab preserve the association structure in the three-way table of The estimates M ̂ ⋅ab }; that is, counts {Niab } without modifying the marginal allocation structure {M the interactions as defined by cross-product ratios of the cell counts are preserved. It ̃ iab } preserve census area effects in the sense is easy to verify that the values {M ̃ iab N M = iab , ̃ Ni′ ab Mi′ ab
(3.2.23)
′ ̃ iab also preserve the two-way interactions of for all pairs of areas (i, i ). The values M area i and variable of interest a from the census, in the sense that the cross-product ratios remain the same:
̃ ′ ′ ̃ iab M Niab Ni′ a′ b M i a b = ̃ ′ M ̃ ′ Ni′ ab Nia′ b M i ab ia b
′
′
for all i, i , a, a .
47
SYNTHETIC ESTIMATION
The two-way interactions of area i and associated variable b, and the three-way interactions of area i, variable of interest a and associated variable b, that is, the ratios of cross-product ratios, also remain the same. Because of this structure preserving property, the method is called SPREE. This property is desirable because one would expect the association structure to remain fairly stable over the intercensal period. Under the implicit assumption that the association structures of {Niab } and {Miab } are identical in the above sense, SPREE gives an exactly p-unbiased estimator of ̂ ⋅ab is p-unbiased for M⋅ab . To prove the p-unbiasedness of M ̃ ia⋅ , first Mia⋅ , provided M note that B Niab ̃ ia⋅ ) = M⋅ab . Ep (M N b=1 ⋅ab Second, applying the condition (3.2.23) to the true counts, Miab gives Miab Ni′ ab = Niab Mi′ ab , ′
which, when summed over i , leads to Niab ∕N⋅ab = Miab ∕M⋅ab . Case 2 Two-Step SPREE In case 2 with additional estimated margins {Mi⋅⋅ } available, the values that minimize the chi-squared distance do not preserve the association structure, unlike those based on the discrimination information measure (3.2.22). We therefore consider only the latter measure and minimize D ({Niab }, {xiab }) subject to the constraints ∑A ∑B ∑m ̂ ̂ i=1 xiab = M⋅ab for all a, b and a=1 b=1 xiab = Mi⋅⋅ for all i. The “optimal” solũ iab cannot be obtained in a closed form, but the well-known method of IPF tion M ̃ iab iteratively. The IPF procedure involves a sequence of can be used to obtain M iteration cycles each consisting of two steps. At the kth iteration cycle, the values ̃ (k−1) } at the end of (k − 1)th iteration cycle are first adjusted to the set of con{M iab ∑ ̂ straints, m i=1 xiab = M⋅ab , as follows: ̃ (k) 1 Miab =
̃ (k−1) M iab ̃ (k−1) M ⋅ab
̂ ⋅ab . M
̃ (k) , is then adjusted to the second set of The solution to the previous adjustment, 1 M iab ∑A ∑B ̂ i⋅⋅ as constraints a=1 b=1 xiab = M ̃ (k) = M iab
̃ (k) 1 Miab ̃ (k) 1 Mi⋅⋅
̂ i⋅⋅ . M
̃ (0) , are set equal to the census counts Niab The starting values for the iteration, M iab specifying the initial association structure. If all the counts Niab are strictly positive, ̃ iab } as k → ∞ (Ireland ̃ (k) } converges to the optimal solution {M then the sequence {M iab and Kullback 1968). The resulting estimator of Mia⋅ is given by the marginal value
48
INDIRECT DOMAIN ESTIMATION
̃ ̃ ia⋅ = ∑B M ̃ M b=1 iab . The estimator Mia⋅ will be approximately p-unbiased for Mia⋅ if the population association structure remains stable over the intercensal period. Purcell and Kish (1980) studied SPREE under different types of marginal constraints, including the above-mentioned Cases 1 and 2. ̃ (1) , is simpler than The two-step estimator obtained by doing just one iteration, M ia⋅ ̃ ia⋅ . It also makes effective use of the current marginal information the SPREE M ̂ i⋅⋅ }, although both marginal constraints may not be exactly satisfied. ̂ ⋅ab } and {M {M Rao (1986) derived a Taylor linearization variance estimator of the two-step estimator ̃ (1) . Chambers and Feeney (1977) gave an estimator of the asymptotic covariance M ia⋅ ̃ ia⋅ in a general form without details needed for matrix of the SPREE estimators M computation. Example 3.2.5. Vital Statistics. Purcell and Kish (1980) made an evaluation of the one-step and the two-step SPREE estimators by comparing the estimates to true counts obtained from the Vital Statistics registration system. In this study, SPREE estimates of mortality due to each of four different causes (a) and for each state (i) in the United States were calculated for five individual years ranging over the postcensal period 1960–1970. Here, the categories b denote 36 age-sex-race groups, {Niab } the ̂ i⋅⋅ = Mi⋅⋅ } the known current counts. ̂ ⋅ab = M⋅ab }, {M 1960 census counts and {M Table 3.2 reports the medians of the state percent absolute relative errors ARE = |estimate − true value|/true value of the one-step and the two-step SPREE estimates. It is clear from Table 3.2 that the two-step estimator performs significantly better TABLE 3.2
Medians of Percent ARE of SPREE Estimates
Cause of Death
Year
One-Step
Two-Step
Malignant neoplasms
1961 1964 1967 1970
1.97 3.50 5.58 8.18
1.85 2.21 3.22 2.75
Major CVR diseases
1961 1964 1967 1970
1.47 1.98 3.47 4.72
0.73 1.03 1.20 2.22
Suicides
1961 1964 1967 1970
5.56 8.98 7.76 13.41
6.49 8.64 6.32 8.52
Total others
1961 1964 1967 1970
1.92 3.28 4.89 6.65
1.39 2.20 3.36 3.85
Source: Adapted from Table 3 in Purcell and Kish (1980).
49
SYNTHETIC ESTIMATION
than the one-step estimator (3.2.21) in terms of median ARE. Thus, it is important to incorporate, through the allocation structure, the maximum available current data into SPREE. 3.2.7
*Generalized SPREE
GLSM Approach Zhang and Chambers (2004) proposed a generalized linear structural model (GLSM) for estimating cell counts in two-way or three-way tables with given margins. The GLSM model assumes that the interactions in the table of true counts, measured in terms of ratios of counts, are proportional (but not necessarily equal) to the corresponding interactions in the census table. SPREE is a special case of this model, in which the proportionality constant is equal to one. For simplicity, we introduce GLSM only for a two-way table. Let {Mia } be the true counts and {Nia } be the known census counts, for a = 1, … , A and i = 1, … , m. Also, M = M ∕M and 𝜃 N = N ∕N be, respectively, the true and census proportions let 𝜃ia ia i⋅ ia i⋅ ia of individuals in the category a of the variable of interest within domain i, where Ni⋅ = ∑A ∑A a=1 Nia and the true margins Mi⋅ = a=1 Mia are assumed to be known. Consider the deviations of these proportions in the log scale from their means, M M = log(𝜃ia )− 𝜇ia
N N 𝜇ia = log(𝜃ia )−
1 A 1 A
A
log(𝜃 M′ ) =∶ ga (𝜽M i ), ia
′
a =1 A
log(𝜃 N ′ ) =∶ ga (𝜽Ni ), ia
′
a =1
N N N T M M T where 𝜽M i = (𝜃i1 , … , 𝜃iA ) and 𝜽i = (𝜃i1 , … , 𝜃iA ) . The model proposed by Zhang and Chambers (2004) assumes that M N = 𝜆a + 𝛽𝜇ia , 𝜇ia
i = 1, … , m,
(3.2.24)
∑ where the condition Aa=1 𝜆a = 0 ensures that the deviations add up to zero, that is, ∑A M a=1 𝜇ia = 0 for each i. Model (3.2.24) has an interpretation in terms of log-linear models. Consider the saturated log-linear model representations for the census and the true counts: M , log(Mia ) = 𝛼0M + 𝛼iM + 𝛼aM + 𝛼ia
log(Nia ) =
𝛼0N
+
𝛼iN
+
𝛼aN
+
N 𝛼ia ,
(3.2.25) (3.2.26)
where 𝛼0M and 𝛼0N are the overall means, 𝛼iM and 𝛼iN are the marginal effects of the M and 𝛼 N are the areas, 𝛼aM and 𝛼aN are the marginal effects of the categories, and 𝛼ia ia interaction effects of each crossing area-category. Note that the log-linear models (3.2.25) and (3.2.26) lead to M M = 𝛼aM + 𝛼ia , 𝜇ia
N N 𝜇ia = 𝛼aN + 𝛼ia .
(3.2.27)
50
INDIRECT DOMAIN ESTIMATION
Using (3.2.27), model (3.2.24) implies the following: (a) 𝛼aM = 𝜆a + 𝛽𝛼aN , that is, the marginal effects of the categories a in the above log-linear model for the true counts {Mia } are a linear function of the corresponding effects in the model for the census counts {Nia }. M = 𝛽𝛼 N , that is, the interaction effects (category by area) in the log-linear (b) 𝛼ia ia model for the true counts {Mia } are proportional to those in the model for the census counts {Nia }. From (b), it is easy to see that SPREE is obtained by setting 𝛽 = 1, that is, when the interaction effects in the log-linear models for true and census counts are exactly equal. Let us now express model (3.2.24) in matrix notation. For this, let us define the vectors M M T 𝝁M i = (𝜇i1 , … , 𝜇iA ) ,
and the matrix
N N T 𝝁Ni = (𝜇i1 , … , 𝜇iA ) ,
( ′
−1A−1
Zi =
I(A−1)
× (A−1)
| | N | 𝝁i | |
𝝃 = (𝜆2 , … , 𝜆A , 𝛽)T ) ,
where 𝟏r denotes a column vector of ones of size r. Then the model (3.2.24) may be represented as a multivariate generalized linear model in the form g(𝜽M i ) = Zi 𝝃,
i = 1, … , m,
M where g(𝜽M i ) is a column vector with elements ga (𝜽i ), a = 1, … , A. T Let us denote 𝜂i = Zi 𝝃 = (𝜂i1 , … , 𝜂iA ) . The link function g(⋅) is one-to-one and T its inverse is given by 𝜽M i = h(𝜂i ) = (h1 (𝜂i ), … , hA (𝜂i )) , where
[
]−1
A
exp(𝜂ia′ )
ha (𝜂i ) =
exp(𝜂ia ).
′
a =1 M
The model is fitted using design-unbiased direct estimators 𝜽̂ i of 𝜽M i obtained from independent samples of size ni from each domain i. The design-based covariance M matrix of 𝜽̂ i , denoted Gi , is assumed to be known for each area i. Then the model parameters 𝝃 are estimated by using iterative weighted least squares (IWLS). This M method relies on linearizing g(𝜽̂ i ) as ′ M M ̂M M g(𝜽̂ i ) ≈ g(𝜽M i ) + g (𝜽i )(𝜽i − 𝜽i ) =∶ ui , ′
i = 1, … , m,
where g (𝜽M of g(𝜽M i ) is the matrix of partial′ derivatives i ). Now noting that ′ M M M T Ep (ui ) = g(𝜽i ) = Zi 𝝃 and Covp (ui ) = g (𝜽i )Gi [g (𝜽i )] =∶ Vi , we can represent
51
SYNTHETIC ESTIMATION
ui = (ui1 , … , uiA )T in terms of a linear model ui = Zi 𝝃 + ei , i = 1, … , m, where the errors ei are independent with zero mean vector and covariance matrix Vi . ∑ However, Vi is singular because Aa=1 uia = 0. Therefore, one of the elements of ui is redundant. Removing the first element a = 1 from ui leads to the model (3.2.28)
ui(1) = Zi(1) 𝝃 + ei(1) ,
N N T where ui(1) = (ui2 , … , uiA )T , Zi(1) = (I(A−1) × (A−1) |𝝁i(1) ) with 𝝁i(1) = (𝜇i2 , … , 𝜇iA ) , T and ei(1) = (ei2 , … , eiA ) . Let Vi(1) = Vp (ei(1) ) be the resulting covariance matrix, which is equal to Vi with the first row and column deleted. Model (3.2.28) can be fit by IWLS. The updating equation for 𝝃 is given by
[ 𝝃
(k)
]−1
m
m
ZTi(1) (V(k−1) )−1 Zi(1) i(1)
=
ZTi(1) (V(k−1) )−1 u(k−1) . i(1) i(1)
i=1
(3.2.29)
i=1
In this equation, V(k−1) and u(k−1) are equal to Vi(1) and ui(1) , respectively, evaluated i(1) i(1)
at the current vector of proportions 𝜽M,(k−1) , which is obtained from the estimator of i 𝝃 in the previous iteration of the algorithm, 𝝃 (k−1) , through the inverse link function 𝜽M,(k−1) = h(Zi 𝝃 (k−1) ). If 𝝃̂ = 𝝃 (K) is the estimate of 𝝃 obtained in the last iteration i M ̂ = (𝜃̃ M , … , 𝜃̃ M ) is the vector of estimated proportions. The estiK, then 𝜽̃ i = h(Zi 𝝃) iA i1 ̃ ia = Mi⋅ 𝜃̃ia , a = 1, … , A. mated counts are in turn given by M Composite SPREE Another generalization of the one-way SPREE is the composite SPREE (Molina, Rao, and Hidiroglou 2008). The composite SPREE finds estimates of the cell counts that tend to be close (in terms of a chi-squared distance) to the census counts for cells in which direct estimates are not reliable due to small sample sizes, and close to direct estimates for cells in which the sample sizes are large enough. In the composite SPREE, the counts for the categories in the ith domain {Mia ; a = 1, … , A} are estimated by minimizing a sum of composite distances to the two available sets of counts, namely census counts {Nia ; a = 1, … , A} and direct estimates ̂ ia ; a = 1, … , A}, subject to the restriction that they add up to the given domain {M margin Mi⋅ . More concretely, the counts {Mia ; a = 1, … , A} are estimated by solving the problem in xia Min (xi1 ,…,xiA )
s.t.
] A [ ∑ ̂ −x )2 (N −x )2 (M 𝛼ia iaN ia + (1 − 𝛼ia ) iâ ia
a=1
ia
Mia
(3.2.30)
A ∑
xia = Mi⋅ , a=1
where 𝛼ia ∈ [0, 1], a = 1, … , A, are specified constants. Note that the problem (3.2.30) is solved separately for each domain i. Using the Lagrange multiplier method, the optimal solution xia to problem (3.2.30) for domain i is given by ̃ ia = (𝛿ia ∕𝛿i⋅ ) Mi⋅ , M
a = 1, … , A,
(3.2.31)
52
where 𝛿i⋅ =
INDIRECT DOMAIN ESTIMATION
∑A
a=1 𝛿ia
and )−1
( 𝛿ia =
𝛼ia 1 − 𝛼ia + ̂ ia Nia M
,
a = 1, … , A.
We now turn to the choice of the constants 𝛼ia by considering the special case of a simple random sample of size ni drawn independently from each domain i, with ∑ sample sizes ni1 , … , niA in the A categories, where ni = Aa=1 nia . Then, following ideas similar to those in Section 3.3.2, a possible choice of 𝛼ia is given by 𝛼ia =
1 , 1 + nia ∕k∗
(3.2.32)
where k∗ > 0 is a constant that can be interpreted as a critical sample size, satisfying ̂ ia than 𝛼ia < 0.5 when nia > k∗ and more weight is given to the direct estimates M to the census counts Nia in (3.2.30). If nia < k∗ , then 𝛼ia > 0.5 and the composite distance in (3.2.30) gives more weight to the census counts. The constant k∗ may be taken as the minimum sample size under which the user considers a direct estimator as a minimally reliable estimator. Example 3.2.6. Count Estimation: Canadian LFS. Molina, Rao, and Hidiroglou (2008) applied the GLSM and composite SPREE methods to data from the 2001 Canadian census and LFS. Employed people are classified into different occupational classes labeled by two digits. Three-digit codes are used for a subclassification on different sectors of industry/services within each two-digit occupational class. The goal is to estimate the counts {Mia } in the cross-classification by province (i = 1, … , 10) and three-digit category (a = 1, … , A) separately for each two-digit category. The number of three-digit occupation categories A is different for each two-digit category. Direct estimates of the two-digit totals of employed people in each province were available from the LFS with a large enough sample size. In this application, ∑ ̂ ia , i = 1, … , m. The composthey were treated as true margins, that is, Mi⋅ = Aa=1 M ite SPREE weights 𝛼ia were taken as in (3.2.32) with k∗ = 20, a value close to the median sample size, which is also approximately half of the average sample size over the three-digit categories. GLSM was fitted using, as starting values of 𝜆a and 𝛽, those corresponding to a model that preserves the marginal and interaction effects of the census counts, (0) = 1. The starting value for 𝝃 is then 𝝃 (0) = that is, 𝜆(0) a = 0, a = 1, … , A, and 𝛽 T (0, … , 0, 1) . This method requires also the design-based covariance matrix Gi of M the vector of direct estimators 𝜽̂ i for each province i. For province i, considering the M total Mi⋅ as a fixed quantity, the covariance matrix of the vector 𝜽̂ i is taken as [ ( M )T ] M ̂ ̂ Gi = Θi − 𝜽i 𝜽̂ i ∕ni ,
̂ i = diag(𝜃̂ M , … , 𝜃̂ M ), Θ iA i1
53
SYNTHETIC ESTIMATION
assuming simple random sampling within the provinces. In the calculation of Vi , the ′ Jacobian matrix g (𝜽M i ) is given by ′
M M g (𝜽M i ) = diag(𝜃i1 , … , 𝜃iA ) −
1 M −1 M −1 ) 1A ]. [(𝜃 ) 1A , … , (𝜃iA A i1
We now show results for the two-digit occupation classes A1 and B5, which have N M and 𝜃̂ia A ≥ 4 three-digit categories and for which census and direct estimators 𝜃ia show clear differences for at least some categories a. Figure 3.1 plots direct (labeled LFS), census, composite SPREE (CSPREE), and GLSM estimates of the row proM M T files 𝜽M i = (𝜃i1 , … , 𝜃iA ) for two Canadian provinces and the two-digit class A1 with A = 4 categories. Figure 3.2 shows results for the two-digit occupation class B5 with A = 7 categories in two other Canadian provinces. Figures 3.1 and 3.2 clearly show that the composite SPREE estimates are always approximately in between the direct and the census estimates, in contrast to the GLSM estimates, which appear, for several three-digit categories, either below both direct and census estimates, or above. Note also that the GLSM estimates are not respecting the profiles displayed by the census and LFS counts for the different occupation classes, see for example the plot for Newfoundland and Labrador in Figure 3.2a, where the GLSM estimated proportion for the three-digit category B54 is larger than for B55, whereas for all the other estimates it is exactly the opposite. 3.2.8
*Weight-Sharing Methods
In this section, we study methods of producing weights ∗ij for each area i by sharing the weight ∗j attached to unit j ∈ s among the areas i = 1, … , m, such that the
Census
Quebec
GLSM
LFS
Census
CSPREE
GLSM
0.3 0.2
Cell proportions
0.4
LFS
CSPREE
0.1
Cell proportions
0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45
Newfoundland and Labrador
A11
A12
A13
Three−digit category
(a)
A14
A11
A12
A13
A14
Three−digit category
(b)
Figure 3.1 Direct, Census, composite SPREE, and GLSM estimates of Row Profiles 𝜽M i = M T ) for Canadian provinces Newfoundland and Labrador (a) and Quebec (b), for (𝜃i1M , … , 𝜃iA Two-Digit Occupation class A1.
54
INDIRECT DOMAIN ESTIMATION
Census
CSPREE
GLSM
LFS
Census
CSPREE
GLSM
0.1
Cell proportions 0.05 0.10 0.15 0.20 0.25
Cell proportions 0.2 0.3
LFS
Nova Scotia 0.30
0.4
Newfoundland and Labrador
B51
B52
B53 B54 B55 B56 Three−digit category
B51
B57
B52
(a)
B53 B54 B55 B56 Three−digit category
B57
(b)
Figure 3.2 Direct, Census, composite SPREE, and GLSM estimates of row profiles 𝜽M i = M T ) for Canadian provinces Newfoundland and Labrador (a) and Nova Scotia (b), (𝜃i1M , … , 𝜃iA for two-digit occupation class B5.
new weights satisfy calibration constraints. The WS property is desirable in practice because it ensures that the associated area synthetic estimators add up to the GREG estimator of the population total. The WS synthetic estimator of the area total Yi is simply obtained as ∗ (3.2.33) Ŷ iWS = ij yj , j∈s
where the weights
∗ ij
satisfy the calibration property ∗ ij xj
= Xi ,
i = 1, … , m,
(3.2.34)
j ∈ s.
(3.2.35)
j∈s
and the WS property
m ∗ ij
=
∗ j,
i=1
The use of the same weight, ∗ij , for all variables of interest used as y to produce small area estimates is of practical interest, particularly in microsimulation modeling that can involve a large number of y-variables. Method 1 A simple method of calibration to known totals Xi for area i is to find weights hij of unit j in area i that minimize a chi-squared distance to the ∑ ∑ original weights, D,i = j∈s cj ( j − hij )2 ∕ j , subject to j∈s hij xj = Xi for each i. The resulting calibration weights agree with the weights ̃ ij associated with the GREG-synthetic estimator (3.2.3), provided cj = T xj is used. However, the weights can take negative values and do not satisfy range restrictions on the ratios
55
SYNTHETIC ESTIMATION
̃ ij ∕ j , although satisfy the calibration property (3.2.34) and the WS property (3.2.35). Following Deville and Särndal (1992), National Center for Social and Economic Modeling (NATSEM) in Australia uses an alternative distance function. to obtain weights satisfying also range restrictions. This distance function is given ∑ by Di = j∈s G(hij , j ), where G(hij ,
j)
= (gij − L) log
gij − L 1−L
+ (U − gij ) log
U − gij U−1
.
(3.2.36)
Here, gij = hij ∕ j and L(< 1) and U(> 1) are specified lower and upper limits on the ratios gij . Minimizing Di with respect to the variables hij , subject to calibration ∑ constraints j∈s hij xj = Xi for each i, the range restrictions are satisfied if a solution, hij , exists. NATSEM method adjusts L and U iteratively to find a solution such that L ≤ gij ≤ U for all i and j ∈ s. The resulting weights ∗ij , however, may not satisfy ∑ ∗ ∗ the WS property m i=1 ij = j , j ∈ s. Using Lagrange multipliers, 𝝀i , to minimize Di subject to (3.2.34), it can be shown that 𝝀i is the solution of the equation ̂ i) − (Xi − X
−1 T j [g (xj 𝝀i )
− 1]xj = 0,
(3.2.37)
j∈s
where g−1 (u) =
L(U − 1) + U(1 − L) exp(𝛼u) , (U − 1) + (1 − L) exp(𝛼u)
with 𝛼 = (U − L)∕[(1 − L)(U − 1)]. If a solution 𝝀∗i to (3.2.37) exists, then the optimal ratios are g∗ij = g−1 (xTj 𝝀∗i ) and the optimal weights are ∗ij = j g∗ij ; Newton’s iterative method may be used to find the solution 𝝀∗i to the nonlinear equation (3.2.37). Method 2 An alternative WS method that requires modeling the weights ∗ij was proposed by Schirm and Zaslavsky (1997). They assumed a “Poisson” model on the weights, ∗ T j ∈ s, (3.2.38) ij = 𝛾ij exp(xj 𝜷 i + 𝛿j ), where 𝜷 i and 𝛿j are unknown coefficients and 𝛾ij is an indicator equal to one if area i is allowed to borrow from the area to which unit j belongs, and zero otherwise. This is called “restricted borrowing” and is typically specified by the user. We impose the WS and calibration constraints (3.2.35) and (3.2.34), respectively. Schirm and Zaslavsky (1997) proposed an iterative two-step method to estimate the parameters 𝜷 i and 𝛿j in the Poisson model (3.2.38) subject to the specified constraints. Letting 𝜷 (t−1) and 𝛿j(t−1) denote the values of the parameters in iteration t − 1, i the two steps are given by } { m ) ( (t) ∗ T (t−1) Step 1∶ 𝛿j = log 𝛾ij exp xj 𝜷 i j∕ i=1
56
INDIRECT DOMAIN ESTIMATION
and )−1 (
( Step 2∶
𝜷 i(t)
=
𝜷 (t−1) i
∗(t−1) xj xTj ij
+
) ∗(t−1) xj ij
Xi −
j∈s
,
j∈s
where ∗(t−1) is obtained by substituting 𝜷 (t−1) and 𝛿j(t) in the formula (3.2.38) for ij i ∗ . Iteration is continued until convergence, to get 𝜷 ̂ i and 𝛿̂j , which in turn lead to ij the desired weights ∗ij satisfying both constraints. Method 3 Randrianasolo and Tillé (2013) proposed an alternative method of weight sharing that avoids modeling the area-specific weights ∗ij . Letting ∗ij = ∗j qij , the ∑ WS condition is equivalent∑ to m i=1 qij = 1 for each j ∈ s. The fractions qij that satisfy the calibration conditions j∈s ( ∗j qij )xj = Xi for i = 1, … , m and the WS condition are obtained by a two-step iterative procedure. First, the calibration weights ∗j are ∑ obtained by minimizing the Kullback–Leibler distance measure DK = j∈s K(hj , j ) ∑ with respect to the variables hj subject to the calibration constraint j∈s hj xj = X, where K(hj , j ) = hj log(hj ∕ j ) + j − hj . (3.2.39) Using the Lagrange multiplier method, the desired weights hj = ∗ j
=
T j exp(xj 𝝀),
∗ j
are given by
j ∈ s,
(3.2.40)
where the Lagrange multiplier 𝝀 is obtained by solving the calibration equations T j exp(xj 𝝀)
= X.
(3.2.41)
j∈s
Equations (3.2.41) can be solved iteratively by the Newton–Raphson method. It follows from (3.2.40) that the weights ∗j are always positive. However, some of these calibration weights may take large values. The calibration weights ∗j given by (3.2.40) are used to determine the fractions ∑ (0) qij . Taking as starting values q(0) = Ni ∕N, which satisfy m i=1 qij = 1, where Ni is ij the known size of area i, new values q(1) are obtained by minimizing separately for ij each i DK,i = K(q(1) , q(0) ) (3.2.42) ij ij j∈s
subject to the area-specific calibration condition with respect to the variables q(1) ij (
∗ j qij )xj
= Xi .
(3.2.43)
j∈s
is given by Using again the Lagrange multiplier method, the solution q(1) ij q(1) = q(0) exp(xTj 𝝀i ), ij ij
(3.2.44)
57
COMPOSITE ESTIMATION
where the Lagrange multiplier 𝝀i is obtained by solving iteratively the calibration equations ( ∗j q(0) ) exp(xTj 𝝀i ) = Xi . (3.2.45) ij j∈s
Note that m sets of calibration equations need to be solved to obtain 𝝀1 , … , 𝝀m . In the values are revised to satisfy the WS property. Revised second step of the method, q(1) ij fractions are given by m
q(2) = q(1) ∕ ij ij
q(1) . ij
(3.2.46)
i=1
The two-step iteration is repeated until convergence to obtain the desired fractions qij , which are strictly positive. The resulting weights ∗ij = ∗j qij satisfy the area-specific calibration conditions and the WS property. Note that this method ensures strictly positive weights ∗ij because qij > 0 and ∗j > 0. 3.3
COMPOSITE ESTIMATION
A natural way to balance the potential bias of a synthetic estimator, say Ŷ i2 , against the instability of a direct estimator, say Ŷ i1 , is to take a weighted average of Ŷ i1 and Ŷ i2 . Such composite estimators of the small area total Yi may be written as Ŷ iC = 𝜙i Ŷ i1 + (1 − 𝜙i )Ŷ i2
(3.3.1)
for a suitably chosen weight 𝜙i (0 ≤ 𝜙i ≤ 1). Many of the estimators proposed in the literature, both design-based and model-based, have the composite form (3.3.1). In latter chapters we study model-based composite estimators derived from more realistic small area models that account for local variation. 3.3.1
Optimal Estimator
The design MSE of the composite estimator is given by MSEp (Ŷ iC ) = 𝜙2i MSEp (Ŷ i1 ) + (1 − 𝜙i )2 MSEp (Ŷ i2 ) +2 𝜙i (1 − 𝜙i )Ep (Ŷ i1 − Yi )(Ŷ i2 − Yi ).
(3.3.2)
By minimizing (3.3.2) with respect to 𝜙i , we get the optimal weight 𝜙i as 𝜙∗i =
MSEp (Ŷ i2 ) − Ep (Ŷ i1 − Yi )(Ŷ i2 − Yi ) MSEp (Ŷ i1 ) + MSEp (Ŷ i2 ) − 2Ep (Ŷ i1 − Yi )(Ŷ i2 − Yi )
≈ MSEp (Ŷ i2 )∕[MSEp (Ŷ i1 ) + MSEp (Ŷ i2 )],
(3.3.3)
assuming that the covariance term Ep (Ŷ i1 − Yi )(Ŷ i2 − Yi ) is small relative to MSEp (Ŷ i2 ).
58
INDIRECT DOMAIN ESTIMATION
Note that the approximate optimal 𝜙∗i , given by (3.3.3), lies in the interval [0, 1] and depends only on the ratio of the MSEs Fi = MSEp (Ŷ i1 )∕MSEp (Ŷ i2 ) as 𝜙∗i = 1∕(1 + Fi ).
(3.3.4)
Furthermore, the MSE of the resulting composite estimator Ŷ iC obtained using the optimal weight 𝜙∗i reduces to MSE∗p (Ŷ iC ) = 𝜙∗i MSEp (Ŷ i1 ) = (1 − 𝜙∗i )MSEp (Ŷ i2 ).
(3.3.5)
It now follows from (3.3.5) that the reduction in MSE achieved by the optimal estimator relative to the smaller of the MSEs of the component estimators is given by 𝜙∗i when 0 ≤ 𝜙∗i ≤ 1∕2, and it equals 1 − 𝜙∗i when 1∕2 ≤ 𝜙∗i ≤ 1. Thus, the maximum reduction of 50 percent is achieved when 𝜙∗i = 1∕2 (or equivalently Fi = 1). The ratio of MSEp (Ŷ iC ) with a fixed weight 𝜙i and MSEp (Ŷ i2 ) may be expressed as MSEp (Ŷ iC ) (3.3.6) = (Fi + 1)𝜙2i − 2𝜙i + 1. MSEp (Ŷ i2 )
Schaible (1978) studied the behavior of the MSE ratio (3.3.6) as a function of 𝜙i for selected values of Fi (= 1, 2, 6). His results suggest that sizable deviations from the optimal weight 𝜙∗i do not produce a significant increase in the MSE of the composite estimator, that is, the curve (3.3.6) in 𝜙i i s f a i r l y fla t i n t h e n e i g h b o r h o o h ae n m a l w e ig h t. M o re o v e r, b o th th e re d 𝜙 u i cf ot iro wn hi ni cMh tSE c o m p o s i t e e s t i m a t o r h a s a s m a l l e r M SE t h a n e i t h e r c o t h e s i zFei . W o f h Fei ins c l o s e t o o n e , w e g e t t h e m o s t a d v a n t a s itu a tio n s . It i s e a s y t o sŶhiC oi swb te ht taet r t h a n e i t h e r c o m p o n e n t e s t i m i n ∗i , 1). T h e l a t t e r i n t e r v a l r e d u c e s t w h e n (0, m2𝜙a∗i x− 1) ≤ 𝜙i ≤ m (2𝜙 e i n= 1, a n d i t b e c o m e sFi nd ae rv r ioa wt e es r f ar os m o n e . T r a n g ≤e 𝜙0i ≤ 1 w h F o p tim a l 𝜙 w∗i we iigl lh bt e c l o s e t o z e r o o r o n e w h e n o n e o f t h e r shme na l l . h a s a m u c h l a r g e r M SE t h a n tFhi ies oe ti ht he er ,r tl ha ar gt ie s o, w c a s e , t h e e s t i m a t o r w i t h l a r g e r M SE a d d s l i t t l e i n f o r m t o u s e t h e c o m p o n e n t e s t i m a t o r w i t h s m a l l e r M SE i n e s tim a to r. 𝜙t∗i ha et eo p t i m In p r a c t i c e , w e u s e e i t h e r a p r i o r g𝜙∗i uoe r s es sot if m ̂ f r o m t h e s a m p l e d a t a . A s s u m i n Ygi1 itsh ea it t tphheuern db i ir ae sc et de s t o r a p p r o x pi mu na bt ei al ys e d a s t h e o v e r a l l s a m p l e s i z e i n c r t h e a p p r o x i m a t e o p t i m a l w e i g h t ( 3 . 3 . 3 ) u s i n g ( 3 . 2. n ) ua m n (dŶei2 r−aŶti1o)2 rf oM r SE th e t o r m(Ŷ i2s )eg i v e n b y ( 3 . 2. 16 ) f o r t h pe(Ŷ i2 ̂ d e n o m i n ap (tŶoi1 )r +M M SE SE ( Y ), l e a d i n g t o t h e e s t i m a t e d o p tim p i2 𝜙̂ ∗i =
m s(Ŷei2 ) . (Ŷ i2 − Ŷ i1 )2
( 3 .3 .7 )
COM POSIT E EST IM A T ION
59
Ne v e r t h e l e s s , t h e e s t 𝜙i ∗imc aa nt obr e( 3v .e3r .y7 u) no sf t a b l e . On e w a y t h i s d i f fi c u l t y i s t o a v e r a g e𝜙̂ ∗it ho ve ee rs st iemv ea rt ea dl vwa er ii ag bh l tes s o la r ”a r e a s o r b o th . T h e r e s u ltin g c o m p o s ite e s tim a to th e in s e n s itiv ity to d e v ia tio n s f r o m th e o p tim a l w e i Es t i m a t i o n o f M SE o f t h e c o m p o s i t e e s t i m a t o r , e v e n d i f fi c u l t i e s s i m i l a r t o t h o s e f o r t h e s y n t h e t i c e s t i m a m e t h o d s i n Se c t i o n 3 . 2. 5 t o c o m p o s i t e e s t i m a t o r s ( s e
Example 3.3.1. Labor Force Characteristics. Gr i f fi t h s ( 19 9 6 ) u s e d a SPREEc o m p o s ite e s tim a to r to p ro v id e in d ire c t e s tim a te s c o n g r e s s i o n a l d ii is nt r ti hc et s Un ( CD i t es d) St a t e s . T h i s e s t i m a t o r ( 3 . 3 . 1) w i t h t h e o n e - s t e M p̃ ia⋅SPREE s t iams ta ht oe rs y n t h e t i c c o m ( 3 . 2.e21) ̂sia⋅e ad s e t sh t ei md ai rt eo cr t c Ŷoi1 m . T po o cn ae lnc tu, l a t e Ŷ i2 , a n d a s a m p l e - b aM t h e SPREE e s t i m a t e , t h e p o {N p iab u }l aw t ieor ne co ob ut an i tns e d f r o m t h ̂ ⋅ab D e c e n n i a l Ce n s u s , w h i l e t h {eM e }s twi me rae t eo db tma ianr eg di n f ar ol s m 19 9 4 M a r c h Cu r r e n t Po p u l a t i o n Su r v e y ( CPS) . T h e CPS tw o -s ta g e p ro b a b ility s a m p le d ra w n in d e p e n d e n tly D i s t r i c t o f Co l u m b i a . It w a s n o t d e s i g n e d t o p r o v i d e ̂ ia⋅ a t t h e CD l e v e l ; t h e CD s a m p l e s i z e s t e n d t o b e t o o M w ith d e s ire d re lia b ility . Es t i m a t e s o f t h e o p𝜙∗it iwm e ar le wo be it ga ihnt es d f r o m ( 3 . 3 . 3 ) , u s ̃M ̂ ia⋅ ) t o e s t i m a pt(M dM ê ia⋅M ). No SE t e t h a t w eŶ iSr e p l a c e e s t i m a t pe(M SEn 𝑣( ia⋅ ) a ̃ ̂ ̂ b yMia⋅ a n Ydi b yMia⋅ i n ( 3 . 2. 14 ) . On e c o u l d a l s o u s e𝜙̂ ∗it, hg ei vs ie mn p l e r b y ( 3 .3 .7 ) , b u t b o 𝜙 t h∗i a er se t hi mi g aht le ys uo nf s t a b l e . Gr i f fi t h s ( 19 9 ( 3 . 2. 14 ) t o e s t i m a t e t h e M SE o f t h e c o m Y p̂ iSoa sni Yt̂diei ne s t i m a t ̂ iia⋅t ,e r ee ss tpi m ( 3 . 2. 14 ) b y t h e c o m p o sM e c at it vo er layn. dT h i s M SE e s t i m h ig h ly u n s ta b le . Gr i f fi t h s ( 19 9 6 ) e v a l u a t e d t h e e f fi c i e n c i e s o f t h e c ̂e ia⋅- bf oa rs te hd e e fis vt i em CD SPREE e s t i m M ã ia⋅ t o, rr e, l a t i v e t o t h e s a m p lM a t os r i n t h e s t a t e o f Io w a . T h e M SE e s t i m a t e s b a s e d o n ( 3 . 2 p u rp o s e . T h e c o m p o s ite e s tim a to r p ro v id e d a n im p e s t i m a t o r i n t e r m s o f e s t i m a t e d M SE. T h e r e d u c t i o n a g e d o v e r t h e fi v e CD s , r a n g e d f r o m 17 % t o 7 8 %. T h e ̂ ia⋅t ,h feo dr it rhe rce te e os ft i tmh ea tco ar t, e g d i d n o t p e r f o r m b e t t e r t h a nM e m p lo y e d , o th e rs , a n d h o u s e h o ld in c o m e in th e ra n 3.3.2
Sample-Size-Dependent Estimators
Sa m p l e - s i z e - d e p e n d e n t ( SSD ) e s t i m a t o r s a r e c o m p w e i g 𝜙hi tt hs a t d e p e n d o n l y o n t hN̂ iea dn N od i m o r a tihn e c do ou m n tas i n t o t a X̂ i a n X di o f a n a u x i l i a rx.y Tv ha er is ae b el se t i m a t o r s w e r e o r i g i n a h a n d le d o m a in s fo r w h ic h th e e x p e c te d s a m p le s iz d ire c t e s tim a to rs fo r d o m a in s w ith re a liz e d s a m p l s a m p l e s i z e s s a t i s f y r e l i a b i l i t y r e qu i r e m e n t s ( D r e w ,
60
IND IRECT D OM A IN EST IM A T ION
T h e SSD e s t i m a t o r p r o p o s e d b y D r e w e t a l . ( 19 8 2) i s f o r m ( 3 . 3 . 1) w i t h w e i g h t { 1 i f N̂ i ∕Ni ≥ 𝛿; 𝜙i (S1) = ( 3 .3 .8 ) N̂ i ∕(𝛿Ni ) i f N̂ i ∕Ni < 𝛿,
∑ w h eN̂ ri = e si j i s t h e d i r e c t e x p a n s Ni ioa nn 𝛿ed >s 0t ii m s sa ut ob rje oc ft i v e l y c h o s e n t o c o n t r o l t h e c o n t r i b u t i o n o f t h eN̂ i ss yu ng t-h e t i c g e s t s 𝜙t ih(S1) a ti n c r e a s e s w i t h t h e d o m a i n s a m p l e s i z e . A n i s o b t a i n e d b y X̂si ∕X u bi f so tN̂rii ∕N t u i ti inn (g3 . 3 . 8 ) . Un d e r t h i s c h o i c e , D ( 19 8 2) u s e d t h e p o s t s t r a t i fi e d - r a t i o e s t i m aŶ i1t oa rn (d 2. 4 . 17 ) t h e r a t i o - s y n t h e t i c e s t i m a t o r ( 3 . 2. Y9̂ i2). Ta sh teh de i sr ey cn tt he se tt ii -c m a t o r ( 2. 4 . 17 ) , h o w e v e r , s u f f e r s f r o m t h e r a t i o b i a s is n o t s m a ll. T o a v o id th e ra tio b ia s , w e c o u ld u s e t ∑ ̂ ̂ ̂ Ŷ i + G g=1 (Xig − Xig )(Yg ∕Xg ), w h op-sbe i a s g o e s t o z e r o a s t h e o v e r a l i n c r e a s e s , e v e n i f t h e d o m a i n s a m p l e s i z e i s s m a l l . Ge ̂ i )T B̂ a s t h e d i r e c t e s t i m a t o r a n d t h e r e g r GREG e s t i m Ŷai +t o(Xri − X T ̂ ju n Ac t i o n w e s t i m Xai tBoa rs t h e s y n t h e t i c e s t i m a t o r i n c o 𝜙ni (S1). g e n e r a l - p u r p o 𝛿s ien c ( h3 o. 3i c.𝛿8e=)1. o i fsT h e Ca n a d i a n La b o u r Fo r c v e y ( LFS) u s e d t h e SSD e𝛿 s=t 2∕3 i m at ot op r r w o di tuh c e Ce n s u s D i v i s i e s tim a te s . Sä r n d a l a n d Hi d i r o g l o u ( 19 8 9 ) p r o p o s e d t h e “d a m p w h i c h i s o b t a i n e d f r o m t h e a l t e r n a t i v e m o d i fi e d GREG ∑ i n g t h e e f f e c t o f t h e d i sri e j ecj w t c ho emn Nêpi v< o eN n ir.e Tn ht i s e s t i m a t o r i g iv e n b y Ŷ iDR = XTi B̂ + (N̂ i ∕Ni )H−1
j ej
( 3 .3 .9 )
si
w iH t h= 0 i fN̂ i ≥ Ni a n Hd = h i fN̂ i < Ni , w h he >r e0 i s a s u i t a b l y c h o s e n c o n T h is e s tim a to r c a n b e w ritte n a s a c o m p o s ite e s tim a GREG e s t i m a t o r a s t h e d i r eXcTi B̂t ae ss tt ihme as tyo nr t ah ne dt i c e s t i m a t o ju n c t i o n w i t h t h e w e i g h t { 1 i f N̂ i ∕Ni ≥ 1, ( 3 . 3 . 10) 𝜙i (S2) = h (N̂ i ∕Ni ) i f N̂ i ∕Ni < 1.
A g e n e r a l - p u r p hoi sh s e= c2.h o i c e o f e cwo en isgi hd te r t h e s p e c i a l c a T o s t u d y t h e n a t u r e o𝜙fi (S1), t h ew SSD n 1g p l e r a n d o m s a m p l i n g fU.r oInmt ht ihs eN ĉ pi a=osN(n ep ,ui ∕n). la T t i ao kn 𝛿i = fni s≥ tE(n i n ( 3 . 3 . 8 ) , i t n o w𝜙i (S1) f o l=l 1o i w h ia) = t n(Ni ∕N). T h e r e f o r e , t h e SSD e s t i m a t o r c a n f a i l t o b o r r o w s t r e n g E(n t hi ) fi sr on mo t ol ta hr ge re d o e n o u g h . On t h e o t h N ê i r< hNia, tnhde , w eh𝜙ii(S1) eg nh =t N̂ i ∕Ni = Nni ∕(nNi ) d e c r e a snieds e ac sr e a s e s . A s a r e s u l t , m o r e w e i g h t i s g i v e ,̂ ii < n Nti ,h t eh ce aws ee𝜙ii(S1) g hbt e h a v e s w e l l , p o n e n t nw n a ll. T h u s N i i sh se m
61
COM POSIT E EST IM A T ION
u n l i k e i n tN̂hi ≥ e Nci . a Sis m e i l a r c o m m e n t s a p p l y t o t h e SSD e s t t h e w e𝜙ii (S2). g h tA n o t h e r d i s a d v a n t a g e o f SSD e s t i m a t o r s i s ta k e a c c o u n t o f th e s iz e o f b e tw e e n -a re a v a ria tio n re th e c h a ra c te ris tic o f in te re s t. T h e n , a ll th e c h a ra c te r le s s o f th e ir d iffe re n c e s w ith re s p e c t to b e tw e e n -a re o f Ch a p t e r 7 , w e d e m o n s t r a t e t h a t l a r g e e f fi c i e n c y g a a c h ie v e d b y u s in g m o d e l-b a s e d e s tim a to rs w h e n th r e la tiv e ly s m a ll. Ge n e r a l SSD e s t i m a t o r s p r o v i d e c o n s i s t e n c y w h e n a g a c te ris tic s b e c a u s e th e s a m e w e ig h t is u s e d fo r a ll o f u p to a d ire c t e s tim a to r a t a la rg e a re a le v e l. A s im p l Ŷ Ŷ iC (a) = ∑m iC Ŷ GR . ̂ i=1 YiC
( 3 . 3 . 11)
T h e a d ju s t e d ŶeiCs(a) t i amd ad t ou rp s t o t h e d i r Y ê GR c ta et st ht i em l a rt og re a r e a le v e l. T h e SSD e s t i m a t o r s 𝜙wi (S1) i t hm wa ye iag l hs ot s b e v i e w e d a s c a l i b ∑ m a t o sr s∗ij yj w i t h w e i∗ij gm h itns i m i z i n g t h e c h i - s qu a r e d d i s t a n m
cj [
j aij 𝜙i (S1)
− h∗ij ]2 ∕
j
( 3 . 3 . 12)
i=1 j∈s
∑ w i t h r e s ph∗ij es cu tb t je o c t t o t h e c oj∈snh∗ijsxjt=r aXii, in =t 1, s … , m. He r aeij , i s t h e d o m a i n i n d i c a ct joi rs va as rpi ae bc il fie ea dnj ∈dc s;o t nh sa tt a i ns ,t t h e “o p t i m a hl ∗ij” e qu a l ∗ijs. Us i n g t h i s d i s t a n c e m e a s u r e , w e a r e c a l i w e i g hj atij 𝜙 s i (S1) r a t h e r t h a n t h e o r i gj aiji .n Sia nl gw he ai gn hd t sM i a n ( 19 9 u s e d th e c a lib ra tio n a p p ro a c h to ta k e a c c o u n t o f d if ∑ ∑ t a n e o u s l y . Fo r e x a m p l e , t h e m a d j∈s d hi∗ijtyij v= Y î GR t y c ca on n bs et r ian i tnr to i=1 ∑ ∗ d u c e d a l o n g w i t h t h e c a l ij∈s b hrij xaj t=i X o i ,ni =c 1,o… n , sm.t rNo a i tne t st h a t , u s i n g t h i s a p p r o a c h , t h e ∗ijc aa rl ei b or ba t ia oi n ewd es iigmh ut sl t a n e o u th e a re a s . Es t i m a t i o n o f t h e M SE o f SSD e s t i m a t o r s r u n s i n t o d i f fi th e s y n th e tic e s tim a to r a n d th e o p tim a l c o m p o s ite e v a ria n c e e s tim a tio n is to u s e th e v a ria n c e e s tim a to r ̂ i )T B, ̂ n a m 𝑣(a e li e) y i, n o p e r a t o r n o t a t i o n , a s a n o v e r e s t Ŷ i + (Xi − X v a r i a n c e o f t h e SSD e s t i m 𝜙ia(S1) t o or r𝜙 u i (S2) s i nags et hi teh w e re i g h t a t t a c h t o t h e d i r e c t e s t i m a t o r ( Sä r n d a l a n d Hi d i r o g l o u 19 8 9 ) e i t(S1)} h e 2 𝑣(a v ai e), r i na no ct ien ag s t h a t t h e w e𝜙ii (S1) g h at s fi x e d a n d e s t i m a t {𝜙 T ̂ t h e v a r i a n c e o f t h e s y nXit Bh ies t si cm c aol m l r pe loa nt ievn et t o t h e v a r i a d ire c t c o m p o n e n t. T h is v a ria n c e e s tim a to r w ill u n d 𝑣(ai e). Re s a m p l i n g m e t h o d s , s u c h a s t h e ja c k k n i f e o r t h e 19 9 2) , c a n a l s o b e r e a d i l y u s e d t o g e t a v a r i a n c e e s t i m v a r i a n c e e s t i m a t o r s i n t h e c o n t e x t o f SSD e s t i m a t o r s h
62
INDIRECT DOM A IN EST IM A T ION
Example 3.3.2. Unemployed Counts. Fa l o r s i , Fa l o r s i , a n d Ru s s o ( 19 9 4 th e p e rfo rm a n c e s o f th e d ire c t e s tim a to r, th e s y n th e w i t𝛿 h= 1, a n d t h e o p t i m a l c o m p o s i t e e s t i m a t o r . T h e y c o i n w h i c h t h e It a l i a n LFS d e s i g n ( s t r a t i fi e d t w o - s t a g e s r o emi g d a t a f r o m t h e 19 8 1 It a l i a n c e n s u 𝜙s∗i .wT ah s e o“ob pt at i nm e ad l ”f w c e n s u s d a t a . In t h i s s t u d y , h e a l t h s e r v i c e a r e a s ( HSA s ) d o m a i n s ) t h a t c u t a c r o s s d e s i g n s t r a t am .=T14h HSA e s t us d y w o f t h e Fr i u l i r e g i o n , a n d t h e s a m p l e d e s i g n w a s b a s e d s a m p l i n g u n i t s ( PSUs ) a n d 2, 29 0 s e c o n d - s t a g e u n i t s ( SS a n d SSU i s a h o u s e h o l d . T h e y,v ias r ti ha eb lne u omf bi ne tre or ef su t n, e m p a h o u s e h o ld . In t h i s s i m u l a t i o n s t u d y , t h e p e r f o r m a n c e o f t h e e s t o f A RB a n d r e l a t i v e r o o t m e a n s qu a r e e r r o r ( RRM SE) . T e s t i m a t o r o Yf i ,t sh aeỸyi ,t oa tr ae l g i v e n b y 1 RB = R
R r=1
(
Ỹ i(r) Yi
) −1 ;
M SE =
1 R
R
(Ỹ i(r) − Yi )2 , r=1
t o ur loa ft e (rd =s1,a … m, R). p le w h eỸ i(r) r ei s t h e v a l u e o f t h eYi ef os rttr ihm s ai m √ No t e t h a t RRM = M SE SE ∕Yi a n d A =RB |RB|. Fa l o r s i e t a l . ( 19 9 4 ) u R = 4 00 s i m u l a t e d s a m p l e s t o c a l c u l a t e A RB a n d RRM SE e s tim a to r. T a b l e 3 . 3 r e p o r t s , f o r e a c h e s t i m amt = o 14 r , aHSA v e rs a og fe v a l RRM d e SE. n o Itt ei ds c l e a r f r o m T a b l A RB , d e A n oRB t e ,d a n d o f RRM SE, d 2.5%), SSD e s t i m a t t h aAt RB v a l u e s o f t h e d i r e c t e s t i m a t o r a n (< w h e re a s th o s e o f th e c o m p o s ite a n d s y n th e tic e s tim a v e r ,SE%, in te rm s th e tic e s tim a to r A h aRB s t% h e( al ba ro gu et s9t %) . Ho w eRRM s y n th e tic a n d c o m p o s ite e s tim a to rs h a v e th e s m a lle v a l u e f o r t h e d i r e c t e s t i m a t o r ) f o l l o w e d b y t h e SSD e s Fa l o r s i e t a l . ( 19 9 4 ) a l s o e x a m i n e d a r e a - s p e c i fi c v a l th e tic a n d c o m p o s ite e s tim a to rs w e re fo u n d to b e b a v a l u e s o f t h e r a t i o ( p o p u l a t i o n o f HSA ) /( p o p u l a t i o n o HSA ) , b u t e x h i b i t e d l o w RRM SE c o m p a r e d t o t h e o t h e
TABLE 3.3 Percent Average Absolute Relative Bias (ARB%)and Percent Average RRMSE (RRMSE%) of Estimators Es t i m a t o r Di r e c t Sy n t h e t i c Co m p o s i t e SSD Source: A d a p t e d f r o m
A RB % 1. 7 5 8 .9 7 6 . 00 2. 3 9
RRM SE% 4 2. 08 23 . 8 0 23 . 5 7 3 1. 08
T a b l e 1 i n Fa l o r s i e t a l . ( 19 9 4 ) .
63
JA M ES–ST EIN M ET HOD
b o t h b i a s a n d e f fi c i e n c y , t h e y c o n c l u d e d t h a t SSD e s o t h e r e s t i m a t o r s . It m a y b e n o t e d t h a t t h e s a m p l i n g le a d in g to la rg e e n o u g h e x p e c te d d o m a in s a m p le s e s tim a to r. 3.4 3.4.1
JAMES–STEIN METHOD Common Weight
A n o th e r a p p ro a c h to c o m p o s ite e s tim a tio n is to u s e ∑ 𝜙i = 𝜙, a n d t h e n m i n i m i z e m t hMe SE tpo(Ŷ tiCa), l wM i tSE, h r e s𝜙p( ePuc rt ct oe l l i=1 a n d Ki s h 19 7 9 ) . T h i s e n s u r e s g o o d o v e r a l l e f fi c i e n c y n o t n e c e s s a rily fo r e a c h o f th e s m a ll a re a s in th e g ro m
m
m
2 ̂ M SE p (YiC ) ≈ 𝜙
2 ̂ M SE p (Yi1 ) + (1 − 𝜙) i=1
i=1
̂ M SE p (Yi2 ).
( 3 . 4 . 1)
i=1
M i n i m i z i n g ( 3 . 4 . 1)𝜙 w g i ivt he sr et hs ep eo cp t t ti om a l w e i g h t ∑m ∗
𝜙 = ∑m
i=1 [M
i=1 M
̂ SE p (Yi2 )
̂ ̂ SE p (Yi1 ) + M SE p (Yi2 )]
.
( 3 . 4 . 2)
Su p p o s e wŶ i1e a tsa tkh ee d i r e c t e x p a nŶ i sa i no Ŷdni2 ae ss t ih me as yt on r t h e t i c a 16 y b) ,e e s t i m a t e d a s e s t i m ŶaiS .t oT r h e n f r o m 𝜙(∗ 3m . 2. ∑m 𝜙̂ ∗ =
̂ ̂ 2 ̂ i=1 [(YiS − Yi ) − 𝑣(Yi )] ∑m ̂ ̂ 2 i=1 (YiS − Yi )
∑m 𝑣(Ŷ i ) = 1 − ∑m i=1 . ̂ ̂ 2 i=1 (YiS − Yi )
( 3 .4 .3 )
∗ i sa tqu T h e e s t i𝜙̂m o ir t e r e l i a b𝜙̂l∗i eg , i uv ne nl i kb ey ( 3 . 3 . 7 ) , b e c a u s e w e o v e r s e v e r a l s m a l l a r e a s . Ho w e v e r , t h e u s e o f a c o m m i f t h e i n d i v i d u aVpl(Ŷvi ),a vr ia ar ny cc eos n, s i d e r a b l y . Ri v e s t ( 19 9 5 ) u s e d w e 𝜙 î ∗g i hn t ts h oe f ct oh ne tf eo xr tmo f a d ju s t m e n l a t i o n u n d e r c o u n t i n t h e 19 9 1 Ca n a d i a n Ce n s u s . He a l s t o t h e t o t a l M SE o f t h e r e s u l t i n g c o m p o s i t e e s t i m a t o r t o t a l M SE. Ov e r a l l p e r f o r m a n c e o f t h e c o m p o s i t e e s e s tim a to rs w a s s tu d ie d b y c o m p a rin g th e ir e s tim a te ∗ i sa s T h e c o m p o s ite e s t𝜙 îm t oi m r bi la as re tdo ot hn e w e l l - k n o w n J w h ic h h a s a ttra c te d a lo t o f a tte n tio n in th e m a in s tre b r i e f a c c o u n t o f t h e JS m e t h o d s i n t h i s s e c t i o n a n d r e f r i s ( 19 7 2a , 19 7 3 , 19 7 5 ) a n d B r a n d w e i n a n d St r a w d e r m t r e a t m e n t . Ef r o n ( 19 7 5 ) g a v e a n e x c e l l e n t e x p o s i t o r w e ll a s e x a m p le s o f th e ir p ra c tic a l a p p lic a tio n , in c lu d ic tin g b a ttin g a v e ra g e s o f b a s e b a ll p la y e rs .
64
INDIRECT DOM A IN EST IM A T ION
Su p p o s e t h e s m a Yl ila ar re e tah m e pe aa rna sm e t e r s𝜃i o= fg(Y i ni ) bt ee r ae s t . Le t s p e c i fi e d t r a n s Yfi ,o wr mh iac t hi oi n do uf c e s n o r m a l i t y o f t h e c o r r ̂ t o r𝜃̂is = g(Y i ) a n d s t a b i l i z e s t h 𝜃̂ei . Fo v a rr ei ax na cmeY ipsi slo eaf , pi fr o p o r t i o n , w c a n u s e a n a r c - s i n e t r a n s f o r m a tg(Y i oi )ni n. So c l um 𝜃i d=eeYoi at nh de r c h o i in d T T ̂ ̂ ̂ ̂ t = (𝜃1 , … , 𝜃m ) a n 𝜽d = (𝜃1 , … , 𝜃m ) . W e a s s𝜃i u∼ m N(𝜃ei , 𝜓i ) 𝜃i = l o Ygi . Le 𝜽 in d e n o t e s “i n de p e n de n t l y di s t r w i t h k n o w n 𝜓vi (ia=r 1, i a…n, m) c ews h e∼r de a s ” a nN(a, d b) de n o t e s a n o r m a l v a r ai a nb dl ev w a r iib. tahnWm c e e faun r t h e r a s s u m e t h a t a p 𝜽r = i o(𝜃r1 , g…u, 𝜃em )sT ,s soa 𝜽fy0 = (𝜃10 , … , 𝜃m0 )T , i s a v a i l a b l e o r c a n b e e v a l u a t e d f r o m t h e 𝜃da vl tee c, dittfoo r ao f i i st al .i nFoe ra er lxya rmep-lpa −1 Z t TsZ)qu a Tr𝜽̂ e=s p r e di c t a u x i l i a r y vzia, rt hi ae bn l ew s e, c a n t a k e t h e l e azTis(Z 0 T T ̂ e (z1 , … , zm ). In t h e a b s e n c e o f s u c h a u x i l i a r y zi 𝜷 LS a s𝜃i , w h eZ r = ∑ ̂ ̂ w e szie=t 1 s o t h𝜃i0a =t m i=1 𝜃i ∕m = 𝜃⋅ f o r ai. lTl h e p e r f o r m a n c e o f a n e ̃y w i l l b e m e a s u r e d i n t e r m s o f i t s t o t a l M SE ( t o t a l o f𝜽, s a 𝜽, b y m
Ep (𝜃̃i − 𝜃i )2 .
̃ = R(𝜽, 𝜽)
( 3 .4 .4 )
i=1
3.4.2
Equal Variances 𝝍i = 𝝍
In t h e s p e c i a l c a s e o f e qu a l𝜓is =a 𝜓, m t ph lei nJSge vs tai rmi𝜃ai ai nst ocg eri vso ,ef n b y [ ] ̂𝜃i,JS = 𝜃 0 + 1 − (m − 2)𝜓 (𝜃̂i − 𝜃 0 ), m ≥ 3 ( 3 .4 .5 ) i i S
∑ r e(𝜃̂i − 𝜃i0 )2 . If 𝜃i0 i s t h e l e a s t a s s u m𝜽0i = n (𝜃 g 10 , … , 𝜃m0 )T i s fi x e d, wS =h e m i=1 s qu a r e s p r e di c t o r , t m h − e 2n b wym e− pr e− p2 il na c( 3e . 4 . 5 )p, i ws thh ee r ne u m b e r o f e s t i m a t e d p a r a m e t e r s i n t h e r e g r e s s i o n e qu a t i b e e x p r e s s e d a s a c o m p o s i t e𝜙̂ JSe = s t1i−m[(ma −t o2)𝜓]∕S r w ai ttht awc he ie gd h t t o 𝜃̂i a n d −1 𝜙̂ JS t o t h e p r i o 𝜃ri0 . gT u he es sJS e s t i m a t o r i s a l s o c a l l e d a e s t i m a t o r b e c a u s e i t s h r i n 𝜃̂ki t so twh ea rdid rt eh ce𝜃i0t.ge us et isms a t o r Ja m e s a n d St e i n ( 19 6 1) e s t a b l i s h e d t h e f o l l o w i n g r e m r i o r i t 𝜽̂yJS o= f(𝜃̂1,JS , … , 𝜃̂m,JS )T o v e𝜽̂ = r (𝜃̂1 , … , 𝜃̂m )T i n t e r m s o f t o t a l M SE: Theorem 3.4.1. Su p p o s e t h e di r e c 𝜃̂ti ae rs et i m sn ide i n adet po erN(𝜃 , 𝜓)n wt i t h k n o w n s a m p l i𝜓,n ag nv dat rh i ea ng𝜃i0cui see fis sx e d. T h e n , ̂ = m𝜓 f o r a𝜽,l lt h a t 𝜽̂ iJSs do ( a )R(𝜽, 𝜽̂ JS ) < R(𝜽, 𝜽) , m i n 𝜽̂a w t e ist h r e s p e c t t o t o t a l M SE. (b R ) (𝜽, 𝜽̂ JS ) = 2𝜓 a t𝜽 = 𝜽0 . (m−2)2 𝜓 2 t hR(𝜽, a t𝜽̂ JS ) i s m i n i m i z e d w h e ( c )R(𝜽, 𝜽̂ JS ) ≤ m𝜓 − ∑m 0 2, s o 𝜽 = 𝜽0 .
(m−2)𝜓+ i=1 (𝜃i −𝜃i )
JA M ES–ST EIN M ET HOD
65
A p r o o f o f T h e o r e m 3 . 4 . 1 i s g i v e n i n Se c t i o n 3 . 5 . It c lwo sh ee tno t thh ee t gr uu ee t h a𝜽̂ JSt l e a ds t o l a r g e r e du c t i o n i n t o t a l 𝜽Mi s SE ̂t = e , 0t, 𝜽 ĥ JSe)∕R(𝜽 r e 0l ,a𝜽) iv e to ta l 𝜽0 a n md i s n o t v e r y s m a l l . Fo r e x a m p l R(𝜽 s oSE n ol yf o n e - fi f t h o f t h e 2∕10 = 0.2 w h em n= 10 s o t h a t t h e t o t 𝜽̂aJSl iM M SE o𝜽̂ w f h 𝜽e = n 𝜽0 . On t h e o t h e r h a n d, t h e r e du c t i o n i n t o t a l M ∑ t h e v a r i a b i l i t y o f t h e 𝜽e irsr loa r rsg i en , gt hu aem ts (𝜃 issi i− ,na𝜃gi0s)2 i n c r e a s e s , i=1 ̂ = m𝜓. R(𝜽, 𝜽̂ JS ) t e n ds R(𝜽, t o 𝜽) W e n o w di s c u s s s e v e r a l s a l i e n t f e a t u r e s o f t h e JS m e
( 1) T h e JS m e t h o d i s a t t r a c t i v e t o u s e r s w a n t i n g g o o g r o u p o f s m a l l a r e a s b e c a u s e l a r g e g a i n s i n e f fi c t r a di t i o n a l de s i g n - b a s e d f r a m e w o r k w i t h o u t a s s u a r e a p a r a𝜃m i. e t e r s ( 2) T h e JS e s t i m a t o r a r i s e s qu i t e n a t u r a l l y i n t h e e m p r e di c t i o n ( EB LUP) a p p r o a c h o r t h e e m p i r i c a l B a y e in d a r a n do m - e f f e c t 𝜃si ∼mN(z o Tide 𝜷, 𝜎l 𝑣2w ); s iet he Se c t i o n 6 . 1. + ( 3 ) A “p l u s - r u l e𝜃̂i,JS ” e i ss t oi mb taat ion r e d f r o m ( 3 . 4 . 5 ) b y c h a n g 1 − (m − 2)𝜓∕S t o 0 w h e Sn 3 ): [ ] (m − 3)𝜈 𝜓̂ ̂ Y i,JS = y + 1 − (yi − y), (𝜈 + 2)S
( 3 .4 .7 )
∑ 2 w h eS r=e m n di r e c t e i=1 (yi − y) . Es t i m a t o r s ( 3 . 4 . 7 ) do m i n yai ti e t e r m s o f t o t a l M SE ( Ra o 19 7 6 ) . T h e {y n ij }o irsmu as le i dt yi n a s s u e s t a b l i s h i n g t h e do m i n a n c e r e s u l t . T h e a s s u m p t i o s i z e s , a n d c o m m o n v a r i a n c e a r e qu i t e r e s t r i c t i v e i ( 6 ) W e h a v e a s s u m e d t h a t 𝜃̂ti ha er e dii rne de c tp ee snt de i mn at ,t ob rus t t h i s n o t h o l d w h e n t h e s m a l l a r e a s c u t a c r o s s t h e de s i g c a s e , w e a s s𝜽̂ ui smm ve at hr iaa tt e nNmo(𝜽, rm 𝚿), aal n, d t h a t a n i n de p e n de e s t i m a𝚿 t oi sr bo af s e d o n aS st ht aa t ti is st idic s t r iW bmu(𝚿,t e𝜈),d a sW i s h a r t di s t r i b u t i o𝜈 df n .wIn i tt hh i s c a s e , t h e JS𝜃iei s tgi m i v ae tno br yo f ( ̂𝜃i,JS = 𝜃 0 + 1 − i
) (m − 2) (𝜈 − m + 3)Q
(𝜃̂i − 𝜃i0 ),
( 3 .4 .8 )
w h eQr=e (𝜽̂ − 𝜽0 )T S−1 (𝜽̂ − 𝜽0 ) ( s e e Ja m e s a n d St e i n 19 6 1 a n d B i l o Sr i v a s t a v a 19 8 8 ) . ( 7 ) T h e JS m e t h o d a s s u m e s n o s p e c i fi𝜃i ’s c , r se ul ac thi o ans s h i p iid a , m} t o e s t i m 𝜃i . aT t he u s , 𝜃i ∼ N(𝜃i0 , 𝜎𝑣2 ), a n d y e t u s e s{𝜃̂a𝓁 ;l𝓁l =da1,t… t h e t o t a l M SE i s r e du c e d e v e n𝜃i rwe hf ee rn t ot h oe bdiv f ifoe ur es ln yt di s p r o b l e m s ( e𝜃i. gr e. ,f es or m t o eb a t t i n g a v e r a g e s o f b a s e b a r e s t t o p e r c a p i t a i n c o m e o f s m a l l a r e a s i n t h e Un t o t a l M SE h a s n o p r a c t i c a l r e l e v a n c e i n s u c h s i t u a ( 8 ) T h e JS m e t h o d m a y p e r f o r m p o o r l y i 𝜃ni we si t hi m a t i n u n u s u a l l y l a r g e o r s𝜃i m l def av ci at ,t fi oom, nr slt ah re g me a x i m u m − 𝜃ai0 . l In M SE f o r a n i n di v i du a l a𝜃ri eo av ep ra ar al lma er et ea rs c a n b e a s l a (m∕4)𝜓, u s i 𝜃n̂i,JSg; f o r e x a m m p = l e16, f toh r i s m a x i m u m M SE i s e e qu a l t o 4 . 4 1 t i m e s𝜃̂i t(hEfe r M o n SE a on fd M o r r i s 19 7 2a ) . T o r e u n de s i r a b l e e f f e c t , Ef r o n a n d M o r r i s ( 19 7 2a ) p r o p e s tim a to r s ,”w h ic h o f f e r a c o m p r o m is e b e tw e e n e s tim a to r. T h e s e e s tim a to rs h a v e b o th g o o d e n s p r o p e r t i e s𝜃̂i,JS , u. nA l i sk t er a i g h t f o r w a r d c o m p r o m i s e e s t i b y r e s t r i c t i n g t h e a m𝜃̂i,JSo diu fnf te br sy f𝜃̂w hmiac hm u l t i p l e o f t h ir too s t a n da r d e 𝜃r̂ir: o r o f
∗ 𝜃̂i,JS
⎧𝜃̂ ⎪ i,JS = ⎨𝜃̂i − c𝜓i1∕2 ⎪𝜃̂ + c𝜓 1∕2 ⎩ i i
1∕2 1∕2 i f 𝜃̂i − c𝜓i ≤ 𝜃̂i,JS ≤ 𝜃̂i + c𝜓i , 1∕2 i f 𝜃̂i,JS < 𝜃̂i − c𝜓i , 1∕2 i f 𝜃̂i,JS > 𝜃̂i + c𝜓i ,
( 3 .4 .9 )
67
JA M ES–ST EIN M ET HOD
w h ec r>e 0 i s a s u i t a b l y c h o s e n c o nc =s 1, t a fn ot r. Te xh ae mc hp ol ei ,c ∗t )M ̂ e n s u r e s t hp (a𝜃̂i,JS SE = 2 M SE < 2𝜓 ( 𝜃 ), w h i l e r e ta in in g m o re th p i ̂ ̂ o f t h e g a𝜽JSi no vo e𝜽 f ri n t e r m s o f t o t a l M SE. W e r e f e r t h e r a n d M o r r i s ( 19 7 2a ) f o r f u r t h e r de t a i l s o n l i m i t e d t r a
Example 3.4.1. Batting Averages. Ef r o n ( 19 7 5 ) g a v e a n a m u s i n g e x t i n g a v e r a g e s o f m a jo r l e a g u e b a s e b a l l p l a y e r s i n t h s u p e r i o r i t y o f JS e s t i m a t o r s o v e r di r e c t e s t i m a t o r s . T f = 18 p l a y e r s a f t e r t h e i r fi r s t 4 5 t i m e s a t b a t du a g eP̂si , , o m T h e s e e s t i m a t e s a r e t a k e n 𝜃̂ai =s P̂ti .h Te hdier eJSc et set si m t i ma t ae tse w s e re ∑ ̂ ̂ P ∕18 = 0.26 5 =∶ P a n 𝜓 d i s c u l a t e d f r o m ( 3 . 4 . 5 ) u s 𝜃ii0n=g m a s p r i o r g u e s s ⋅ i=1 i t a k e nP̂ ⋅ (1 a s− P̂ ⋅ )∕4 5= 0.004 3 , t h e b i n o m i a l v a rP̂ii ai sn ct ree . a No t e tdea ts h a N(Pi , 𝜓). T h e c o m p r o m i s e JS e s t i m a t e s w e r ec =o 1.b Tt aoi n e d f r c o m p a r e t h e a c c u r a c i e s o f t h e e s t i m a t e s i, du t h rei nb ga t t i n g t h e r e m a i n de r o f t h e s e a s o n ( a b o u t 3 7 0 m o r e t i m e s a l u ae ns do f t h e t h e t r u e 𝜃vi =a Pl iu. Te a b l e 3 . 4 a l s o r e p o r t s t h e v aP̂ i,JS t h e c o m p r o m i s eP̂ ∗i,JS JS. e s t i m a t o r B e c a u s e t h e tPri au ree va as lsuu ems e d t o b e k n o w n , w e c a n c o m o v e ra ll a c c u ra c ie s u s in g th e ra tio s m
m
(P̂ i − Pi )2 ∕
R1 = i=1
a n d
(P̂ i,JS − Pi )2 i=1
m
m
(P̂ i − Pi )2 ∕
R2 = i=1
(P̂ ∗i,JS − Pi )2 . i=1
W e gR1e=t 3.5 0, w h i c h m e a n s t h a t t h e JS e s t i m a t e s o u t p e r f b y a f a c t o r o f 3R2. = 5 4.09 0. A , lss oo t, h a t t h e c o m p r o m i s e JS e s t i m e v e n b e t t e r t h a n t h e JS e s t i m a t e s i n t h i s e x a m p l e . It m a JS e s t i m a t o r p r o t e c t P ŝ 1 t=h 0.4 e p00r oo fp po lrat yi oe nr 1 ( Ro b e r t o Cl e m e n 5 .o n p r o p o r tio n o v e r s h r i n k i n g t o w a r d t h eP̂ ⋅ c=o0.26 m m in d W e t r eP̂ ai ∼t eN(P d i , 𝜓) i n c a l c u l a t i n g t h e JS e s t i m a t e s i n T a b in d ns Pui ) m w ei tn th=h 4a 5t . Un de r t h i s a s s u m b e m o r e r e a s o n a b lneP̂ i t∼o Ba is(n, Pf i .oEfn r o n a n d M o r r i s ( 19 7 5 ) u s e d t h e t i o n , t h e v a nrP̂ii adenpc ee n ods a r c - s i n e t r a n s√f o r m a t i o n t o s t a b i l i √ z e th e v a ria n c e o f a n a tr oc (2 sP î ni − 1) = g(P̂ i ) a n 𝜃di = n a r c (2P s i ni − 1) = f o r m a t i o n 𝜃̂li e= a ds in d ̂ g(Pi ), a n d w e h a v e a p 𝜃pi r∼oN(𝜃 x i, 1). m aT t he le y JS e s t i m𝜃i w a t ae s o cf a l c u ∑ î ̂⋅ . eTs hs e r e s u l t i n 𝜃ĝi,JSe s t i m a t 𝜃 ∕18 = 𝜃 l a t e d f r o m ( 3 . 4 . 5 ) u𝜃i0s=i n m g t h e g u i i=1 w e r e r e t r a n s f o r m e d t o g−1 p (r𝜃̂oi,JSv) oi de f t eh se t it m r u ae t pe srPoi . pEfo r rot ni o n s a n d M o r r i s ( 19 7 5 ) c a l c u l a t e d t h e o v e r 𝜃̂ai,JSl lr ae cl ac tui vr ae c t yo o f t h e di r e c t e s 𝜃̂ti iamsR a= t3.5 o r0.s T h e y a l s o 𝜃n̂i,JSo it se cd tl ho as 𝜃eti tr ht oa𝜃̂ni f o r 15 o m u = t o18f b a t t e r s , a n d i s w o r s e o n l y f o r b a t t e r s 1, 10,
68
INDIRECT DOM A IN EST IM A T ION
TABLE 3.4 Pl a y e r 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Batting Averages for 18 Baseball Players Di r e c t Es t i m a t e 0. 4 00 0. 3 7 8 0. 3 5 6 0. 3 3 3 0. 3 11 0. 3 11 0. 28 9 0. 26 7 0. 24 4 0. 24 4 0. 222 0. 222 0. 222 0. 222 0. 222 0. 200 0. 17 8 0. 15 6
Source: A da p t e d f r o m
3.4.3
T ru e V a lu e 0. 3 4 6 0. 29 8 0. 27 6 0. 221 0. 27 3 0. 27 0 0. 26 3 0. 210 0. 26 9 0. 23 0 0. 26 4 0. 25 6 0. 3 04 0. 26 4 0. 226 0. 28 5 0. 3 19 0. 200
JS Es t i m a t e 0. 29 3 0. 28 9 0. 28 4 0. 27 9 0. 27 5 0. 27 5 0. 27 0 0. 26 5 0. 26 1 0. 26 1 0. 25 6 0. 25 6 0. 25 6 0. 25 6 0. 25 6 0. 25 1 0. 24 7 0. 24 2
Co m p r o m JS Es t i m 0. 3 3 4 0. 3 12 0. 29 0 0. 27 9 0. 27 5 0. 27 5 0. 27 0 0. 26 5 0. 26 1 0. 26 1 0. 25 6 0. 25 6 0. 25 6 0. 25 6 0. 25 6 0. 25 1 0. 24 3 0. 221
T a b l e 1 i n Ef r o n ( 19 75 ) .
Estimation of Component MSE
W e n o w t u r n t o t h e e s t i m 𝜃̂i,JS a t if oo nr eo af c M h i, SE a ar seo safu m i n g d ̂𝜃i i∼n N(𝜃 l i nFog r vt ha ri si a pn uc re p o s e i , 𝜓), i = 1, … , m w i t h k n o w n s a m p𝜓. w e e x p𝜃̂i,JS r ea sss ̂ ( 3 . 4 . 10) 𝜃̂i,JS = 𝜃̂i + hi (𝜽) w ith
̂ = (m − 2)𝜓 (𝜃 0 − 𝜃̂i ). hi (𝜽) i S
( 3 . 4 . 11)
Us i n g t h e r e p r e s e n t a t i o n ( 3 . 4 . 10) , w e h a v e 2 ̂ ̂ ̂ M SE p (𝜃i,JS ) = Ep [𝜃i + hi (𝜽) − 𝜃i ]
̂ + Ep [h2 (𝜽)]. ̂ = 𝜓 + 2Ep [(𝜃̂i − 𝜃i )hi (𝜽)] i B y Co r o l l a r y 3 . 5 . 1 t o St e i n ’s l e m m a g i v e n i n Se c t i o n 3 ̂ = 𝜓Ep [𝜕hi (𝜽)∕𝜕 ̂ 𝜃̂i ]. Ep [(𝜃̂i − 𝜃i )hi (𝜽)] T h u s, 2 ̂ ̂ ̂ ̂ M SE p (𝜃i,JS ) = Ep [𝜓 + 2𝜓 𝜕hi (𝜽)∕𝜕 𝜃i + hi (𝜽)].
69
JA M ES–ST EIN M ET HOD
He n c e , a n u n b i a s e d e s t i m 𝜃̂i,JS a t ios r go i fv tehne bMy SE o f ̂ 𝜃̂i + h2 (𝜽). ̂ m sp (e𝜃̂i,JS ) = 𝜓 + 2𝜓 𝜕hi (𝜽)∕𝜕 i In f a c t , ( 3 . 4 . 12) i s t h e m c i e n c y o f t h 𝜽̂e us nt adet ir s nt ioc r v a lu e s . T h e n , a b e tte r If t h e g 𝜃ui0 ei ss sfi x e d, t h e n
( 3 . 4 . 12)
in im u m v a ria n c e -u n b ia s e de s m a lity . T h is e s tim a to r, h o w e v (𝜃̂i,JSa) t=o mr ia[0, e s t i+p m s xgm i vsp (e𝜃êi,JS n )].b y m s e u s i n g ( 3 . 4 . 11) w e h a v e [
̂ 𝜕hi (𝜽) (m − 2)𝜓 =− ̂ S 𝜕 𝜃i
1−
2(𝜃̂i − 𝜃i0 )2
] .
S
( 3 . 4 . 13 )
−1 ∑ If 𝜃i0 i s t a k e n a s t𝜃̂h⋅ =e mm em a n𝜃̂i , t h e n i=1
̂ = hi (𝜽) a n d
(m − 3)𝜓 ̂ (𝜃⋅ − 𝜃̂i ) S
) ( ̂ 𝜕hi (𝜽) 1 (m − 3)𝜓 =− 1− m S 𝜕 𝜃̂i
[
] 2(𝜃̂i − 𝜃̂⋅ )2 1− , S
( 3 . 4 . 14 )
( 3 . 4 . 15 )
T h e de r i v a t i o n s f o r t h e c a s e o f a𝜃i0 l=e zaTi 𝜷̂s LS t , s cqua an r be se p r e o b t a i n e d i n a s i m i l a r m a n n e r . No t e t h a t ( 3 . 4 . 12) i s v a l i in d f u n c thii(𝜽), ô n a s s u m𝜃̂i ∼i nN(𝜃 g i , 𝜓). A l t h o u g h t h e M SE e s t i m a t o r ( 3 . 4 . 12) i s t h e m i n i m u m i t s c o e f fi c i e n t o f v a r i a t i o n ( CV ) c a n b e qu i t e l a r g e , t h M o de l - b a s e d e s t i m a t o r s o f M SE, c o n s i de r e d i n Ch a p t e r b e s o m e w h a t b i a s e d i n t h e de s i g n - b a s e d f r a m e w o r k . B i l o de a u a n d Sr i v a s t a v a ( 19 8 8 ) c o n s i de r e d t h e g e n e g =e (s𝜃̂1,JS t i ,m… ,a𝜃̂m,JS t o )rT g i v e n b y m a t o𝜃̂i ,ri s= 1, … , m, a n d t h e r e s u l t i n𝜽̂ JS ( 3 .4 .8 ) . T h e y o b ta in e dth e m in im u m v a r ia n c e - u n b ia M(𝜽̂ JS ) = Ep (𝜽̂ JS − 𝜽)(𝜽̂ JS − 𝜽)T a s ( ) [ ] ̂ (𝜽JS − 𝜽0 )(𝜽̂ JS − 𝜽0 )T ̂ 𝜽̂ JS ) = 1 − 2𝜅 S + 𝜅 𝜅 + 4 (𝜈 + 1) , ( 3 . 4 . 16 ) M( Q 𝜈 𝜈 (𝜈 − m + 3) Q2
w h e𝜅 r=e (m − 2)∕(𝜈 − m + 3). T hit eh di a g o n a l e l e m e n t o f ( 3 . 4 . 16 ) i s o f M p (SE 𝜃̂i,JS ). Ri v e s t a n d B e l m o n t e ( 2000) g e n e r a l i z e d t h e M SE e s t i m a r=i (𝜓 a i𝓁n ).c In e m p aar tt ri icxu l a r , i f t h e e s t i 𝜽̂ ∼ Nm (𝜽, Ψ) w i t h k n o w n c o v Ψ 𝜓i𝓁n=t 0ww iht hie ≠n 𝓁, t h e n 𝜃̂i a r e i n de p e n de ̂ 𝜃̂i + h2 (𝜽), ̂ m sp (e𝜃̂i,JS ) = 𝜓ii + 2𝜓ii 𝜕hi (𝜽)∕𝜕 i le m e n t o f w h e𝜓rii ei s t hit eh di a g o n a l e Ψ.
( 3 . 4 . 17)
70
INDIRECT DOM A IN EST IM A T ION
̂ h a𝜃̂ti m Ri v e s t a n d B e l m o n t e ( 2000) n o t e𝜕hdi (t𝜽)∕𝜕 t h ae y deb rei ve av t ai vl ue a t e d n u m e r i c hai (⋅) l lhy a ws ∑ nh oe ne x p l i c i t f o r m . Fo r e 𝜃xi = a Y mi p l e , s u ̂ii m Y a n d a r e l i a b l e di r eŶ c=t e m s t o f a t t h o e r p o p u l a t i o n t o t a l is a v a i i=1 ̂Yi = Ni Ŷ i a n d t h e do m Nai i ns ks inz oe w n . T h e n , i t i s de s i r a b l e t o a d ̂ ̂c tFoe sr tei xm a amt op r l e , a s i m p l e m a tŶoi,JSr = Ni Y i,JS t o a dd u p t o t h e di r e Y. a dju s t m e n t o f t h e f o r m ( 3 . 3 . 11) m a y b e u s e d t o e n s u r e ̂t oT h e r a t i o - a dju s t e d JS e its ht i amr ea at Y a dd u p Y. oti ori sot agf litvh ee n b y Ŷ i,JS ̂ Y, Ŷ i,JS (a) = ∑m ̂ i,JS Y i=1
( 3 . 4 . 18 )
̂ ̂ ̂ w h eŶ i,JS r e = Ni [Y i + hi (Y)] = Ŷ i + Ni hi (Y). T h e e s t i m a t o r ( 3 . 4 . 18 ) m a y ̂ w h e re a sŶ i,JS (a) = Ŷ i + h∗i (Y), ( ∑m ) ∑m ̂ ̂ ̂ i=1 Yi i=1 Yi ∗ ̂ hi (Y) = ∑m Ni hi (Y) + ∑m − 1 Ŷ i . ( 3 . 4 . 19 ) Ŷ i,JS Ŷ i,JS i=1
i=1
∑ ̂ = 0, w h i c h e n s u r e s t h a t t h e a dju s t e Y. ̂d JS e s t i m No t e t h m a th∗i (Y) i=1 ∗ ̂ b eŶ i .uAs en det os t ei m Nu m e r i c a l di f f e r e n t i a t i o n m 𝜕hai (yY)∕𝜕 v aal tuo ar t oe f M o cmh a( 3n .g4 he.∗i (⋅). 17) d t ow i t h o fŶ i,JS (a) m a y t h e n b e o b t a i n e d fhri (⋅) 3.4.4
Unequal Variances 𝝍i
W e n o w t u r n t o t h e c a s e o f u n e qu a l b 𝜓ui . t Ak ns ot rwa ing sh at -m p f o r w a r d w a y t o g e n e r a l i z e t h e JS m e t h o d i s t o c o n s i d √ √ in d 𝛿i = 𝜃i ∕ 𝜓i a n d e s t i m 𝛿̂ia=t 𝜃ôi ∕r s𝜓i . T h e n , i t h o 𝛿̂li ds ∼ N(𝛿 t hi , 1) a ta n d √ 𝛿i0 = 𝜃i0 ∕ 𝜓i i s t a k e n a s t h 𝛿ei . gWu e sc sa on f n o w a p p l y t h e JS e s t i m t o t h e t r a n s f o r m e d da t a a n d t h e n t r a n s f o r m b a c k t o t h l e a ds t o ( ) ̂𝜃i,JS = 𝜃 0 + 1 − m − 2 (𝜃̂i − 𝜃 0 ), m ≥ 3, ( 3 . 4 . 20) i i S̃ ∑ 0 2 ̂ w h eS̃ r=e m i=1 (𝜃i − 𝜃i ) ∕𝜓i . T h e e s t i m a t o r ( 3 . 4 . 20) do m i n a t e s t h e ̂𝜃i i n t e r m s o f a w e i g h t e d M∕𝜓SE h owt ei ni gt eh rt m s 1s o f t h e t o t i , bwu it t n th a t is , m m 1 1 ̂ ̂ M SE M SE ( 3 . 4 . 21) p (𝜃i,JS ) < p (𝜃i ). 𝜓 𝜓 i=1 i i=1 i
− (m M o r e o v e r , i t g i v e s t h 𝜙ẽ JSc=o1 m m− o2)∕nS̃ two𝜃̂ei ai ng dh−1t 𝜙̃ JS t o t h e g u e𝜃si0 , s t h a t i s ,𝜃̂iei as cs h r u n k t o w a 𝜃ri0 db t yh et hge u sea sms 𝜙̃eJS rf ea gc at or dr l e s s o f i t s s a m p l𝜓ii .n Tg hv ias r ii sa n co et a p p e a l i n g t o t h e u s e r a t h e c o m p o s i t e e s t i m a t o r 𝜙̂w∗ g i it vh ec no bm y m( 3o . n4 .w3 )e .i On g h et w o u l h a v e m o r e s h𝜃̂ir, it nh ke al ag re g𝜓oei ifrs t. hT eh e m o de l - b a s e d m e t h o ds o 5 –8 p r o v i de s u c h u n e qu a l s h r i n k a g e .
71
PROOFS
3.4.5
Extensions
V a r i o u s e x t e n s i o n s o f t h e JS m e t h o d h a v e b e e n s t u di e t h e do m i n a n c e r e s u l t h o l ds u n de r s p h e r i c a l l y s y m m e c a l f a m i l y o f di s t𝜽̂r (i B b ur at i no dw n s ef ion r a n d St r a w de r m a n 19 9 0, S B i l o de a u 19 8 9 ) a n d f o r t h e e x p o n e n t i a l f a m i l y o f di s 19 8 3 ) . T h e e l l i p t i c a l f a m i l y o f di s t r i b u tt,i oa n ds do i n uc bl ul ede t h e x p o n e n t i a l di s t r i b u t i o n s . Ef r o n a n d M o r r i s ( 19 72b ) e x t e n de d t h e JS m 𝜽 e i toh fqo d t o t h e in d ∼ N(𝜽 , 𝚺) w i t h n e t di f f e r e n t a r e a c h a r a c t e r i s t i c s . 𝜽̂In p a r t i c u l a r k, an𝚺,sos wu m i i a n d de fi n e t h e c o m p o s i t e 𝜽̃r = i s(𝜽̃k1 , … o f, 𝜽̃am )no ef st htqi×emm m a t oa tr r i x o f p a r a m 𝜽e =t (𝜽 e r1 ,s… , 𝜽m ) a s ] [ ̃ = Ep t r(𝜽 − 𝜽) ̃ , ̃ T 𝚺−1 (𝜽 − 𝜽) R(𝜽, 𝜽)
w h e r e t r de n o t e s t h e t r a c e o p e r a t o r . T h e n , t h e e s t i m a [ ] 𝜽̂ i,JS = 𝜽0i + I − (m − q − 1)𝚺S−1 𝜽̂ i , i = 1, … , m ( 3 . 4 . 22)
do m i n a t e t h e di r e 𝜽̂ci ,ti = e 1, s t… im , m, ai tno tr es r m s o f t h e a b o v e c o m p 0 i s a g u e 𝜽 s a s n o d f In ( 3 . 4 𝜽 . 22) , i i S = (𝜽̂ − 𝜽0 )(𝜽̂ − 𝜽0 )T
w i t𝜽̂ h= (𝜽̂ 1 , … , 𝜽̂ m ) a n 𝜽d0 = (𝜽01 , … , 𝜽0m ); n o t e tShi sa aqt × q m a t r i x . In t h e u n i v a r i a t e c aqs=e1,w( 3i t. h4 . 22) r e du c e s t o t h e JS e s t i m a t o r ( 3 . 4 . 5 3.5
PROOFS
T o p r o v e T h e o r e m 3 . 4 . 1 i n s e c t i o n 3 . 4 . 2, w e n e e d t h e St e i n ( 19 8 1) ( s e e a l s o B r a n dw e i n a n d St r a w de r m a n 19 9 ′
Lemma 3.5.1. Le tZ ∼ N(𝜇, 1). T h eE[h(Z)(Z n , − 𝜇)] = E[h (Z)], p r o v i de d t h e e x p e c ta tio n s e x is t a n d ] [ 1 ( 3 . 5 . 1) lim h(z) e x p− (z − 𝜇)2 = 0, z→±∞ 2 ′
w h eh r(Z) e = 𝜕h(Z)∕𝜕Z. Proof: ∞ } { 1 1 h(z)(z − 𝜇) e x p− (z − 𝜇)2 dz E[h(Z)(Z − 𝜇)] = √ 2 2𝜋 ∫−∞ ∞ [ { }] 1 1 d e x p− (z − 𝜇)2 dz. = −√ h(z) dz 2 2𝜋 ∫−∞
72
INDIRECT DOM A IN EST IM A T ION
In t e g r a t i o n b y p a r t s a n d ( 3 . 5 . 1) y i e l d ]|∞ [ 1 1 E[h(Z)(Z − 𝜇)] = − √ h(z) e x p− (z − 𝜇)2 || 2 |−∞ 2𝜋 ∞ [ ]2 ′ 1 1 +√ h (z) e x p− (z − 𝜇) dz 2 2𝜋 ∫−∞ ′
= E[h (Z)]. ◾ in d
Corollary 3.5.1. Le tZ = (Z1 , … , Zm )T w i tZhi ∼ N(𝜇i , 1), i = 1, … , m. If c o n di t i o n ( 3 . 5 . 1) h o l dsE[h(Z)(Z , t h ei − n 𝜇i )] = E[𝜕h(Z)∕𝜕Zi ], f o r a r e a l - v a l u e d f u n c h(⋅). T o e s ta b lis h T h e o re m in d t h e c a n o n iZci ∼a N(𝜇 l c i ,a1)s e
3 . 4 . 1, i t i s s u f fi c i e n t t o p r o v e
in d
Theorem 3.5.1. Le tZ = (Z1 , … , Zm )T w i tZhi ∼ N(𝜇i , 1), i = 1, … , m, f o mr ≥ 3 , a n d ( ) a 𝝁̂ JS (a) = 1 − Z, 0 < a < 2(m − 2), ‖Z‖2 w h e‖Z‖ r e2 = ZT Z. T h e n ,
( a )̂𝝁JS (a) do m i n Za ft oe rs op)e. fFu fi rc ti he en rt m s 𝑣( io’s r ae r, et ha er e a - s p e c i fi c r a n e f f e c t s a s s u m e d t o b e i n de p e n de n t a n d i de n t i c a l l y di s t r Em (𝑣i ) = 0,
Vm (𝑣i ) = 𝜎𝑣2 (≥ 0),
( 4. 2. 2)
w h eEmr ede n o t e s t h e m o de l e Vxmpt he ec tma t oi oden l av na dr i a n c e . W e de n iid a s s u m p t𝑣ii o∼ n(0, a𝜎𝑣2s). No r m a l it y o f t h e r a𝑣inisdoa ml s oe fof fe t ce tns u s e d, b u t it is p o s s ib l e t o m a k e “r o b u s t ” in f e r e n c e s b y r e l a x 2 ish ae m ( Ch a p t e r 6, Se c t io n 6. 3 ) .𝜎𝑣T p ae raa smu re et eo rf h o m o g e n e it y o a f t e r a c c o u n t in g f ozir. t h e c o v a r ia t e s In s o m e a p p l ic a t io n s , n o t a l l a r e a s a r e s e l e c t e d in t h e Su p p o s e t h a t Mw a er eha as v in e t h e p o p u l amt io a rne aa sn da or en ls ye l e c t e in t h e s a m p l e . W e a s s u m e a m o de l o f t h e f o r m ( 4. 2. 1) 𝜃i = zTi 𝜷 + bi 𝑣i , i = 1, … , M. W e f u r t h e r a s s u m e t h a t t h e s a m p l p o p u l a t io n m o de l , t h a t is , t h e b ia s in t h e s a m p l e s e l e c ( 4. 2. 1) h o l ds f o r t h e s a m p l e d a r e a s . Fo r m a k in g in f e r e n c e s a b o u t Y t hi ue n sde mram l l oa de r e l a ( 4. m 2.e1)a ,n ̂ w e a s s u m e t h a t dir eY icat r ee s at im v a ail ta ob r lse . As in t h e Ja m e s –St e ( Ch a p t e r 3 , Se c t io n 3 . 4) , w e a s s u m e t h a t ̂ 𝜃̂i = g(Y i ) =𝜃i + ei ,
i = 1, … , m,
w h e r e t h e s a m epi al rine gin eder rpo er ns de n t w it h
( 4. 2. 3 )
77
BASIC AREA LEVEL MODEL
Ep (ei |𝜃i ) =0,
Vp (ei |𝜃i ) =𝜓i .
( 4. 2. 4)
. Tg hv ea ar ia b It is a l s o c u s t o m a r y t o a s s u m e t h 𝜓ai , ta t rhe e ks na om wp nl in a s s u m p t io n s m a y b e qu it e r e s t r ic t iv e in s o m e a p p l ic a s eisd af on r o n l in e a r f u n c t io n a n d t e s t im 𝜃âi tmo ra y b e de s ig n -𝜃ibifiag(⋅) s m a l l . T h e s a m p l in g e r r o r s m a y n o t b e in de p s a m p l ne i is s ize c u t a c r o s s t h e s t r a t a o r c l u s t e r s o f t h e s a m p l in g de s ig c an nc eb se , r e l a x e d b 𝜓yi fer so t m im tah t ein ug n it l e v e l s a m s a m p l in g v 𝜓 a i ,r ia e t aa t m s tna cb𝜓eil. es e s t im a da t a a n d t h e n s m o o t h in g t h 𝜓 êi et os tgim e d ov ra er ia T h is s m o o t h in g is c a l l e d t h e g e n e r a l ize d v a r ia n c e f u n c 2007) . No r m a l it y o f t h𝜃̂ie ise as lt simo ao t fot er n a s s u m e d, b u t t h is m r e s t r ic t iv e a s t h e n o r m a l it y o f t h e r a n do m e f f e c t s , du e e f f e c 𝜃̂ti . o n e los bo t na in e d b𝜎𝑣2y= s0,e tt ht in a tg𝜃iis=, zTi 𝜷. Su c h De t e r m in is t ic m𝜃i ao rde m o de l s l e a d t o s y n t h e t ic e s t im a t o r s t h a t do n o t a c c o u n t h e v a r ia t io n r e fle c t e d in t h ezi .a u x il ia r y v a r ia b l e s Co m b in in g ( 4. 2. 1) w it h ( 4. 2. 3 ) , w e o b t a in t h e m o de l 𝜃̂i = zTi 𝜷 + bi 𝑣i + ei ,
i = 1, … , m.
( 4. 2. 5)
No t e t h a t ( 4. 2. 5) in v o l v e s de s eigi an s- in w du e l cl ea ds emr r oo𝑣de l e er r o r s ir. sW a s s u m e𝑣i ta hn aedit a r e in de p e n de n t . Mo de l ( 4. 2. 5) is a s p e c ia l c m ix e d m o de l ( Ch a p t e r 5) . T h e a s s u Emp (epi |𝜃ti )io=0 n in t h e s a m p l in g m o de l ( 4. 2. 3 ) m a y n hiteh a r e a is s m𝜃i ais l al an no dn l in e a r f u n c t io n o f t t h e s a m p nli ein s t ize ̂ti ise sdet im s ig ant o- ur n b ia s e d. T h e n , a m o r e r e a Yi , e v e n if t h e dir e cY m o de l is g iv e n b y ̂i = Yi + 𝜀i , i = 1, … , m, ( 4. 2. 6) Y
̂iisis, de s ig n - u n b ia s e dYfi . oInr t h is e tcoa t sa el , t h e s a m w itEhp (𝜀i |Yi ) =0, t h a tY p l in g a n d l in k in g m o de l s a r e n o t m a t c h e d. As a r e s u l t , w t h e l in k in g m o de l ( 4. 2. 1) t o p r o du c e a s in g l e l in e a r m ix e T h e r e f o r e , s t a n da r d r e s u l t s in l in e a r m ix e d m o de l t h e o n o t a p p l y . In Se c t io n 10. 4, w e u s e a HB a p p r o a c h t o h a n dl l in k in g m o de l s .
Example 4.2.1. Income for Small Places. In t h e c o n t e x t o f e s t im a t in g p in c o m e ( PCI) f o r s m a l l p l a c e s in t h e Un it e d St a t e s w it h p In 2. t5)h ewir it ah p p 𝜃l iic= a t io n , Fa y a n d He r r io t ( 1979) u s e d m o bde i =l 1.( 4. l o (Y g i ), w h eY irise t h e PCI in it thh ae r e a . Mo de l ( 4. 2. 5) is c a l l e d t h e Fa y ( FH) m o de l in t h e s m a l l a r e a l it e r a t u r e b e c a u s e t h e y w e f o r s m a l l a r e a e s t im a t io n . So m e de t a il s o f t h is a p p l ic a t i
Example 4.2.2. US Poverty Counts. T h e b a s ic a r e a l e v e l m o de l ( 4. 2. 5 u s e d in t h e s m a l l a r e a in c o m e a n d p o v e r t y e s t im a t io
78
SMALL AREA MODELS
Un it e d St a t e s t o p r o du c e m o de l - b a s e d c o u n t y e s t im a t e s ( Na t io n a l Re s e a r c h Co u n c il 2000) . Us in g t h e s e e s t im a t e s Edu c a t io n a l l o c a t e s a n n u a l l y s e v e r a l b il l io n do l l a r s o a n d t h e n s t a t e s dis t r ib u t e t h e s e f u n ds a m o n g s c h o o l dis a l l o c a t e d o n t h e b a s is o f e s t im a t e d c o u n t s f r o m t h e p r c o u n t s h a v e c h a n g e d s ig n ifi c a n t l y o v e r t im e . g ni ), , w h Ye i rise t h e t r u e p o v e r t ity h c co ou un nt tiny t h e In t h is a p p l 𝜃ici =a l toio(Y ̂ieoc fYt i ew s at im s oa bt ot ar in e d f r o m t h e c u r r e n ( s m a l l a r e a ) a n d a dirY e r teo or bv taa r inia eb dl fe rso, m c e n s t io n s u r v e y ( CPS) . Ar e a l e v e l zpi , rwe dic a dm in is t r a t iv e r e c o r ds . Cu r r e n t de t a il s o f t h is a p p l ic a t io n
Example 4.2.3. Census Undercount. In t h e c o n t e x t o f e s t im a t in g t h e in t h e de c e n n ia l c e n s u s o f t h e Un it e d St a t e s , Er ic k s e n a n d ( 4. 2. 5) w biit=h 1 a n d t r e a𝜎𝑣2t in a sg k n o w n . In t h e ir𝜃i a= p T(pi −l C ici )a∕Tti io n , is t h e c e n s u s u n de itr ch os ut an t te f (oa r r tehTaieis ) , twh eh et rrue e ( u n k n o w n ) c a n C d i is t h e c e n s u s c oit hu na tr ein a t𝜃̂ , ihaisen tdh e dir e c t e s t𝜃im e o f i f r ao tm a √ p o s t - e n u m e r a t io n s u r v e y ( PES) . Cr e s s ie ( 1989) bi =a l s o u s e djuTi ∕C s ti ,mu es in n tgf at hc et oPES r s e s t im a t e 1∕ Ci t o e s t im a t e t h e c e n s u s 𝜃ai = 𝜃̂i , o 𝜃fi . In t h e s e a p p l ic a t io n s , 𝜃̂t ihc eo PES u l debs et im s e ar io t e us s l y b ia s e d, a b y Fr e e dm a n a n d Na v idi ( 1986) . In t h e c o n t e x t o f t h e Ca n a dia n Ce n s u s , Dic k ( 1995) u s e r e n o t e s p r×oa vg × in e s ce ex c o m b in a t io n . He u 𝜃i = Ti ∕Ci a n bdi = 1, w h ei de y ap sl sinu𝜓gimt voinabrgeia pn rcoe ps o r t io n a s m o o t h e d e s t im a t e s o f t h e𝜓i ,s ba m t o s o m e p o w e r o f t h e Ct ir. uSoe mc ee ndes ut as ilcs oo uf nt ht is a p p l ic a t io n in Ex a m p l e 6. 1. 3 .
4.3 W p o p o to
BASIC UNIT LEVEL MODEL e p p b
a s s u m e t h a t u n it - s p xeij c= ifix(ij1c, … a ,uxijpx)Tilaiar re ya da v at ail a b l e f o r e a u l a t io n j in e l e am c eh ns tm i.a Itl lisa or ef at e n s u f fi c ie n t t o a s s u m e t u l a t io X n i am r e ak nn so w n . Fu r t h e r m o r e , t yhij ,e isv aa sr siau bml ee od f in e r e l axijt tehd rt o u g h t h e b a s ic n e s t e d e r r o r l in e a r r e g r e s yij = xTij 𝜷 + 𝑣i + eij ,
j = 1, … , Ni , i = 1, … m.
( 4. 3 . 1)
He r e t h e a r e a - s p e𝑣i ca ifi c t se d t o b e iid r a n do m v a r ia b l r ec ae sf sf ue m e t as r e iid r a n do m v a r ia b l e s ( 4. 2. 2)eij, = kij ẽ ij f o r k n o w n ckijoa nn sd tt aeh̃ ijn’s p e n de n t 𝑣oi ’sf ,t w h eit h Em (̃eij ) =0,
Vm (̃eij ) =𝜎e2 .
( 4. 3 . 2)
In a ddit io n , n o r m 𝑣ai ’sl itayneoijd’sf is t h oe f t e n a s s u m e d. T h e p a r a m e t a r e t h e s m a l l aY iroe ra t m sa la sn da r d r e g r e s s io n m o de l s a r e h ee taYoin. t St
79
BASIC UNIT LEVEL MODEL
by setting 𝜎v² = 0 or, equivalently, 𝑣i = 0 in (4.3.1). Such models lead to synthetic estimators (Chapter 3, Section 3.2.3).

We assume that a sample of size ni is taken from the Ni units in the ith area (i = 1, … , m), and that the sample values also obey the assumed model (4.3.1). The latter assumption is satisfied under simple random sampling from each area or, more generally, for sampling designs that use the auxiliary information xij in the selection of the samples si. To see this, we write (4.3.1) in matrix form as

yPi = XPi𝜷 + 𝑣i 1Pi + ePi,   i = 1, … , m,   (4.3.3)

where XPi is an Ni × p matrix, yPi and ePi are Ni × 1 vectors, and 1Pi is the Ni × 1 vector of ones. We next partition (4.3.3) into sampled and nonsampled parts:

yPi = [yi ; yir] = [Xi ; Xir]𝜷 + 𝑣i [1i ; 1ir] + [ei ; eir],   (4.3.4)

where the subscript r denotes the nonsampled units. If the model (4.3.3) holds for the sample, that is, if sample selection bias is absent, then inference on 𝛌 = (𝜷T, 𝜎v², 𝜎e²)T is based on

f(yi | XPi, 𝛌) = ∫ f(yi, yir | XPi, 𝛌) dyir,   i = 1, … , m,   (4.3.5)

where f(yi, yir | XPi, 𝛌) is the assumed joint distribution of yi and yir. On the other hand, defining the vector ai = (ai1, … , aiNi)T of sample indicators, with aij = 1 if j ∈ si and aij = 0 otherwise, the joint distribution of the sample data (yi, ai) is given by

f(yi, ai | XPi, 𝛌) = ∫ f(yi, yir | XPi, 𝛌) f(ai | yi, yir, XPi) dyir = [∫ f(yi, yir | XPi, 𝛌) dyir] f(ai | XPi),

provided f(ai | yi, yir, XPi) = f(ai | XPi), that is, the sample selection probabilities do not depend on yPi but may depend on XPi. In this case, selection bias is absent and we may assume that the population model holds for the sample, so that f(yi | XPi, 𝛌) is used for inference on 𝛌 (Smith 1983). If the sample selection probabilities depend on a variable zPi that is not included in XPi, then the distribution of the sample data (yi, ai) is

f(yi, ai | XPi, zPi, 𝛌) = [∫ f(yi, yir | XPi, zPi, 𝛌) dyir] f(ai | zPi, XPi).

In this case, inference on 𝛌 is based on f(yi | XPi, zPi, 𝛌), which is different from (4.3.5) unless zPi is unrelated to yPi given XPi. Therefore, we have sample selection bias and we
cannot assume that the model (4.3.3) holds for the sample values. We could extend the model (4.3.3) by including zPi and then test for the significance of the associated regression coefficient using the sample data. If the null hypothesis is not rejected, then we could assume that the original model (4.3.3) also holds for the sample (Skinner 1994). The model (4.3.3) is also not appropriate under two-stage cluster sampling within small areas because random cluster effects are not incorporated in the model to account for such design features (Section 4.5.2).

We write the small area mean Ȳi as

Ȳi = fi ȳi + (1 − fi) ȳir,   (4.3.6)

with fi = ni∕Ni, where ȳi and ȳir denote, respectively, the means of the sampled and nonsampled elements. It follows from (4.3.6) that estimating Ȳi is equivalent to estimating the realization of the random variable ȳir, given the sample data {yi} and the auxiliary data {XPi}. If the population size Ni is large, then the small area mean Ȳi is approximately equal to

𝜇i = X̄iT𝜷 + 𝑣i,   (4.3.7)

noting that Ȳi = X̄iT𝜷 + 𝑣i + Ēi and Ēi ≈ 0, where Ēi is the mean of the Ni errors eij and X̄i is the known mean of XPi. It follows that the estimation of Ȳi is approximately equivalent to the estimation of a linear combination of 𝜷 and the realization of the random variable 𝑣i. We now give some examples of model (4.3.1). Some details are given in Chapter 7, Section 7.3.
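To make the structure of (4.3.1) concrete, the following sketch simulates data from the nested error model with kij = 1 and fits it by REML using the statsmodels package; the variable names and parameter values are illustrative only and are not taken from any application in this chapter.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
m, n_i = 30, 20                      # number of small areas, sample size per area
beta = np.array([1.0, 2.0])          # illustrative regression coefficients
sigma_v, sigma_e = 0.5, 1.0          # illustrative standard deviations of v_i and e_ij

area = np.repeat(np.arange(m), n_i)
x = rng.uniform(0, 10, size=m * n_i)
v = rng.normal(0, sigma_v, size=m)   # area effects v_i
e = rng.normal(0, sigma_e, size=m * n_i)
y = beta[0] + beta[1] * x + v[area] + e   # model (4.3.1) with k_ij = 1

data = pd.DataFrame({"y": y, "x": x, "area": area})
# A random intercept per area corresponds to v_i; REML estimates beta, sigma_v^2, sigma_e^2.
fit = smf.mixedlm("y ~ x", data, groups=data["area"]).fit(reml=True)
print(fit.params)      # fixed effects (beta estimates)
print(fit.cov_re)      # estimated sigma_v^2
print(fit.scale)       # estimated sigma_e^2
```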
Example 4.3.1. County Crop Areas. Battese, Harter, and Fuller (1988) used the nested error regression model (4.3.1) to estimate county crop areas using sample survey data in conjunction with satellite information. In particular, they were interested in estimating the area under corn and soybeans in m = 12 counties in North-Central Iowa, using farm-interview data as {yi} and LANDSAT satellite data as {XPi}. Each county was divided into area segments, and the areas under corn and soybeans were ascertained for a sample of segments. The number of sampled segments in a county, ni, ranged from 1 to 6. Auxiliary data in the form of the number of pixels (a term used for "picture elements") classified as corn and soybeans were also obtained for all the area segments, including the sampled segments, in each county using the LANDSAT readings. Battese et al. (1988) proposed the model

yij = 𝛽0 + 𝛽1 xij1 + 𝛽2 xij2 + 𝑣i + ẽij,   (4.3.8)

which is a special case of model (4.3.1) with kij = 1, xij = (1, xij1, xij2)T, and 𝜷 = (𝛽0, 𝛽1, 𝛽2)T. Here yij = number of hectares of corn (or soybeans), xij1 = number
of pixels classified as corn, and xij2 = number of pixels classified as soybeans in the jth area segment of the ith county. Some details of this application are given in Example 7.3.1.
Example 4.3.2. Wages and Salaries. Rao and Choudhry (1995) studied a population of unincorporated tax filers from the province of Nova Scotia (see Example 2.4.1). They proposed the model

yij = 𝛽0 + 𝛽1 xij + 𝑣i + xij^(1∕2) ẽij,   (4.3.9)

which is a special case of (4.3.1) with kij = xij^(1∕2). Here yij and xij denote, respectively, the total wages and salaries and the gross business income of the jth firm in the ith area. Simple random sampling from the overall population was used to estimate the area totals Yi or the means Ȳi. Some details of a simulation study based on this population are given in Example 7.3.2.
4.4 EXTENSIONS: AREA LEVEL MODELS
We now consider various extensions of the basic area level model (4.2.5).

4.4.1 Multivariate Fay–Herriot Model
Suppose we want to estimate an r × 1 vector of area characteristics 𝜽i = (𝜃i1, … , 𝜃ir)T, where 𝜃ij = gj(Ȳij) and Ȳij is the ith small area mean for the jth characteristic, j = 1, … , r, and 𝜽̂i = (𝜃̂i1, … , 𝜃̂ir)T is the vector of survey estimators of 𝜽i. We consider the multivariate sampling model

𝜽̂i = 𝜽i + ei,   i = 1, … , m,   (4.4.1)

where the sampling errors ei = (ei1, … , eir)T are independent r-variate normal, Nr(𝟎, 𝚿i), with mean 𝟎 and known covariance matrix 𝚿i, conditional on 𝜽i; here 𝟎 is the r × 1 null vector. We further assume that 𝜽i is related to area-specific auxiliary data {zij} through the linear model

𝜽i = Zi𝜷 + vi,   i = 1, … , m,   (4.4.2)

where the area-specific random effects vi are independent Nr(𝟎, 𝚺v), Zi is an r × rp matrix with jth row given by (𝟎T, … , 𝟎T, zijT, 𝟎T, … , 𝟎T), and 𝜷 is the rp × 1 vector of regression coefficients. Here 𝟎 denotes the p × 1 null vector, and zij occurs in the jth block of the jth row vector. Combining (4.4.1) with (4.4.2), we obtain a multivariate mixed linear model

𝜽̂i = Zi𝜷 + vi + ei.   (4.4.3)
The model (4.4.3) is a natural extension of the FH model (4.2.5) with bi = 1. Fay (1987) and Datta, Fay, and Ghosh (1991) proposed the multivariate FH model and demonstrated that it can lead to more efficient estimators of the means Ȳij because it takes advantage of the correlations between the components of 𝜽̂i, unlike the univariate model (4.2.5).
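As a small illustration of the block structure in (4.4.2), the sketch below builds Zi from area-specific covariate vectors zij and simulates one draw of 𝜽̂i from (4.4.3); the dimensions and parameter values are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
r, p = 2, 3                       # number of characteristics, covariates per characteristic
beta = rng.normal(size=r * p)     # illustrative rp x 1 coefficient vector
Sigma_v = 0.2 * np.eye(r)         # illustrative covariance of the model errors v_i
Psi_i = 0.5 * np.eye(r)           # illustrative sampling covariance matrix (known in practice)

z_i = rng.uniform(size=(r, p))    # z_ij, j = 1, ..., r, stored as rows

# Z_i is r x rp with z_ij placed in the j-th block of the j-th row, zeros elsewhere.
Z_i = np.zeros((r, r * p))
for j in range(r):
    Z_i[j, j * p:(j + 1) * p] = z_i[j]

v_i = rng.multivariate_normal(np.zeros(r), Sigma_v)
e_i = rng.multivariate_normal(np.zeros(r), Psi_i)
theta_i = Z_i @ beta + v_i        # linking model (4.4.2)
theta_hat_i = theta_i + e_i       # sampling model (4.4.1), i.e., combined model (4.4.3)
print(Z_i, theta_hat_i)
```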
Example 4.4.1. Median Income. Datta, Fay, and Ghosh (1991) applied the multivariate model (4.4.3) to estimate the current median income of four-person families in each of the American states (small areas). These estimates are used to determine the eligibility for a program of energy assistance to low-income families administered by the U.S. Department of Health and Human Services. In this application, 𝜽i = (𝜃i1, 𝜃i2)T with 𝜃i1 = population median income of four-person families in state i and 𝜃i2 = (3∕4)(population median income of five-person families in state i) + (1∕4)(population median income of three-person families in state i). Direct estimates 𝜽̂i and the associated covariance matrices 𝚿̂i were obtained from the CPS, and 𝚿i was treated as known by letting 𝚿i = 𝚿̂i, ignoring the variability of the estimate 𝚿̂i. Here 𝜃i2 is not a parameter of interest, but the direct estimate 𝜃̂i2 is strongly related to 𝜃̂i1. By taking advantage of this association, an improved estimator of the parameter of interest 𝜃i1 can be obtained through the multivariate model. The auxiliary data {zij} were based on census data: zij = (1, zij1, zij2)T, j = 1, 2, where zi11 and zi12 denote, respectively, the adjusted census median income for the current year and the base-year census median income of four-person families in the ith state, and zi21 and zi22 denote, respectively, the weighted average (with weights 3∕4 and 1∕4) of adjusted census median incomes for five-person and three-person families for the current year and the corresponding weighted average (with the same weights) of base-year census medians in the ith state. The adjusted census median incomes for the current year were obtained by adjusting the census median incomes by the proportional growth in per capita income (PCI) produced by the Bureau of Economic Analysis of the U.S. Department of Commerce. Some details of this application are given in later examples.

4.4.2 Model with Correlated Sampling Errors
A natural extension of the basic FH model with independent sampling errors ei is to consider correlated sampling errors. Define 𝜽̂ = (𝜃̂1, … , 𝜃̂m)T, 𝜽 = (𝜃1, … , 𝜃m)T, and e = (e1, … , em)T, and assume that

𝜽̂ = 𝜽 + e,   (4.4.4)

with e | 𝜽 ∼ Nm(𝟎, 𝚿), where the sampling error covariance matrix 𝚿 = (𝜓i𝓁) is known. Combining (4.4.4) with the linking model (4.2.1) for the 𝜃i's, we obtain a generalization of the basic FH model (4.2.5). With bi = 1, i = 1, … , m, in (4.2.1), the combined model may be written as

𝜽̂ = Z𝜷 + v + e,   (4.4.5)
where v = (𝑣1, … , 𝑣m)T and Z is an m × p matrix with ith row equal to ziT. In practice, 𝚿 is replaced by a survey estimator or a smoothed estimator, but the variability associated with the estimator is often ignored.
Example 4.4.2. US Census Undercoverage. In the context of estimating the undercount in the 1990 Census of the United States, the population was divided into m = 357 poststrata composed of 51 poststratum groups, each of which was divided into 7 age–sex categories. The 51 poststratum groups were defined on the basis of race/ethnicity, tenure (owner, renter), type of area, and region. Direct estimates 𝜃̂i for each poststratum i = 1, … , 357 were obtained using the data from the PES. We refer the reader to Section 3.4, Chapter 3 of Rao (2003a) for details of the dual-system estimation. Here, 𝜃̂i is the estimated census adjustment factor for the ith poststratum. A model of the form (4.4.5) was employed to obtain smoothed estimates of the adjustment factors 𝜃i (Isaki, Tsay, and Fuller 2000). In a previous study (Isaki, Hwang, and Tsay 1991), 1392 poststrata were employed, and smoothed estimates of adjustment factors were obtained. Some details of this application are given in Example 8.2.1.

4.4.3 Time Series and Cross-Sectional Models
Many sample surveys are repeated in time with partial replacement of the sample elements. For example, in the monthly US CPS, an individual household remains in the sample for four consecutive months, then drops out of the sample for the eight succeeding months, and then comes back for another four consecutive months. In the monthly Canadian labour force survey (LFS), an individual household remains in the sample for six consecutive months and then drops out. For such repeated surveys, considerable gain in efficiency can be achieved by borrowing strength across both areas and time.

Rao and Yu (1992, 1994) proposed an extension of the basic FH model to handle time series and cross-sectional data. Their model consists of a sampling model

𝜃̂it = 𝜃it + eit,   t = 1, … , T,  i = 1, … , m,   (4.4.6)

and a linking model

𝜃it = zitT𝜷 + 𝑣i + uit.   (4.4.7)

Here 𝜃it = g(Ȳit) is a function of the small area mean Ȳit, 𝜃̂it is the direct survey estimator for small area i at time t, the sampling errors eit are normally distributed, given the 𝜃it's, with zero means and a known block diagonal covariance matrix 𝚿 with blocks 𝚿i, and zit is a vector of area-specific covariates, some of which may change with t, for example, administrative data. Furthermore, the 𝑣i are iid N(0, 𝜎v²) and the uit's are assumed to follow a common first-order autoregressive, AR(1), process for each area i, that is,

uit = 𝜌ui,t−1 + 𝜀it,   |𝜌| < 1,   (4.4.8)
with 𝜀it iid N(0, 𝜎²). The errors {eit}, {𝑣i}, and {𝜀it} are also assumed to be independent of each other. Models of the form (4.4.7) and (4.4.8) have been extensively used in the econometrics literature (Anderson and Hsiao 1981), but without the sampling errors eit. The model (4.4.7) depends on both the area-specific effects 𝑣i and the area-by-time effects uit that are correlated across time for each area i. We can also express (4.4.7) and (4.4.8) as a distributed-lag model

𝜃it = 𝜌𝜃i,t−1 + (zit − 𝜌zi,t−1)T𝜷 + (1 − 𝜌)𝑣i + 𝜀it.   (4.4.9)
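The following sketch simulates area-by-time effects from the AR(1) process (4.4.8) and direct estimates from (4.4.6)–(4.4.7); it is only a hedged illustration of the model structure, with all parameter values chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(1)
m, T = 20, 8                       # areas and time points (illustrative)
rho, sigma2 = 0.6, 0.25            # AR(1) parameter and innovation variance
sigma_v2, psi = 0.5, 0.4           # area-effect variance and sampling variance

beta = np.array([1.0, 0.5])
z = rng.uniform(size=(m, T, 2))    # area- and time-specific covariates z_it

v = rng.normal(0, np.sqrt(sigma_v2), size=m)                      # v_i
u = np.zeros((m, T))
u[:, 0] = rng.normal(0, np.sqrt(sigma2 / (1 - rho**2)), size=m)   # stationary start
for t in range(1, T):
    u[:, t] = rho * u[:, t - 1] + rng.normal(0, np.sqrt(sigma2), size=m)  # (4.4.8)

theta = z @ beta + v[:, None] + u                                 # linking model (4.4.7)
theta_hat = theta + rng.normal(0, np.sqrt(psi), size=(m, T))      # sampling model (4.4.6)
```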
The alternative form (4.4.9) relates 𝜃it to the previous period value 𝜃i,t−1, the change in the values of the auxiliary variables between times t and t − 1, the random area effect 𝑣i, and the area-by-time effect 𝜀it. More complex models on the uit's than (4.4.8) can be formulated by assuming an autoregressive moving average (ARMA) process, but the resulting efficiency gains relative to (4.4.8) are unlikely to be significant in the small area estimation context.

Ghosh, Nangia, and Kim (1996) proposed a different time series cross-sectional model for small area estimation given by

𝜃̂it | 𝜃it ∼ind N(𝜃it, 𝜓it),   (4.4.10)

𝜃it | 𝜶t ∼ind N(zitT𝜷 + witT𝜶t, 𝜎t²),   (4.4.11)

and

𝜶t | 𝜶t−1 ∼ind Nr(Ht𝜶t−1, 𝚫).   (4.4.12)
Here zit and wit are vectors of area-specific covariates, the sampling variances 𝜓it are assumed to be known, 𝜶t is an r × 1 vector of time-specific random effects, and Ht is a known r × r matrix. The dynamic (or state-space) model (4.4.12) in the scalar case (r = 1) with Ht = 1 reduces to the well-known random walk model. This model suffers from two major limitations: (i) The direct estimators 𝜃̂it are assumed to be independent over time for each area i. This assumption is not realistic in the context of repeated surveys with overlapping samples, such as the CPS and the LFS. (ii) Area-specific random effects are not included in the model, which may lead to excessive shrinkage of small area estimators, similar to synthetic estimators.

Datta, Lahiri, and Maiti (2002) and You (1999) used the Rao–Yu sampling and linking models (4.4.6) and (4.4.7), but replaced the AR(1) model on the uit's by a random walk model given by (4.4.8) with 𝜌 = 1, that is, uit = ui,t−1 + 𝜀it. Datta et al. (1999) considered a similar model, but added extra terms to reflect seasonal variation in their application.

Pfeffermann and Burck (1990) proposed a general model involving area-by-time specific random effects. Their model is of the form

𝜃̂it = 𝜃it + eit,   (4.4.13)

𝜃it = zitT𝜷it,   (4.4.14)
where the coefficients 𝜷it = (𝛽it0, … , 𝛽itp)T are allowed to vary cross-sectionally (across areas) and over time, and the sampling errors eit for each area i are assumed to be serially uncorrelated with mean 0 and variances 𝜓it. The variation of 𝛽itj over time is specified by the following state-space model:

[𝛽itj ; 𝛽ij] = Tj [𝛽i,t−1,j ; 𝛽ij] + [1 ; 0] 𝑣itj,   j = 0, 1, … , p.   (4.4.15)

Here the 𝛽ij's are fixed coefficients, Tj is a known 2 × 2 matrix with (0, 1) as the second row, and the model errors {𝑣itj} are uncorrelated over time for each area i, with mean 0 and covariances Em(𝑣itj𝑣it𝓁) = 𝜎j𝓁; j, 𝓁 = 0, 1, … , p.

The formulation (4.4.15) covers several useful models. The choice Tj = [0 1; 0 1] gives the well-known random-coefficient regression model 𝛽itj = 𝛽ij + 𝑣itj (Swamy 1971). The familiar random walk model 𝛽itj = 𝛽i,t−1,j + 𝑣itj is obtained by choosing Tj = [1 0; 0 1]. In this case, the coefficient 𝛽ij in (4.4.15) is redundant and should be omitted, so that Tj = 1. The choice Tj = [𝜌 1−𝜌; 0 1] gives the AR(1) model 𝛽itj − 𝛽ij = 𝜌(𝛽i,t−1,j − 𝛽ij) + 𝑣itj. The state-space model (4.4.15) is very general, but the assumption of serially uncorrelated errors eit in (4.4.13) is restrictive in the context of repeated surveys with overlapping samples.
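As a hedged illustration of how the choice of Tj in (4.4.15) generates different coefficient dynamics, the sketch below iterates the state equation for the random-coefficient, random walk, and AR(1) cases; the scalar settings and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
T_periods, rho = 12, 0.7
beta_fixed = 2.0                     # the fixed coefficient beta_ij
sigma_v = 0.3

def simulate(Tj):
    """Iterate (beta_itj, beta_ij)' = Tj (beta_i,t-1,j, beta_ij)' + (1, 0)' v_itj."""
    state = np.array([beta_fixed, beta_fixed])
    path = []
    for _ in range(T_periods):
        state = Tj @ state + np.array([1.0, 0.0]) * rng.normal(0, sigma_v)
        path.append(state[0])
    return np.array(path)

random_coeff = simulate(np.array([[0, 1], [0, 1]]))          # beta_itj = beta_ij + v_itj
random_walk  = simulate(np.array([[1, 0], [0, 1]]))          # beta_itj = beta_i,t-1,j + v_itj
ar1          = simulate(np.array([[rho, 1 - rho], [0, 1]]))  # AR(1) around beta_ij
```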
Example 4.4.3. Median Income. Ghosh, Nangia, and Kim (1996) applied the model (4.4.10)–(4.4.12) to estimate the median income of four-person families in the 50 American states and the District of Columbia (m = 51). The US Department of Health and Human Services uses these estimates to formulate its energy assistance program for low-income families. They used T = 9 years of data (1981–1989) to estimate 𝜃iT, the median incomes for year T = 1989 (i = 1, … , 51). Here zit = (1, zit1)T, with zit1 denoting the "adjusted" census median income for year t and area i, which is obtained by adjusting the base year (1979) census median income by the proportional growth in PCI. Using the same data, Datta et al. (1999) estimated median income of four-person families applying the models (4.4.6) and (4.4.7) with a random walk on the uit's. Some details of this application are given in Example 8.3.1.
Example 4.4.4. Canadian Unemployment Rates. You, Rao, and Gambino (2003) applied the Rao–Yu models (4.4.6) and (4.4.7) to estimate monthly unemployment rates for cities with population over 100,000, called Census Metropolitan Areas (CMAs), and other urban centers, called Census Agglomerations (CAs). Reliable estimates at the CMA and CA levels are used by the Employment Insurance (EI) program to determine the rules used to administer the program. Direct estimates of unemployment rates from the Canadian LFS are reliable for the provinces, but many CMAs and CAs do not have a large enough sample to produce reliable direct estimates from the LFS.
EI beneficiary rates were used as auxiliary data, zit, in the linking model (4.4.7). Both AR(1) and random walk models on the area-by-time effects, uit, were considered. Some details of this application are given in a later example.
Example 4.4.5. US Unemployment Rates. Datta et al. (1999) applied models (4.4.6) and (4.4.7) with a random walk on the uit's and seasonal variation terms to estimate monthly unemployment rates for 49 American states (excluding New York) and the District of Columbia (m = 50). These estimates are used by federal agencies for the allocation of funds and policy formulation. They used the CPS estimates 𝜃̂it for the period January 1985–December 1988 and the unemployment insurance (UI) claims rate (percentage of persons claiming UI benefits among the total nonagricultural employment) as auxiliary data zit. Seasonal variation in monthly unemployment rates was accounted for by introducing random month and year effects into the model. Some details of this application are given in Example 10.9.2.

4.4.4 *Spatial Models
The basic FH model (4.2.5) assumes iid area effects 𝑣i, but in some applications it may be more realistic to entertain models that allow correlations among the 𝑣i's. Spatial models on the area effects 𝑣i are used when "neighboring" areas can be defined for each area i. Such models induce correlations among the 𝑣i's depending, for example, on geographical proximity, in the context of estimating disease or mortality rates. Cressie (1991) used a spatial model for small area estimation in the context of the US census undercount.

If Ai denotes a set of "neighboring" areas of area i, then a conditional autoregression (CAR) spatial model assumes that the conditional distribution of bi𝑣i, given the area effects for the other areas {𝑣𝓁 : 𝓁 ≠ i}, is given by

bi𝑣i | {𝑣𝓁 : 𝓁 ≠ i} ∼ N( 𝜌 Σ𝓁∈Ai qi𝓁 b𝓁𝑣𝓁 ,  bi²𝜎v² ).   (4.4.16)

Here {qi𝓁} are known constants satisfying qi𝓁 b𝓁² = q𝓁i bi² (i < 𝓁), and 𝜹 = (𝜌, 𝜎v²)T is the unknown parameter vector. The model (4.4.16) implies that

B^(1∕2) v ∼ Nm(𝟎, 𝚪(𝜹) = 𝜎v²(I − 𝜌Q)^(−1)B),   (4.4.17)

where B = diag(b1², … , bm²) and Q = (qi𝓁) is an m × m matrix with qi𝓁 = 0 whenever 𝓁 ∉ Ai (including qii = 0), and v = (𝑣1, … , 𝑣m)T (see Besag 1974). Using (4.4.17) in (4.2.5), we obtain a spatial small area model. Note that the parameter 𝜌 enters nonlinearly in 𝚪(𝜹).

In the geostatistics literature, covariance structures of the form (i) 𝚪(𝜹) = 𝜎v²(𝛿1I + 𝛿2D) and (ii) 𝚪(𝜹) = 𝜎v²[𝛿1I + 𝛿2D(𝛿3)] have been used, where D = (di𝓁) and D(𝛿3) = (di𝓁^𝛿3) are m × m matrices with di𝓁 denoting a "distance" (not necessarily
Euclidean) between areas i and 𝓁. Note that in case (i), the parameters 𝛿1 and 𝛿2 appear linearly in 𝚪(𝜹), whereas in case (ii), 𝛿2 and 𝛿3 appear nonlinearly in 𝚪(𝜹).

Consider now the basic FH model (4.2.5) with bi = 1, i = 1, … , m. An alternative to assuming a model for 𝑣i | {𝑣𝓁 : 𝓁 ≠ i}, referred to as the conditional approach (Besag 1974), is to assume a model for the joint distribution of v = (𝑣1, … , 𝑣m)T based on the simultaneous equations

v = 𝜙Wv + u,   u ∼ N(𝟎, 𝜎u²I).   (4.4.18)

Model (4.4.18) is referred to as simultaneously autoregressive (SAR). Similarly to Q in (4.4.17), the matrix W in (4.4.18) describes the neighborhood structure of the small areas, and 𝜙 represents the strength of the spatial relationship among the random effects associated with neighboring areas. It is assumed that I − 𝜙W is nonsingular. Then, (4.4.18) is equivalent to

v = (I − 𝜙W)^(−1)u ∼ N(𝟎, G(𝜹)),   (4.4.19)

with G(𝜹) = 𝜎u²[(I − 𝜙W)(I − 𝜙W)T]^(−1), for 𝜹 = (𝜙, 𝜎u²)T. A simple choice of W is the symmetric binary contiguity matrix W̃ = (𝑤̃i𝓁), where 𝑤̃i𝓁 = 1 if 𝓁 ≠ i and 𝓁 ∈ Ai, and 𝑤̃i𝓁 = 0 otherwise; here Ai ⊆ {1, … , m} is again the set of neighbors of area i. Instead, if one considers the row-standardized matrix W = (𝑤i𝓁) with 𝑤i𝓁 = 𝑤̃i𝓁 ∕ Σk=1..m 𝑤̃ik, then 𝜙 ∈ (−1, 1). In this case, 𝜙 can be interpreted as a correlation coefficient and is called the spatial autocorrelation parameter (Banerjee, Carlin, and Gelfand 2004). Petrucci and Salvati (2006) used the spatial model defined by (4.2.5) and (4.4.18) to estimate the amount of erosion delivered to streams in the Rathbun Lake watershed in Iowa with m = 61 sub-watersheds. Pratesi and Salvati (2008) used the same model to estimate mean income in m = 43 sub-regions of Tuscany, named Local Economic Systems, using data from the Life Conditions Survey of Tuscany.

A drawback of the spatial models (4.4.17) or (4.4.18) is their dependence on how the neighborhoods Ai are defined. Therefore, these models introduce some subjectivity (Marshall 1991).
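To illustrate how the neighborhood structure enters the covariance matrices (4.4.17) and (4.4.19), the sketch below builds a binary contiguity matrix for areas on a line, row-standardizes it, and evaluates the CAR and SAR covariances; the neighborhood definition and all parameter values are arbitrary.

```python
import numpy as np

m = 6
# Binary contiguity: areas on a line, neighbors are adjacent indices (illustrative choice of A_i).
W_tilde = np.zeros((m, m))
for i in range(m - 1):
    W_tilde[i, i + 1] = W_tilde[i + 1, i] = 1

# Row-standardized W, so phi in (-1, 1) acts like a spatial autocorrelation parameter.
W = W_tilde / W_tilde.sum(axis=1, keepdims=True)

phi, sigma_u2 = 0.5, 1.0
I = np.eye(m)
G = sigma_u2 * np.linalg.inv((I - phi * W) @ (I - phi * W).T)   # SAR covariance (4.4.19)

# CAR covariance (4.4.17) with b_i = 1, so B = I and Q the symmetric binary contiguity matrix;
# this is valid only when I - rho*Q is positive definite.
rho, sigma_v2 = 0.3, 1.0
Gamma = sigma_v2 * np.linalg.inv(I - rho * W_tilde)
```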
Example 4.4.6. US Census Undercount. Cressie (1991) extended his model with bi = 1∕√Ci for estimating the US census undercount (Example 4.2.3) by introducing spatial dependence through the CAR model (4.4.17). By exploratory analysis, he defined

qi𝓁 = √(C𝓁∕Ci)  if di𝓁 ≤ 700 miles, i ≠ 𝓁,  and  qi𝓁 = 0  otherwise,

where Ci is the census count in the ith state and di𝓁 is the distance between the centers of gravity of the ith and 𝓁th states (small areas). Other choices of qi𝓁, not necessarily distance-based, may be chosen. Cressie (1991) noted that the geographer's map of the small areas may be quite different from the map relevant to undercount. For example, New York City and the rest of New York State can be quite different for the purpose of undercount estimation, and it is more reasonable to treat cities such as Detroit, Chicago, and Los Angeles as "neighbors" of New York City. Some details of this application are given in Example 8.4.1.

4.4.5 Two-Fold Subarea Level Models
We now consider the case in which each area i is subdivided into Ni subareas, and the parameters of interest are the area means 𝜃̄i together with the subarea means 𝜃ij (j = 1, … , Ni, i = 1, … , m). A linking model in this case is given by 𝜃ij = zijT𝜷 + 𝑣i + uij, where zij is a p × 1 vector of subarea level auxiliary variables (m > p), 𝜷 is the p × 1 vector of regression parameters, and the area effects 𝑣i ∼iid N(0, 𝜎v²) are independent of the subarea effects uij ∼iid N(0, 𝜎u²). We assume that ni subareas are sampled from the Ni subareas in area i. Furthermore, 𝜃̂ij is a direct estimator of 𝜃ij, based on a sample of nij units selected from the Nij units within subarea ij. The subarea sampling model is given by 𝜃̂ij = 𝜃ij + eij, where eij | 𝜃ij ∼ind N(0, 𝜓ij), with known 𝜓ij. Combining the sampling model with the linking model leads to the two-fold model

𝜃̂ij = zijT𝜷 + 𝑣i + uij + eij,   j = 1, … , ni,  i = 1, … , m;   (4.4.20)

see Torabi and Rao (2014). Note that the mean of area i is given by 𝜃̄i = Σj=1..Ni Nij𝜃ij ∕ Ni⋅, where Ni⋅ = Σj=1..Ni Nij is the population count in area i. Model (4.4.20) enables us to estimate the area means 𝜃̄i and the subarea means 𝜃ij (j = 1, … , Ni, i = 1, … , m) by borrowing strength from related areas and subareas.

Fuller and Goyeneche (1998) proposed a subarea level model in the context of SAIPE (see Example 4.2.2). In this application, a county is a subarea, which is nested within a state (area), and direct county estimates were obtained from the CPS data. County level auxiliary variables were obtained from administrative records, as noted in Example 4.2.2.
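A minimal simulation of the two-fold structure (4.4.20), with arbitrary dimensions and variances, might look as follows; it only illustrates how the area and subarea effects enter the model.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n_i = 10, 4                       # areas and sampled subareas per area (illustrative)
beta = np.array([1.0, -0.5])
sigma_v2, sigma_u2, psi = 0.4, 0.2, 0.3

z = rng.uniform(size=(m, n_i, 2))    # subarea covariates z_ij
v = rng.normal(0, np.sqrt(sigma_v2), size=m)          # area effects v_i
u = rng.normal(0, np.sqrt(sigma_u2), size=(m, n_i))   # subarea effects u_ij
e = rng.normal(0, np.sqrt(psi), size=(m, n_i))        # sampling errors e_ij

theta = z @ beta + v[:, None] + u    # linking model for theta_ij
theta_hat = theta + e                # two-fold model (4.4.20)
```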
4.5 EXTENSIONS: UNIT LEVEL MODELS

We now consider various extensions of the basic unit level model (4.3.1).

4.5.1 Multivariate Nested Error Regression Model
As in Section 4.3, we assume that unit-specific auxiliary data xij are available for all the population elements j in each small area i. We further assume that an r × 1 vector of variables of interest, yij, is related to xij through a multivariate nested error regression model (Fuller and Harter 1987):

yij = Bxij + vi + eij,   j = 1, … , Ni,  i = 1, … , m.   (4.5.1)
Here B is an r × p matrix of regression coefficients, the r × 1 vectors of area effects vi are assumed to be iid with mean 𝟎 and covariance matrix 𝚺v, and the r × 1 vectors of errors eij are iid with mean 𝟎 and covariance matrix 𝚺e, independent of the vi. In addition, normality of the vi's and eij's is often assumed. The model (4.5.1) is an extension of the (univariate) nested error regression model (4.3.1) with kij = 1.

The target parameters are the vectors of area means Ȳi = Ni^(−1) Σj=1..Ni yij, which can be approximated by 𝝁i = BX̄i + vi if the population size Ni is large. In the latter case, it follows that the estimation of Ȳi is equivalent to the estimation of a linear combination of B and the realization of the random vector vi. As in the case of the multivariate FH model, the unit level model (4.5.1) can lead to more efficient estimators by taking advantage of the correlations between the components of yij, unlike the univariate model.
Example 4.5.1. County Crop Areas. In Example 4.3.1, we could take yij = (yij1, yij2)T with yij1 = number of hectares of corn and yij2 = number of hectares of soybeans, and retain the same xij = (1, xij1, xij2)T with xij1 = number of pixels classified as corn and xij2 = number of pixels classified as soybeans. By taking account of the correlation between yij1 and yij2, an improved estimator of 𝝁i = (𝜇i1, 𝜇i2)T can be obtained through the multivariate nested error regression model. Some details of a simulation study based on these data are given in Chapter 8.

4.5.2 Two-Fold Nested Error Regression Model
Suppose that the ith small area contains Mi primary units (or clusters), and the jth primary unit (cluster) in the ith area contains Nij subunits (elements). Let (yij𝓁, xij𝓁) denote the y- and x-values for the 𝓁th element in the jth primary unit from the ith area (𝓁 = 1, … , Nij, j = 1, … , Mi, i = 1, … , m). Under this population structure, it is common practice to employ two-stage cluster sampling in each area: a sample si of mi clusters is selected from the ith area; from the jth sampled cluster, a subsample sij of nij elements is selected and the associated y and x values are observed.

The foregoing population structure is reflected by the two-fold nested error regression model (Stukel and Rao 1999) given by

yij𝓁 = xij𝓁T𝜷 + 𝑣i + uij + eij𝓁,   𝓁 = 1, … , Nij,  j = 1, … , Mi,  i = 1, … , m.   (4.5.2)

Here the area effects {𝑣i}, the cluster effects {uij}, and the residual errors {eij𝓁}, with eij𝓁 = kij𝓁ẽij𝓁 for known constants kij𝓁, are assumed to be mutually independent. Furthermore, 𝑣i ∼iid (0, 𝜎v²), uij ∼iid (0, 𝜎u²), and ẽij𝓁 ∼iid (0, 𝜎e²); normality of the random components 𝑣i, uij, and ẽij𝓁 is also often assumed. We assume that the model (4.5.2) holds also for the sample values, which is true under simple random sampling of clusters and subunits within sampled clusters or, more generally, for sampling designs that use the auxiliary information xij𝓁 in the selection of the sample. Datta and Ghosh (1991) used the model (4.5.2) for the special case of cluster-specific covariates, that
is, with xij𝓁 = xij for 𝓁 = 1, … , Nij. Ghosh and Lahiri (1998) studied the case of no auxiliary information, in which xij𝓁T𝜷 = 𝛽 for all i, j, and 𝓁.

The parameters of interest are the small area means

Ȳi = (1∕Ni) [ Σj∈si Σ𝓁∈sij yij𝓁 + Σj∈si Σ𝓁∈rij yij𝓁 + Σj∈ri Σ𝓁=1..Nij yij𝓁 ],   (4.5.3)

where ri and rij denote, respectively, the set of nonsampled clusters in area i and the set of nonsampled subunits in the sampled cluster ij. If the population size of area i, Ni = Σj=1..Mi Nij, is large, then Ȳi may be approximated as

Ȳi ≈ X̄iT𝜷 + 𝑣i,   (4.5.4)

noting that Ȳi = X̄iT𝜷 + 𝑣i + Ūi + Ēi with Ūi ≈ 0 and Ēi ≈ 0, where Ūi and Ēi are the area means of the uij and the eij𝓁, and X̄i is the known mean of the xij𝓁's. It follows from (4.5.4) that the estimation of Ȳi is equivalent to the estimation of a linear combination of 𝜷 and the realization of the random variable 𝑣i.
4.5.3 Two-Level Model
The basic unit level model (4.3.1) with intercept 𝛽1 may be expressed as a model with random area-specific intercepts 𝛽1i = 𝛽1 + 𝑣i and common slopes 𝛽2, … , 𝛽p, that is, yij = 𝛽1i + 𝛽2xij2 + · · · + 𝛽pxijp + eij. This model form suggests a more general model that allows between-area variation in the slopes beyond that in the intercept terms. For this purpose, we consider random regression coefficients 𝜷i = (𝛽i1, … , 𝛽ip)T and then model the 𝜷i in terms of area level covariates Z̃i, which leads to the two-level small area model (Moura and Holt 1999) given by

yij = xijT𝜷i + eij,   j = 1, … , Ni,  i = 1, … , m,   (4.5.5)

with

𝜷i = Z̃i𝜶 + vi,   (4.5.6)

where Z̃i is a p × q matrix, 𝜶 is a q × 1 vector of regression parameters, vi ∼iid (𝟎, 𝚺v), and eij = kijẽij with ẽij ∼iid (0, 𝜎e²). We may express (4.5.5) in matrix form as

yPi = XPi𝜷i + ePi.   (4.5.7)

The two-level model given by (4.5.6)–(4.5.7) effectively integrates the unit level and area level covariates into a single model:

yPi = XPiZ̃i𝜶 + XPivi + ePi.   (4.5.8)
Furthermore, the use of random slopes 𝜷i permits greater flexibility in modeling. The sample values {(yij, xij); j = 1, … , ni, i = 1, … , m} are also assumed to obey the model (4.5.8), that is, there is no sample selection bias. If Ni is large, we can express the mean Ȳi under (4.5.8) as

𝜇i = X̄iT Z̃i𝜶 + X̄iT vi.   (4.5.9)

It follows from (4.5.9) that the estimation of Ȳi is approximately equivalent to the estimation of a linear combination of 𝜶 and the realization of the random vector vi with unknown covariance matrix 𝚺v.

The model (4.5.8) is a special case of a general linear mixed model used extensively for longitudinal data (Laird and Ware 1982). This model allows different matrices XP1i and XP2i to be associated with 𝜶 and vi, and it has the form

yPi = XP1i𝜶 + XP2ivi + ePi,   i = 1, … , m.   (4.5.10)

The choice XP1i = XPiZ̃i and XP2i = XPi gives the two-level model (4.5.8). The general model (4.5.10) covers many of the small area models considered in the literature.
4.5.4 General Linear Mixed Model
Datta and Ghosh (1991) considered a general linear mixed model that covers the univariate unit level models as special cases:

yP = XP𝜷 + ZPv + eP.   (4.5.11)

Here eP and v are independent, with eP ∼ N(𝟎, 𝜎²𝚿P) and v ∼ N(𝟎, 𝜎²D(𝛌)), where 𝚿P is a known positive definite (p.d.) matrix and D(𝛌) is a p.d. matrix that is structurally known except for some parameters 𝛌, typically involving ratios of variance components of the form 𝜎k²∕𝜎². Furthermore, XP and ZP are known design matrices and yP is the N × 1 vector of population y-values. Similar to (4.3.4), we can partition (4.5.11) as

yP = [y ; yr] = [X ; Xr]𝜷 + [Z ; Zr]v + [e ; er],   (4.5.12)

where the subscript r denotes nonsampled units. Let ⊕ denote the direct sum of matrices, that is, for matrices B1, … , Bm, we have ⊕i=1..m Bi = blockdiag(B1, … , Bm). Then, the vector of small area means Ȳ = (Ȳ1, … , Ȳm)T has the form Ay + Cyr with A = ⊕i=1..m Ni^(−1)1niT and C = ⊕i=1..m Ni^(−1)1(Ni−ni)T.

Datta and Ghosh (1991) considered a cross-classification model that is covered by the general model (4.5.11) but not by the "longitudinal" model (4.5.10). Suppose that the units in a small area can be subclassified into C subgroups
(e.g., age, socio-economic class) labeled j = 1, … , C, and that the area-by-subgroup cell sizes Nij are known. The cross-classification model is given by

yij𝓁 = xij𝓁T𝜷 + 𝑣i + aj + uij + eij𝓁,   𝓁 = 1, … , Nij,  j = 1, … , C,  i = 1, … , m,   (4.5.13)

where all random effects {𝑣i}, {aj}, {uij}, and {eij𝓁} are mutually independent, satisfying eij𝓁 ∼iid N(0, 𝜎²), 𝑣i ∼iid N(0, 𝜆1𝜎²), aj ∼iid N(0, 𝜆2𝜎²), and uij ∼iid N(0, 𝜆3𝜎²) for 𝜆k = 𝜎k²∕𝜎², k = 1, 2, 3.
4.6 GENERALIZED LINEAR MIXED MODELS

We now consider generalized linear mixed models that are used for binary and count y-values, often encountered in practice.

4.6.1 Logistic Mixed Models
Suppose yij is binary, that is, yij = 0 or 1, and the parameters of interest are the small area proportions Ȳi = Pi = Σj=1..Ni yij ∕ Ni, i = 1, … , m. MacGibbon and Tomberlin (1989) used a logistic regression model with random area effects on the pij's. The yij's are assumed to be independent Bernoulli(pij) variables, conditional on the pij's. Then, the pij's are assumed to obey the following linking model with random area effects 𝑣i:

logit(pij) = log[pij∕(1 − pij)] = xijT𝜷 + 𝑣i,   (4.6.1)

where 𝑣i ∼iid N(0, 𝜎v²) and the xij are unit-specific covariates. The model-based estimator of Pi is of the form ( Σj∈si yij + Σj∈ri p̂ij )∕Ni, where p̂ij is obtained from (4.6.1) by estimating 𝜷 and the realization of 𝑣i, using empirical Bayes or empirical best (EB) or HB methods.

Malec et al. (1997) considered a different logistic regression model with random regression coefficients. Suppose the units in area i are grouped into classes indexed by h. The counts yihj (j = 1, … , Nih) in the cell (i, h) are independent Bernoulli variables with constant probability pih, conditional on the pih's. Furthermore, the class probabilities pih are assumed to obey the following random-slopes logistic regression model:

𝜃ih = logit(pih) = xhT𝜷i,   (4.6.2)

where

𝜷i = Zi𝜶 + vi,   (4.6.3)
with vi ∼iid N(𝟎, 𝚺v), where Zi is a p × q matrix of area level covariates and xh is a class-specific covariate vector.
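As a rough illustration of (4.6.1) and the resulting small area proportion estimator, the sketch below simulates binary responses with a random area effect and forms the plug-in predictor of Pi, here using the true 𝜷 and 𝑣i in place of estimates simply to show the structure; all settings are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
m, N_i, n_i = 15, 200, 25            # areas, population size, sample size (illustrative)
beta = np.array([-1.0, 0.8])
sigma_v = 0.5

v = rng.normal(0, sigma_v, size=m)                    # area effects v_i
x = rng.uniform(size=(m, N_i))                        # a single unit-level covariate
eta = beta[0] + beta[1] * x + v[:, None]              # linear predictor in (4.6.1)
p = 1.0 / (1.0 + np.exp(-eta))                        # logistic link
y = rng.binomial(1, p)                                # Bernoulli responses

# Plug-in predictor of P_i: observed y for sampled units, predicted p for the rest.
sampled = np.zeros((m, N_i), dtype=bool)
sampled[:, :n_i] = True                               # first n_i units sampled (illustrative)
P_hat = np.where(sampled, y, p).sum(axis=1) / N_i
P_true = y.sum(axis=1) / N_i
```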
Example 4.6.1. Visits to Physicians. Malec et al. (1997) applied the model given by (4.6.2) and (4.6.3) to estimate the proportion of persons who visited a physician in the past year, in the 50 states and the District of Columbia and for specified demographic classes, using data from the US National Health Interview Survey (NHIS). Here i denotes a county and h a demographic class; xh is a vector of covariates that characterizes the demographic class h. Some details of this application are given in Example 10.13.3.

4.6.2 *Models for Multinomial Counts
We now consider the case of counts (yij1, … , yijK) in K categories, which follow a multinomial distribution with size mij ≥ 1 and probabilities (pij1, … , pijK) with Σk=1..K pijk = 1. The probabilities pijk follow the logistic mixed model

log(pijk∕pijK) = xijT𝜷k + 𝑣ik,   k = 1, … , K − 1,  j = 1, … , ni,  i = 1, … , m,   (4.6.4)

where the vectors of random effects for area i, (𝑣i1, … , 𝑣i,K−1)T, are iid NK−1(𝟎, 𝚺v). In the special case of mij = 1, yijk ∈ {0, 1}, k = 1, … , K, such that Σk=1..K yijk = 1. Furthermore, letting K = 2, we obtain the logistic mixed model (4.6.1).
Example 4.6.2. Labor Force Estimates. Molina, Saei, and Lombardía (2007) estimated labor force characteristics in UK small areas under model (4.6.4) with 𝑣ik = 𝑣i, for the sample counts of unemployed, employed, and not in the labor force (K = 3) in gender–age class j within area i. The covariates xij are the log-proportion of registered unemployed and 22 dummy indicators, including gender–age classes, covering the small areas. López-Vizcaíno, Lombardía, and Morales considered the model (4.6.4) with ni = 1 for all i and with independent random effects 𝑣ik, k = 1, … , K − 1, that is, with 𝚺v diagonal; López-Vizcaíno, Lombardía, and Morales (2014) extended this model by including, independently for each area i and category k, time-correlated random effects following an AR(1) process.

4.6.3 Models for Mortality and Disease Rates
Mortality and disease rates for small areas in a region are often used to construct disease maps such as cancer atlases. Such maps are used to display the geographical variability of a disease and to identify high-rate areas warranting intervention. A simple small area model is obtained by assuming that the observed area counts yi are independent Poisson variables with conditional mean E(yi | 𝜆i) = ni𝜆i and that 𝜆i ∼iid gamma(𝛼, 𝜈). Here 𝜆i and ni are the true rate and the number exposed in the ith area, and 𝛼 and 𝜈 are the scale and shape parameters of the gamma distribution.
Smoothed estimates of the 𝜆i are obtained using EB or HB methods (Clayton and Kaldor 1987; Datta, Ghosh, and Waller 2000). CAR spatial models on the log rates 𝜃i = log(𝜆i) have also been proposed (Clayton and Kaldor 1987). The model for 𝜆i can be extended to incorporate area level covariates zi, for example, via 𝜃i = log(𝜆i) = ziT𝜷 + 𝑣i with 𝑣i ∼iid N(0, 𝜎v²). Nandram, Sedransk, and Pickle (1999) studied regression models with random slopes for the age-specific log rates 𝜃ij = log(𝜆ij), where j denotes age group.

Joint mortality rates (y1i, y2i) can also be modeled by assuming that (y1i, y2i) are conditionally independent Poisson variables with E(y1i | 𝜆1i) = n1i𝜆1i and E(y2i | 𝜆2i) = n2i𝜆2i. In addition, 𝜽i = (log(𝜆1i), log(𝜆2i))T are assumed to be iid N2(𝝁, 𝚺). This mixture model induces dependence between y1i and y2i marginally. As an example of this bivariate model, y1i and y2i may denote the numbers of deaths due to cancer at sites 1 and 2, and (n1i, n2i) the populations at risk at sites 1 and 2 in area i. DeSouza (1992) showed that the bivariate model leads to improved estimates of (𝜆1i, 𝜆2i) compared to estimates based on separate univariate models.
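For the Poisson–gamma model above, the smoothed (EB-type) rate estimate has a simple closed form once a parameterization is fixed. The sketch below assumes a gamma prior with shape 𝜈 and rate 𝛼, so that the posterior of 𝜆i given yi is gamma with shape 𝜈 + yi and rate 𝛼 + ni; this parameterization choice and all numbers are illustrative, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 25
alpha, nu = 20.0, 2.0                 # assumed gamma rate and shape (illustrative)
n = rng.integers(500, 5000, size=m)   # populations exposed, n_i

lam = rng.gamma(shape=nu, scale=1.0 / alpha, size=m)   # true rates lambda_i
y = rng.poisson(n * lam)                               # y_i | lambda_i ~ Poisson(n_i lambda_i)

crude = y / n                                          # direct (crude) rates
smoothed = (y + nu) / (n + alpha)                      # posterior mean under the assumed prior
# The smoothed rates shrink the crude rates toward the prior mean nu / alpha,
# with more shrinkage for areas with small n_i.
```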
Example 4.6.3. Lip Cancer. Maiti (1998) modeled 𝜃i = log(𝜆i) as iid N(𝜇, 𝜎²). He also considered a CAR spatial model on the 𝜃i's that relates each 𝜃i to a set of neighborhood areas of area i. He developed model-based estimates of lip cancer incidence in Scotland for each of m = 56 counties. Some details of this application are given in Examples 9.6.1 and 10.11.1.
4.6.4 Natural Exponential Family Models

Ghosh et al. (1998) proposed generalized linear models with random area effects. Conditional on the 𝜃ij's, the sample values yij (j = 1, … , ni, i = 1, … , m) are assumed to be independently distributed with probability density function belonging to the natural exponential family with canonical parameter 𝜃ij:

f(yij | 𝜃ij) = exp{ (1∕𝜙ij)[𝜃ijyij − a(𝜃ij)] + b(yij, 𝜙ij) },   (4.6.5)
for known scale parameters 𝜙ij (> 0) and specified functions a(⋅) and b(⋅). The exponential family (4.6.5) covers well-known distributions, including the normal, binomial, and Poisson distributions. For example, if yij is binomial(nij, pij), we have 𝜃ij = logit(pij) and 𝜙ij = 1. Similarly, if yij is Poisson(𝜆ij), we have 𝜃ij = log(𝜆ij) and 𝜙ij = 1. Furthermore, the 𝜃ij's are modeled as

𝜃ij = xijT𝜷 + 𝑣i + uij,   (4.6.6)

where 𝑣i and uij are mutually independent with 𝑣i ∼iid N(0, 𝜎v²) and uij ∼iid N(0, 𝜎u²). Ghosh et al. (1999) extended the linking model (4.6.6) to handle spatial data and applied the model to disease mapping.
4.6.5 *Semi-parametric Mixed Models
Only Moments Specified. Semi-parametric models based only on the specification of the first two moments of the responses yij conditional on the area means 𝜃i, and of the 𝜃i's, have also been proposed. In the absence of covariates, Ghosh and Lahiri (1987) assumed the following unit level model: (i) for each area i, conditional on the 𝜃i's, the yij's are iid with mean 𝜃i and variance 𝜇2(𝜃i), denoted yij | 𝜃i ∼iid (𝜃i, 𝜇2(𝜃i)), j = 1, … , Ni, i = 1, … , m; (ii) 𝜃i ∼iid (𝜇, 𝜎v²); (iii) 0 < 𝜎e² = E[𝜇2(𝜃i)] < ∞.

Raghunathan (1993) incorporated area level covariates zi as follows: (i) yij | 𝜃i ∼ind (𝜃i, b1(𝜙, 𝜃i, aij)), where b1(⋅) is a known positive function of a "dispersion" parameter 𝜙, the small area mean 𝜃i, and a known constant aij; (ii) 𝜃i ∼ind (𝜏i = h(ziT𝜷), b2(𝜓, 𝜏i, ai)), where h(⋅) is a known function and b2(⋅) is a known positive function of a "dispersion" parameter 𝜓, the mean 𝜏i, and a known constant ai.

The "longitudinal" model (4.5.10) with unit level covariates can also be specified through the first two moments only, by letting

E(yij | vi) = 𝜇ij,   V(yij | vi) = 𝜙 b(𝜇ij),   (4.6.7)

and

h(𝜇ij) = xij1T𝜷 + xij2Tvi,   vi ∼iid (𝟎, 𝚺v),   (4.6.8)
that is, the vi are independent and identically distributed with mean 𝟎 and covariance matrix 𝚺v (Breslow and Clayton 1993).

Example 4.6.4. Hospital Admissions. Raghunathan (1993) obtained model-based estimates of the mean number of hospital admissions per 1,000 individuals in each county of the state of Washington, using hospital discharge data. The model considered for those data has the form

E(yij | 𝜃i) = 𝜃i,   V(yij | 𝜃i) = 𝜃i + 𝜙𝜃i²

and

E(𝜃i) = 𝛽,   V(𝜃i) = 𝜓,

where yij is the number of cancer admissions for individual j in county i, restricting to individuals aged 18 years or older.

Spline Models. The assumption of a specified parametric linear form for the mean function in the basic unit level model (4.3.1) may be relaxed under the assumption of a semi-parametric regression. Approximating the regression mean model by a penalized spline model results in a P-spline unit level model. Opsomer et al. (2008) studied the estimation of small area means under P-spline unit level models.
5 EMPIRICAL BEST LINEAR UNBIASED PREDICTION (EBLUP): THEORY
5.1 INTRODUCTION
In Chapter 4, we presented several small area models that may be regarded as special cases of a general linear mixed model involving fixed and random effects. Moreover, when the population sizes of the areas are large, small area means can be expressed as linear combinations of fixed and random effects of the model. Best linear unbiased prediction (BLUP) estimators of such parameters can be obtained in the classical frequentist framework by appealing to general results on BLUP estimation. BLUP estimators minimize the MSE among the class of linear unbiased estimators and do not depend on normality of the random effects. But they depend on the variances (and possibly covariances) of random effects, called variance components. These parameters can be estimated by a method of moments such as the method of fitting constants (Henderson 1953), or alternatively by maximum likelihood (ML) or restricted maximum likelihood (REML) based on the normal likelihood (Hartley and Rao 1967, Patterson and Thompson 1971). Substituting the estimated variance components into the BLUP estimator, we obtain a two-stage estimator referred to as the empirical BLUP or EBLUP estimator (Harville 1991), in analogy with the empirical Bayes (EB) estimator (Chapter 9). In this chapter, we present general results on EBLUP estimation. We also consider the more difficult problem of estimating the MSE of EBLUP estimators, taking account of the variability in the estimated variance and covariance components.
Results are spelled out for the special case of a linear mixed model with block diagonal covariance structure, which covers many commonly used small area models.
5.2 GENERAL LINEAR MIXED MODEL
Suppose that the sample data obey the general linear mixed model y = X𝜷 + Zv + e.
(5.2.1)
Here y is the n × 1 vector of sample observations, X and Z are known n × p and n × h matrices of full rank, and v and e are independently distributed with means 𝟎 and covariance matrices G and R depending on some variance parameters 𝜹 = (𝛿1, … , 𝛿q)T. We assume that 𝜹 belongs to a specified subset of Euclidean q-space such that the variance–covariance matrix of y, given by V = V(𝜹) := R + ZGZT, is nonsingular for all 𝜹 belonging to the subset. We are interested in estimating a linear combination, 𝜇 = lT𝜷 + mTv, of the regression parameters 𝜷 and the realization of v, for specified vectors of constants l and m. A linear estimator of 𝜇 is of the form 𝜇̂ = aTy + b for known a and b. It is model-unbiased for 𝜇 if

E(𝜇̂) = E(𝜇),   (5.2.2)

where E denotes the expectation with respect to the model (5.2.1). The MSE of 𝜇̂ is given by

MSE(𝜇̂) = E(𝜇̂ − 𝜇)²,   (5.2.3)

which reduces to the variance of the estimation error, MSE(𝜇̂) = V(𝜇̂ − 𝜇), if 𝜇̂ is unbiased for 𝜇. The MSE of 𝜇̂ is also called the mean squared prediction error (MSPE) or prediction mean squared error (PMSE); see Pfeffermann (2013). We are interested in finding the BLUP estimator, which minimizes the MSE in the class of linear unbiased estimators 𝜇̂.

5.2.1 BLUP Estimator
For known 𝜹, the BLUP estimator of 𝜇 is given by

𝜇̃H = t(𝜹, y) = lT𝜷̃ + mTṽ = lT𝜷̃ + mTGZTV−1(y − X𝜷̃),   (5.2.4)

where

𝜷̃ = 𝜷̃(𝜹) = (XTV−1X)−1XTV−1y   (5.2.5)

is the best linear unbiased estimator (BLUE) of 𝜷,

ṽ = ṽ(𝜹) = GZTV−1(y − X𝜷̃),   (5.2.6)
and the superscript H on 𝜇̃ stands for Henderson, who proposed (5.2.4); see Henderson (1950). A direct proof that (5.2.4) is the BLUP estimator is given in Section 5.6.1, following Henderson (1963). Robinson (1991) gave an alternative proof by writing 𝜇̂ as 𝜇̂ = t(𝜹, y) + cTy with E(cTy) = 0, that is, XTc = 𝟎, and then showing that E[lT(𝜷̃ − 𝜷)yTc] = 0 and E[mT(ṽ − v)yTc] = 0. This leads to

MSE(𝜇̂) = MSE[t(𝜹, y)] + E(cTy)² ≥ MSE[t(𝜹, y)].

Robinson's proof assumes the knowledge of the BLUP t(𝜹, y).

Henderson et al. (1959) assumed normality of v and e and maximized the joint density of y and v with respect to 𝜷 and v. This is equivalent to maximizing the function

𝜙 = −½(y − X𝜷 − Zv)TR−1(y − X𝜷 − Zv) − ½vTG−1v,   (5.2.7)

which leads to the following "mixed model" equations:

[XTR−1X   XTR−1Z ; ZTR−1X   ZTR−1Z + G−1] [𝜷* ; v*] = [XTR−1y ; ZTR−1y].   (5.2.8)

The solution of (5.2.8) is identical to the BLUP estimators of 𝜷 and v, that is, 𝜷* = 𝜷̃ and v* = ṽ. This follows by noting that

R−1 − R−1Z(ZTR−1Z + G−1)−1ZTR−1 = V−1

and

(ZTR−1Z + G−1)−1ZTR−1 = GZTV−1.

When V has no simple inverse but G and R are easily invertible (e.g., diagonal), the mixed model equations (5.2.8) are often computationally simpler than (5.2.5) and (5.2.6). In view of the equivalence of (𝜷*, v*) and (𝜷̃, ṽ), the BLUP estimates are often called "joint maximum-likelihood estimates," but the function being maximized, 𝜙, is not a log likelihood in the usual sense because v is nonobservable (Robinson 1991). It is called a "penalized likelihood" because a "penalty" −vTG−1v∕2 is added to the log likelihood obtained regarding v as fixed.

The best prediction (BP) estimator of 𝜇 is given by its conditional expectation E(𝜇|y), because for any estimator d(y) of 𝜇, not necessarily linear or unbiased, it holds that

E[d(y) − 𝜇]² ≥ E[E(𝜇|y) − 𝜇]².

This result follows by noting that

E[(d(y) − 𝜇)²|y] = E[(d(y) − E(𝜇|y) + E(𝜇|y) − 𝜇)²|y] = [d(y) − E(𝜇|y)]² + E[(E(𝜇|y) − 𝜇)²|y] ≥ E[(E(𝜇|y) − 𝜇)²|y]
with equality if and only if d(y) = E(𝜇|y). Under normality, the BP estimator E(𝜇|y) reduces to the BLUP estimator (5.2.4) with 𝜷̃ replaced by 𝜷, that is, it depends on the unknown 𝜷. In particular, the BP estimator of mTv is given by

E(mTv|y) = mTE(v|y) = mTGZTV−1(y − X𝜷).

This estimator is also the best linear prediction (BLP) estimator of mTv without assuming normality. Since we do not want the estimator to depend on 𝜷, we transform y to all error contrasts ATy with mean 𝟎, that is, satisfying ATX = 𝟎, where A is any n × (n − p) full-rank matrix that is orthogonal to the n × p model matrix X. For the transformed data ATy, the BP estimator E(mTv|ATy) of mTv in fact reduces to the BLUP estimator mTṽ (see Section 5.6.2). This result provides an alternative justification of BLUP without linearity and unbiasedness, but assuming normality. It is interesting to note that the transformed data ATy are also used to obtain the restricted (or residual) maximum-likelihood (REML) estimators of the variance parameters 𝜹 (see Section 5.2.3).

We have considered the BLUP estimation of a single linear combination 𝜇 = lT𝜷 + mTv, but the method readily extends to simultaneous estimation of r (≥ 2) linear combinations, 𝝁 = L𝜷 + Mv, where 𝝁 = (𝜇1, … , 𝜇r)T. The BLUP estimator of 𝝁 is given by

t(𝜹, y) = L𝜷̃ + Mṽ = L𝜷̃ + MGZTV−1(y − X𝜷̃).   (5.2.9)

The estimator t(𝜹, y) is optimal in the sense that for any other linear unbiased estimator t*(y) of 𝝁, the matrix E(t* − 𝝁)(t* − 𝝁)T − E(t − 𝝁)(t − 𝝁)T is positive semi-definite (psd). Note that E(t − 𝝁)(t − 𝝁)T is the dispersion matrix of t − 𝝁.
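To make (5.2.4)–(5.2.6) concrete, the following sketch computes 𝜷̃, ṽ, and the BLUP of a single area mean for a simulated Fay–Herriot-type special case of (5.2.1), where Z = I, G = 𝜎v²I, and R = diag(𝜓i); the variance parameters are treated as known here, and all numerical settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
m, p = 12, 2
X = np.column_stack([np.ones(m), rng.uniform(size=m)])
beta_true = np.array([1.0, 2.0])
sigma_v2 = 0.5
psi = rng.uniform(0.3, 0.8, size=m)          # known sampling variances

v = rng.normal(0, np.sqrt(sigma_v2), size=m)
y = X @ beta_true + v + rng.normal(0, np.sqrt(psi))

Z = np.eye(m)
G = sigma_v2 * np.eye(m)
R = np.diag(psi)
V = R + Z @ G @ Z.T                          # V = R + Z G Z'

Vinv = np.linalg.inv(V)
beta_tilde = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # BLUE (5.2.5)
v_tilde = G @ Z.T @ Vinv @ (y - X @ beta_tilde)                # (5.2.6)

# BLUP of mu_1 = l' beta + m' v with l = x_1 and m the first unit vector (area 1).
l, m_vec = X[0], np.eye(m)[0]
mu_blup = l @ beta_tilde + m_vec @ v_tilde                     # (5.2.4)
```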
5.2.2 MSE of BLUP
The BLUP estimator t(𝜹, y) may also be expressed as t(𝜹, y) = t*(𝜹, 𝜷, y) + dT(𝜷̃ − 𝜷), where t*(𝜹, 𝜷, y) is the BLUP estimator when 𝜷 is known, given by

t*(𝜹, 𝜷, y) = lT𝜷 + bT(y − X𝜷),   (5.2.10)

with bT = mTGZTV−1 and dT = lT − bTX. It now follows that t*(𝜹, 𝜷, y) − 𝜇 and dT(𝜷̃ − 𝜷) are uncorrelated, noting that

E[(bT(Zv + e) − mTv)(vTZT + eT)V−1] = 𝟎.

Therefore, taking the expectation of [t(𝜹, y) − 𝜇]², we obtain

MSE[t(𝜹, y)] = MSE[t*(𝜹, 𝜷, y)] + V[dT(𝜷̃ − 𝜷)] = g1(𝜹) + g2(𝜹),   (5.2.11)
where

g1(𝜹) = V[t*(𝜹, 𝜷, y) − 𝜇] = mT(G − GZTV−1ZG)m   (5.2.12)

and

g2(𝜹) = dT(XTV−1X)−1d.   (5.2.13)
The second term in (5.2.11), g2(𝜹), accounts for the variability in the estimator 𝜷̃ of 𝜷. Henderson (1975) used the mixed model equations (5.2.8) to obtain an alternative formula for MSE[t(𝜹, y)], given by

MSE[t(𝜹, y)] = (lT, mT) [C11  C12 ; C21  C22] (l ; m),   (5.2.14)
where the matrix C with blocks Cij (i, j = 1, 2) is the inverse of the coefficient matrix of the mixed model equations (5.2.8). This form is computationally simpler than (5.2.11) when G and R are more easily invertible than V.

5.2.3 EBLUP Estimator
The BLUP estimator t(𝜹, y) given by (5.2.4) depends on the variance parameters 𝜹, which are unknown in practical applications. Replacing 𝜹 by an estimator 𝜹̂ = 𝜹̂(y), we obtain a two-stage estimator 𝜇̂H = t(𝜹̂, y), which is referred to as the empirical BLUP (or EBLUP) estimator. For convenience, we also write t(𝜹̂, y) and t(𝜹, y) as t(𝜹̂) and t(𝜹). The two-stage estimator t(𝜹̂) remains unbiased for 𝜇, that is, E[t(𝜹̂) − 𝜇] = 0, provided (i) E[t(𝜹̂)] is finite; (ii) 𝜹̂ is any even translation-invariant estimator of 𝜹, that is, 𝜹̂(−y) = 𝜹̂(y) and 𝜹̂(y − Xb) = 𝜹̂(y) for all y and b; (iii) the distributions of v and e are both symmetric around 𝟎 (not necessarily normal). A proof of the unbiasedness of t(𝜹̂), due to Kackar and Harville (1981), uses the following results: (a) 𝜹̂(y) = 𝜹̂(Zv + e) = 𝜹̂(−Zv − e); (b) t(𝜹̂) − 𝜇 = 𝜙(v, e) − mTv, where 𝜙(v, e) is an odd function of v and e, that is, 𝜙(−v, −e) = −𝜙(v, e). Result (b) implies that E𝜙(v, e) = E𝜙(−v, −e) = −E𝜙(v, e), or equivalently E𝜙(v, e) = 0, which implies E[t(𝜹̂) − 𝜇] = 0.
(5.2.15)
where v1 , , vr and e are independently distributed with means 𝟎 and covariance matrices 𝜎12 Ih1 , , 𝜎r2 Ihr and 𝜎e2 In . The parameters 𝜹 = 𝜎(02 , , 𝜎r2 )T
102
EBLUP THEORY
with 𝜎k2 ≥ 0(k = 1, , r) and 𝜎02 = 𝜎e2 > 0 are the variance components. Note that G is now block diagonal with blocks 𝜎k2 Ihk , k = 1, , r, R = 𝜎e2 In , and ∑ V = 𝜎e2 In + rk=1 𝜎k2 Zk ZTk , which is a special case of a covariance matrix with linear ∑ structure, V = rk=1 𝛿k Hk , for known symmetric matrices Hk . 5.2.4
ML and REML Estimators
We now provide formulas for the ML and REML estimators of and 𝜹 under the general linear mixed model (5.2.1) and the associated asymptotic covariance matrices, assuming normality (Cressie 1992). Under normality, the log-likelihood function is given by 1 1 (5.2.16) l( , 𝜹) =c − log |V| − (y − X )T V−1 (y − X ), 2 2 where c denotes a generic constant. The partial derivative of l( , 𝜹) with respect to 𝜹 is given by s( , 𝜹), with the jth element 1 1 sj ( , 𝜹) =𝜕l( , 𝜹) ∕𝜕𝛿j = − tr(V−1 V(j) ) − (y − X )T V(j) (y − X ), 2 2 where V(j) = 𝜕V∕𝜕𝛿j and V(j) = 𝜕V−1 ∕𝜕𝛿j = −V−1 V(j) V−1 , noting that V = V(𝜹). Also, the matrix of expected second-order derivatives of −l( , 𝜹) with respect to 𝜹 is given by (𝜹) with (j, k) th element jk (𝜹) =
1 tr(V−1 V(j) V−1 V(k) ) 2
(5.2.17)
The ML estimator of 𝜹 is obtained iteratively using the Fisher-scoring algorithm, with updating equation 𝜹(a+1) = 𝜹(a) + [(𝜹(a) )]−1 s[ ̃ (𝜹(a) ), 𝜹(a) ],
(5.2.18)
where the superscript (a) indicates that the specified terms are evaluated at the values ). At of 𝜹 and ̃ = ̃ (𝜹) at the a th iteration 𝜹 = 𝜹(a) and ̃ = ̃ (𝜹(a) ), (a = 0, 1, 2, convergence of the iterations (5.2.18), we get the ML estimators 𝜹̂ ML of 𝜹 and ̂ ML = ̃ (𝜹̂ ML ) of . The asymptotic covariance matrix of ̂ ML and 𝜹̂ ML has a block diagonal structure, diag[V( ̂ ML ), V(𝜹̂ ML )], with V( ̂ ML ) = (XT V−1 X)−1 ,
V(𝜹̂ ML ) = −1 (𝜹)
(5.2.19)
A drawback of the ML estimator of 𝜹 is that it does not take account of the loss in degrees of freedom (df) due to estimating . For example, when y1 , , yn are iid N(𝜇, 𝜎 2 ), the ML estimator 𝜎̂ 2 = [(n − 1)∕n]s2 is not equal to the customary unbi∑ ased estimator of 𝜎 2 , s2 = ni=1 (yi − y)2 ∕(n − 1). The REML method takes account of the loss in df by using the transformed data y∗ = AT y, where A is any n × n( − p) full-rank matrix that is orthogonal to the n × p matrix X. It follows that y∗ = AT y
103
GENERAL LINEAR MIXED MODEL
is distributed as a (n − p)-variate normal with mean 𝟎 and covariance matrix AT VA. The logarithm of the joint density of y∗ , expressed as function of 𝜹, is called restricted log-likelihood function and is given by lR (𝜹) = c −
1 1 1 log |V| − log |XT V−1 X| − yT Py, 2 2 2
(5.2.20)
where P = V−1 − V−1 X(XT V−1 X)−1 XT V−1
(5.2.21)
Note that Py = V−1 (y − X ̃ ). The partial derivative of lR (𝜹) with respect to 𝜹 is given by sR (𝜹), with the jth element 1 1 sRj (𝜹) = 𝜕lR (𝜹)∕𝜕𝛿j = − tr(PV(j) ) + yT PV(j) Py 2 2 Also, the matrix of expected second-order derivatives of −lR (𝜹) with respect to 𝜹 is given by R (𝜹) with (j, k)th element R,jk (𝜹) =
1 tr(PV(j) PV(k) ) 2
(5.2.22)
Note that both sR (𝜹) and R (𝜹) are invariant to the choice of A. The REML estimator of 𝜹 is obtained iteratively from (5.2.18) by replacing (𝜹(a) ) and s[ ̃ (𝜹(a) ), 𝜹(a) ] by R (𝜹(a) ) and sR (𝜹(a) ), respectively. At convergence of the iterations, we get REML estimators 𝜹̂ RE and ̂ RE = ̃ (𝜹̂ RE ). Asymptotically, V(𝜹̂ RE ) ≈ V(𝜹̂ ML ) = −1 (𝜹) and V( ̂ RE ) ≈ V( ̂ ML ) = (XT V−1 X)−1 , provided p is fixed. For the ANOVA model (5.2.15), the ML estimator of 𝜹 can be obtained iteratively using the BLUP estimators ̃ and ṽ (Hartley and Rao 1967, Henderson 1973). The updating equations are given by 𝜎i2(a+1) =
1 (a)T (a) [̃v ṽ i + 𝜎i2(a) tr(T∗(a) )] ii hi i
(5.2.23)
1 T (a) y (y − X ̃ − Z̃v(a) ), n
(5.2.24)
and 𝜎e2(a+1) = where
T∗ii = (I + ZT R−1 ZG)−1 Fii , with tr(T∗ii ) > 0. Here, Fii is given by G with unity in place of 𝜎i2 and zero in place of 𝜎j2 (j ≠ i). The values of ̃ and ṽ for a specified 𝜹 are readily obtained from the mixed model equations (5.2.8), without evaluating V−1 . The algorithm given by (5.2.23) and (5.2.24) is similar to the EM algorithm (Dempster, Laird, and Rubin 1977).
104
EBLUP THEORY
The asymptotic covariance matrices of ̂ ML and 𝜹̂ ML are given by (5.2.19) using V(j) = Zj ZTj (j = 0, 1, , r) and Z0 = In in (5.2.17). For the ANOVA model (5.2.15), the REML estimator of 𝜹 can be obtained iteratively from (5.2.23) and (5.2.24) by changing n to n − p in (5.2.24) and T∗ii to Tii in (5.2.23), where Tii = (I + ZT QZG)−1 Fii with Q = R−1 − R−1 X(XT R−1 X)−1 XT R−1 (Harville 1977). The elements of the information matrix, R (𝜹), are given by (5.2.22) using V(j) = Zj ZTj . Anderson (1973) suggested an alternative iterative algorithm to obtain 𝜹̂ ML with updating equation (5.2.25) 𝜹(a+1) = [(𝜹(a) )]−1 b(𝜹(a) ), where the i th element of b(𝜹) is given by bi (𝜹) =
1 T y PZi ZTi Py 2
(5.2.26)
This algorithm is equivalent to the Fisher-scoring algorithm for solving ML equations (Rao 1974). The algorithm is also applicable to any covariance matrix with a linear ∑ structure, V = ri=1 𝛿i Hi . We simply replace Zi ZTi by Hi to define (𝜹) and b(𝜹) using (5.2.17) and (5.2.26). For REML estimation of 𝜹, an iterative algorithm similar to (5.2.25) is given by 𝜹(a+1) = [R (𝜹(a) )]−1 b(𝜹(a) )
(5.2.27)
The algorithm (5.2.27) is also equivalent to the Fisher-scoring algorithm for solving REML equations (Hocking and Kutner 1975). Rao (1971) proposed the method of minimum norm quadratic unbiased (MINQU) estimation, which does not require the normality assumption, unlike ML and REML. The MINQU estimators depend on a preassigned value 𝜹0 for 𝜹 and are identical to the first iterative solution, 𝜹(1) , of the REML updating equation (5.2.27) using 𝜹0 as the starting value, that is, taking 𝜹(0) = 𝜹0 . This result suggests that REML (or ML) estimators of 𝜹, derived under normality, may perform well even under nonnormal distributions. In fact, Jiang (1996) established asymptotic consistency of the REML estimator of 𝜹 for the ANOVA model (5.2.15) when normality may not hold. Moment methods such as fitting-of-constants are also used to estimate 𝜹. We study those methods for particular cases of the general linear mixed model or the ANOVA model in Chapters 6–8. Henderson’s (1973) iterative algorithms defined by (5.2.23) and (5.2.24), for computing ML and REML estimates of variance components 𝜎e2 and 𝜎i2 (i = 1, , r) in the ANOVA model (5.2.15), are not affected by the constraints on the parameter
GENERAL LINEAR MIXED MODEL
105
space: 𝜎e2 > 0; 𝜎i2 ≥ 0, i = 1, , r (Harville 1977). If the starting values 𝜎e2(0) and 𝜎i2(0) are strictly positive, then, at every iteration a, the values 𝜎e2(a+1) and 𝜎i2(a+1) remain positive, although it is possible for some of them to be arbitrarily close to 0. The Fisher-scoring method for general linear mixed models (5.2.1) and the equivalent Anderson’s method for models with linear covariance structures do not enjoy this desirable property. Modifications are needed to accommodate constraints on 𝜹 = (𝛿1 , , 𝛿q )T . We refer the reader to Harville (1977) for details. MINQUE and moment methods also require modifications to account for constraints on 𝜹. 5.2.5
MSE of EBLUP
̂ may be decomposed as The error in the EBLUP estimator t(𝜹) ̂ − 𝜇 = [t(𝜹) − 𝜇] + [t(𝜹) ̂ − t(𝜹)] t(𝜹) Taking expectation of the squared error, we obtain ̂ = MSE[t(𝜹)] + E[t(𝜹) ̂ − t(𝜹)]2 + 2 E[t(𝜹) − 𝜇][t(𝜹) ̂ − t(𝜹)] MSE[t(𝜹)]
(5.2.28)
Under normality of the random effects v and errors e, the cross-product term in (5.2.28) is zero provided 𝜹̂ is translation invariant (see Section 5.6.3) so that ̂ = MSE[t(𝜹)] + E[t(𝜹) ̂ − t(𝜹)]2 MSE[t(𝜹)]
(5.2.29)
It is clear from (5.2.29) that the MSE of the EBLUP estimator is always larger than that of the BLUP estimator t(𝜹) under normality. The common practice of approx̂ by MSE[t(𝜹)] could therefore lead to significant understatement imating MSE[t(𝜹)] ̂ especially in cases where t(𝜹) varies with 𝜹 to a significant extent and of MSE[t(𝜹)], where the variability of 𝜹̂ is not small. The last term of (5.2.29) is generally intractable except in special cases, such as the balanced one-way model ANOVA yij = 𝜇 + 𝑣i + eij , i = 1, , m, and j = 1, , n (Peixoto and Harville 1986). It is therefore necessary to obtain an approximation to this term. Das, Jiang, and Rao (2004) obtained a second-order approximation to ̂ − t(𝜹)]2 for general linear mixed models. Here we only give a sketch of the E[t(𝜹) proof along the lines of Kackar and Harville (1984). By a first-order Taylor expansion, denoting d(𝜹) = 𝜕t(𝜹)∕𝜕𝜹, we obtain ̂ − t(𝜹) ≈ d(𝜹)T (𝜹̂ − 𝜹), t(𝜹)
(5.2.30)
where ≈ is used to indicate that the remainder terms in the approximation are of lower order (second-order approximation). Result (5.2.30) is obtained assuming that the terms involving higher powers of 𝜹̂ − 𝜹 are of lower order than d(𝜹)T (𝜹̂ − 𝜹). Furthermore, under normality, noting that the terms involving the derivatives of ̃ − with respect to 𝜹 are of lower order, we have d(𝜹) ≈ 𝜕t∗ (𝜹, )∕𝜕𝜹 = (𝜕bT ∕𝜕𝜹)(y − X ) =∶ d∗ (𝜹),
106
EBLUP THEORY
where t∗ (𝜹, ) is given by (5.2.10). Thus, we have E[d(𝜹)T (𝜹̂ − 𝜹)]2 ≈ E[d∗ (𝜹)T (𝜹̂ − 𝜹)]2
(5.2.31)
Furthermore, ̂ E[d∗ (𝜹)T (𝜹̂ − 𝜹)]2 ≈ tr[E(d∗ (𝜹)d∗ (𝜹)T )V(𝜹)] ̂ =∶ g3 (𝜹), = tr[(𝜕bT ∕𝜕𝜹)V(𝜕bT ∕𝜕𝜹)T V(𝜹)]
(5.2.32)
̂ is the asymptotic covariance matrix of 𝜹. ̂ It now follows from (5.2.30), where V(𝜹) (5.2.31), and (5.2.32) that ̂ − t(𝜹)]2 ≈ g3 (𝜹) E[t(𝜹)
(5.2.33)
Replacing (5.2.11) and (5.2.33) in (5.2.29), we get a second-order approximation ̂ as to the MSE of t(𝜹) ̂ ≈ g1 (𝜹) + g2 (𝜹) + g3 (𝜹) MSE[t(𝜹)] The terms g2 (𝜹) and g3 (𝜹), due to estimating leading term g1 (𝜹). 5.2.6
(5.2.34)
and 𝜹, are of lower order than the
Estimation of MSE of EBLUP
̂ as a measure of variabilIn practical applications, we need an estimator of MSE[t(𝜹)] ̂ A naive approach approximates ity (or uncertainty) associated with the estimator t(𝜹). ̂ by MSE[t(𝜹)] in (5.2.11) and then substitutes 𝜹̂ for 𝜹 in MSE[t(𝜹)]. The MSE[t(𝜹)] resulting naive estimator of MSE is given by ̂ = g1 (𝜹) ̂ + g2 (𝜹), ̂ mseN [t(𝜹)]
(5.2.35)
where g1 (𝜹) and g2 (𝜹) are given in (5.2.12) and (5.2.13), respectively. Another MSE estimator is obtained by substituting 𝜹̂ for 𝜹 in the MSE approximation (5.2.34), leading to ̂ = g1 (𝜹) ̂ + g2 (𝜹) ̂ + g3 (𝜹) ̂ (5.2.36) mse1 [t(𝜹)] ̂ ≈ g2 (𝜹) and Eg3 (𝜹) ̂ ≈ g3 (𝜹) with neglected terms of lower order, It holds that Eg2 (𝜹) ̂ but g1 (𝜹) is not a second-order unbiased estimator of g1 (𝜹) because its bias is generally of the same order as g2 (𝜹) and g3 (𝜹). ̂ take a second-order Taylor expansion of g1 (𝜹) ̂ To evaluate the bias of g1 (𝜹), around 𝜹: ̂ ≈ g1 (𝜹) + (𝜹̂ − 𝜹)T ∇g1 (𝜹) + 1 (𝜹̂ − 𝜹)T ∇2 g1 (𝜹)(𝜹̂ − 𝜹) g1 (𝜹) 2 =∶ g1 (𝜹) + Δ1 + Δ2 ,
107
GENERAL LINEAR MIXED MODEL
where ∇g1 (𝜹) is the vector of first-order derivatives of g1 (𝜹) with respect to 𝜹 and ∇2 g1 (𝜹) is the matrix of second-order derivatives of g1 (𝜹) with respect to 𝜹. If 𝜹̂ is unbiased for 𝜹, then E(Δ1 ) = 0. In general, if E(Δ1 ) = E(𝜹̂ − 𝜹)T ∇g1 (𝜹) is of lower order than E(Δ2 ), then ̂ ≈ g1 (𝜹) + 1 tr[∇2 g1 (𝜹)V(𝜹)] ̂ Eg1 (𝜹) 2
(5.2.37)
∑ Furthermore, when the covariance matrix V has a linear structure V = ri=1 𝛿i Hi , noting that the second derivatives of G and V with respect to the parameters are zero when G and V are linear in the parameters 𝜹, the second term on the right-hand side of (5.2.37) reduces to −g3 (𝜹) and then (5.2.37) reduces to ̂ ≈ g1 (𝜹) − g3 (𝜹) Eg1 (𝜹)
(5.2.38)
̂ It now follows from (5.2.35), (5.2.36), and (5.2.38) that the biases of mseN [t(𝜹)] ̂ are, respectively, given by and mse1 [t(𝜹)] BN ≈ −2g3 (𝜹),
B1 ≈ −g3 (𝜹)
̂ + g3 (𝜹)] ̂ ≈ g1 (𝜹) from (5.2.38), a second-order unbiased estimaNoting that E[g1 (𝜹) ̂ tor of MSE[t(𝜹)] is given by ̂ ≈ g1 (𝜹) ̂ + g2 (𝜹) ̂ + 2g3 (𝜹) ̂ mse[t(𝜹)]
(5.2.39)
Consequently, we have ̂ ̂ E{mse[t(𝜹)]} ≈ MSE[t(𝜹)] Formula (5.2.39) holds for the REML estimator, 𝜹̂ RE , and some moment estimators. If E(Δ1 ) is of the same order as E(Δ2 ), as in the case of the ML estimator 𝜹̂ ML , then an extra term needs to be subtracted from (5.2.39). Using the additional approx̂ 𝜹)∇g1 (𝜹), where b(𝜹; ̂ 𝜹) is the bias of 𝜹̂ up to terms of lower imation E(Δ1 ) ≈ bT (𝜹; ̂ is given by order, a second-order unbiased estimator of MSE[t(𝜹)] ̂ ≈ g1 (𝜹) ̂ − bT (𝜹; ̂ 𝜹)∇g ̂ ̂ ̂ ̂ mse∗ [t(𝜹)] 1 (𝜹) + g2 (𝜹) + 2g3 (𝜹)
(5.2.40)
̂ 𝜹) is spelled out in Section 5.3.2 for the special case of block diagonal The term b(𝜹; covariance matrix V = V(𝜹) and 𝜹̂ = 𝜹̂ ML . Prasad and Rao (1990) derived the MSE estimator (5.2.39) for special cases covered by the general linear mixed model with a block diagonal covariance structure (see Section 5.3 and Chapter 6). Following Prasad and Rao (1990), Harville and Jeske (1992) proposed (5.2.39) for the general linear mixed model (5.2.1), assum̂ = 𝜹, and referred to (5.2.39) as the Prasad–Rao estimator. Das, Jiang, and ing E(𝜹) Rao (2004) provide rigorous proofs of the approximations (5.2.39) and (5.2.40) for REML and ML methods, respectively.
108
5.3
EBLUP THEORY
BLOCK DIAGONAL COVARIANCE STRUCTURE
5.3.1
EBLUP Estimator
A special case of the general linear mixed model is obtained when the vectors and matrices involved in (5.2.1) are partitioned into m components (typically the small areas) as follows: y = col1≤i≤m (yi ) = (yT1 , Z = diag1≤i≤m (Zi ),
, yTm )T ,
X = col1≤i≤m (Xi ),
v = col1≤i≤m (vi ),
e = col1≤i≤m (ei ),
where Xi is ni × p, Zi is ni × hi , and yi is an ni × 1 vector, with ∑m i=1 hi = h. Furthermore, R = diag1≤i≤m (Ri ),
∑m
i=1 ni
= n and
G = diag1≤i≤m (Gi ),
so that V has a block diagonal structure, that is, V = diag1≤i≤m (Vi ), with Vi = Ri + Zi Gi ZTi , i = 1, , m. The model, therefore, may be expressed as follows: yi = Xi + Zi vi + ei ,
i = 1,
,m
(5.3.1)
Model (5.3.1) covers many of the small area models considered in the literature. We are interested in estimating linear combinations of the form 𝜇i = lTi + mTi vi , i = 1, , m. It follows from (5.2.4) that the BLUP estimator of 𝜇i reduces to 𝜇̃ iH = ti (𝜹, y) = lTi ̃ + mTi ṽ i ,
(5.3.2)
̃ ṽ i = ṽ i (𝜹) = Gi ZTi V−1 i (yi − Xi ),
(5.3.3)
where
and
)−1
(m ∑ ̃ = ̃ (𝜹) =
m ∑
XTi V−1 i Xi i=1
XTi V−1 i yi
(5.3.4)
i=1
From (5.2.11), the MSE of the BLUP estimator 𝜇̃ iH reduces to MSE(𝜇̃ iH ) = g1i (𝜹) + g2i (𝜹)
(5.3.5)
g1i (𝜹) = mTi (Gi − Gi ZTi V−1 i Zi Gi )mi
(5.3.6)
with
and
(m ∑ g2i (𝜹) =
dTi
)−1 XTi V−1 i Xi
i=1
di ,
(5.3.7)
109
BLOCK DIAGONAL COVARIANCE STRUCTURE
where dTi = lTi − bTi Xi , with bTi = mTi Gi ZTi V−1 . Replacing 𝜹 by an estimator 𝜹̂ in i (5.3.2), we get the EBLUP estimator ̂ y) = lT ̂ + mT v̂ i , 𝜇̂ iH = ti (𝜹, i i
(5.3.8)
̂ ̂ and v̂ i = ṽ i (𝜹). where ̂ = ̃ (𝜹) 5.3.2
Estimation of MSE
The second-order approximation (5.2.34) of the MSE of 𝜇̂ iH reduces to MSE(𝜇̂ iH ) ≈ g1i (𝜹) + g2i (𝜹) + g3i (𝜹)
(5.3.9)
̂ g3i (𝜹) = tr[(𝜕bTi ∕𝜕𝜹)Vi (𝜕bTi ∕𝜕𝜹)T V(𝜹)]
(5.3.10)
with Neglected terms in the approximation (5.3.9) are of order o(m−1 ) for large m. For ̂ y) REML and some moment estimators of 𝜹, the estimator of MSE of 𝜇̂ iH = ti (𝜹, given by (5.2.39) reduces to ̂ + g2i (𝜹) ̂ + 2g3i (𝜹) ̂ mse(𝜇̂ iH ) ≈ g1i (𝜹)
(5.3.11)
for large m. For the ML estimator 𝜹̂ ML , the MSE estimator given by (5.2.40) reduces to ̂ − bT (𝜹; ̂ 𝜹)∇g ̂ ̂ ̂ ̂ (5.3.12) mse∗ (𝜇̂ iH ) = g1i (𝜹) 1i (𝜹) + g2i (𝜹) + 2g3i (𝜹) for large m, where in this case { 1 T ̂ b (𝜹ML ; 𝜹) = −1 (𝜹) 2m
[ col 1≤j≤m
m ∑
m ∑ −1 (XTi V−1 i Xi )
tr i=1
]} (j) XTi Vi Xi
i=1
(5.3.13) with
(j)
−1 −1 Vi = 𝜕V−1 i ∕𝜕𝛿j = −Vi (𝜕Vi ∕𝜕𝛿j )Vi
and
m
jk (𝜹) =
1∑ −1 tr(V−1 i 𝜕Vi ∕𝜕𝛿j )(Vi 𝜕Vi ∕𝜕𝛿k ) 2 i=1
The neglected terms in the second-order approximation (5.3.9) to MSE(𝜇̂ iH ) are of order o(m−1 ) and the MSE estimators (5.3.11) for REML and (5.3.12) for ML are second-order unbiased in the sense E[mse(𝜇̂ iH )] − MSE(𝜇̂ iH ) = o(m−1 ) for large m, under the following regularity assumptions (Datta and Lahiri 2000): (i) The elements of Xi and Zi are uniformly bounded such that [O(m)]p×p .
∑m
T −1 i=1 Xi Vi Xi
=
110
EBLUP THEORY
(ii) supi≥1 ni ≪ ∞ and supi≥1 hi ≪ ∞, where hi is the number of columns in Zi . (iii) Covariance matrices Gi and Ri have linear structures of the form ∑q ∑q Gi = j=0 𝛿j Aij ATij and Ri = j=0 𝛿j Bij BTij , where 𝛿0 = 1, Aij and Bij (i = 1, , m, j = 0, , q) are known matrices of order ni × hi and hi × hi respectively, and the elements of Aij and Bij are uniformly bounded such that Gi and Ri are positive definite matrices for i = 1, , m. The MSE estimators (5.3.11) and (5.3.12) are not area-specific in the sense that they do not depend directly on the area-specific data yi , but using the form (5.3.10) for g3i (𝜹), it is easy to define other MSE estimators that are area-specific. For example, ̃ i (𝜹, yi ) = (yi − XT ̃ )(yi − XT ̃ )T is area-specific and approximately unbiased for V i i Vi . Using this estimator of Vi , we get the following alternative area-specific estimator of g3i (𝜹): ̃ i (𝜹, yi )(𝜕bT ∕𝜕𝜹)T V(𝜹)] ̂ (5.3.14) g∗3i (𝜹, yi ) = tr[(𝜕bTi ∕𝜕𝜹)V i This choice leads to two alternative area-specific versions of (5.3.11) in the case of REML estimation: ̂ + g2i (𝜹) ̂ + 2g∗ (𝜹, ̂ yi ) mse1 (𝜇̂ iH ) ≈ g1i (𝜹) 3i
(5.3.15)
̂ + g2i (𝜹) ̂ + g3i (𝜹) ̂ + g∗ (𝜹, ̂ yi ) mse2 (𝜇̂ iH ) ≈ g1i (𝜹) 3i
(5.3.16)
and
Similarly, area-specific versions of the MSE estimator (5.3.12) for ML estimation are given by ∗ ̂ ̂ − bT (𝜹; ̂ 𝜹)∇g ̂ ̂ ̂ mse∗1 (𝜇̂ iH ) ≈ g1i (𝜹) 1i (𝜹) + g2i (𝜹) + 2g3i (𝜹, yi )
(5.3.17)
∗ ̂ ̂ − bT (𝜹; ̂ 𝜹)∇g ̂ ̂ ̂ ̂ mse∗2 (𝜇̂ iH ) ≈ g1i (𝜹) 1i (𝜹) + g2i (𝜹) + g3i (𝜹) + g3i (𝜹, yi )
(5.3.18)
and
̃ i (𝜹, ̂ yi ), based on the residuals yi − Xi ̂ , induces The use of the area-specific matrix V instability in the MSE estimators, but its effect should be small for large m because ̂ yi ) is of order O(m−1 ). the term g∗3i (𝜹, 5.3.3
Extension to Multidimensional Area Parameters
The BLUP estimator (5.3.2) readily extends to the multidimensional case of 𝝁i = Li + Mi vi for specified matrices Li and Mi , and is thus given by ̃ ̃i 𝝁̃ H i = ti (𝜹, y) = Li + Mi v ̃ = Li ̃ + Mi Gi ZTi V−1 i (yi − Xi )
(5.3.19)
111
*MODEL IDENTIFICATION AND CHECKING
The covariance matrix of 𝝁̃ H − 𝝁i follows from (5.3.5) by changing lTi and mTi to Li i and Mi , respectively, that is, T ̃H ̃H MSE(𝝁̃ H i ) = E(𝝁 i − 𝝁i )(𝝁 i − 𝝁i ) T = Mi (Gi − Gi ZTi V−1 i Zi Gi )Mi (m )−1 ∑ T −1 + Di Xi Vi Xi DTi ,
(5.3.20)
i=1
where Di = Li − Mi Gi ZTi V−1 Xi . i ̂ ̂H The EBLUP estimator is given by 𝝁̂ H i = ti (𝜹, y). An estimator of MSE(𝝁 i ) that accounts for the uncertainty due to estimating 𝜹 may be obtained along the lines of Section 5.3.2, but details are omitted here for simplicity.
5.4 5.4.1
*MODEL IDENTIFICATION AND CHECKING Variable Selection
AIC-Type Methods For the standard linear regression model, y = X + e with e ∼ N(𝟎, 𝜎 2 I), several methods have been proposed for the selection of covariates (or fixed effects) for inclusion in the model. Methods studied include stepwise regression, Mallows’ Cp statistic, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). Extensions of those methods to the linear mixed model (5.2.1) have been developed in recent years and we provide a brief account. Müller, Scealy, and Welsh (2013) provide an excellent review of methods for model selection in linear mixed models. The AIC for the general linear mixed model (5.2.1) uses either the marginal log likelihood l( , 𝜹) or the marginal restricted log likelihood, lR (𝜹), based on normality of random effects v and errors e. In the case of l( , 𝜹), it is given by mAIC = −2l( ̂ ML , 𝜹̂ ML ) + 2(q∗ + p),
(5.4.1)
where p is the dimension of and q∗ is the effective number of variance parameters 𝜹, that is, those not estimated to be on the boundary of the parameter space. In the ̂ and case of lR (𝜹), the mAIC is given by (5.4.1) with l( ̂ ML , 𝜹̂ ML ) changed to lR (𝜹) ∗ ∗ q + p to q . Several refinements to mAIC have also been proposed (see Müller et al. 2013). The BIC is obtained from (5.4.1) by replacing the penalty term 2(q∗ + p) by log(n)(q∗ + p) in the case of l( , 𝜹) and by log(n∗ )q∗ in the case of lR (𝜹), where n∗ = n − p. Models with smaller mAIC or BIC values are preferred. Conditional AIC is more relevant than AIC when the focus is on estimation of the realized random effects v and the regression parameters . As noted in Chapter 4,
112
EBLUP THEORY
small area estimation falls into this category. In this case, the marginal log likelihood l( , 𝜹) is replaced by the conditional log likelihood l( , 𝜹|v) obtained from the conditional density f (y|v, , 𝜹): 1 l( , 𝜹|v) = const − [log |R| + (y − X − Zv)T R−1 (y − X − Zv)] 2
(5.4.2)
For given 𝜹, we can write X ∗ + Zv∗ = H1 y, where ∗ and v∗ are the solutions of the mixed model equations (5.2.8) and H1 = H1 (𝜹) is treated as a “hat” matrix. Effective degrees of freedom used in estimating and v are defined as ′
′
𝜚(𝜹) = tr[H1 (𝜹)] = tr[(X V−1 X)−1 X V−1 RV−1 X] + n − tr(RV−1 ),
(5.4.3)
(see Müller et al. 2013). A naive estimator of 𝜚(𝜹) is taken as 𝜚(𝜹̂ ML ). A simple conditional AIC is then given by cAIC = −2l( ̂ ML , 𝜹̂ ML |̂vH ) + 2[𝜚(𝜹̂ ML ) + q∗ ],
(5.4.4)
where v̂ H = ṽ (𝜹̂ ML ) is the EBLUP estimator of v. Refinements to the penalty term in (5.4.4) have been proposed (see Müller et al. 2013). In particular, for the special case of R = 𝜎 2 In , Vaida and Blanchard (2005) proposed the penalty term 𝛼n,VB ( ̂ ML , 𝜹̂ ML ) =
[ ] 𝜚(𝜹̂ ML ) − p 2n ̂ 𝜚(𝜹ML ) + 1 − , n−p−2 n−p
(5.4.5)
which tends to 2[𝜚(𝜹̂ ML ) + 1] as n → ∞ with p fixed. Note that 𝜚(𝜹) + 1 is the effective degrees of freedom for estimating , v, and the scalar 𝜎 2 , for given 𝜹∗ ∶= 𝜹∕𝜎 2 . Han (2013) studied cAIC for the Fay–Herriot area level model (4.2.5). The proposed cAIC depends on the method of estimating 𝜎𝑣2 , the variance of the random effect 𝑣i in the model. It performs better than the simple cAIC, given by (5.4.4), which ignores the error in estimating 𝜎𝑣2 . Jiang and Rao (2003) studied the selection of fixed and random effects for the general linear mixed model (5.2.1), avoiding the estimation of 𝜹 and using only the ordinary least squares (OLS) estimator of . Two cases are considered: (i) selection of fixed covariates, X, from a set of candidate covariates when the random effects are not subject to selection; (ii) selection of both covariates and random effects. Normality of v and e is not assumed, unlike in the case of AIC and BIC. For case (i), suppose x1 , , xq denote the n × 1 candidate vectors of covariates from which the columns of the n × p matrix are to be selected (p ≤ q). Let X(a) denote the matrix for a subset, a, of the q column vectors, and let ̂ (a) = [X(a)T X(a)]−1 X(a)T y be the OLS estimator of (a) for the model y = X(a) (a) + 𝝐, with 𝝐 = Zv + e. A generalized information criterion (GIC) for the selection of covariates only is given by Cn (a) = [y − X(a) ̂ (a)]T [y − X(a) ̂ (a)] + 𝜆n |a|,
(5.4.6)
*MODEL IDENTIFICATION AND CHECKING
113
where |a| is the dimension of the subset a and 𝜆n is a positive number satisfying certain conditions. GIC selects the subset a∗ that minimizes Cn (a) over all (or specified) subsets of the q column vectors. GIC is consistent in the sense that, if a0 is the subset associated with the true model, assuming that the true model is among the candidate models, then P(a∗ ≠ a0 ) → 0 as n → ∞. The statistic Cn (a) is computationally attractive because the number of candidate models can be very large and Cn (a) does not require estimates of variance parameters, unlike AIC of BIC. GIC, given by (5.4.6), does not work for the case (ii) involving the selection of both fixed and random effects. Jiang and Rao (2003) proposed an alternative method for the general ANOVA model (5.2.15). This method divides the random effect factors into several groups and then applies different model selection procedures for the different groups simultaneously. If the true underlying model is based on a single random factor, as in the case of the basic unit level model (5.3.1), the BIC choice 𝜆n = log n in (5.4.6) satisfies conditions needed for consistency but not the mAIC choice 𝜆n = 2. Fence Method Information-based criteria have several limitations when applied to mixed models, as noted by Jiang, Rao, Gu, and Nguyen (2008): (i) n should be replaced by the effective sample size n∗ because observations are correlated, but the choice of n∗ is not always clear. (ii) A log-likelihood function is used, but this is not available if a parametric family is not assumed. (iii) Finite sample performance of the criteria may be sensitive to the choice of penalty term. Fence methods, proposed by Jiang et al. (2008), are free of the above limitations. The basic idea behind fence methods consists of two steps: (1) Isolate a subgroup of “correct” models by constructing a statistical fence that eliminates incorrect models. (2) Select the “optimal” model from the subgroup according to a suitable criterion. The fence is constructed by first choosing a lack-of-fit measure QM = QM (y, M , 𝜹M ) of a candidate model M with associated parameters M and 𝜹M such that E(QM ) is minimized when M is a true model (a correct model but not necessarily optimal) and M and 𝜹M are the true parameter vectors. A simple choice of QM is QM = |y − XM M |2 , where XM is the matrix of covariates associated with ̂ M = Q(y, ̂ M , 𝜹̂ M ), where ( ̂ M , 𝜹̂ M ) minimize QM . Using Q ̂ M for each M. Now let Q ̃ ∈ such that QM̃ = minM∈ Q ̂ M , where candidate model M, find the model M ̃ = Mf if contains the full model denotes the set of candidate models. Note that M ̃ is Mf . In general, may not contain Mf but, under certain regularity conditions, M a true model with probability tending to one. ̃ does not rule out the possibility of other correct models The identified model M ̃ Therefore, a fence is constructed around M ̃ to elimiwith smaller dimension than M. nate incorrect models, and then select the optimal model from the set of models within the fence using an optimality criterion such as minimal dimension of the model. The fence is defined by the subset of models M belonging to that satisfy the following inequality: ̂M ≤ Q ̂ M̃ + an [𝑣(Q ̂M −Q ̂ M̃ )]1∕2 , Q (5.4.7)
114
EBLUP THEORY
̂M −Q ̂ M̃ ) denotes an estimate of the variance of Q ̂M −Q ̂ M̃ and an is a tuning where 𝑣(Q constant increasing slowly with n. If the minimal dimension criterion is used as optimality criterion, then the implementation of the fence method can be simplified by checking each candidate model, from the simplest to most complex, and stopping the search once a model that falls within the fence is identified, and other models of the same dimension are checked to see if they belong to the fence. However, it is often not easy to construct the fence ̂ M̃ . A simplified adap̂M −Q because of the difficulty in estimating the variance of Q tive fence (see below) may be constructed by replacing the last term of (5.4.7) by a tuning constant cn and then select cn optimally, then avoiding the computation of ̂ M̃ ). Jiang, Nguyen, and Rao (2009) developed a bootstrap method (named ̂M −Q 𝑣(Q adaptive fence) to determine cn . The method consists of the following steps: (i) Generate parametric bootstrap samples from either the full model or any large model known to be correct but not optimal. (ii) For each M ∈ , calculate the proportion of bootstrap samples p∗ (M; cn ) in which M is selected by the simplified fence method with a specified tuning constant cn . (iii) Compute p∗ (cn ) = maxM∈ p∗ (M; cn ) and choose cn that maximizes p∗ (cn ). 5.4.2
Model Diagnostics
In this section, we give a brief account of model diagnostics based on residuals and influ e nc e m e a s u r e s f o r p a r t ic u l a r c a s e s o f t h e g e ne r a l l i
Residual Analysis Ca l v in a nd Se dr a ns k ( 1991) de s c r ib e t w o b a s ic t o de fi ne r e s idu a l s u nde r m o de l ( 5. 3. 1) w it h b l o c k dia g o na Th e fi r s t a p p r o a c h is b a s e d o n t r a ns f o r m ing t h e m o de r e g r e s s io n m o de l a nd t h e n o b t a ining t h e u s u a l r e s idu a l s o n t h e t r a ns f o r m e Vdi = da𝜎te2 aAi. wLeht e 𝜎re2 e de no t e s t h e e r r o r v a r ia n ̃ li (+5.𝝐̃ i3. deX , i 1) = 1,t o , m, w h e r e Us ingAi , w e t r a ns f o r m t h e m ỹoi = −1∕2 −1∕2 −1∕2 ̃ ỹ i = Ai yi , Xi = Ai Xi , a nd𝝐̃ i = Ai (Zi vi + ei ). W e m a y e x p r e s s t h e t ̃e g+r𝝐̃e ws sitioh niid f o r m e d m o de l a s a s t a nda r d l ine a ỹr =rX m e or de r o l r, s 2 𝝐̃ , t h a t 𝝐̃ish, a s m 0e aa nd n c o v a r ia nc e𝜎e m t r eixr e f o r e , w e c a n a p In . aTh s t a nda r d l ine a r r e g r e s s io n m e t h o ds f o r m o de l s e l e c t io n a s e l e c t io n o f fi x e d e f f e c t s , r e s idu a l a na l y s is , a nd influ e nc e ̂s−1∕2 m o de l p a r a m e t e r s a r e no t k no w n a nd 𝝐̂ i r=eA idu(yai −l sXiâ ),r e de fi ne d i −1∕2 −1 −1 ̂ o oa tndoV ̂ f is o b t a ine d b y r e p l a c ing t h e u n ̂ r e is a s qu a r e rV w h eA i i i b y s u it a b l e e s t im a t e s . Ja c qm in- Ga dda e t a v a r ia nc e c o m pV−1 o ne nt s in i −1∕2 ̂ −1 ̂ a s t h e t r ia ng u l a r m a t r ix de r iv e d f r o m t h e V Ch. o l e s k y Ai i Th is t r a ns f o r m a t io n m e t h o d l o o k s a p p e a l ing b e c a u s e r u nc o r r e l a t e d, a nd t h e r e f o r e r e s idu a l p l o t s c a n b e int e r p f o r l ine a r r e g r e s s io n m o de l s . Ho w e v e r , t r a ns f o r m a t io n l w e ig h t e d a v e r a g e s o f m a ny da t a p o int s , a nd t h e r e f o r e if o u t l ie r , it s e f f e c t w il l b e m a s k e d b y t h e s m o o t h ing e f f t h e s e r e s idu a l s o nl y f o c u s o n t h e fi x e d e f f e c t s w h il e r
115
*MODEL IDENTIFICATION AND CHECKING
int e r e s t in m a ny a p p l ic a t io ns . In f a c t , if a m o de l de p a r t u r e r e s idu a l s , it is dif fi c u l t t o s a y if t h is de p a r t u r e is o n t h e dis e f f e c t s o r t h e m o de l e r r o r s . Mo r e o v e r , t h e s e ns it iv it y o t o t h e e s t im a t e s o f v a r ia nc e c o m p o ne nt s h a s no t b e e n w Xi ̃ − Z Th e s e c o nd a p p r o a c h is b a s e d o ẽ in= t hyi e− BLUP r ieṽ isdeidufi anel sd b y Ca l v in a nd Se dr a ns k ( 1991) , a l s o c a l l e d c o ndit io na l r e s idu a e t a l . ( 2007) . Th e s e r e s idu a l s a r e c o r r e l a t e d a nd t h e r e f o r e i f u l , b u t t h e y m a k e u s e o f t h e e s t im a t e d r a ndo m e f f e c t s t h e m o de l e r r o r s ; t h u s , t h e y c a n b e u s e d m o r e s p e c ifi c a dis t r ib u t io n o f t h e m o de l e r r o r s . In t h e ANOVA m o de l ( 5. 2.r15) indewp ite hnde nt r a ndo m f a c t o r s a nd w i m a l it y , Ze w o t ir a nd Ga l p in ( 2007) s t u die d t h e de t e c t io n o f o u t n f1,o r , r, t h e y p o int s . As s u m ing t h a t t h 𝜆ek = v 𝜎a k2r∕𝜎iae2 ancr ee rk ano t iowks = no t e d t h a t BLUP r eẽ s=idu y −aỹ ,l sf o ỹ r= X ̃ − Z̃v s a t isẽf=y Sy ∼ N(𝟎, 𝜎e2 S), e (I d− t hS)y, a t o r e qu iv a l e nt l iyt h, f o r t h e f o Sr = (sij ) = 𝜎e2 P. Th e y a l s o no ỹt = ∑n o b s e r v a t io n, ỹ w e h a v e = (1 − s )y + s y , w h e ≤ r esii 0≤ 1 a ndsij → 0 f o r i ii i j≠i ij j yi wt a it ph oa ints sm a sliil a vt ta rl au ce t t h e i ≠ j w h esii n→ 0. Th is m e a ns t h a t da p r e dic t e d v a l u e t o w a r d t h e m s e l v e s , t h a t is , t h e y h a v e t r gix o n r e g r e s s io n. Ba s e d o n t h is , t h e y p r o p o s e stiioo uf st he et hme adia S = 𝜎e2 P t o de t e c t h ig h l e v e r a g e p o int s . To de t e c t b o t h o u t p o int s , t h e y p r o p o s e t o siil ov oe kr sẽau2i ∕̃ tesTaẽ . pPol ointt so af r e e x p e c t e d t c o nc e nt r a t e a r o u nd t h e u p p e r - l e f t c o r ne r o f t h e p l o t . Po c l o u d o f p o int s t h a t f a l l in t h e l o swii ) ea rr -e l er ef tg ca or de r ned ar s( shmig ah l l e a g e p o int s , a nd p o int s t h a t a p p e a r s e p a r a t e d o n t h e r ig h t sii a r o u g r e s iduẽ 2ia∕̃elT ẽ ) a r e r e g a r de d a s o u t l ie r s . Th e y p r o v ide b e l o{ w w h ic h a n o b s e }r v a t io n is c o ns ide r e d a s a(1h− ig h l e v e ∑ 2p∕n) 1 − rk=1 [𝜎e2 ∕𝜎k2 + n∕(2hk )]−1 . Co nc e r ning √r e s idu a l s , Ze w o t ir a nd Ga l p in ( 2007) de fi ne int e r √ ti∗ =dẽri ∕e𝜎̂ se(i)idusiia, lws h e r e r e s idu tia=l es̃ i ∕𝜎̂ e sii a nd e x t e r na l l y St u de nt ize ∑ 2 is t h e s a m e e s t im a t 𝜎̂ e2 = n−1 (y − X ̃ )T (I + rk=1 𝜆k Zk ZTk )−1 (y − X ̃ ) a nd 𝜎̂ e(i) w it h o b s e ir rveamt ioo nv e d f r o m t h e da t a . Th e t w o t y p e s o f s r e s idu a l s a r e r e l a t e d m ti2o =non[1t o+ (n nic−a1)∕(t l l yi∗ )2 a]−1s. W h e n v a r ia nc e n−1 r a t io𝜆ks = 𝜎k2 ∕𝜎e2 a r e k no (twi∗ )2n,∼ n−p−1 (1, n − p − 1) a nd w h e n v a r ia nc e d
a snML, → ∞. Ba s e d o n t h e c r it ic a l c o m p o ne nt s a r e e s t im (ti∗a)2t −e−→d u 2 s(1)ing √ v a l u e 2 f o r t h et dis St tur de ib nt u t io n, t h e y g iv e t h 4n∕(n e c u−tpo−f3) f p o int f o tri , a b o v e w h ic h a n o b s e r v a t io n is c o ns ide r e d a s a n o u t t e ca tx a s ing l e o u t l ie r . Fo r t h e r a n a f o r m a l t e s t b ai |tsi |et do o de nm vk , k = 1, , r, Ze w o t ir a nd Ga l p in ( 2007) a r g u e d t h a t t h e BLUP ṽ k = 𝜆k ZTk Sy = 𝜆k ZTk ẽ c a n b e int e r p r e t e d a s r e s idu a l s a nd c a n t h e dia g no s t ic p u r p o s e s t o de t e c t s u b je c t s ( b l o c k s ) t h a t e o t h e r s u b je c t s in t h e da t a s e t . Th e y p r o p o s eṽ k tboy s ttha enda r diz s qu a r e r o o t o f t h e dia g o na l e l e m e nt sV(̃ ovk )f =t h𝜎e2e𝜆2k Zc TkoSZvk .a r ia nc e W h e n u s ing ML e s t im a t e s o f t h e v a r ia nc e c o m p o ne nt s , t h j t h l e v e l o f t hvk ea sf aa cnt oo ur t l ie r if t h e s t a nda r dize d BLUP e s t im a
116
EBLUP THEORY
t(1 − 𝛼∕2; n − r a nk [X Z]), w h et(𝛽; r em) de no t e s t h e 𝛽 l loe wv ee lr o f a St u de nt t dis t r ib u t io nmwdeitgh r e e s o f f r e e do m . In p r a c t ic e , w e c a l c u ̂ − Ẑ v e r s io ns o f r e s idu a l s ( EBLUP ê = yr − eX s idu a vl, s a) ,nd o f t h e BLUP ̂ f o Ŝr = 𝜎̂ e2 P, ̂ b u t th e a b o v e re s u lts fo r e s t im a tvok , rg oivf e nv̂ bk =y 𝜆̂ k ZTk Sy 𝜆k s e r v e a s a p p r o x im a t io ns . To c h e c k t h e no r m a l it y a s s u m p t io n f o r t h e r a ndo m e f f c ia l c a s e o f m o de l R(i 5. it ih= 𝚫 f o r ai, l La = 3. 𝜎 21) Ii awndG l ng e a nd Ry a n ( 1989) de v e l o p e d w e ig h t e d no r m a l p r o b a b il it y p l o t s . Th e b y Ca l v in a nd Se dr a ns k ( 1991) , u s e s t h e BLUP e s t im a t e s o f l ine r a ndo m e vf if se uc itt as b l y s t a nda r dize d b y t h e ir e s t im a t e d s t a nda r c a r ding t h e e f f e c t o f, te hs et im v ea ct ting o r o f e s t im a t e dṽ rBi a= ndo m e f T −1 T Ve−1nZb𝚫.y Th e y a tvrBi )ix= 𝚫Z g iv Gi Zi Vi (yi − Xi ), w it h c o v a r ia nc e m V(̃ i i i B T de fi ne a no r m a l Q–Q zi =p cl oṽ i t, fo of r a s u it a b l y c h o s e n v c, e c to r o f v e r sΦu−1s[Fm∗ (zi )], w h Φ e rise t h e s t a nda r d no r m a l c u m u l a t iv e dis t r ib de afil ne c . dd.af s. o f ( c . d. f . ) Fam∗ (z) nd is a w e ig h t e d e m p irzi ic ∑m Fm∗ (z)
=
i=1 𝑤i I(zi ≤ ∑m i=1 𝑤i
z)
,
w it h w e 𝑤igi =h ctT sV(̃vBi )c, w h eI(zri e≤ z) = 1 if zi ≤ z a ndI(zi ≤ z) = 0 o t h e r o m w is e . Th e y p r o v e l a rmg→e ∞) s a no m r pml ea (lF̂itfm∗ oy(z)ro fb t a ine d fFm∗r (z) b y e s t im a t ing t h e u nk no w n m o de l p a r a m e t e r s , a nd p r o p o no r m a l Q–Q p l o t b a s e d o n t h e e s t im a t e d l a r gF̂ m∗e(z).s aFom r -p l e s t a m a l no r m a l it y t e s t s f o r t h e r a ndo m e f f e c t s a nd m o de l e r r a nd Ha r t ( 2009) .
fl D g Co o k ’s ( 1977) dis t a nc e is w ide l y u s e d in s t a nda r e g r e s s io n t o s t u dy t h e e f f e c t o f de l e t ing a. Ba “c anes reje” eo n t h e a nd Fr e e s ( 1997) e x t e nde d Co o k ’s dis t a nc e t o t h e m o de l ( 5. 3. 1) c o v a r ia nc e s t r u c t u r e t o s t u dy t h e e f f e c t o f de l e t ing a b l o o f. i. Thb el oinflu c k e nc e oi of bn tlho ec k Le t̂ (i) b e t h e e s t im aaf tt oe rr odr fo p p ing ∑ TV ̂ −1 Xi )−1 ∑m XT V ̂ −1 yi m a y b e m e a s u r e d b y a Ma h a e s t im ̂a=t (e m X i=1 i i i=1 i i dis t a nc e g iv e n b y 1 Bi ( ̂ ) = ( ̂ − ̂ (i) )T p
(m ∑
) ̂ −1 Xi XTi V i
( ̂ − ̂ (i) )
( 5. 4. 8)
i=1
No t e t B h i (â )t is t h e s qu a r e d dis t ̂a tnc o ̂ e(i) rf er ol amt iv e t o t h e e s t im a t e d c ∑m T ̂ −1 −1 a i (ŝ u) mr e a y b e s im p l ifi e d t o a nc e m (a t i=1 r ixXi Vi Xi ) o f̂ . Th e m e B Bi ( ̂ ) =
1 ̂ i − Hi )−1 Hi (V ̂ i − Hi )−1 (yi − Xi ̂ ), (y − Xi ̂ )T (V p i
( 5. 4. 9)
117
*MODEL IDENTIFICATION AND CHECKING
w h e re
(m ∑
)−1 ̂ −1 Xi XTi V i
Hi = Xi
XTi
( 5. 4. 10)
i=1
An inde x p l o t o f t h eBi (m̂ ) feoair= s u1, r e , m is u s e f u l in ide nt if y ing influ e nt i b l o c k s ( o r s m a l l a r e a s ) . No t e t hVi aist cwo er ra er ce t al ys ss up me cing ifi et hd. a Ch r is t ia ns e n, Pe a r s o n, a nd Jo h ns o n ( 1992) s t u die d c a s e - de l e m ix e d ANOVA m o de l s . Th e y e x t e nde d Co o k ’s dis t a nc e t o m e a fi x e d e f f e c t s a s w e l l a s o n t h e v a r ia nc e c o m p o ne nt s . Co a l s o p r o v ide d. Th e s e m e a s u r e s a r e u s e f u l f o r ide nt if y ing c a se s). Co o k ( 1986) de v e l o p e d a “l o c a l ” influ e nc e a p p r o a c h t o dia t o w h a t e x t e nt s l ig h t p e r t u r b a t io ns t o t h e m o de l c a n influ m a n, Na c h t s h e im , a nd Co o k ( 1987) a p p l ie d t h is a p p r o a c h m o de l s 𝝎. bLee t qa × 1 v e c t o r o f p e r t u lr𝝎b( ,a𝜹)t io b ens t ah nd e c o r r e s p o nding l o g - l ik e l ih o o d f u nc t io n. Fo r e x a m p l e , t o c h e c k t h e a s s e r r o r v a r ia nc e s in t h e ANOVARm= o𝜎e2de a t r iso ,du c e p e r t u r b a t i In , lw, t eh int = 𝜎e2 D(𝝎), w h eD(𝝎) r e = dia g(𝜔1 , , 𝜔n ). W e de no t e t h e “nu l l ” p e t h e f oRr𝝎 m t u r b a t io ns t h a t y ie l d t h e o𝝎0r sigo ina o de y t h e o r ig ina l t hll𝝎a0m ( t , 𝜹) = l(l b, 𝜹), = 1 in t h e a b o v e e x a m p le . l o g - l ik e l ih o o d f u nc t𝝎io n. No t e t h a t 0 Th e influ e nc e o f t h e p𝝎ec rat nu br be aa tsios en s s e d b y u s ing t h e “l ik e p l a c e m e nt ” g iv e n b y ̂ − l( ̂ 𝝎 , 𝜹̂ 𝝎 )], LD(𝝎) = 2[l( ̂ , 𝜹) ( 5. 4. 11)
t o ur snde o fr t h e p e r t u r b e d l o g - l ik w h e( ̂r𝝎e, 𝜹̂ 𝝎 ) a r e t h e ML e s t im aa nd𝜹 l𝝎 ( , 𝜹). No t e t h a(𝝎) t is LDno nne g a t iv e a nd a c h ie v e s it s m inim u m v a ̂y f r o m 𝝎 = 𝝎0 . La r g e v a l u (𝝎) e s indic o f aLDt e t( ĥ 𝝎a, 𝜹̂t𝝎 ) dif f e r c o ns ide r a (b̂ , l𝜹) r e l a t iv e t o t h e c o nt o u r s o f t h e u np l( e, 𝜹). r t Ra u r tbh eedrl ot hg a- nl ik c ae l lcihu o l a t ing (𝝎), LD w e e x a m ine t h e b (𝝎) e h aa sv aio fru oncf tLD 𝝎 io f no or f v a l u e s t h a t a r e “l o c a𝝎0l ;”int op a r t ic u l a r , w e e x a m ine (𝝎). c u Le r v𝝎 t a0 + t ua 𝝂 r eb se o f LD a v e c t o r t𝝎 h 0 rino tuh ge hdir e c𝝂.t io Thn e n, t h e “no r m a l c u r v a t u r e ” o f (𝝎T , LD(𝝎)) in t h e dir e c t 𝝂ioisn go iv f e nb y C𝝂 = 𝜕 2 LD(𝝎0 + a𝝂)∕𝜕a2 |a=0
( 5. 4. 12)
La r g e v a l Cu𝝂 eindic s o af t e s e ns it iv it y t o t h e indu c e d p e r t u r b a t io ns 𝝂. W e fi nd t h e l a r g eCsmax t cand u itr vs ac tou r rr ee s p o nding𝝂 max dir. eAn c tinde io nx p l o t o f t h e e l e me nt s o f t h e 𝝂no r mal ize d v e c t o r ∕‖𝝂 ‖ is u s e f u l f o r ide nt if y ing max max influ e nt ial p e r t u r b at io ns ; l ar g e e l e me nt s indic at e influ e nt ial p e r t u Be c k man e t al . ( 1987) f o r c o mp u t at io nal de t ail s . Har t l e s s , Bo o t h ap p l ie d t h e ab o v e me t h o d t o t h e dat a o f c o u nt y c r o p ar e as o t u r b ing t h e e r r o r v ar ianc eR𝝎s = , t𝜎he2 D(𝝎), at is ,t hu es ing y ide nt ifi e d an e r r o ne o o b s e r v at io n in t h e o r ig inal dat a ( s e e Examp l e 7. 3. 1 o f Ch ap t e r 7)
118
5.5
EBLUP THEORY
*SOFTWARE
PROC MIXED ( SAS/STAT 13. 1 Us e r s Gu ide 2013, Ch ap t e r 63) imp l e me nt s M REML e s t imat io n o f mo de l p arand ame𝜹 ft eo rr s t h e l ine ar mixe d mo de l ( 5. 2. 1) u s ing t h e Ne w t o n–Rap h s o n al g o r it h m. Inv e r s e o f t h e o b s e r ̂ ame is u s e d t o e s t imat e t h e c o v ar ianc e mat r ix o ̂f and p ar𝜹. SCORt e r e s t imat e ING o p t io n in PROC MIXED u s e s ins t e ad Fis h e r - s c o r ing al g o r it h m and 𝜹, and t h e inv e r s e o f t h e e xp e c t e d inf o r mat io n mat r ix t o e s t im ̂ sy) toh fe EBLUP mat r ix o f p ar ame t e r e s t imat e s . PROC MIXED al s o 𝜇ĝ =ivt(e𝜹, ̂imat o r , ms e a s p e c ifi e d p ar𝜇 ame = lT t e+ rmT v al o ng w it h t h e naiv e MSE e Ns (t𝜇), g iv e n b y ( 5. 2. 35) . Op t =KENW io n DDFM ARDROGER c o mp u t e s t h e s e c o nd- o u nb ias e d MSE e s t imat o r ( 5. 2. 39) , u s ing t h e e s t imat ê RE d cand o v ar ianc e m 𝜹̂ RE b as e d o n t h e o b s e r v e d inf o r mat io n mat r ix; no t e t h at ( 5. 2. 39) i ando𝛿̂ML u nb ias e d if t h e ML e ŝ ML t imat r s ar e u s e d. Mo de l s e l e c t io n c r it e r ia mAIC and BIC f o r ML and REML me t h o Se c t io n 5. 4. 1) ar e al s o imp l e me nt e d in PROC MIXED; mAIC is de no t e d PROC MIXED w ar ns ag ains t me c h anic al u s e o f mo de l s e l e c t io n c r i t h at “s u b je c t mat t e r c o ns ide r at io ns and o b je c t iv e s ar e o f g r s e l e c t ing a mo de l . ” Se v e r al mo de l diag no s t ic s s u c h as p l o t s o f w it h in- ar e a r e s id ab il it y p l o t s o f e s t imat e d r e s idu al s c an al s o b e o b t aine d w it h PLOTS o p t io n) . INFLUENCE o p t io n in t h e MODEL s t at e me nt c an b e u s p u t e c as e - de l e t io n diag no s t ic s ( s e e Se c t io n 5. 4. 2 f o r s o me mo in t h e l ine ar mixe d mo de l w it h a b l o c k diag o nal c o v ar ianc e s t r u c Mu k h o p adh y ay and Mc Do w e l l ( 2011) il l u s t r at e s mal l ar e a e s b as ic ar e a l e v e l mo de l ( 4. 2. 5) and t h e u nit l e v e l mo de l ( 4. 3. 1) u Th e y al s o c o ns ide r e d u nmat c h e d ar e a l e v e l mo de l s , s t u die d in t h e MCMC p r o c e du r e in SAS. Sp e c ifi c SAS s o f t w ar e f o r s mal l ar e a e s t imat io n h as b e e n de v Hidir o g l o u , and Yo u ( 2014) . Th is s o f t w ar e c o v e r s EBLUP and p s e mat io n and l ine ar izat io n- b as e d MSE e s t imat io n f o r t h e b as ic ar e a l e v and t h e b as ic u nit l e v e l mo de l ( 4. 3. 1) inc l u ding s o me mo de l diag no In R s t at is t ic al s o f t w ar e ( R Co r e Te am 2013) lme f, rt oh m e tfhu encp t ac io kn ag e nlme ( Pinh e ir o e t al . 2014) fi t s t h e l ine ar mixe d mo de l ( 5. 3. 1) w it h t he at c o v ar ianc e mat r ic e s c m o bmp l ooc ske sd and o f w it h iid r ando vi , m f f is e c, w t s it h Gi = 𝜎𝑣2 I, i = 1, , m, f o𝜎r𝑣2 ≥ 0 u nk no w n. Th e f u nc t io n is de s ig ne d f o r ne do m e f f e c t s and it ac c e p t s al s o r ando m s l o p e s and g e ne r al c o r r r iuo cnlt yu rdee po ef nds i t h or on u g h it s dime ni . Th ns eio fnu nc Ri , as l o ng as t h e s t R t io n u s e s t h e E- M al g o r it h m t o o b t ain s imu l t ane o u s l y t h e e s ing th e p ar ame t 𝜹, e r fis xe d e f f e, cand t sr ando m e f fvei . 
cSp t s e c if ymethod=ML, E- M al g o r it h m o b t ains t h e ML 𝜹, 𝜹̂eML s ,t w imath ee or ef as method=REML s e t t ing o r o mit t ing t h is o p t io n, t h e E- M al g o r it h m imp l e me nt s t h e REML ap s ov far e t h e n o b t aine d u s ing e qu at io ns ( 5. 2. 5) and 𝜹̂ RE . Th e e s t imat e and c h o ic eFoo rf mo r e de t ail s w it𝜹hr e p l ac e𝜹̂ ML d b oy r𝜹̂ RE , de p e nding o n t h e method. o n t h e fi t t ing al g o r it h m, s e e Lair d and W ar e ( 1982) .
119
PROOFS
Fu nc t io lme n al s o r e t u r ns t h e u s u al g o o dne s s - o f - fi t me as u r e s t h e f u nc t io n o u t p u t , BIC, and t h e v al u e o f t h e l o g - l ik e l ih o o e t e r s . Th e o u t p u t o f t h e l ine ar mixe d mo delme. l fi Fit t ist eandov bal jeu ce ts o f c ̂ ̂ ŷ i and EBLUP r e s idu êal v = s y − X , i = 1, , m, c an b e o b t aine d b y ap p l y − Z i i i i i ing , r e s p e c t iv e l y , fitted() t h e f u ncand t ioresid() ns t o t h at o b je c t . Th e y c a t h e n au t o mat ic al l y b e u s e d t o o b t ain u s u al diag no s t ic p l o t s b p l o t t ing f u nc t io ns . Line ar , g e ne r al ize d l ine ar , and no nl ine ar mixe d mo de l s c an al s o b e l y ze d u s ing f u lmer, nc t io glmer, ns and nlmer, r e s p e c t iv e l y , f r o m R p ac k lme4 ( Bat e s e t al . 2014) . Th is p ac k ag e is de s ig ne d as a me mo r y - e f t o t h e me ntnlme io ne pd ac k ag e . Th e R p ac ksae ag (eMo l ina and Mar h u e nda 2013; Mo l ina and Mar h u e nda 2015) is s p e c ifi c f o r s mal l ar e a e s t imat io n. It e s t imat e s ar e a me ans u nde o f mo de l ( 5. 3. 1) t h at ar e w ide l y u s e d in s mal l ar e a e s t imat io n, nam l e v e l mo de l ( 4. 2. 5) and t h e u nit l e v e l mo de l ( 4. 3. 1) . It al s o inc l mo r e c o mp l e x ar e a l e v e l mo de l s w it h s p at ial and s p at io t e mp o r w it h e s t imat io n o f g e ne r al no nl ine ar s mal l ar e a p ar ame t e r s u nde r mo de l ( 4. 3. 1) b as e d o n t h e t h e o r y g iv e n in Se c t io n 9. 4. Th e p a f u nc t io ns f o r MSE e s t imat io n o f s mal l ar e a e s t imat o r s . In t h e c as l e v e l mo de l , MSE e s t imat e s ar e o b t aine d anal y t ic al l y , w h e r e as f b o o t s t r ap p r o c e du r e s ar e imp l e me nt e d. Final l y , it inc l u de s al s mal l ar e a e s t imat o r s s u c h as t h e dir e c t Ho r v it z–Th o mp s o n e c o u nt - s y nt h e t ic p o s t s t r at ifi e d e s t imat o r s ( 3. 2. 7) , and s amp l e t o r s de fi ne d b y ( 3. 3. 1) w𝜙iitgh iv we ne in ig (h3.t 3. s 8) . Examp l e s o f u s e o f f u nc t io ns insae t h pe ac k ag e ar e inc l u de d in t h e r e maining So f t w ar e th e b o o k .
5.6 5.6.1
PROOFS Derivation of BLUP
A l ine ar e s t imat 𝜇̂ =oaTr y + b is u nb ias e 𝜇d f=olTr + mT v u nde r t h e l ine ar mixe d X = lT and b = 0. Th e MSE o f mo de l ( 5. 2. 1) , t E( h 𝜇) ̂at =isE(𝜇), , if and o nl yaT if a l ine ar u nb ias e d e s𝜇̂ tisimat g iv o re n b y MSE(𝜇) ̂ = V(𝜇̂ − 𝜇) = aT Va − 2aT ZGm + mT Gm T X s= slT cu os ndit Minimizing V(𝜇̂ − 𝜇) s u b je c t t o t h e u nb ias e adne n r ang e ing io Lag mu l t ip l 𝝀, ie rw 2 e g e t Va + X𝝀 = ZGm
and s o l v inga w f o er o b t ain a = −V−1 X𝝀 + V−1 ZGm
( 5. 6. 1)
120
EBLUP THEORY
Su b s t it u t ing ( 5.a 6. int1)o ft oh re c o ns aT tXr =aint lT , w e no w s o𝝀:l v e f o r 𝝀 = −(XT V−1 X)−1 l + (XT V−1 X)−1 XT V−1 ZGm
( 5. 6. 2)
Ag ain, s u b s t it u t ing (𝝀 5.in6.( 5. 2) 6.f 1) o rand u s ̃ing = (XT V−1 X)−1 XT V−1 y, w e o b t ain aT y = lT ̃ + mT GZT V−1 (y − X ̃ ), w h ic h is ide nt ic al t o t h e BLUP g iv e n b y ( 5. 2. 4) .
5.6.2
Equivalence of BLUP and Best Predictor E(mT v|AT y)
Ty w itAahT X = 0, w e s h o w h e r e t h at t h e BP e s t Unde r t h e t r ans f o rAme d dat T T T E(m v|A y), o m f v r e du c e s t o t h e BLUP meT ṽs, twimat h ṽeo isrr eg iv e n b y ( 5. 2. 6) . Tny dis t r ib u t g ivio eAnal Unde r no r mal it y , it is e as y t o v e r if y t h at t h e cmoT vndit is no r mal w it h me an
E(mT v|AT y) = CA(AT VA)−1 AT y,
( 5. 6. 3)
w h eCr =e mT GZT . Sinc eV1∕2 A and V−1∕2 X ar e o r t h o g o nal t o e ac h o t h e r , V1∕2 AXT V−1∕2 = 𝟎, and r ank (V1∕2 A) + r ank(V−1∕2 X) = n, t h e f o l l o w ing de c o mp s it io n o f p r o je c t io ns h o l ds : I = PV1∕2 A + PV−1∕2 X , w h ePBr e= B(BT B)−1 BT f o r a mat B. r ixHe nc e , I = V1∕2 A(AT VA)−1 AT V1∕2 + V−1∕2 X(XT V−1 X)−1 XT V−1∕2 , o r e qu iv al e nt l y , A(AT VA)−1 AT = V−1 − V−1 X(XT V−1 X)−1 XT V−1
( 5. 6. 4)
Re p l ac ing ( 5. 6. 4) in ( 5. 6. 3) , w e o b t ain E(mT v|AT y) = C[V−1 − V−1 X(XT V−1 X)−1 XT V−1 ]y = CV−1 (y − X ̃ ), w h ic h do e s no t de p e nd o n t h eA cand h ito isic e qu o falmat t o r tixh e BLUP e s t imat o mT ṽ . Th is p r o o f is du e t o Jiang ( 1997) .
121
PROOFS
5.6.3
Derivation of MSE Decomposition (5.2.29)
Th e r e s u l t o f Se c t io n 5. 6. 2 may b e u s e d t o p r o v ide a s imp l e p r o ̂ = MSE[t(𝜹)] + E[t(𝜹) ̂ − t(𝜹)]2 g iv e n in ( 5. 2. 29) , 𝜹 ̂ ish ae r e p o s it io n [t( MSE 𝜹)] w T ̂ ̂ ̂ ̂ ̃ ̂ ̂ f u nc t io An oy. f De fi V ne∶= V(𝜹), ∶= (𝜹) and v̂ ∶= ṽ (𝜹). Al s o , wt(𝜹) r it−e𝜇 = ̂ − t(𝜹) + t(𝜹) − 𝜇. Fir s t , no t et(𝜹) ̂t h− at t(𝜹) t(𝜹) = lT ( ̂ − ̃ ) + mT (̂v − ṽ ) is a f u nc t io n oAfT y. Th is f o l l o w s b y w r it ing ̂ −1 − (XT V−1 X)−1 XT V−1 ](y − X ̃ ) ̂ −1 X)−1 XV ̂ − ̃ = [(XT V
̂band and no t ing t h at V o tyh− X ̃ ar e f u nc t ioATnsy. oSimil f ar lv̂y−, ṽ al s o de p e nds T h e −r𝜇ig= hlT (t ̃- − h and s ide o o nl y A o ny. Fu r t h e r mo r e , t h e fi r s t t e r m o n t t(𝜹) T T T 𝟎, and io nt h e l as t ) + m (̃v − v) is inde p e nde Ant yob f e c au s e o f t h eA cXo=ndit T [E(v|A T y) − v], u s ing t h e r e s u l t o f Se c t io n 5. 6. 2. He nc e , t e r m e qumal s ̂ = MSE[t(𝜹)] + E[t(𝜹) ̂ − t(𝜹)]2 MSE[t(𝜹)] ̂ − t(𝜹)]lT E( ̃ − |AT y)} + 2 E{[t(𝜹) ̂ − t(𝜹)]mT E[E(v|AT y) − v|AT y]} + 2 E{[t(𝜹) ̂ − t(𝜹)]2 , = MSE[t(𝜹)] + E[t(𝜹)
( 5. 6. 5)
̃ − |AT y) = E( ̃ − ) = 𝟎 and E[E(v|AT y) − v|AT y] = 𝟎. no t ing t E( h at Th e ab o v e p r o o f o f ( 5. 6. 5) is du e t o Jiang ( 2001) . Kac k ar and Har a s o me w h at dif f e r e nt p r o o f o f ( 5. 6. 5) .
6 EMPIRICAL BEST LINEAR UNBIASED PREDICTION (EBLUP): BASIC AREA LEVEL MODEL
We presented the empirical best linear unbiased prediction (EBLUP) theory in Chapter 5 under a general linear mixed model (MM) given by (5.2.1). We also studied the special case of a linear MM with a block diagonal covariance structure given by (5.3.1). Model (5.3.1) covers many small area models used in practice. In this chapter we apply the EBLUP results in Section 5.3 to the basic area level (4.2.5), also called the Fay–Herriot (FH) model because it was first proposed by Fay and Herriot (1979) (see Example 4.2.1). Section 6.1 spells out EBLUP estimation and also gives details of the major applications mentioned in Examples 4.2.1 and 4.2.2, Chapter 4. Section 6.2 covers second-order unbiased mean squared error (MSE) estimation, using general results given in Section 5.3.2. Several practical issues associated with the FH model are studied in Section 6.4 and methods that address those issues are presented.
6.1
EBLUP ESTIMATION
In this section we consider the basic area level model (4.2.5) and spell out EBLUP estimation, using the results in Section 5.3, Chapter 5, for the general linear MM with block diagonal covariance structure.
124
EBLUP: BASIC AREA LEVEL MODEL
6.1.1
BLUP Estimator
The basic area level model is given by 𝜃̂i = zTi 𝜷 + bi 𝑣i + ei ,
i = 1, … , m,
(6.1.1) iid
where zi is a p × 1 vector of area level covariates, area effects 𝑣i ∼ (0, 𝜎𝑣2 ) are indepenind dent of the sampling errors ei ∼ (0, 𝜓i ) with known variance 𝜓i , 𝜃̂i is a direct estimator of ith area parameter 𝜃i = g(Y i ), and bi is a known positive constant. Model (6.1.1) is obtained as a special case of the general linear MM with block diagonal covariance structure, given by (5.3.1), by setting yi = 𝜃̂i ,
Xi = zTi ,
Zi = bi
and vi = 𝑣i ,
ei = ei ,
𝜷 = (𝛽1 , … , 𝛽p )T .
Furthermore, in this special case, the covariance matrices of vi and ei become scalars, given by Gi = 𝜎𝑣2 , Ri = 𝜓i and the variance–covariance matrix of yi = 𝜃̂i becomes Vi = 𝜓i + 𝜎𝑣2 b2i . Also, the target parameter is in this case 𝜇i = 𝜃i = zTi 𝜷 + bi 𝑣i , which is a special case of the general parameter lTi 𝜷 + mTi vi with li = zi and mi = bi . Making the above substitutions in the general formula (5.3.2) for the BLUP estimator of 𝜇i , we get the BLUP estimator of 𝜃i as ̃ 𝜃̃iH = zTi 𝜷̃ + 𝛾i (𝜃̂i − zTi 𝜷)
(6.1.2)
̃ = 𝛾i 𝜃̂i + 1( − 𝛾i )zTi 𝜷,
(6.1.3)
where 𝛾i = 𝜎𝑣2 b2i ∕(𝜓i + 𝜎𝑣2 b2i )
(6.1.4)
and 𝜷̃ is the best linear unbiased estimator (BLUE) of 𝜷, given in this case by [
𝜷̃ =
̃ 𝑣2 ) 𝜷(𝜎
]−1 m ] m ∑ ∑ ( ( ) ) T 2 2 2 2 = zi zi ∕ 𝜓i + 𝜎𝑣 bi zi 𝜃̂i ∕ 𝜓i + 𝜎𝑣 bi . i=1
(6.1.5)
i=1
It is clear from (6.1.3) that the BLUP estimator, 𝜃̃iH , can be expressed as a weighted ̃ where average of the direct estimator 𝜃̂i and the regression-synthetic estimator zTi 𝜷,
125
EBLUP ESTIMATION
the weight 𝛾i (0 ≤ 𝛾i ≤ 1), given by (6.1.4), measures the uncertainty in modeling the 𝜃i ’s, namely, the model variance 𝜎𝑣2 b2i relative to the total variance 𝜓i + 𝜎𝑣2 b2i . Thus, 𝜃̃iH takes proper account of the between-area variation relative to the precision of the direct estimator. If the model variance 𝜎𝑣2 b2i is relatively small, then 𝛾i will be small and more weight is attached to the synthetic estimator. Similarly, more weight is attached to the direct estimator if the design variance 𝜓i is relatively small, or equivalently 𝛾i is large. The form (6.1.2) for 𝜃̃iH suggests that it adjusts the regression-synthetic estimator zTi 𝜷̃ to account for potential model deviation. It is important to note that 𝜃̃iH is valid for general sampling designs because we are modeling only the 𝜃̂i ’s and not the individual elements in the population, unlike unit level models, and the direct estimator 𝜃̂i uses the design weights. Furthermore, 𝜃̃iH is design-consistent because 𝛾i → 1 as the sampling variance 𝜓i → 0. The design-bias of 𝜃̃iH is given by (6.1.6) Bp (𝜃̃iH ) ≈ (1 𝛾i ) z( Ti 𝜷 ∗ 𝜃i ), ̃ is the conditional expectation of 𝜷̃ given 𝜽 (𝜃1 , , 𝜃m )T . It folwhere 𝜷 ∗ E2 (𝜷) lows from (6.1.6) that the design bias relative to 𝜃i tends to zero as 𝜓i → 0 or 𝛾i → 1. Note that Em (zTi ∗ ) Em (𝜃i ), where Em denotes expectation under the linking model 𝜃i zTi + bi 𝑣i , so that the average bias is zero when the linking model holds. The MSE of the BLUP estimator 𝜃̃iH is easily obtained either from the general result (5.3.5) or by direct calculation. It is given by MSE(𝜃̃iH )
E(𝜃̃iH
𝜃i ) 2
g1i (𝜎𝑣2 ) +g2i (𝜎𝑣2 ),
(6.1.7)
where g1i (𝜎𝑣2 )
𝜎𝑣2 b2i 𝜓i ∕ 𝜓( i + 𝜎𝑣2 b2i )
and g2i (𝜎𝑣2 )
1(
𝛾i )2 zTi
𝛾i 𝜓 i
] 1 m ∑ ( ) T 2 2 zi zi ∕ 𝜓i + 𝜎𝑣 bi zi .
(6.1.8)
(6.1.9)
i 1
(∑m ) 1 ̃ i z̃ Ti z̃ i , i 1, , m. Under the regularLet us define z̃ i zi ∕bi and h̃ ii z̃ Ti i 1z ity conditions (i) and (ii) below, the first term in (6.1.7), g1i (𝜎𝑣2 ), is O(1), whereas the second term, g2i (𝜎𝑣2 ), due to estimating , is O(m 1 ) for large m: (i) 𝜓i and bi are uniformly bounded;
(6.1.10)
O(m 1 ).
(6.1.11)
(ii)
suph̃ ii
1≤i≤m
Condition (ii) is a standard condition in linear regression analysis (Wu 1986). Comparing the leading term g1i (𝜎𝑣2 ) 𝛾i 𝜓i with MSE(𝜃̃i ) 𝜓i , the MSE of the direct estimator 𝜃̂i , it is clear that 𝜃̃iH leads to large gains in efficiency when 𝛾i is small, that is, when the variability of the model error bi 𝑣i is small relative to the total variability. Note that 𝜓i is also the design variance of 𝜃̂i .
126
EBLUP: BASIC AREA LEVEL MODEL
The BLUP estimator (6.1.3) depends on the variance component 𝜎𝑣2 , which is unknown in practical applications. Replacing 𝜎𝑣2 by an estimator 𝜎̂ 𝑣2 , we obtain an EBLUP estimator 𝜃̂iH : 𝜃̂ H 𝛾̂i 𝜃̂i + 1( 𝛾̂i )zT ̂ , (6.1.12) i
i
where 𝛾̂i and ̂ are the values of 𝛾i and ̂ when 𝜎𝑣2 is replaced by 𝜎̂ 𝑣2 . If 𝜃i g(Y i ) and Y i is the parameter of interest, then 𝜃̂iH is transformed back to the original scale ̂H ̂H to obtain an estimator of the area mean Y i as Y i h(𝜃̂iH ) ∶ g 1 (𝜃̂iH ). Note that Y i does not retain the EBLUP property of 𝜃̂iH . The EB and HB approaches (Chapters 9 and 10) are better suited to handle nonlinear cases, h(𝜃i ). Fay and Herriot (1979) recommended the use of a compromise EBLUP estimator, 𝜃̂icH , similar to the compromise James–Stein (J–S) estimator (3.4.9) in Chapter 3 and √ √ obtained as follows: (i) take 𝜃̂icH 𝜃̂iH if 𝜃̂iH lies in the interval [𝜃̂i c 𝜓i , 𝜃̂i + c 𝜓i ] √ for a specified constant c (typically c 1); and (ii) take 𝜃̂icH 𝜃̂iH c 𝜓i if 𝜃̂iH < √ √ √ 𝜃̂i c 𝜓i ; (iii) take 𝜃̂icH 𝜃̂iH + c 𝜓i if 𝜃̂iH > 𝜃̂i + c 𝜓i . The compromise EBLUP estimator 𝜃̂icH is transformed back to the original scale to obtain an estimator of the ̂H ith area mean Y i as Y ic h(𝜃̂icH ) g 1 (𝜃̂icH ). We now turn to the case where not all areas are sampled. We assume that the model (6.1.1) holds for both the sampled areas i 1, , m and the nonsampled areas 𝓁 m + 1, , M. For the nonsampled areas, direct estimates 𝜃̂i are not available and as a result we use the regression-synthetic estimator of 𝜃i based on the covariates, z𝓁 , observed from the nonsampled areas: 𝜃̂𝓁RS
zT𝓁 ̂ ,
𝓁
m + 1,
(6.1.13)
, M,
̃ (𝜎̂ 𝑣2 ), obtained from (6.1.5), is computed from the sample data where ̂ {(𝜃̂i , zi ) ;i 1, , m}. If 𝜃𝓁 g(Y 𝓁 ), then the estimator of Y 𝓁 is taken as ̂ RS Y𝓁 h(𝜃̂𝓁RS ) g 1 (𝜃̂𝓁RS ).
6.1.2
Estimation of 𝝈𝒗2
2 can be obtained by noting that A method of moment estimator 𝜎̂ 𝑣m
]
m ∑ (
𝜃̂i
E
zTi
̃
)2
∕ 𝜓( i +
𝜎𝑣2 b2i )
E[a(𝜎𝑣2 ) ] m
i 1
where ̃
2 is obtained by solving ̃ (𝜎𝑣2 ). It follows that 𝜎̂ 𝑣m
a(𝜎𝑣2 )
m
p
p,
127
EBLUP ESTIMATION
2 2 exists. Fay and Herriot iteratively and letting 𝜎̂ 𝑣m 0 when no positive solution 𝜎̃ 𝑣m (1979) suggested the following iterative solution: using a starting value 𝜎𝑣2(0) , define
𝜎𝑣2(k+1)
𝜎𝑣2(k) +
where
1 a∗′ (𝜎𝑣2(k) )
a(𝜎𝑣2(k) )],
p
[m
(6.1.14)
m ∑
a∗′ (𝜎𝑣2 )
b2i (𝜃̂i
zTi ̃ )2 ∕ 𝜓( i + 𝜎𝑣2 b2i )2
i 1 ′
is an approximation to the derivative a (𝜎𝑣2 ); FH used 𝜎𝑣2(0) 0. Convergence of the iterative procedure (6.1.14) is rapid, generally requiring less than 10 iterations. 2 2 , 0), where max(𝜎̃ 𝑣s Alternatively, a simple moment estimator is given by 𝜎̂ 𝑣s ] m m ∑ ∑ ( ) 𝜓 2 1 i 2 𝜎̃ 𝑣s (1 h̃ ii ) (6.1.15) b 1 𝜃̂ z̃ Ti ̂ WLS 2 m p i 1 i i i 1 bi and
)
( ∑ z̃ i z̃ Ti
̂ WLS
1( m
)
∑ z̃ i 𝜃̂i ∕bi
i
i 1
is a weighted least squares estimator of . If bi 1, then (6.1.15) reduces to the formula of Prasad and Rao (1990). Neither of these moment estimators of 𝜎𝑣2 require normality, and both lead to consistent estimators as m → ∞. The Fisher-scoring algorithm (5.2.18) for ML estimation of 𝜎𝑣2 reduces to 𝜎𝑣2(a+1)
𝜎𝑣2(a) + [(𝜎𝑣2(a) )] 1 s( ̃
where
m
(𝜎𝑣2 )
(a)
, 𝜎𝑣2(a) ),
(6.1.16)
4
bi 1∑ 2 2 2 i 1 (𝜎𝑣 b + 𝜓i )2
(6.1.17)
i
and s( ̃ , 𝜎𝑣2 )
m m T̃ 2 ̂ b2i 1∑ 1 ∑ 2 (𝜃i zi ) + b . 2 i 1 𝜎𝑣2 b2 + 𝜓i 2 i 1 i (𝜎𝑣2 b2 + 𝜓i )2 i
i
The final ML estimator 𝜎̂ 𝑣2 ML is taken as 𝜎̂ 𝑣2 ML max(𝜎̃ 𝑣2 ML , 0), where 𝜎̃ 𝑣2 ML is the solution obtained from (6.1.16). Similarly, the Fisher-scoring algorithm for REML estimation of 𝜎𝑣2 reduces to 𝜎𝑣2(a+1)
𝜎𝑣2(a) + [R (𝜎𝑣2(a) )] 1 sR (𝜎𝑣2(a) ),
(6.1.18)
1 tr(PBPB) 2
(6.1.19)
where R (𝜎𝑣2 )
128
EBLUP: BASIC AREA LEVEL MODEL
and
1 1 tr(PB) + yT PBPy, 2 2
sR (𝜎𝑣2 )
where B diag(b21 , , b2m ) and P is defined in (5.2.21) (see Cressie 1992). Asymptotically, (𝜎𝑣2 )∕R (𝜎𝑣2 ) → 1 as m → ∞. The final REML estimator 𝜎̂ 𝑣2 RE is taken as 𝜎̂ 𝑣2 RE max(𝜎̃ 𝑣2 RE , 0), where 𝜎̃ 𝑣2 RE is the solution obtained from (6.1.18). The EBLUP estimator 𝜃̂iH , based on a moment, ML or REML estimator of 𝜎𝑣2 , remains model-unbiased if the errors 𝑣i and ei are symmetrically distributed around 0. In particular, 𝜃̂iH is model-unbiased for 𝜃i if 𝑣i and ei are normally distributed. For the special case of bi 1 and equal sampling variances 𝜓i 𝜓, the BLUP estimator (6.1.3) reduces to 𝜃̃iH
𝛾 𝜃̂i + (1
𝛾)zTi ̂ LS
with 1 𝛾 𝜓∕ 𝜓( + 𝜎𝑣2 ), where ̂ LS is the least squares estimator of . Let S ∑m ̂ zTi ̂ LS )2 be the residual sum of squares. Under normality, it holds that i 1 ( 𝜃i 2 S∕ 𝜓( + 𝜎𝑣 ) 𝜒 2m p . Using this result, an unbiased estimator of 1 𝛾 is given by 1
𝛾∗
𝜓(m
2)∕S,
p
and an alternative EBLUP estimator of 𝜃i is therefore given by 𝜃̂iH
𝛾 ∗ 𝜃̂i + (1
𝛾 ∗ )zTi ̂ LS .
This estimator is identical to the J–S estimator, studied in Section 3.4.2, with guess 2 or 𝜎 2 𝜃i0 zTi ̂ LS . Note that the plug-in estimator 𝛾̂ of 𝛾 obtained using either 𝜎̂ 𝑣s ̂ 𝑣m leads to 1 𝛾̂ 𝜓(m p)∕S, which is approximately equal to 1 6.1.3
𝛾 ∗ for large m and fixed p.
Relative Efficiency of Estimators of 𝝈𝒗2
Asymptotic variances (as m → ∞) of ML and REML estimators are equal: ]
m ∑ 2 V(𝜎̂ 𝑣ML )
2 V(𝜎̂ 𝑣RE )
[(𝜎𝑣2 )] 1
b4i ∕
2
( 2 2 )2 𝜎𝑣 bi + 𝜓i
1
,
(6.1.20)
i 1
where the Fisher information, (𝜎𝑣2 ), is given by (6.1.17). The asymptotic variance 2 is given by of the simple moment estimator 𝜎̂ 𝑣s 2 V(𝜎̂ 𝑣s )
2m
2
m ∑ (𝜎𝑣2 b2i + 𝜓i )2 ∕b4i i 1
(6.1.21)
129
EBLUP ESTIMATION
(Prasad and Rao 1990). Datta, Rao, and Smith (2005) derived the asymptotic variance 2 , as of the FH moment estimator, 𝜎̂ 𝑣m 2 V(𝜎̂ 𝑣m )
] 2 m ∑ ( 2 2 ) 2 2m bi ∕ 𝜎𝑣 bi + 𝜓i .
(6.1.22)
i 1
Using the Cauchy–Schwarz inequality and the fact that the arithmetic mean is greater than or equal to the harmonic mean, it follows from (6.1.20) to (6.1.22) that V(𝜎̂ 𝑣2 RE )
2 2 V(𝜎̂ 𝑣2 ML ) ≤ V(𝜎̂ 𝑣m ) ≤ V(𝜎̂ 𝑣s ).
(6.1.23)
2 , Equality in (6.1.23) holds if 𝜓i 𝜓 and bi 1. The FH moment estimator, 𝜎𝑣m 2 becomes significantly more efficient relative to 𝜎̂ 𝑣s as the variability of the terms 2 relative to the 𝜎𝑣2 + 𝜓i ∕b2i increases. On the other hand, the loss in efficiency of 𝜎𝑣m REML (ML) estimator is relatively small as it depends on the variability of the terms (𝜎𝑣2 + 𝜓i ∕b2i ) 1 .
6.1.4
*Applications
We now provide some details of the applications in Examples 4.2.1 and 4.2.2, Chapter 4. Example 6.1.1. Income for Small Places. The U.S. Bureau of the Census is required to provide the Treasury Department with the estimates of per capita income (PCI) and other statistics for state and local governments receiving funds under the General Revenue Sharing Program. Those statistics are then used by the Treasury Department to determine allocations to the local government units (places) within the different states by dividing the corresponding state allocations. Initially, the Census Bureau determined the current PCI estimate for a place by multiplying the 1970 census estimate of PCI in 1969 (based on a 20% sample) by the ratio of an administrative estimate of PCI in the current year and a similarly derived estimate for 1969. But the sampling error of the PCI estimates turned out to be quite large for places having fewer than 500 persons in 1970, with coefficient of variation (CV) of about 13% for a place of 500 persons and 30% for a place of 100 persons. As a result, the Census Bureau initially decided to set aside the census estimates for these small places and to substitute the corresponding county estimates in their place. But this solution turned out to be unsatisfactory because the census estimates for many small places differed significantly from the corresponding county estimates after accounting for sampling errors. 2 , Fay and Herriot Using the EBLUP estimator (6.1.12) with bi 1 and 𝜎̂ 𝑣2 𝜎̂ 𝑣m (1979) presented empirical evidence that the EBLUP estimates of log(PCI) for small places have average error smaller than either the census estimates or the county estimates. The EBLUP estimator used by them is a weighted average of the direct estimator 𝜃̂i and a regression-synthetic estimator zTi ̂ , obtained by fitting a linear regression
130
EBLUP: BASIC AREA LEVEL MODEL
equation to (𝜃̂i , zi ), where zi (z1i , z2i , , zpi )T with z1i 1, i 1, , m, and the independent variables z2i , , zpi are based on the associated county PCI, tax return data for 1969, and data on housing from the 1970 census. The method used by Fay and Herriot (1979) was adopted by the Census Bureau in 1974 to form updated PCI estimates for small places. This was the largest application (prior to 1990) of EBLUP methods in a U.S. Federal Statistical Program. We now present some details of the FH application and the results of an external ̂ evaluation. First, based on past studies, the CV of the direct estimate, Y i , of PCI was ̂ 1∕2 taken as 3∕N̂ i for the ith small place, where N̂ i is the weighted sample count; Y i and N̂ i were available for almost all places. This suggested the use of logarithmic tranŝ ̂ formation, 𝜃̂i log(Y i ), with V(𝜃̂i ) ≈ [CV(Y i )]2 9∕N̂ i 𝜓i . Second, four separate regression models were evaluated to determine a suitable combined model, treating the sampling variances, 𝜓i , as known. The independent variables, z1 , z2 , , zp , for the four models are, respectively, given by (1) p 2 with z1 1 and z2 log(county PCI); (2) p 4 with z1 , z2 , z3 log(value of owner-occupied housing for the place) and z4 log(value of owner-occupied housing for the county); (3) p 4 with z1 , z2 , z5 log(adjusted gross income per exemption from the 1969 tax returns for the place) and z6 log(adjusted gross income per exemption from the 1969 tax returns for the county); (4) p 6 with z1 , z2 , , z6 . 2 for each of the four models, Fay and Herriot (1979) calculated the values of 𝜎̂ 𝑣m 2 indicates a better average using the iterative algorithm (6.1.14). A small value of 𝜎̂ 𝑣m fit of the regression models to the sample data, after allowing for the sampling errors in the direct estimators 𝜃̂i . For a place of estimated size N̂ i 200, we have 𝜓i 9∕200 2 ∕(𝜎 2 +𝜓) ̂ 𝑣m 1∕2 to the direct 0.045. If one desires to attach equal weight 𝛾̂im 𝜎̂ 𝑣m i T 2 ̂ ̂ 0.045 as estimate 𝜃i and the regression-synthetic estimate z , we then need 𝜎̂ 𝑣m i
2 ) 𝛾̂im 𝜓i , well. In this case, the resulting MSE, based on the leading term g1i (𝜎̂ 𝑣m is one-half of the sampling variance 𝜓i ; that is, the EBLUP estimate for a place of 200 persons has roughly the same precision as the direct estimate for a place of 400 persons. Fay and Herriot used the values of 𝜎̂ 𝑣2 relative to 𝜓i as the criterion for model selection. 2 for the states with more than 500 small places Table 6.1 reports the values of 𝜎̂ 𝑣m of size less than 500. It is clear from Table 6.1 that regressions involving either tax (models 3 and 4) or housing data (models 2 and 4), but especially those involving both types of covariates are significantly better than the regression on the county values alone; that is, model 4 and to a lesser extent models 2 and 3 provide better fits to the 2 -values than model 1. Note that the values of 𝜎 2 for model 4 ̂ 𝑣m data in terms of 𝜎̂ 𝑣m are much smaller than 0.045, especially for North Dakota, Nebraska, Wisconsin, and Iowa, which suggests that large gains in MSE can be achieved for those states. H , from 𝜃̂ H and Fay and Herriot obtained compromise EBLUP estimates, 𝜃̂i∗ i H ̂ then transformed 𝜃i∗ back to the original scale to get the estimate of Y i given ̂H H ). The latter estimates were then subjected to a two-step raking by Y i∗ exp(𝜃̂i∗ (benchmarking) to ensure consistency with the following aggregate sample estimates: (i) For each of the population size classes (< 500, 500 1000, and > 1000),
131
EBLUP ESTIMATION
TABLE 6.1
2 for States with More Than 500 Small Places Values of 𝝈̂ 𝒗m
Model State Illinois Iowa Kansas Minnesota Missouri Nebraska North Dakota South Dakota Wisconsin
(1)
(2)
(3)
(4)
0.036 0.029 0.064 0.063 0.061 0.065 0.072 0.138 0.042
0.032 0.011 0.048 0.055 0.033 0.041 0.081 0.138 0.025
0.019 0.017 0.016 0.014 0.034 0.019 0.020 0.014 0.025
0.017 0.000 0.020 0.019 0.017 0.000 0.004 * 0.004
∗ Not fitted because some z-values for model 4 were not available for several places in South Dakota. Source: Adapted from Table 1 in Fay and Herriot (1979).
the total estimated income for all places equals the direct estimate at the state level. (ii) The total estimated income for all places in a county equals the direct county estimate of total income. External Evaluation Fay and Herriot (1979) also conducted an external evaluation by comparing the 1972 estimates to “true” values obtained from a special complete census of a random sample of places in 1973. The 1972 estimates for each place were obtained by multiplying the 1970 estimates by an updating factor derived from administrative sources. Table 6.2 reports the values of percentage absolute relative error ARE (|estimate true value|∕true value) 100, for the special complete census areas using direct, county and EBLUP estimates. Values of average ARE are also reported. Table 6.2 shows that the EBLUP estimates exhibit smaller average ARE and a lower incidence of extreme errors than either the direct estimates or the county estimates: the average ARE for places with population less than 500 is 22% compared to 28.6% for the direct estimates and 31.6% for the county estimates. The EBLUP estimates were consistently higher than the special census values. But missing income was not imputed in the special census, unlike in the 1970 census. As a result, the special census values, which are based on only completed cases, may be subject to a downward bias. Example 6.1.2. U.S. Poverty Counts. Current county estimates of school-age children in poverty in the United States are used by the U.S. Department of Education to allocate government funds, called Title I funds, annually to counties, and then states distribute the funds among school districts. The allocated funds support compensatory education programs to meet the needs of educationally disadvantaged children. Title I funds of about 14.5 billion U.S. dollars were allocated in 2009. The U.S. Census Bureau, under their small area income and poverty estimation (SAIPE) program, uses the basic area level model (6.1.1) with bi 1 to produce
132
EBLUP: BASIC AREA LEVEL MODEL
TABLE 6.2 Values of Percentage Absolute Relative Error of Estimates from True Values: Places with Population Less Than 500 Special Census Area
Direct Estimate
EBLUP Estimate
County Estimate
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
10.2 4.4 34.1 1.3 34.7 22.1 14.1 18.1 60.7 47.7 89.1 1.7 11.4 8.6 23.6 53.6 51.4
14.0 10.3 26.2 8.3 21.8 19.8 4.1 4.7 78.7 54.7 65.8 9.1 1.4 5.7 25.3 10.5 14.4
12.9 30.9 9.1 24.6 6.6 14.6 18.7 25.9 99.7 95.3 86.5 12.7 6.6 23.5 34.3 11.7 23.7
Average
28.6
22.0
31.6
Source: Adapted from Table 3 in Fay and Herriot (1979).
EBLUP estimates of annual poverty rates and counts of poor school-age children (between ages 5 and 17) for counties and states. In this application, 𝜃̂i log(Ŷ i ), where Ŷ i is the direct estimator of the poverty count, Yi , for county i. Prior to 2005, direct estimates Ŷ i were obtained from the Annual Social and Economic Supplement (ASEC) of the Current Population Survey (CPS), see National Research Council (2000) and Rao (2003a, Example 7.1.2) for details of past methodology. In 2005, SAIPE made a major switch by replacing CPS estimates by more reliable direct estimates obtained from the American Community Survey (ACS). ACS is based on a much larger sample size than CPS, with monthly rolling samples of 250,000 households spread across all counties. ACS direct estimates are reliable for counties with population size 65,000 or larger, but reliable estimates are needed for all the roughly 3,140 counties. Hence, ACS direct estimates, Ŷ i , and associated covariates, zi , are used to produce EBLUP estimates, 𝜃̂iH , of 𝜃i log(Yi ) and then transformed back to the original scale to obtain Ŷ iH exp(𝜃̂iH ). The estimates Ŷ iH are then adjusted for bias, assuming normality of the errors 𝑣i and ei in the model. For the SAIPE program, very good covariates are available from the decennial census and other administrative sources. Covariates included in the county model, 𝜃̂i zTi + 𝑣i + ei , are z1i 1, z2i log(number of child exemptions claimed by families in poverty on tax returns), z3i log(number of people receiving food stamps), and z4i log(estimated population under age 18), z5i log(number of poor
133
EBLUP ESTIMATION
school-age children estimated from the previous census). For a small number of counties with small sample sizes, Ŷ i may be zero because of no poor children in the sample, and those counties were excluded in fitting the model because 𝜃̂i log(Ŷ i ) is not defined if Ŷ i 0. Moreover, direct estimates, 𝜓̂ i , of the sampling variances 𝜓i , obtained from replication methods, are used as proxies for the unknown 𝜓i . Model parameters are estimated by the ML method, and the estimator of 𝜃i is the EBLUP estimator 𝜃̂iH . For the few counties excluded from the model fitting, the regression-synthetic estimator, 𝜃̂iRS zTi ̂ , is used. To estimate the poverty count Yi , a bias-adjusted estimator, obtained from Ŷ iH exp(𝜃̂iH ), is used: Ŷ iaH
Ŷ iH exp[𝜎̂ 𝑣2 (1
𝛾̂i )∕2].
(6.1.24)
The bias-adjustment factor F̂ i for estimating a general function of 𝜃i , h(𝜃i ), is given by (6.1.25) Fi E[h(zTi + 𝑣i )]∕E{h[zTi + 𝛾i (𝑣i + ei )]} evaluated at ( , 𝜎𝑣2 ) ( ̂ , 𝜎̂ 𝑣2 ) (Slud and Maiti 2006). For the special case of h(a) exp(a), F̂ i reduces to F̂ i exp[𝜎̂ 𝑣2 (1 𝛾̂i )∕2], using the fact that E(eX )
exp(𝜇 + 𝜎 2 ∕2).
(6.1.26)
if X N(𝜇, 𝜎 2 ). In the SAIPE methodology, the term 𝜎̂ 𝑣2 (1 𝛾̂i ) g1i (𝜎̂ 𝑣2 ) in the exponent of (6.1.24) is replaced by the naive MSE estimates, mseN (𝜃̂iH ) g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ). However, the term g2i (𝜎̂ 𝑣2 ), due to estimating is negligible relative to g1i (𝜎̂ 𝑣2 ) in the context of SAIPE since m M, is large and g2i is O(m 1 ) for large m, whereas g1i is O(1). Note that Ŷ iaH is not exactly unbiased for the total Yi . Luery (2011) compared the CVs of the 2006 SAIPE county estimates Ŷ iaH with the corresponding CVs of the ACS estimates Ŷ i . His results showed that the SAIPE estimates can lead to significant reduction in CV relative to ACS estimates, especially for small counties. State estimates of poverty counts are obtained from a model similar to the county model, except that the model is applied to state poverty rates, instead of logarithms of poverty counts. The covariates used in the model are state proportions corresponding to the same census and administrative sources used in the county model. The EBLUP estimates of state poverty rates are multiplied by the corresponding counts of school-age children obtained from the U.S. Census Bureau’s program of population estimates to obtain the EBLUP estimates of state poverty counts. Reduction in CV due to using SAIPE state estimates is marginal for most states, unlike the case of county estimates, because ACS direct estimates are based on much larger sample sizes at the state level. The SAIPE state estimates of poverty counts are first ratio adjusted (or raked) to agree with the ACS national estimate. In the second step, the SAIPE county estimates are ratio adjusted to agree with the raked SAIPE state estimates. The two-step raking method ensures that the SAIPE estimates at the county level or state level add up to the ACS national estimate.
134
EBLUP: BASIC AREA LEVEL MODEL
SAIPE computes estimates of poor school-age children also for school districts, but in this case using simple synthetic estimation, due to lack of suitable data at the school district level. Luery (2011) describes the methodology that is currently applied for school districts. Prior to the adaptation of the current SAIPE county and state models, both internal and external evaluations were conducted for model selection and for checking the validity of the chosen models. In an internal evaluation, the validity of the underlying assumptions and features of the model are examined. Additionally, an external evaluation compares the estimates derived from a model to “true” values that were not used in the development of the model. For internal evaluation, standard methods for linear regression models were used without taking account of random county or state effects. Model features examined included linearity of the regression model, choice of predictor variables, normality of the standardized residuals through quantile–quantile (q–q) plots and residual analysis to detect outliers. For external evaluation, EBLUP estimates based on the 1989 CPS direct estimates and several candidate models were compared to the 1990 census estimates by treating the latter as true values. We refer the reader to National Research Council (2000) and Rao (2003a, Example 7.1.2) for further details. Example 6.1.3. Canadian Census Undercoverage. We have already noted in Example 4.2.3, Chapter 4, that the basic area level model (6.1.1) has been used to estimate the undercount in the decennial census of the United States and in the Canadian census. We now provide a brief account of the application to the 1991 Canadian census (Dick 1995). Let Ti be the true (unknown) count and Ci is the census count in the ith domain. The objective here is to estimate census adjustment factors 𝜃i Ti ∕Ci for the m 96 2 4 12 domains obtained by crossing the categories of the variables such as sex (2 genders), age (4 classes), and province (12). The net undercoverage rate in the ith domain is given by Ui 1 𝜃i 1 . Direct estimates 𝜃̂i were obtained from a post-enumeration survey. The associated sampling variances, 𝜓i , were derived through smoothing of the estimated variances. In particular, ̂ i , was assumed to the variance of the estimated number of missing persons, M be proportional to a power of the census count Ci , and a linear regression was ̂ i ) is the estimated varî i )), log(Ci ); i 1, , m}, where 𝑣(M fitted to {log(𝑣(M ̂ ance of Mi . The sampling variances were then predicted through the fitted line 6.13 0.28 log(Ci ), and treating the predicted values as the true 𝜓i . log(𝜓i ) Auxiliary (predictor) variables z for building the model were selected from a set of 42 variables by backward stepwise regression (Draper and Smith 1981, Chapter 6). Note that the random area effects 𝑣i are not taken into account in stepwise regression. Internal evaluation of the resulting area level model (6.1.1) with bi 1 was then performed, by treating the standardized BLUP residuals ri (𝜃̂iH zTi ̂ )∕(𝜎̂ 𝑣2 + 𝜓i )1∕2 as iid N(0, 1) variables, where ̂ and 𝜎̂ 𝑣2 are the REML estimates of and 𝜎𝑣2 . No
135
EBLUP ESTIMATION
significant departures from the assumed model were observed. In particular, to check the normality assumption and to detect outliers, a normal q–q plot of the ri ’s versus Φ 1 [Fm (ri )], i 1, , m was examined, where Φ(x) is the cumulative distribution function (CDF) of a N(0, 1) variable and Fm (x) is the empirical CDF of the ri ’s, that is, ∑ 1 if ri ≤ x and 0 otherwise. Dempster Fm (x) m 1 m i 1 I(ri ≤ x), where I(ri ≤ x) and Ryan (1985) proposed a weighted normal q–q plot based on the ri ’s that is more sensitive to departures from normality than the unweighted normal q–q plot. This plot is similar to the weighted normal probability plot of Lange and Ryan (1989) for checking normality of random effects (see Section 5.4.2, Chapter 5). This weighted normal q–q plot uses the weighted empirical CDF ] ]/ m m ∑ ∑( ( 2 ) 1 ) 1 ∗ 2 𝜎̂ 𝑣 + 𝜓i I(x ri ) 𝜎̂ 𝑣 + 𝜓i Fm (x) i 1
i 1
/
m
∑ 𝛾̂i I(x
ri )
i 1
m
∑ 𝛾̂i i 1
for the special case bi 1, instead of the usual empirical CDF Fm (x). Note that Fm∗ (x) assigns greater weight to those areas for which 𝜎̂ 𝑣2 accounts for a larger part of the total estimated variance 𝜎̂ 𝑣2 + 𝜓i . The EBLUP adjustment factors, 𝜃̂iH , were converted to estimates of missing ̂ H , where Mi Ti Ci . These estimates were then subjected to two-step persons, M i raking (see Section 3.2.6, Chapter 3) to ensure consistency with the reliable direct ̂ ⋅a , where “p” denotes a province, “a” denotes ̂ p⋅ and M estimates of marginal totals, M an age–sex group, and Mi Mpa . The raked EBLUP estimates were further divided into single year of age estimates by using simple synthetic estimation: S ̂ pa M (q)
RH ̂ pa M [Cpa (q)∕Cpa ],
where Cpa is the census count in province p and age–sex group a, q denotes a sub-age group, and Cpa (q) is the associated census count. Example 6.1.4. Poverty Estimates in Spain. Molina and Morales (2009) applied the FH model to data from the 2006 Spanish Survey on Income and Living Conditions (EU-SILC) to estimate poverty rates for the 52 Spanish provinces by gender (m 52 2 104). In the application, the estimated sampling variances of the direct estimators were treated as the true variances. Domain proportions of individuals with Spanish nationality, in different age groups and in several employment categories, were used as covariates. Results of Molina and Morales (2009) indicated gains in efficiency of the EBLUPs based on the FH model with respect to the direct estimators for most of the domains. However, in general, the gains in efficiency when using the FH model with those covariates are modest (see Table 10.5 in Section 10.7.3).
136
6.2 6.2.1
EBLUP: BASIC AREA LEVEL MODEL
MSE ESTIMATION Unconditional MSE of EBLUP
In this section, we apply the general results of Section 5.3 to obtain the MSE of the EBLUP, 𝜃̂iH , and MSE estimators that are second-order unbiased. The MSE and the MSE estimators are unconditional in the sense that they are valid under the model (4.2.5) obtained by combining the sampling model (4.2.3) and the linking model (4.2.1). Conditional MSE is studied in Section 6.2.7. The second-order MSE approximation (5.3.9) is valid for the FH model, 𝜃̂i T zi + bi 𝑣i + ei , under regularity conditions (6.1.10) and (6.1.11) and normality of the errors 𝑣i and ei . It reduces to MSE(𝜃̂iH ) ≈ g1i (𝜎𝑣2 ) + g2i (𝜎𝑣2 ) + g3i (𝜎𝑣2 ),
(6.2.1)
where g1i (𝜎𝑣2 ) and g2i (𝜎𝑣2 ) are given by (6.1.8) and (6.1.9), and g3i (𝜎𝑣2 )
𝜓i2 b4i (𝜓i + 𝜎𝑣2 b2i ) 3 V(𝜎̂ 𝑣2 ),
(6.2.2)
where V(𝜎̂ 𝑣2 ) is the asymptotic variance of an estimator, 𝜎̂ 𝑣2 , of 𝜎𝑣2 . We have V(𝜎̂ 𝑣2 ) V(𝜎̂ 𝑣2 ML ) V(𝜎̂ 𝑣2 RE ), given by (6.1.20), if we use 𝜎̂ 𝑣2 ML or 𝜎̂ 𝑣2 RE to estimate 𝜎𝑣2 . If 2 , then V(𝜎 2 ) given by (6.1.21). For ̂ 𝑣2 ) V(𝜎̂ 𝑣s we use the simple moment estimator 𝜎̂ 𝑣s 2 2 2 the FH moment estimator, 𝜎̂ 𝑣m , we have V(𝜎̂ 𝑣 ) V(𝜎̂ 𝑣m ) given by (6.1.22). It follows from (6.1.23) that the term g3i (𝜎𝑣2 ) is the smallest for ML and REML estimators of 2 . The g term for 𝜎 2 is significantly smaller than that for 𝜎 2 if ̂ 𝑣m ̂ 𝑣s 𝜎𝑣2 , followed by 𝜎̂ 𝑣m 3i 2 2 the variability of 𝜎𝑣 + 𝜓i ∕bi , i 1, , m, is substantial. We now turn to the estimation of MSE(𝜃̂iH ). The estimator of MSE given by (5.3.11) is valid for REML and simple moment estimators of 𝜎𝑣2 under regularity conditions (6.1.10) and (6.1.11), and normality of the errors 𝑣i and ei . In this case, it reduces to mse(𝜃̂iH ) g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + 2g3i (𝜎̂ 𝑣2 ) (6.2.3) 2 . The corresponding area-specific versions, mse (𝜃̂ H ) if 𝜎̂ 𝑣2 is chosen as 𝜎̂ 𝑣2 RE or 𝜎̂ 𝑣s 1 i and mse2 (𝜃̂iH ), are obtained from (5.3.15) and (5.3.16) by changing g3i (𝜎𝑣2 , 𝜃̂i ) to g∗3i (𝜎𝑣2 , 𝜃̂i ), where g∗3i (𝜎𝑣2 , 𝜃̂i ) is obtained from (5.3.14), which reduces to
g∗3i (𝜎𝑣2 , 𝜃̂i )
[b4i 𝜓i2 ∕(𝜓i + 𝜎𝑣2 b2i )4 ](𝜃̂i
zTi ̃ )2 V(𝜎̂ 𝑣2 ).
(6.2.4)
Then, the two area-specific MSE estimators are given by g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + 2g∗3i (𝜎̂ 𝑣2 , 𝜃̂i )
(6.2.5)
g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + g3i (𝜎̂ 𝑣2 ) + g∗3i (𝜎̂ 𝑣2 , 𝜃̂i ).
(6.2.6)
mse1 (𝜃̂iH ) and mse2 (𝜃̂iH )
137
MSE ESTIMATION
Rao (2001a) obtained the area-specific MSE estimators (6.2.5) and (6.2.6) for the special case of bi 1. 2 , we apply the MSE For the ML estimator 𝜎̂ 𝑣2 ML and the FH moment estimator 𝜎̂ 𝑣m estimator (5.3.12), which reduces to g1i (𝜎̂ 𝑣2 )
mse∗ (𝜃̂iH )
b𝜎̂ 2 (𝜎̂ 𝑣2 )∇g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + 2g3i (𝜎̂ 𝑣2 ),
(6.2.7)
∇g1i (𝜎̂ 𝑣2 )
(6.2.8)
𝑣
where in this case b2i (1
𝛾i ) 2 .
The bias term b𝜎̂ 2 (𝜎𝑣2 ) for the ML estimator is obtained from (5.3.13), which reduces 𝑣 to ⎧ b𝜎̂ 2
𝑣 ML
(𝜎𝑣2 )
⎪ [2(𝜎𝑣2 )] 1 tr ⎨ ⎪ ⎩
]
m ∑ (
𝜓i +
𝜎𝑣2 b2i
)
1
1
zi zTi
i 1
]⎫ m ∑ ( ) 2 2 2 2 T ⎪ bi 𝜓i + 𝜎𝑣 bi zi zi ⎬ , ⎪ i 1 ⎭
(6.2.9)
where (𝜎𝑣2 ) is given by (6.1.17). It follows from (6.2.8) and (6.2.9) that the term b𝜎̂ 2 (𝜎̂ 𝑣2 ML )∇g1i (𝜎̂ 𝑣2 ML ) in (6.2.7) is positive. Therefore, ignoring this term and 𝑣 ML using (6.2.3) with 𝜎̂ 𝑣2 𝜎̂ 𝑣2 ML would lead to underestimation of MSE approximation given by (6.2.1). 2 is given by The bias term b𝜎̂ 2 (𝜎𝑣2 ) for the FH moment estimator 𝜎̂ 𝑣m 𝑣m
{ 2
) ∑ ( 2 2 m m i 1 𝜓i + 𝜎𝑣 bi
b𝜎̂ 2 (𝜎𝑣2 ) 𝑣m
2
} [∑ ( ) 1 ]2 m 2 2 i 1 𝜓i + 𝜎𝑣 bi
[∑ ( ) ]3 m 2 b2 1 𝜓 + 𝜎 𝑣 i i i 1
(6.2.10)
2 is positive, (Datta, Rao, and Smith 2005). It follows from (6.2.10) that the bias of 𝜎̂ 𝑣m 2 unlike the bias of 𝜎̂ 𝑣 ML , and it reduces to 0 if bi 1 and 𝜓i 𝜓 for all i. As a result, 2 )∇g (𝜎 2 the term b𝜎̂ 2 (𝜎̂ 𝑣m 1i ̂ 𝑣m ) in (6.2.7) is negative. Therefore, ignoring this term and 𝑣m 2 2 using (6.2.3) with 𝜎̂ 𝑣 𝜎̂ 𝑣m would lead to overestimation of the MSE approximation given by (6.2.1). Area-specific versions of (6.2.7) for the ML estimator, 𝜎̂ 𝑣2 ML , and the moment 2 , are obtained from (5.3.17) and (5.3.18): estimator, 𝜎̂ 𝑣m
mse∗1 (𝜃̂iH )
g1i (𝜎̂ 𝑣2 )
b𝜎̂ 2 (𝜎̂ 𝑣2 )∇g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + 2g∗3i (𝜎̂ 𝑣2 , 𝜃̂i ), 𝑣
(6.2.11)
138
EBLUP: BASIC AREA LEVEL MODEL
and mse∗2 (𝜃̂iH )
g1i (𝜎̂ 𝑣2 )
b𝜎̂ 2 (𝜎̂ 𝑣2 )∇g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + g3i (𝜎̂ 𝑣2 ) + g∗3i (𝜎̂ 𝑣2 , 𝜃̂i ), (6.2.12) 𝑣
2 . where 𝜎̂ 𝑣2 is chosen as either 𝜎̂ 𝑣2 ML or 𝜎̂ 𝑣m All the above MSE estimators are approximately unbiased in the sense of having bias of lower order than m 1 for large m. We assumed normality of the random effects, 𝑣i , in deriving the MSE estimators of the EBLUP estimator 𝜃̂iH . Lahiri and Rao (1995), however, showed that the MSE estimator (6.2.3) is also valid under nonnormal area effects 𝑣i provided that E|𝑣i |8+𝛿 < ∞ for 0 < 𝛿 < 1. This robustness result was 2 , assuming normality of the samestablished using the simple moment estimator 𝜎̂ 𝑣s pling errors ei . The latter assumption, however, is not restrictive, unlike the normality of 𝑣i , due to the central limit theorem (CLT) effect on the direct estimator 𝜃̂i . The moment condition E|𝑣i |8+𝛿 < ∞ is satisfied by many continuous distributions including the double exponential, “shifted” exponential and lognormal distributions with E(𝑣i ) 0. The proof of the validity of (6.2.3) under nonnormal 𝑣i is highly technical. We refer the reader to the Appendix in Lahiri and Rao (1995) for details of the proof. It may be noted that MSE(𝜃̂iH ) is affected by the nonnormality of the 𝑣i ’s. In fact, it depends on the fourth moment of 𝑣i , and the cross-product term E(𝜃̂iH 𝜃i )(𝜃̂iH 𝜃̃iH ) in the decomposition of the MSE given in (5.2.28) is nonzero, unlike in the case of normal 𝑣i . However, the MSE estimator (6.2.3) remains valid in the sense that E[mse(𝜃̂iH )] MSE(𝜃̂iH ) + o(m 1 ). This robustness property of the MSE estimator 2 may not hold in the case of ML, REML, and the FH estimators of obtained using 𝜎̂ 𝑣s 𝜎𝑣2 (Chen, Lahiri and Rao 2008). However, simulation results indicate robustness for those estimators as well (see Example 6.2.1).
Example 6.2.1. Simulation Study. Datta, Rao, and Smith (2005) studied the relative bias (RB) of MSE estimators through simulation. They used the basic area level model (6.1.1) with bi 1 and no covariates, that is, 𝜃̂i 𝜇 + 𝑣i + ei , i 1, , m. Since the MSE of an EBLUP is translation invariant, they took 𝜇 0 without loss of generality. However, to account for the uncertainty due to estimation of unknown regression parameters, this zero mean was estimated from each simulation run. The simulation runs consisted of R 100,000 replicates of 𝜃̂i 𝑣i + ei , i 1, , m, for iid ind m 15 generated from 𝑣i N(0, 𝜎𝑣2 1) and ei N(0, 𝜓i ) for specified sampling variances 𝜓i . In particular, three different 𝜓i -patterns, with five groups G1 , , G5 in each pattern and equal number of areas and equal 𝜓i ’s within each group Gt , were chosen, namely patterns (a) 0.7, 0.6, 0.5, 0.4, 0.3; (b) 2.0, 0.6, 0.5, 0.4, 0.2 and (c) 4.0, 0.6, 0.5, 0.4, 0.1. Note that pattern (a) is nearly balanced, while pattern (c) has the largest variability (max 𝜓i ∕min 𝜓i 40) and pattern (b) has intermediate variability (max 𝜓i ∕min 𝜓i 10). Patterns similar to (c) can occur when the sample sizes for a group of areas are significantly larger than those for the remaining areas. The true MSEs of the EBLUP estimators 𝜃̂iH were approximated from the ∑ simulated data sets, using MSE R 1 Rr 1 [𝜃̂iH (r) 𝜃i (r)]2 , where 𝜃̂iH (r) and 𝜃i (r) denote, respectively, the values of 𝜃̂iH and 𝜃i 𝑣i for the rth simulation run. The
139
MSE ESTIMATION
MSE values were approximated using the estimators of 𝜎𝑣2 discussed in Section 2 , the FH moment 6.1.2, namely the Prasad–Rao (PR) simple moment estimator 𝜎̂ 𝑣s 2 2 2 estimator 𝜎̂ 𝑣m , and the ML and REML estimators 𝜎̂ 𝑣 ML and 𝜎̂ 𝑣 RE . The RB of a MSE ∑ estimator was computed as RB [R 1 Rr 1 (mser MSE)]∕MSE, where mser is the value of an MSE estimator for the rth replicate. The MSE estimator (6.2.3) associated with PR and REML and the MSE estimator (6.2.7) for ML and FH were compared with respect to RB. For the nearly balanced pattern (a), all four methods performed well, with an average RB (ARB) lower than 2%. However, for the extreme pattern (c), PR led to large ARB when 𝜓i ∕𝜎𝑣2 was small, with about 80% ARB for group G4 and 700% for G5 . When increasing the number of areas to m 30, this overestimation of MSE for G5 decreased by 140% (Datta and Lahiri 2000). The remaining three estimation methods for 𝜎𝑣2 performed well for patterns (b) and (c) with ARB less than 13% for REML and less than 10% for FH and ML. Datta, Rao, and Smith (2005) also calculated ARB values for two nonnormal distributions for the 𝑣i ’s with mean 0 and variance 1, namely double exponential and location exponential. FH also performed well for these distributions, with ARB smaller than 10% (ARB for ML smaller than 16%). 6.2.2
MSE for Nonsampled Areas
The MSE of the regression-synthetic estimator, 𝜃̂𝓁RS , for nonsampled areas 𝓁 1, , M, is given by MSE(𝜃̂𝓁RS )
E(zT𝓁 ̂
𝜃𝓁 ) 2
E[zT𝓁 ( ̂
)
𝑣𝓁 ]2 .
m+
(6.2.13)
Now, noting that ̂ is independent of 𝑣𝓁 because ̂ is calculated from the sample data {(𝜃̂i , zi ); i 1, , m}, it follows from (6.2.13) that ]
MSE(𝜃̂𝓁RS )
𝜎𝑣2 + zT𝓁
m ∑ ( ) zi zTi ∕ 𝜓i + 𝜎𝑣2 b2i
1
z𝓁 + o(m 1 )
i 1
∶ 𝜎𝑣2 + h𝓁 (𝜎𝑣2 ) + o(m 1 ), noting that ‖V( ̂ ) V( ̃ )‖ estimator is given by
(6.2.14)
o(m 1 ). Using (6.2.14), a second-order unbiased MSE
mse(𝜃̂𝓁RS )
𝜎̂ 𝑣2 + b𝜎̂ 2 (𝜎̂ 𝑣2 ) + h𝓁 (𝜎̂ 𝑣2 ), 𝑣
(6.2.15)
where b𝜎̂ 2 (𝜎𝑣2 ) is the bias of 𝜎̂ 𝑣2 . The bias term b𝜎̂ 2 (𝜎𝑣2 ) is zero up to terms of order 𝑣 𝑣 2 , and the bias 2 and the REML estimator 𝜎 m 1 for the simple moment estimator 𝜎̂ 𝑣s ̂ 𝑣RE 2 2 are given by terms for the ML estimator 𝜎̂ 𝑣ML and the FH moment estimator 𝜎̂ 𝑣m (6.2.9) and (6.2.10), respectively.
140
EBLUP: BASIC AREA LEVEL MODEL
6.2.3
*MSE Estimation for Small Area Means
̂ H ̂H g(Y i ) Y i , then the EBLUP estimator of the small area mean Y i is Y i 𝜃i ̂H H H ̂ ̂ and mse(Y i ) mse(𝜃i ), where mse(𝜃i ) is given in Section 6.2.1. On the other hand, for nonlinear g(⋅) as in Examples 6.1.1 and 6.1.2, a naive estimator of Y i is given by ̂H Yi g 1 (𝜃̂iH ) h(𝜃̂iH ), which is subject to bias. A bias-adjusted estimator of Y i is taken as ̂H ̂H Y ia F̂ i Y i , (6.2.16) If 𝜃i
where F̂ i is obtained by evaluating Fi in (6.1.25) at ( , 𝜎𝑣2 ) ( ̂ , 𝜎̂ 𝑣2 ). As noted in Example 6.1.2, Fi reduces to exp[𝜎𝑣2 (1 𝛾i )∕2] if h(𝜃i ) exp(𝜃i ). It may be noted ̂H that Y ia is not exactly unbiased for Y i because of the estimation of and 𝜎𝑣2 . ̂H Slud and Maiti (2006) derived a second-order unbiased MSE estimator mse(Y ia ), 2 for the special case of h(𝜃i ) exp(𝜃i ), using the ML estimator of 𝜎𝑣 . The formula for ̂H mse(Y ia ) is much more complicated than mse(𝜃̂iH ) given in Section 6.2.1. ̂H A crude MSE estimator of Y i for the special case Y i exp(𝜃i ) may be obtained by first considering the case of known and 𝜎𝑣2 . In this case, the optimal (or best) ̂B estimator of 𝜃i is 𝜃̂iB zTi + 𝛾i (𝜃̂i zTi ) under normality of 𝑣i and ei . Letting Y i exp(𝜃̂ B ), an exactly unbiased estimator of Y i is given by i
̂B Y i E[exp(𝜃i )]∕E[exp(𝜃̂iB )]
̂B Y ia
(6.2.17)
̂B Fi Y i , where again Fi
exp[𝜎𝑣2 (1
Using (6.1.26), the MSE of ̂B MSE(Y ia )
̂B 𝛾i )∕2]. Note that E(Y ia ) ̂B Y ia
E(Y i )
exp(zTi
+ 𝜎𝑣2 ∕2).
may be expressed as
[ ( ) ]2 E Fi exp 𝜃̂iB exp(𝜃i ) [ ( [ ( )] )] [ ( )] Fi2 E exp 2𝜃̂iB 2Fi E exp 𝜃̂iB + 𝜃i + E exp 2𝜃i } { [ ( )] [E(Y i )]2 exp V 𝜃̂iB 2 exp[ Cov(𝜃̂iB , 𝜃i )] + exp[V(𝜃i )] ≈ [E(Y i )]2 V(𝜃̂iB
(6.2.18)
𝜃i )
[E(Y i )]2 MSE(𝜃̂iB ),
(6.2.19)
using the crude approximation exp(𝛿) ≈ 1 + 𝛿. The approximation (6.2.19) suggests ̂B that a crude MSE estimator of Y ia is given by ̂B mse(Y ia )
̂B (Y ia )2 mse(𝜃̂iB ),
(6.2.20)
MSE ESTIMATION
141
where MSE(𝜃̂iB ) ̂H mse(Y ia ) as
𝛾i 𝜓i . We now imitate (6.2.20) to get a crude approximation to ̂H mse(Y ia )
̂H (Y ia )2 mse(𝜃̂iH ).
(6.2.21)
Luery (2011) proposed an MSE estimator similar to (6.2.21) in the context of SAIPE (see Example 6.1.2). 6.2.4
*Bootstrap MSE Estimation
Bootstrap resampling from the sample data {(𝜃̂i , zi ); i 1, , m} may be used to estimate the MSE of the EBLUP, 𝜃̂iH , and more generally the MSE of a complex ̂H estimator, 𝜙̂ H , of 𝜙i h(𝜃i ), such as the bias-adjusted estimator Y ia of the mean Y i or i the estimator of Y i based on the compromise estimator 𝜃̂icH , of 𝜃i . The generality of the bootstrap method for estimating the MSE of complex estimators is attractive because the analytical method based on linearization is not readily applicable to parameters other than 𝜃i . Assuming normality of 𝑣i and ei and 𝜎̂ 𝑣2 > 0, bootstrap data (𝜃̂i∗ , zi ) are generated independently for each i 1, , m as follows: (i) generate 𝜃i∗ from N(zTi ̂ , 𝜎̂ 𝑣2 ), i 1, , m and let 𝜙i∗ h(𝜃i∗ ); (ii) generate 𝜃̂i∗ from N(𝜃i∗ , 𝜓i ), is then applied to the bootstrap i 1, , m. The method used for calculating 𝜙̂ H i H ̂ ̂ data {(𝜃i∗ , zi ); i 1, , m} to obtain 𝜙i∗ . Repeat the above steps a large number, (1), , 𝜙̂ H (B) and the bootstrap values B, of times to get B bootstrap estimates 𝜙̂ H i∗ i∗ of 𝜙i , denoted 𝜙i∗ (1), , 𝜙i∗ (B). The theoretical bootstrap estimator of MSE(𝜙̂ H ) is given by mseB (𝜙̂ H ) i i H 2 ̂ E∗ (𝜙i∗ 𝜙i∗ ) , where E∗ denotes the bootstrap expectation. We approximate ) by Monte Carlo, using the B bootstrap replicates mseB (𝜙̂ H i mseB1 (𝜙̂ H i )
B
1
B ∑ [𝜙̂ H i∗ (b)
𝜙i∗ (b)]2 .
(6.2.22)
b 1
Theoretical properties of (6.2.22) are not known for general parameters 𝜙i , but the motivation behind mseB (𝜙̂ H ) is to mimic the true MSE(𝜙̂ H ) E(𝜙̂ H 𝜙i )2 , by i i i H H changing 𝜙̂ i to 𝜙̂ i∗ , 𝜙i to 𝜙i∗ and the expectation E to the bootstrap expectation E∗ . In the special case of 𝜙i 𝜃i , by imitating the second-order approximation (6.2.1) to MSE(𝜃̂iH ), noting that the bootstrap FH model is a replica of the FH model (6.1.1) with ( , 𝜎𝑣2 ) changed to ( ̂ , 𝜎̂ 𝑣2 ), we get the approximation mseB (𝜃̂iH ) ≈ g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + g3i (𝜎̂ 𝑣2 ).
(6.2.23)
Now, comparing (6.2.23) to the second-order unbiased MSE estimator (6.2.3), 2 , it follows that mse (𝜃̂ H ) is not second-order unbiased. A valid for 𝜎̂ 𝑣2 RE and 𝜎̂ 𝑣s B i double-bootstrap MSE estimator has been proposed for the basic unit level model (see Chapter 7) to rectify this problem. A similar method may be used for the FH model.
142
EBLUP: BASIC AREA LEVEL MODEL
Alternatively, for the special case of 𝜙i 𝜃i , hybrid bootstrap MSE estimators that are second-order unbiased may be used (Butar and Lahiri 2003). The hybrid method uses the representation MSE(𝜃̂iH )
[g1i (𝜎𝑣2 ) + g2i (𝜎𝑣2 )] + E(𝜃̂iH
𝜃̃iH )2
(6.2.24)
to obtain a bias-corrected bootstrap estimator of g1i (𝜎𝑣2 ) + g2i (𝜎𝑣2 ) and a bootstrap estimator of the last term in (6.2.24). The mentioned bias-corrected estimator is 2 ) + g (𝜎 2 2 is the estimator given by 2[g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 )] E∗ [g1i (𝜎̂ 𝑣∗ ̂ 𝑣∗ 2i ̂ 𝑣∗ )], where 𝜎 of 𝜎𝑣2 obtained from the bootstrap data. The bootstrap estimator of the last term, 2 ) E(𝜃̂iH 𝜃̃iH )2 , is given by E∗ [𝜃̃iH (𝜎̂ 𝑣∗ 𝜃̃iH (𝜎̂ 𝑣2 )]2 , noting that 𝜃̃iH 𝜃̃iH (𝜎𝑣2 ) and ̂𝜃 H 𝜃̃ H (𝜎̂ 𝑣2 ). The sum of the two terms leads to the hybrid bootstrap MSE estimator i i mse BL (𝜃̂iH )
2[g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 )] 2 + E∗ [𝜃̃iH (𝜎̂ 𝑣∗ )
2 2 E∗ [g1i (𝜎̂ 𝑣∗ ) + g2i (𝜎̂ 𝑣∗ )]
𝜃̃iH (𝜎̂ 𝑣2 )]2 .
(6.2.25)
2 ; the Formula (6.2.25) shows that the bootstrap data are used here only to calculate 𝜎̂ 𝑣∗ H ̂ bootstrap EBLUP estimator 𝜃i∗ of 𝜃i is not used unlike in the direct bootstrap MSE estimator, mseB1 (𝜃̂iH ), given by (6.2.22). Butar and Lahiri (2003) obtained an analytical approximation to (6.2.25), and it is interesting to note that this approximation is identical to the area-specific MSE estimator, mse∗2 (𝜃̂iH ), given by (6.2.12). We now turn to the case of obtaining 𝜎̂ 𝑣2 0. In this case, random effects 𝑣i are absent in the model. Consequently, the true bootstrap parameter 𝜃i∗ is taken as the regression-synthetic estimator obtained by evaluating ̃ (𝜎𝑣2 ) at 𝜎𝑣2 0, that is, 𝜃i∗ zTi ̃ (0), which is fixed over bootstrap replicates. Then, 𝜃̂i∗ is generated from N(𝜃i∗ , 𝜓i ). Bootstrap data {(𝜃̂i∗ , zi ); i 1, , m}, are ∗ used to calculate the regression-synthetic estimator 𝜃̂i∗RS zTi ̂ WLS , where (∑m ) ∑ ∗ 1 m T ̂ WLS ̂ i 1 zi zi ∕𝜓i i 1 zi 𝜃i∗ ∕𝜓i . Repeating the above steps a large number, B, of times, we get B bootstrap estimates 𝜃̂i∗RS (1), , 𝜃̂i∗RS (B). Theoretical bootstrap estimator of MSE is given by mseB (𝜃̂iH ) E∗ (𝜃̂i∗RS 𝜃i∗ )2 , which is approximated by
mseB1 (𝜃̂iH )
B
1
B ∑ [𝜃̂i∗RS (b)
𝜃i∗ ]2 .
(6.2.26)
b 1
Note that in the case of obtaining 𝜎̂ 𝑣2 0, the EBLUP estimate of 𝜃i reduces to 𝜃̂iH zTi ̃ (0) ∶ 𝜃̂iRS . Moreover, by the method of imitation, we have mseB (𝜃̂iH ) ≈ g2i (0), that is, the bootstrap estimator (6.2.26) is tracking the correct MSE.
(6.2.27)
143
MSE ESTIMATION
6.2.5
*MSE of a Weighted Estimator
The EBLUP estimator 𝜃̂iH runs into difficulties in the case of 𝜎̂ 𝑣2 0. In this case, it reduces to the regression-synthetic estimator zTi ̂ WLS regardless of the area sample sizes. For example, in the state model of Example 6.1.2 dealing with poverty counts of school-age children, in year 1992 it turned out that 𝜎̂ 𝑣2 ML 𝜎̂ 𝑣2 RE 0. As a result, for that year the EBLUP attached zero weight to all the direct estimates 𝜃̂i regardless of the CPS sample sizes ni (number of households). Moreover, the leading term of the MSE estimate, g1i (𝜎̂ 𝑣2 ) 𝛾̂i 𝜓i , becomes zero when 𝜎̂ 𝑣2 0. One way to get around the above problems is to use a weighted combination of 𝜃̂i and zTi ̂ with fixed weights ai and 1 ai : 𝜃̂i (ai ) ai 𝜃̂i + (1 ai )zTi ̂ , (6.2.28) 0 < ai < 1. 2 of 𝜎 2 , available from past studies, may be used to construct the A prior guess 𝜎𝑣0 𝑣 2 b2 ∕(𝜓 + 𝜎 2 b2 ). Note that 𝜃̂ (a ) remains model-unbiased for 𝜃 weight ai as ai 𝜎𝑣0 i i i i i 𝑣0 i for any fixed weight ai ∈ (0, 1), provided the linking model holds, that is, E(𝜃i ) zTi . Datta, Kubokawa, Molina, and Rao (2011) derived a second-order unbiased estimator of MSE[𝜃̂i (ai )], assuming bi 1 in the FH model (6.1.1), given by
g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + g∗ai (𝜎̂ 𝑣2 )
mse[𝜃̂i (ai )]
b𝜎̂ 2 (𝜎̂ 𝑣2 ),
(6.2.29)
𝛾i )2 ].
(6.2.30)
𝑣
where b𝜎̂ 2 (𝜎𝑣2 ) is the bias of 𝜎̂ 𝑣2 and 𝑣
g∗ai (𝜎𝑣2 )
(ai
𝛾i )2 [𝜎𝑣2 + 𝜓i
g2i (𝜎𝑣2 )∕(1
The bias term b𝜎̂ 2 (𝜎𝑣2 ) is zero up to o(m 1 ) terms for the simple moment estima𝑣 2 and the REML estimator 𝜎 ̂ 𝑣2 RE . The bias terms for the ML estimator 𝜎̂ 𝑣2 ML tor 𝜎̂ 𝑣s 2 are given by (6.2.9) and (6.2.10), respectively. Note that and the FH estimator 𝜎̂ 𝑣m E[mse(𝜃̂i (ai )] MSE[𝜃̂i (ai )] o(m 1 ) for any fixed weight ai (0 < ai < 1). In the case that 𝜎̂ 𝑣2 0, the leading term of order O(1) in (6.2.29) reduces to a2i 𝜓i noting that 𝛾̂i 0, unlike the leading term of mse(𝜃̂iH ), g1i (𝜎̂ 𝑣2 ) 𝛾̂i 𝜓i , which becomes zero. Note also that a2i 𝜓i decreases as the sampling variance decreases, which is a desirable property of (6.2.29). Datta et al. (2011) conducted a limited simulation study on the performance of 2 0.75 and the weighted estimator 𝜃̂i (ai ), using weight ai based on a prior guess 𝜎𝑣0 2 letting true 𝜎𝑣 equal to 1, for selected 𝜓i -patterns and m 30. Their results indicated that the MSE of the weighted estimator (6.2.28) is very close to that of the EBLUP estimator 𝜃̂iH , and even slightly smaller in many cases because the weighted estimator avoids the uncertainty due to estimating 𝜎𝑣2 . On the other hand, the weighted estimator that uses a constant weight for all the areas, say ai 1∕2, i 1, , m, does not perform well relative to 𝜃̂iH in terms of MSE, especially for areas with smaller 𝜓i ; 2 of 𝜎 2 leads to varying weights a . The MSE estimator note that the prior guess 𝜎𝑣0 i 𝑣 (6.2.29) performed well in terms of RB.
144
6.2.6
EBLUP: BASIC AREA LEVEL MODEL
Mean Cross Product Error of Two Estimators
The small area estimators 𝜃̂iH are often aggregated to obtain an estimator for a larger area. To obtain the MSE of the larger area estimator, we also need the mean cross product error (MCPE) of the estimators 𝜃̂iH and 𝜃̂tH for two different areas i ≠ t. The MCPE of 𝜃̂iH and 𝜃̂tH may be expressed as MCPE (𝜃̂iH , 𝜃̂tH )
E(𝜃̂iH
MCPE(𝜃̃iH , 𝜃̃tH ) + lower order terms, (6.2.31) where the leading term in (6.2.31) is given by
MCPE (𝜃̃iH , 𝜃̃tH )
𝜃i )(𝜃̂tH
(1
𝛾i )(1
𝜃t )
𝛾t )zTi
m ∑ ( ) zi zTi ∕ 𝜓i + 𝜎𝑣2 b2i
]
1
zt
i 1
∶ g2it (𝜎𝑣2 ),
(6.2.32)
which is O(m 1 ), unlike MSE(𝜃̃iH ) that is O(1). It follows from (6.2.31) and (6.2.32) that MCPE (𝜃̂iH , 𝜃̂tH ) ≈ MCPE(𝜃̃iH , 𝜃̃tH ) g2it (𝜎𝑣2 ) (6.2.33) is correct up to o(m 1 ) terms. Furthermore, an estimator of MCPE(𝜃̂iH , 𝜃̂tH ) is given by mcpe(𝜃̂iH , 𝜃̂tH ) g2it (𝜎̂ 𝑣2 ), (6.2.34) which is unbiased up to o(m 1 ) terms, that is, E[mcpe(𝜃̂iH , 𝜃̂tH )]
6.2.7
MCPE(𝜃̂iH , 𝜃̂tH ) + o(m 1 ).
*Conditional MSE
Conditional on 𝜽 = (𝜽1 , … , 𝜽m )T In Section 6.2.1, we studied the estimation of unconditional MSE of the EBLUP estimator 𝜃̂iH . It is, however, more appealing to survey practitioners to consider the estimation of conditional MSE, MSEp (𝜃̂iH ) Ep (𝜃̂iH 𝜃i )2 , where the expectation Ep is with respect to the sampling model 𝜃̂i 𝜃i + ei only, treating the small area means, 𝜃i , as fixed unknown parameters. We first consider the simple case where all the parameters, ( , 𝜎𝑣2 ), of the linking model are assumed to be known and bi 1 for all areas i 1, , m. In this case, the minimum MSE estimator of 𝜃i (best prediction estimator) is given by 𝜃̂iB 𝛾i 𝜃̂i + (1 𝛾i )zTi , under normality of 𝑣i and ei . Following Section 3.4.3, Chapter 3, 𝜃̂iB ̂ where 𝜽̂ (𝜃̂1 , , 𝜃̂m )T and hi (𝜃) ̂ may be expressed as 𝜃̂i + hi ( 𝜽), (1 𝛾i )(𝜃̂i ind T ̂ z ). Now noting that 𝜃i |𝜃i N(𝜃i , 𝜓i ), i 1, , m, and appealing to the general i
145
MSE ESTIMATION
̂ a p-unbiased estimator, msep (𝜃̂ B ), of the formula (3.4.15) for the derivative of hi ( 𝜽), i B conditional MSE, MSEp (𝜃̂i ), may be written as msep (𝜃̂iB )
̂ 𝜃̂ i + h2 (𝜽) ̂ 𝜓i + 𝜓i 𝜕hi (𝜽)∕𝜕 i 𝛾i 𝜓i + (1
𝛾i )2 [(𝜃̂ i
zTi )2
(𝜓i + 𝜎𝑣2 )]
(6.2.35)
(Rivest and Belmonte 2000). Note that (6.2.35) can take negative values and, in fact, when 𝛾i is close to zero, the probability of getting a negative value is close to 0.5. Rivest and Belmonte (2000) studied the relative performance of msep (𝜃̂iB ) and the unconditional MSE estimator msep (𝜃̂iB ) 𝛾i 𝜓i , as estimators of the conditional MSE, MSEp (𝜃̂iB ), for the special case of 𝜓i 𝜓 for all i. They calculated the ratio, R, of Em {MSEp [msep (𝜃̂iB )]} to Em {MSEp [mse(𝜃̂iB )]}, where MSEp denotes the average of design MSE over the m areas and Em denotes the expectation with respect to the linking model 𝜃i zTi + 𝑣i . The ratio R measures the average efficiency of mse(𝜃̂iB ) relative to msep (𝜃̂iB ). Note that in this case R (𝜓 2 + 2𝜓𝜎𝑣2 )∕𝜎𝑣4 > 1 if 𝜎𝑣2 ∕𝜓 < 2.4. When shrinking is appreciable, that is, when 𝛾i 𝛾 is small, mse(𝜃̂iB ) is a much more efficient estimator of MSEp (𝜃̂iB ) than msep (𝜃̂iB ). For example, if 𝜎𝑣2 ∕𝜓 1 or equivalently 𝛾 1∕2, we get R 3. It may be noted that the estimator 𝜃̂iB leads to significant gain in efficiency relative to the direct estimator, 𝜃̂i , only when the shrinking factor, 𝛾, is small. Datta et al. (2011) derived explicit formulae for msep (𝜃̂iH ) in the case of REML ̂ and then using the forand FH estimators of 𝜎𝑣2 , by first expressing 𝜃̂iH as 𝜃̂i + hi ( 𝜽) ̂ ̂ They also conducted a simulation study mula msep (𝜃̂iH ) 𝜓i + 𝜕hi ( 𝜽)∕𝜕 𝜃̂i + h2i ( 𝜽). under the conditional setup for m 30. The study showed that the CV of msep (𝜃̂iH ) can be very large (ranged from 13% to 393%), especially for areas with large sampling variances 𝜓i . Therefore, msep (𝜃̂iEB ) is not reliable as an estimator of MSEp (𝜃̂iH ) although it is p-unbiased. Conditional on 𝜽̂i Fuller (1989) proposed a compromise measure of conditional MSE, given by MSEc (𝜃̂iH ) E[(𝜃̂iH 𝜃i )2 |𝜃̂i ], where the expectation is conditional on the observed 𝜃̂i for area i only. Datta et al. (2011) made a systematic study of the Fuller measure and obtained a second-order unbiased estimator of MSEc (𝜃̂iH ), using REML and FH estimators of 𝜎𝑣2 . We now summarize their main results. A second-order approximation to MSEc (𝜃̂iH ) is given by zTi )2 ∕(𝜎𝑣2 + 𝜓i )]g3i (𝜎𝑣2 ) + op (m 1 ), (6.2.36) where g1i , g2i , and g3i are as defined in Section 6.2.1 and op (m 1 ) denotes terms of lower order in probability than m 1 . Note that MSEc (𝜃̂iH ) depends on the direct estimator 𝜃̂i . Expression (6.2.36) is valid for both REML and FH estimators of 𝜎𝑣2 . Turning to conditional MSE estimation, a second-order unbiased MSE estimator is MSEc (𝜃̂iH )
g1i (𝜎𝑣2 ) + g2i (𝜎𝑣2 ) + [(𝜃̂i
146
EBLUP: BASIC AREA LEVEL MODEL
given by [ msec (𝜃̂iH )
g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 )
] 2 g(1) ( 𝜎 ̂ ) ̂ 𝑣2 , 𝜃̂i 𝑣 h(𝜎 1i
zTi ̂ )
⎤ ⎡ (𝜃̂ zT ̂ )2 i i ⎥ g (𝜎 2 ) + o (m 1 ), + 1 +⎢ p ⎥ 3i 𝑣 ⎢ 𝜎̂ 𝑣2 + 𝜓i ⎦ ⎣ where g(1) (𝜎̂ 𝑣2 ) is the first derivative of g1i (𝜎̂ 𝑣2 ) evaluated at 𝜎𝑣2 𝜎̂ 𝑣2 and h(𝜎̂ 𝑣2 , 𝜃̂i 1i T ̂ zi ) E(𝜎̂ 𝑣2 𝜎𝑣2 |𝜃̂i ) is the conditional bias of 𝜎̂ 𝑣2 given 𝜃̂i . The conditional bias term depends on the choice of the estimation method for 𝜎𝑣2 (see Datta et al. 2011 for the conditional bias under REML and FH moment methods). It is interesting to note that the conditional bias of the REML estimator, 𝜎̂ 𝑣2 RE , is not zero up to Op (m 1 ) terms unlike the unconditional bias of 𝜎̂ 𝑣2 RE , which is zero up to O(m 1 ) terms. Simulation results for m 30 reported by Datta et al. (2011) showed that the RB of b under the conditional setup is comparable to the corresponding RB of mse(𝜃̂iH ) under the unconditional setup, and both are small (median absolute RB lower than 2%). Furthermore, the CV values of the estimator msec (𝜃̂iH ) under the conditional setup and those of mse(𝜃̂iH ) under the unconditional setup are also similar, with CV increasing for areas with larger 𝜓i .
6.3
*ROBUST ESTIMATION IN THE PRESENCE OF OUTLIERS
This section describes methods for robust estimation of small area parameters, 𝜃i , under the FH model (6.1.1) with normality, in the presence of outliers in the random area effects 𝑣i , or the sampling errors ei , or both. However, due to the Central Limit Theorem CLT effect, ei is less likely prone to outliers. Datta and Lahiri (1995) used a hierarchical Bayes (HB) framework (see Chapter 10) to study the effect of outliers in 𝑣i . The distribution of 𝑣i is assumed to be a scale mixture of a normal distribution with a general mixing distribution. This mixture family induces long-tailed distributions, in particular, t and Cauchy. HB estimators based on the assumed family are more robust to outlying 𝑣i than the estimators based on normal distribution. If the areas with outlying 𝑣i are a priori known, then Datta and Lahiri (1995) recommend the use of a Cauchy distribution on the 𝑣i for those areas and a mixture distribution with tails lighter than the Cauchy on the 𝑣i for the remaining areas. Bell and Huang (2006) studied empirically how the HB estimator of 𝜃i changes when a t-distribution with small degrees of freedom, k, on the area effects 𝑣i , is used in place of the normal distribution. Their results indicate that the use of a t-distribution with small k can diminish the effect of outliers in the sense that more weight is given to the direct estimator 𝜃̂i than in the case of normal 𝑣i ; note that we are dealing with outliers in 𝑣i and 𝜃̂i is assumed to be well-behaved. On the other hand, if a 𝜃̂i is regarded as outlier, then assuming a t-distribution on the sampling error ei leads to
147
*ROBUST ESTIMATION IN THE PRESENCE OF OUTLIERS
an estimator that gives less weight to 𝜃̂i than in the case of normal ei . Bell and Huang (2006) note that outlying 𝜃̂i might occur due to nonsampling errors. Ghosh, Maiti, and Roy (2008) studied robust estimation of 𝜃i under the area level model (6.1.1) with bi 1, by robustifying the EBLUP estimator, using Huber’s (1972) 𝜓-function. Assuming 𝜎𝑣2 known, the BLUP estimator (6.1.2) of 𝜃i may be written as 𝜃̃iH 𝜃̂i (1 𝛾i )(𝜃̂i zTi ̃ ), (6.3.1) where ̃ is the BLUE of given by (6.1.5). The estimator 𝜃̃iH is robustified by applying the Huber function to the residuals 𝜃̂i zTi ̃ . This leads to the Ghosh–Maiti–Roy (GMR) estimator 𝜃̃iGMR
𝜃̂i
(1
1∕2
𝛾i )𝜓i
𝜓b [𝜓i
1∕2
zTi ̃ )],
(𝜃̂i
(6.3.2)
where 𝜓b (u) u min(1, b∕|u|) and b > 0 denotes a tuning constant. Note that as b → ∞, 𝜃̃iGMR tends to 𝜃̃iH . The robust EBLUP estimator, 𝜃̂iGMR , is obtained from (6.3.1) by substituting the non robust estimators ̂ and 𝜎̂ 𝑣2 for the unknown ̂ and 𝜎𝑣2 . GMR evaluated MSE(𝜃̃iGMR ) under the assumed FH normal model and showed that it is strictly decreasing in b and that it exceeds MSE(𝜃̃iH ), as expected. Regarding the choice of b, GMR suggest that the tuning constant b may be chosen to ensure that the increase in MSE does not exceed a pre-specified percentage value. This approach leads to an adaptive choice of b, possibly varying across areas. However, note that the adaptive b depends on the unknown 𝜎𝑣2 . GMR also derived a second-order unbiased estimator of MSE(𝜃̂iGMR ), denoted mse(𝜃̂iGMR ), under the assumed FH normal model. Using mse(𝜃̂iGMR ) and the MSE estimator of the EBLUP 𝜃̂iH , the tuning constant, b, may be estimated for each area i for desired tolerance. A limitation of 𝜃̂iGMR is that it Huberizes the composite error 𝑣i + ei 𝜃̂i zTi . An alternative robust estimator that permits Huberizing 𝑣i and ei separately, or 𝑣i only, may be obtained by robustifying the Henderson et al. (1959) MM equations (5.2.8). For the FH model, the robustified MM equation for 𝑣i is given by 𝜓i
1∕2
𝜓b1 [𝜓i
1∕2
(𝜃̂i
zTi
𝑣i )]
𝜎𝑣 1 𝜓b2 (𝜎𝑣 1 𝑣i )
0,
i
1,
, m,
(6.3.3)
where the tuning constants b1 and b2 may be different. If the sampling error ei 𝑣i is not deemed to be a possible outlier, then b1 may be chosen as b1 ∞. 𝜃̂i zTi Equ at io n ( 6. 3. 3) t h u s o f f e r s fle xiband il it𝜎y𝑣2 , . eFo qu r atgioiv ne( n6. 3. 3) mig h t b e s o l v e d𝑣ifbo yr u s ing t h e Ne w t o n–Rap h s o n ( NR) me t h o d. W e de no 2 s o f and 𝜎 2 ar e o b t aine d b y Hu b e r izing t h o 𝜎̂r𝑣R 𝑣̃iR ( , 𝜎𝑣2 ). Ro b u s t e s t ̂imat R and 𝑣 2 ML e qu at io ns fand o r𝜎𝑣 . Th is l e ads t o r o b u s t ifi e d ML e qu at io ns m ∑
zi (𝜎𝑣2 + 𝜓i )
𝛽 ∶ i 1
1∕2
𝜓b (ri )
𝟎,
( 6. 3. 4)
148
EBLUP: BASIC AREA LEVEL MODEL
𝜎𝑣2 ∶
m ∑ (𝜎𝑣2 + 𝜓i )[𝜓b2 (ri )
c]
0,
( 6. 3. 5)
i 1
w h erir e (𝜎𝑣2 + 𝜓i ) 1∕2 (𝜃̂i zTi ) and c E[𝜓b2 (u)] w it uh N(0, 1). Me t h o ds de s c r ib e d in Se c t io n 7. 4 f o r u nit l e v e l mo de l s c an b e u s e d t o 2imat ( 6. 1. 5) , l e ading t o r o b u ŝ Rt and e s 𝜎t̂ 𝑣R o rins t u r n t o t h e r o b u s t e s t im , and 2 𝑣̂ iR 𝑣̃iR ( ̂ R , 𝜎̂ 𝑣R ) o f𝑣i . Final l y , a r o b u s t EBLUP ( REBLUP) e𝜃isist imat g ivoe rn o f b y 𝜃̂iRS zTi ̂ R + 𝑣̂ iR . ( 6. 3. 6)
Th e ab o v e me t h o d is de s ig ne d t o h andl e 𝑣si yo mme rei . Intt rhice oc uast el ie r s o f no ns y mme t r ic o u t l ie r s , b ias - c o r r e c t io n me t h o ds , s imil ar t o 7. 4, c an b e u s e d, b u t w e o mit de t ail s h e r e . Al s o , a b o o t s t r ap MSE(𝜃̂iRS ) c an b e de v e l o p e d al o ng t h e l ine s o f t h e b o o t s t r ap me Unl ik e GMR, no r mal 𝑣iti and y oeifis no t r e qu ir e d f o r t h is b o o t s t r ap me t h GMR c o ndu c t e d a s imu l at io n s t u dy w h e r e o u t l ie r s in t h e r a 𝑣i w e r e g e ne r at e d f r o m a c o nt aminat e d no r mal dis t r ib u t io n. Th e ir adap t iv e c h ob,icindic e o atf e t h at , in t e r ms o f e f 𝜃̂fiiGMR c ieand nc𝜃̂iyH p, be or ft oh r m H an u np u b l is h e d s imu l at io n s imil ar l y , indic at ing r o b 𝜃̂ui st ot neo su st loie𝑣f i r. sIn in RS r e yl atr iv s t u dy , Rao and Sinh a f o u nd a s imil ar e f fi c𝜃̂iie nc e se𝜃̂iH u tulot nde f o rr c o nt aminat e d no r mal t- dis andt r ib u t io n w it h s mal l de g r e e s o f f r e e do m a c h o ic b e 1.345. Mo r e o v e r , u nde r t h e no 𝜃̂iH rand mal𝜃̂iRS mop de e r lf, o r me d s imiGMR ̂ e d t o s ig nifi c ant l ar l y in t e r ms o f e f fi c ie nc y . On 𝜃i t h w e oitbthh e1.345 r h land, ̂ H uy nde l o s s in e f fi c ie 𝜃nc o v r e t rh e no r mal mo de l . i
6.4 6.4.1
*PRACTICAL ISSUES Unknown Sampling Error Variances
Th e b as ic ar e a l e v e l mo de l ( 6. 1. 1) as s u me s t h 𝜓 ati ,t ar h e ks amp no wl ing n. v ar In p r ac t ic 𝜓i eis , s e l do m k no w n and mu s t b e r e p l ac 𝜓ei0d. bIf ya dir an e sc tt imat o r isi0 u s𝜓̂ ei . d, Al tt he er nat n iv e l y , t h e e s t ima e s t imat 𝜓ô i ,r b, as e d o n u nit l e v e l dat a 𝜓 t o r𝜓̂si may b e s mo o t h e d b y u s ing a g e ne r al ize d v ar ianc e f u nc t io n h e no t rak 𝜓si0 e 𝜓̂ iS . t o o b t ain t h e s mo o t h e𝜓̂ iSd and e s t imat It w o u l d b e u s e f u l t o s t u dy t h e s e ns it iv it y o f 𝜓si0mal . l ar e a in and t𝜎e𝑣2 ,r st h e b e s t As s u ming no r mal𝑣iti and y oeif and k no w n mo de l p ar ame e s t imat o 𝜃ri is o fg iv e n b y it s c o ndit io nal e xp e c t at io n E(𝜃i |𝜃̂i )
𝛾i 𝜃̂i + (1
𝛾i )zTi
∶ 𝜃̃iB .
( 6. 4. 1)
If 𝛾i is r e p l ac e 𝛾di0 b 𝜎y𝑣2 b2i ∕(𝜓i0 + 𝜎𝑣2 b2i ) in ( 6. 4. 1) , t h e n t h e r e s u l t ing e B 𝛾i 𝜃̂i + (1 𝛾i0 )zTi l e ads t o an inc r e as e in MSE, c o ndit 𝜓i0 . io nal o n mat o 𝜃r̃i0
149
*PRACTICAL ISSUES B B In p ar t ic u l ar , no t ing (t𝜃̃i0 h ) at MSE( MSE 𝜃̃iB ) + E(𝜃̃i0 B MSE(𝜃̃i0 )
MSE(𝜃̃iB )
(𝛾i
𝜃̃iB )2 , w e h av e
𝛾i0 )2 (𝜎𝑣2 b2i + 𝜓i ).
( 6. 4. 2)
It f o l l o w s f r o m ( 6. 4. 2) t h at t h e r e l at iv e inc r e as e in MSE is g iv e B MSE(𝜃̃i0 )∕ MSE(𝜃̃iB )
1
(𝛾i
𝛾i0 )2 ∕[𝛾i (1
( 6. 4. 3)
𝛾i )],
no t ing t h at (MSE 𝜃̃iB ) 𝜎𝑣2 (1 𝛾i ) and 𝛾i 𝜎𝑣2 b2i ∕(𝜓i + 𝜎𝑣2 b2i ). It is al s o o f int e r e s t t o e xamine t h e RB o f t h e r e p o r t e d MSE e s B ms eR (𝜃̃i0 )
𝛾i 𝜓i0 ,
( 6. 4. 4)
B ) ist enot h t ratanms e w h ic h as s u me 𝜓i0 sis tthh ate t r u e s amp l ing v ar ianc eR (. 𝜃̃No i0 do m c o ndit io nal 𝜓i0 o. RB n o f t h e r e p o r t e d MSE e s t imat o r is g iv e n b y
B )] RB[ ms eR (𝜃̃i0
B) ms eR (𝜃̃i0 MSE(𝜃̃ B ) i0
1
𝛾i (1
𝛾i (1 𝛾i ) 𝛾i ) + (𝛾i 𝛾i0 )2
1.
( 6. 4. 5)
Be l l ( 2008) r e p o r t e d t h e r e s u l t s o f an e mp ir ic al s t u dy o n t h MSE, g iv e n b y ( 6. 4. 3) , and t h e RB o f t h e r e p o r t e d MSE e s t imat o r , g c o nc l u s io ns ar e as f o l l o w s : ( i) Unde r - o r o t h e s p e c ial bi c 1.as Main e ac t s t h e RB o f t h e r e p o r t e d MSE e s t imat o r mo r e t h a o f t r 𝜓ui imp e MSE. ( ii) Unde r e s t imat𝜓io f r e s e r io u s p r o𝜎𝑣2b∕𝜓li eis ms wmalh l e. n( iii) i isn ao mo ar gh ee ,n a c as e u nl ik e l y Ov e r e s t imat 𝜓 ioi isn oa fmo r e s e r io u s p r o𝜎𝑣2b∕𝜓li eisml w p ic t o o c c u r in p r ac t ic e b𝜓ei t cy au s eal sl ymalr le s u l t s f r o m a l ar g e ar e a s am 𝜓̂ i sand Be l l ( 2008) al s o e xamine d u nc o ndit io nal p r o p 𝜓ei0r t ie b ay𝜒 2das s u ming dis t r ib u t iod n𝜓̂ if∕𝜓 o i rf o r s e v e r al vd.alHis u e main s o cf o nc l u s io ns ar e as f o l l o ( i) Inc r e as e in t h e u nc o ndit io nal d is MSE, mo dew r hat d e 16) n( o r l ardg e80) ( , is qu it e s mal l , and is l e s s t h an 10% d as es vmal e nl faso r6. ( ii) Es t imat io n e r r o r 𝜓̂ i l e ads t o do w nw ar d b ias in t h e r e p o r t e d MSE e s t imat o r . ( iii) Ef is r e l at iv e l y mil d in t h e u nc o ndit io nal s e t u p in c o nt r as t t e rro 𝜓 r̂ i in p e r s p e c t iv e . k noand w n Riv e s t and Vandal ( 2003) al s o s t u die 𝜓di0t h 𝜓 ê i ,c asasseu oming f 𝜎𝑣2 and bi 1 in t h e mo de l ( 6. 1. 1) . In p ar t ic u l ar , s u p p o s e 𝜃̂ti h at t h e d iid is t h e meyi oan fni o b s e r v yatij io N(𝜃 ns i , 𝜎i2 ), j 1, , ni . Th e𝜓̂ni s2i ∕ni , w h e r e ∑ n i 2 2 si (y yi ) ∕(ni 1) is t h e s amp l e v ar ianc (ni e1)s , 2iand ∕𝜎i2 is dis t r ib u t e d j 1 ij 2 as a𝜒 n 1 v ar iab l e . In t h is nsi 𝜓̂ei tmay t ing b, e ap p r o ximat N(𝜎ei2 ,d𝛿as i ), f o 𝛿ri i 4 B ̃ o f b e t h e n o b t aine d as 2𝜎i ∕(ni 1). An MSE e s t imat o𝜃ri0 may B ) ms e(𝜃̃i0
B ms eR (𝜃̃i0 ) + 2𝛿̂i 𝜎𝑣4 ∕(𝜓̂ i + 𝜎𝑣2 )2 ,
( 6. 4. 6)
Bf) is g iv e n b y ( 6. 4. 4) w h e𝛿̂i r e 2s4i ∕(ni 1) is t h e p l u g - in e s𝛿it and imatms o reR (o𝜃̃i0 w it 𝜓 h i0 𝜓̂ i ( Riv e s t and Vandal 2003) . It f o l l o w s f r o m ( 6. 4. 6) t h at t h
150
EBLUP: BASIC AREA LEVEL MODEL
MSE e s t imat o r ( 6. 4. 6) s h o u l d b e inflat e d b y adding t h e l as t t e r m t h v ar iab il it y o𝜓̂ fi ’s t .h Pre o o f o f ( 6. 4. 6) is b as e d o n Le mma 3. 5. 1 ( St e in’s Ch ap t e r 3. W ang and Fu l l e r ( 2003) p r e s e nt e d c o mp r e h e 𝜓nsi0 iv 𝜓ê i and re s u lts fo n o𝜎𝑣2f . kThnoe ws amp n l ing v 𝜓ari in ianc e bi 1 b y r e l axing t h e as s u mp t ioand . u nb ias e d e t h e EBLUP 𝜃̂iH , g iv e n b y ( 6. 1. 12) , is r e p l ac e d b y a de s ig𝜓̂ ni In o ne v e r s io n∑ o f t h e ir∑ pis r eo sct eimat du er de b, y t h e o r dinar y l e as t s qu ar e T 1 m z 𝜃̂ and 𝜎 2 b y t h e mo me nt e 𝜎 ( m ̂s𝑣2 W t imat o r e s t imat ̂o OLS r 𝑣 i 1 zi zi ) i 1 i i F 2 max(0, 𝜎̃ 𝑣 W ),F w h e r e 𝜎̃ 𝑣2 W
m ∑ d̃ i [(𝜃̂i
F
zTi ̂ OLS )2
( 6. 4. 7)
𝜓̂ i ],
i 1
∑ w h ed̃ ir e di ∕ m 𝜓̂ ias . De s onoc iat t e et dh we it h i 1 di and di is t h e de g r e e s o f f r e e do m r e s u l t ing e s t imat o r as 𝜃̂iW
F
𝛾̂iW 𝜃F̂i + (1
FT ̂ 𝛾̂iW )z OLS , i
( 6. 4. 8)
w h e𝛾̂iW r e F 𝜎̂ 𝑣2 W ∕( 𝜎̂ 2 + 𝜓̂ i ). W ang and Fu l l e r ( 2003) e xamine d al t e r nat iv e F 𝑣W F 2 mat o r s 𝜎o𝑣 in f a s imu l at io n s t u dy . Unde r t h e r e g u l 𝜓 ar ̂ i , it𝜓iy, andd c o indit io ns o (i 1, , m) s t at e d in Th e o r e m 1 o f W ang and Fu l l e r ( 2003) , a s e c o ndimat io n o f MSE (𝜃̂iW )F is g iv e n b y MSEA (𝜃̂iW )F
𝛾i 𝜓i + (1
𝛾i )2 zTi V( ̂ OLS )zi
+ (𝜎𝑣2 + 𝜓i ) 3 [𝜓i2 V(𝜎̂ 𝑣2 W )F+ 𝜎𝑣4 V(𝜓̂ i )],
( 6. 4. 9)
w h eV(r 𝜎̂e𝑣2 W )F is t h e as y mp t o t ic v𝜎̂ 𝑣2ar ianc , e o f W F ( V( ̂ OLS )
m ∑ zi zTi i 1
)
1
m ∑ ( 2 ) 𝜎𝑣 + 𝜓i zi zTi i 1
] 1( m ∑
)
1
zi zTi i 1
and V(𝜓̂ i ) is t h e v ar ianc 𝜓̂ i .e Th o fe ap p r o ximat io n ( 6. 4.(9) 𝜃̂iW t )oF asMSE s u me s e rt ho ef earr er as o r in t h e t h atd min(di ) al s o inc r e as e s w it h t h e num,mband 1.5 , rm max 1 d 1 , d 1.5 ). It is c l e ar f r o m ( 6. 4. 9) t h at t h ap p r o ximat io n is o f o(mr de ̂ H ),p gr oiv ximat e n bioy n t o M ap p r o ximat io n is s imil ar t o t h e s e c o nd- o r de r (𝜃ap i 2 3 4 ( 6. 2. 1) , e xc e p t t h at it inv o l v e s t (𝜎 h 𝑣e +addit nal𝜓̂ i )t ear risming f r o m 𝜓i ) io 𝜎𝑣 V( 2003) ̂ i s2i ∕n u isine wd h ic h c as e t h e v ar ianc𝜓̂ ie. No o f t e t h at Riv e s t and Vandal ( 𝜓 4 di ni 1 and V(𝜓̂ i ) 𝛿i 2𝜎i ∕(ni 1). Riv e s t and Vandal ( 2003) s u g g e s t e d a g e ne r al ize d PR MSE e s t imat o w itobhi r 1 and 𝜓i 𝜓̂ i s2i ∕ni in t h e EBLUP e s t iPR s imp l e mo me nt e s𝜎̂ 𝑣s t2 imat t h eo rp br oy p o s e d MSE e s t imat o r is g mat o 𝜃̂riH . De no t ing t h e l at t e r e𝜃̂isRVt,imat b y 2 4 ̂ 𝛿i , + 𝜓̂ i ) 3 𝜎̂ 𝑣s ( 6. 4. 10) ms e(𝜃̃iRV ) ms ePR (𝜃̃iRV ) + 2(𝜎̂ 𝑣s
*PRACTICAL ISSUES
151
w h e𝛿̂i r e 2s4i ∕(ni 1) and ms ePR de no t e s t h e c u s t o mar y MSE e s t imat o r ( 6. 𝜓i r e p l ac e 𝜓d̂ i .b Ity is c l e ar f r o m ( 6. 4. 10)(𝜃̃iRV t h) at ac cmso eu nt s f o r t h e e xt r xp e c t e d, s imu l at io n r e s u l t s indic at e v ar iab il it y du e t o e𝜓si . t As imateing c u s t o mar y MSE e s t imat o r c an l e ad t o s ig nifi c ant u nde r e s t imat io n, MSE e s t imat o r ( 6. 4. 10) , e s p enci isialsl mal y wl . hHo e nw e v e r , t h e s imu l at io n e r e d a nar r o w (1∕3, r ang 5∕2) e o f v al u e𝜓is∕𝜎o𝑣2 . fW ang and Fu l l e r ( 2003) p r o p o s h at p e r f o r me d qu it e w e l l in s imu l at io ns , e t w o e s t imat o r s(𝜃̃ioW f)F tMSE 2 𝜎𝑣2 ∕𝜓i is v e r y s mal l ; r 𝜓ang f(1∕4, 160). In t h e c as e o f v 𝜎e𝑣2 ∕𝜓 r yi , s mal l i ∕𝜎𝑣e wo as t h e MSE e s t imat o r s l e ad t o s e v e r e o v e r e s t imat io n. e eo sr ts imat o r ( 6. 4. 10) is v al In t h e c as e o f u s ing s mo o t𝜓hi0 e d𝜓̂eiS ,s tt himat he p ol ac f o r t h e MSE o f t h e r e s u l t ing s mal 𝜃̂iRVl , arb eu at ews𝜓̂ ititrimat r e 𝜓̂diSb y and 𝛿̂i b y an e s t imat V( o 𝜓r̂ iSo). fUs e o f s mo o t h e d e s t imat o r s s h o u l d mak 𝜓̂ i ),l es pr et hc an ial l y t io nal t e r m s mal l bV(e𝜓̂ciS )auw s ile l b e s ig nifi c ant l y sV(mal u l od pr , ems r f eo r m w e l l . Ri f o r l arm.g As e a r e s u l t , t h e c u s t o mar PR y , es sh toimat and Vandal ( 2003) ap p l ie d ( 6. 4. 10) w it h s mo o 𝜓̂t iSh teo d Canadian e s t imatco er ns o 10) nl y inc r e a s u s u nde r c o v e r ag e ( Examp l e 6. 1. 3) and f o u nd t h PR at b( y6. 4. as s oe cs iat e do r 1%. Th is s u g g e s t s t h at t h e u s e o f c u s t𝜃̂oiH and mart yh eEBLUP t imat ms ePR w it𝜓hi r e p l ac e𝜓̂diSbs yh o u l d p e r f o r m w e l l in p r ac t icm eis , e s p e c i no t s mal l ; in t h e Canadian c e ns u ms e 96. xamp l e , Go nzál e z- Mant e ig a e t al . ( 2010) s t u die d t h e c as e𝜓i0wo hf𝜓ie isr e an e s t i no t av ail ab l e , and t h e u s e r h as ac c e s s o nl {(𝜃ŷi , zti ); o i t h 1,e ar, m}. e a l e v e l dat e rTie ) is a s mo o t h f u nc Samp l ing v ar ianc 𝜓i is e mo de l e 𝜓di as 𝜎e2 h(zTi ), w h h(z t io n ozTif . A k e r ne l - b as e d me t h o d is u h(z s eTi d)t and o einst tuimat r n teh e mo de l 2 2 p ar ame t e ,r 𝜎su , and 𝜎e , t h e ar e a me 𝜃i ans and t h e as s o c iat e d MSE e s t imat o r Pr ac t ic al imp l e me nt at io ns o f t h is me t h o d r e qu ir e s b andw idt h s e f u nc t io n ( s e e Go nzál e z- Mant e ig a e t al . 2010 f o r de t ail s ) . 6.4.2
Strictly Positive Estimators of 𝝈𝒗2
Se c t io n 6. 1. 2 c o ns ide r e d t h e 𝜎e𝑣2 , s t thimat e vioarn ianc o f e o f t h e r ando 𝑣i , m e f f e and p r e s e nt e d ML, REML, and t w o me t h o ds o f mo me nt e s t imat o r s . A s p e cs ial l y fm, o rw s hmal ic lh ar e t h e n t r u nc at e d c an l e ad t o ne g at iv e 𝜎̃e𝑣2 ,s et imat ze r o𝜎̂ 𝑣2: max(𝜎̃ 𝑣2 , 0). A dr aw b ac k o f t r u nc at ing t o ze r o is t h at t h e r e r e eg cart dlaree sase os tf ima e s t imat 𝜃êiHs, ,w il l at t ac h ze r o w e ig h t t o al l t h e𝜃̂i dir t h e ar e a s amp l e s ize 𝜎̂ 𝑣2 s 0. w Fo h er ne xamp l e , Be l l ( 1999) u s e d t h e s t at e m f oREML r 5 y ee sart simat e s p o v e r t y r at e s ( Examp l e 6. 1. 2) t o c al c u l at e ML𝜎𝑣2and ( 1989–1993) , and o b t aine 𝜎̂ 𝑣2 d0 f o r t h e fi r s t 4 y e ar s . Giv ing a ze r o w e ig e s t imat e s f o r s t at e s w it h l ar g e s amp l e s ize s , s u c h as Cal if o r n ap p e al ing t o t h e u s e r , and may l e ad t o s u b s t ant ial dif f e r e nc e b e s t imat e and t h e c o r r e s p o nding dir e c t e s t imat e du e t o o v e r s h ze r o e s t imat 𝜎𝑣2e. o f W oe idpa rzee rs oe nt au Se v e r al me t h o ds h av e b e e n p r o p o s e d t𝜎̂o𝑣2 . av v al b r ie f ac c o u nt o f me t h o ds t h at l e ad t o s t r ic 𝜎t 𝑣2l. yWp ang o s itand iv e e s t Fu l l e r ( 2003) p r o p o s e d a dat a- b as e d t r u nc at io n l e ading t o
152
EBLUP: BASIC AREA LEVEL MODEL
[ 𝜎̂ 𝑣2 W
F
max
] 1 ̂ 1∕2 ( 2 ) 2 𝜎̃ 𝑣 , 𝜎̃ 𝑣 , V 2
( 6. 4. 11)
̂ r𝜎̃e𝑣2 ) is an e s t imat oV(r𝜎̃ 𝑣o2 ).f W ang and Fu l l e r ( 2003) s t u die d s e v e r al c w h eV( 2 2mo ̂ 𝜎̃ 𝑣2 ), b u t w e f o c u s o n t h e s imp l e𝜎̃ 𝑣s g iv meentn be ys t (imat 6. 1.o15) r o f̃𝜎𝑣 and V( and l e bti 1 in t h e b as ic mo de l ( 6. 1. 1) . Fo l l o w ing W ang and Fu l l e r ( ̂ f𝜎̃ 𝑣o2 ) ris c h o ic eV( m ∑
̂ 𝜎̃ 𝑣2 ) V(
m
{ m
2 i 1
m
p
[( 𝜃̂i
zTi ̂ OLS
}2
]
)2
(1
hii )𝜓i
2 𝜎̂ 𝑣s
,
( 6. 4. 12)
∑ T 1 ∑m z 𝜃̂ and w h ê rOLS e is t h e OLS e s t imat og rivoe fn b̂ OLS y ( m i 1 zi zi ) i 1 i i ∑ T ) 1 z ( Yo s h imo r i and Lah ir i 2014a) . hii zTi ( m z z i i i 1 i Me t h o ds b as e d o n adju s t ing t h e l ik e l ih o o d f u nc t io𝜎𝑣2n t o av o id h av e al s o b e e n s t u die bi d.1 Le in ttht ing e mo de l ( 6. 1. 1) and w r it ing it in mat r ̂ Z + v + e, t h e p r o fi l e l ik e l ih o o d and t h e r e s idu al l ik e l ih f o r m 𝜽as mal it y ar e g iv e n b y ( ) 1 ̂T ̂ ( 6. 4. 13) 𝜽 P𝜽 LP (𝜎𝑣2 ) ∝ |V| 1∕2 e xp 2 and LR (𝜎𝑣2 ) ∝ |ZT V 1 Z|
1∕2
LP (𝜎𝑣2 )
( 6. 4. 14)
r e s p e c t iv e l y , V w diag(𝜎 h e r𝑣2 e+ 𝜓1 , , 𝜎𝑣2 + 𝜓m ) and P V 1 V 1 Z T 1 1 T 1 h(𝜎t𝑣2o) ris no w int r o du c e d t o (Z V Z) Z V . An adju s t me nt f ac an adju s t e d l ik e l ih o o d as LA (𝜎𝑣2 ) ∝ h(𝜎𝑣2 )L(𝜎𝑣2 ), ( 6. 4. 15)
de fi n
t o𝑣2 )ris c h o s e n t o e ns u r e t h at t w h eL(𝜎 r e𝑣2 ) is e it hLPe(𝜎r 𝑣2 ) o rLR (𝜎𝑣2 ). Th e f ach(𝜎 2 2 or ∞) is s t r ic t l y p o s it iv e . Th e s t imat e maximizing LA (𝜎𝑣 ) w it h r e s p𝜎𝑣e oc vt et[0, f e at u r e p r e v e nt s o v e r s h r ink ag e o f tm. h e EBLUP e v e n f o r s mal l A s imp l e c h o h(𝜎 ic 𝑣2e) iso h(𝜎 f 𝑣2 ) 𝜎𝑣2 ( Mo r r is and Tang 2011, Li and Lah ir i 2010) . Th is c h o ic e g iv e s a s t r ic t l y𝜎̂ 𝑣2 LL p ,o no s itt iv ehAs(0) e inge tL att imat 0 and e 𝑣2 ac 0. h ie Fo vc eu dsating LA (𝜎𝑣2 ) → 0 as 𝜎𝑣2 → ∞. Th u s , t h e maximu m c anno t b 𝜎 asi e 𝜓, t h e b ias o f t h e c o 𝜎̂r𝑣2 rLLe rs ep l oat nding iv 𝜎e 𝑣2 tiso o nLR (𝜎𝑣2 ) and t h e c 𝜓 2 2 2 2 1 2 1 , w h e𝛾 r e𝜎𝑣 ∕(𝜎𝑣 + 𝜓) 𝜎𝑣 𝜓 ∕(𝜎𝑣 𝜓 + 1). It no w f o l l o w s e qu al (2∕m)𝛾 to 2 e ns mal l , e v e n t h o O(m u g 1h) it is t h at t h e RB 𝜎̂ 𝑣2 LL o fc an b e l ar g e 𝜎w 𝑣 ∕𝜓h is 2 2 2 2 w h 𝜎e𝑣 n> 0. Smal 𝜎l 𝑣 ∕𝜓 indu c e s ze r o REML e 𝜎ŝ 𝑣tRE imat o f𝜎e𝑣 smo r e f r e qu e nt l y , b u t it al s o c au s e s dif𝜎̂fi𝑣2 LLc in u lt te yr ms w ito hf RB. Th is du al p r o b l e m p r o ) s iv u ce hc h(0) Yo s h imo r i and Lah ir i ( 2014a) t o e xp l o r e al t eh(𝜎 r 𝑣2nat ht ho atic e s o f 2 2 2 “c tl oo s e L r R”(𝜎 t 𝑣o) o rLP (𝜎𝑣 ). Th e c h o ic e 0 s t il l h o l ds , b uLAt (𝜎l 𝑣e) ad (
{ h(𝜎𝑣2 )
t an
1
m ∑ 𝛾i i 1
)}1∕m ( 6. 4. 16)
153
*PRACTICAL ISSUES
s at is fi e s t h e s e c o ndit io ns , and w e de no t e t 𝜎 h 𝑣2 eas r𝜎̂e𝑣2 YL s .u l t ing e s t ima
An al t e r nat iv e s t r at e g 𝜎̂y𝑣2 LLis o t nl o yu w s e𝜎̂h𝑣2 RE e n 0 and r e t ain 𝜎̂ 𝑣2 RE o t h e r w is e ( Ru b in- Bl e u e r and Yo u 2013, Mo l ina, Rao , and Dat t a 2015) . W MIX e s t imat o 𝜎̂r𝑣2 LLM as . Th is w il l e ns u r e t h at t h e w e ig h t at t ac h e d t o mat o r is al w ay s p o s it iv e , and𝜎̂ t𝑣2 LLM h at ist hc el oRB s eo rf t o t h e c o r r e s p o nd o f̂𝜎𝑣2 RE . A s imil ar MIX s t r at e g y c an al s o 𝜎̂b𝑣2 YL e tu os eo db wt ain it ht h e e s t imat o r 𝜎̂ 𝑣2 YLM , b u t t h e r e du c t io n o n RB w il l b e s mal l e r b e c au s e t h e c t 𝑣o2 ) o rLP (𝜎𝑣2 ). LA (𝜎𝑣2 ) c l o s eLrR (𝜎 Yo s h imo r i and Lah ir i ( 2014b ) c o ndu c t e d a l imit e d s imu l at io n s t u d o f EBLUP e s t imat 𝜃̂iRE o ,r𝜃̂siLL , 𝜃̂iLLM , 𝜃̂iYL , and𝜃̂iYLM b as e d 𝜎ô 𝑣2 RE n , 𝜎̂ 𝑣2 LL , 𝜎̂ 𝑣2 LLM , 2 2 T e dzi 𝜇, and𝜓i 𝜓 f o r al i. lTab l e 6. 3 𝜎̂ 𝑣 YL , and𝜎̂ 𝑣 YLM . Th e s t u dym u s15, r e p o r t s t h e av e r ag e MSE vR al u10e4 ss imu b asl at e dioo nnr u ns , f o r s e l e c t e d v 2 𝜓. eAst oe xp e c t e d, Tab l e 6. 3 s h o w s t h at t o f𝜎𝑣2 ∕𝜓 r e fle c t𝜎ing 𝑣 s mal l r e l at iv ag e MSE o f al l EBLUP e s t imat o r s inc 𝜎𝑣2 ∕𝜓 r edeas ce rse as as e s f r o m 0. 33 t o 0. 05. In t e r ms o f MSE, t h e e s t imat o r LL p e 𝜎r 𝑣2f∕𝜓 o rdemsc rp eoaso er sl y, basu t t h e MIX v e r s io n o f it , LLM, r e c t ifi e s t h is p r o b l e m t o s o me e xt e nt . Tab RE, YL, and YLM p e r f o r m s imil ar l y in t e r ms o f MSE, and t h at it is no t n mo dif y YL. In v ie w o f t h e r e s u l t s in Tab l𝜃̂eiYL6.in3tse ur pmsp oo fr tMSE, ing w e f o c u s o YL RE ̂ ̂ ). No t e t𝜃hi atal s o p e r f o r ms w e l l in t e r ms o f MS t h e e s t imat io n o(𝜃if MSE iv . e u nl ik e 𝜎̂ 𝑣2 RE is no t s t r ic t l y p o s 𝜎̂it𝑣2 YL A s e c o nd- o r de r u nb ias e d MSE𝜃̂ieYLs, tdeimat no ot er do(ms 𝜃̂fiYLe), h as t h e s ame f o r m as ms e(𝜃̂iRE ) g1i (𝜎̂ 𝑣2 RE ) + g2i (𝜎̂ 𝑣2 RE ) + 2g3i (𝜎̂ 𝑣2 RE ), ( 6. 4. 17)
b u t w𝜎̂ 𝑣2itREh r e p l ac e𝜎̂ d𝑣2 YL b .y Th is r e s u l t f o l l o w s f r o m t h e f ac t t h at t v ar ianc e s𝜎̂ 𝑣2 RE o fand 𝜎̂ 𝑣2 YL ar e ide nt ic al and b ias e s o f b o t h e s t imat o r s ar o r de r tmh 1an( Yo s h imo r i and Lah ir i 2014a) . A MIX e s t imat (𝜃̂iYL o )riso t fakMSE e n RE 2 2 2 ̂ as ms (𝜃e i ) w h 𝜎ê 𝑣 nRE > 0 and as g2i (𝜎̂ 𝑣 YL ) w h 𝜎ê 𝑣 nRE 0 ( Yo s h imo r i and Lah ir i e r enat iv e l y , wg2ie(0)c wan uh se en 2014b ) . W e de no t e t h is oYLp(𝜃̂tiYL io ).n Al as tms (𝜃̂iRE e ) w h e𝜎̂ 𝑣2nRE > 0 ( Mo l ina, Rao , and Dat t a 2015) . W e 𝜎̂ 𝑣2 RE 0 and r e t ain ms 𝜃̂ YLms ). eCh e n and Lah ir i ( 2003) al s o s u g g e s t e d r e p de no t e t h is o p t io n (as MRD i
TABLE 6.3 Average MSE of EBLUP Estimators Based on REML, LL, LLM, YL, and YLM Methods of Estimating 𝝈𝒗2 𝜓
𝜎𝑣2 ∕𝜓
REML
3.0 5.7 9.0 19.0
0. 33 0. 18 0. 11 0. 05
1. 06 1. 44 1. 85 3. 02
LL 1. 19 1. 86 2. 66 5. 04
LLM 1. 03 1. 45 1. 95 3. 46
YL 1. 04 1. 42 1. 83 3. 02
YLM 1. 05 1. 42 1. 83 3. 00
154
EBLUP: BASIC AREA LEVEL MODEL YL
TABLE 6.4
% Relative Bias (RB) of Estimators of MSE(𝜽̂ i )
𝜓
𝜎𝑣2 ∕𝜓
YL
YL1
YL2
3.0 5.7 9.0 19.0
0. 33 0. 18 0. 11 0. 05
53.7 104.1 149.0 218.2
19. 8 40. 4 59. 4 88. 1
20. 0 40. 7 59. 8 88. 7
s e c o nd- o r de r u nb ias e d MSE ge 2is(𝜎̂t𝑣2imat ) w oh re sneb𝜎̂ 𝑣2vy e 0.r Th is s u g g e s t io n 2 iv ee sntbimat y (o 6.r 1. 15) . w as made in t h e c o nt e xt o f t h e s imp l e𝜎̂ 𝑣2mo 𝜎̂me 𝑣s g nt Tab l e 6. 4 r e p o r t s t h e p e r c e nt RB, av e r ag e (𝜃d̂iYLo),v e r ar e a e (𝜃̂iYLM ), de no t e d as YL, YL1, and YL2, r e s p e c ms eYL (𝜃̂iYLM ), and ms MRD t iv e l y . As e xp e c t e d, Tab l e 6. 4 s h o w s t h at t h e RB v al u e s o f Y c l o s e . Fu r t h e r mo r e , YL h as RB t w ic e as l ar g e as t h e RB o f YL1 Re s u l t s f r o m Tab l e s 6. 3 and 6. 4 s u g g e s t t h𝜃̂iYL e uands te h oe f ast hs eo -e s t im YL )e o r ms e (𝜃̂ YL ) w h e ne v e r it is de e me d c iat e d MIX MSE e s t imat oYLr (𝜃̂ms MRD i i ne c e s s ar y t o at t ac h a s t r ic t l y p o s it iv e w e ig h t t o t h e dir e c t e is al l o w e d f o r an e s𝜎t𝑣2 imat , t h oe rn o ne f c o u ldu s e t h e EBLUP e s t imat𝜎̂o𝑣2 r .0 If 2n e 0 and t h e f o r mu l a 𝜃̂iRE t o g e t h e r w it h a MIX MSE e sgt2iimat (0) wo hr :𝜎êUs 𝑣 RE ( 6. 4. 17) w 𝜎̂h𝑣2 RE e n> 0. Th is MIX e s t imat o r o f MSE w il l h av e s ig nifi c ant l RB t h an ( 6. 4. 17) , s imil ar t o MIX e s t imat o 𝜃̂riYL s (o Mo f MSE r , and Dat t a l ina,f oRao 2015) . 6.4.3
Preliminary Test Estimation
In s o me ap p l ic at io ns , it may b e p o s s ib l e t o fi znd g o o dp t he atr yc an i, v e xp l ain t h e v ar iab il it𝜃iy’s ow f itt hh oe u t t h e ne e d f o r t h e inc l u s io n o t h ea sl mal ink ling mo 𝜃i de l e r r o𝑣ir. sTh is may b e int e r p r e t e d as h𝜎𝑣2avining 0, ( 𝜎𝑣2 ). A p r e l iminar y t e s t ( PT) o f t hH0e ∶ h𝜎𝑣2y p0 o t h e s is zTi + 𝑣i w it 𝑣hi at a s u it ab l e 𝛼l may e v e tl h e n b e c o ndu c t e d t o s e l e c t b e t w e e n t h e , ifno t r e je c t e d, t h e n w e t ak e as l 𝜃i zTi + 𝑣i o r𝜃i zTi . In p ar t ic u lHar 0 is mo de𝜃li zTi and e s t imat𝜃i ew it h t h e r e g r e s s io n- s y zTint̂hPTe, w t ich ee sr teimat o r ̂ PT (∑m 𝜓 1 zi zT ) 1 ∑m 𝜓 1 zi 𝜃̂i is t h e W LS e s t imatu onde r oHr 0f. On t h e i 1 i i 1 i i H is u s e d w Hh is e n r e je c t e d ( Dat t a, Hal l , and Mandal o t h e r h and, t h e 𝜃̂iEBLUP 0 2011) . A l ar g e v al𝛼u( te y op fic 𝛼al l 0.2) y is r e c o mme nde d in t h e PT e s t imat io l it e r at u r e ( Han and Banc r o f t 1968, Eh s ane s Sal e h 2006) . In t h e SAE c o nt e xt H , 0wis hnoe nt r e je c t e d, t h e n t h e r e g r e s s io n- s y mat o zrTi ̂ PT is u s e d f o r alil ar1,e as, m, t h at is , ze r o w e ig h t is at t ac h e d e g ears dl e s s o f t h e ar e a s amp l e s ize s o r t h e qu al it t h e dir e c t e s𝜃̂ti rimat mat e s . Bu t at t ac h ing a no nze r o w e ig h t t o dir e c t e s t imat e s is r e t h e y ar e r e l iab l e , b e c au s e t h e y p r o t e c t ag ains t f ail u r e o f t h e p o w e r o f t h e t e s 𝛼, t inc t h re enas t oe sav wo id it ho v e r s h r ink ing t o t h e e s t imat o r s , it is al s o adv is ab l e t o c o 𝛼. ns ide r a no t s o s mal l
155
*PRACTICAL ISSUES
Dat t a, Hal l , and Mandal ( 2011) p r o p o s e d a H s 0imp as sl ue ming t e s tnoo rf mal it y . Th e ir t e s t s t at is t ic in t h is s e t u p is g iv e n b y T
m ∑ 𝜓i 1 (𝜃̂i
zTi ̂ PT )2 ,
( 6. 4. 18)
i 1
Thu ende PTre s t iand T is dis t r ib u tem2 dpasw it mh p de g r e e s o f f r e e Hdo 0. m o nis g iv e n b y mat o r 𝜃oi , f b as e dT, { zTi ̂ PT if T ≤ m2 p,𝛼 ; PT 𝜃̂i , i 1, , m, ( 6. 4. 19) if T > m2 p,𝛼 , 𝜃̂ H i
p pp eo rintom2 fp . Th e e s t imat 𝜃̂iPT ois r r e c o mme nde d w h e n w h erm2 ep,𝛼 is t h e u 𝛼m is mo de r at e ( s ay 15 t o 20) . Ho w e v e r , s imu m l 15 at io andn 𝛼r e s0.2u l t s w it h t e rasms o f b ias and MSE ( Mo l ina, indic at e t h𝜃̂iPTat is p r ac t ic al l y t h e𝜃̂iH siname Rao , and Dat t a 2015) , and t h e r e f o r e it may b e b e t t e r t o 𝜃̂iHal w ay s u b e c au s e it au t o mat ic al l y g iv 𝛾êi ts oa st hmal e ldirw e ec ig t eh st t imat H0 is o rnowt h e n r e je c t e d. t h ing r o ut hg eh MSE a MIX On t h e o t h e r h and, PT is u s e f u l f o r e s 𝜃t̂iHimat o f e s t imat o r o f MSE g iv e n b y { if T ≤ m2 p,𝛼 o r 𝜎̂ 𝑣2 RE 0; g2i ( RE ) ms ePT (𝜃̂iH ) ( 6. 4. 20) if T > m2 p,𝛼 and 𝜎̂ 𝑣2 RE > 0. ms e 𝜃̂i
∑ 1 T 1 ̂ RE de no t 𝜃ê Hso b t aine d w𝜎̂ 𝑣2 it h He r eg2i, ∶ g2i (0) zTi ( m i 1 𝜓i zi zi ) zi and 𝜃i i 2 e 0nisw r he eje nc t e d, it may h ap 𝜎̂ 𝑣2 RE p e n0 tbhe atc au s e t h e 𝜎̂ 𝑣 RE . No t e t h at , e v H t e s t s t atTisdot ic e s no t de p 𝜎̂e𝑣2 RE nd.oTh n is is t h e r e as o n f o r inc l u ding t h e al c o ndit io𝜎̂ 𝑣2nRE 0 in ( 6. 4. 20) . Simu l at io n r e s u l t s b y Mo l ina e t al . ( 2015) ̂H t h at ms PT (e𝜃i ) p e r f o r ms b e t t e r in t e r ms o f RB t h an t h e al t e r nat iv e M 2 17) > 0. w h e n MSE t h at u gs 2ie ws h e𝜎̂ 𝑣2nRE 0 and t h e u s u al f o r mu l a ( 6.𝜎̂ 𝑣4. RE He nc e , it is b e t t eg2ir wt o h ueHs0n is e no t r e je c t e d, e v e n if t h e r e al ize d v 𝜎̂ 𝑣2 RE is p o s it iv e . One c o u l d al s o ap p l y t h e PT me t h o d t o e s t imat e t h e MSE o f t e dirb easc et de s t im mat o 𝜃̂riYL , w h ic h at t ac h e s a p o s it iv e w e ig h 𝜃t̂i .t oTh t eh PT) is g iv e n b y MIX e s t imat o r o (f𝜃̂iYL MSE ms
ePT (𝜃̂iYL )
{ ( ) g2i 𝜎̂ 𝑣2 YL ms e(𝜃̂iRE )
if T ≤ m2 if T > m2
p,𝛼 ; p,𝛼 .
( 6. 4. 21)
Ins t e ad o f e xc l u ding al l r ando 𝑣i , im e1,f f e, m, c t in s ,t h e l ink ing mo de l and t h e n u s e t h e r e g r e s s io n- s y nt h e t ic e s t imat o r f o r H al0l ar e as w at a s p e c 𝛼, ifi Dat e d t a and Mandal ( 2014) p r o p o s e d a mo difi e d ap p r o ac h t middl e g r o u nd t o t h e PT ap p r o ac h and t h e FH mo de l . Unde 𝑣i is r t h e ir
156
EBLUP: BASIC AREA LEVEL MODEL
c h ang e 𝛿di 𝑣t io, w h e𝛿i re equ al s 1 o r 0 w it h p rpoand b ab 1 ilp,it ie r e ss p e c t iv e l y . iid 2 Co ndit io nal 𝛿oi n 1, 𝑣i N(0, 𝜎𝑣 ), i 1, , m. Th is mo difi e d mo de l p e r mit s mo r e fle xib l e dat a- de p e nde nt 𝜃̂si ht or w ink aragd et h oe f r e g r e s s io n- s y nt h e mat o zrTi ̂ PT t h an t h e PT and FH ap p r o ac h e s . A HB ap p r o ac h w as u s e HB e s t imat o 𝜃ri . oAn f ap p l ic at io n t o t h e e s t imat io n o f p o v e r t y r at e c h il dr e n in t h e s t at e s o f t h e Unit e d St at e s s h o w s adv ant ag e s o f
6.4.4
Covariates Subject to Sampling Errors
Th e b as ic ar e a l e v e l mo de l ( 6. 1. 1) as s u mezi sar te h pato t ph ue lcatoiov narv iat al e- s u e s no t s u b je c t t o s amp l ing e r r o r s . Ho w e v e rzi, may in p rbace t ic e s o s u b je c t t o s amp l ing e r r o r s . Yb ar r a and Lo h r ( 2008) s t u die d t h e b o dy mas s inde x f o r 50 s mal l ar e as ( de mo g r ap h ic s u b g r o u p s ) u 𝜃̂i , o b t aine d f r o m t h e 2003–2004 U. S. Nat io nal He al t h and Nu t r it io n Exa Su r v e y ( NHANES) . Th e y al s o u s e d t h e 2003 U. S. Nat io nal He al t h In ( NHIS) dir e c t e s tẑimat s an s e l f - r e p o r t e d b ozi dy as tmas h e s cinde o v xar ii o f e me at e . In t h e NHANES, b o dy mas s inde x f o r e ac h r e s p o nde nt w as as c me dic al e xaminat io n in c o nt r as t t o t h e NHIS v al u e s b as e d o n s e l f - r e Ho w e v e r , t h e NHIS s amp l e s ize ( 29, 652 p e r s o ns ) is mu c h l ar g s ize ( 4, 424 p e r s o ns ) f o r t h e NHANES, and t h e r e lẑiab i o fl emedirane c t e s t s e l f - r e p o r t e d b o dy mas s inde x ar e h ig h l y c o r r e l at e𝜃̂i .d w it h t h A p l o t 𝜃̂oi agf ainsẑti s h o w e d s t r o ng l ine ar r e l at io ns h ip . To s t u dy t h e e f f e c t o f s amp zi , as l ing r s kin no w n l ink ing s u eme bir r o1 and 𝜃i is o or bo t faine d s imp l y b y mo de l p ar ame tand e r 𝜎s 𝑣2 . A “naiv e ” b e s t e s t imat s u b s t it u t ing t h e av ail ab ẑ i fl oe re tsht eimat u onkzri in not w h en b e s t e s𝜃̃iBt :imat o r 𝜃̃iNB
𝛾i 𝜃̂i + (1
𝛾i )̂zTi ,
( 6. 4. 22) ind
w h e r𝛾ie 𝜎𝑣2 ∕(𝜎𝑣2 + 𝜓i ). No w , as s u ming ẑ i t hN(z at i , Ci ) w it h k no w n and rwix it zĥ i inde p e nde nt𝑣i oand f ei , it f o l l o w s v ar ianc e –c o v ar ianc eCi mat t h at MSE (𝜃̃iNB ) 𝛾i 𝜓i + (1 𝛾i )2 T Ci , w h ic h is l ar g e r t h an t h e MSE o f t h e ̂i ) 𝜓i , if T Ci > 𝜎𝑣2 + 𝜓i . Th e r e f o r e , ig no r ing t h e s amp e s t imat o r , (𝜃MSE e r r o rẑ i in and u s ing 𝜃̃iNB is l e s s e f fi c ie nt t h an u s ing t h e𝜃̂i dir if tehc et e s t ima T p r e v io u s c o ndit n o n Cio h o l ds . Th u s , w h e n c o v ar iat e s ar e e s t imat e i u s ing t h e naiv e e s t imat o r mig h t no t l e ad t o g ains in e f fi c ie nc y w e s t imat io n. Mo r e o v e r , t h e MSE t h at is t y p ic al l y r e p o r t e d, o b e s t imatẑ i eas if it w e r e t zhi and e tgr uiv ee n 𝛾bi 𝜓yi , l e ads t o s e r io u s u nde r s t at e m NB o f t h e t r u e 𝜃̃MSE o f . i Yb ar r a and Lo h r ( 2008) c o ns ide r e d t h e c l as s o f al l𝜃̂il and ine ar c o mb i ẑ Ti , t h at aisi 𝜃̂,i + (1 ai )̂zTi , f o r ≤0ai ≤ 1. Th e o p taimal t h at minimize s th e i MSE is g iv e n b y 𝜎𝑣2 + T Ci a∗i ∶ 𝛾iL , ( 6. 4. 23) T 2 𝜎𝑣 + Ci + 𝜓i
157
*PRACTICAL ISSUES
and t h e r e s u l t ing b e s t l ine ar e s t imat o r in t h e me nt io ne d c l as s is 𝜃̃iL
𝛾iL 𝜃̂i + (1
𝛾iL )̂zTi .
( 6. 4. 24)
̃ L sg tivh eats a l ar g e r w e ig h t It f o l l o w s f r o m ( 6. 𝛾iL4.>23) t h hatic h imp l 𝜃ie 𝛾i , w i NB ̂ ̃ t h e dir e c t e s𝜃ti timat h ano t rh e naiv e b e s t𝜃ie s. tNo imat r mal o r it y as s u mp t io n is no ne e de d t o de𝜃̃iLr bive ec au s e it is o b t aine d f r o m t h e c l as s o f𝜃̂il ine ar c o and ẑ Ti . Fu r t h e r mo r e , it is e as y t o (v𝜃̃iLe) r if𝛾iLy𝜓i ,t hw ath MSE ic h is l ar g𝛾i 𝜓e i ,r t h an B T ̂ ̃ t h e MSE o f t h e b e s t𝜃i e s 𝛾ti 𝜃imat o r 𝛾i )zi w h ezi nis k no w n. Al s o , if i + (1 E(̂zi ) zi , t h eE(n𝜃̃iL 𝜃i ) 0 h o l ds , t h𝜃̃iLatisisu, nb ias e d f o r t h e 𝜃ar e a me e an i . Th e s t imat𝜃̃oiL is r ne v e r l e s s e f fi c ie nt t h an t h𝜃̂ie b dir e ce au c t≤ s ee𝛾siL 0t≤imat 1. o r T inat c aniob ns e ro ef l axe d if Re s t r ic t io n t o t h e c l as s o f l ine ar𝜃̂i and c o ẑmb i d b oy r o f no r mal it y is as s u me d. Unde r no r mal it y , an e f fizi cis ieo ntb et aine s t imat c o ns ide r ing t h e l L(z ik ie) l ihf (𝜃ôi , oẑ i |zdi ) f (𝜃̂i |zi )f (̂zi |zi ) and t h e n maximizing c t ist ol e ads t o t h e “b e s t s u p p ozTir t gingiv ”ee nsbt imat y o r L(zi ) w it h r e s pzi .e Th T T z̃ i 𝛿i ẑ i + (1 𝛿i )𝜃̂i , w h e𝛿ir e (𝜎𝑣2 + 𝜓i )∕(𝜎𝑣2 + 𝜓i + T Ci ). Int e r e s t ing l y , t h e Yb ar r a–Lo h r e 𝜃̃siL t isimato ob rt aine d b y s u bz̃ Tis t itf uo trzTiing in 𝜃̃iB ( Dat t a, Rao , and To r ab i 2010; Ar ima, Dat t a, and Lis e o 2015) . It may b e no , no n ot ing n t 𝜃̂hi |zati N(zTi , 𝜎𝑣2 + 𝜓i ) and b o tẑhi and 𝜃̂i p r o v ide inf o r matziio ingi ) bt hase e ldik e l ih o ẑ i |zi N(zi , Ci ), and t h at is t h e r e as o n f o r c o ns ide r L(z ing ) t o nle ys t imat ẑ i e o n t h e jo int def (ns 𝜃̂i , ẑiti |z y i ). On t h e o t h e r h and, uf (̂szi |z i l e ads ẑt Tio , and s u b s t itẑuTi t ing f o zrTi in 𝜃̃iB g iv e s t h e naiv e b e 𝜃s̃iNB t e. s t imat o r Car r o l l e t al . ( 2006, p .f (̂ 38) zi |zi )u t so e ard g u e in f av o r o f t h e naiv e b e s t in t h e c o nt e xt o f l ine ar r e g r e s s io n mo de l s : “It r ar e l y mak e s any u nde rdeno me as u r e me nt e r r o r ”. Yb ar r a and Lo h 𝜃r̃iL ,( 2008) r ivr mal e d it y , as t h e T ̂ b e s t e s t imat 𝜃i ob ry oc f o ndit io ning o n t h 𝜃ei r eẑ i s .idu al s Th e o p t imal e s 𝜃̃tiLimat de po er nds o n t h e p ar and ame𝜎𝑣2t,ewr sh ic h ar e u nk no w n in p r ac t ic al ap p l ic at io ns . It is ne c e and s s ar 𝜎𝑣2 yb yt o s rue itpabl ac l ee e ŝ t imat o r s 2 and 𝜎̂ 𝑣 t o o b t ain t h e e mp ir ic al e s t imat o r 𝜃̂iL
𝛾̂iL 𝜃̂i + (1
𝛾̂iL )̂zTi ̂ ,
( 6. 4. 25)
2 w itr eh p l ac e𝜎̂ 𝑣2d and b y ̂ , r e s p e c t iv e l y . Yb ar r w h e𝛾̂iLr ise g iv e n b y ( 6. 4. 𝜎23) 𝑣 and and Lo h r ( 2008) p r o p o s e d a s imp l e mo me 𝜎𝑣2 gntive es nt bimat y o r o f m ∑ 2 𝜎̂ 𝑣a
(m
p)
1
ẑ Ti ̂ a )2
[(𝜃̂i
𝜓i
̂ Ta Ci ̂ a ],
( 6. 4. 26)
i 1
at ise fie es ts imat t h oe re qu at io n w h ê ar is e b as e d o n mo difi e d l e as t s qu ar e ŝ a. sTh m ∑ ( ai ẑ i ẑ Ti i 1
] ) Ci
m ∑ ai ẑ i 𝜃̂i i 1
( 6. 4. 27)
158
EBLUP: BASIC AREA LEVEL MODEL
1 h t f o r a g iv e n s e t o afi >w0,ei ig 1,h t s, m. In anal o g y w it h t h(𝜎e𝑣2 +w𝜓ie) ig u s e d t o e s t imat u nde e r t h e FH mo de l w it zi ,h wk eno ww on u l d l ik e t o u s e T w e ig ahi t s(𝜎𝑣2 + 𝜓i + ̂ Ci ̂ ) 1 in ( 6. 4. 27) t o ac c o u nt f o r t h e me as u r e me in ẑ i . In p r ac t ic e , w e c an u s e a t w o - s t e p p. rIno sct e du p 1, r e wt oe es se tt ima ai 1 and e s t imat eand 𝜎𝑣2 u s ing ( 6. 4. 27) and ( 6. 4. 26) . In s t e p 2, w e s u b t h e s e e s t imat e s int o t h e aei txp s ioh nicf oh rin t u r n l e ads t o t h e de s o rg eâei ,s t w 2 . Bo t ̂h and 𝜎 2 ar e c o ns is t e nt e s t imat o r s as t h e nu m e s t imat ô ar and s 𝜎̂ 𝑣a ̂ a 𝑣a ar e asm,, t e nds t o infi nit y . A s e c o nd- o r de r ap p r o ximat (𝜃̂iL )iois nst imil o MSE ar t o (MSE 𝜃̂iH ) g iv e n b y ( 6. 2. 1) , b u t inv o l v e s an e xt r a t e r m t h at ac c o u nt s f zô i , r and t h ite r se amp du cl ing e s e rr t o ( 6. 2. 1) wCi h e𝟎.n Yb ar r a and Lo h r ( 2008) u s e d a jac k k nif e me t h o d t o Mi ∶ MSE(𝜃̂iL ). Th e jac k k nif e me t h o d is o u t l ine d in Se c t io n 9. 2. 2 f o 𝜃̂iL , ho or sw. In e vt he er , cwas ee ne o fe t o 𝜃r̂iH , b u t it is ap p l ic ab l e t o o t h e r e s t imat L L L t o ig no r e t h e c r o s s M - p12ir o 2E( du 𝜃̃ci t t𝜃ie)(𝜃r̂im 𝜃̃i ) in t h e de c o mp o s it io n e 1i r e E(𝜃̃iL 𝜃i )2 and M2i E(𝜃̂iL 𝜃̃iL )2 in o r de r t o Mi M1i + M2i + M12i , w h M ap p l y t h e jac k k nif e me t h o d, as do ne b y Yb ar r a and Lo h r ( 2008) jac k k nif e me t h o d p e r fMo12ir is ms new g el ligl ib if l e r eMl2iat. iv e t o Th e me t h o d o f Simu l at io n Ext r ap o l at io n ( SIMEX) is w ide l y u s s u r e me nt e r r o r l it e r at u r e t o e s t imat e r e g r e s s io n p ar ame t e r r e g r e s s io n mo de l s ( Car r o l l e t al . 2006, Ch ap t e r 6) . SIMEX c an e s t imat e t h e ar e a me ans u nde r t h e b as ic mo de l ( 6. 1. 1) in t h e p p l ing e r r o zri s( Sing in h , W ang , and Car r o l l 2015a) . Bas ic ide a o f SIME t o c r e at e addit io nal ẑ b,i v (𝜁 e )c ot of rinc s r e as ing l y l ar g e (1r +v𝜁ar )Cianc i f o er s b 1, , B, w h e r e t h e nu mb e r o Bf isr e lparl gic eat io and 𝜁 ≥ns0. Th is is ac c o mp l is h e d b y g e ne rb,iat, ing r s p e nde nt l yN(𝟎, b 1,v e c, B,t oinde f r Coi )m and t h e n l e tẑtb,iing (𝜁 ) ẑ i + 𝜁 1∕2 rb,i . Car r o l l e t al . ( 2006) ẑ b,i (𝜁 c) aal“rl e me as u r e me ntẑ”i . oNo f t ing t ẑhi |zati N(zi , Ci ), it f o l l o w E[̂ s zb,it (𝜁 h )|z at i ] zi and V[̂zb,i (𝜁 )|zi ] (1 + 𝜁 )Ci (1 + 𝜁 )V(̂zi |zi ). Th e r e f o r [̂ ezb,i , (𝜁 MSE )|zi ] c o nv e r g e s t o ze r o 𝜁as→ 1. SIMEX s t e p s f o r g e t t ing t h e e s t imat e s o f s mal l ar e a me ans ar e as L dif f e r e nt v al 𝜁u: e0 < s 𝜁o1 < f · · · < 𝜁L . ( ii) Fo r e𝜁𝓁ac(𝓁h 1, , L) g e ne r at e {̂zb,i (𝜁𝓁 ); b 1, , B} and t h e n r e p lziacb ing yẑ b,i (𝜁𝓁 ) in t h e mo de l ( 6. 1. 1) , c al c u l at ̂tb,ih(𝜁e𝓁 );dat t h e naiv e EBLUP e s t imat o r f r o{𝜃̂m i a1, , m} and de no t e it i, z H (𝜁 ), f o br 1, as 𝜃̂b,i , B. ( iii) Fo r e 𝜁 ac (𝓁 h 1, , L), c al c u l at e t h e av e r ag 𝓁 𝓁 ∑ H H (𝜁 ). ( iv ) Pl o t ̂ B 1o rBb s1 𝜃:̂b,i o v e r Bt hr ee p l ic at e s o f t h e naiv 𝜃e+,ie(𝜁𝓁s) t imat 𝓁 H t h e av e 𝜃̂r+,iag(𝜁𝓁e) ag ains𝜁t𝓁 f o 𝓁r 1, , L and fi t an e xt r ap o l at io n f u nc t io n, s 𝜃̂fH,i (𝜁 ), t o t h e (𝜁 p 𝓁air , 𝜃̂+,i s (𝜁𝓁 )), 𝓁 1, , L. 
( v ) Ext r ap 𝜃̂ofH,i (𝜁 l at ) t eo𝜁 1t o g e t t h e SIMEX e s t imat 𝜃̂fH,i (o 1) r o f𝜃i . Sing h , W ang , and Car r o l l ( 2015b ) al s o p r o p o s e d a c o r r e c t e me t h o d as s u ming no r mal it y . Limit e d s imu l at io n r e s u l t s indic at e d t t h e e s t imat o r b as e d o n t h e c o r r e c t e d s c o r e e qu at io ns p e r f MSE, t h an t h e Yb ar r a–Lo h r𝜃̂iLe. s t imat o r
159
*PRACTICAL ISSUES
6.4.5
Big Data Covariates
“Big Dat a” is c u r r e nt l y a h o t t o p ic in St at is t ic s , and it is g o ing t o at t e nt io n in t h e f u t u r e . Th e s e k inds o f dat a ar e al s o c al l e d “o r 2011) . Th r e e c h ar ac t e r is t ic s o f o r g anic dat a ar e v o l u me , v e l o c dat a inc l u de t r ans ac t io n dat a ( e . g . , t r af fi c flo w dat a) and s o c ial me ( e . g . , Go o g l e t r e nds o r Fac e b o o k p o s t ing s ) . W e b r ie fly me t io ns t o s mal l ar e a e s t imat io n t h at u s e b ig dat a c o v ar iat e s as addit t h e ar e a l e v e l mo de l . Mar c h e t t i e t al . ( 2015) s t u die d t h e e s t imat io n o f p o v e r t y indic in Tu s c any r e g io n o f It al y , in p ar t ic u l ar p o v e r t y r at e , w h ic h is o f h o u s e h o l ds w it h a me as u r e o f w e l f ar e b e l o w a s p e c ifi is a s p e c ial c as e o f t h e c l as s o f FGT p o v e r t y me as u r e s int r o d Big dat a o n mo b il it y c o mp r is e d o f dif f e r e nt c ar jo u r ne y s au t w it h a GPS de v ic e du r ing May 2011. Mo b il it y f𝑣oisr de a gfiivnee dn as v eah ic l e me as u r e o f e nt r o p y g iv e n b y L L ∑ ∑
p𝑣 (𝓁1 , 𝓁2 ) l o [p g 𝑣 (𝓁1 , 𝓁2 )],
M𝑣
𝓁1 1 𝓁2 1
w h e(𝓁r1 ,e𝓁2 ) de no t e s a p air o f l Loisc tath ioe ns nu , mb e r o f l o pc𝑣 (𝓁 at 1io , 𝓁2ns ) , and 𝓁2 )ns/ ( t o t al nu mb e r o f ( nu mb e r o f t r ip 𝑣s made v e h bice l tew e e n l o𝓁1cand at io 1 ∑t ak M M iiat eVi is t r ip s o f v 𝑣) e h. Th ic l ee ar e a l e v e l b ig dat a c o v ar 𝑣∈Ai e𝑣 ,n as i wl eitsh int ar h e ar e g is t e r e d GPS Vi isde tvh ice e and w h eAir is e t h e s e t o f v e h ic o vM is es u b je c t t o e r r o r in e s t im nu mb e r o f v e h ic l e s Abi . eTh l o engc ing taroi iat ing t h e t r u e mo b il i.it Mar y incarh eeat t i e t al . ( 2015) t r e at e d t h e v e h ic l e as a s imp l e r ando m s amp l e and e s t imat e d t h e v arMianc e or f t th hise me an i . Unde as s u mp t io n, t h Mei as y ua cs oe dv ar iat e in t h e FH mo de l and ap p l ie d t h e Yb a mo difi c at io n o f t h e c u s t o mar y EBLUP e s t imat o r , t r e at ing t h e e s M i as t h e t r u e v ar ianc e . Dir e c t e s t imat o r s o f ar e a p o v e r t y r at s u r v e y dat a. Th e s e c o nd ap p l ic at io n ( Po r t e r e t al . 2014) anal y s e s r e l at iv e h o u s e h o l d Sp anis h - s p e ak ing in t h e e as t e r n h al f o f t h e Unit e e s t imat o r s f o r t h e s t at e s ( s mal l ar e as ) f r o m t h e ACS and a b ig dat a f r o m Go o g l e Tr e nds o f c o mmo nl y u s e d Sp anis h w o r ds av ail ab Go o g l e Tr e nds dat a may b e r e g ar de d as f u nc t io nal c o v ar iat e s do main. 6.4.6
Benchmarking Methods
∑ Su p p o s e𝜃i is t h tath iteh ar e a me𝜃+an, m 𝓁 1 W𝓁 𝜃𝓁 is t h e ag g r e g at e me an, w h o fand u al nitl s arine arase ar a e s amp l e d W𝓁 N𝓁 ∕N is t h e k no w n p r o p o r t io n 𝓁, ∑ W 1) . Fo r e xamp l e , in t h e U. S. p o v e r t y c o u nt s ap p l ic at io ( m 𝓁 1 𝓁 e 𝜃in+ sis t at t he e nat io nal p o v e r t y r at e . As s u min 𝜃i is t h e p o v e r t y ri atand ∑ ̂ dir e c t e s t imat 𝜃̂+ o r m 𝓁 1 W𝓁 𝜃𝓁 o f𝜃+ is r e l iab l e , it is de s ir ab l e t o e ns u r e e nans ag g r e g at e d, ag r e e w it h t h e r e e s t imat o r s o f t h e s mal 𝜃li ,arwe ha me
160
EBLUP: BASIC AREA LEVEL MODEL
e s t imat 𝜃̂o+ ,r e s p e c ial l y𝜃̂+ wis hr e gn ar de d as t h e “g o l d s t andar d”. Th e EB e s t imat o𝜃̂iHr ,s do , no t s at is f y t h is “b e nc h mar k ing ” p r o p e r t y . In f ac t m ∑
m ∑
W𝓁 𝜃̂𝓁H
𝜃̂+
W𝓁 (1
𝓁 1
𝛾̂𝓁 )(𝜃̂𝓁
zT𝓁 ̂ ) ≠ 0.
( 6. 4. 28)
𝓁 1
1, , m It is t h e r e f o r e ne c e s s ar y t o mo dif y t 𝜃̂hiH ,e i EBLUP e st ot imat o r s e ns u r e b e nc h mar k ing .
Simple Adjustments to EBLUP A v e r y s imp l e b u t w ide l y u s e d mo difi c a t h el t cipol mEBLUPs is c al l e d r at io b e nc h mar k ing . It c o ns is 𝜃t̂iHs bo yf mu y ing e ∑m ∑ H , l e ading t o t h e r at io b e nc h mar k ̂ ̂ 𝜃 𝜃 W ∕ W mo n adju s t me nt f acm t o r 𝓁 1 𝓁 𝓁 𝓁 1 𝓁 𝓁 e s t imat o r (m ) m ∑ ∑ H ̂𝜃 RB 𝜃̂ H ̂ ̂ ( 6. 4. 29) W𝓁 𝜃𝓁 ∕ W𝓁 𝜃𝓁 . i i 𝓁 1
𝓁 1
∑ ∑m RB t h It r e adil y f o l l o w s f r o m (m 4.i 𝜃̂29) at 1 W𝓁 𝜃̂𝓁 𝜃̂+ . Rat io b e nc h i 6. 1W 𝓁 i mar k ing , h o w e v e r , h as t h e f o l l o w ing l imit at io ns : ( i) A c o mm H , iot hn.e ( ii) Unl i e g ar is ap p l ie d t o al l t h e 𝜃̂eiH ,s tr imat o rdls e s s o f t h e ir p r e c𝜃̂is i RB ̂ b e nc h mar k e d e s𝜃i t imat is onor t de s ig n- c o ns is t e nt as t h e s amp i l e s ize inc r e as e s b u t h o l ding t h e s amp l e s ize s in t h e r e maining ar e as fi xe u nb ias e d MSE e s t imat o r s ar e no t r e adil y av ail ab l e𝜃̂iH, (uPfnleikf fe e inr -t h e c mann, Sik o v , and Til l e r 2014) . Ano t h e r s imp l e adju s t me nt is t h e dif mar k ing g iv e n b y ( 𝜃̂iDB
𝜃̂iH
)
m ∑
m ∑
𝓁 1
W𝓁 𝜃̂𝓁H 𝓁 1
W𝓁 𝜃̂𝓁
+
.
( 6. 4. 30)
∑ DB t 𝜃 It r e adil y f o l l o w s f r o m (m 4.i 𝜃̂30) ĥ+ .at Simil ar t𝜃̂oiRB , t h e e s t imat o r i 6. 1W i DB 𝜃̂i al s o s u f f e r s f r o m t h e l imit at io ns ( i) and ( ii) , b u t it p e r mit s s e MSE e s t imat io n. St e o r t s and Gh o s h ( 2013) h av e s h o w n t h at t h e e f e mo de l ( 6. 1. 1) and u s ing t h e s imp ing is t o inc r e as e t h e MSE. bi As1sinut hming 2 o fr𝜎 2 g iv e n b y ( 6. 1. 15) , t h e y de r iv e d a s e c o nd- o r d mo me nt e s t imat 𝜎𝑣s 𝑣 ̂ H )MSE g4 (𝜎𝑣2n) t eo r tm h e u s u (𝜃al mat io n t o MSE (𝜃̂iDB ) o b t aine d b y adding a c o mmo i g iv e n b y ( 6. 2. 1) , t h at is , MSE(𝜃̂iDB ) ≈ MSE(𝜃̂iH ) + g4 (𝜎𝑣2 ).
( 6. 4. 31)
Th is c o mmo ngt4e(𝜎r𝑣2 )mis g iv e n b y g4 (𝜎𝑣2 )
m ∑ Wi2 (1 i 1
𝛾i )2 (𝜎𝑣2 + 𝜓i )
m m ∑ ∑ Wi Wj (1 i 1j 1
𝛾i )(1
𝛾j )hij ,
( 6. 4. 32)
161
*PRACTICAL ISSUES
∑ T 1 2 1 w h ehrij e zTi [ m 𝓁 1 (𝜎𝑣 + 𝜓𝓁 ) z𝓁 z𝓁 ] zj . Fu r t h e r mo r e , a s e c o nd- o r de r u DB ) is g iv e n b y e s t imat o r o (f𝜃̂iMSE ms e(𝜃̂iDB ) ≈ ms e(𝜃̂iH ) + g4 (𝜎̂ 𝑣2 ),
( 6. 4. 33)
H ) is w h e r e(𝜃̂ims e g iv e n b y ( 6. 2. 3) . Re g u l ar it y c o ndit io ns ( 6. 1. 10) and t h e addit io nal c o ndit 1≤𝓁≤m io n max W𝓁 O(m 1 ) ar e u s e d t o de r iv e ( 6. 4. 31) an ( 6. 4. 33) . St e o r t s and Gh o s h ( 2013) (𝜃ĉiDB o )mp e r t h e s t at e t o armse(𝜃̂ediH )ms u nde mo de l f o r p o v e r t y r at e s ( Examp l e 6. 1. 2) , u s ing dat a s e t s f o r 𝜃̂iH )e t o ms e 2000. Th e ir r e s u l t s indic at e d tgh4 (𝜎at𝑣2 ) tish ve et er yr ms mal l r e l at(iv f o r al l s i. t atHe e snc e , in t h is ap p l ic at io n, dif f e r e nc e b e nc h mar k ing inflat io n o f MSE.
“Optimal” Benchmarking Pf e f f e r mann and Bar nar d ( 1991) s h o w e d t h at e acohir in ar imp o s s ib l e t o fi nd a b e s 𝜃ti fe os rt imat o t efh ae c l as s o f l ine ar e s t imat o ∑m ̂ ti 𝓁 1 a𝓁i 𝜃𝓁 t h at ar e mo de l u nb ias e d and s at is f y t h e b e nc h mar k One w ay t o g e t ar o u nd t h is dif fi c u l t y is t o minimize a r e as o nab l Amo ng al l e s t imat t o(t1r, s , tm )T t h at ar e l ine ar and mo de l u nb ias e d and s at i ∑ ̂ t h e b e nc h mar k c om 𝓁 ns 1 Wt𝓁rt𝓁aint 𝜃+ , W ang , Fu l l e r , and Qu ( 2008) o b t ain t1 , , tm t h at minimize m ∑ Q(t) 𝜙i E(ti 𝜃i )2 ( 6. 4. 34) i 1
, i e 1, m. hTht se r e s u l t ing “o p t imal ” e s t imat o f o r s p e c ifi e d p o s it𝜙iiv w e , ig 𝜃i is g iv e n b y (m ) m ∑ ∑ W FQ H H 𝜃̃i ( 6. 4. 35) W𝓁 𝜃̂𝓁 W𝓁 𝜃̃𝓁 , 𝜃̃i + 𝜆i 𝓁 1
w h e re
( 𝜆i
𝓁 1
)
m ∑ 1
𝜙𝓁 W𝓁2 𝓁 1
1
𝜙i 1 Wi .
( 6. 4. 36)
∑ ∑ ̃ W FQ ̂ No t ing t h m 1, it f o l l o w s f r o m ( 6.m 35) i 𝜃i t h at 𝜃+ . Re p l ac iat1 Wi 𝜆i i 4.1 W 2 2 ing 𝜎𝑣 in ( 6. 4. 35) b y a s u it ab l e 𝜎̂ 𝑣e lset ads imat too r t h e e mp ir ic al o p t imal e s t i ( 𝜃̂iW
FQ
𝜃̂iH + 𝜆i
)
m ∑
m ∑
W𝓁 𝜃̂𝓁 𝓁 1
W𝓁 𝜃̂𝓁H
.
( 6. 4. 37)
𝓁 1
It f o l l o w s f r o m ( 6. 4. 35) and ( 6. 4. 36) t h at ( 6. 4. 35) do e s no t s u f f e r ( i) and ( ii) , p r o 𝜆vi → ide0 das t h e ar e a s amp l e s ize inc r e as e s . Re s u l t ( 6 s p e c ial c as e o f a g e ne r al r e s u l t f o r mu l t ip l e b e nc h mar k c o n b t aine Gh o s h 2013) . It f o l l o w s f r o𝜃̃iWm (FQis 6. 4.o 35) t h atd b y al l o c at ing t h e dif ∑m ∑ H u s ing ̂ ̃ 𝜃 𝜃 W W 𝜆 g iv e n b y ( 6. 4. 36) . Th𝜙ie cWhi go ivic ee s e nc e m i 𝓁 1 𝓁 𝓁 𝓁 1 𝓁 𝓁
162
EBLUP: BASIC AREA LEVEL MODEL
1 𝜃̂iDB , no t ing t 𝜆hi at 1 in t h is c as e . Po p u l ar 𝜙ci 1h inc o icl ue de s o𝜙(ifi) 𝜓i , ∑ m 1 H 1 1 H H MSE (𝜃̃i ), and ( iii)𝜙i Wi Co (v𝜃̃i , 𝓁 1 W𝓁 𝜃̃𝓁 ). Th e s e c h o ic e s w e r ( ii)𝜙i o r ig inal l y p r o p o s e d in t h e c o nt e xt o f t h e b as ic u nit l e v e l mo Dat t a e t al . ( 2011) p r o v ide d an al t e r nat iv e , s imp l e r ap p r o ac h W FQ b e nc h mar k e d e s 𝜽̃ti imat. oThr se y minimize d t h e p o s t e r io r e xp e c t at io ∑ ̂ w it h r e s p etic’s t st uo bt hje ec t t o w e ig h t e d s qu ar e d e m r li o 𝜃si )s2 |,𝜽], i E[(t i r 1r𝜙o ∑m T is t h e dat a v e c tt ’s ̂ ̂ ̂ ̂ W t , w h e 𝜽 r e ( 𝜃 , , 𝜃 ) e t r e s t r ic t e d 𝜃 + 1 m i o rar; te h no i 1 i i t o t h e l ine ar and mo de l - u nb ias e d c𝜃̃iBl as E(𝜃 s . i |No e t e s (b e s t) 𝜽̂ ) twh ,e l Bay ̂t e br es and e s t imat o r𝜃i odef p e nding o n t h e mo de l p arV(𝜃 ame t h e p o s t e r io r i | 𝜽) v ar ianc e 𝜃io. No f t ing t h at
E[(ti
𝜃i )2 | 𝜽̂ ]
𝜃̃iB )2 ,
̂ + (ti V(𝜃i | 𝜽)
( 6. 4. 38)
∑ t h e p r o b l e m r e du c e s t m minimizing 𝜃̃iB )2 w it h r e s p e cti ’st tsou tbhjee c t i (ti io 1 𝜙 ∑m t o i 1 Wi ti 𝜃̂+ . Us ing t h e Lag r ang e mu l t ip l ie r me t h o d, it r e adil y f o “o p t imal ” b e nc h mar k e d e s t imat o r is ( 𝜃̃iOB
𝜃̃iB
+ 𝜆i
)
m ∑
m ∑
𝓁 1
W𝓁 𝜃̃𝓁B 𝓁 1
W𝓁 𝜃̂𝓁
,
( 6. 4. 39)
w h e𝜆ir is e g iv e n b y ( 6. 4. 36) ( s e e Th e o r e m 1 o f Dat t a e t al . 2011) . mo de l p ar ame t e r s in ( 6. 4. 39) b y s u it ab l e e s t imat o r s , w e g e t t Fo t re tdhb ey s p e c ial c as e o f mo de l ( 6. 1. b e nc h mar k e d e s t imat o r𝜃̂iOB de. no W FQ OB ̂ ̂ no r mal 𝑣i and ei , 𝜃i r e du c e𝜃is t ono t ing t 𝜃h̃iB at 𝛾i 𝜃̂i + (1 𝛾i )zTi and 𝜃̃iH 𝛾̂i 𝜃̂i + (1 𝛾̂i )zTi ̂ . No t e t h at t h e r e s u l t ( 6. 4. 39) is qu it e g e ne r al , b u t r r ic as s u mp t io ns . Th e e s t imat o r is al s o ap p l ic ab l e u nde r t h e H 10) , s p e c if y ing p r io r s o n t h e mo de l p ar ame t e r s . An adv ant ag e o f t h e ab o v e me t h o d is t h at it r e adil y e xt e nds t o mar k c o ns t r aint s ( Dat t a e t al . 2011) . Suq p( library(sae) R> data(“milk”) R> attach(milk)
Ne xt , w e c al l t h emseFH. f u nc Dir t io en c t e s t imat e s ( l e f t formula) - h and s ide o f ar e g iv e yi n b and y s amp l ing v ar ianc e s o f dir e c t e s t imat vardir) o r s ( ar g u me ar e g iv e n in t h is c as e b y SD. t h eAss qu au xil ar eiaroy f v ar iab l e s ( r ig h t - h and s formula) , w e c o ns ide r t h e c at e g o r ie s o f MajorArea. t h e f ac t oWe r de fi ne d b u s e t h e deREML f au fil tt t ing me t h o d. Th e c o de f o r c al c u l at ing EBLUP e in ( 6. 1. 12) and t h e ir c o r r e s p o nding s e c o nd- o r de r u nb ias e d MSE e s t o ( 6. 2. 3) is as f o l l o w s : R> FH cv.FH detach(milk)
171
0 10 20 30 40 Area (sorted by decreasing sample size) (a)
CV(Direct) CV(EBLUP)
20
CV
25
30
EBLUP
15
Direct
10
0.4 0.6 0.8 1.0 1.2 1.4 1.6
Estimate
35
*SOFTWARE
0 10 20 30 40 Area (sorted by decreasing sample size) (b)
Figure 6.1 EBLUP and Dir e c t Ar e a Es t imat e s o f Av e r ag e Exp e ndit u r e o n Fr e Eac h Smal l Ar e a ( a) . CVs o f EBLUP and Dir e c t Es t imat o r s fo r Eac h Smal l Ar e ar e So r t e d b y De c r e as ing Samp l e Size .
EBLUP and dir e c t ar e a e s t imat e s o f av e r ag e e xp e ndit u r e ar e p l o t ar e a in Fig u r e 6. 1a. CVs o f t h e s e e s t imat e s ar e p l o t t e d in Fig u r e s mal l ar e as h av e b e e n s o r t e d b y de c r e as ing s amp l e s ize . Ob t h at EBLUP e s t imat e s t r ac k dir e c t e s t imat e s b u t s e e m t o b e mo in Fig u r e 6. 1b t h at CVs o f EBLUP e s t imat e s ar e s mal l e r t h an t h o s mat e s fo r al l t h e s mal l ar e as . In fac t , dir e c t e s t imat e s h av e CVs o ar e as , w h e r e as t h e CVs o f t h e EBLUP e s t imat e s do no t e xc e e d t h ar e as . Mo r e o v e r , t h e g ains in e ffi c ie nc y o f t h e EBLUP e s t imat o r ar e as w it h s mal l e r s amp l e s ize s ( t h o s e in Fig u r e 6. 1b ) . Th u s , in e s t imat e s b as e d o n FH mo de l s e e m t o b e mo r e r e l iab l e t h an dir e
7 BASIC UNIT LEVEL MODEL
In this chapter, we study the basic unit level model (4.3.1) and spell out EBLUP inference for small area means. Section 7.1 gives EBLUP estimators, and MSE estimation is covered in Section 7.2. Several practical issues associated with the unit level model are studied in Section 7.6, and methods that address those issues are presented. 7.1
EBLUP ESTIMATION
In this section, we consider the basic unit level model (4.3.1) and spell out EBLUP estimation, using the results in Section 5.3, Chapter 5, for the general linear mixed model with a block-diagonal covariance structure. We assume that all the areas are sampled (m = M), and that the sampling within areas is noninformative. In this case, we can use the sample part of the population model (4.3.1) to make inference on the ∑Ni yij . Considering without loss of generality that the population mean Y i = Ni−1 j=1 sample units from area i are the first ni units, for i = 1, … , m, the sample part of the population model (4.3.1) is given by yij = xTij 𝜷 + 𝑣i + eij , iid
j = 1, … , ni ,
i = 1, … , m,
(7.1.1)
ind
where 𝑣i ∼ (0, 𝜎𝑣2 ) and eij ∼ (0, kij2 𝜎e2 ), random effects 𝑣i are independent of sampling ∑ errors eij and m i=1 ni > p. This model may be written in matrix notation as yi = Xi 𝜷 + 𝑣i 𝟏ni + ei ,
i = 1, … , m,
where yi = (yi1 , … , yini )T , Xi = (xi1 , … , xini )T , and ei = (ei1 , … , eini )T .
(7.1.2)
174
7.1.1
BASIC UNIT LEVEL MODEL
BLUP Estimator
Model (7.1.2) is a special case of the general linear model with block-diagonal structure, given by (5.3.1), by setting yi = yi ,
Xi = Xi ,
vi = 𝑣i ,
ei = ei ,
Zi = 𝟏ni , 𝜷 = (𝛽1 , … , 𝛽p )T ,
where yi is the ni × 1 vector of sample observations yij from the ith area. Furthermore, Gi = 𝜎𝑣2 ,
Ri = 𝜎e2 diag1≤j≤ni (kij2 )
so that Vi = Ri + 𝜎𝑣2 𝟏ni 𝟏Tni . The matrix Vi can be inverted explicitly. Using the following standard result on matrix inversion: (A + uvT )−1 = A−1 − A−1 uvT A−1 ∕(1 + vT A−1 u) and denoting
(7.1.3)
ni ∑
aij = kij−2 ,
ai⋅ =
aij ,
ai = (ai1 , … , aini )T ,
j=1
the inverse is given by V−1 i =
[ ] 𝛾i 1 T (a ) − a a diag , 1≤j≤ni ij ai⋅ i i 𝜎e2
(7.1.4)
𝛾i = 𝜎𝑣2 ∕(𝜎𝑣2 + 𝜎e2 ∕ai⋅ ).
(7.1.5)
where
The target parameters are the small area means Y i , i = 1, … , m, which by the model assumptions are given by Y i = XTi 𝜷 + 𝑣i + Ei , i = 1, … , m. For Ni large, by the Strong Law of Large Numbers, Ei tends almost surely to the expected value E(eij ) = 0. According to this result, instead of focusing on Y i , Battese et al. (1988) considered as target parameter the i th area mean 𝜇i = XTi 𝜷 + 𝑣i , which is the mean of the conditional expectations E(yij |𝑣i , xij ) = xTij 𝜷 + 𝑣i , where Xi is the vector of known means Xi1 , … , Xip . Moreover, the EBLUP estimator of Y i approaches the EBLUP estimator of 𝜇i if the sampling fraction, fi = ni ∕Ni , is negligible. In this section and in Section 7.1.2, we focus on the estimation of 𝜇i . Section 7.1.3 studies the case of Y i when fi is not negligible. The issue of informative sampling within areas and informative sampling of areas (when m < M) will be addressed in Section 7.6.3.
175
EBLUP ESTIMATION
Note that the target parameter 𝜇i = XTi 𝜷 + 𝑣i is a special case of the general area parameter lTi 𝜷 + mTi v taking li = Xi and mi = 1. Making these substitutions in the general formula (5.3.2) and noting that (𝜎𝑣2 ∕𝜎e2 )(1 − 𝛾i ) = 𝛾i ∕ai⋅ , we get the BLUP estimator of 𝜇i as T ̃ 𝜇̃ iH = XTi 𝜷̃ + 𝑣̃i , 𝑣̃i = 𝛾i (yia − xia 𝜷). (7.1.6) Here yia and xia are weighted means given by ni
yia =
ni
aij yij ∕ai⋅ ,
xia =
j=1
and
(
)−1 (
m
𝜷̃ =
(7.1.7)
aij xij ∕ai⋅ , j=1
)
m
XTi V−1 i Xi
XTi V−1 i yi
i=1
(7.1.8)
i=1
is the BLUE of 𝜷, where ( Ai ∶
XTi Vi 1 Xi
𝜎e
)
ni
2
aij xij xTij
T 𝛾i ai⋅ xia xia
(7.1.9)
j 1
and
( XTi Vi 1 yi
𝜎e
)
ni
2
aij xij yij
𝛾i ai⋅ xia yia
.
(7.1.10)
j 1
Note that the computation of 𝜷̃ is reduced to the inversion of the p × p matrix (7.1.9). The BLUP estimator (7.1.6) can also be expressed as a weighted average of the “survey regression” estimator yia + X( i xia )T 𝜷̃ and the regression-synthetic estimã tor XTi 𝜷: ̃ + 1( 𝛾i )XT 𝜷. ̃ (7.1.11) 𝜇̃ iH 𝛾i [yia + X( i xia )T 𝜷] i The weight 𝛾i 𝜎𝑣2 ∕ 𝜎(𝑣2 + 𝜎e2 ∕ai⋅ ) ∈ 0( , 1) measures the unexplained between-area variability, 𝜎𝑣2 , relative to the total variability 𝜎𝑣2 + 𝜎e2 ∕ai⋅ . If the unexplained between-area variability 𝜎𝑣2 is relatively small, then 𝛾i will be small and more weight is attached to the synthetic component. On the contrary, more weight is attached to the survey regression estimator as ai⋅ increases. Note that ai⋅ is O(ni ) and it reduces to ni if kij 1 for all (i, j). Thus, in that case, more weight is attached to the survey regression estimator when the area sample size ni is large. Note also that when kij 1 for all (i, j), the survey regression estimator is approximately design-unbiased for 𝜇i under simple random sampling (SRS), provided the m total sample size n ∶ i 1 ni is large. In the case of general kij ’s, it is model-unbiased for 𝜇i conditional on the realized area effect 𝑣i , provided 𝜷̃ is conditionally unbiased for 𝜷. On the other hand, the BLUP estimator (7.1.11) is conditionally biased due ̃ The survey regression estimator to the presence of the synthetic component XTi 𝜷.
176
BASIC UNIT LEVEL MODEL
is therefore valid under weaker assumptions, but it does not increase the “effective” sample size as illustrated in Section 2.5, Chapter 2. Under SRS and kij 1 for all (i, j), the BLUP estimator is design-consistent for Y i as ni increases because 𝛾i → 1. For general designs with unequal survey weights 𝑤ij , we consider survey-weighted pseudo-BLUP estimators that are design-consistent (see Section 7.6.2). The MSE of the BLUP estimator is easily obtained either directly or from the general result (5.3.5) by letting 𝜹 (𝜎𝑣2 , 𝜎e2 )T . It is given by MSE(𝜇̃ iH )
E(𝜇̃ iH
𝜇i ) 2
g1i (𝜎𝑣2 , 𝜎e2 ) +g2i (𝜎𝑣2 , 𝜎e2 ),
(7.1.12)
where g1i (𝜎𝑣2 , 𝜎e2 )
𝛾i (𝜎e2 ∕ai⋅ )
and
( g2i (𝜎𝑣2 , 𝜎e2 )
X (i
𝛾i xia )
)
m
T
Ai
(7.1.13) 1
(Xi
𝛾i xia )
(7.1.14)
i 1
with Ai given by (7.1.9). The first term, g1i (𝜎𝑣2 , 𝜎e2 ), is due to prediction of the area effect 𝑣i and is O(1), whereas the second term, g2i (𝜎𝑣2 , 𝜎e2 ), is due to estimating and is O(m 1 ) for large m, under the following regularity conditions: (i) kij and ni are uniformly bounded. (ii) Elements of Xi are uniformly bounded such that Ai is [O(1) p×p ] . We have already mentioned that the leading term of the MSE of the BLUP estimator is g1i (𝜎𝑣2 , 𝜎e2 ) 𝛾i (𝜎e2 ∕ai⋅ ). Comparing this term with the leading term of the MSE of the sample regression estimator, 𝜎e2 ∕ai⋅ , it is clear that the BLUP estimator provides considerable gain in efficiency over the sample regression estimator if 𝛾i is small. Therefore, models that achieve smaller values of 𝛾i should be preferred, provided they give an adequate fit in terms of residual analysis and other model diagnostics. This is similar to the model choice in Example 6.1.1 for the basic area level model. T 1 1 can be calculated using The BLUE ̃ and its covariance matrix ( m i 1 Xi Vi Xi ) only ordinary least squares (OLS) by first transforming the model (7.1.2) with correlated errors uij 𝑣i + eij to a model with uncorrelated errors u∗ij . The transformed model is given by kij 1 (yij
𝜏i yia )
kij 1 (xij
𝜏i xia )T + u∗ij ,
(7.1.15)
where 𝜏i 1 (1 𝛾i )1∕2 and the new errors u∗ij have mean zero and constant variance 𝜎e2 (Stukel and Rao 1997). If kij 1 for all (i, j), then (7.1.15) reduces to the transformed model of Fuller and Battese (1973). Another advantage of the transformation method is that standard OLS model diagnostics may be applied to the
177
EBLUP ESTIMATION
transformed data {kij 1 (yij 𝜏i yia ), kij 1 (xij 𝜏i xia ) }to check the validity of the nested error regression model (7.1.2) (see Example 7.3.1). In practice, 𝜏i is estimated from the data (see Section 7.1.2). Meza and Lahiri (2005) demonstrated that the standard Cp statistic based on the original data (yij , xij ) is inefficient for variable selection when 𝜎𝑣2 is large because it ignores the within-area correlation structure. On the other hand, the transformed data lead to uncorrelated errors, and therefore the Cp statistic based on the transformed data leads to efficient model selection, as in the case of standard regression models with independent errors. The BLUP estimator (7.1.11) depends on the variance ratio 𝜎𝑣2 ∕𝜎e2 , which is unknown in practice. Replacing 𝜎𝑣2 and 𝜎e2 by estimators 𝜎̂ 𝑣2 and 𝜎̂ e2 , we obtain an EBLUP estimator 𝜇̂ iH
xia )T ̂ ] + 1( 𝛾̂i )XTi ̂ ,
𝛾̂i [yia + X( i
(7.1.16)
where 𝛾̂i and ̂ are the values of 𝛾i and ̃ when (𝜎𝑣2 , 𝜎e2 ) is replaced by (𝜎̂ 𝑣2 , 𝜎̂ e2 ). 7.1.2
Estimation of 𝝈𝒗2 and 𝝈e2
We present a simple method of estimating the variance components 𝜎𝑣2 and 𝜎e2 . It involves performing two ordinary least squares regressions and then using the method of moments to get unbiased estimators of 𝜎e2 and 𝜎𝑣2 (Stukel and Rao 1997). Fuller and Battese (1973) proposed this method for the special case kij 1 for all (i, j). We first calculate the residual sum of squares SSE(1) with 𝜈1 degrees of freedom by regressing through the origin the y-deviations kij 1 (yij yia ) on the nonzero x-deviations kij 1 (xij xia ) for those areas with sample size ni > 1. This leads to the unbiased estimator of 𝜎e2 given by 2 𝜎̂ em
𝜈1 1 SSE(1),
(7.1.17)
where 𝜈1 n m p1 and p1 is the number of nonzero x-deviations. We next calculate the residual sum of squares SSE(2) by regressing yij ∕kij on xij ∕kij . Noting that E[SSE(2) ] 𝜂1 𝜎𝑣2 + n( p)𝜎e2 , where ⎡ ai⋅ ⎢1 ⎢ 1 ⎣
(
m
𝜂1 i
m
T ai⋅ xia
)
ni
aij xij xTij i 1 j 1
1
⎤ xia ⎥ , ⎥ ⎦
(7.1.18)
an unbiased estimator of 𝜎𝑣2 is given by 2 𝜎̃ 𝑣m
𝜂1 1 [SSE(2)
n(
2 p)𝜎̂ em ].
(7.1.19)
2 and 𝜎 2 are equivalent to those found by using the well-known ̂ em The estimators 𝜎̃ 𝑣m “fitting-of-constants” method attributed to Henderson (1953). However, the latter
178
BASIC UNIT LEVEL MODEL
method requires OLS regression on p1 + m variables, in contrast to p1 variables for the transformation method, and thus gets computationally slower as the number of small areas, m, increases. 2 can take negative values, we truncate it to zero when it is negative. The Since 𝜎̃ 𝑣m 2 2 , 0) is no longer unbiased, but it is consistent as max(𝜎̃ 𝑣m truncated estimator 𝜎̂ 𝑣m m increases. For the special case of kij 1 for all (i, j), Battese, Harter, and Fuller (1988) proposed an alternative estimator of 𝛾i that is approximately unbiased for 𝛾i . Assuming normality of the errors 𝑣i and eij , ML or REML can also be employed. For example, PROC MIXED in SAS can be used to calculate ML or REML estimates of 𝜎𝑣2 and 𝜎e2 and the associated EBLUP estimate 𝜇̂ iH . The naive MSE estimator, mseN (𝜇̂ iH )
g1i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +g2i (𝜎̂ 𝑣2 , 𝜎̂ e2 ),
(7.1.20)
can also be computed using PROC MIXED in SAS, where (𝜎̂ 𝑣2 , 𝜎̂ e2 ) are the ML or REML estimators of (𝜎𝑣2 , 𝜎e2 ). 7.1.3
*Nonnegligible Sampling Fractions
We now turn to EBLUP estimation of the finite population area mean Y i when the sampling fraction fi ni ∕Ni is not negligible. We write Y i as Yi
fi yi + 1( fi )yir ,
where yi is the sample mean and yir is the mean of the nonsampled values yi𝓁 , 𝓁 ni + 1, … , Ni , of the ith area. Under the population model (4.3.4), we replace the unobserved yi𝓁 by its estimator xTi𝓁 ̃ + 𝑣̃i , where xi𝓁 is the x-value associated with yi𝓁 . The resulting BLUP estimator of Y i is given by ̃H Yi
T
fi yi + 1( fi )ỹ H , ir
fi yi + 1( fi ) x( ir ̃ + 𝑣̃i )
where xir is the mean of nonsampled values xi𝓁 , 𝓁 ỹ H ir
ni + 1, … , Ni from area i, and T
xia )T ) ̃ ] + 1( 𝛾i )xir ̃ .
𝛾i [yia + x(ir
(7.1.21)
(7.1.22)
̃H The BLUP property of Y i is easily established by showing that Cov(bT y, ̃H Yi Y i ) 0 for every zero linear function bT y, that is, for every b ≠ 𝟎 with ̃H E(bT y) 0 (Stukel 1991, Chapter 2). The BLUP property of Y i also follows from the general results of Royall (1976). given in (7.1.22), we obtain an Replacing 𝜎𝑣2 and 𝜎e2 by estimators 𝜎̂ 𝑣2 and 𝜎̂ e2 in ỹ H ir ̃ H , and an EBLUP estimator of Y is obtained by substituting of y empirical version ŷ H i ir ir ̃H ŷ H ir for yir in (7.1.21) : ̂H Y f y + 1( f )ŷ H . (7.1.23) i
i i
i
ir
179
MSE ESTIMATION
̂H The estimator Y i may be also expressed as ̂H Yi
XTi ̂ + f{i + 1( fi )𝛾i }(yi
XTi ̂ ),
(7.1.24)
noting that xir (Ni Xi ni xi ) ∕(Ni ni ). It follows from (7.1.24) that the EBLUP estimator of Y i can be computed without the knowledge of the membership of each population unit (i, j) to the sampled or nonsampled parts from ith area. It is also clear ̂H from (7.1.24) that Y i approaches 𝜇̂ iH given in (7.1.6) when the sampling fraction, fi , is negligible. In practice, sampling fractions fi are often small for most of the areas.
7.2 7.2.1
MSE ESTIMATION Unconditional MSE of EBLUP
Consider the EBLUP estimator (7.1.16) of the parameter 𝜇i XTi + 𝑣i under the nested error linear regression model (7.1.2). Under regularity conditions (i) and (ii) and normality of the errors 𝑣i and eij , a second-order approximation of the MSE is obtained by applying (5.3.9). For the mentioned model and parameters, this approximation reduces to MSE(𝜇̂ iH ) ≈g1i (𝜎𝑣2 , 𝜎e2 ) +g2i (𝜎𝑣2 , 𝜎e2 ) +g3i (𝜎𝑣2 , 𝜎e2 ),
(7.2.1)
where g1i (𝜎𝑣2 , 𝜎e2 ) and g2i (𝜎𝑣2 , 𝜎e2 ) are given by (7.1.13) and (7.1.14), respectively. Furthermore, g3i (𝜎𝑣2 , 𝜎e2 ) ai⋅ 2 (𝜎𝑣2 + 𝜎e2 ∕ai⋅ ) 3 h(𝜎𝑣2 , 𝜎e2 ) (7.2.2) with h(𝜎𝑣2 , 𝜎e2 )
𝜎e4 V 𝑣𝑣 ( ) +𝜎𝑣4 V ee ( )
2𝜎e2 𝜎𝑣2 V 𝑣e ( ),
(7.2.3)
where (𝜎𝑣2 , 𝜎e2 )T , V ee ( ), and V 𝑣𝑣 ( ) are, respectively, the asymptotic variances of the estimators 𝜎̂ e2 and 𝜎̂ 𝑣2 and V 𝑣e ( ) is the asymptotic covariance of 𝜎̂ 𝑣2 and 𝜎̂ e2 . 2 ,𝜎 2 )T with ̂ em If the method of fitting-of-constants is used, then ̂ (𝜎̂ 𝑣2 , 𝜎̂ e2 )T (𝜎̂ 𝑣m asymptotic variances V 𝑣𝑣m ( )
2𝜂1 2 [𝜈1 1 (n
p
𝜈1 ) n(
V eem ( )
p)𝜎e4 + 𝜂2 𝜎𝑣4 + 2𝜂1 𝜎e2 𝜎𝑣2 ], 2𝜈1 1 𝜎e4 ,
(7.2.4)
(7.2.5)
and asymptotic covariance V e𝑣m ( )
V 𝑣em ( )
2𝜂1 1 𝜈1 1 (n
p
𝜈1 )𝜎e4 ,
(7.2.6)
180
BASIC UNIT LEVEL MODEL
where 𝜂1 is given by (7.1.18), 𝜈1 is defined below (7.1.17) and ( ⎡ +tr ⎢ A1 1 ⎢ ⎣
m
a2i⋅ (1
𝜂2
T 2ai⋅ xia A1 1 xia )
i 1
)2
m T a2i⋅ xia xia i 1
⎤ ⎥ ⎥ ⎦
(7.2.7)
n
m i a x xT (Stukel 1991, Chapter 2). If the errors eij have equal with A1 i 1 j 1 ij ij ij variance 𝜎e2 (i.e., with kij 1 and aij kij 2 1), then (7.2.4)–(7.2.6) reduce to the formulae given by Prasad and Rao (1990). If ML or REML is used to estimate 𝜎𝑣2 and 𝜎e2 , then ̂ ̂ ML or ̂ ̂ RE with asymptotic covariance matrix V( ) V ML ( ) V RE ( ) given by the inverse of the 2 × 2 information matrix (𝛿) with diagonal elements
1 2
𝑣𝑣 ( )
1 2
ee ( )
m
tr[(Vi 1 𝜕Vi ∕𝜕𝜎𝑣2 )2 ], i 1 m
tr[(Vi 1 𝜕Vi ∕𝜕𝜎e2 )2 ] i 1
and off-diagonal elements 𝑣e ( )
1 2
e𝑣 ( )
m
tr[Vi 1 (𝜕Vi ∕𝜕𝜎𝑣2 )Vi 1 (𝜕Vi ∕𝜕𝜎e2 ) ,] i 1
where 𝜕Vi ∕𝜕𝜎𝑣2
𝜕Vi ∕𝜕𝜎e2
T ni ni ;
𝜎e 2 Ri .
Using the formula (7.1.4) for Vi 1 we get, after simplification, 𝑣𝑣 ( )
ee ( )
1 2 1 2
m
a2i⋅ 𝛼i 2 ,
(7.2.8)
i 1 m
1)𝜎e 4 + 𝛼i 2 ]
[(ni
(7.2.9)
i 1
and 𝑣e ( )
1 2
m
ai⋅ 𝛼i 2 ,
(7.2.10)
i 1
where 𝛼i
𝜎e2 + ai⋅ 𝜎𝑣2 .
In the special case of equal error variances (i.e., with kij to the formulae given by Datta and Lahiri (2000).
(7.2.11) 1), (7.2.8)–(7.2.10) reduce
181
MSE ESTIMATION
7.2.2
Unconditional MSE Estimators
The expression for the true MSE given in (7.2.1) can be estimated by (5.3.11) when variance components are estimated either by REML or by the fitting-of-constants method. Under regularity conditions (i) and (ii) and normality of the errors 𝑣i and eij in the nested error linear regression model (7.1.2), the MSE estimator (5.3.11) is second-order unbiased and it reduces to mse(𝜇̂ iH )
g1i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +g2i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +2g3i (𝜎̂ 𝑣2 , 𝜎̂ e2 ),
(7.2.12)
2 ,𝜎 2 )T or ̂ 2 ,𝜎 2 )T . The correwhere ̂ (𝜎̂ 𝑣2 , 𝜎̂ e2 )T is chosen as ̂ RE (𝜎̂ 𝑣RE ̂ eRE (𝜎̂ 𝑣m ̂ em m H H sponding area-specific versions, mse1 (𝜇̂ i ) and mse2 (𝜇̂ i ), are obtained from (5.3.15) and (5.3.16), by evaluating g∗3i ( , yi ) given by (5.3.14). After some simplification, we get T (7.2.13) g∗3i ( , yi ) ai⋅ 2 (𝜎𝑣2 + 𝜎e2 ∕ai⋅ ) 4 h(𝜎𝑣2 , 𝜎e2 ) y( ia xia ̃ )2 .
Hence, the two area-specific versions are given by mse1 (𝜇̂ iH )
g1i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +g2i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +2g∗3i ( ̂ , yi )
(7.2.14)
and g1i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +g2i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +g3i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +g∗3i ( ̂ , yi ).
mse2 (𝜇̂ iH )
(7.2.15)
Battese et al. (1988) proposed an alternative second-order unbiased estimator of MSE(𝜇̂ iH ), based on a method of moment estimators of 𝜎𝑣2 and 𝜎e2 . This estimator uses the idea behind Beale’s nearly unbiased ratio estimator of a total (see Cochran 1977, p. 176). For the ML estimator ̂ ML , the MSE estimator (5.3.12) is applicable and in this case it reduces to g1i ( ̂ )
mse(𝜇̂ iH ) where ̂
̂ ML and
bT ( ̂ ; ̂ ) ∇g1i ( ̂ ) +g2i ( ̂ ) +2g3i ( ̂ ), ∇g1i ( ̂ )
𝛼i 2 (𝜎e4 , 𝜎𝑣4 ai⋅ )T .
The bias term b( ̂ ; ) for the ML estimator ̂
b( ̂ ML ; )
(7.2.16)
) ⎧ ⎡( m 1 1 ⎪ ⎢ XTi Vi 1 Xi ( ) ⎨tr 2 ⎪ ⎢⎣ i 1 ⎩ ) ( ⎡ m T 1 ⎢ X V X tr ⎢ i 1 i i i ⎣
̂ ML is obtained from (5.3.13) as 1 m i 1
⎤ XTi (𝜕Vi 1 ∕𝜕𝜎𝑣2 )Xi ⎥ , ⎥ ⎦ T
⎤⎫ ⎪ T 1 2 Xi (𝜕Vi ∕𝜕𝜎e )Xi ⎥⎬ , ⎥⎪ 1 ⎦⎭
1 m i
(7.2.17)
182
BASIC UNIT LEVEL MODEL
where ( ) is the 2 × 2 information matrix with elements given by (7.2.8)–(7.2.10), and XTi Vi 1 Xi is spelled out in (7.1.9). Furthermore, it follows from (7.1.9) that XTi (𝜕Vi 1 ∕𝜕𝜎𝑣2 )Xi
T
𝛼i 2 a2i⋅ xia xia
(7.2.18)
and ni
XTi (𝜕Vi 1 ∕𝜕𝜎e2 )Xi
T
𝜎e 4
aij xij xTij + 𝛼i 2 𝜎e 4 𝜎𝑣2 (𝜎e2 + 𝛼i )a2i⋅ xia xia .
(7.2.19)
j 1
7.2.3
*MSE Estimation: Nonnegligible Sampling Fractions ̂H We now turn to MSE estimation of the EBLUP Y i in the case of nonnegligible samT pling fraction fi . Let 𝜇ir xir + 𝑣i be the true mean and eir the mean of the errors ei𝓁 , for nonsampled units 𝓁 ni + 1, … , Ni in area i. Noting that ̂H Yi
Yi
(1
fi ) [(ŷ H ir
𝜇ir )
eir ],
̂H the MSE of Y i is given by ̂H MSE(Y i )
̂H E(Y i
Y i )2
(1
fi )2 E(ŷ H ir
𝜇ir )2 + N(i 2 kTir kir )𝜎e2 ,
(7.2.20)
where kir is the known vector of ki𝓁 -values for nonsampled units 𝓁 ni + 1, … , Ni . Now noting that ŷ H ir is the EBLUP estimator of 𝜇ir , we can use (7.2.1) to get a 𝜇ir )2 as second-order approximation to E(ŷ H ir E(ŷ H ir
𝜇ir )2 ≈ g1i (𝜎𝑣2 , 𝜎e2 ) +̃g2i (𝜎𝑣2 , 𝜎e2 ) +g3i (𝜎𝑣2 , 𝜎e2 ),
(7.2.21)
where g̃ 2i (𝜎𝑣2 , 𝜎e2 ) is obtained from g2i (𝜎𝑣2 , 𝜎e2 ) by changing Xi to xir . Substituting ̂H (7.2.21) in (7.2.20) gives a second-order approximation to MSE(Y i ). ̂H Similarly, an estimator of MSE(Y i ), unbiased up to o(m 1 ) terms, is given by ̂H mse(Y i )
1(
2 T fi )2 mse(ŷ H ̂ e2 , ir ) + Ni( kir kir )𝜎
(7.2.22)
where mse(ŷ H ir )
g1i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +̃g2i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) +2g3i (𝜎̂ 𝑣2 , 𝜎̂ e2 )
(7.2.23)
and (𝜎̂ 𝑣2 , 𝜎̂ e2 ) are the REML or the fitting-of-constants estimators of (𝜎𝑣2 , 𝜎e2 ). Area-specific versions can be obtained similarly, using (7.2.14) and (7.2.15). From ̂H the last term in (7.2.22), if follows that the MSE estimator of Y i requires the knowledge of the ki𝓁 -values for the nonsampled units 𝓁 ni + 1, … , Ni from area i, unless ki𝓁 1, 𝓁 ni + 1, … , Ni , in which case the last term reduces to Ni 2 (Ni ni )𝜎̂ e2 .
183
MSE ESTIMATION
̂H Small area EBLUP estimators Y i are often aggregated to obtain an estimator for a larger area. To obtain the MSE of the larger area estimator, we also need the mean ̂H cross-product error (MCPE) of the estimators Y i for two different areas i ≠ t belonging to the larger area. It is easy to see that the MCPE correct up to o(m 1 ) terms is given by ( ̂H ̂H MCPE(Y i , Y t ) ≈ x( ir
𝛾i xia )
)
m
T
Ai
1
𝛾i xia )
(xir
∶g2it (𝜎𝑣2 , 𝜎e2 ). (7.2.24)
i 1
It follows from (7.2.24) that an estimator of MCPE is given by ̂H ̂H mcpe(Y i , Y t )
g2it (𝜎̂ 𝑣2 , 𝜎̂ e2 ),
(7.2.25)
which is unbiased up to o(m 1 ) terms (Militino, Ugarte, and Goicoa 2007). 7.2.4
*Bootstrap MSE Estimation
̂H We first study parametric bootstrap estimation of MSE(𝜇̂ iH ) and MSE(Y i ), assumiid
iid
ing that 𝑣i ∼ N(0, 𝜎𝑣2 ), independent of eij ∼ N(0, kij2 𝜎e2 ), j 1, … , ni , i 1, … , m. The normality assumption will then be relaxed, following Hall and Maiti (2006a). The parametric bootstrap procedure generates bootstrap sample data {(y∗ij , xij ) ;j 1, … , ni , i 1, … , m} as y∗ij xTij ̂ + 𝑣∗i + e∗ij , where 𝑣∗i is generated from N(0, 𝜎̂ 𝑣2 ) and e∗ij from N(0, kij2 𝜎̂ e2 ), j 1, … , ni , for specified estimators ̂ , 𝜎̂ 𝑣2 , and 𝜎̂ e2 , assuming that 𝜎̂ 𝑣2 > 0. Let 𝜇∗ XT ̂ + 𝑣∗ be the bootstrap version of the target parameter i
i
i
H of the EBLUP 𝜇i XTi + 𝑣i . Then using the bootstrap data, bootstrap version 𝜇̂ i∗ ∗ ∗ T H Xi ̂ + 𝑣̂ ∗i , where ̂ and 𝑣̂ ∗i are calculated the same estimator 𝜇̂ iH is obtained as 𝜇̂ i∗ ̂ way as and 𝑣̂ i , but using the above bootstrap sample data. The theoretical bootstrap H 𝜇i∗ )2 . This bootstrap estimaestimator of MSE(𝜇̂ iH ) is given by mseB (𝜇̂ iH ) E∗ (𝜇̂ i∗ tor is approximated by Monte Carlo simulation, in which the above steps are repeated a large number, B, of times. In this way, we obtain B replicates 𝜇i∗ (1), … , 𝜇i∗ (B) of the bootstrap true value 𝜇i∗ , where 𝜇i∗ (b) XTi ̂ + 𝑣∗i (b), together with B repli∗ H (1), … , 𝜇̂ H (B) of the EBLUP estimate 𝜇̂ H , where 𝜇̂ H (b) XTi ̂ (b) +̂𝑣∗i (b), cates 𝜇̂ i∗ i∗ i∗ i∗ ∗ ∗ and 𝑣∗i (b), ̂ (b), and 𝑣̂ ∗i (b) are the bth replicates of 𝑣∗i , ̂ , and 𝑣̂ ∗i , respectively. A H first-order unbiased bootstrap MSE estimator of 𝜇̂ i is then given by B
mseB1 (𝜇̂ iH )
B
1
H [𝜇̂ i∗ (b)
𝜇i∗ (b) 2] .
(7.2.26)
b 1
Applying the method of imitation to mseB (𝜇̂ iH )
H E∗ (𝜇̂ i∗
𝜇i∗ )2 , it follows that
mseB (𝜇̂ iH ) ≈g1i (𝜎̂ 𝑣2 ) +g2i (𝜎̂ 𝑣2 ) +g3i (𝜎̂ 𝑣2 ).
(7.2.27)
184
BASIC UNIT LEVEL MODEL
Now noting that mseB1 (𝜇̂ iH ) is a Monte Carlo approximation to mseB (𝜇̂ iH ), for large B and comparing (7.2.27) to the second-order unbiased MSE estimator (7.2.12), valid for REML and the method of fitting-of-constants, it follows that mseB1 (𝜇̂ iH ) is not second-order unbiased. A double-bootstrap method can be used to rectify this problem (Hall and Maiti 2006a). Alternatively, hybrid methods, similar to those in Section 6.2.4, depending on g1i (𝜎̂ 𝑣2 ) and g2i (𝜎̂ 𝑣2 ), may be used to obtain second-order unbiased bootstrap estimators of MSE(𝜇̂ iH ). For the case of nonnegligible sampling fraction fi , González-Manteiga et al. (2008a) proposed a first-order unbiased bootstrap MSE estimator and also studied hybrid bootstrap MSE estimators that are second-order unbiased. Let ∗ ∗ ∗ Y i XTi ̂ + 𝑣∗i + Ei be the bootstrap population mean, where Ei is the mean of Ni ∗ bootstrap values e∗ij generated from N(0, kij2 𝜎̂ e2 ) or equivalently Ei is generated from ̂H N N[0, (Ni 2 j i 1 kij2 )𝜎̂ e2 ). Let Y i∗ be the bootstrap version of the EBLUP estimator ̂H Y i obtained from the bootstrap sample data. The above process is repeated a large ∗ ∗ number, B, of times to obtain B replicates Y i (1), … , Y i (B) of the true bootstrap ∗ ̂H ̂H ̂H mean Y i together with B replicates Y i∗ (1), … , Y i∗ (B) of the EBLUP estimator Y i∗ . ̂H Then, a first-order unbiased bootstrap MSE estimator of Y i is given by ̂H mseB1 (Y i )
B
B
̂H [Y i∗ (b)
1
Y i∗ (b) 2] .
(7.2.28)
b 1
If 𝜎̂ 𝑣2 0, then we proceed as in Section 6.2.4, that is, we generate bootstrap values y∗ij from N[xTij ̃ (0), 𝜎̂ e2 ] and then use the bootstrap data {(y∗ij , xij ; j 1, … , ni , i ∗ ∗ RS XTi ̃ (0), where ̃ (0) 1, … , m} to calculate the regression-synthetic estimator 𝜇̂ i∗ is obtained the same as ̃ (0) but using the bootstrap sample data. Repeating the process B times, a first-order bootstrap MSE estimator is given by B
mseB1 (𝜇̂ iH )
B
1
RS [𝜇̂ i∗ (b)
𝜇i∗ ]2 ,
(7.2.29)
b 1
where 𝜇i∗ XTi ̃ (0). A similar method can be used to handle the case of nonnegligible sampling fraction fi . We now turn to the case of nonparametric bootstrap MSE estimation by relaxing the normality assumption on 𝑣i and eij and noting that the EBLUP 𝜇̂ iH does not require normality if a moment method is used to estimate 𝜎𝑣2 and 𝜎e2 . Hall and Maiti (2006a) used the fitting-of-constants method to estimate 𝜎𝑣2 and 𝜎e2 and noted that the MSE of 𝜇̂ iH depends only on 𝜎𝑣2 , 𝜎e2 and the fourth moments 𝛿𝑣 Em (𝑣4i ) and 𝛿e Em (̃e4ij ) if iid
normality assumption is relaxed, where ẽ ij kij 1 eij ∼ 0,( 𝜎e2 ). They also noted that it is analytically difficult to obtain second-order unbiased MSE estimators through the linearization method in this case.
185
MSE ESTIMATION
Hall and Maiti (2006a) used a matched-moment bootstrap method to get around the difficulty with the linearization method. First, moment estimators (𝜎̂ 𝑣2 , 𝜎̂ e2 ) of (𝜎𝑣2 , 𝜎e2 ) and (𝛿̂𝑣 , 𝛿̂e ) of (𝛿𝑣 , 𝛿e ) are obtained. Then, bootstrap values 𝑣∗i and ẽ ∗ij are generated from distributions D(𝜎̂ 𝑣2 , 𝛿̂𝑣 ) and D(𝜎̂ e2 , 𝛿̂e ), respectively, where D(𝜉2 , 𝜉4 ) denotes the distribution of a random variable Z for which E(Z) 0, E(Z 2 ) 𝜉2 , and E(Z 4 ) 𝜉4 for specified 𝜉2 , 𝜉4 > 0 with 𝜉22 ≤ 𝜉4 . The proposed method implicitly assumes that 𝜎̂ 𝑣2 > 0. A simple way of generating values from D(𝜉2 , 𝜉4 ) is to let 1∕2 ̃ where Z̃ has the three-point distribution Z 𝜉2 Z, P(Z̃
0)
1
P(Z̃
p,
with p 𝜉22 ∕𝜉4 . Bootstrap data are obtained by letting y∗ij
∓p
1∕2
)
p∕2
(7.2.30)
xTij ̂ + 𝑣∗i + kij ẽ ∗ij and then we proceed
H and 𝜇 ∗ and the bootstrap MSE estimaas before to get the bootstrap versions 𝜇̂ i∗ i
tor, mseB1 (𝜇̂ iH ), given by (7.2.26). A bias-corrected MSE estimator of 𝜇̂ iH may be obtained by using a double-bootstrap method. This is accomplished by drawing 𝑣∗∗ ∗i and ẽ ∗∗ from D(𝜎̂ 𝑣2∗ , 𝛿̂𝑣∗ ) and D(𝜎̂ e2∗ , 𝛿̂e∗ ), respectively, and then letting y∗∗ xTij ̂ + ij ij ∗ 𝑣∗∗ + kij ẽ ∗∗ , where ( ̂ , 𝜎̂ 𝑣2∗ , 𝛿̂𝑣∗ , 𝜎̂ e2∗ , 𝛿̂e∗ ) are the parameter estimates obtained from i ij the first-phase bootstrap data {(y∗ij , xij ) ;j 1, … , ni , i 1, … , m}. The second-phase bootstrap sample data {(y∗∗ , xij ) ;j 1, … , ni , i 1, … , m} are used to calculate the ij ∗ H bootstrap versions 𝜇̂ i∗∗ and 𝜇i∗∗ XTi ̂ + 𝑣∗∗ . In practice, we select C bootstrap i replicates from each first-phase bootstrap replicate b and calculate B
mseBC1 (𝜇̂ iH )
1
B C
C
1
𝜇i∗∗ (bc) 2] .
(7.2.31)
mseBC1 (𝜇̂ iH ),
(7.2.32)
H [𝜇̂ i∗∗ (bc) b 1c 1
A bias-corrected MSE estimator is then given by mseBC2 (𝜇̂ iH )
2 mseB1 (𝜇̂ iH )
which is second-order unbiased. The bias-corrected MSE estimator (7.2.32) may take negative values and Hall and Maiti (2006a) proposed modifications ensuring positive MSE estimators that are second-order unbiased. Another possible modification proposed by Hall and Maiti (2006b) and Pfeffermann (2013) is given by
mseBC2 (mod)
⎧mseBC2 if mseB1 ≥ mseBC1 ; ⎪ [ ] ⎨ mseB1 mseBC1 if mseB1 < mseBC1 . ⎪mseB1 exp mseBC2 ⎩
(7.2.33)
Implementation of double-bootstrap MSE estimation involves several practical issues: (i) Sensitivity of MSE estimator to the choice of B and C. (ii) For
186
BASIC UNIT LEVEL MODEL
some of the first-phase bootstrap replicates, b, the estimate of 𝜎𝑣2 , denoted 𝜎̂ 𝑣2∗ (b), may be zero even when 𝜎̂ 𝑣2 > 0. How to generate second-phase bootstrap samples in those cases? Earlier, we suggested a modification for single-phase bootstrap when 𝜎̂ 𝑣2 0. It may be possible to use a similar modification if 𝜎̂ 𝑣2∗ (b) 0. Regarding the choice of B and C, for a given number of bootstrap replicates R BC + B B(C + 1), Erciulescu and Fuller (2014) demonstrated that the choice C 1 and B R∕2 leads to smaller bootstrap error than other choices of B and C. This result implies that one should select a single second-phase bootstrap sample for each first-phase bootstrap sample. 7.3
*APPLICATIONS
We now provide some details of the application in Example 4.3.1, Chapter 4, and describe another application on estimation of area under olive trees.
Example 7.3.1. County Crop Areas. Battese, Harter, and Fuller (1988) applied the nested error linear regression model with equal error variances (i.e., kij 1) to estimate area under corn and area under soybeans for each of m 12 counties in north-central Iowa, using farm interview data in conjunction with LANDSAT satellite data. Each county was divided into area segments, and the areas under corn and soybeans were ascertained for a sample of segments by interviewing farm operators. The number of sample segments, ni , in a county ranged from 1 to 5 ( m i 1 ni n 37), while the total number of segments in a county ranged from 394 to 687. Because of negligible sample fractions ni ∕Ni , the small area means Y i are taken as 𝜇i XTi + 𝑣i . Unit level auxiliary data {x1ij , x2ij } in the form of the number of pixels classified as corn, x1ij , and the number of pixels classified as soybeans, x2ij , were also obtained for all the area segments, including the sample segments, in each county, using the LANDSAT satellite readings. In this application, yij denotes the number of hectares of corn (soybeans) in the jth sample area segment from the ith county. Table 1 in Battese, Harter, and Fuller (1988) reports the values of ni , Ni , {yij , x1ij , x2ij } as well as the area means X 1i and X 2i . The sample data for the second sample segment in Hardin county was deleted from the estimation of means 𝜇i because the corn area for that segment looked erroneous in a preliminary analysis. It is interesting to note t h at mo de l diag no s t ic s , b as e d o n t h e l o c al influ e nc e ap p r o ac h ide nt ifi e d t h is s amp l e dat a p o int as p o s s ib l y e r r o ne o u s ( Har t l e 2000) . T h no r mal l y dis (1,1.x1ij Bat t e s e e t al . ( 1988) u s e d mo dexijl ( 7. 2), xw 2ij ) itand 2 2 t r ib u t e d e𝑣ir and r o erijsw it h c o mmo n v ar 𝜎𝑣 ianc and 𝜎ee .s Es t imat e s𝜎𝑣2oand f 𝜎e2 2 2 2 2 2 2 w e r e o b t aine 𝜎̂ e d𝜎̂as 150, 𝜎̂ 𝑣 140 fo r c o r n𝜎̂and 𝜎̂ em 195, 𝜎̂ 𝑣 272 em e 2 nt, tfrhoe me s t imat e b as e d o n l ig he t l y diffe r 𝜎ê 𝑣m fo r s o y b e ans . Th e𝜎̂ 𝑣2 eiss st imat
187
*APPLICATIONS
t h e fi t t ing - o f- c o ns t ant s me t h o d. Th e r e g r e s s io xn1ijcand o e ffi c ie nt x2ij w e r e b o t h s ig nifi c ant fo r t h e c o r n mo de l , b u t o nl y t h e c o x2ij w as s ig nifi c ant fo r t h e s o y b e ans mo de l . Th e b𝜎𝑣2e, tww ase e n- c o s ig nifi c ant at t h e 10% l e v e l fo r c o r n and at t h e 1% l e v e l fo r s o y b Bat t e s e e t al . ( 1988) r e p o r t e d s o me me t h o ds fo r v al idat ing t h 2adrand 2 tint at ic x2ij e ro mst h e w it xhij (1, x1ij , x2ij )T . Fir s t , t h e y int r o du c e d qux1ij mo de l and t e s t e d t h e nu l l h y p o t h e s is t h at t h e r e g r e s s io n c o t e r ms ar e ze r o . Th e nu l l h y p o t h e s is w as no t r e je c t e d at t h e 𝜏̂i yi )al s t h e no r mal it y o f t h e e𝑣i rand r o erij , t tehr emst r ans fo r me d r(yeij s idu T 1∕2 ̂ (xij 𝜏̂i xi ) w e r e c o mp u t 𝜏êi d, 1w (1 h e r𝛾̂i e) . Unde r t h e nu l l h y p o t h e s t h e t r ans fo r me d r e s idu al s ar e apN(0, p r 𝜎oe2 ).ximat e l ey , iid He nc s t andar d me t h o ds fo r c h e c k ing no r mal it y c an b e ap p l ie d t o t h e t r ans fo r me d r e s id w e l l - k no w n Sh ap W sir toat–Wil is t ick ap p l ie d t o t h e t r ans fo r me d r e s idu al s o f 0. 985 and 0. 957 fo r c o r n and s o y b e ans , r e s pp-e vc alt iv u e sl y( ,p yr ie o bl ding ab il it ie s o f g e t t ing v al u e s l e s s t h an t h o s e o b s e r v e d u nde r no r ma r e s p e c t iv e l y ( Sh ap ir o and Wil k 1965) . It may b e noWt ec do trh- at s mal r e s p o nd t o de p ar t u r e s fr o m no rn mal37it is y . no Al tt hl aro gu egp-, hvt hal eu e s s u g g e s t no e v ide nc e ag ains t t h e h y p o t h e s is o f no r mal it y . N ( q–q) p l o t s o f t r ans fo r me d r e s idu al s al s o indic at e d no e v ide nc e fo r b o t h c o r n and s o y b e ans . Jiang , Lah ir i, and Wu ( 2001) p r o p o s t e s t o f no r mal it y ( o r s p e c ifi e d dis t r ib u t 𝑣ioi and ns )eijo, bf tash ee deor nr o r t e a c h i- s qu ar e d s t at is t ic w it h e s t imat e d c e l l fr e qu e nc ie s . To c h o f t h e s mal l ar e a𝑣ei and ffe ct ot s de t e c t o𝑣iu’s t, l aienor r mal p r o b ab il it y p l o t EBLUP e s t imat 𝑣̂ei , s div ide d b y t h e ir s t andar dize d e r r o r s , may b e e xam Se c t io n 5. 4. 2) . s mal 𝜇i ,ans fo , r c o r n Tab l e 7. 1 r e p o r t s t h e EBLUP 𝜇̂ iHe, so tfimat e sl ,ar e a me 2 2 and s o y b e ans 𝜎û ems ing and 𝜎̂ 𝑣 . Es t imat e d s t andar d e r r o r s o f t h e EBLUP e mat e s and t h e s u r v e y r e g r e s𝜇̂ iSR s io yni + e sX(ti imat xi )Te ̂s, , de no t e d b y SR H s(𝜇̂ i ) and s(𝜇̂ i ), ar e al s o g iv e n. It is c l e ar fr o m Tab l e 7. 1 t h at t h e e s t imat e d s t andar d e r r o r o f t h e EBLUP e s t imat e t o t h at o f t h e s e s t imat e de c r e as e s fr o m 0. 97 t o 0. 77 as t h e nu mb e r onfi , s amp l e a de c r e as e s fr o m 5 t o 1. Th e r e du c t io n in t h e e s t imat e d s t andar d e r w h nei n≤ 3. Th e EBLUP e s t imat 𝜇̂ iHe, sw, e r e adju s t e d ( c al ib r at e d) t o ag r e e w it h r e g r e s s io n e s t imat e fo r t h e e nt ir e ar e a c o v e r ing t h e 12 c o u n 12 SR W𝓁r eis t h e p r o p o r t io n o f p o p u l at io n a is g iv e n𝜇̂ bSR y 𝓁 1 W𝓁 𝜇̂ 𝓁 , w h e W𝓁n𝜇me me nt s t h at b e 𝓁t l ohngar teoa, and t h e o v e r al l p o𝜇p u l12 an e 𝓁 . 
Th 𝓁at1io fo rl e r ando m s amp l ing e s t imat𝜇̂oSRr is ap p r o ximat e l y de s ig n- 𝜇uunbndeiasr es dimp w it h in ar e as and it s de s ig n s t andar d e r r o r is r e l at iv e l y s mal l . Th e e s t imat e s ar e g iv e n b y 𝜇̂ iH (a)
𝜇̂ iH + p̂ (1) (𝜇̂ SR i
𝜇̂ H ),
( 7. 3. 1)
188
BASIC UNIT LEVEL MODEL
TABLE 7.1 EBLUP Estimates of County Means and Estimated Standard Errors of EBLUP and Survey Regression Estimates Corn County
ni
𝜇̂ iH
Cerro Gordo Hamilton Worth Humboldt Franklin Pocahontas Winnebago Wright Webster Hancock Kossuth Hardin
1 1 1 2 3 3 3 3 4 5 5 5
122.2 126.3 106.2 108.0 145.0 112.6 112.4 122.1 115.8 124.3 106.3 143.6
Soybeans
s(𝜇̂ iH )
s(𝜇̂ iSR )
𝜇̂ iH
s(𝜇̂ iH )
s(𝜇̂ iSR )
9.6 9.5 9.3 8.1 6.5 6.6 6.6 6.7 5.8 5.3 5.2 5.7
13.7 12.9 12.4 9.7 7.1 7.2 7.2 7.3 6.1 5.7 5.5 6.1
77.8 94.8 86.9 79.7 65.2 113.8 98.5 112.8 109.6 101.0 119.9 74.9
12.0 11.8 11.5 9.7 7.6 7.7 7.7 7.8 6.7 6.2 6.1 6.6
15.6 14.8 14.2 11.1 8.1 8.2 8.3 8.4 7.0 6.5 6.3 6.9
Source: Adapted from Table 2 in Battese et al. (1988).
w here𝜇̂ H
12 H 𝓁 1 W𝓁 𝜇̂ 𝓁
and ]
[ 12 p̂ (1) i
1
W𝓁2 mse(𝜇̂ 𝓁H )
Wi mse(𝜇̂ iH ).
𝓁 1 H 𝜇̂ SR . A draw back of𝜇̂ iH (a) is that it It follow s from (7.3.1) that 12 𝓁 1 Wi 𝜇̂ i (a) H depends on the MSE estimates mse(𝜇̂ 𝓁 ). Pfeffermann and Barnard (1991) proposed changed to an “optimal” adjustment of the form (7.3.1) w ithp̂ (1) i
p̂ (2) i
[mse(𝜇̂ H )] 1 mcpe(𝜇̂ iH , 𝜇̂ H ),
(7.3.2)
w here mcpe (𝜇̂ iH , 𝜇̂ H ) is an estimate of the MCPE of 𝜇̂ iH and 𝜇̂ H , MCPE(𝜇̂ iH , 𝜇̂ H ) H 𝜇i )(𝜇̂ H 𝜇). The term p̂ (2) inv olv es estimates of MCPE (𝜇̂ iH , 𝜇̂ 𝓁H ), 𝓁 ≠ i. FolE(𝜇̂ i i low ing Isaki, Tsay, and Fuller (2000) and Wang (2000), a simple adjustment comin (7.3.1) to pared to (7.3.1) and (7.3.2) is obtained by changing p̂ (1) i ]
[ 12 p̂ (3) i
W𝓁2 mse(𝜇̂ 𝓁SR )
1
Wi mse(𝜇̂ iSR ),
(7.3.3)
𝓁 1
An alternativ e is to use the simple ratio adjustment, w hich is giv en by (7.3.1) w ith p̂ (1) changed to i 𝜇̂ iH ∕𝜇̂ H . (7.3.4) p̂ (4) i
189
*APPLICATIONS
Mantel, Singh, and Bureau (1993) conducted an empirical study on the performance of the adjustment estimators p̂ (1) , p̂(2) , and p̂ (4) , using a synthetic population based i i i on Statistics Canada’s Surv ey of Employment, Payroll and Hours. Their results indicated that the simple ratio adjustment, p̂ (4) , often performs better than the more i complex adjustments, possibly due to instability of mse and mcpe- terms inv olv ed in p̂ (1) and p̂ (2) . i i Example 7.3.2. Simulation Study. Rao and Choudhry (1995) studied the relativ e performance of some direct and indirect estimators, using real and synthetic populations. For the real population, a sample of 1, 678 unincorporated tax fi lers (units) from the prov ince of Nov a Scotia, Canada, div ided into 18 census div isions, w as treated as the ov erall population. In each census div ision, units w ere further classifi ed into four mutually exclusiv e industry groups. The objectiv e w as to estimate the total w ages and salaries (Yi ) for each nonempty census div ision by industry groupi (small areas of interest). Here, w e focus on the industry group “construction” w ith 496 units and av erage census div ision size equal to 27.5. Gross business income, av ailable for all the units, w as used as an auxiliary v ariable x). (The ov erall correlation coeffi cient betw eeny and x for the construction group w as 0.64. To make comparisons betw een estimators under customary repeated sampling, R 500 samples, each of sizen 149, from the ov erall population ofN 1, 678 units w ere selected by SRS. From each simulated sample, the follow ing estimators w ere calculated: (i) Post- stratifi ed estimator: PSTNi yi if ni ≥ 1 and PST 0 if ni 0, w here Ni and ni are the population and sample sizes in the ith area (ni is a random v ariable). (ii) Ratio- synthetic estimator: SYN (y∕x)Xi , w here y and x are the ov erall sample means in the industry group andXi is the x- total for theith area. (iii) Sample size- dependent estimator: SSD 𝜓i (D)(PST) + 1[ 𝜓i (D)](SYN) w ith 𝜓i (D) 1 if ni ∕n ≥ Ni ∕N and SSD (ni ∕n)(Ni ∕N) 1 otherw ise. (iv ) EBLUP estimâH ̂H tor, Ŷ iH Ni Y i , w here Y i is giv en by (7.1.21) and based on the nested error regres1∕2 𝛽xij and kij xij , and using the fi tting- of- constants sion model (7.1.2) w ithxTij 2 and 𝜎 2 . To examine the aptness of this model, the model w as fi tted ̂ em estimators 𝜎̂ 𝑣m to the 496 population pairs (yij , xij ) from the construction group and the standardized ̂ ij 𝑣̂ i )∕(𝜎̂ e x1∕2 ) w ere examined. A plot of these residuEBLUP residuals (yij 𝛽x ij als against the xij ’s indicated a reasonable but not good fi t in the sense that the plot rev ealed an upw ard shift w ith sev eral v alues larger than 1.0 but none below 1.0. Sev eral v ariations of the model, including a model w ith an intercept term, did not lead to better fi ts. For each estimator, av erage absolute relativ e bias ( ARB), av erage relativ e effi ARE) ( w ere calculated as follow s: ciency (EFF), and av erage absolute relativ e error
ARB EFF
1 m
| | 1 | | 500 1 ||
m i
500
(estr ∕Yi r 1
| | 1)|| , | |
[MSE(PST)∕MSE(est)]1∕2 ,
190
BASIC UNIT LEVEL MODEL
ARE
1 m
m i 1
1 500
500
|estr ∕Yi
1|,
r 1
w here the av erage is taken ovmer 18 census div isions in the industry group. rth simulated sample Here estr denotes the v alue of the estimator, est, for the (r 1, 2, … , 500), Yi is the true area total, and MSE(est)
1 m
m i 1
1 500
500
(estr
Yi )2 ;
r 1
MSE(PST) is obtained by changing estr to PSTr , the v alue of the post- stratifi ed estimator for the rth simulated sample. Note that ARB measures the bias of an estimator, w hereas bothEFF and ARE measure the accuracy of an estimator. Table 7.2 reports the percentage v alues ofARB, EFF, andARE for the construction group. It is clear from Table 7.2 that both SYN and EBLUP perform signifi cantly better than PST and SSD in terms of EFF and ARE, leading to largerEFF v alues and smaller ARE v alues. For example,EFF for the EBLUP estimator is 261.1% compared to 137.6% for SSD. In terms of ARB, SYN has the largest v alue (15.7%) as expected, follow ed by the EBLUP estimator w ARB ith 11.3%; PST and SSD hav e smaller ARB: 5.4% and 2.9%, respectiv ely. Ov erall, EBLUP is somew hat better than ARE v alue of 13.5% v ersus 16.5%. SYN: EFF v alue of 261.1% v ersus 232.8% and It is gratifying that the EBLUP estimator under the assumed model performed w ell, despite the problems found in the residual analysis. The estimators w ere also compared under a synthetic population generated from the assumed model w ith the real population x- v alues. The parameter v alues 𝛽, (𝜎𝑣2 , 𝜎e2 ) used for generating the synthetic population w ere the estimates obtained by fi tting the model to the real population pairs (yij , xij ) ∶ 𝛽 0.21, 𝜎𝑣2 1.58, and𝜎e2 1.34. TABLE 7.2 Unconditional Comparisons of Estimators: Real and Synthetic Population Estimator Quality Measure
PST
SYN
SSD
ARB% EFF% ARE%
5.4 100.0 32.2
Real Population 15.7 2.9 232.8 137.6 16.5 24.0
11.3 261.1 13.5
ARB% EFF% ARE%
5.6 100.0 35.0
Synthetic Population 12.5 2.4 313.3 135.8 13.2 25.9
8.4 319.1 11.8
Source: Adapted from Tables 27.1 and 27.3 in Rao and Choudhry (1995).
EBLUP
191
*APPLICATIONS
TABLE 7.3 Effect of Between-Area Homogeneity on the Performance of SSD and EBLUP Between-Area Homogeneity: 𝜃 Estimator
0.1
0.5
SSD EBLUP
136.0 324.3
136.0 324.6
SSD EBLUP
25.6 11.5
25.7 11.6
1.0
2.0
5.0
10.0
EFF% 135.8 135.6 319.1 305.0
134.7 270.8
133.1 239.9
ARE% 25.9 26.3 11.8 12.5
27.2 14.5
28.2 16.7
Source: Adapted from Tables 27.4 and 27.5 in Rao and Choudhry (1995).
A plot of the standardized EBLUP residuals, obtained by fi tting the model to the synthetic population, showed an excellent fi t as expected. Table 7.2 also reports the percentage v alues ofARB, EFF, andARE for the synthetic population. Comparing these v alues to the corresponding v alues for the real population, it is clear that EFF increases for EBLUP and SYN, while it remains essentially unchanged for SSD. Similarly, ARE decreases for EBLUP and SYN, while it remains essentially unchanged for SSD. The v alue ofARB also decreases for EBLUP and SYN: 11.3% v ersus 8.4% for EBLUP and 15.7% v ersus 12.5% for SYN. Conditional comparisons of the estimators were also made by conditioning on the realized sample sizes in the small areas. This is a more realistic approach because the domain sample sizes, ni , are random with known distribution. To make conditional comparisons under repeated sampling, a simple random sample of sizen 149 was fi rst selected to determine the sample sizes,ni , in the small areas. Regarding theni ’s as fi xed, 500 stratifi ed random samples were then selected, treating the small areas as strata. The conditional v alues ofARB, EFF, andARE were computed from the simulated stratifi ed samples. The conditional performances were similar to the unconditional performances. Results were different, howev er, when two separate v alues for each quality measure were computed: one by av eraging ov er areas with ni < 6 only, and another av eraging ov er areas with ni ≥ 6. In particular, EFF(ARE) for EBLUP is much larger (smaller) than the v alue for SSD whenni < 6. As noted in Chapter 3, Section 3.3.2, the SSD estimator does not take adv antage of the between-area homogeneity, unlike the EBLUP estimator. To demonstrate this point, a series of synthetic populations was generated, using the prev ious parameter v alues,𝛽 0.21, 𝜎𝑣2 1.58 and 𝜎e2 1.34, and the model 1∕2 yij 𝛽xij + 𝑣i 𝜃 1∕2 + eij xij , by v arying𝜃 from 0.1 to 10.0 (𝜃 1 corresponds to the prev ious synthetic population). Note that for a giv en ratio 𝜎𝑣2 ∕𝜎e2 , the between-area homogeneity increases as 𝜃 decreases. Table 7.3 reports the unconditional v alues of EFF and ARE for the estimators SSD and EBLUP, as𝜃 v aries from 0.1 to 10.0. It is clear from Table 7.3 that EFF and ARE for SSD remain essentially unchanged as 𝜃 increases from 0.1 to 10.0. On the other hand, EFF for EBLUP is largest when
192
BASIC UNIT LEVEL MODEL
𝜃 0.1 and decreases as 𝜃 increases by 10.0. Similarly, ARE for EBLUP is smallest when 𝜃 0.1 and increases as 𝜃 increases by 10.0. Example 7.3.3. Area Occupied by Olive Trees. Militino et al. (2006) applied the basic unit lev el model (7.1.1) to dev elop EBLUP estimates of area occupied by oliv e trees in m 8 nonirrigated areas located in a central region of Nav arra, Spain. The region was div ided into segments of 4 ha each. The total number of segments in the areas, Ni , v aried from 32 to 731. Simple random samples of segments were selected from each area with sample sizes ni v arying from 1 to 12. In the majority of cases, the study domain plots are smaller than the 4 ha. segments and of different size. Surface areas, sij , of segmentsj in each area i where oliv e trees are likely to be found were Ni ascertained from cropland maps. We denote by Si s the total surface that is j 1 ij likely to contain oliv e trees in areai, which is known for all areas. Inside each sample segment j in area i, area occupied by oliv e trees was observ ed and denoted by ysij . Furthermore, satellite imagery was used to ascertain the classifi ed oliv e trees, xij , in each population segment. Because of the unequal surface areas, sij , the areas cov ered by oliv e trees ysij and xsij are transformed in terms of portions of the surfaces sij that are cov ered by oliv e trees. The transformed data are giv en by yij
Si ysij , Ni sij
Si xsij , Ni sij
xij
j
1, … , ni ,
i
1, … , 8.
Note that yij ysij and xij xsij when sij si for all j. For each area i, the parameter of interest is the mean of the portions of the segment’s surface cov ered by oliv e trees multiplied by the total surface Si , which is equal to the total of the transformed v ariables, N Ni Si i ysij Yi yij , i 1, … , m. Ni j 1 sij j 1 Model (7.1.1) with kij2 ni was chosen for the transformed data after comparing it to alternativ e unit lev el models, using a bootstrap test of the null hypothesis H0 ∶ 𝜎𝑣2 0 and the conditional Akaike Information Criterion (cAIC) for mixed models (Section 5.4.1). Model (7.1.1) with xij (1, xij )T was v alidated by checking the normality assumption using both the transformed residuals and the EBLUP residuals in the Shapiro–Wilk statistic. Box plots of EBLUP residuals for the selected model did not show any specifi c pattern, suggesting the adequacy of the model. Note that the unequal error v ariances,ni 𝜎e2 , must be considered when standardizing residuals. Thus, in this case transformed residuals and standardized EBLUP residuals introduced in Example 7.3.1 are giv en, respectiv ely, by û ij
ni
1∕2
[(yij
𝜏y ̂ i)
(xij
𝜏x ̂ i )T ̂ ]
193
*OUTLIER ROBUST EBLUP ESTIMATION
and ê ij
ni
1∕2
𝜎̂ e 1 (yij
xTij ̂
𝑣̂ i ),
where 𝜏̂ 1 (1 𝛾̂ )1∕2 with 𝛾̂ 𝜎̂ 𝑣2 ∕(𝜎̂ 𝑣2 + 𝜎̂ e2 ). Method of fi tting-of-constants and REML (Section 7.1.2) gav e similar estimates of𝜎𝑣2 , 𝜎e2 , ̂ , and𝑣̂ i . EBLUP estimates of area totals Ŷ iH Ni (XTi ̂ + 𝑣̂ i ), i 1, … , 8, were calculated as in Section 7.1. Second-order unbiased MSE estimates were calculated using (7.2.12). The coeffi cient of v ariation (CV) of the EBLUP v aried from 0.12 to 0.16 for six of the areas, while for areas S39 and S34 it was as large as 0.37 and 0.39, respectiv ely.
7.4 7.4.1
*OUTLIER ROBUST EBLUP ESTIMATION Estimation of Area Means
̂H The EBLUP estimator 𝜇̂ iH of 𝜇i , giv en by (7.1.16), and Y i of Y i , giv en by (7.1.24), are also empirical best or Bayes (EB) estimators under normality of the random effects 𝑣i and the errors eij (see Section 9.3.1). Although the EB estimators are “optimal” under normality, they are sensitiv e to outliers in the responses,yij , and can lead to considerable inflation of the MSE. We now turn to robust methods that downweight any influential observ ation in the data. Outliers frequently occur in business surv ey data with responses exhibiting few unusually large v alues. We consider the general linear mixed model with block-diagonal cov ariance structure, giv en by (5.3.1); the basic nested error model (7.1.1) is a special case. Assuming normality and maximizing the joint density of y (yT1 , … , yTm )T and v (vT1 , … , vTm )T with respect to and v lead to “mixed model” equations giv en by m
XTi Ri 1 (yi
Xi
Zi vi )
(7.4.1)
𝟎,
i 1
ZTi Ri 1 (yi
Xi
Zi vi )
Gi 1 vi
𝟎,
i
1, … , m.
(7.4.2)
The solution to (7.4.1) and (7.4.2) is identical to the BLUE ̃ ( ) and the BLUPs ṽ i ( ), i 1, … , m, giv en by (5.3.4) and (5.3.3), respectiv ely, but the representations (7.4.1) and (7.4.2) are useful in dev eloping robust estimators. Note that for the e i 𝜎e2 Ini , Zi 𝜎𝑣2 , vi 𝑣i , and nested error model with kij 1, we hav R ni , Gi 2 2 T (𝜎𝑣 , 𝜎e ) , where Ini is the ni × ni identity matrix and ni is an ni × 1 v ector of ones. We fi rst study symmetric outliers in the distribution of𝑣i or eij or both, for the nested error model (7.1.1) with kij 1. For example, 𝑣i may be generated from a t-distribution with a small degrees of freedom (say 3) or from a mixture distribution, 2 ), which means that a large proportionp 0.9 of the 𝑣i ’s say 0.9N(0, 𝜎𝑣2 ) + 0.1N(0, 𝜎𝑣1 are generated from the true distribution N(0, 𝜎𝑣2 ) and the remaining small proportion
194
BASIC UNIT LEVEL MODEL
2 ), with𝜎 2 much larger than 1 p 0.1 from the contaminated distribution N(0, 𝜎𝑣1 𝑣1 2 𝑣i and eij . 𝜎𝑣 . In the case of symmetric outliers, we hav e zero means for We fi rst obtain robust estimators,̂ R and ̂ R of and , by solv ing a robust v ersion of the maximum-likelihood equations (under normality) of and . This is done by applying Huber’s (1972) 𝜓-function 𝜓b (u) umin(1, b∕|u|), whereb > 0 is a tuning constant, to the standardized residuals. After that, the mixed model equation (7.4.2) for vi is also robustifi ed, using again Huber’s𝜓-function. The robustifi ed equation for vi is giv en by
ZTi Ri
1∕2
Gi
𝚿b [Ri
1∕2
1∕2
𝚿b (Gi
(yi
1∕2
vi )
Xi 0,
Zi vi )] 1, … , m,
i
(7.4.3)
where 𝚿b (ui ) (𝜓b (ui1 ), … , 𝜓b (uini ))T and the tuning constant b is commonly taken as b 1.345 in the robustness literature. The choice b ∞ leads to (7.4.2). Substituting the resulting robust estimates ̂ R and ̂ R for and in (7.4.3), we obtain the robust estimators 𝑣̂ iR of 𝑣i (Sinha and Rao 2009). Robust estimators ̂ R and ̂ R are obtained by solv ing the robustifi ed ML equations for and , giv en by m
1∕2
XTi Vi 1 Ui 𝚿b (ri )
0
(7.4.4)
i 1
and m
{ 𝜕Vi 1 1∕2 V U 𝚿b (ri ) 𝜕𝛿𝓁 i i )} ( 1 𝜕Vi 0, 𝓁 1, … , q, tr Ki Vi 𝜕𝛿𝓁 1∕2
𝚿Tb (ri )Ui Vi 1
i 1
1∕2
(7.4.5)
where ri Ui (yi Xi ) (ri1 , … , rini )T , in whichUi is a diagonal matrix with diagonal elements equal to the diagonal elements of Vi . Furthermore, Ki cIni with c E[𝜓b2 (u)], whereu ∼ N(0, 1). Equations (7.4.4) and (7.4.5) reduce to ML equations for the choice b ∞, in 1∕2 which case Ui 𝚿b (ri ) yi Xi and Ki Ini . Note that in equations (7.4.4) and (7.4.5), the𝜓-function is applied to the component-wise standardized residuals rij (yij xTij )∕(𝜎𝑣2 + 𝜎e2 )1∕2 , j ∈ si , unlike the robust ML equations proposed by Huggins (1993) and Richardson and Welsh (1995), in which the𝜓-function is applied to the 1∕2 standardized v ectorsVi (yi Xi ). Sinha and Rao (2009) proposed the Newton–Raphson (NR) method for solv ing (7.4.4) and (7.4.5), respectiv ely, but the NR algorithm is subject to stability and conv ergence problems unless the starting v alues ofand are “close” to the true v alues. ′ Furthermore, it depends on the deriv ativ 𝜓 esb (rij ) 𝜕𝜓b (rij )∕𝜕rij , which takes the v alue 1 if|rij | ≤ b and 0 otherwise. The zero deriv ativ es can also cause conv ergence problems.
195
*OUTLIER ROBUST EBLUP ESTIMATION
Following Anderson (1973) for the solution of ML equations, we apply a fi xed-point algorithm to solv e the robust ML equations (7.4.5) (giv en ), for the special case of the nested error model (7.1.1) with kij 1. This method av oids the ′ calculation of deriv ativ es 𝜓b (rij ). Schoch (2012) used iterativ ely re-weighted least squares (IRWLS) to solv e (7.4.4) for giv en. IRWLS is widely used in the robust estimation literature and it is more stable than the NR method. We fi rst spell out the fi xed-point algorithm for (𝜎𝑣2 , 𝜎e2 )T , giv en . Noting that Vi 𝜎𝑣2 ni Tni + 𝜎e2 Ini and writing tr(Ki Vi 1 𝜕Vi ∕𝜕𝛿𝓁 ) as tr[Ki Vi 1 (𝜕Vi ∕𝜕𝛿𝓁 )Vi 1 Vi ] T 2 Ini , we can express (7.4.5) as a system of with 𝜕Vi ∕𝜕𝜎𝑣2 ni ni and 𝜕Vi ∕𝜕𝜎e fi xed-point equationsA( ) a( ). Here, a( ) (a1 ( ), a2 ( ))T with m 1∕2
1∕2
𝚿T (ri )Ui Vi 1 Ui 𝚿(ri ),
a1 ( ) i 1 m
1∕2
T 1∕2 ni ni Ui 𝚿(ri )
𝚿T (ri )Ui Vi 1
a2 ( ) i 1
and A( )
(ak,𝓁 ( )) is a 2 × 2 matrix with elements m
m 1
1
tr(Ki Vi Ini Vi Ini ),
a11 ( )
tr(Ki Vi 1 Ini Vi 1
a12 ( )
i 1 m
m
tr(Vi 1
a21 ( )
T ni ni ),
i 1 T 1 ni ni Vi Ini ),
tr(Vi 1
a22 ( )
i 1
T 1 T ni ni Vi ni ni )
i 1
(Chatrchi 2012). The fi xed-point iterations are then giv en by A 1(
(t+1)
(t)
)a(
(t)
t
),
0, 1, …
(7.4.6)
2 , 𝜎 2 )T , we can take the fi tting-of-constants estimators As starting v alues (0) (𝜎𝑣0 e0 2 2 of 𝜎𝑣 and 𝜎e (Section 7.1.2). We next describe the IRWLS algorithm for the solution of equation (7.4.4) for . By writing 𝜓b (rij ) rij [𝜓b (rij )∕rij ] in (7.4.4), lettingWi be the diagonal matrix ̃ i Wi V 1 U1∕2 Xi and ỹ (t) with diagonal elements [𝜓b (rij )∕rij ]1∕2 and denoting X i i i 1∕2 Wi Vi 1 Ui yi , we can solv e (7.4.4) iterativ ely as
[
]
m
(t+1) R
1 m
̃ (t) )T X ̃ (t) (X i i i 1
̃ (t) )T ỹ (t) , (X i i
(7.4.7)
i 1
for giv en . Now, solv ing (7.4.6) and (7.4.7) jointly, we obtain the robust estimators ̂ R and ̂ R . Starting with (0) and (0) , we obtain (1) from (7.4.7) and then substituting (1) R
R
R
R
and (0) in the right-hand side of (7.4.6), we obtain R 2 ,𝜎 2 )T . until conv ergence, we obtain̂ R and ̂ R (𝜎̂ 𝑣R ̂ eR
(1)
. Continuing the iterations
196
BASIC UNIT LEVEL MODEL
Based on ̂ R and ̂ R , Sinha and Rao (2009) solv ed (7.4.3) iterativ ely by the NR method to get robust EBLUP (REBLUP) estimators 𝑣̂ iR of 𝑣i for the nested error model (7.1.1), using the EBLUP estimates 𝑣̂ i as starting v alues. NR iterations are quite stable in getting 𝑣̂ iR , unlike in the case of robust estimation of . Alternativ ely, we can directly robustify the BLUP of vi , giv en by (5.3.3), to get a different REBLUP. For the nested error model with kij 1, this REBLUP is giv en by v̂ iR
1∕2
Gi ZTi Vi 1 Ui 𝚿b [Ui
1∕2
(yi
Xi ̂ R )],
(7.4.8)
2 and 𝜎 2 are substituted for 𝜎 2 and 𝜎 2 in G where 𝜎̂ 𝑣R ̂ eR 𝜎𝑣2 and Vi 1 𝜎e 2 [Ini i 𝑣 e 𝛾i ni Tni ] and Ui is the diagonal matrix with diagonal elements all equal to 𝜎𝑣2 + 𝜎e2 . A drawback of (7.4.8) is that it applies the Huber 𝜓-function to the combined errors Zi vi vi + ei unlike (7.4.3), which applies the Huber 𝜓-function to ei yi Xi and vi separately. Now using the robust estimators ̂ R and v̂ iR , a REBLUP estimator of𝜇i XTi + 𝜇iSR XTi ̂ R + 𝑣̂ iR . Similarly, a REBLUP estimator of the area mean 𝑣i is giv en bŷ Y i is obtained as ( ) ̂ SR 1 (7.4.9) Yi Ni yij + ŷ ijR , j∈si
j∈ri
where ŷ ijR xTij ̂ R + 𝑣̂ iR and si and ri denote the sets of sampled and nonsampled units in ith area. We now turn to the case of possibly nonzero means in 𝑣i or eij or both. In this case, ̂ SR Y i may lead to a signifi cant bias, which in turn can affect the MSE. A bias-corrected robust estimator is obtained by treating the model with errors symmetrically distributed around zero as a working model and then adding an estimator of the mean of ̂ SR prediction errors ê ijR yij ŷ ijR , j ∈ ri , to the “prediction” estimatorY i . This leads to ̂ SR BC ̂ SR Yi Y i + (1 ni Ni 1 )ni 1 𝜙i 𝜓c (̂eijR ∕𝜙i ), (7.4.10) j∈si
where 𝜙i is a robust estimator of the scale of the area i errors ê ijR , j ∈ si , such as the median absolute dev iation, and𝜓c (⋅) is the Huber 𝜓-function with tuning constant c > b (Chambers et al. 2014). The bias-correction in (7.4.10), howev er, is based only on the units j ∈ si and hence it can be considerably v ariable ifni is v ery small. Note ̂ SR BC ̂ SR that Y i reduces to Y i if c 0. On the other hand, ifc ∞ and ni ∕Ni ≈ 0, then SR SR BC ̂ ̂ Y i ≈ 𝜇̂ iSR and Y i ≈ (XTi ̂ R + 𝑣̂ iR ) + (yi XTi ̂ R 𝑣̂ iR ) yi + (Xi xi )T ̂ R , which may be regarded as a robust “surv ey regression” estimator. The latter is essentially a direct estimator. We now turn to fully bias-corrected robust estimators that make use of the residuals ê 𝓁jR y𝓁jR ŷ 𝓁jR from other areas 𝓁 ≠ i in addition to the residuals ê ijR from
197
*OUTLIER ROBUST EBLUP ESTIMATION
area i. Jiongo, Haziza, and Duchesne (2013) proposed two different approaches for this purpose: approach 1 follows along the lines of Chambers (1986) for constructing robust direct estimators for large areas, while approach 2 uses conditional bias to measure the influence of units in the population (Beaumont, Haziza, and Ruiz-Gazen 2013). ̂H We fi rst express the EBLUP estimatorY i as a weighted sum of all the sample observ ations{y𝓁j , 𝓁 1, … , m; j 1, … , n𝓁 }: ̂H Yi
m
Ni
1
𝑤i𝓁j ( ̂ )y𝓁j ,
(7.4.11)
𝓁 1 j∈s𝓁
where the weights 𝑤i𝓁j ( ̂ ) depend on X𝓁 , V𝓁 , x𝓁j (j ∈ ri ; 𝓁 1, … , m) and 𝜎𝑣2 , see Jiongo et al. (2013) for explicit formulae for 𝑤iij ( ̂ ) and 𝑤i𝓁j ( ̂ ), 𝓁 ≠ i. It is interesting ̂H Ni to note that Y i satisfi es calibration to known totalsXi+ x in the sense j 1 ij m
𝑤i𝓁j ( ̂ )x𝓁j
(7.4.12)
Xi+ .
𝓁 1 j∈s𝓁
Using the representation (7.4.11) and following Chambers (1986), Jiongo et al. (2013) ̂ SR ̂H show that Y i can be expressed in terms of Y i as ̂H Yi
̂ SR Y i + Ni
m 1
[𝑤iij ( ̂ )
1]̂eijR + Ni
1
j∈si
𝑤i𝓁j ( ̂ )̂eR𝓁j 𝓁≠i j∈s𝓁
m
+ Ni
1
Wi𝓁 ( ̂ )𝑣̂ 𝓁R ,
(7.4.13)
𝓁 1
where Wi𝓁 ( ̂ )
⎧ 𝑤iij ( ̂ ) Ni ⎪j∈si ⎨ 𝑤i𝓁j ( ̂ ) ⎪ ⎩j∈s𝓁
if 𝓁
i;
if 𝓁 ≠ i.
(7.4.14)
A fully bias-corrected REBLUP is now obtained by applying Huber 𝜓-functions to the weighted terms in each of the three sums on the right-hand side of (7.4.13), with possibly different tuning constants c1 and c2 . This leads to ̂ JHD1 Yi
̂ SR Y i + Ni
1
𝜓c1 {[𝑤iij ( ̂ )
1]̂eijR }
j∈si m
+ Ni
1
m
𝜓c1 {𝑤i𝓁j ( ̂ )̂eR𝓁j } + Ni 𝓁≠i j∈s𝓁
1
𝜓c2 [Wi𝓁 ( ̂ )𝑣̂ 𝓁R ]. 𝓁 1
(7.4.15)
198
BASIC UNIT LEVEL MODEL
̂ JHD1 ̂ SR ̂H Note that Y i tends to Y i as c1 → 0 and c2 → 0 and to Y i as c1 → ∞ and c2 → ∞. For the choice of c1 and c2 , Jiongo et al. (JHD) recommend c1 𝛼 𝜎̂ eR med(𝑤i𝓁j ( ̂ )) and c2 𝛼 𝜎̂ 𝑣R med(Wi𝓁 ( ̂ )) for some constant 𝛼; in particular, a larger𝛼 (say 𝛼 9) seems to control the biases, and the corresponding MSEs are small. ̂ JHD1 is closely Under the second approach, the fully bias-corrected estimator Y i related to (7.4.15), except that𝜓c2 [Wi𝓁 ( ̂ )𝑣̂ 𝓁R ] is changed to Wi𝓁 ( ̂ )𝑣̂ 𝓁R , that is, the Huber 𝜓-function is not applied to the last term of (7.4.13). 7.4.2
MSE Estimation
Giv en that the underlying distributions of𝑣i and eij are unknown, a nonparametric bootstrap method, based on resampling from the estimated random effects𝑣̂ i and the residuals ê ij , might look plausible for the estimation of MSE of the robust estimators of Y i . Howev er, as noted by Salibian-Barrera and Van Aelst (2008), the proportion of outliers in the bootstrap data {y∗ij } may be much higher than in the original data {yij } and, this diffi culty may lead to poor performance of the nonparametric bootstrap MSE estimators in the presence of outliers. Sinha and Rao (2009) proposed instead to use a parametric bootstrap method based on the robust estimates ̂ R and ̂ R assuming normality. The motiv ation behind this method is that our focus is on dev iations from the working assumption of normality of the 𝑣i and the eij , and that it is natural to use robust estimates ̂ R and ̂ R for draŵ SR ing bootstrap samples since MSE(Y i ) is not sensitiv e to outliers. The parametric bootstrap method is described as follows: (i) For giv en robust estimateŝ R and ̂ R 2 ,𝜎 2 )T , generate𝑣∗ and e∗ from N(0, 𝜎 2 ) and N(0, 𝜎 2 ) independently to create a ̂ eR ̂ 𝑣R ̂ eR (𝜎̂ 𝑣R i ij ∗ bootstrap sample {yij ; j 1, … , ni , i 1, … , m}, wherey∗ij xTij ̂ R + 𝑣∗i + e∗ij . Compute the corresponding bootstrap area mean Y i∗ N 1 y∗ + N 1 xT ̂ R + ∗
i
∗
j∈si ij
i
j∈ri ij
2 ). This is equiv (1 ni Ni 1 )(𝑣∗i + eir ), whereeir is generated from N(0, (Ni ni ) 1 𝜎̂ eR alent to generating y∗ij xTij ̂ R + 𝑣∗i + e∗ij , forj 1, … , Ni and then taking their mean ̂ SR N Y i∗ Ni 1 j i 1 y∗ij . Compute the REBLUP Y i∗ from the generated bootstrap sam̂ SR ple. (ii) Repeat step (i) B times. Let Y i∗ (b) be the robust EBLUP estimate obtained in bootstrap replicate b and Y i∗ (b) the corresponding bootstrap mean, b 1, … , B. ̂ SR Bootstrap MSE estimator of Y i is then giv en by
̂ SR mseB (Y i )
1 B
B
[ SR ̂ Y i∗ (b)
]2 Y i∗ (b) .
(7.4.16)
b 1
̂ SR BC Bootstrap MSE estimator of Y i is obtained similarly. Jiongo et al. (2013) proposed an alternativ e parametric bootstrap MSE estimator based on generating 𝑣∗i and e∗ij from N(0, 𝜎̂ 𝑣2 ) and N(0, 𝜎̂ e2 ), where 𝜎̂ 𝑣2 and 𝜎̂ e2 are the nonrobust ML or REML estimators.
*OUTLIER ROBUST EBLUP ESTIMATION
199
̂ SR Chambers et al. (2014) studied conditional MSE of Y i and its bias-corrected ̂ SR BC v ersionY i , by conditioning on the realized v alues of the area effects{𝑣i ; i 1, … , m}. Two different estimators of the conditional MSE were proposed. The fi rst ̂ SR one is based on a “pseudo-linear” representation for Y i , and the second estimator is based on a linearization of the robustifi ed equations (7.4.3) for𝑣i and (7.4.4) for . Similar conditional MSE estimators were calculated also for the EBLUP estimator ̂H Y i . The fi rst MSE estimator, although computationally simpler, is signifi cantly less stable, in terms of CV, than the second MSE estimator. Simulation results in Chambers et al. (2014) showed that, in the case of no outliers, customary second-order unbiased Prasad–Rao (PR) MSE estimator (7.2.12) of the EBLUP estimator is much more stable, in terms of relativ e root mean squared error (RRMSE) or CV, than the proposed conditional MSE estimators. This is true especially when the area sample sizes, ni , are small. Under the same no outlier scenario, ̂ SR the bootstrap MSE estimator (7.4.16) of the REBLUP estimator Y i was signifi cantly more stable than the conditional MSE estimators. Bootstrap MSE estimator for the ̂ SR BC performed ev en better, with median RRMSE about bias-corrected estimator Y i one-third of the v alues for the conditional MSE estimators. 7.4.3
Simulation Results
Sinha and Rao (2009) conducted a simulation study on the performance of REBLUP ̂ SR ̂H estimators 𝜇̂ iSR and Y i relativ e to the EBLUP estimators𝜇̂ iH and Y i , respectiv ely, under two different simulation models to generate symmetric outliers in the area effects 𝑣i only, or the errors eij , or in both. The fi rst simulation model 2 ) for used mixture normal distributions of the form (1 p1 )N(0, 𝜎𝑣2 ) + p1 N(0, 𝜎𝑣1 2 2 2 2 𝑣i and (1 p2 )N(0, 𝜎e ) + p2 N(0, 𝜎e1 ) for eij with p1 p2 0.1, 𝜎𝑣 𝜎e 1, and 2 2 𝜎e1 25. This means that a large proportion 1 p1 0.9 of the 𝑣i ’s are 𝜎𝑣1 generated from the underlying “true” distribution N(0, 1) and the remaining small proportion p1 0.10 are generated from the contaminated distribution N(0, 25), and similarly for the eij ’s. The second simulation model used t-distribution with k 3 degrees of freedom for both 𝑣i and eij . The case of no outliers in 𝑣i or eij is also studied. They considered m 30 small areas with sample sizes ni 4, i 1, … , m. Results of simulations may be summarized as follows: (i) In the case of no outliers, REBLUP estimators are similar to the corresponding EBLUP estimators in terms of MSE, indicating v ery small loss in effi ciency. (ii) In the case of outliers in the errors eij , REBLUP estimators are much more effi cient than the corresponding EBLUP estimators. (iii) On the other hand, in the case of outliers only in the area effects 𝑣i , REBLUP and EBLUP gav e similar results, indicating robustness of EBLUP in this case. (iv ) The bootstrap MSE estimator performs well in tracking the MSE of the REBLUP. Simulation results in Jiongo et al. (2013) for the case of nonsymmetric outliers ̂ JHD2 ̂ JHD1 and Y i perform indicated that the fully bias-corrected robust estimators Y i
200
BASIC UNIT LEVEL MODEL
̂ SR ̂ SR well in terms of MSE unlike Y i and Y i to signifi cant bias. 7.5
BC
; in the latter case, MSE is inflated due
*M-QUANTILE REGRESSION
As an alternativ e to modeling the between-area v ariation through additiv e area random effects in the nested error model (7.1.1) and with the purpose of obtaining robust estimators, Chambers and Tzav idis (2006) proposed M-quantile regression models. In these models, the between-area v ariation is incorporated through area-specifi c quantile-like coeffi cients. For a real-v alued random v ariable y with probability density function f (y), Breckling and Chambers (1988) defi ned the M-quantile of orderq ∈ (0, 1) as the solution Q𝜓 (q) of the following equation in Q: ∞
∫
𝜓q (y
Q)f (y)dy
0,
(7.5.1)
∞
where 𝜓q (u) 2𝜓(u)[qI(u > 0) + (1 q)I(u ≤ 0)] and 𝜓(u) is a monotone nondecreasing function with 𝜓( ∞) < 𝜓(0) 0 < 𝜓(∞). For 𝜓(u) sign(u), Q𝜓 (q) reduces to the ordinary quantile of order q, while the choice𝜓(u) u with q 1∕2 yields the expected v alue. The abov e defi nition of M-quantile of order q readily extends to the conditional density f (y|x) by replacing f (y) in (7.5.1) by f (y|x). The resulting solution is denoted as Q𝜓 (x; q). A linear M-quantile regression model for a fi xedq and specifi ed𝜓 is giv en byQ𝜓 (x; q) xT 𝜓 (q). The standard linear regression model E(y|x) xT is a special case of this model by letting q 1∕2 and 𝜓(u) u. Similarly, the ordinary quantile regression (Koenker and Bassett 1978) is obtained as a special case by setting 𝜓(u) sign(u). In the small area estimation context, a linear M-quantile regression model is assumed for ev eryq in (0, 1), in contrast to the basic unit lev el model, which assumes a nested error regression model only on the conditional mean E(y|x). Moreov er, 𝜓 (q) is assumed to be a continuous function of q for a giv en𝜓. In M-quantile regression, based on a sample{(y𝓁 , x𝓁 ); 𝓁 1, … , n}, the estimator of the M-quantile regression coeffi cient 𝜓 (q) is obtained by solv ing the following sample estimating equation for 𝜓 (q), (
n
𝜓q 𝓁 1
y𝓁
xT𝓁 s̃
) 𝜓 (q)
x𝓁
0,
(7.5.2)
where s̃ is a suitable robust scale estimator such as the mean absolute dev iation s̃ med𝓁 1,…,n |y𝓁 xT𝓁 𝜓 (q)|∕0.6745. Using 𝜓(u) sign(u), we obtain the estimating equation for ordinary quantile regression, while forq 1∕2 and 𝜓(u) u the solution of (7.5.2) is the least squares estimator of in the usual regression model E(y𝓁 |x𝓁 ) xT𝓁 , 𝓁 1, … , n.
201
*M-QUANTILE REGRESSION
Equation (7.5.2), for giv en 𝜓 and q, can be solv ed by IRWLS. Defi ning the weight function Wq (u) 𝜓q (u)∕u for u ≠ 0 and Wq (0) 𝜓q′ (0), (7.5.2) can be expressed as (
n
Wq
y𝓁
)
xT𝓁
𝜓 (q)
s̃
𝓁 1
xT𝓁
x𝓁 (y𝓁
𝜓 (q))
0.
(7.5.3)
Solv ing for the 𝜓 (q) within the second parenthesis, we obtain the IRWLS updating equation, giv en by ̂ (k+1) (q) 𝜓
(k) ⎤ xT𝓁 ̂ 𝜓 (q) ⎞ ⎟ x𝓁 xT ⎥ 𝓁⎥ ⎟ s̃ (k) ⎦ ⎠
⎛y ⎡ n ⎢ Wq ⎜ 𝓁 ⎜ ⎢j 1 ⎣ ⎝ n
× 𝓁
⎛y 𝓁 Wq ⎜ ⎜ 1 ⎝
1
(k)
xT𝓁 ̂ 𝜓 (q) ⎞ ⎟x y . ⎟ 𝓁 𝓁 s̃(k) ⎠
(7.5.4)
The resulting solution is denoted by ̂ 𝜓 (q) and the associated conditional ̂ 𝜓 (x; q) xT ̂ 𝜓 (q). The main reason for considering M-quantiles M-quantile by Q instead of the usual quantiles is that a continuous monotone 𝜓 function ensures that the IRWLS algorithm (7.5.4) conv erges to a unique solution (Kokic et al. 1997). Moreov er, selecting a bounded 𝜓 ensures robustness in the sense of bounded influence function. An example of a bounded and continuous 𝜓-function is Huber’s Proposal 2, giv en by 𝜓 (u) uI(|u| ≤ c) + c sign(u)I(|u| > c). Chambers and Tzav idis (2006) proposed small area estimators based on a specifi c M-quantile regression model that accommodates area effects in a different way. They defi ned the M-quantile coeffi cient q𝓁 associated with (y𝓁 , x𝓁 ) for the population unit 𝓁 as the v alueq𝓁 such that Q𝜓 (x𝓁 ; q𝓁 ) xT𝓁 𝜓 (q𝓁 ) y𝓁 . Furthermore, they argued that if there is a hierarchical structure in the distribution of the population v aluesy𝓁 giv enx𝓁 with between-area and within-area v ariability, then units in the same area should hav e similar M-quantile coeffi cients. Using this argument, they assumed the population model Q𝜓 (x𝓁 ; 𝜃i )
xT𝓁
𝜓 (𝜃i ),
𝓁 ∈ Ui ,
i
1, … , m,
(7.5.5)
where Ui is the set of Ni population units from i th area and 𝜃i Ni 1 𝓁∈Ui q𝓁 is the mean of the M-quantile coeffi cientsq𝓁 for the units 𝓁 in area i. In practice, 𝜃i for each area i needs to be estimated from the sample. This is ̂ 𝜓 (x𝓁 ; q) y𝓁 for q to get q̂ 𝓁 for each 𝓁 1, … , n, and then taking done by solv ingQ ̂𝜃i ∶ n 1 ̂ 𝓁 as the estimator of 𝜃i . The v alueŝq𝓁 are determined by calculating 𝓁∈si q i ̂ 𝜓 (x𝓁 ; q), 𝓁 1, … , n, for a fi ne grid of Q q-v alues in the(0, 1) interv al, and selecting ̂ 𝜓 (x𝓁 ; q̂ 𝓁 ) y𝓁 by linear interpolation. For each 𝜃̂i , the corthe v aluêq𝓁 such that Q responding ̂ 𝜓 (𝜃̂i ) is obtained by IRWLS applied to the sample data. The M-quantile predictor of a nonsample unit y𝓁 , 𝓁 ∈ Ui si ri , is taken aŝyMQ xT𝓁 ̂ 𝜓 (𝜃̂i ) using 𝓁
202
BASIC UNIT LEVEL MODEL
the assumed population model (7.5.5). Finally, the resulting predictor ofith area mean Y i Ni 1 𝓁∈Ui y𝓁 is giv en by ( ̂ MQ Yi
Ni
)
1
ŷ MQ 𝓁
y𝓁 + 𝓁∈si
(7.5.6)
.
𝓁∈ri
Estimator (7.5.6) does not minimize any sound alternativ e criterion to the mean squared error (MSE), unlike the EBLUP under the assumed unit lev el model, which minimizes the MSE among the linear and unbiased estimators. Furthermore, it appears that uniform consistency of the estimators ̂ 𝜓 (q) is needed in order to justify the method theoretically, where uniform consistency means that ̂ ̂ sup | ̂ 𝜓 (q) 𝜓 (q)|→p 0. Also, note that 𝜓 (𝜃i ) can be signifi cantly different
0 data("cornsoybean") R> data("cornsoybeanmeans")
Now we create a data frame with the true county means of the auxiliary v ariables called Xmean. We create another data frame with the county population sizes, called Popn. In these two data frames, the fi rst column must contain the county codes. This column helps to identify the counties in the case that they were arranged differently in Xmean and Popn. R> Xmean Popn set.seed(123) R> BHF R> + + R>
sqrtmse.BHF R> R> R>
X
In R>
plot(resid2s,diag(S),main="",xlab="ei ̂ 2/e ̂ Te", ylab="sii") cutoffs + + R>
sqrtmse.BHF data("grapes") R> data("grapesprox")
Next, obtain EBLUP estimators based on REML fitting method (default), accompanied by the analytical MSE estimates given by Singh et al. (2005): R> SFH SFH.npb R> R> + R>
xmin cv.SFH grapes.est sortedgrapes R> R> R> +
plot(sortedgrapes$dir,type="n",xlab="area",ylab="Estimate") points(sortedgrapes$dir,type="b",col=3,lwd=2,pch=1) points(sortedgrapes$eblup.SFH,type="b",col=4,lwd=2,pch=4) legend(1,350,legend=c("Direct","EBLUP"),ncol=2,col=c(3,4), lwd=rep(2,2),pch=c(1,4))
Plot also the CVs of EBLUP and direct estimates: R> plot(sortedgrapes$cv.dir,type="n",xlab="area",ylab="CV", ylim=c(0,400)) R> points(sortedgrapes$cv.dir,type="b",col=3,lwd=2,pch=1) R> points(sortedgrapes$cv.SFH,type="b",col=4,lwd=2,pch=4) R> legend(1,400,legend=c("CV(Direct)","CV(EBLUP)"),col=c(3,4), + lwd=rep(2,2),pch=c(1,4))
EBLUP
CV(Direct) CV(EBLUP)
CV
0
100
200
300
Direct
400
50 100 150 200 250 300 350
EBLUP: EXTENSIONS
0
Estimate
268
0
50
100
150 Area (a)
200
250
0
50
100
150
200
250
Area (b)
Figure 8.2 EBLUP Estimates, Based on the Spatial FH Model with SAR Random Effects, and Direct Estimates of Mean Surface Area Used for Production of Grapes for Each Municipality (a). CVs of EBLUP Estimates and of Direct Estimates for Each Municipality (b). Municipalities are Sorted by Increasing CVs of Direct Estimates.
Figure 8.2a shows EBLUP and direct estimates of mean surface area used for grape production for each municipality and Figure 8.2b shows the CVs of EBLUP and direct estimates, with municipalities sorted by increasing CVs of direct estimates. Figure 8.2a shows that EBLUP estimates are more stable than direct estimates. Moreover, the CVs of EBLUP estimates are smaller than those of direct estimates for most municipalities, and the reduction in CV is considerable in the municipalities where direct estimates are very inefficient (on the right side of the plots).
9 EMPIRICAL BAYES (EB) METHOD
9.1
INTRODUCTION
The empirical best linear unbiased prediction (EBLUP) method, studied in Chapters 5–8, is applicable to linear mixed models that cover many applications of small area estimation. Normality of random effects and errors is not needed for point estimation, but normality is used for getting accurate MSE estimators. The MSE estimator for the basic area level model remains valid under nonnormality of the random effects, 𝑣i (see Section 6.2.1), but normality is generally needed. Linear mixed models are designed for continuous variables, y, but they are not suitable for handling binary or count data. In Section 4.6, we proposed suitable models for binary and count data; in particular, logistic regression models with random area effects for binary data, and log-linear models with random effects for count data. Empirical Bayes (EB) and hierarchical Bayes (HB) methods are applicable more generally in the sense of handling models for binary and count data as well as normal linear mixed models. In the latter case, EB and EBLUP estimators are identical. In this chapter, we study the EB method in the context of small area estimation. The EB approach may be summarized as follows: (i) Using the conditional density, f (y|𝝁, 𝝀1 ), of y given 𝝁 and the density f (𝝁|𝝀2 ) of 𝝁, obtain the posterior density, f (𝝁|y, 𝝀), of the small area (random) parameters of interest, 𝝁, given the data y, where 𝝀 = (𝝀T1 , 𝝀T2 )T denotes the vector of model parameters. (ii) Estimate the model parameters, 𝝀, from the marginal density, f (y|𝝀), of y. (iii) Use the estimated postê for making inferences about 𝝁, where 𝝀̂ is an estimator of 𝝀. rior density, f (𝝁|y, 𝝀),
270
EMPIRICAL BAYES (EB) METHOD
The density of 𝝁 is often interpreted as prior density on 𝝁, but it is actually a part of the postulated model on (y, 𝝁) and it can be validated from the data, unlike subjective priors on model parameters, 𝝀, used in the HB approach. In this sense the EB approach is essentially frequentist, and EB inferences refer to averaging over the joint distribution of y and 𝝁. Sometimes, a prior density is chosen for 𝝀 but used only to derive estimators and associated measures of variability as well as confidence intervals (CIs) with good frequentist properties (Morris 1983b). In the parametric empirical Bayes (PEB) approach, a parametric form, f (𝝁|𝝀2 ), is assumed for the density of 𝝁. On the other hand, nonparametric empirical Bayes (NPEB) methods do not specify the form of the (prior) distribution of 𝝁. Nonparametric maximum likelihood is used to estimate the (prior) distribution of 𝝁 (Laird 1978). Semi-nonparametric (SNP) representations of the density of 𝝁 are also used in making EB inferences. For example, Zhang and Davidian (2001) studied linear mixed models with block-diagonal covariance structures by approximating the density of the random effects by a SNP representation. Their representation includes normality as a special case, and it provides flexibility in capturing nonnormality through a user-chosen tuning parameter. In this chapter, we focus on PEB approach to small area estimation. We refer the reader to Maritz and Lwin (1989) and Carlin and Louis (2008) for excellent accounts of the EB methodology. The basic area level model (6.1.1) with normal random effects, 𝑣i , is used in Section 9.2 to introduce the EB methodology. A jackknife method of MSE estimation is given in Section 9.2.2. This method is applicable more generally, as shown in subsequent sections. Inferences based on the estimated posterior density of the small area parameters, 𝜃i , do not account for the variability in the estimators of model parameters (𝜷, 𝜎𝑣2 ). Methods that account for the variability are studied in Section 9.2.3. CI estimation is addressed in Section 9.2.4. Section 9.3 provides extensions to linear mixed models with a block-diagonal covariance structure. Section 9.4 describes EB estimation of general nonlinear parameters under the basic unit level model and an application to poverty mapping. The case of binary data is studied in Section 9.5. Applications to disease mapping using count data are given in Section 9.6. Section 9.7 describes design-weighted EB estimation under exponential family models. The EB (or EBLUP) estimator, 𝜃̂iEB , may not perform well in estimating the histogram of the 𝜃i ’s or in ranking them; Section 9.8 proposes constrained EB and other estimators that address this problem. Finally, empirical linear Bayes (ELB) and empirical constrained linear Bayes (ECLB) methods are studied in Sections 9.9 and 9.10, respectively. These methods avoid distributional assumptions. Section 9.11 describes R software for EB estimation of general area parameters under the basic unit level model.
9.2
BASIC AREA LEVEL MODEL
Assuming normality, the basic area level model (6.1.1) may be expressed as a ind ind two-stage hierarchical model: (i) 𝜃̂i |𝜃i ∼ N(𝜃i , 𝜓i ), i = 1, … , m; (ii) 𝜃i ∼ N(zTi 𝜷, b2i 𝜎𝑣2 ), i = 1, … , m, where 𝜷 is the p × 1 vector of regression parameters. In the Bayesian framework, the model parameters 𝜷 and 𝜎𝑣2 are random, and the two-stage
271
BASIC AREA LEVEL MODEL
hierarchical model is called the conditionally independent hierarchical model (CIHM) because the pairs (𝜃̂i , 𝜃i ) are independent across areas i, conditionally on 𝜷 and 𝜎𝑣2 (Kass and Steffey 1989). 9.2.1
EB Estimator
The “optimal” estimator of the realized value of 𝜃i is given by the conditional expectation of 𝜃i given 𝜃̂i , 𝜷, and 𝜎𝑣2 : E(𝜃i |𝜃̂i , 𝜷, 𝜎𝑣2 ) = 𝜃̂iB = 𝛾i 𝜃̂i + (1 − 𝛾i )zTi 𝜷,
(9.2.1)
where 𝛾i = b2i 𝜎𝑣2 ∕(b2i 𝜎𝑣2 + 𝜓i ). The result (9.2.1) follows from the posterior (or conditional) distribution of 𝜃i given 𝜃̂i , 𝜷, and 𝜎𝑣2 , which is given by ind 𝜃i |𝜃̂i , 𝜷, 𝜎𝑣2 ∼ N(𝜃̂iB , g1i (𝜎𝑣2 ) = 𝛾i 𝜓i ).
(9.2.2)
The estimator 𝜃̂iB = 𝜃̂iB (𝜷, 𝜎𝑣2 ) is the “Bayes” estimator under squared error loss and it is optimal in the sense that its MSE, MSE(𝜃̂iB ) = E(𝜃̂iB − 𝜃i )2 , is smaller than the MSE of any other estimator of 𝜃i , linear or nonlinear in the 𝜃̂i ’s (see Section 6.2.3, Chapter 6). It may be more appropriate to name 𝜃̂iB as the best prediction (BP) estimator of 𝜃i because it is obtained from the conditional distribution (9.2.1) without assuming a prior distribution on the model parameters (Jiang, Lahiri, and Wan 2002). The Bayes estimator 𝜃̂iB depends on the model parameters 𝜷 and 𝜎𝑣2 , which are ind estimated from the marginal distribution given by 𝜃̂i ∼ N(zTi 𝜷, b2i 𝜎𝑣2 + 𝜓i ), using ML or REML. Denoting the resulting estimators as 𝜷̂ and 𝜎̂ 𝑣2 , we obtain the EB (or the empirical best prediction (EBP)) estimator of 𝜃i from 𝜃̂iB by substituting 𝜷̂ for 𝜷 and 𝜎̂ 𝑣2 for 𝜎𝑣2 : ̂ 𝜎̂ 𝑣2 ) = 𝛾̂i 𝜃̂i + (1 − 𝛾̂i )zT 𝜷. ̂ 𝜃̂iEB = 𝜃̂iB (𝜷, (9.2.3) i The EB estimator, 𝜃̂iEB , is identical to the EBLUP estimator 𝜃̂iH given by (6.1.12). ̂ 𝜷, ̂ 𝜎̂ 𝑣2 ), of 𝜃i Note that 𝜃̂iEB is also the mean of the estimated posterior density, f (𝜃i |𝜽, given the data 𝜽̂ = (𝜃̂1 , … , 𝜃̂m )T , which is N(𝜃̂iEB , 𝛾̂i 𝜓i ). For the special case of equal sampling variances 𝜓i = 𝜓 and bi = 1 for all i, Morris (1983b) studied the use of an unbiased estimator of 1 − 𝛾i = 1 − 𝛾 given by 1 − 𝛾 ∗ = 𝜓(m − p − 2) ∕S,
(9.2.4)
∑ T̂ 2 ̂ ̂ where S = m i=1 (𝜃i − zi 𝜷 LS ) and 𝜷 LS is the least squares estimator of 𝜷. The resulting EB estimator (9.2.5) 𝜃̂iEB = 𝛾 ∗ 𝜃̂i + 1( − 𝛾 ∗ )zTi 𝜷̂ LS is identical to the James–Stein estimator, studied in Section 3.4.2. Note that (9.2.4) may be expressed as 1 − 𝛾 ∗ = [(m − p − 2) ∕m( − p) 𝜓 ] ∕ 𝜓( + 𝜎̃ 𝑣2 ),
(9.2.6)
272
EMPIRICAL BAYES (EB) METHOD
where 𝜎̃ 𝑣2 = S∕ m ( − p) −𝜓 is the unbiased moment estimator of 𝜎𝑣2 . Morris (1983b) used the REML estimator 𝜎̂ 𝑣2 = max(0, 𝜎̃ 𝑣2 ) in (9.2.6), instead of the unbiased estimator 𝜎̃ 𝑣2 , to ensure that 1 − 𝛾 ∗ < 1. The multiplying constant (m − p − 2) ∕m( − p) offsets the positive bias introduced by the substitution of a nearly unbiased estimator 𝜎̂ 𝑣2 into 1 − 𝛾 = 𝜓∕ 𝜓( + 𝜎𝑣2 ); note that 1 − 𝛾 is a convex nonlinear function of 𝜎𝑣2 so that E[𝜓∕ 𝜓( + 𝜎̃ 𝑣2 ) ]> 1 − 𝛾 by Jensen’s inequality. For the case of unequal sampling variances 𝜓i with bi = 1, Morris (1983b) used the multiplying constant (m − p − 2) ∕m( − p) in the EB estimator (9.2.3); that is, 1 − 𝛾̂i is replaced by ] i ∕ 𝜓( i + 𝜎̂ 𝑣2 ), (9.2.7) 1 − 𝛾i∗ = [(m − p − 2) ∕m( − p) 𝜓 where 𝜎̂ 𝑣2 is the REML estimator of 𝜎𝑣2 . He also proposed an alternative estimator of 𝜎𝑣2 that is similar to the Fay–Herriot (FH) moment estimator (Section 6.1.2). This estimator is obtained by solving, iteratively for 𝜎𝑣2 , the equation
𝜎𝑣2
( m )−1 m [ ] ∑ ∑ m ̂ ̃ 2 − 𝜓i , = 𝛼i 𝛼i (𝜃i − zTi 𝜷) m−p i=1 i=1
(9.2.8)
̃ 𝑣2 ) is the weighted least squares estimator of 𝜷 and 𝛼i = 1∕ 𝜎(𝑣2 + 𝜓i ). where 𝜷̃ = 𝜷(𝜎 To avoid a negative solution, 𝜎̃ 𝑣2 , we take 𝜎̂ 𝑣2 = max(𝜎̃ 𝑣2 , 0). If 𝛼i is replaced by 𝛼i2 in (9.2.8), then the resulting solution is approximately equal to the REML estimator of 𝜎𝑣2 . An advantage of EB (or EBP) is that it can be applied to find the EB estimator of any function 𝜙i = h(𝜃i ); in particular, Y i = g−1 (𝜃i ) =h(𝜃i ). The EB estimator is ̂ 𝜎̂ 𝑣2 ) for obtained from the Bayes estimator 𝜙̂ Bi = E(𝜙i |𝜃̂i , 𝜷, 𝜎𝑣2 ) by substituting (𝜷, EB 2 ̂ (𝜷, 𝜎𝑣 ). The computation of the EB estimator 𝜙i might require the use of Monte , R} can be simulated Carlo or numerical integration. For example, {𝜃i(r) , r = 1, from the estimated posterior density, namely, N(𝜃̂iEB , 𝛾̂i 𝜓i ), to obtain a Monte Carlo approximation: R 1∑ 𝜙̂ EB ≈ h(𝜃i(r) ). (9.2.9) i R r=1 The computation of (9.2.9) can be simplified by rewriting (9.2.9) as R
√ 1 ∑ ̂ EB 𝜙̂ EB h(𝜃i + z(r) 𝛾̂i 𝜓i ), i ≈ R i r=1
(9.2.10)
where {z(r) , r = 1, , R} are generated from N(0, 1). i The approximation (9.2.10) will be accurate if the number of simulated samples, R, is large. Note that we used h(𝜃̂iH ) = h(𝜃̂iEB ) in Section 6.2.3 and remarked that the estimator h(𝜃̂iH ) does not retain the optimality of 𝜃̂iH .
273
BASIC AREA LEVEL MODEL
9.2.2
MSE Estimation
The results in Section 6.2.1 on the estimation of MSE of the EBLUP estimator, 𝜃̂iH , are applicable to the EB estimator 𝜃̂iEB because 𝜃̂iH and 𝜃̂iEB are identical under normality. Also, the area-specific estimator (6.2.5) or (6.2.6) may be used as an estimator of the conditional MSE, MSEc (𝜃̂iEB ) = E[(𝜃̂iEB − 𝜃i )2 |𝜃̂i ], where the expectation is conditional on the observed 𝜃̂i for the ith area (see Section 6.2.7). As noted in Section 6.2.1, the MSE estimators are second-order unbiased, that is, their bias is of lower order than m−1 , for large m. Jiang, Lahiri, and Wan (2002) proposed a jackknife method of estimating the MSE of EB estimators. This method is more general than the Taylor linearization methods of Section 6.2.1 for 𝜃̂iH = 𝜃̂iEB , in the sense that it can also be applied to MSE estimation under models for binary and count data (see Sections 9.5 and 9.6). We illustrate its use here for estimating MSE(𝜃̂iEB ). Let us first decompose MSE(𝜃̂iEB ) as follows: MSE(𝜃̂iEB ) = E(𝜃̂iEB − 𝜃̂iB )2 + E(𝜃̂iB − 𝜃i )2 = E(𝜃̂iEB
−
𝜃̂iB )2
+
(9.2.11)
g1i (𝜎𝑣2 )
=∶ M2i + M1i ,
(9.2.12)
where the expectation is over the joint distribution of (𝜃̂i , 𝜃i ), i = 1, , m; see Section , 𝜃̂m )T , 9.12 for a proof of (9.2.11). Note that 𝜃̂iEB depends on all the data, 𝜽̂ = (𝜃̂1 , 2 ̂ through the estimators 𝜷 and 𝜎̂ 𝑣 . The jackknife steps for estimating the two terms, M2i and M1i , in (9.2.12) are as ̂ 𝜎̂ 𝑣2 ) be the EB estimator of 𝜃i expressed as a function of follows. Let 𝜃̂iEB = ki (𝜃̂i , 𝜷, ̂ the direct estimator 𝜃i and the parameter estimators 𝜷̂ and 𝜎̂ 𝑣2 . 2 Step 1. Calculate the delete-𝓁 estimators 𝜷̂ −𝓁 and 𝜎̂ 𝑣,−𝓁 by deleting the 𝓁th area data set (𝜃̂𝓁 , z𝓁 ) from the full data set {(𝜃̂i , zi ); i = 1, , m}. This calculation 2 ); 𝓁 = 1, , m} is done for each 𝓁 to get m estimators of 𝜷 and 𝜎𝑣2 , {(𝜷̂ −𝓁 , 𝜎̂ 𝑣,−𝓁 EB EB ̂ ̂ which, in turn, provide m estimators of 𝜃i , {𝜃i,−𝓁 ; 𝓁 = 1, , m}, where 𝜃i,−𝓁 = ki (𝜃̂i , 𝜷̂ −𝓁 , 𝜎̂ 2 ). 𝑣,−𝓁
Step 2. Calculate the estimator of M2i as m
∑ ̂ 2i = m − 1 (𝜃̂ EB − 𝜃̂iEB )2 . M m 𝓁=1 i,−𝓁
(9.2.13)
Step 3. Calculate the estimator of M1i as m
∑ ̂ 1i = g1i (𝜎̂ 𝑣2 ) − m − 1 [g (𝜎̂ 2 ) − g1i (𝜎̂ 𝑣2 )]. M m 𝓁=1 1i 𝑣,−𝓁 ̂ 1i corrects the bias of g1i (𝜎̂ 𝑣2 ). The estimator M
(9.2.14)
274
EMPIRICAL BAYES (EB) METHOD
Step 4. Calculate the jackknife estimator of MSE(𝜃̂iEB ) as ̂ 1i + M ̂ 2i . mseJ (𝜃̂iEB ) = M
(9.2.15)
̂ 1i estimates the MSE when the model parameters are known, and M ̂ 2i estiNote that M mates the additional variability due to estimating the model parameters. The jackknife estimator of MSE, given by (9.2.15), is also second-order unbiased. The proof of this result is highly technical, and we refer the reader to Jiang, Lahiri, and Wan (2002) for details. The jackknife method is applicable to ML, REML, or moment estimators of the model parameters. It is computer-intensive compared to the MSE estimators studied in Section 6.2.1. In the case of ML or REML, computation of 2 ); 𝓁 = 1, , m} may be simplified by performing only a single step {(𝜷̂ −𝓁 , 𝜎̂ 𝑣,−𝓁 ̂ 𝜎̂ 𝑣2 ) as the starting values of 𝜷 and 𝜎𝑣2 . of the Newton–Raphson algorithm using (𝜷, However, properties of the resulting simplified jackknife MSE estimator are not known. The jackknife method is applicable to the EB estimator of any function 𝜙i = h(𝜃i ); in particular, Y i = g−1 (𝜃i ) = h(𝜃i ). However, the computation of ̂ 1i + M ̂ 2i might require repeated Monte Carlo or numerical inte)=M mseJ (𝜙̂ EB i ∑ ̂ ̂ EB ̂ EB 2 gration to obtain M2i = (m − 1)m−1 m 𝓁=1 (𝜙i,−𝓁 − 𝜙i ) and the bias-corrected ̂ 1i , of E(𝜙̂ B − 𝜙i )2 . estimator, M i Example 9.2.1. Visits to Doctor’s Office. Jiang et al. (2001) applied the jackknife method to data from the U.S. National Health Interview Survey (NHIS). The objective here is to estimate the proportion of individuals who did not visit a doctor’s office during the previous 12 months for all the 50 states and the District of Columbia (regarded as small areas). Direct NHIS estimates, P̂ i , of the proportions Pi are not reliable for the smaller states. The arcsine transformation √ was used to stabilize the variances of the ind ̂ i ∕ 4n ( i )), direct estimates. We take 𝜃̂i = arcsin P̂ i and assume that 𝜃̂i 𝜃(i , 𝜓i = D ̂ ̂ where Di is the estimated design effect (deff) of Pi and ni is the sample size from the ith area. Note that V(𝜃̂i ) ≈ Di ∕ 4n ( i ), where Di = Vi [Pi (1 − Pi )∕ni ]−1 is the pop̂ ̂ ̂ i was calculated as D ̂i = ulation deff of Pi and Vi = V(Pi ). The estimated deff D ̂Vi [P̂ i (1 − P̂ i )∕ni ]−1 , where V̂ i is the NHIS variance estimate of P̂ i . For the covariate selection, the largest 15 states with small 𝜓i -values were chosen. This permitted the use of standard selection methods for linear regression models because the basic area level model for those states may be treated as 𝜃̂i ≈ zTi 𝜷 + 𝑣i iid
with 𝑣i 0, ( 𝜎𝑣2 ). Based on Cp , R2 , and adjusted R2 criteria used in SAS, the following covariates were selected: z1 = 1, z2 =1995 Bachelor’s degree completion for population aged 25+, and z3 =1995 health insurance coverage. The simple moment 2 = max(𝜎 2 , 0) was used to estimate 𝜎 2 , where 𝜎 2 is given by (6.1.15) ̃ 𝑣s ̃ 𝑣s estimator 𝜎̂ 𝑣s 𝑣 2 2 with m = 51 and p = 3. The resulting weights 𝛾̂i = 𝜎̂ 𝑣 ∕ 𝜎̂(𝑣 + 𝜓i ) varied from 0.09 (South Dakota) to 0.95 (California). Using the estimated weights 𝛾̂i , the EB (EBLUP) estimates 𝜃̂iEB were computed for each of the small areas, which in turn provided the
275
BASIC AREA LEVEL MODEL
estimates of the proportions Pi as P̃ EB = sin2 (𝜃̂iEB ). Note that P̃ EB is not equal to the i i EB 2 ̂ 𝜎̂ 𝑣 ), but it simplifies the computations. true EB estimate P̂ i = E(Pi |𝜃̂i , 𝜷, MSE estimates of the estimated proportions P̃ EB were computed from the jacki EB ̂ knife estimates mseJ (𝜃i ) given by (9.2.15). Using Taylor linearization, we have mseJ (P̃ EB ) ≈ 4P̃ EB (1 − P̃ EB )mseJ (𝜃̂iEB ) =∶ s2J (P̃ EB ). Performance of P̃ EB relative to i i i i i the direct estimate P̂ i was measured by the percent improvement, PIi = 100[s(P̂ i ) − )]∕s(P̂ i ), where s2 (P̂ i ) = V̂ i , the NHIS variance estimate of P̂ i . Values of PIi sJ (P̃ EB i indicated that the improvement is quite substantial (30% to 55%) for small states (e.g., South Dakota and Vermont). On the other hand, the improvement is small for large states (e.g., California and Texas), as expected. Note that 𝜃̂iEB gives more weight, 𝛾̂i , to the direct estimate 𝜃̂i for large states (e.g., 𝛾̂i = 0.95 for California and 𝛾̂i = 0.94 for Texas).
9.2.3
Approximation to Posterior Variance
In Section 9.2.2, we studied MSE estimation, but alternative measures of variability associated with 𝜃̂iEB have also been proposed. Those measures essentially use a HB approach and provide approximations to the posterior variance of 𝜃i , denoted by ̂ based on a prior distribution on the model parameters 𝜷 and 𝜎𝑣2 . V(𝜃i |𝜽), If the model parameters 𝜷 and 𝜎𝑣2 are given, then the posterior (or conditional) ̂ is completely known and it provides a basis for inference on 𝜽 = distribution f (𝜽|𝜽) T , 𝜃m ) . In particular, the Bayes estimator 𝜃̂iB = E(𝜃i |𝜃̂i , 𝜷, 𝜎𝑣2 ) is used to esti(𝜃1 , mate the realized value of 𝜃i , and the posterior variance V(𝜃i |𝜃̂i , 𝜷, 𝜎𝑣2 ) = g1i (𝜎𝑣2 ) = 𝛾i 𝜓i is used to measure the variability associated with 𝜃i . The posterior variance is identical to MSE(𝜃̂iB ), and 𝜃̂iB is the BP estimator of 𝜃i . Therefore, the frequentist approach agrees with the Bayesian approach for the basic area level model when the model parameters are known. This agreement also holds for the general linear mixed model, but not necessarily for nonlinear models with random effects. In practice, the model parameters are not known and the EB approach uses the ind marginal distribution of the 𝜃̂i ’s, namely, 𝜃̂i N(zTi 𝜷, b2i 𝜎𝑣2 + 𝜓i ), to estimate 𝜷 2 and 𝜎𝑣 . A naive EB approach uses the estimated posterior density of 𝜃i , namely, N(𝜃̂iEB , 𝛾̂i 𝜓i ), to make inferences on 𝜃i . In particular, 𝜃i is estimated by 𝜃̂iEB , and the estimated posterior variance g1i (𝜎̂ 𝑣2 ) = 𝛾̂i 𝜓i is used as a measure of variability. The use of g1i (𝜎̂ 𝑣2 ), however, leads to severe underestimation of MSE(𝜃̂iEB ). Note that the naive EB approach treats 𝜷 and 𝜎𝑣2 as fixed, unknown parameters, so no prior distributions are involved. However, if we adopt an HB approach by treating the model parameters 𝜷 and 𝜎𝑣2 as random with a prior density f (𝜷, 𝜎𝑣2 ), then the ̂ is used as an estimator of 𝜃i and the posterior variance posterior mean 𝜃̂iHB = E(𝜃i |𝜽) ̂ ̂ as V(𝜃i |𝜽) as a measure of variability associated with 𝜃̂iHB . We can express E(𝜃i |𝜽) ̂ = E 2 [E(𝜃i |𝜃̂i , 𝜷, 𝜎𝑣2 )] = E 2 [𝜃̂ B (𝜷, 𝜎𝑣2 )] E(𝜃i |𝜽) 𝜷,𝜎 𝜷,𝜎 i 𝑣
𝑣
(9.2.16)
276
EMPIRICAL BAYES (EB) METHOD
̂ as and V(𝜃i |𝜽) ̂ = E 2 [V(𝜃i |𝜃̂i , 𝜷, 𝜎𝑣2 )] + V 2 [E(𝜃i |𝜃̂i , 𝜷, 𝜎𝑣2 )] V(𝜃i |𝜽) 𝜷,𝜎 𝜷,𝜎 𝑣
𝑣
= E𝜷,𝜎 2 [g1i (𝜎𝑣2 )] + V𝜷,𝜎 2 [𝜃̂iB (𝜷, 𝜎𝑣2 )], 𝑣
𝑣
(9.2.17)
where E𝜷,𝜎 2 and V𝜷,𝜎 2 , respectively, denote the expectation and the variance with 𝑣 𝑣 ̂ that is, f (𝜷, 𝜎𝑣2 |𝜽). ̂ respect to the posterior distribution of 𝜷 and 𝜎𝑣2 , given 𝜽, ̂ 𝜎̂ 𝑣2 ) = 𝜃̂ EB , For large m, we can approximate the last term of (9.2.16) by 𝜃̂iB (𝜷, i ̂ = where 𝜷̂ and 𝜎̂ 𝑣2 are ML (REML) estimators. More precisely, we have E(𝜃i |𝜽) ̂ 𝜎̂ 𝑣2 )[1 + O(m−1 )], regardless of the prior f (𝜷, 𝜎𝑣2 ). Hence, the EB estimator 𝜃̂ EB 𝜃̂iB (𝜷, i ̂ well. However, the naive EB measure of variability, g1i (𝜎̂ 𝑣2 ), provides tracks E(𝜃i |𝜽) only a first-order approximation to the first variance term E𝜷,𝜎 2 [g1i (𝜎𝑣2 )] on the 𝑣 right-hand side of (9.2.17). The second variance term, V𝜷,𝜎 2 [𝜃̂iB (𝜷, 𝜎𝑣2 )], accounts for 𝑣 the uncertainty about the model parameters, and the naive EB approach ignores this uncertainty. As a result, the naive EB approach can lead to severe underestimation ̂ Note that the naive measure of variability, of the true posterior variance, V(𝜃i |𝜽). g1i (𝜎̂ 𝑣2 ), also underestimates MSE(𝜃̂iEB ). ̂ and the The HB approach may be used to evaluate the posterior mean, E(𝜃i |𝜽), ̂ posterior variance, V(𝜃i |𝜽), exactly, for any specified prior on the model parameters 𝜷 and 𝜎𝑣2 . Moreover, it can handle complex small area models (see Chapter 10). Typically, “improper” priors that reflect lack of information on the model parameters are used in the HB calculations; for example, f (𝜷, 𝜎𝑣2 ) ∝ 1 may be used for the basic area level model. In the EB context, two methods of approximating the posterior variance, regardless of the prior, have been proposed. The first method, based on ̂ imibootstrap resampling, attempts to account for the underestimation of V(𝜃i |𝜽) ̂ tating the decomposition (9.2.17) of the posterior variance V(𝜃i |𝜽). In the bootstrap method, a large number, B, of independent samples {𝜃̂i∗ (b); b = 1, , B} are first ̂ b2 𝜎̂ 𝑣2 + 𝜓i ), i = 1, drawn from the estimated marginal distribution, N(zTi 𝜷, , m. i ∗ Estimates {𝜷̂ (b), 𝜎̂ 𝑣∗2 (b)} and EB estimates 𝜃̂i∗EB (b) are then computed from the , m}. This leads to the following approximation bootstrap data {(𝜃̂i∗ (b), zi ); i = 1, to the posterior variance: B
̂ = VLL (𝜃i |𝜽)
B
1∑ 1 ∑ ̂ ∗EB g1i (𝜎̂ 𝑣∗2 (b)) + [𝜃 (b) − 𝜃̂i∗EB (⋅)]2 , B b=1 B b=1 i
(9.2.18)
∑ where 𝜃̂i∗EB (⋅) = B−1 Bb=1 𝜃̂i∗EB (b) (see Laird and Louis 1987). Note that (9.2.18) has the same form as the decomposition (9.2.17) of the posterior variance. The last term of (9.2.18) is designed to account for the uncertainty about the model parameters. ̂ as a measure of The bootstrap method uses 𝜃̂iEB as the estimator of 𝜃i and VLL (𝜃i |𝜽) its variability. Butar and Lahiri (2003) studied the performance of the bootstrap measure, ̂ as an estimator of MSE(𝜃̂ EB ). They obtained an analytical approximation VLL (𝜃i |𝜽), i
277
BASIC AREA LEVEL MODEL
to (9.2.18) by letting B → ∞. Under REML estimation, we have ̂ ≈ g1i (𝜎̂ 𝑣2 ) + g2i (𝜎̂ 𝑣2 ) + g∗ (𝜎̂ 𝑣2 , 𝜃̂i ) =∶ Ṽ LL (𝜃i |𝜽), ̂ VLL (𝜃i |𝜽) 3i
(9.2.19)
for large m, where g2i (𝜎𝑣2 ) and g∗3i (𝜎̂ 𝑣2 , 𝜃̂i ) are given by (6.1.9) and (6.2.4), respectively. Comparing (9.2.19) to the second-order unbiased MSE estimator (6.2.6), it follows ̂ is not second-order unbiased for MSE(𝜃̂ EB ). In particular, that VLL (𝜃i |𝜽) i ̂ = MSE(𝜃̂i ) − g3i (𝜎𝑣2 ). E[Ṽ LL (𝜃i |𝜽)]
(9.2.20)
A bias-corrected MSE estimator is therefore given by ̂ + g3i (𝜎̂ 𝑣2 ). mseBL (𝜃̂iEB ) = Ṽ LL (𝜃i |𝜽)
(9.2.21)
This estimator is second-order unbiased and also area-specific. It is identical to mse2 (𝜃̂iH ) given by (6.2.6). In the second method, due to Kass and Steffey (1989), 𝜃̂iEB is taken as the estimator of 𝜃i , noting that ̂ = 𝜃̂ EB [1 + O(m−1 )], E(𝜃i |𝜽) (9.2.22) i but a positive correction term is added to the estimated posterior variance, g1i (𝜎̂ 𝑣2 ), to ̂ This positive term depends on the inforaccount for the underestimation of V(𝜃i |𝜽). mation matrix and the partial derivatives of 𝜃̂iB evaluated at the ML (REML) estimates ̂ is obtained, noting that 𝜷̂ and 𝜎̂ 𝑣2 . The following first-order approximation to V(𝜃i |𝜽) the information matrix for 𝜷 and 𝜎𝑣2 is block diagonal: ̂ = g1i (𝜎̂ 𝑣2 ) + 𝜕(𝜃̂ B ∕𝜕𝜷)T V(𝜷)(𝜕 ̂ 𝜃̂ B ∕𝜕𝜷)| ̂ 2 2 VKS (𝜃i |𝜽) 𝜷=𝜷,𝜎 =𝜎̂ i i 𝑣
+ 𝜕(𝜃̂iB ∕𝜕𝜎𝑣2 )2 V(𝜎̂ 𝑣2 )|𝜷=𝜷,𝜎 ̂ 2 =𝜎̂ 2 , 𝑣
𝑣
𝑣
(9.2.23)
̂ = [∑m zi zT ∕ 𝜓( i + 𝜎𝑣2 b2 )]−1 is the asymptotic covariance matrix of where V(𝜷) i=1 i i ̂ V(𝜎̂ 𝑣2 ) = [(𝜎𝑣2 )]−1 is the asymptotic variance of 𝜎̂ 𝑣2 with (𝜎𝑣2 ) given by (6.1.17), 𝜷, 𝜕 𝜃̂iB 𝜕𝜷 and
𝜕 𝜃̂iB 𝜕𝜎𝑣2
=
= (1 − 𝛾i )zi
𝜓i b2i (𝜓i + 𝜎𝑣2 b2i )2
(𝜃̂i − zTi 𝜷).
(9.2.24)
(9.2.25)
After simplification, (9.2.23) reduces to ̂ = g1i (𝜎̂ 𝑣2 ) + g2i (𝜎𝑣2 ) + g∗ (𝜎𝑣2 , 𝜃̂i ), VKS (𝜃i |𝜽) 3i
(9.2.26)
278
EMPIRICAL BAYES (EB) METHOD
̂ Therefore, VKS (𝜃i |𝜽) ̂ is also not second-order unbiwhich is identical to Ṽ LL (𝜃i |𝜽). ̂ ased for MSE(𝜃i ). Kass and Steffey (1989) have given a more accurate approximation ̂ This approximation ensures that the neglected terms are of lower order to V(𝜃i |𝜽). −1 than m , but it depends on the prior density, f (𝜷, 𝜎𝑣2 ), unlike the first-order approximation (9.2.23). If a prior needs to be specified, then it may be better to use the HB approach (see Chapter 10) because it is free of asymptotic approximations. Singh, Stukel, and Pfeffermann (1998) studied Kass and Steffey’s (1989) approximations in the context of small area estimation. Kass and Steffey’s (1989) method is applicable to general functions 𝜙i = h(𝜃i ), but the calculations might require the use of Monte Carlo or numerical integration. Similarly as in (9.2.22), we have ̂ = 𝜙̂ EB [1 + O(m−1 )] E(𝜙i |𝜽) i
(9.2.27)
̂ is then given by and the first-order approximation to V(𝜙i |𝜽) ̂ = V(𝜙i |𝜽, ̂ 𝜷, ̂ 𝜎̂ 𝑣2 ) + 𝜕(𝜙̂ B ∕𝜕𝜷)T V(𝜷)(𝜕 ̂ 𝜙̂ B ∕𝜕𝜷)| ̂ 2 2 VKS (𝜙i |𝜽) 𝜷=𝜷,𝜎 =𝜎̂ i i 𝑣
+ 𝜕[(𝜙̂ Bi ∕𝜕𝜎𝑣2 )2 V(𝜎̂ 𝑣2 )]|𝜷=𝜷,𝜎 ̂ 2 =𝜎̂ 2 , 𝑣
𝑣
𝑣
(9.2.28)
̂ 𝜷, ̂ 𝜎̂ 𝑣2 ) denotes V(𝜙i |𝜽, ̂ 𝜷, 𝜎𝑣2 ) evaluated at 𝜷 = 𝜷̂ and where 𝜙̂ Bi = h(𝜃̂iB ) and V(𝜙i |𝜽, 𝜎𝑣2 = 𝜎̂ 𝑣2 . Morris (1983a) studied the basic area level model without auxiliary information (zi = 1 for all i) and in the case of equal sampling variances, 𝜓i = 𝜓 for all i, with ind iid known 𝜓. This model may be expressed as 𝜃̂i |𝜃i N(𝜃i , 𝜓) and 𝜃i |𝜇, 𝜎𝑣2 N(𝜇, 𝜎𝑣2 ), i = 1, , m. The Bayes estimator (9.2.1) reduces in this case to 𝜃̂iB = 𝛾 𝜃̂i + 1( − 𝛾)𝜇,
(9.2.29)
where 𝛾 = 𝜎𝑣2 ∕ 𝜎(𝑣2 + 𝜓). We first obtain the HB estimator, 𝜃̂iHB (𝜎𝑣2 ), for a given 𝜎𝑣2 , assuming that 𝜇 uniform(−∞, ∞) to reflect the absence of prior information on 𝜇; that is, f (𝜇) = constant. It is easy to verify that ̂ 𝜎𝑣2 𝜇|𝜽,
N(𝜃̂⋅ , m−1 (𝜎𝑣2 + 𝜓)),
(9.2.30)
∑ ̂ where 𝜃̂⋅ = m i=1 𝜃i ∕m (see Section 9.12.2). It now follows from (9.2.29) that ̂ 𝜎𝑣2 ) 𝜃̂iHB (𝜎𝑣2 ) = 𝛾 𝜃̂i + 1( − 𝛾)E(𝜇|𝜽, = 𝜃̂i − (1 − 𝛾)(𝜃̂i − 𝜃̂⋅ ). The HB estimator 𝜃̂iHB (𝜎𝑣2 ) is identical to the BLUP estimator 𝜃̃iH .
(9.2.31)
279
BASIC AREA LEVEL MODEL
The posterior variance of 𝜃i given 𝜽̂ and 𝜎𝑣2 is given by ̂ 𝜎𝑣2 ) = E𝜇 [V(𝜃i |𝜃̂i , 𝜇, 𝜎𝑣2 )] + V𝜇 [E(𝜃i |𝜃̂i , 𝜇, 𝜎𝑣2 )] V(𝜃i |𝜽, = E𝜇 [g1 (𝜎𝑣2 )] + V𝜇 (𝜃̂iB ),
(9.2.32)
where E𝜇 and V𝜇 , respectively, denote the expectation and the variance with respect to the posterior distribution of 𝜇 given 𝜽̂ and 𝜎𝑣2 , and g1 (𝜎𝑣2 ) = 𝛾𝜓 = 𝜓 − (1 − 𝛾)𝜓.
(9.2.33)
It now follows from (9.2.29), (9.2.30), and (9.2.32) that ̂ 𝜎𝑣2 ) = g1 (𝜎𝑣2 ) + g2 (𝜎𝑣2 ), V(𝜃i |𝜽,
(9.2.34)
g2 (𝜎𝑣2 ) = (1 − 𝛾)2 (𝜎𝑣2 + 𝜓)∕m = (1 − 𝛾)𝜓∕m.
(9.2.35)
where
The conditional posterior variance (9.2.34) is identical to the MSE of BLUP estimator. To take account of the uncertainty associated with 𝜎𝑣2 , Morris (1983a) assumed that 2 ̂ 𝜎𝑣 uniform[0, ∞); that is, f (𝜎𝑣2 ) =constant. The resulting posterior density, f (𝜎𝑣2 |𝜽), ̂ ̂ and E[(1 − 𝛾)2 |𝜽]. however, does not yield closed-form expressions for E[(1 − 𝛾)|𝜽)] The latter terms are needed in the evaluation of 𝜃̂iHB = E𝜎 2 [𝜃̂iHB (𝜎𝑣2 )]
(9.2.36)
𝑣
and the posterior variance ̂ = E 2 [V(𝜃i |𝜽, ̂ 𝜎𝑣2 )] + V 2 [𝜃̂ HB (𝜎𝑣2 )], V(𝜃i |𝜽) 𝜎 𝜎 i 𝑣
𝑣
(9.2.37)
where E𝜎 2 and V𝜎 2 , respectively, denote the expectation and the variance with respect 𝑣 𝑣 ̂ Morris (1983a) evaluated 𝜃̂ HB and V(𝜃i |𝜽) ̂ numerically. to f (𝜎𝑣2 |𝜽). i ̂ and Morris (1983a) also obtained closed-form approximations to E[(1 − 𝛾)|𝜽] ̂ by expressing them as ratios of definite integrals over the range [0, 1) E[(1 − 𝛾)2 |𝜽] 1 ∞ and then replacing ∫0 by ∫0 . This method is equivalent to assuming that 𝜎𝑣2 + 𝜓 is uniform on [0, ∞). We have ̂ ≈ 𝜓(m − 3)∕S = 1 − 𝛾 ∗ , E𝜎 2 [(1 − 𝛾)|𝜽] 𝑣
where S =
∑m
(9.2.38)
̂ − 𝜃̂⋅ )2 . Hence, it follows from (9.2.31) and (9.2.36) that
i=1 (𝜃i
𝜃̂iHB ≈ 𝛾 ∗ 𝜃̂i + 1( − 𝛾 ∗ )𝜃̂⋅ ,
(9.2.39)
280
EMPIRICAL BAYES (EB) METHOD
which is identical to 𝜃̂iEB given by (9.2.5) with zi = 1 for all i. Turning to the approx̂ we first note that imation to the posterior variance V(𝜃i |𝜽), ̂ ≈ 2𝜓 2 (m − 3)∕S2 = 2(1 − 𝛾 ∗ )2 ∕ m E𝜎 2 [(1 − 𝛾)2 |𝜽] ( − 3). 𝑣
(9.2.40)
̂ ≈ VM (𝜃i |𝜽), ̂ It now follows from (9.2.31), (9.2.33), (9.2.34), and (9.2.37) that V(𝜃i |𝜽) where 2(1 − 𝛾 ∗ )2 ̂ m−1 (1 − 𝛾 ∗ )𝜓 + (𝜃i − 𝜃̂⋅ )2 m m−3 (1 − 𝛾 ∗ )𝜓 2(1 − 𝛾 ∗ )2 ̂ = 𝜓𝛾 ∗ + + (𝜃i − 𝜃̂⋅ )2 . m m−3
̂ =𝜓 − VM (𝜃i |𝜽)
(9.2.41) (9.2.42)
̂ Morris (1983a) obtained (9.2.41). The alternative form (9.2.42) shows that VM (𝜃i |𝜽) H −1 ̂ is asymptotically equivalent to mse2 (𝜃i ) up to terms of order m , noting that 𝜓𝛾 ∗ = 𝜓 𝛾̂ +
2 (1 − 𝛾̂ )𝜓 m−1
(9.2.43)
and (1 − 𝛾̂ )𝜓 2(1 − 𝛾̂ )𝜓 2(1 − 𝛾̂ )2 ̂ + + (𝜃i − 𝜃̂⋅ )2 m m m = g1 (𝜎̂ 𝑣2 ) + g2 (𝜎̂ 𝑣2 ) + g3 (𝜎̂ 𝑣2 ) + g∗3i (𝜎̂ 𝑣2 , 𝜃̂i ),
mse2 (𝜃̂iH ) = 𝜓 𝛾̂ +
(9.2.44)
where 1 − 𝛾̂ = 𝜓(m − 1)∕S and 𝜎̂ 𝑣2 is the REML estimator of 𝜎𝑣2 . This result shows that the frequentist inference, based on the MSE estimator, and the HB inference, based on the approximation to posterior variance, agree up to terms of order m−1 . Neglected terms in this comparison are of lower order than m−1 . The approximation (9.2.41) to the posterior variance extends to the regression model, 𝜃i = zTi 𝜷 + 𝑣i , with equal sampling variances 𝜓i = 𝜓 (Morris 1983b). We have ∗ 2 ̂ ≈ 𝜓 − m − p (1 − 𝛾 ∗ )𝜓 + 2(1 − 𝛾 ) (𝜃̂i − zT 𝜷̂ LS )2 , VM (𝜃i |𝜽) i m m−p−2
(9.2.45)
where 1 − 𝛾 ∗ is given by (9.2.6). Note that the approximation to 𝜃̂iHB is identical to the EB estimator given by (9.2.5). Morris (1983b) also proposed an extension to the case of unequal sampling variances 𝜓i but with bi = 1. This extension is obtained from (9.2.45) by changing 𝜓 to 𝜓i , 1 − 𝛾 ∗ to 1 − 𝛾i∗ given by (9.2.7), 𝜷̂ LS to the weighted ̂ and finally multiplying the last term of (9.2.45) by the least squares estimator 𝜷, ∑ ∑ 2 factor (𝜓 𝑤 + 𝜎̂ 𝑣 )∕ 𝜓( i + 𝜎̂ 𝑣2 ), where 𝜓 𝑤 = m ̂ 𝑣2 + 𝜓i )−1 𝜓i ∕ m ̂ 𝑣2 + 𝜓i )−1 and i=1 (𝜎 i=1 (𝜎 2 𝜎̂ 𝑣 is either the REML estimator or the estimator obtained by solving the moment equation (9.2.8) iteratively. Note that the adjustment factor (𝜓 𝑤 + 𝜎̂ 𝑣2 )∕ 𝜓( i + 𝜎̂ 𝑣2 )
281
BASIC AREA LEVEL MODEL
reduces to 1 in the case of equal sampling variances, 𝜓i = 𝜓. Theoretical justification of this extension remains to be studied. Example 9.2.2. Simulation Study. Jiang, Lahiri, and Wan (2002) reported simulation results on the relative performance of estimators of MSE(𝜃̂iEB ) under the simple ind iid mean model 𝜃̂i |𝜃i N(𝜃i , 𝜓) and 𝜃i N(𝜇, 𝜎𝑣2 ), i = 1, , m, where 𝜃̂iEB is given by (9.2.39), the approximation of Morris (1983a) to 𝜃̂iHB . The estimators studied include (i) the Prasad–Rao estimator msePR (𝜃̂iEB ) = g1 (𝜎̂ 𝑣2 ) + g2 (𝜎̂ 𝑣2 ) + 2g3 (𝜎̂ 𝑣2 ), where g1 (𝜎𝑣2 ) and g2 (𝜎𝑣2 ) are given by (9.2.33) and (9.2.35), respectively, and g3 (𝜎𝑣2 ) = 2𝜓(1 − 𝛾)∕m; (ii) the area-specific estimator mse2 (𝜃̂iEB ), which is equivalent to mseBL (𝜃̂iEB ) given by (9.2.21); (iii) the approximation to Laird and Louis ̂ given by (9.2.19); (iv) the jackknife estimator mseJ (𝜃̂ EB ); bootstrap, Ṽ LL (𝜃i |𝜽), i ̂ given by (9.2.45), and (vi) the naive estima(v) the Morris estimator, VM (𝜃i |𝜽), tor mseN (𝜃̂iEB ) = g1 (𝜎̂ 𝑣2 ) + g2 (𝜎̂ 𝑣2 ), which ignores the variability associated with 𝜎̂ 𝑣2 . Average relative bias, RB, was used as the criterion for comparison of the ∑ estimators, where RB = m−1 m i=1 RBi , RBi = [E(msei ) − MSEi ]∕MSEi , and msei denotes an estimator of MSE for the ith area and MSEi = MSE(𝜃̂iEB ). Note that ̂ and mseJ (𝜃̂ EB ) are second-order unbiased unlike msePR (𝜃̂iEB ), mseBL (𝜃̂iEB ), VM (𝜃i |𝜽), i ̂ ̃ VLL (𝜃i |𝜽). One million samples were simulated from the model by letting 𝜇 = 0 without loss of generality, 𝜎𝑣2 = 𝜓 = 1 and m = 30, 60 and 90. Values of RB calculated from the simulated samples are reported in Table 9.1. As expected, the naive estimator ̂ lead to mseN (𝜃̂iEB ) and, to a lesser extent, the Laird and Louis estimator Ṽ LL (𝜃i |𝜽) underestimation of MSE. The remaining estimators are nearly unbiased with RB less than 1%. The performance of all the estimators improves as m increases, that is, RB decreases as m increases. 9.2.4
*EB Confidence Intervals
We now turn to EB CIs on the individual small area parameters, 𝜃i , under the basic ind ind , m. We define area level model (i) 𝜃̂i |𝜃i N(𝜃i , 𝜓i ) and (ii) 𝜃i N(zTi 𝜷, 𝜎𝑣2 ), i = 1,
TABLE 9.1
Percent Average Relative Bias (RB) of MSE Estimators
MSE Estimators mseN ̂ Ṽ LL (𝜃i |𝜽) mseJ ̂ VM (𝜃i |𝜽) mseBL msePR
m = 30
m = 60
m = 90
−8.4 −3.8 0.6 0.7 0.1 0.7
−4.8 −2.9 0.2 −0.1 −0.2 −0.1
−3.2 −2.0 0.1 0.0 −0.1 0.0
Source: Adapted from Table 1 in Jiang, Lahiri, and Wan (2002).
282
EMPIRICAL BAYES (EB) METHOD
a (1 − 𝛼)-level CI on 𝜃i as Ii (𝛼) such that P[𝜃i ∈ Ii (𝛼)] = 1 − 𝛼
(9.2.46)
for all possible values of 𝜷 and 𝜎𝑣2 , where P(⋅) refers to the assumed model. A normal theory EB interval is given by IiNT (𝛼) = [𝜃̂iEB − z𝛼∕2 {s(𝜃̂iEB )}, 𝜃̂iEB + z𝛼∕2 {s(𝜃̂iEB )}],
(9.2.47)
where z𝛼∕2 is the upper (𝛼∕2)-point of N(0, 1) and s2 (𝜃̂iEB ) is a second-order unbiased estimator of MSE(𝜃̂iEB ). The interval IiNT (𝛼), however, is only first-order correct in the sense of P[𝜃i ∈ IiNT (𝛼)] = 1 − 𝛼 + O(m−1 ) (see Diao et al. 2014). It is desirable to construct more accurate second-order correct intervals Ii (𝛼) that satisfy P[𝜃i ∈ Ii (𝛼)] = 1 − 𝛼 + o(m−1 ). Section 9.2.4 of Rao (2003a) reviews second-order correct intervals for the special case of equal sampling variances 𝜓i = 𝜓 for all i. Here, we provide a brief account of recent methods of constructing second-order correct intervals for the general case of unequal 𝜓i . (i) Parametric Bootstrap Intervals In Section 6.2.4, bootstrap data {(𝜃̂i∗ , zi ); i = 1, , m} are generated by first draŵ 𝜎̂ 𝑣2 ) and then drawing 𝜃̂ ∗ from N(𝜃 ∗ , 𝜓i ). Here, we use the booting 𝜃i∗ from N(zTi 𝜷, i i strap data to approximate the distribution of t = (𝜃̂iEB − 𝜃i )∕[g1i (𝜎̂ 𝑣2 )]1∕2 ,
(9.2.48)
where g1i (𝜎𝑣2 ) = 𝛾i 𝜓i with 𝛾i = 𝜎𝑣2 ∕(𝜎𝑣2 + 𝜓i ). Denote the EB estimator of 𝜃i and the EB and 𝜎 2 , respectively, and estimator of 𝜎𝑣2 obtained from the bootstrap data by 𝜃̂i∗ ̂ 𝑣∗ the corresponding value of t by EB 2 1∕2 − 𝜃i∗ )∕[g1i (𝜎̂ 𝑣∗ )] . t∗ = (𝜃̂i∗
(9.2.49)
2 in any bootstrap sample is zero, we set it to When 𝜎̂ 𝑣2 computed from the data or 𝜎̂ 𝑣∗ a small threshold such as 0.01. Alternatively, any of the strictly positive estimators of 𝜎𝑣2 given in Section 6.4.2 may be used. We approximate the distribution of t by the known bootstrap distribution of t∗ and calculate quantiles q1 and q2 (q1 < q2 ) satisfying P∗ (q1 ≤ t∗ ≤ q2 ) = 1 − 𝛼, where P∗ refers to the bootstrap distribution. A bootstrap interval on 𝜃i is then obtained from q1 ≤ t ≤ q2 as
IiCLL (𝛼) = [𝜃̂iEB − q2 {g1i (𝜎̂ 𝑣2 )}1∕2 , 𝜃̂iEB − q1 {g1i (𝜎̂ 𝑣2 )}1∕2 ].
(9.2.50)
The choice of q1 and q2 may be based on equal tail probabilities of size 𝛼∕2 or shortest length interval. In practice, we generate a large number, B, of t∗ -values, denoted , ti∗ (B), and determine q1 and q2 from the resulting empirical bootstrap by ti∗ (1), distribution. The number of bootstrap samples, B, for obtaining the quantiles q1 and q2 should be much larger than the value of B used for bootstrap MSE estimation.
283
BASIC AREA LEVEL MODEL
Chatterjee, Lahiri, and Li (2008) proved that, under regularity assumptions, P[𝜃i ∈ IiCLL (𝛼)] = 1 − 𝛼 + O(m−3∕2 ),
(9.2.51)
which establishes second-order accuracy of IiCLL (𝛼) in terms of coverage. Chatterjee, Lahiri, and Li (CLL) have, in fact, extended the result to construct bootstrap intervals under a general linear mixed model (Section 5.2). Proof of (9.2.51) is highly technical and we refer the reader to Chatterjee et al. (2008) for details. If 𝜃i = g(Y i ) is a one-to-one function of the area mean Y i , then the corresponding second-order correct bootstrap interval for the area mean Y i is given by [g−1 (c1 ), g−1 (c2 )], where c1 = 𝜃̂iEB − q2 {g1i (𝜎̂ 𝑣2 )}1∕2 and c2 = 𝜃̂iEB − q1 {g1i (𝜎̂ 𝑣2 )}1∕2 . For example, if 𝜃i = log(Y i ), then g−1 (c1 ) = exp(c1 ) and g−1 (c2 ) = exp(c2 ). Hall and Maiti (2006b) proposed a different parametric bootstrap method to construct CIs on 𝜃i . We first note that a (1 − 𝛼)-level CI on 𝜃i , without using the direct estimator 𝜃̂i , is given by (zTi 𝜷 − z𝛼∕2 𝜎𝑣 , zTi 𝜷 + z𝛼∕2 𝜎𝑣 ) if 𝜷 and 𝜎𝑣 are known. Now replacing 𝜷 and 𝜎𝑣 by their estimators 𝜷̂ and 𝜎̂ 𝑣 , we get the estimated interval (zTi 𝜷̂ − z𝛼∕2 𝜎̂ 𝑣 , zTi 𝜷̂ + z𝛼∕2 𝜎̂ 𝑣 ) with coverage probability deviating from the nominal 1 − 𝛼 by terms of order m−1 . To achieve higher order accuracy, parametric bootstrap is used to find points b𝛼∕2 and b1−𝛼∕2 such that P∗ (𝜃i∗ ≤ zTi 𝜷̂ + b𝛼∕2 𝜎̂ 𝑣∗ ) = 𝛼∕2 and P∗ (𝜃i∗ ≤ zTi 𝜷̂ + b1−𝛼∕2 𝜎̂ 𝑣∗ ) = 1 − 𝛼∕2. The resulting bootstrap interval for 𝜃i , given by IiHM (𝛼) = (zTi 𝜷̂ − b𝛼∕2 𝜎̂ 𝑣 , zTi 𝜷̂ + b1−𝛼∕2 𝜎̂ 𝑣 ), achieves second-order accuracy in the sense P[𝜃i ∈ IiHM (𝛼)] = 1 − 𝛼 + o(m−1 ), as shown by Hall and Maiti (2006b). However, the Hall–Maiti (HM) bootstrap interval is not centered around the reported point estimator, 𝜃̂iEB , unlike the CLL bootstrap interval. On the other hand, note that ̂ of 𝜃i , and therefore it IiHM (𝛼) is based on the regression-synthetic estimator, zTi 𝜷, T ̂ can be used for nonsampled areas 𝓁 for which only z𝓁 𝜷 is available for estimating , M. 𝜃𝓁 , 𝓁 = m + 1, Example 9.2.3. Simulation study. Chatterjee et al. (2008) reported the results of a limited simulation study, based on R = 10, 000 simulation runs, on the relative performance of the normal theory (NT) interval (9.2.47) and the bootstrap interval (9.2.50) in terms of CI coverage and length. The setup used in Example 6.2.1 was employed for this purpose. It consists of the FH model with zTi 𝜷 = 𝜇, and five groups of areas with three areas in each group (m = 15). Two different patterns for the sampling variances 𝜓i with equal 𝜓i ’s within each group were used: (a) 0.2, 0.4, 0.5, 0.6, 4.0; and (b) 0.4, 0.8, 1.0, 1.2, 8.0. The ratio 𝜓i ∕(𝜎𝑣2 + 𝜓i ) is made equal for the two patterns by choosing 𝜎𝑣2 = 1 for pattern (a) and 𝜎𝑣2 = 2 for pattern (b). The MSE estimator s2 (𝜃̂iEB ) used in the NT interval (9.2.47) is based on the second-order correct MSE estimator (6.2.7), based on the FH moment estimator 2 . The latter estimator of 𝜎 2 is also used in the bootstrap interval (9.2.50) based 𝜎̂ 𝑣m 𝑣 on B = 1, 000 bootstrap replicates. The NT interval (9.2.47) consistently led to undercoverage for pattern (b) with coverage ranging from 0.84 to 0.90 compared to nominal 0.95. Undercoverage of the NT interval is less severe for pattern (a) with smaller variability: 0.90 to 0.95. On the other hand, the performance of the bootstrap
284
EMPIRICAL BAYES (EB) METHOD
interval (9.2.50) remains stable over the two patterns (a) and (b), with coverage close to the nominal 0.95. In terms of average length, the shortest length bootstrap interval is slightly better than the bootstrap interval based on equal tail probabilities. (ii) Closed-Form CIs We now study closed-form second-order correct CI that avoids bootstrap calibration and is computationally simpler. Using an expansion of the coverage probability P[𝜃̂iEB − z s(𝜃̂iEB ) ≤ 𝜃i ≤ 𝜃̂iEB + z s(𝜃̂iEB )], Diao et al. (2014) showed that the customary choice z = z𝛼∕2 leads to a first-order correct interval. However, an adjusted choice 2 ) gives second-order accuracy in the case of REML estimator 𝜎 2 , where z = zi (𝜎̂ 𝑣RE ̂ 𝑣RE 2 zi (𝜎̂ 𝑣RE ) = z𝛼∕2 +
2 )𝜓 2 (z3𝛼∕2 + z𝛼∕2 )g3i (𝜎̂ 𝑣RE i 2 )(𝜎 2 8ki2 (𝜎̂ 𝑣RE ̂ 𝑣RE + 𝜓i )
.
(9.2.52)
2 ) = g (𝜎 2 2 ) and g , g , and g are as defined in Chapter 6. Here, ki (𝜎̂ 𝑣RE ̂ 𝑣RE 1i ̂ 𝑣RE ) + g2i (𝜎 1i 2i 3i For the special case 𝜓i = 𝜓, (9.2.52) reduces to
2 zi (𝜎̂ 𝑣RE )
= z𝛼∕2 +
(z3𝛼∕2 + z𝛼∕2 )𝜓 2 2 4m(𝜎̂ 𝑣RE + 𝜓∕m)2
(9.2.53)
,
noting that ki (𝜎𝑣2 ) = 𝜎𝑣2 𝜓∕(𝜎𝑣2 + 𝜓) + 𝜓 2 ∕[m(𝜎𝑣2 + 𝜓)] and g3i (𝜎𝑣2 ) = 2𝜓 2 ∕[m(𝜎𝑣2 + 𝜓)]. Proof of (9.2.52) is highly technical and we refer the reader to Diao et al. (2014) for details. Result (9.2.52) requires that 𝜎𝑣2 ∕𝜓i values are bounded away from zero. Simulation results under pattern (a) of Chatterjee et al. (2008) showed that the Diao et al. method, based on equal tail probabilities, is comparable to the Chatterjee et al. method in terms of coverage accuracy. Diao et al. (2014) also studied coverage accuracy for the nonnormal case by generating the sampling errors ei and the random effects 𝑣i from shifted chi-squared distributions with mean 0 and variance 𝜓i and mean 0 and variance 𝜎𝑣2 , respectively. Simulation results indicated that the interval based on (9.2.52) performs better than the normality-based bootstrap interval (9.2.50) in terms of coverage accuracy for that nonnormal case. Yoshimori and Lahiri (2014c) introduced a new adjusted residual (restricted) maximum-likelihood method to produce second-order correct CIs for the means 𝜃i . This method is based on the estimator of 𝜎𝑣2 maximizing the adjusted residual likelihood Li (𝜎𝑣2 ) = hi (𝜎𝑣2 )LR (𝜎𝑣2 ) for a specified function hi (𝜎𝑣2 ), where LR (𝜎𝑣2 ) is the residual likelihood of 𝜎𝑣2 (Section 5.2.4). Note that the choice hi (𝜎𝑣2 ) = 𝜎𝑣2 is studied in Section 6.4.2 in the context of getting positive estimators of 𝜎𝑣2 . Denote 2 and the NT interval based on the pivotal quantity the resulting estimator of 𝜎𝑣2 by 𝜎̂ 𝑣h i (9.2.48) by 2 2 Iihi (𝛼) = [𝜃̂iEB − z𝛼∕2 {g1i (𝜎̂ 𝑣h )}1∕2 , 𝜃̂iEB + z𝛼∕2 {g1i (𝜎̂ 𝑣h )}1∕2 ]. i
i
(9.2.54)
Yoshimori and Lahiri (YL) showed that the coverage error term associated with (9.2.54) is O(m−1 ) and depends on hi (𝜎𝑣2 ). They chose the adjustment function hi (𝜎𝑣2 ) = hi (𝜎𝑣2 ) such that the O(m−1 ) term vanishes at hi (𝜎𝑣2 ). Then, they
285
BASIC AREA LEVEL MODEL
found the corresponding estimator of 𝜎𝑣2 that maximizes the adjusted likelihood Li = hi (𝜎𝑣2 )LR (𝜎𝑣2 ). Denote this estimator by 𝜎̂ 2 and the corresponding 𝜃̂iEB obtained 𝑣hi
from 𝜎̂ 2 by 𝜃̂iEB (𝜎̂ 2 ). The resulting second-order correct CI on 𝜃i is given by 𝑣hi
𝑣hi
] [ IiYL (𝛼) = 𝜃̂iEB (𝜎̂ 2 ) − z𝛼∕2 {g1i (𝜎̂ 2 )}1∕2 , 𝜃̂iEB (𝜎̂ 2 ) + z𝛼∕2 {g1i (𝜎̂ 2 )}1∕2 . (9.2.55) 𝑣hi
𝑣hi
𝑣hi
𝑣hi
The coverage error associated with IiYL (𝛼) is O(m−3∕2 ). Proofs for establishing the second-order correct property of IiYL (𝛼) are highly technical, and we refer the reader to Yoshimori and Lahiri (2014c) for details. Note that the EB estimator 𝜃̂iEB (𝜎̂ 2 ) 𝑣hi
2 ). Yoshimori and Lahiri (2014c) derived a second-order is different from 𝜃̂iEB (𝜎̂ 𝑣RE unbiased estimator of MSE associated with 𝜃̂iEB (𝜎̂ 2 ). However, they recommended 𝑣hi
the use of 𝜃̂iEB and estimator of MSE(𝜃̂iEB ) based on the positive estimator of 𝜎𝑣2 proposed in Yoshimori and Lahiri (2014a), see Section 6.4.2. Simulation results reported in Yoshimori and Lahiri (2014c) indicated that the interval IiYL (𝛼) is comparable to the bootstrap interval IiCLL (𝛼) in terms of coverage accuracy. Concerning length, intervals IiYL (𝛼) are designed to have smaller length than the intervals for the direct estimators. In simulations, the bootstrap intervals IiCLL (𝛼) also showed average lengths smaller than those of direct estimators, but currently there is no formal theory supporting this fact. Note that the closed-form intervals are at present available only for the basic area level model, unlike the parametric bootstrap method, which can be extended to general linear mixed models including the basic unit level model (Chatterjee et al. 2008). Dass et al. (2012) studied CI estimation for the case of unknown sampling variances 𝜓i in the FH model. They assumed simple random sampling within areas and iid considered that 𝜃̂i = yi is the mean of ni observations yij N(𝜃i , 𝜎i2 ), similar to Rivest and Vandal (2003), see Section 6.4.1. Further, they assume 𝜃i N(zTi 𝜷, 𝜎𝑣2 ), [(ni − iid
1)s2i ∕𝜎i2 ]|𝜎i2 n2 −1 and 𝜎i−2 G(a, b), a > 0, b > 0, independently for i = i 1, , m. Under this setup, they obtained improved EB estimators of the area means 𝜃i as well as smoothed estimators of the sampling variances 𝜓i = 𝜎i2 ∕ni , assuming , m} are available. Dass et al. (2012) also developed that the data {(yi , s2i , zi ); i = 1, CIs for the means 𝜃i using a decision theory approach. Note that no prior distribution on the model parameters 𝜷, 𝜎𝑣2 , a and b is assumed, unlike in the HB approach. The proposed model, however, assumes random sampling variances 𝜎i2 , which in turn implies that 𝜓i = 𝜎i2 ∕ni is also random. An HB approach by You and Chapman (2006), under the above variant of the FH model, is outlined in Section 10.3.4. (iii) Population-Specific Simultaneous Coverage In sections (i) and (ii), we studied unconditional coverage of CIs under the FH model, assuming normality of random effects 𝑣i and sampling errors ei . It is of practical interest to study the population-specific (or design-based) coverage of , 𝜃m )T and referring only to the model-based CIs, by conditioning on 𝜽 = (𝜃1 , ind , m. sampling model 𝜃̂i |𝜃i N(𝜃i , 𝜓i ), i = 1,
286
EMPIRICAL BAYES (EB) METHOD
Zhang (2007) studied design-based coverages of CIs for the special case of known model parameters 𝜷 and 𝜎𝑣2 . In this case, the (1 − 𝛼)-level model-based interval on 𝜃i is given by Ii (𝛼) = [𝜃̂iB − z𝛼∕2 {g1i (𝜎𝑣2 )}1∕2 , 𝜃̂iB + z𝛼∕2 {g1i (𝜎𝑣2 )}1∕2 ]
(9.2.56)
The interval Ii (𝛼) has exact coverage of 1 − 𝛼 under the assumed FH model, where 𝜃̂iB = zTi 𝜷 + 𝛾i (𝜃̂i − zTi 𝜷) is the BP estimator of 𝜃i and g1i (𝜎𝑣2 ) = 𝛾i 𝜓i = (1 − 𝛾i )𝜎𝑣2 as earlier. Letting Ai = Ai (𝛼) = 1 iff 𝜃i ∈ Ii (𝛼), model coverage of Ii (𝛼) may be expressed as Δi (𝛼) = E(Ai ) = 1 − 𝛼, where the expectation E is with respect to both the sampling model and the linking model. Let 𝛿i (𝛼) = Ep (Ai ) denote the conditional design coverage of the interval Ii (𝛼), where Ep is the expectation with respect to the sampling model conditional on 𝜃i (or 𝑣i ). Also, suppose that 𝜎𝑣2 ∕𝜓i ≈ 0. In this case, the interval Ii (𝛼) for 𝜃i = zTi 𝜷 + 𝑣i reduces to the following interval for 𝑣i : (9.2.57)
Ii (𝛼) = (−z𝛼∕2 𝜎𝑣 ≤ 𝑣i ≤ z𝛼∕2 𝜎𝑣 ).
It follows from (9.2.57) that 𝛿i (𝛼) = 1 if |𝑣i | ≤ z𝛼∕2 𝜎𝑣 and 𝛿i (𝛼) = 0 otherwise. Hence, the area-specific interval Ii (𝛼) leads to degenerate design coverage; that is, the design coverage of Ii (𝛼) is uncontrollable. To get around the above difficulty with area-specific design coverage, Zhang ∑ (2007) proposed the use of simultaneous design coverage 𝛿(𝛼) = m−1 m i=1 𝛿i (𝛼), which summarizes all the area-specific design coverages in a single number. The proposed coverage 𝛿(𝛼) may be interpreted as the expected proportion of parameters 𝜃i covered by the set of intervals Ii (𝛼) under the sampling model conditional on 𝜽. Zhang (2007) showed that 𝛿(𝛼) converges in model probability to the nominal level 1 − 𝛼 as m → ∞. That is, the design-based simultaneous coverage of model-based intervals Ii (𝛼) is close to the nominal level, given a large number of areas. This result follows by noting that Em [𝛿i (𝛼)] = 1 − 𝛼, Vm [𝛿i (𝛼)] = V(Ai ) − Em Vp (Ai ) ≤ V(Ai ) = 𝛼(1 − 𝛼), Covm [𝛿i (𝛼), 𝛿t (𝛼)] = 0, i ≠ t, and then appealing to the Law of Large Numbers (LLN), where V denotes the total variance and Vp the variance with respect to the sampling model. The covariance result follows by noting that Covm [𝛿i (𝛼), 𝛿t (𝛼)] = Cov(Ai , At ) = 0,
i ≠ t,
since Ai is a function of (𝑣i , ei ), which is independent of At , a function of (𝑣t , et ) under the FH model, where Cov denotes the total covariance. The above result on simultaneous coverage is applicable to intervals in Section (ii) for the case of unknown 𝜷 and 𝜎𝑣2 , provided E(Ai ) ≈ 1 − 𝛼 as m → ∞. We can apply the above ideas to show that the design expectation of the average of the model-based MSE estimators, mse(𝜃̂iB ) = g1i (𝜎𝑣2 ) = 𝛾i 𝜓i , converges in model probability to the average of the design MSEs, MSEp (𝜃̂iB ) = Ep (𝜃̂i − 𝜃i )2 , as m → ∞. That is, the average design bias of the estimators mse(𝜃̂iB ), i = 1, , m, converges in model
287
LINEAR MIXED MODELS
probability to zero, as m → ∞. This result follows by expressing the average design bias as m
1∑ {E [mse(𝜃̂iB )] − MSEp (𝜃̂iB )} m i=1 p m
m
1∑ 1∑ (1 − 𝛾i )2 (𝑣2i − 𝜎𝑣2 ) =∶ − u = −u, =− m i=1 m i=1 i appealing to ]the LLN, noting that Em (ui ) = 0 and that V(u) = m−1 [and−1 then ∑m 2 2 m i=1 (1 − 𝛾i ) V(𝑣i ) → 0, as m → ∞.
9.3
LINEAR MIXED MODELS
9.3.1
EB Estimation of 𝝁i = lTi 𝜷 + mTi vi
Assuming normality, the linear mixed model (5.3.1) with block-diagonal covariance structure may be expressed as yi |vi with vi
ind
ind
(9.3.1)
N(Xi 𝜷 + Zi vi , Ri )
i = 1,
N(𝟎, Gi ),
, m,
where Gi and Ri depend on a vector of variance parameters 𝜹. The Bayes or BP estimator of realized 𝜇i = lTi 𝜷 + mTi vi is given by the conditional expectation of 𝜇i given yi , 𝜷, and 𝜹: 𝜇̂ iB = 𝜇̂ iB (𝜷, 𝜹) ∶= E(𝜇i |yi , 𝜷, 𝜹) = lTi 𝜷 + mTi v̂ Bi ,
(9.3.2)
v̂ Bi = E(vi |yi , 𝜷, 𝜹) = Gi ZTi V−1 i (yi − Xi 𝜷)
(9.3.3)
where
and Vi = Ri + Zi Gi ZTi . The result (9.3.2) follows from the posterior (or conditional) distribution of 𝜇i given yi : 𝜇i |yi , 𝜷, 𝜹
ind
N(𝜇̂ iB , g1i (𝜹)),
(9.3.4)
where g1i (𝜹) is given by (5.3.6). The estimator 𝜇̂ iB depends on the model parameters 𝜷 and 𝜹, which are estimated from the marginal distribution yi
ind
N(Xi 𝜷, Vi ),
i = 1,
,m
(9.3.5)
288
EMPIRICAL BAYES (EB) METHOD
̂ we obtain the EB or the using ML or REML. Denoting the estimators as 𝜷̂ and 𝜹, B ̂ EBP estimator of 𝜇i from 𝜇̂ i by substituting 𝜷 for 𝜷 and 𝜹̂ for 𝜹: ̂ 𝜹) ̂ = lT 𝜷̂ + mT v̂ B (𝜷, ̂ 𝜹). ̂ 𝜇̂ iEB = 𝜇̂ iB (𝜷, i i i
(9.3.6)
The EB estimator 𝜇̂ iEB is identical to the EBLUP estimator 𝜇̂ iH given in (5.3.8). Note ̂ 𝜹), ̂ of 𝜇i , which that 𝜇̂ iEB is also the mean of the estimated posterior density, f (𝜇i |yi , 𝜷, ̂ is N[𝜇̂ EB , g1i (𝜹)]. i
9.3.2
MSE Estimation
The results in Section 5.3.2 on the estimation of the MSE of the EBLUP estimator 𝜇̂ iH are applicable to the EB estimator 𝜇̂ iEB because 𝜇̂ iH and 𝜇̂ iEB are identical under normality. Also, the area-specific estimators (5.3.15)–(5.3.18) may be used as estimators of the conditional MSE, MSEc (𝜇̂ iEB ) = E[(𝜇̂ iEB − 𝜇i )2 |yi ], where the expectation is conditional on the observed yi for the ith area. As noted in Section 5.3.2, the MSE estimators are second-order unbiased. The jackknife MSE estimator, (9.2.15), for the basic area level model, readily extends to the linear mixed model with block-diagonal covariance structure. We calculate the delete-𝓁 estimators 𝜷̂ −𝓁 and 𝜹̂ −𝓁 by deleting the 𝓁th area data set (y𝓁 , X𝓁 , Z𝓁 ) from the full data set. This calculation is done for each 𝓁 to get m , m} which, in turn, provide m estimators of estimators {(𝜷̂ −𝓁 , 𝜹̂ −𝓁 ); 𝓁 = 1, EB ; 𝓁 = 1, EB is obtained from 𝜇̂ B = k (y , 𝜷, 𝜹) by changing , m}, where 𝜇̂ −𝓁 𝜇i , {𝜇̂ −𝓁 i i i 𝜷 and 𝜹 to 𝜷̂ −𝓁 and 𝜹̂ −𝓁 , respectively. The jackknife estimator is given by ̂ 1i + M ̂ 2i mseJ (𝜇̂ iB ) = M with
and
(9.3.7)
m
∑ ̂ − m−1 ̂ ̂ 1i = g1i (𝜹) M [g (𝜹̂ ) − g1i (𝜹)] m 𝓁=1 1i −𝓁
(9.3.8)
m
∑ ̂ 2i = m − 1 (𝜇̂ EB − 𝜇̂ iEB )2 , M m 𝓁=1 i,−𝓁
(9.3.9)
where g1i (𝜹) is given by (5.3.6). The jackknife MSE estimator, mseJ (𝜇̂ iB ), is also second-order unbiased (Jiang, Lahiri, and Wan 2002). 9.3.3
Approximations to the Posterior Variance
As noted in Section 9.2.3, the naive EB approach uses the estimated posterior density, ̂ 𝜹) ̂ to make inference on 𝜇i . In practice, 𝜇i is estimated by 𝜇̂ EB and the f (𝜇i |yi , 𝜷, i ̂ 𝜹) ̂ = g1i (𝜹) ̂ is used as a measure of variability. estimated posterior variance V(𝜇i |yi , 𝜷,
*EB ESTIMATION OF GENERAL FINITE POPULATION PARAMETERS
289
̂ can lead to severe underestimation of the true posterior But the naive measure, g1i (𝜹), variance V(𝜇i |y) as well as of MSE(𝜇̂ iEB ). The bootstrap method of Laird and Louis (1987) readily extends to the linear mixed model. Bootstrap data {y∗i (b), Xi , Zi ; i = 1, , m} are first generated from the ̂ i ), for b = 1, ̂ ̂ ̂ V estimated marginal distribution, N(Xi 𝜷, , B. Estimates {𝜷(b), 𝜹(b)} and EB estimates 𝜇̂ i∗EB (b) are then computed from the bootstrap data. The bootstrap approximation to the posterior variance is given by B
B
1∑ 1 ∑ ∗EB ̂ VLL (𝜇i |y) = g1i (𝜹(b)) + [𝜇̂ (b) − 𝜇̂ i∗EB (⋅)]2 , B b=1 B b=1 i
(9.3.10)
∑ where 𝜇̂ i∗EB (⋅) = B−1 Bb=1 𝜇̂ i∗EB (b). By deriving an approximation, Ṽ LL (𝜇i |y) of VLL (𝜇i |y), as B → ∞, Butar and Lahiri (2003) showed that VLL (𝜇i |y) is not second-order unbiased for MSE(𝜇̂ iEB ). Then, they proposed a bias-corrected MSE estimator ̂ mseBL (𝜇̂ iEB ) = Ṽ LL (𝜇i |y) + g3i (𝜹),
(9.3.11)
where g3i (𝜹) is given by 5.3.10 and ̂ + g2i (𝜹) ̂ + g∗ (𝜹, ̂ yi ), Ṽ LL (𝜇i |y) = g1i (𝜹) 3i
(9.3.12)
where g2i (𝜹) and g∗3i (𝜹, yi ) are given by (5.3.7) and (5.3.14), respectively. It now follows from (9.3.11) and (9.3.12) that mseBL (𝜇̂ iEB ) is identical to the area-specific MSE estimator given by (5.3.16). Kass and Steffey (1989) obtained a first-order approximation to V(𝜇i |y), denoted here VKS (𝜇i |y). After simplification, VKS (𝜇i |y) reduces to Ṽ LL (𝜇i |y) given by (9.3.12). Therefore, VKS (𝜇i |y) is also not second-order unbiased for MSE(𝜇̂ iEB ). A second-order approximation to V(𝜇i |y) ensures that the neglected terms are of lower order than m−1 , but it depends on the prior density, f (𝜷, 𝜹), unlike the first-order approximation. 9.4 *EB ESTIMATION OF GENERAL FINITE POPULATION PARAMETERS This section deals with EB estimation of general (possibly nonlinear) parameters of a finite population. The finite population P contains N units and a sample s of size n is drawn from P. We denote by r = P − s the sample complement of size N − n. We denote by yP the vector with the unit values of the target variable in the population, which is assumed to be random with a given joint probability distribution. We denote by ys the subvector of yP corresponding to the sample units, by yr the subvector of nonsampled units and consider without loss of generality that the first n elements of y are the sample elements, that is, yP = (yTs , yTr )T . The target parameter is a real measurable function 𝜏 = h(yP ) of the population vector yP that we want to estimate using the available sample data ys .
290
9.4.1
EMPIRICAL BAYES (EB) METHOD
BP Estimator Under a Finite Population
Let 𝜏̂ denote an estimator of 𝜏 based on ys . The MSE of 𝜏̂ is defined as MSE(𝜏) ̂ = E(𝜏̂ − 𝜏)2 , where E denotes expectation with respect to the joint distribution of yP . The BP estimator of 𝜏 is the function of ys with minimum MSE. Let us define 𝜏 0 = Eyr (𝜏|ys ), where the expectation is now taken with respect to the conditional distribution of yr given ys . Note that 𝜏 0 is a function of the sample data ys and model parameters. We can write MSE(𝜏) ̂ = E(𝜏̂ − 𝜏 0 + 𝜏 0 − 𝜏)2 = E(𝜏̂ − 𝜏 0 )2 + 2 E[(𝜏̂ − 𝜏 0 )(𝜏 0 − 𝜏)] + E(𝜏 0 − 𝜏)2 . In this expression, the last term does not depend on 𝜏. ̂ For the second term, using the Law of Iterated Expectations, we get E[(𝜏̂ − 𝜏 0 )(𝜏 0 − 𝜏)] = Eys {Eyr [(𝜏̂ − 𝜏 0 )(𝜏 0 − 𝜏)|ys ]} = Eys {(𝜏̂ − 𝜏 0 )[𝜏 0 − Eyr (𝜏|ys )]} = 0. Thus, the BP estimator of 𝜏 is the minimizer 𝜏̂ of E[(𝜏̂ − 𝜏 0 )2 ]. Since this quantity is nonnegative and its minimum value of zero is obtained at 𝜏̂ = 𝜏 0 , the BP estimator of 𝜏 is given by (9.4.1) 𝜏̂ B = 𝜏 0 = Eyr (𝜏|ys ). Note that the BP estimator is unbiased in the sense that E(𝜏̂ B − 𝜏) = 0 because Eys (𝜏̂ B ) = Eys [Eyr (𝜏|ys )] = E(𝜏). Typically, the distribution of yP depends on an unknown parameter vector 𝝀, which can be estimated using the sample data ys . Then, the EB (or EBP) estimator of 𝜏, denoted 𝜏̂ EB , is given by (9.4.1), with the expectation taken with respect to the diŝ The EB estimator is not exactly tribution of yr |ys with 𝝀 replaced by an estimator 𝝀. unbiased, but the bias coming from the estimation of 𝝀 is negligible for large samples, provided that 𝝀̂ is consistent for 𝝀.
9.4.2
EB Estimation Under the Basic Unit Level Model
Suppose now that the population contains m areas (or subpopulations) P_1, …, P_m of sizes N_1, …, N_m. Let s_i be a sample of size n_i drawn from P_i and r_i = P_i − s_i the sample complement, i = 1, …, m. We assume that the value y_ij of a target variable for the jth unit in the ith area follows the basic unit level model (4.3.1). In this section, the target parameter is 𝜏_i = h(y_Pi), where y_Pi = (y_is^T, y_ir^T)^T is the vector with the values y_ij for the sample and nonsampled units of area i. Applying (9.4.1) to 𝜏_i = h(y_Pi) and noting that y_P1, …, y_Pm are independent under the assumed model, the BP estimator of 𝜏_i is given by

𝜏̂_i^B = E_{y_ir}[h(y_Pi)|y_is] = ∫_{ℝ^(N_i−n_i)} h(y_Pi) f(y_ir|y_is) dy_ir,    (9.4.2)
where f(y_ir|y_is) is the joint density of y_ir given the observed data vector y_is. For the special case of the area mean 𝜏_i = Ȳ_i, if N_i is large, then by (4.3.1) and the LLN, 𝜏_i ≈ X̄_i^T 𝜷 + v_i, which is a special case of the mixed effect 𝜇_i = l_i^T 𝜷 + m_i^T v_i studied in Section 9.3.1. For area parameters 𝜏_i = h(y_Pi) with nonlinear h(⋅), we might not be able to calculate the expectation in (9.4.2) analytically. Molina and Rao (2010) proposed to approximate this expectation by Monte Carlo. For this, first generate A replicates {y_ir^(a); a = 1, …, A} of y_ir from the conditional distribution f(y_ir|y_is). Then attach the vector of sample elements y_is to y_ir^(a) to form y_i^(a) = ((y_ir^(a))^T, y_is^T)^T. With the area vector y_i^(a), calculate the target quantity 𝜏_i^(a) = h(y_i^(a)) for each a and then average over the A replicates, that is, take

𝜏̂_i^B ≈ (1/A) ∑_{a=1}^{A} 𝜏_i^(a).    (9.4.3)

Let y_Pi = (y_i1, …, y_iN_i)^T, X_Pi = (x_i1, …, x_iN_i)^T, i = 1, …, m, let I_k denote the identity matrix of order k, and let 1_k denote a k × 1 vector of ones. We assume the unit level model (4.3.1) with normality of v_i and e_ij, so that

y_Pi ~ind N(X_Pi 𝜷, V_Pi),  i = 1, …, m,    (9.4.4)

where V_Pi = 𝜎_v^2 1_{N_i} 1_{N_i}^T + 𝜎_e^2 diag_{1≤j≤N_i}(k_ij^2), i = 1, …, m. Consider the decomposition into sample and nonsampled elements of X_Pi and of the covariance matrix,

X_Pi = [X_is; X_ir],   V_Pi = [[V_is, V_isr]; [V_irs, V_ir]].

By the normality assumption in (9.4.4), we have that y_ir|y_is ~ N(𝝁_ir|s, V_ir|s), where the conditional mean vector and covariance matrix are given by

𝝁_ir|s = X_ir 𝜷 + 𝛾_i (ȳ_ia − x̄_ia^T 𝜷) 1_{N_i−n_i},    (9.4.5)
V_ir|s = 𝜎_v^2 (1 − 𝛾_i) 1_{N_i−n_i} 1_{N_i−n_i}^T + 𝜎_e^2 diag_{j∈r_i}(k_ij^2),    (9.4.6)

where 𝛾_i = 𝜎_v^2 (𝜎_v^2 + 𝜎_e^2/a_i⋅)^(−1), with a_i⋅ = ∑_{j∈s_i} a_ij and a_ij = k_ij^(−2), and where ȳ_ia and x̄_ia are defined in (7.1.7). Note that the application of the Monte Carlo approximation (9.4.3) involves simulating m multivariate normal vectors y_ir of sizes N_i − n_i, i = 1, …, m, from the conditional distribution of y_ir|y_is. This process then has to be repeated a large number of times A, which may be computationally unfeasible when N_i − n_i is large. This can be avoided by noting that the conditional covariance matrix
V_ir|s, given by (9.4.6), corresponds to the covariance matrix of a random vector y_ir generated from the model

y_ir = 𝝁_ir|s + u_i 1_{N_i−n_i} + 𝝐_ir,    (9.4.7)

with new random effects u_i and errors 𝝐_ir that are independent and satisfy

u_i ~ N(0, 𝜎_v^2 (1 − 𝛾_i))  and  𝝐_ir ~ N(0_{N_i−n_i}, 𝜎_e^2 diag_{j∈r_i}(k_ij^2));
see Molina and Rao (2010). Using model (9.4.7), instead of generating a multivariate normal vector of size N_i − n_i, we just need to generate 1 + (N_i − n_i) independent univariate normal variables u_i ~ N(0, 𝜎_v^2 (1 − 𝛾_i)) and 𝜖_ij ~iid N(0, 𝜎_e^2 k_ij^2), for j ∈ r_i. Then we obtain the corresponding nonsampled elements y_ij, j ∈ r_i, from (9.4.7), using as means the corresponding elements of 𝝁_ir|s given by (9.4.5).

In practice, the model parameters 𝝀 = (𝜷^T, 𝜎_v^2, 𝜎_e^2)^T are replaced by consistent estimates 𝝀̂ = (𝜷̂^T, 𝜎̂_v^2, 𝜎̂_e^2)^T, such as the ML or REML estimates, and then the variables y_ij are generated from (9.4.7) with 𝝀 replaced by 𝝀̂, leading to the Monte Carlo approximation of the EB estimator 𝜏̂_i^EB of 𝜏_i.

For nonsampled areas i = m + 1, …, M, we generate y_ij^(a) for j = 1, …, N_i as y_ij^(a) = x_ij^T 𝜷̂ + v_i^(a) + 𝜖_ij^(a), where v_i^(a) ~iid N(0, 𝜎̂_v^2), 𝜖_ij^(a) ~ind N(0, 𝜎̂_e^2 k_ij^2), and 𝜖_ij^(a) is independent of v_i^(a). The EB estimator of 𝜏_i for a nonsampled area i is then given by

𝜏̂_i^EB = (1/A) ∑_{a=1}^{A} h(y_Pi^(a)),    (9.4.8)
where y_Pi^(a) = (y_i1^(a), …, y_iN_i^(a))^T, for i = m + 1, …, M.

For very large populations and/or computationally complex indicators, such as those that require sorting all population elements, the EB method described in Section 9.4.2 might be computationally unfeasible. For those cases, a faster version of the EB method was proposed by Ferretti and Molina (2012). This method replaces, in the Monte Carlo approximation (9.4.3), the true value of the parameter for the ath Monte Carlo population, 𝜏_i^(a), by a design-based estimator 𝜏̂_i^DB(a) of 𝜏_i^(a) based on a sample s_i^(a) drawn from the population units P_i in area i, independently for each area i. In particular, we can select simple random samples s_i^(a) from each area i, for each a = 1, …, A. Then the values of the auxiliary variables corresponding to the units drawn in s_i^(a) are taken: x_ij, j ∈ s_i^(a). Using those values, the corresponding responses y_ij^(a), j ∈ s_i^(a), are generated for i = 1, …, m from (9.4.7). Denoting the vector containing those generated sample values as y_s^(a), calculate the design-based estimator of the poverty indicator, 𝜏̂_i^DB(a). Finally, the fast EB estimator is given by

𝜏̂_i^FEB ≈ (1/A) ∑_{a=1}^{A} 𝜏̂_i^DB(a).    (9.4.9)
In the fast EB estimator (9.4.9), only the response variables corresponding to the sample units j ∈ s_i^(a) need to be generated from (9.4.11) for each Monte Carlo replicate a, avoiding the generation of the full population of responses. Another advantage of this variation of the EB method is that it does not require identifying the sample units in the population register from which the auxiliary variables are obtained; linking the units in the sample file to the population register is often not possible. Simulation results not reported here indicate that the fast EB method loses little efficiency compared with the EB estimator described in Section 9.4.2. Diallo and Rao (2014) extended the Molina–Rao normality-based EB results by allowing the random effects v_i and/or the errors e_ij to follow a skew normal (SN) family of distributions. This extended SN family includes the normal distribution as a special case.
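The following is a minimal sketch, in Python, of the Monte Carlo EB approximation (9.4.3) for one sampled area, using the conditional decomposition (9.4.5)-(9.4.7) with k_ij = 1. It assumes the model parameters are supplied (in practice ML or REML estimates would be plugged in), and h is a user-supplied function of the census vector; the function name eb_area_estimate and the toy data are purely illustrative, not part of the original method's software.

```python
import numpy as np

def eb_area_estimate(h, y_s, X_s, X_r, beta, sigma2_v, sigma2_e, A=200, seed=None):
    """Monte Carlo EB estimate (9.4.3) of tau_i = h(y_Pi) for one sampled area
    under the nested error model with k_ij = 1, drawing the nonsampled
    responses from the decomposition (9.4.5)-(9.4.7)."""
    rng = np.random.default_rng(seed)
    n_i, N_r = len(y_s), X_r.shape[0]
    gamma_i = sigma2_v / (sigma2_v + sigma2_e / n_i)
    mu_r = X_r @ beta + gamma_i * (y_s.mean() - X_s.mean(axis=0) @ beta)  # (9.4.5)
    taus = np.empty(A)
    for a in range(A):
        u_i = rng.normal(0.0, np.sqrt(sigma2_v * (1.0 - gamma_i)))  # new area effect
        eps = rng.normal(0.0, np.sqrt(sigma2_e), size=N_r)          # unit errors
        y_r = mu_r + u_i + eps                                      # (9.4.7)
        taus[a] = h(np.concatenate([y_s, y_r]))                     # tau_i^(a)
    return taus.mean()                                              # (9.4.3)

# Toy usage: EB estimate of the area mean for simulated data
rng = np.random.default_rng(1)
beta = np.array([3.0, 0.03])
X_s = np.column_stack([np.ones(50), rng.binomial(1, 0.5, 50)])
X_r = np.column_stack([np.ones(200), rng.binomial(1, 0.5, 200)])
y_s = X_s @ beta + rng.normal(0, 0.15) + rng.normal(0, 0.5, 50)
print(round(eb_area_estimate(np.mean, y_s, X_s, X_r, beta, 0.15**2, 0.5**2, A=500), 3))
```

For a nonlinear h, such as the FGT poverty indicators of Section 9.4.3, the same function applies unchanged; only the argument h changes.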
9.4.3 FGT Poverty Measures
Nonlinear area parameters of special relevance are poverty indicators. Poverty can be measured in many different ways but, when measured in terms of a quantitative welfare variable such as income or expenditure, popular poverty indicators are the members of the FGT family (Foster, Greer, and Thorbecke, 1984). Let E_ij be the welfare measure for the jth individual within the ith area, and let z be a (fixed) poverty line defined for the actual population. The FGT family of poverty indicators for area i is given by F_𝛼i = N_i^(−1) ∑_{j=1}^{N_i} F_𝛼ij, where

F_𝛼ij = [(z − E_ij)/z]^𝛼 I(E_ij < z),   𝛼 ≥ 0,  j = 1, …, N_i.    (9.4.10)
For 𝛼 = 0, F_0i reduces to the proportion of individuals with income below the poverty line, called the poverty incidence or at-risk-of-poverty rate. For 𝛼 = 1, F_1i is the average of the relative distances of the individuals' incomes to the poverty line, known as the poverty gap. Thus, the poverty incidence F_0i measures the frequency of poverty, whereas the poverty gap F_1i measures its degree or intensity.

The distribution of the welfare variables E_ij involved in F_𝛼i is seldom normal, due to the typical right skewness of economic variables. However, after some transformation, such as log(E_ij + c) for given c, the resulting distribution might be approximately normal. Molina and Rao (2010) assumed that y_ij = T(E_ij) follows a normal distribution, where T(⋅) is a one-to-one transformation. Then F_𝛼i, given by (9.4.10), may be expressed as a function of y_Pi = (y_i1, …, y_iN_i)^T as follows:

F_𝛼i = (1/N_i) ∑_{j=1}^{N_i} {[z − T^(−1)(y_ij)]/z}^𝛼 I{T^(−1)(y_ij) < z} =: h_𝛼(y_Pi),   𝛼 ≥ 0.
The EB estimator of 𝜏i = F𝛼i = h𝛼 (yPi ) is then obtained by Monte Carlo approximation as in (9.4.3).
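As a small illustration, the function h_𝛼 can be coded directly; the sketch below assumes the log transformation T(E) = log(E), so T^(−1) = exp, and the names fgt and the toy census are hypothetical.

```python
import numpy as np

def fgt(y_census, z, alpha=0, T_inv=np.exp):
    """FGT indicator h_alpha(y_Pi) of (9.4.10): y_census holds transformed
    welfare values y_ij = T(E_ij); T_inv maps them back (here T = log)."""
    E = T_inv(np.asarray(y_census, dtype=float))
    poor = E < z
    return float(np.mean(((z - E) / z) ** alpha * poor))

# Poverty incidence (alpha = 0) and poverty gap (alpha = 1) for a toy census
y = np.random.default_rng(0).normal(loc=3.0, scale=0.5, size=1000)  # y = log(E)
print(round(fgt(y, z=12, alpha=0), 3), round(fgt(y, z=12, alpha=1), 3))
```

Passing fgt as the argument h of the Monte Carlo EB sketch in Section 9.4.2 yields EB estimates of the poverty incidence and gap.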
In some cases, it may not be possible to link the sample file to the population register from which the auxiliary variables are obtained. Then we simulate the census values y_ij^(a), for j = 1, …, N_i, by changing (9.4.7) to

y_Pi = 𝝁_Pi|s + u_i 1_{N_i} + 𝝐_Pi,    (9.4.11)

and generating from (9.4.11), where 𝝁_Pi|s = X_Pi 𝜷 + 𝛾_i (ȳ_ia − x̄_ia^T 𝜷) 1_{N_i} and 𝝐_Pi ~ind N(0_{N_i}, 𝜎_e^2 diag_{1≤j≤N_i}(k_ij^2)). The EB estimator is then calculated from (9.4.8) by using the above simulated census values y_Pi^(a), a = 1, …, A.

9.4.4 Parametric Bootstrap for MSE Estimation
Estimators of the MSE of the EB estimators 𝜏̂_i^EB of 𝜏_i can be obtained using the parametric bootstrap for finite populations introduced in Section 7.2.4. This method proceeds as follows:

(1) Fit the basic unit level model (4.3.1) by ML, REML, or a moments method to obtain model parameter estimators 𝜷̂, 𝜎̂_v^2, and 𝜎̂_e^2.
(2) Generate bootstrap domain effects as v_i* ~iid N(0, 𝜎̂_v^2), i = 1, …, m.
(3) Generate, independently of v_1*, …, v_m*, unit errors as e_ij* ~ind N(0, 𝜎̂_e^2 k_ij^2), j = 1, …, N_i, i = 1, …, m.
(4) Generate a bootstrap population of response variables from the model y_ij* = x_ij^T 𝜷̂ + v_i* + e_ij*, j = 1, …, N_i, i = 1, …, m.
(5) Let y_Pi* = (y_i1*, …, y_iN_i*)^T denote the vector of generated bootstrap response variables for area i. Calculate the target quantities for the bootstrap population as 𝜏_i* = h(y_Pi*), i = 1, …, m.
(6) Let y_s* be the vector whose elements are the generated y_ij* with indices contained in the sample s. Fit the model to the bootstrap sample data {(y_ij*, x_ij); j ∈ s_i, i = 1, …, m} and obtain bootstrap model parameter estimators, denoted 𝜎̂_v^2*, 𝜎̂_e^2*, and 𝜷̂*.
(7) Obtain the bootstrap EB estimator of 𝜏_i through the Monte Carlo approximation, denoted 𝜏̂_i^EB*, i = 1, …, m.
(8) Repeat steps (2)–(7) a large number of times B. Let 𝜏_i*(b) be the true value and 𝜏̂_i^EB*(b) the EB estimator obtained in the bth bootstrap replicate, b = 1, …, B.
(9) The bootstrap MSE estimator of 𝜏̂_i^EB is given by

mse_B(𝜏̂_i^EB) = B^(−1) ∑_{b=1}^{B} [𝜏̂_i^EB*(b) − 𝜏_i*(b)]^2.    (9.4.12)
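A minimal, self-contained sketch of steps (1)–(9) is given below. To keep it short and runnable it is illustrated for the area mean, where the plug-in best predictor has closed form, and it uses a crude moment-type fit in place of ML/REML; all function names and the simulated data are illustrative assumptions, not part of the original software.

```python
import numpy as np

rng = np.random.default_rng(42)
m, Ni, ni = 30, 100, 10
s2v_true, s2e_true, beta_true = 0.25, 1.0, np.array([1.0, 2.0])

def fit(ys, Xs):
    """Crude moment fit of the nested error model (stand-in for ML/REML):
    OLS for beta, average within-area residual variance for sigma_e^2, and
    the excess between-area variance of residual means for sigma_v^2."""
    b, *_ = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)
    res = [ys[i] - Xs[i] @ b for i in range(len(ys))]
    s2e = np.mean([r.var(ddof=1) for r in res])
    s2v = max(np.var([r.mean() for r in res], ddof=1) - s2e / ni, 1e-8)
    return b, s2v, s2e

def eb_mean(y_s, X_s, X_r, b, s2v, s2e):
    """EB (plug-in best predictor) of the area mean under the fitted model."""
    g = s2v / (s2v + s2e / ni)
    pred_r = X_r @ b + g * (y_s.mean() - X_s.mean(axis=0) @ b)
    return (y_s.sum() + pred_r.sum()) / Ni

# One "original" population and sample (first ni units of each area sampled)
X = [np.column_stack([np.ones(Ni), rng.uniform(size=Ni)]) for _ in range(m)]
y = [X[i] @ beta_true + rng.normal(0, np.sqrt(s2v_true)) +
     rng.normal(0, np.sqrt(s2e_true), Ni) for i in range(m)]
bhat, s2v_hat, s2e_hat = fit([yi[:ni] for yi in y], [Xi[:ni] for Xi in X])   # step (1)

B, sq_err = 200, np.zeros(m)
for _ in range(B):
    vs = rng.normal(0, np.sqrt(s2v_hat), m)                                  # step (2)
    ystar = [X[i] @ bhat + vs[i] + rng.normal(0, np.sqrt(s2e_hat), Ni)
             for i in range(m)]                                              # steps (3)-(4)
    tau_star = [ys.mean() for ys in ystar]                                   # step (5)
    bb, s2vb, s2eb = fit([ys[:ni] for ys in ystar], [Xi[:ni] for Xi in X])   # step (6)
    for i in range(m):                                                       # step (7)
        eb = eb_mean(ystar[i][:ni], X[i][:ni], X[i][ni:], bb, s2vb, s2eb)
        sq_err[i] += (eb - tau_star[i]) ** 2
print(np.round(sq_err / B, 4)[:5])                                           # (9.4.12)
```

For a general nonlinear h(⋅), step (7) would instead call the Monte Carlo EB approximation of Section 9.4.2 within each bootstrap replicate, which is why the procedure can be computationally demanding.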
A similar parametric bootstrap approach can be used to obtain a bootstrap MSE estimator for the fast EB estimator 𝜏̂_i^FEB.

9.4.5 ELL Estimation
The method of Elbers, Lanjouw, and Lanjouw (2003), called the ELL method, assumes a nested error model on the transformed welfare variables but with random cluster effects, where the clusters are based on the sampling design and may be different from the small areas. In fact, the small areas are not specified in advance. ELL estimators of area parameters 𝜏_i are then computed by applying a nonparametric bootstrap method similar to the parametric bootstrap procedure described in Section 9.4.4. To make their method comparable with the EB method described in Section 9.4.2, here we consider that the clusters are the same as the areas and assume normality of v_i and e_ij. The ELL estimator of 𝜏_i under this setup is obtained as follows:

(1) With the original sample data y_s, fit model (4.3.1). Let 𝜷̂, 𝜎̂_v^2, and 𝜎̂_e^2 be the resulting estimators of 𝜷, 𝜎_v^2, and 𝜎_e^2.
(2) Generate bootstrap area/cluster effects as v_i* ~iid N(0, 𝜎̂_v^2), i = 1, …, m.
(3) Independently of the cluster effects, generate bootstrap model errors as e_ij* ~iid N(0, 𝜎̂_e^2 k_ij^2), j = 1, …, N_i, i = 1, …, m.
(4) Construct a population vector y_P* = ((y_P1*)^T, …, (y_Pm*)^T)^T from the bootstrap model

y_ij* = x_ij^T 𝜷̂ + v_i* + e_ij*,  j = 1, …, N_i,  i = 1, …, m.    (9.4.13)
(5) Calculate the target area quantities for the generated bootstrap population, 𝜏_i* = h(y_Pi*), i = 1, …, m.
(6) The ELL estimator of 𝜏_i is then given by the bootstrap expectation 𝜏̂_i^ELL = E_*(𝜏_i*). The MSE estimator of 𝜏̂_i^ELL obtained by the ELL method is the bootstrap variance of 𝜏_i*, that is,

mse(𝜏̂_i^ELL) = E_*[𝜏_i* − E_*(𝜏_i*)]^2,
where E_* denotes expectation with respect to the bootstrap model (9.4.13) given the sample data. Note that E_*(𝜏_i*) is tracking E(𝜏_i) and E_*[𝜏_i* − E_*(𝜏_i*)]^2 is tracking E[𝜏_i − E(𝜏_i)]^2. In practice, ELL estimators are obtained from a Monte Carlo approximation by generating a large number A of population vectors y_P*(a) = ((y_P1*(a))^T, …, (y_Pm*(a))^T)^T, a = 1, …, A, from model (9.4.13), calculating the bootstrap area parameters for each population a as 𝜏_i*(a) = h(y_Pi*(a)), i = 1, …, m, and then averaging over the A populations. This leads to

𝜏̂_i^ELL ≈ (1/A) ∑_{a=1}^{A} 𝜏_i*(a),   mse(𝜏̂_i^ELL) ≈ (1/A) ∑_{a=1}^{A} (𝜏_i*(a) − 𝜏̂_i^ELL)^2.    (9.4.14)
Note that, in contrast to the EB method described in Section 9.4.2, the population vectors y_P*(a) in the ELL method are generated from the marginal distribution of the model responses instead of the conditional distribution given the sample data y_s, and they do not contain the observed sample data. Thus, if the model parameters were known, the ELL method as described here would not be using the sample data at all. In fact, it is easy to see that for 𝜏_i = Ȳ_i, the ELL estimator is equal to the synthetic estimator 𝜏̂_i^ELL = X̄_i^T 𝜷̂, which is a poor estimator when area effects v_i are present (i.e., 𝜎_v^2 is significant). However, EB and ELL estimators coincide for areas with zero sample size or when the data do not show significant area effects, that is, when the available covariates explain all the between-area variability.
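A toy sketch of this point is given below, using the true parameter values in place of fitted ones and the area mean as target; all names and numbers are illustrative. Because the ELL replicates do not condition on the sample, the ELL point estimate collapses to the synthetic estimator X̄_i^T 𝜷, and its bootstrap variance tracks 𝜎_v^2 + 𝜎_e^2/N_i rather than the conditional prediction error.

```python
import numpy as np

rng = np.random.default_rng(7)
m, Ni, A = 40, 200, 1000
beta, s2v, s2e = np.array([1.0, 2.0]), 0.25, 1.0
X = [np.column_stack([np.ones(Ni), rng.uniform(size=Ni)]) for _ in range(m)]

# ELL for the area means: each replicate is a full census drawn from the
# *marginal* model (9.4.13), without conditioning on any observed sample data.
tau_star = np.empty((A, m))
for a in range(A):
    for i in range(m):
        ystar = X[i] @ beta + rng.normal(0, np.sqrt(s2v)) + \
                rng.normal(0, np.sqrt(s2e), Ni)
        tau_star[a, i] = ystar.mean()
tau_ell = tau_star.mean(axis=0)          # ELL point estimates, (9.4.14)
mse_ell = tau_star.var(axis=0)           # ELL "MSE" = bootstrap variance
synthetic = np.array([X[i].mean(axis=0) @ beta for i in range(m)])
print(np.round(tau_ell[:4], 2), np.round(synthetic[:4], 2))  # nearly identical
print(round(mse_ell.mean(), 3))          # close to s2v + s2e/Ni = 0.255
```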
9.4.6 Simulation Experiments
Molina and Rao (2010) carried out a simulation study to analyze the performance of the EB method for estimating area poverty incidences and poverty gaps. Populations of size N = 20,000, composed of m = 80 areas with N_i = 250 elements in each area i = 1, …, m, were generated from model (4.3.1) with k_ij = 1. The transformation T(⋅) of the welfare variables E_ij, defined in Section 9.4.2, was taken as y_ij = T(E_ij) = log(E_ij). As auxiliary variables in the model, two binary variables x_1 and x_2 were considered, apart from the intercept. These binary variables were simulated from Bernoulli distributions with probabilities p_1i = 0.3 + 0.5 i/m and p_2i = 0.2, i = 1, …, m. Independent simple random samples s_i without replacement were drawn from each area i with area sample sizes n_i = 50, i = 1, …, m. The variables x_1 and x_2 for the population units and the sample indices were held fixed over all Monte Carlo simulations. The regression coefficients were taken as 𝜷 = (3, 0.03, −0.04)^T, the random area effects variance as 𝜎_v^2 = (0.15)^2, and the model error variance as 𝜎_e^2 = (0.5)^2. The poverty line was fixed at z = 12, which is roughly 0.6 times the median of the welfare variables E_ij for a population generated as described; hence, the poverty incidence for the simulated populations is approximately 16%.
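For concreteness, the sketch below reproduces this data-generating design for one Monte Carlo population and checks the stated overall poverty incidence; variable names are illustrative and only one replicate is generated.

```python
import numpy as np

rng = np.random.default_rng(2024)
m, Ni, z = 80, 250, 12
beta = np.array([3.0, 0.03, -0.04])
sigma_v, sigma_e = 0.15, 0.5

F0 = np.empty(m)                      # true area poverty incidences
for i in range(m):
    p1 = 0.3 + 0.5 * (i + 1) / m      # Bernoulli probabilities for x1 (x2 uses 0.2)
    X = np.column_stack([np.ones(Ni),
                         rng.binomial(1, p1, Ni),
                         rng.binomial(1, 0.2, Ni)])
    y = X @ beta + rng.normal(0, sigma_v) + rng.normal(0, sigma_e, Ni)
    E = np.exp(y)                     # welfare E_ij, since y_ij = log(E_ij)
    F0[i] = np.mean(E < z)
print(round(F0.mean(), 3))            # roughly 0.16, as stated in the text
```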
A total of K = 10,000 population vectors y_P^(k) = ((y_P1^(k))^T, …, (y_Pm^(k))^T)^T were generated from the true model described earlier. For each population vector y_P^(k), the true values of the poverty incidence and gap, F_𝛼i^(k), 𝛼 = 0, 1, were calculated, along with the direct, EB, and ELL estimators, where the direct estimator of F_𝛼i is the area sample mean of F_𝛼ij, j ∈ s_i. Then, means over the Monte Carlo populations k = 1, …, K of the true values of the poverty indicators were computed as

E(F_𝛼i) = (1/K) ∑_{k=1}^{K} F_𝛼i^(k),   𝛼 = 0, 1,  i = 1, …, m.
Biases were computed for the direct estimator F̂_𝛼i, the EB estimator F̂_𝛼i^EB, and the ELL estimator F̂_𝛼i^ELL as E(F̂_𝛼i) − E(F_𝛼i), E(F̂_𝛼i^EB) − E(F_𝛼i), and E(F̂_𝛼i^ELL) − E(F_𝛼i). The MSEs over the Monte Carlo populations of the three estimators were also computed as E(F̂_𝛼i − F_𝛼i)^2, E(F̂_𝛼i^EB − F_𝛼i)^2, and E(F̂_𝛼i^ELL − F_𝛼i)^2.

Figure 9.1 reports the model biases and MSEs of the three estimators of the poverty gap for each area. Figure 9.1a shows that the EB estimator has the smallest absolute biases, followed by the ELL estimator, although in terms of bias all estimators perform reasonably well. However, in terms of MSE, Figure 9.1b shows that the EB estimator is significantly more efficient than the ELL and direct estimators. In this simulation study, the auxiliary variables are not very informative and, due to this, ELL estimators turn out to be even less efficient than direct estimators. Conclusions for the poverty incidence are similar, but plots are not reported here.

Turning to MSE estimation, the parametric bootstrap procedure described in Section 9.4.4 was implemented with B = 500 replicates, and the results are plotted in Figure 9.2 for the poverty gap. The number of Monte Carlo simulations was K = 500, and the true values of the MSEs were independently computed with K = 50,000 Monte Carlo replicates.
Figure 9.1 Bias (a) and MSE (b) over Simulated Populations of EB, Direct, and ELL Estimates of Percent Poverty Gap 100 F1i for Each Area i. Source: Adapted from Molina and Rao (2010).
Figure 9.2 True MSEs of EB Estimators of Percent Poverty Gap and Average of Bootstrap MSE Estimators Obtained with B = 500 for Each Area i. Source: Adapted from Molina and Rao (2010).
Figure 9.2 shows that the bootstrap MSE estimator tracks the pattern of the true MSE values. Similar results were obtained for the poverty incidence.

An additional simulation experiment was carried out to study the performance of the estimators under repeated sampling from a fixed population. For this, a fixed population was generated exactly as described earlier. Keeping this population fixed, K = 1,000 samples were drawn independently from the population using simple random sampling without replacement within each area. For each sample, the three estimators of the poverty incidences and gaps were computed, namely the direct, EB, and ELL estimators. Figure 9.3 displays the design bias and the design MSE of the estimators of the poverty gap for each area. As expected, the direct estimator shows practically zero bias, the EB estimators show small biases, and the largest biases are for the ELL estimators. Concerning MSE, the ELL estimators show small MSEs for some of the areas but very large MSEs for others. In contrast, the MSEs of the EB and direct estimators are small for all areas. Surprisingly, the design MSEs of the EB estimators are even smaller than those of the direct estimators for most of the areas.

Figure 9.3 Bias (a) and MSE (b) of EB, Direct and ELL Estimators of the Percent Poverty Gap 100 F1i for Each Area i under Design-Based Simulations. Source: Adapted from Molina and Rao (2010).

In the above simulation study, L = 50 and A = 50 were used for the EB and ELL methods, respectively. A limited comparison of EB estimators for L = 50 with the corresponding values for L = 1,000 showed that the choice L = 50 gives fairly accurate results. In practice, however, when dealing with a given sample data set, it is advisable to use L ≥ 200.
9.5 BINARY DATA
In this section, we study unit level models for binary responses, yij , that is, yij = 1 or 0. In this case, linear mixed models are not suitable and alternative models have
been proposed. If all the covariates x_ij associated with y_ij are area-specific, that is, x_ij = x_i, then we can transform the sample area proportions p̂_i = ∑_{j=1}^{n_i} y_ij/n_i = y_i/n_i using an arcsine transformation, as in Example 9.2.1, and reduce the model to an area level model. However, the resulting transformed estimators 𝜃̂_i may not satisfy the sampling model with zero mean sampling errors if n_i is small. Also, the estimator of the proportion p_i obtained from 𝜃̂_i^EB is not equal to the true EB estimator p̂_i^EB, due to the transformation. Unit level modeling avoids the above difficulties, and EB estimators of proportions may be obtained directly for the general case of unit-specific covariates. In Section 9.5.1, we study the special case of no covariates. A generalized linear mixed model is used in Section 9.5.2 to handle covariates.

9.5.1 *Case of No Covariates
We assume a two-stage model on the sample observations y_ij, j = 1, …, n_i, i = 1, …, m. In the first stage, we assume that y_ij|p_i ~iid Bernoulli(p_i), i = 1, …, m. A model linking the p_i's is assumed in the second stage; in particular, p_i ~iid beta(𝛼, 𝛽), with 𝛼 > 0 and 𝛽 > 0, where beta(𝛼, 𝛽) denotes the beta distribution with parameters 𝛼 and 𝛽, that is,

f(p_i|𝛼, 𝛽) = [Γ(𝛼 + 𝛽)/(Γ(𝛼)Γ(𝛽))] p_i^(𝛼−1) (1 − p_i)^(𝛽−1),   𝛼 > 0, 𝛽 > 0,    (9.5.1)

where Γ(⋅) is the gamma function. We reduce y_i = (y_i1, …, y_in_i)^T to the sample total y_i = ∑_{j=1}^{n_i} y_ij, noting that y_i is a minimal sufficient statistic for the first-stage model.
We note that y_i|p_i ~ind binomial(n_i, p_i), that is,

f(y_i|p_i) = (n_i choose y_i) p_i^(y_i) (1 − p_i)^(n_i−y_i).    (9.5.2)

It follows from (9.5.1) and (9.5.2) that p_i|y_i, 𝛼, 𝛽 ~ind beta(y_i + 𝛼, n_i − y_i + 𝛽). Therefore, the Bayes estimator of p_i and the posterior variance of p_i are given by

p̂_i^B(𝛼, 𝛽) = E(p_i|y_i, 𝛼, 𝛽) = (y_i + 𝛼)/(n_i + 𝛼 + 𝛽)    (9.5.3)

and

V(p_i|y_i, 𝛼, 𝛽) = (y_i + 𝛼)(n_i − y_i + 𝛽) / [(n_i + 𝛼 + 𝛽 + 1)(n_i + 𝛼 + 𝛽)^2].    (9.5.4)

Note that the linking distribution f(p_i|𝛼, 𝛽) is a "conjugate prior" in the sense that the posterior (or conditional) distribution f(p_i|y_i, 𝛼, 𝛽) has the same form as the prior distribution.

We obtain estimators of the model parameters from the marginal distribution, given by y_i|𝛼, 𝛽 ~ind beta-binomial, with pdf

f(y_i|𝛼, 𝛽) = (n_i choose y_i) [Γ(𝛼 + y_i)Γ(𝛽 + n_i − y_i)/Γ(𝛼 + 𝛽 + n_i)] [Γ(𝛼 + 𝛽)/(Γ(𝛼)Γ(𝛽))].    (9.5.5)
Maximum-likelihood (ML) estimators 𝛼̂_ML and 𝛽̂_ML may be obtained by maximizing the loglikelihood, given by

l(𝛼, 𝛽) = const + ∑_{i=1}^{m} [ ∑_{h=0}^{y_i−1} log(𝛼 + h) + ∑_{h=0}^{n_i−y_i−1} log(𝛽 + h) − ∑_{h=0}^{n_i−1} log(𝛼 + 𝛽 + h) ],    (9.5.6)
where ∑_{h=0}^{y_i−1} log(𝛼 + h) is taken as zero if y_i = 0 and ∑_{h=0}^{n_i−y_i−1} log(𝛽 + h) is taken as zero if y_i = n_i. A convenient representation is in terms of the mean E(y_ij) = 𝜇 = 𝛼/(𝛼 + 𝛽) and 𝜏 = 1/(𝛼 + 𝛽), which is related to the intraclass correlation 𝜌 = Corr(y_ij, y_ik) = 1/(𝛼 + 𝛽 + 1) for j ≠ k. In terms of 𝜇 and 𝜏, the loglikelihood (9.5.6) takes the form

l(𝜇, 𝜏) = const + ∑_{i=1}^{m} [ ∑_{h=0}^{y_i−1} log(𝜇 + h𝜏) + ∑_{h=0}^{n_i−y_i−1} log(1 − 𝜇 + h𝜏) − ∑_{h=0}^{n_i−1} log(1 + h𝜏) ].    (9.5.7)
Closed-form expressions for 𝛼̂_ML and 𝛽̂_ML (or 𝜇̂_ML and 𝜏̂_ML) do not exist, but the ML estimates may be obtained by the Newton–Raphson method or some other iterative method. McCulloch and Searle (2001, Section 2.6) have given the asymptotic covariance matrices of (𝛼̂_ML, 𝛽̂_ML) and (𝜇̂_ML, 𝜏̂_ML). Substituting 𝛼̂_ML and 𝛽̂_ML in (9.5.3) and (9.5.4), we get the EB estimator of p_i and the estimated posterior variance.

We can also use simple method of moments estimators of 𝛼 and 𝛽. We equate the weighted sample mean p̂ = ∑_{i=1}^{m} (n_i/n) p̂_i and the weighted sample variance s_p^2 = ∑_{i=1}^{m} (n_i/n)(p̂_i − p̂)^2 to their expected values and solve the resulting moment equations for 𝛼 and 𝛽, where n = ∑_{i=1}^{m} n_i. This leads to moment estimators, 𝛼̂ and 𝛽̂, given by

𝛼̂/(𝛼̂ + 𝛽̂) = p̂    (9.5.8)

and
ns2p − p̂ (1 − p̂ )(m − 1) 1 ; = ∑m 𝛼̂ + 𝛽̂ + 1 p̂ (1 − p̂ )[n − i=1 n2i ∕n − (m − 1)]
(9.5.9)
see Kleinman (1973). We substitute the moment estimators 𝛼̂ and 𝛽̂ into (9.5.3) to get an EB estimator of pi as ̂ = 𝛾̂i p̂ i + 1( − 𝛾̂i )̂p = ki (̂pi , 𝛼, ̂ ̂ Bi (𝛼, ̂ 𝛽) ̂ 𝛽), (9.5.10) p̂ EB i =p ̂ Note that p̂ EB is a weighted average of the direct estimator where 𝛾̂i = ni ∕(ni + 𝛼̂ + 𝛽). i p̂ i and the synthetic estimator p̂ , and more weight is given to p̂ i as the ith area sample size, ni , increases. It is therefore similar to the FH estimator for the basic area level model, but the weight 𝛾̂i avoids the assumption of known sampling variance of p̂ i . is nearly unbiased for pi in the sense that its bias, E(̂pEB − pi ), is The estimator p̂ EB i i −1 of order m , for large m. as the estimator of realized pi , and its variability A naive EB approach uses p̂ EB i is measured by the estimated posterior variance V(pi |yi , 𝛼, 𝛽) =∶ g1i (𝛼, 𝛽, yi ), eval̂ yi ) can lead to severe underestimation ̂ However, g1i (𝛼, ̂ 𝛽, uated at 𝛼 = 𝛼̂ and 𝛽 = 𝛽. ̂ Note that ) because it ignores the variability associated with 𝛼̂ and 𝛽. of MSE(̂pEB i EB − p B )2 = M + M , say. MSE(̂pEB ) = E[g (𝛼, 𝛽, y )] + E(̂ p ̂ 1i i 1i 2i i i i ). We have The jackknife method may be applied to the estimation of MSE(̂pEB i EB EB ̂ ̂ ̂ 𝛽), p̂ i,−𝓁 = ki (̂pi , 𝛼̂ −𝓁 , 𝛽−𝓁 ) and the estimator of M2i is taken as p̂ i = ki (̂pi , 𝛼, m
∑ 2 ̂ 2i = m − 1 M (̂pEB − p̂ EB i ) , m 𝓁=1 i,−𝓁
(9.5.11)
where 𝛼̂ −𝓁 and 𝛽̂−𝓁 are the delete−𝓁 moment estimators obtained from {(̂pi , ni ); i = 1, , 𝓁 − 1, 𝓁 + 1, , m}. Furthermore, we propose to estimate M1i by m ∑
̂ 1i = g1i (𝛼, ̂ yi ) − M ̂ 𝛽,
̂ yi )] [g1i (𝛼̂ −𝓁 , 𝛽̂−𝓁 , yi ) − g1i (𝛼, ̂ 𝛽, 𝓁=1 𝓁≠i
(9.5.12)
302
EMPIRICAL BAYES (EB) METHOD
(Lohr and Rao 2009). The jackknife estimator of M1i given in Rao (2003a, p. 199) is not second-order unbiased, unlike (9.5.12), as shown by Lohr and Rao (2009). The jackknife estimator of MSE(̂pEB ) is then given by i ̂ ̂ mseJ (̂pEB i ) = M1i + M2i .
(9.5.13)
This estimator is second-order unbiased for MSE(̂pEB ). i ̂ yi ) of mseJ (̂pEB ) is area-specific in the sense ̂ 𝛽, Note that the leading term g1i (𝛼, i that it depends on yi . Our method of estimating M1i differs from the unconditional method used by Jiang, Lahiri, and Wan (2002). They evaluated E[g1i (𝛼, 𝛽, yi )] = ̂ in (9.5.12) in g̃ 1i (𝛼, 𝛽) using the marginal distribution of yi , and then used g̃ 1i (𝛼, ̂ 𝛽) ̂ yi ) to get an estimator of M1i . This estimator is computationally place of g1i (𝛼, ̂ 𝛽, ̂ 1i and its leading term, g̃ 1i (𝛼, ̂ is not area-specific, unlike more complex than M ̂ 𝛽), ̂ yi ). ̂ 𝛽, g1i (𝛼, The jackknife estimator (9.5.13) can also be used to estimate the conditional MSE of p̂ EB given by MSEc (̂pEB ) = E[(̂pEB − pi )2 |yi ], which depends on yi , unlike in the i i i ) is condicase of linear mixed models. Lohr and Rao (2009) showed that mseJ (̂pEB i tionally second-order unbiased in the sense that its conditional bias is of order op (m−1 ) in probability. Alternative two-stage models have also been proposed. The first-stage model is not changed, but the second-stage linking model is changed to either (i) logit(pi ) = iid iid log [pi ∕(1 − pi )] N(𝜇, 𝜎 2 ) or (ii) Φ−1 (pi ) N(𝜇, 𝜎 2 ), where Φ(⋅) is the cumulative distribution function (CDF) of a N(0, 1) variable. The models (i) and (ii) are called logit-normal and probit-normal models, respectively. Implementation of EB is more complicated for these alternative models because no closed-form expressions for the Bayes estimator and the posterior variance of pi exist. For the logit-normal model, the Bayes estimator of pi can be expressed as a ratio of single-dimensional integrals. Writing pi as pi = h1 (𝜇 + 𝜎zi ), where h1 (a) = ea ∕(1 + ea ) and zi N(0, 1), we obtain p̂ Bi (𝜇, 𝜎) = E(pi |yi , 𝜇, 𝜎) from the conditional distribution of zi given yi . We have p̂ Bi (𝜇, 𝜎) = A(yi , 𝜇, 𝜎)∕B(yi , 𝜇, 𝜎),
(9.5.14)
A(yi , 𝜇, 𝜎) = E[h1 (𝜇 + 𝜎z) exp {h2 (yi , 𝜇 + 𝜎z)}]
(9.5.15)
B(yi , 𝜇, 𝜎) = E[exp {h2 (yi , 𝜇 + 𝜎z)}],
(9.5.16)
where
and
where h2 (yi , a) = ayi − ni log(1 + ea ) and the expectation is with respect to z N(0, 1); see McCulloch and Searle (2001, p. 67). We can evaluate (9.5.15) and (9.5.16) by simulating samples from N(0, 1). Alternatively, numerical integration, as sketched below, can be used.
303
BINARY DATA
The loglikelihood, l(𝜇, 𝜎), for the logit-normal model may be written as m ∑
l(𝜇, 𝜎) = const +
log [B(yi , 𝜇, 𝜎)],
(9.5.17)
i=1
where B(yi , 𝜇, 𝜎) is given by (9.5.16). Each of the single-dimensional integrals in (9.5.15)–(9.5.16) is of the form E[a(z)] for a given function a(z), which may be √ ∑ approximated by a finite sum of the form dk=1 bk a(zk )∕ 𝜋, using Gauss–Hermite quadrature (McCulloch and Searle 2001, Section 10.3). The weights bk and the evaluation points zk for a specified value of d can be calculated using mathematical software. For ML estimation of 𝜇 and 𝜎, d = 20 usually provides good approximations. Derivatives of l(𝜇, 𝜎), needed in Newton–Raphson type methods for calculating ML estimates, can be approximated in a similar manner. = p̂ Bi (𝜇, ̂ 𝜎). ̂ Using ML estimators 𝜇̂ and 𝜎, ̂ we obtain an EB estimator of pi as p̂ EB i The posterior variance, V(pi |yi , 𝜇, 𝜎), may also be expressed in terms of expectation over z N(0, 1), noting that V(pi |yi , 𝜇, 𝜎) = E(p2i |yi , 𝜇, 𝜎) − [̂pBi (𝜇, 𝜎)]2 . Denoting V(pi |yi , 𝜇, 𝜎) = g1i (𝜇, 𝜎, yi ), the jackknife method can be applied to ). We obtain the jackobtain a second-order unbiased estimator of MSE(̂pEB i ), from (9.5.13) by substituting p̂ EB = ki (yi , 𝜇, ̂ 𝜎) ̂ knife estimator, mseJ (̂pEB i i EB ̂ 𝜎, ̂ yi ) and g1i (𝜇̂ −𝓁 , 𝜎̂ −𝓁 , yi ) in and p̂ i,−𝓁 = ki (yi , 𝜇̂ −𝓁 , 𝜎̂ −𝓁 ) in (9.5.11) and g1i (𝜇, (9.5.12), where 𝜇̂ −𝓁 and 𝜎̂ −𝓁 are the delete−𝓁 ML estimators obtained from , 𝓁 − 1, 𝓁 + 1, , m}. {(yi , ni ), i = 1, ) using ML estimators is very cumbersome. However, Computation of mseJ (̂pEB i computations may be simplified by using moment estimators of 𝜇 and 𝜎 obtained by equating the weighted mean p̂ and the weighted variance s2p to their expected values, as in the beta-binomial model, and solving the resulting equations for 𝜇 and 𝜎. The expected values involve the marginal moments E(yij ) = E(pi ) and E(yij yik ) = E(p2i ), j ≠ k, which can be calculated by numerical or Monte Carlo integration, using ∑ E(pi ) = E[h1 (𝜇 + 𝜎z)] and E(p2i ) = E[h21 (𝜇 + 𝜎z)]. Jiang (1998) equated m i=1 yi = ∑m 2 ∑m 2 2 ∑m ∑ n̂p and i=1 ( j≠k yij yik ) = i=1 (yi − yi ) = i=1 ni p̂ i − n̂p to their expected values to obtain different moment estimators, that is, solved equations m ∑
yi = nE[h1 (𝜇 + 𝜎z)], i=1
[m ] m ∑ ∑ 2 (yi − yi ) = ni (ni − 1) E[h21 (𝜇 + 𝜎z)] (9.5.18) i=1
i=1
for 𝜇 and 𝜎 to get the moment estimators, and established asymptotic consistency as m → ∞. Jiang and Zhang (2001) proposed more efficient moment estimators, using a two-step procedure. In the first step, (9.5.18) is solved for 𝜇 and 𝜎, and then used in the second step to produce two “optimal” weighted combinations ∑ 2 , y2m − ym ). Note that the first step uses (1, 0, , 0) and of ( m i=1 yi , y1 − y1 , (0, 1, , 1) as the weights. The optimal weights involve the derivatives of E[h1 (𝜇 + 𝜎z)] and E[h21 (𝜇 + 𝜎z)] with respect to 𝜇 and 𝜎 as well as the covariance ∑ 2 , y2m − ym ), which depend on 𝜇 and 𝜎. Replacing 𝜇 matrix of ( m i=1 yi , y1 − y1 , and 𝜎 by the first-step estimators, estimated weighted combinations are obtained.
304
EMPIRICAL BAYES (EB) METHOD
The second-step moment estimators are then obtained by equating the estimated weighted combinations to their expectations, treating the estimated weights as fixed. Simulation studies indicated considerable overall gain in efficiency over the first-step estimators in the case of unequal ni ’s. Note that the optimal weights reduce to first-step weights in the balanced case, ni = n. Jiang and Zhang (2001) established the asymptotic consistency of the second-step moment estimators. Jiang and Lahiri (2001) used p̂ EB based on the first-step moment estimators and i ), similar to the second-order unbiobtained a Taylor expansion estimator of MSE(̂pEB i ased MSE estimators of Chapter 5 for the linear mixed model. This MSE estimator is second-order unbiased, as in the case of the jackknife MSE estimator. Lahiri et al. (2007) proposed a bootstrap-based perturbation method to estimate ) = Mi (𝜹) = M1i (𝜹) + M2i (𝜹) under a given two-stage model, where 𝜹 is the MSE(̂pEB i vector of model parameters. Their estimator of Mi (𝜹) does not require the explicit evaluation of M1i (𝜹) at 𝜹 = 𝜹̂ using the marginal distribution of yi , unlike the Jiang et al. (2002) estimator. Moreover, it is always positive and second-order unbiased. We refer the reader to Lahiri et al. (2007) for technical details, which are quite complex. It may be noted that the proposed method is applicable to general two-stage area level models. 9.5.2
Models with Covariates
The logit-normal model of Section 9.5.1 readily extends to the case of covariates. In ind the first stage, we assume that yij |pij Bernoulli(pij ) for j = 1, , Ni , i = 1, , m. The probabilities pij are linked in the second stage by assuming a logistic regression iid
model with random area effects, logit(pij ) = xTij 𝜷 + 𝑣i , where 𝑣i N(0, 𝜎𝑣2 ) and xij is the vector of fixed covariates. The two-stage model belongs to the class of generalized linear mixed models and is called a logistic linear mixed model. The target parameters ∑ are the small area proportions Pi = Y i = j yij ∕Ni , i = 1, , m. As in the case of the basic unit level model, we assume that the model holds for the sample {(yij , xij ); j ∈ , m}, where si is the sample of size ni from the ith area. si , i = 1, We express the area proportion Pi as Pi = fi yi + 1( − fi )yir , where fi = ni ∕Ni , yi ∑ is the sample mean (proportion) and yir = 𝓁∈ri yi𝓁 ∕(Ni − ni ) is the mean of the nonsampled units ri in the ith area. Now noting that E(yi𝓁 |pi𝓁 , yi , 𝜷, 𝜎𝑣 ) = pi𝓁 , for 𝓁 ∈ ri , the Bayes estimator of yir is given by p̂ Bi(c) = E(pi(c) |yi , 𝜷, 𝜎𝑣 ), where ∑ pi(c) = 𝓁∈ri pi𝓁 ∕(Ni − ni ) and yi is the vector of sample y-values from the ith area. Therefore, the Bayes estimator of Pi may be expressed as P̂ Bi = P̂ Bi (𝜷, 𝜎𝑣 ) = fi yi + 1( − fi )̂pBi(c) .
(9.5.19)
If the sampling fraction fi is negligible, we can express P̂ Bi as ( 1 P̂ Bi ≈ E Ni
)
Ni ∑
pi𝓁 |yi , 𝜷, 𝜎𝑣 𝓁=1
.
(9.5.20)
305
BINARY DATA
The posterior variance of Pi reduces to V(Pi |yi , 𝜷, 𝜎𝑣 ) = (1 − fi )2 E(yir − p̂ Bi(c) )2 {[ ]} ] [ ∑ ∑ −2 E ; = Ni pi𝓁 (1 − pi𝓁 )|yi , 𝜷, 𝜎𝑣 ) +V pi𝓁 |yi , 𝜷, 𝜎𝑣 𝓁∈ri
𝓁∈ri
(9.5.21) see Malec et al. (1997). Note that (9.5.21) involves expectations of the form ∑ ∑ E( 𝓁∈ri p2i𝓁 |yi , 𝜷, 𝜎𝑣 ) and E[( 𝓁∈ri pi𝓁 )2 |yi , 𝜷, 𝜎𝑣 ], as well as the expectation ∑ E( 𝓁∈ri pi𝓁 |yi , 𝜷, 𝜎𝑣 ) = (Ni − ni )̂pBic . No closed-form expressions for these expectations exist. However, we can express the expectations as ratios of single-dimensional ∑ integrals, similar to (9.5.14). For example, writing 𝓁∈ri p2i𝓁 as a function of z N(0, 1), we can write E
∑ p2i𝓁 |yi , 𝜷, 𝜎𝑣
E
=
𝓁∈ri
where yi =
∑
) xTij yij , yi , 𝜎z, 𝜷
)
( ∑
∑ j∈si
)
2 𝓁∈ri pi𝓁
j∈si yij ,
( hi
[ (∑ )]} T exp hi j∈si xij yij , yi , 𝜎z, 𝜷 , { [ (∑ )]} T y , y , 𝜎z, 𝜷 E exp hi x j∈si ij ij i (9.5.22)
{(∑
)
(
∑ xTij yij
= j∈si
log [1 + exp(xTij 𝜷 + 𝜎z)]
𝜷 + 𝜎z)y ( i− j∈si
(9.5.23) and the expectation is with respect to z N(0, 1). Note that (9.5.23) reduces to h2 (yi , 𝜇 + 𝜎z) for the logit-normal model without covariates, that is, with xTij 𝜷 = 𝜇 for all i and j. ML estimation of model parameters, 𝜷 and 𝜎𝑣 , for the logistic linear mixed model and other generalized linear mixed models has received considerable attention in recent years. Methods proposed include numerical quadrature, EM algorithm, Markov chain Monte Carlo (MCMC), and stochastic approximation. We refer the readers to McCulloch and Searle, (2001, Section 10.4) for details of the algorithms. Simpler methods, called penalized quasi-likelihood (PQL), based on maximizing , yTm )T and v = (𝑣1 , , 𝑣m )T with respect to 𝜷 the joint distribution of y = (yT1 , and v have also been proposed. For the special case of linear mixed models under normality, PQL leads to “mixed model” equations 5.2.8, whose solution is identical to the BLUE of 𝜷 and the BLUP of v. However, for the logistic linear mixed model, the PQL estimator of 𝜷 is asymptotically biased (as m → ∞) and hence inconsistent, unlike the ML estimator of 𝜷. Consistent moment estimators of 𝜷 and 𝜎𝑣 may be obtained by equating ∑m ∑ ∑m ∑ j∈si xij yij and i=1 i=1 ( j≠k yij yik ) to their expectations and then solving the equations for 𝜷 and 𝜎𝑣 . The expectations depend on E(yij ) = E(pij ) = E[(h1 (xTij 𝜷 + 𝜎z)] and E(yij yik ) = E(pij pik ) = E[h1 (xTij 𝜷 + 𝜎z)h1 (xTik 𝜷 + 𝜎z)]. The
306
EMPIRICAL BAYES (EB) METHOD
two-step method of Section 9.5.1 can be extended to get more efficient moment estimators. Sutradhar (2004) proposed a quasi-likelihood (QL) method for the estimation of 𝜷 and 𝜎𝑣2 . It requires the covariance matrix of the vectors yi and ui = (yi1 yi2 , , yij yik , , yi(ni −1) yni )T . This covariance matrix involves the evaluation of third- and fourth-order expectations of the form E(pij pik pi𝓁 ) and E(pij pik pi𝓁 pit ) in addition to E(pij ) and E(p2ij ). These expectations can be evaluated by simulating samples from N(0, 1) or by one-dimensional numerical integration, as outlined in Section 9.5.1. Sutradhar (2004) made a numerical comparison of the QL method and the one-step moment method in terms of asymptotic variance, assuming the model logit(pij ) = iid
𝛽xi + 𝑣i , with 𝑣i N(0, 𝜎𝑣2 ) and equal sample sizes ni = n, where xi is an area level covariate. The QL method turned out to be significantly more efficient in estimating 𝜎𝑣2 than the first-step moment method. Relative efficiency of QL with respect to the two-step method has not been studied. Note that the equivalence of the first-step and the second-step moment estimators in the case of equal ni holds only for the special case of a mean model logit(pij ) = 𝜇 + 𝑣i . Using either ML or moment estimators 𝜷̂ and 𝜎̂ 𝑣 , we obtain an EB estimator of ̂ 𝜎̂ 𝑣 ). The estimator P̂ EB is nearly unbiased the ith area proportion Pi as P̂ EB = P̂ Bi (𝜷, i i EB for Pi in the sense that its bias, E(P̂ i − P̂ Bi ), is of order m−1 for large m. A naive EB as the estimator of the realized proportion Pi and its variability approach uses P̂ EB i ̂ 𝜎̂ 𝑣 ) = g1i (𝜷, ̂ 𝜎̂ 𝑣 , yi ), using is measured by the estimated posterior variance V(Pi |yi , 𝜷, (9.5.21). However, the estimated posterior variance ignores the variability associated with 𝜷̂ and 𝜎̂ 𝑣 . The bootstrap method of estimating MSE, described in Section 7.2.4 for the basic unit level model, is applicable to MSE(P̂ EB ), in particular the one-step i bootstrap estimator (7.2.26) and the bias-corrected estimators (7.2.32) or (7.2.33) based on double bootstrap. Pfeffermann and Correa (2012) proposed an alternative empirical bias-correction method and applied it to the estimation of MSE(P̂ EB ). This i method involves (i) drawing several plausible values of the model parameter vector 𝜹 = (𝜷 T , 𝜎𝑣2 )T , say 𝜹1 , , 𝜹T ; (ii) generating a pseudo-original sample from the model corresponding to each 𝜹t ; and (iii) generating a large number of bootstrap samples from each pseudo-original sample generated in step (i). The method makes use of the one-step bootstrap estimators computed from the original sample and the pseudo-original samples to construct an empirical bootstrap bias-corrected MSE estimator. We refer the reader to Pfeffermann–Correa (PC) paper for further details of the method. PC studied the performance of the proposed method in a simulation iid study, using the linking model logit(pij ) = xTij 𝜷 + 𝑣i with 𝑣i N(0, 𝜎𝑣2 ), i = 1, , m. In terms of average absolute relative bias (ARB), the PC estimator of MSE performed similar to the double-bootstrap MSE estimator, with ARB values of 5.5% and 4.15%, respectively. On the other hand, the one-step bootstrap MSE estimator led to consistent underestimation (as expected). The PC estimator performed significantly
307
BINARY DATA
better than the double-bootstrap estimator in terms of average coefficient of variation (CV), with CV of 26% compared to about 50% for the double bootstrap estimator. It is clear from the foregoing account that the implementation of EB for the logistic linear mixed model is quite cumbersome computationally. Approximate EB methods that are computationally simpler have been proposed in the literature, but those methods are not asymptotically valid as m → ∞. As a result, the approximate methods might perform poorly in practice, especially for small sample sizes, ni . We give a brief account of an approximate method proposed by MacGibbon and Tomberlin (1989), based on approximating the posterior distribution f (𝜷, v|y, 𝜎𝑣2 ) by a multivariate normal distribution, assuming a flat prior on 𝜷, that is, f (𝜷) = constant. We have ] ( m ) [ ∑ ∏ yij 1−yij 2 2 2 exp − (9.5.24) f (𝜷, v|y, 𝜎𝑣 ) ∝ pij (1 − pij ) 𝑣i ∕2𝜎𝑣 , i,j
i=1
where logit(pij ) = xTij 𝜷 + 𝑣i . The right-hand side (9.5.24) is approximated by a multivariate normal having its mean at the mode of (9.5.24) and covariance matrix equal to the inverse of the information matrix evaluated at the mode. The elements of the information matrix are given by the second derivatives of log f (𝜷, v|y, 𝜎𝑣2 ) with respect to the elements of 𝜷 and v, multiplied by −1. The mode (𝜷 ∗ , v∗ ) is obtained by maximizing log f (𝜷, v|y, 𝜎𝑣2 ) with respect to 𝜷 and v, using the Newton–Raphson algorithm, and then substituting the ML estimator of 𝜎𝑣2 obtained from the EM algorithm. An EB estimator of Pi is then taken as ( 1 P̂ EB i∗ = N i
) ∑ p∗ij
ni yi +
,
(9.5.25)
j∈ri
where p∗ij = exp(xTij 𝜷 ∗ + 𝑣∗i )∕[1 + exp(xTij 𝜷 ∗ + 𝑣∗i )].
(9.5.26)
̂ 𝜎̂ 𝑣 ), the estimator P̂ EB is not nearly unbiased Unlike the EB estimator P̂ EB = P̂ Bi (𝜷, i i∗ −1 for Pi . Its bias is of order ni , and hence it may not perform well when ni is small; 𝑣∗i is a biased estimator of 𝑣i to the same order. Farrell, MacGibbon, and Tomberlin (1997a) used a bootstrap measure of accuracy of P̂ EB , similar to the Laird and Louis (1987) parametric bootstrap (also called type i∗ III bootstrap) to account for the sampling variability of the estimates of 𝜷 and 𝜎𝑣2 . Simulation results, based on m = 20 and ni = 50, indicated good performance of the . Farrell, MacGibbon, and Tomberlin (1997b) relaxed the proposed EB estimator P̂ EB i∗ assumption of normality on the 𝑣i ’s, using the following linking model: logit(pij ) = xTij 𝜷 + 𝑣i and area effects 𝑣i are iid following an unspecified distribution. They used a nonparametric ML method, proposed by Laird (1978), to obtain an EB estimator of Pi , similar in form to (9.5.25). They also used the type II bootstrap of Laird and Louis (1987) to obtain a bootstrap measure of accuracy.
308
EMPIRICAL BAYES (EB) METHOD
The EB estimators P̂ EB and P̂ EB require the knowledge of individual x-values i i∗ in the population, unlike the EB estimator for the basic unit level model, which ∑Ni xij . This is a practical drawback depends only on the population mean Xi = Ni−1 j=1 of logistic linear mixed models and other nonlinear models because microdata for all individuals in a small area may not be available. Farrell, MacGibbon, and Tomberlin (1997c) obtained an approximation to P̂ EB depending only on the mean Xi and the i∗ ∑Ni ∑Ni T cross-product matrix j=1 xij xij with elements j=1 xija xijb , a, b = 1, , p. It uses a second-order multivariate Taylor expansion of p∗ij around xir , the mean vector of the nonsampled elements xij , j ∈ ri . However, the deviations xij − xir in the expansion are of order O(1), and hence terms involving higher powers of the deviations are not necessarily negligible. Farrell et al. (1997c) investigated the accuracy of the second-order approximation for various distributions of xTij 𝜷 + 𝑣i and provided some guidelines. 9.6
DISEASE MAPPING
Mapping of small area mortality (or incidence) rates of diseases such as cancer is a widely used tool in public health research. Such maps permit the analysis of the geographical variation in the rates of diseases, which may be useful in formulating and assessing etiological hypotheses, resource allocation, and identifying areas of unusually high-risk warranting intervention. Examples of disease rates studied in the literature include lip cancer rates in the 56 counties (small areas) of Scotland (Clayton and Kaldor 1987), incidence of leukemia in 281 census tracts (small areas) of upstate New York (Datta, Ghosh, and Waller 2000), stomach cancer mortality rates in Missouri cities (small areas) for males aged 47–64 years (Tsutakawa, Shoop, and Marienfeld 1985), all cancer mortality rates for white males in health service areas (small areas) of the United States (Nandram, Sedransk, and Pickle 1999), prostate cancer rates in Scottish counties (Langford et al. 1999), and infant mortality rates for local health areas (small areas) in British Columbia, Canada (Dean and MacNab 2001). We refer the reader to the October 2000 issue of Statistics in Medicine for a review of methods used in disease mapping. In disease mapping, data on event counts and related auxiliary variables are typically obtained from administrative sources, and sampling is not used. Suppose that the country (or the large region) used for disease mapping is divided into m nonoverlapping small areas. Let 𝜃i be the unknown relative risk (RR) in the ith area. A direct (or crude) estimator of 𝜃i is given by the standardized mortality ratio (SMR), 𝜃̂i = yi ∕ei , where yi and ei denote, respectively, the observed and expected number of deaths (cases) over a given period in area i, for i = 1, , m. The expected counts ei are calculated as (m ) m ∑ ∑ (9.6.1) yi ∕ ni , ei = ni i=1
i=1
where ni is the number of person-years at risk in the ith area, and then treated as fixed. Some authors use mortality (event) rates, 𝜏i , as parameters, instead of RRs, and
309
DISEASE MAPPING
a crude estimator of 𝜏i is then given by 𝜏̂i = yi ∕ni . The two approaches, however, are ∑m ∑ equivalent because the factor m i=1 yi ∕ i=1 ni is treated as a constant. ind
A common assumption in disease mapping is that yi |𝜃i Poisson(ei 𝜃i ). Under this assumption, the ML estimator of 𝜃i is the SMR, 𝜃̂i = yi ∕ei . However, a map of crude rates {𝜃̂i } can badly distort the geographical distribution of disease incidence or mortality because it tends to be dominated by areas of low population, ei , exhibiting extreme SMR’s that are least reliable. Note that V(𝜃̂i ) = 𝜃i ∕ei is large if ei is small. EB or HB methods provide reliable estimators of RR by borrowing strength across areas. As a result, maps based on EB or HB estimates are more reliable than crude maps. In this section, we give a brief account of EB methods based on simple linking models. Various extensions have been proposed in the literature, including bivariate models and models exhibiting spatial correlation. 9.6.1
Poisson–Gamma Model
We first study a two-stage model for count data {yi } similar to the beta-binomial ind Poisson(ei 𝜃i ), i = model for binary data. In the first stage, we assume that yi 1, , m. A “conjugate” model linking the RRs 𝜃i is assumed in the second stage: iid 𝜃i gamma(𝜈, 𝛼), where gamma(𝜈, 𝛼) denotes the gamma distribution with shape parameter 𝜈(> 0) and scale parameter 𝛼(> 0). Then, f (𝜃i |𝛼, 𝜈) =
𝛼 𝜈 −𝛼𝜃i 𝜈−1 𝜃i e Γ(𝜈)
(9.6.2)
and E(𝜃i ) = 𝜈∕𝛼 = 𝜇,
V(𝜃i ) = 𝜈∕𝛼 2 .
(9.6.3)
ind
Noting that 𝜃i |yi , 𝛼, 𝜈 gamma(yi + 𝜈, ei + 𝛼), the Bayes estimator of 𝜃i and the posterior variance of 𝜃i are obtained from (9.6.3) by changing 𝛼 to ei + 𝛼 and 𝜈 to yi + 𝜈, that is, (9.6.4) 𝜃̂iB (𝛼, 𝜈) = E(𝜃i |yi , 𝛼, 𝜈) = (yi + 𝜈)∕(ei + 𝛼) and V(𝜃i |yi , 𝛼, 𝜈) = g1i (𝛼, 𝜈, yi ) = (yi + 𝜈)∕(ei + 𝛼)2 .
(9.6.5)
We can obtain ML estimators of 𝛼 and 𝜈 from the marginal distribution, yi |𝛼, 𝜈 negative binomial, using the loglikelihood [y −1 m i ∑ ∑
] log(𝜈 + h) + 𝜈 log(𝛼) − (yi + 𝜈) log (ei + 𝛼) .
l(𝛼, 𝜈) = i=1
iid
(9.6.6)
h=0
Closed-form expressions for 𝛼̂ ML and 𝜈̂ML do not exist. Marshall (1991) obtained simple moment estimators by equating the weighted sample mean 𝜃̂e⋅ = ∑ 2 −1 ∑m (e ∕e )(𝜃̂ − 𝜃̂ )2 ̂ m−1 m i e⋅ 𝓁=1 (e𝓁 ∕e⋅ )𝜃𝓁 and the weighted sample variance se = m i=1 i ⋅
310
EMPIRICAL BAYES (EB) METHOD
to their expected values and then solving the resulting moment equations for 𝛼 and 𝜈, ∑ where e⋅ = m ̂ and 𝜈, ̂ given by i=1 ei ∕m. This leads to moment estimators, 𝛼 𝜈∕ ̂ 𝛼̂ = 𝜃̂e⋅
(9.6.7)
𝜈∕ ̂ 𝛼̂ 2 = s2e − 𝜃̂e⋅ ∕e⋅ .
(9.6.8)
and
Lahiri and Maiti (2002) provided more efficient moment estimators. The moment estimators may also be used as starting values for ML iterations. We substitute the moment estimators 𝛼̂ and 𝜈̂ into (9.6.4) to get an EB estimator of 𝜃i as 𝜃̂iEB = 𝜃̂iB (𝛼, ̂ 𝜈) ̂ = 𝛾̂i 𝜃̂i + 1( − 𝛾̂i )𝜃̂e⋅ , (9.6.9) where 𝛾̂i = ei ∕(ei + 𝛼). ̂ Note that 𝜃̂iEB is a weighted average of the direct estimator ̂ (SMR) 𝜃i and the synthetic estimator 𝜃̂e⋅ , and more weight is given to 𝜃̂i as the ith area expected deaths, ei , increases. If s2e < 𝜃̂e⋅ ∕e⋅ , then 𝜃̂iEB is taken as the synthetic estimator 𝜃̂e⋅ . The EB estimator is nearly unbiased for 𝜃i in the sense that its bias is of order m−1 , for large m. As in the binary case, the jackknife method may be used to obtain a second-order unbiased estimator of MSE(𝜃̂iEB ). We obtain the jackknife estimator, mseJ (𝜃̂iEB ) from EB = k (y , 𝛼 ̂ 𝜈) ̂ for p̂ EB and 𝜃̂i,−𝓁 ̂ EB (9.5.13) by substituting 𝜃̂iEB = ki (yi , 𝛼, i i ̂ −𝓁 , 𝜈̂−𝓁 ) for p i i,−𝓵 in (9.5.11) and using g1i (𝛼, ̂ 𝜈, ̂ yi ) and g1i (𝛼̂ −𝓁 , 𝜈̂−𝓁 , yi ) in (9.5.12), where 𝛼̂ −𝓁 and 𝜈̂−𝓁 are the delete−𝓁 moment estimators obtained from {(yi , ei ); i = 1, , 𝓁 − 1, 𝓁 + 1, , m}. Note that mseJ (𝜃̂iEB ) is area-specific in the sense that it depends on yi . Lahiri and Maiti (2002) obtained a Taylor expansion estimator of MSE, using a parametric bootstrap to estimate the covariance matrix of (𝛼, ̂ 𝜈). ̂ EB estimators of RRs 𝜃i can be improved by extending the linking gamma model on the 𝜃i ’s to allow for area level covariates, zi , such as degree of urbanization of areas. Clayton and Kaldor (1987) allowed varying scale parameters, 𝛼i , and assumed a log-linear model on E(𝜃i ) = 𝜈∕𝛼i given by log [E(𝜃i )] = zTi 𝜷. EB estimation for this extension is implemented by changing 𝛼 to 𝛼i in (9.6.4) and (9.6.5) and using ML or moment estimators of 𝜈 and 𝜷. Christiansen and Morris (1997) studied the Poisson-gamma regression model in detail, and proposed accurate approximations to the posterior mean and the posterior variance of 𝜃i . The posterior mean approximation is used as the EB estimator and the posterior variance approximation as a measure of its variability. 9.6.2
Log-normal Models
Log-normal two-stage models have also been proposed. The first-stage model is not iid changed, but the second-stage linking model is changed to 𝜉i = log (𝜃i ) N(𝜇, 𝜎 2 ), i = 1, , m in the case of no covariates. As in the case of logit-normal models, implementation of EB is more complicated for the log-normal model because no
311
DISEASE MAPPING
closed-form expression for the Bayes estimator, 𝜃̂iB (𝜇, 𝜎 2 ), and the posterior variance, V(𝜃i |yi , 𝜇, 𝜎 2 ), exist. Clayton and Kaldor (1987) approximated the joint posterior density of 𝝃 = (𝜉1 , , 𝜉m )T , f (𝝃|y, 𝜇, 𝜎 2 ), for y = (y1 , , ym )T , by a multivariate normal distribution, which gives an explicit approximation to the BP estimator 𝜉̂iB of 𝜉i . ML estimators of model parameters 𝜇 and 𝜎 2 were obtained using the EM algorithm, and then used in the approximate formula for 𝜉̂iB to get EB estimators 𝜉̂iEB of 𝜉i and 𝜃̂iEB = exp(𝜉̂iEB ) of 𝜃i . The EB estimator 𝜃̂iEB , however, is not nearly unbiased for 𝜃i . We can employ numerical integration, as done in Section 9.5.1, to get nearly unbiased EB estimators, but we omit details here. Moment estimators of 𝜇 and 𝜎 (Jiang and Zhang 2001) may be used to simplify the calculation of a jackknife estimator of MSE(𝜃̂iEB ). The above basic log-normal model readily extends to the case of covariates, zi ind by considering the linking model 𝜉i = log(𝜃i ) N(zTi 𝜷, 𝜎 2 ). Also, the basic model can be extended to allow spatial correlations; mortality data sets often exhibit significant spatial relationships between the log RRs, 𝜉i = log(𝜃i ). A simple conditional autoregression (CAR) normal model on 𝝃 assumes that 𝝃 is multivariate normal, with m ∑
E(𝜉i |𝜉𝓁 , 𝓁 ≠ i) = 𝜇 + 𝜌
qi𝓁 (𝜉𝓁 − 𝜇),
(9.6.10)
𝓁=1 𝓁≠i
V(𝜉i |𝜉𝓁 , 𝓁 ≠ i) = 𝜎 2 ,
(9.6.11)
where 𝜌 is the correlation parameter and Q = (qi𝓁 ) is the “adjacency” matrix of the map, with elements qi𝓁 = 1 if i and 𝓁 are adjacent areas and qi𝓁 = 0 otherwise. It follows from Besag (1974) that 𝝃 is multivariate normal with mean 𝝁 = 𝜇𝟏 and covariance matrix 𝚺 = 𝜎 2 (I − 𝜌Q−1 ), where 𝜌 is bounded above by the inverse of the largest eigenvalue of Q. Clayton and Kaldor (1987) approximated the posterior density, f (𝝃|y, 𝜇, 𝜎 2 , 𝜌), similar to the log-normal case. The assumption (9.6.11) of a constant conditional variance for the 𝜉i ’s results in the conditional mean (9.6.10) proportional to the sum, rather than the mean, of the neighboring 𝜉i ’s. Clayton and Bernardinelli (1992) proposed an alternative joint density of the 𝜉i ’s given by
2 −m∕2
f (𝝃) ∝ (𝜎 )
⎤ ⎡ m m ⎥ ⎢ 1 ∑∑ 2 exp ⎢− 2 (𝜉i − 𝜉𝓁 ) qi𝓁 ⎥ . ⎥ ⎢ 2𝜎 i=1 𝓁=1 ⎦ ⎣ 𝓁≠i
(9.6.12)
This specification leads to (
)−1
m ∑
E(𝜉i |𝜉𝓁 , 𝓁 ≠ i) =
m ∑
qi𝓁 𝓁=1
qi𝓁 𝜉𝓁 𝓁=1
(9.6.13)
312
EMPIRICAL BAYES (EB) METHOD
and
( V(𝜉i |𝜉𝓁 , 𝓁 ≠ i) = 𝜎
)−1
m ∑
2
qi𝓁
.
(9.6.14)
𝓁=1
∑ Note that the conditional variance is now inversely proportional to m 𝓁=1 qi𝓁 , the number of neighbors of area i, and the conditional mean is equal to the mean of the neighboring values 𝜉𝓁 . In the context of disease mapping, the alternative specification may be more appropriate. Example 9.6.1. Lip Cancer. Clayton and Kaldor (1987) applied EB estimation to data on observed cases, yi , and expected cases, ei , of lip cancer registered during the period 1975–1980 in each of 56 counties (small areas) of Scotland. They reported the SMR, the EB estimate of 𝜃i based on the Poisson-gamma model, denoted 𝜃̂iEB (1), and the approximate EB estimates of 𝜃i based on the log-normal model and the CAR-normal model, denoted 𝜃̂iEB (2) and 𝜃̂iEB (3), for each of the 56 counties (all values multiplied by 100). The SMR values varied between 0 and 652, while the EB estimates showed considerably less variability across counties, as expected: 𝜃̂iEB (1) varied between 31 and 422 (with CV=0.78) and 𝜃̂iEB (2) varied between 34 and 495 (with CV=0.85), suggesting little difference between the two sets of EB estimates. Ranks of EB estimates differed little from the corresponding ranks of the SMRs for most counties, despite less variability exhibited by the EB estimates. Turning to the CAR-normal model, the adjacency matrix, Q, was specified by listing adjacent counties for each county i. The ML estimate of 𝜌 was 0.174 compared to the upper bound of 0.175, suggesting a high degree of spatial relationship in the data set. Most of the CAR estimates, 𝜃̂iEB (3), differed little from the corresponding estimates 𝜃̂iEB (1) and 𝜃̂iEB (2) based on the independence assumption. Counties with few cases, yi , and SMRs differing appreciably from adjacent counties are the only counties affected substantially by spatial correlation. For example, county number 24 EB (3) = with y24 = 7 is adjacent to several low-risk counties, and the CAR estimate 𝜃̂24 EB EB 83.5 is substantially smaller than 𝜃̂24 (1) = 127.7 and 𝜃̂24 (2) = 123.6, based on the independence assumption. 9.6.3
Extensions
Various extensions of the disease mapping models studied in Sections 9.6.1 and 9.6.2 have been proposed in the recent literature. DeSouza (1992) proposed a two-stage, bivariate logit-normal model to study joint RRs (or mortality rates), 𝜃1i and 𝜃2i , of two cancer sites (e.g., lung and large bowel cancers), or two groups (e.g., lung cancer in males and females) over several geographical areas. Denote the observed and expected number of deaths at the two sites as (y1i , y2i ) and , m). The first stage assumes that (e1i , e2i ), respectively, for the ith area (i = 1, ind , m, where ∗ denotes (y1i , y2i )|(𝜃1i , 𝜃2i ) Poisson(e1i 𝜃1i ) ∗ Poisson(e2i 𝜃2i ), i = 1, that f (y1i , y2i |𝜃1i , 𝜃2i ) = f (y1i |𝜃1i )f (y2i |𝜃2i ). The joint risks (𝜃1i , 𝜃2i ) are linked in the
*DESIGN-WEIGHTED EB ESTIMATION: EXPONENTIAL FAMILY MODELS
313
second stage by assuming that the vectors (logit(𝜃1i ), logit(𝜃2i )) are independent, bivariate normal with means 𝜇1 , 𝜇2 ; standard deviations 𝜎1 and 𝜎2 ; and correlation 𝜌, denoted N(𝜇1 , 𝜇2 , 𝜎1 , 𝜎2 , 𝜌). Bayes estimators of 𝜃1i and 𝜃2i involve double integrals, which may be calculated numerically using Gauss–Hermite quadrature. EB estimators are obtained by substituting ML estimators of model parameters in the Bayes estimators. DeSouza (1992) applied the bivariate EB method to two different data sets consisting of cancer mortality rates in 115 counties of the state of Missouri during 1972–1981: (i) lung and large bowel cancers; (ii) lung cancer in males and females. The EB estimates based on the bivariate model lead to improved efficiency for each site (group) compared to the EB estimates based on the univariate logit-normal model, because of significant correlation; 𝜌̂ = 0.54 for data set (i) and 𝜌̂ = 0.76 for data set (ii). Kass and Steffey’s (1989) first-order approximation to the posterior variance was used as a measure of variability of the EB estimates. Kim, Sun, and Tsutakawa (2001) extended the bivariate model by introducing spatial correlations (via CAR) and covariates into the model. They used a HB approach instead of the EB approach. They applied the bivariate spatial model to male and female lung cancer mortality in the State of Missouri, and constructed disease maps of male and female lung cancer mortality rates by age group and time period. Dean and MacNab (2001) extended the Poisson-gamma model to handle nested data structures, such as a hierarchical health administrative structure consisting of health districts, i, in the first level and local health areas, j, within districts in the , m). The data consist of incidence or mortality second level (j = 1, , ni , i = 1, counts, yij , and the corresponding population at risk counts, nij . Dean and MacNab (2001) derived EB estimators of the local health area rates, 𝜃ij , using a nested error Poisson-gamma model. The Bayes estimator of 𝜃ij is a weighted combination of the crude local area rate, yij ∕nij , the correspond crude district rate yi⋅ ∕ni⋅ , and the over∑ni ∑ all rate y⋅⋅ ∕n⋅⋅ , where yi⋅ = j=1 yij and y⋅⋅ = m i=1 yi⋅ , and (ni⋅ , n⋅⋅ ) similarly defined. Dean and MacNab (2001) used the Kass and Steffey (1989) first-order approximation to posterior variance as a measure of variability. They applied the nested error model to infant mortality data from the province of British Columbia, Canada.
9.7 *DESIGN-WEIGHTED EB ESTIMATION: EXPONENTIAL FAMILY MODELS In Section 7.6.2, we studied pseudo-EBLUP estimation of area means under the H, basic unit level model (7.1.1) with kij = 1. The pseudo-EBLUP estimator 𝜇̂ iw given by (7.6.8), takes account of the design weights, 𝑤ij , and in turn leads to a design-consistent estimator of the area mean 𝜇i . Ghosh and Maiti (2004) extended the pseudo-EBLUP approach to exponential family models. Their results are applicable to the case of area level covariates xi , and the basic data available to the ∑ni , m but not the 𝑤̃ ij yij , i = 1, user consist of the weighted area means yiw = j=1 unit level values, yij . , ni , are independently and For a given 𝜃i , we assume that the yij , j = 1, identically distributed with probability density function f (yij |𝜃i ) belonging to the
314
EMPIRICAL BAYES (EB) METHOD
natural exponential family with canonical parameter 𝜃i and quadratic variance function. Specifically, f (yij |𝜃i ) = exp{𝜃i yij − a(𝜃i ) + b(yij )}
(9.7.1)
with E(yij |𝜃i ) = a′ (𝜃i ) = 𝜈i ,
V(yij |𝜃i ) = a′′ (𝜃i ) = c(𝜈i ),
(9.7.2)
where c(𝜈i ) = d0 + d1 𝜈i + d2 𝜈i2 . This family covers the Bernouilli distribution (d0 = 0, d1 = 1, d2 = −1) and the Poisson distribution (d0 = d2 = 0, d1 = 1). In the special case of yij Bernouilli(pi ), we have 𝜈i = pi , 𝜃i = log [pi (1 − pi )] and a(𝜃i ) = − log (1 − pi ), where pi = e𝜃i ∕(1 + e𝜃i ). The canonical parameters 𝜃i are assumed to obey the conjugate family with density function f (𝜃i ) = C(𝝀, hi ) exp{𝝀[hi 𝜃i − a(𝜃i )]},
(9.7.3)
where hi = a′ (zTi 𝜷) is the canonical link function and 𝝀 > max(0, d2 ). The mean and variance of 𝜈i are given by E(𝜈i ) = hi ,
V(𝜈i ) = c(hi )(𝝀 − d2 )−1
(9.7.4)
(Morris 1983c). It now follows from (9.7.2) and (9.7.4) that E(yiw ) = hi ,
V(yiw ) = c(hi )(𝝀 − d2 )−1 (1 + 𝝀𝛿2i ) =∶ 𝜙i c(hi ),
(9.7.5)
and Cov(yiw , 𝜈i ) = c(hi )(𝝀 − d2 )−1 , where 𝛿2i =
∑ni j=1
(9.7.6)
𝑤̃ 2ij . The BLUP estimator of 𝜈i based on yiw is given by GM = hi + 𝜈̃iw
Cov(yiw , 𝜈i ) (yiw − hi ) V(yiw )
= riw yiw + 1( − riw )hi ,
(9.7.7) (9.7.8)
where riw = (1 + 𝝀𝛿2i )−1 . Expression (9.7.8) follows from (9.7.5) and (9.7.6). Note that (9.7.8) requires the knowledge of 𝛿2i . The BLUP estimator (9.7.8) depends on the unknown parameters 𝜷 and 𝝀. The marginal distribution of yiw is not tractable, and hence it cannot be used to estimate 𝜷 and 𝝀. Therefore, Ghosh and Maiti (2004) proposed an estimating function approach by combining the elementary unbiased estimating functions , m, where u1i = yiw − hi and u2i = (yiw − hi )2 − 𝜙i c(hi ) ui = (ui1 , ui2 )T , i = 1, with E(u1i ) = 0 and E(u2i ) = 0. The resulting optimal estimating equations (Godambe and Thompson 1989) require the evaluation of third and fourth moments ∑ni ∑ni 𝑤̃ 4ij . Estimators 𝜷̂ and 𝝀̂ are 𝑤̃ 3ij and 𝛿4i = j=1 of yiw , which depend on 𝛿3i = j=1
315
TRIPLE-GOAL ESTIMATION
obtained by solving the optimal estimating equations iteratively. We refer the reader to Ghosh and Maiti (2004) for details. Substituting 𝜷̂ and 𝝀̂ for 𝜷 and 𝝀 in (9.7.8), we obtain an EBLUP estimator of 𝜈i as GM 𝜈̂iw = r̂iw yiw + 1( − r̂iw )ĥ i , (9.7.9) ̂ 2i )−1 and ĥ i = a′ (zT 𝜷), ̂ i = 1, where r̂iw = (1 + 𝝀𝛿 , m. i Ghosh and Maiti (2004) also obtained a second-order approximation to GM ), using the linearization method. Using this approximation, a second-order MSE(𝜈̂iw unbiased MSE estimator is also obtained. Example 9.7.1. Poverty Proportions. Ghosh and Maiti (2004) applied the EBLUP estimator (9.7.9) to m = 39 county poverty proportions p̂ iw = yiw of poor school-age children of a certain U.S. state for the year 1989, where yij is 1 or 0 according as the jth ∑ni sample child in county i is poor or not poor. The estimates yiw and j=1 𝑤̃ tij , t = 2, 3, 4 from the March supplement of the Current Population Survey were provided by the Census Bureau. The responses yij follow Bernouilli(pi = 𝜈i ), and the link function is taken as log[hi ∕(1 − hi )] = 𝛽0 + 𝛽1 z1i + 𝛽2 z2i + 𝛽3 z3i + 𝛽4 z4i = zTi 𝜷, z4i for county i are taken as z1i = log(proportion of child where the covariates z1i , exemptions reported by families in poverty on tax returns), z2i = log(proportion of people receiving food stamps), z3i = log(proportion of child exemptions on tax returns), and z4i = log(proportion of poor school-age children estimated from the previous census. GM = 𝜈̂iw were compared to the estimate p̂ FH The estimates p̂ iw = yiw and p̂ GM iw iw iid obtained under the FH model 𝜃̂i = log(yiw ) = zT 𝜷 + 𝑣i + ei with 𝑣i N(0, 𝜎𝑣2 ) and i
ind
ei N(0, 𝜓i = 9.97∕ni ); the known sampling variance is based on the value 9.97 supplied by the Census Bureau. Note that the above model is different from the poverty counts model of the Census Bureau (e.g., 6.1.2). Note also that the counties with zero poor school-age children cause difficulty with the FH model because yiw = 0 for those counties. On the other hand, this is not a problem with the models (9.7.1) and (9.7.3) used to derive p̂ GM . iw An external evaluation was conducted by comparing the three estimates for 1989 to the corresponding 1990 long-form census estimate for 1989, in terms of absolute relative error (ARE) averaged over the counties. Average ARE values reported are , and 0.47 for the 0.54 for the direct estimates yiw , 0.66 for the FH estimates p̂ FH iw Ghosh–Maiti estimates p̂ GM . These values suggest that the Ghosh–Maiti estimator iw performs somewhat better than the other two estimators.
9.8
TRIPLE-GOAL ESTIMATION
We have focused so far on the estimation of area-specific parameters (means, RRs, etc.), but in some applications the main objective is to produce an ensemble of parameter estimates whose distribution is in some sense close enough to the distribution
316
EMPIRICAL BAYES (EB) METHOD
of area-specific parameters, 𝜃i . For example, Spjøtvoll and Thomsen (1987) were interested in finding how 100 municipalities in Norway were distributed according to proportions of persons in the labor force. By comparing with the actual distribution in their example, they showed that the EB estimates, 𝜃̂iEB , under a simple area level ∑ model distort the distribution by overshrinking toward the synthetic component ̂ 𝜃̂⋅ = m i=1 𝜃i ∕m. In particular, the variability of the EB estimates was smaller than the variability of the 𝜃i ’s. On the other hand, the set of direct estimates {𝜃̂i } were overdispersed in the sense of variability larger than the variability of the 𝜃i ’s. We are also often interested in the ranks of the 𝜃i ’s (e.g., ranks of schools, hospitals or geographical areas) or in identifying domains (areas) with extreme 𝜃i ’s. Ideally, it is desirable to construct a set of “triple-goal” estimates that can produce good ranks, a good histogram, and good area-specific estimates. However, simultaneous optimization is not feasible, and it is necessary to seek a compromise set that can strike an effective balance between the three goals (Shen and Louis 1998). 9.8.1
Constrained EB ind
iid
Consider a two-stage model of the form 𝜃̂i |𝜃i f (𝜃̂i |𝜃i , 𝝀1 ) and 𝜃i f (𝜃i |𝝀2 ), i = 1, , m, where 𝝀 = (𝝀T1 , 𝝀T2 ) is the vector of model parameters. The set of direct estimators {𝜃̂i } are generally overdispersed under this model. For iid ( 𝜓) independent of example, consider the simple model 𝜃̂i = 𝜃i + ei with ei 0, iid iid 2 2 ̂ ( 𝜎𝑣 ), i = 1, , m. Noting that 𝜃i 𝜇, ( 𝜓 + 𝜎𝑣 ), it immediately follows 𝜃i 𝜇, that ] ] [ [ m m ∑ 1 1 ∑ ̂ 2 2 2 2 (𝜃 − 𝜃̂⋅ ) = 𝜓 + 𝜎𝑣 > 𝜎𝑣 = E (𝜃 − 𝜃⋅ ) , (9.8.1) E m − 1 i=1 i m − 1 i=1 i ∑ ̂ ∑m ̂B ̂ where 𝜃̂⋅ = m i=1 𝜃i ∕m and 𝜃⋅ = i=1 𝜃i ∕m. On the other hand, if 𝜃i = E(𝜃i |𝜃i , 𝝀) denotes the Bayes estimator of 𝜃i under squared error, the set of Bayes estimators ind {𝜃̂iB } exhibit underdispersion under the two-stage model 𝜃̂i |𝜃i f (𝜃̂i |𝜃i , 𝝀1 ) and 𝜃i
iid
f (𝜃i |𝝀2 ). Specifically, we have [ ] m m m ∑ ∑ ∑ 1 1 2 ̂ ̂ + 1 E (𝜃i − 𝜃⋅ ) |𝜽 = V(𝜃i − 𝜃⋅ |𝜽) (𝜃̂ B − 𝜃̂⋅B ) m − 1 i=1 m − 1 i=1 m − 1 i=1 i
2
(9.8.2) m
>
1 ∑ ̂B ̂B 2 ( 𝜃 − 𝜃⋅ ) , m − 1 i=1 i
(9.8.3)
∑ ̂B where 𝜽̂ = (𝜃̂1 , , 𝜃̂m )T , 𝜃̂⋅B = m the dependence on 𝝀 is suppressed i=1 𝜃i ∕m, and ∑m ̂ B ̂ B 2 ∑ 2 for simplicity. It follows from (9.8.3) that E[ m i=1 (𝜃i − 𝜃⋅ ) ] > E[ i=1 (𝜃i − 𝜃⋅ ) ]. ̂ which, in However, note that {𝜃̂iB } match the ensemble mean because 𝜃̂⋅B = E(𝜃⋅ |𝜽) B ̂ turn, implies E(𝜃⋅ ) = E(𝜃⋅ ).
317
TRIPLE-GOAL ESTIMATION
We can match the ensemble variance by finding the estimators t1 , , tm that min∑ 2 |𝜽] ̂ imize the posterior expected squared error loss E[ m (𝜃 − t ) subject to the i i=1 i constraints t⋅ = 𝜃̂⋅B [ ] m m 1 ∑ 1 ∑ 2 2 ̂ (t − t ) = E (𝜃 − 𝜃⋅ ) |𝜽 , m − 1 i=1 i ⋅ m − 1 i=1 i
(9.8.4) (9.8.5)
∑ where t⋅ = m i=1 ti ∕m. Using Lagrange multipliers, we obtain the constrained Bayes (CB) estimators {𝜃̂iCB } as the solution to the minimization problem, where ̂ 𝝀)(𝜃̂ B − 𝜃̂⋅B ) ti, opt = 𝜃̂iCB = 𝜃̂⋅B + a(𝜽, i with
{ ̂ 𝝀) = a(𝜽,
1+ [1∕(m −
∑m
̂
i=1 V(𝜃i |𝜃i , 𝝀) ∑ ̂B ̂B 2 1)] m i=1 (𝜃i − 𝜃⋅ )
(1∕m)
(9.8.6) }1∕2 .
(9.8.7)
Louis (1984) derived the CB estimator under normality. Ghosh (1992b) obtained (9.8.6) for arbitrary distributions. A proof of (9.8.6) is given in Section 9.12.3. It fol∑ ̂ 𝝀) ̂ CB ̂ CB 2 ∑m (𝜃̂ B − 𝜃̂⋅B )2 because the term a(𝜽, lows from (9.8.6) that m i=1 (𝜃i − 𝜃⋅ ) > i=1 i in (9.8.6) is greater than 1 and 𝜃̂⋅CB = 𝜃̂⋅B , that is, the variability of {𝜃̂iCB } is larger than ∑ ̂ CB ̂ CB 2 that of {𝜃̂iB }. Note that the constraint (9.8.5) implies that E[ m i=1 (𝜃i − 𝜃⋅ ) ] = ∑m CB 2 ̂ E[ i=1 (𝜃i − 𝜃⋅ ) ], so that {𝜃i } matches the variability of {𝜃i }. ̂ 𝝀)} The CB estimator 𝜃̂iCB is a function of the set of posterior variances {V(𝜃i |𝜽, ̂ 𝝀)}. Replacing the model parameters 𝝀 and the set of Bayes estimators {𝜃̂iB = E(𝜃i |𝜽, ̂ we obtain an empirical CB (ECB) estimator 𝜃̂ ECB = 𝜃̂ CB (𝝀). ̂ by suitable estimators 𝝀, i i Example 9.8.1. Simple Model. We illustrate the calculation of 𝜃̂iCB for the simple iid iid model 𝜃̂i = 𝜃i + ei , with ei N(0, 𝜓) and independent of 𝜃i N(𝜇, 𝜎𝑣2 ). The Bayes estimator is given by 𝜃̂iB = 𝛾 𝜃̂i + 1( − 𝛾)𝜇, where 𝛾 = 𝜎𝑣2 ∕(𝜎𝑣2 + 𝜓). Furthermore, ∑m ̂ B ̂ B 2 2 ∑m (𝜃̂ − 𝜃̂ )2 . V(𝜃i |𝜃̂i , 𝝀) = 𝛾𝜓, 𝜃̂⋅B = 𝛾 𝜃̂⋅ + 1( − 𝛾)𝜇 and ⋅ i=1 (𝜃i − 𝜃⋅ ) = 𝛾 i=1 i Hence, { 𝜃̂iCB = [𝛾 𝜃̂⋅ + 1( − 𝛾)𝜇] +
𝜓∕𝛾 1+ ∑ ̂ ̂ 2 [1∕(m − 1)] m i=1 (𝜃i − 𝜃⋅ )
}1∕2 𝛾(𝜃̂i − 𝜃̂⋅ ).
(9.8.8) ∑ iid ̂i − 𝜃̂⋅ )2 conNoting that 𝜃̂i N(𝜇, 𝜓 + 𝜎𝑣2 ), it follows that 𝜃̂⋅ and (m − 1)−1 m ( 𝜃 i=1 verge in probability to 𝜇 and 𝜓 + 𝜎𝑣2 = 𝜓∕(1 − 𝛾), respectively, as m → ∞. Hence, 𝜃̂iCB ≈ 𝛾 1∕2 𝜃̂i + 1( − 𝛾 1∕2 )𝜇.
(9.8.9)
318
EMPIRICAL BAYES (EB) METHOD
It follows from (9.8.9) that the weight attached to the direct estimator is larger than the weight used by the Bayes estimator, and the shrinkage toward the synthetic component 𝜇 is reduced. Assuming normality, Ghosh (1992b) proved that the total MSE, ∑m ∑m 2 ̂ CB ̂ CB i=1 MSE(𝜃i ) = i=1 E(𝜃i − 𝜃i ) , of the CB estimators is smaller than the total MSE of the direct estimators if m ≥ 4. Hence, the CB estimators perform better than the direct estimators, but are less efficient than the Bayes estimators. Shen and Louis (1998) studied the performance of CB estimators for exponential families f (𝜃̂i |𝜃i , 𝝀1 ) with conjugate f (𝜃i |𝝀2 ). They showed that, for large m, CB estimators are always more efficient than the direct estimators in terms of total MSE. Further, the maximum loss in efficiency relative to the Bayes estimators is 24%. Note that the exponential families cover many commonly used distributions, including the binomial, Poisson, normal, and gamma distributions.
9.8.2
Histogram
The empirical distribution function based on the CB estimators is given by ∑ ̂ CB ≤ t), −∞ < t < ∞, where I(𝜃i ≤ t) = 1 if 𝜃i ≤ t and FmCB (t) = m−1 m i=1 I(𝜃i I(𝜃i ≤ t) = 0 otherwise. The estimator FmCB (t) is generally not consistent for the true distribution of the 𝜃i ’s as m → ∞, though the CB approach matches the first and second moments. As a result, FmCB (t) can perform poorly as an estimator of ∑ Fm (t) = m−1 m i=1 I(𝜃i ≤ t). An “optimal” estimator of Fm (t) is obtained by minimizing the posterior expected ̂ The optimal A(⋅) is given by integrated squared error loss E[∫ {A(t) − Fm (t)}2 dt|𝜽]. m
Aopt (t) = F m (t) =
1∑ P(𝜃i ≤ t|𝜃̂i ). m i=1
(9.8.10)
Adding the constraint that A(⋅) is a discrete distribution with at most m mass points, 2𝓁−1 the optimal estimator F̂ m (⋅) is discrete with mass 1∕m at Û 𝓁 = F −1 m ( 2m ), 𝓁 = 1, , m; see Shen and Louis (1998) for a proof. Note that Û 𝓁 depends on model parameters, 𝝀, and an EB version of Û 𝓁 is obtained by substituting a suitable estimator 𝝀̂ for 𝝀.
9.8.3
Ranks
How good are the ranks based on the BP estimators 𝜃̂iB compared to those based on the true (realized but unobservable) values 𝜃i ? In the context of BLUP estimation under the linear mixed model, Portnoy (1982) showed that ranking based on the BLUP estimators is “optimal” in the sense of maximizing the probability of correctly ranking with respect to the true values 𝜃i . Also, the ranks based on the 𝜃̂iB ’s often agree with the ranks based on the direct estimators 𝜃̂i , and it follows from (9.8.6) that the ranks based on the CB estimators 𝜃̂iCB are always identical to the ranks based on the 𝜃̂iB ’s.
319
EMPIRICAL LINEAR BAYES
∑ Let R(i) be the rank of the true 𝜃i , that is, R(i) = m 𝓁=1 I(𝜃i ≥ 𝜃𝓁 ). Then the “optimal” estimator of R(i) that minimizes the expected posterior squared error loss ∑ 2 ̂ E[ m i=1 (Q(i) − R(i)) |𝜽] is given by the Bayes estimator m ∑
̂ = Qopt (i) = R̃ B (i) = E[R(i)|𝜽]
̂ P(𝜃i ≥ 𝜃𝓁 |𝜽).
(9.8.11)
𝓁=1
Generally, the estimators R̃ B (i) are not integers, so we rank the R̃ B (i) to produce integer ranks R̂ B (i) = rank of R̃ B (i) in the set {R̃ B (1), , R̃ B (m)}. TG ̂ ̂ Shen and Louis (1998) proposed 𝜃i = UR̂ B (i) as a compromise triple-goal estimator of the realized 𝜃i . The set {𝜃̂iTG } is optimal for estimating the distribution Fm (t) as well as the ranks {R(i)}. Simulation results indicate that the proposed method performs better than the CB method and achieves the three inferential goals.
9.9
EMPIRICAL LINEAR BAYES
EB methods studied so far are based on distributional assumptions on 𝜃̂i |𝜃i and 𝜃i . Empirical linear Bayes (ELB) methods avoid distributional assumptions by specifying only the first and second moments, but confining to the linear class of estimators, as in the case of EBLUP for the linear mixed models. Maritz and Lwin (1989) provide an excellent account of linear Bayes (LB) methods.
9.9.1
LB Estimation
ind ind We assume a two-stage model of the form 𝜃̂i |𝜃i 𝜃(i , 𝜓i (𝜃i )) and 𝜃i 𝜇( i , 𝜎i2 ), ind i = 1, , m. Then, we have 𝜃̂i 𝜇( i , 𝜓i + 𝜎i2 ) unconditionally, where 𝜓i = E[𝜓i (𝜃i )]. This result follows by noting that E(𝜃̂i ) = E[E(𝜃̂i |𝜃i )] = E(𝜃i ) = 𝜇i and V(𝜃̂i ) = E[V(𝜃̂i |𝜃i )] + V[E(𝜃̂i |𝜃i )] = E[𝜓i (𝜃i )] + V(𝜃i ) = 𝜓i + 𝜎i2 . We consider a linear class of estimators of the realized 𝜃i of the form ai 𝜃̂i + bi and then minimize the unconditional MSE, E(ai 𝜃̂i + bi − 𝜃i )2 with respect to the constants ai and bi . The optimal estimator, called the LB estimator, is given by
𝜃̂iLB = 𝜇i + 𝛾i (𝜃̂i − 𝜇i ) = 𝛾i 𝜃̂i + 1( − 𝛾i )𝜇i ,
(9.9.1)
where 𝛾i = 𝜎i2 ∕(𝜓i + 𝜎i2 ) (see Griffin and Krutchkoff 1971 and Hartigan 1969). A proof of (9.9.1) is given in Section 9.12.4. The LB estimator (9.9.1) involves 2m parameters (𝜇i , 𝜎i2 ), i = 1, , m. In practice, we need to assume that 𝜇i and 𝜎i2 depend on a fixed set of parameters 𝝀 in order to “borrow strength”. The MSE of 𝜃̂iLB is given by (9.9.2) MSE(𝜃̂iLB ) = E(𝜃̂iLB − 𝜃i )2 = 𝛾i 𝜓i .
320
EMPIRICAL BAYES (EB) METHOD
We estimate 𝝀 by the method of moments and use the estimator 𝝀̂ in (9.9.1) to obtain the ELB estimator, given by 𝜃̂iELB = 𝛾̂i 𝜃̂i + 1( − 𝛾̂i )𝜇̂ i ,
(9.9.3)
̂ and 𝛾̂i = 𝛾i (𝝀). ̂ A naive estimator of MSE(𝜃̂ ELB ) is obtained as where 𝜇̂ i = 𝜇i (𝝀) i mseN (𝜃̂iELB ) = 𝛾̂i 𝜓̂ i ,
(9.9.4)
̂ But the naive estimator underestimates the MSE because it where 𝜓̂ i = 𝜓i (𝝀). ̂ It is difficult to find approximately ignores the variability associated with 𝝀. unbiased MSE estimators without further assumptions. In general, MSE(𝜃̂iELB ) ≠ MSE(𝜃̂iLB ) + E(𝜃̂iELB − 𝜃̂iLB )2 because of the nonzero covariance term E(𝜃̂iLB − 𝜃i )(𝜃̂iELB − 𝜃̂iLB ). As a result, the jackknife method is not applicable here without further assumptions. To illustrate ELB estimation, suppose that 𝜃i is the RR and 𝜃̂i = yi ∕ei is the SMR for the ith area. We assume a two-stage model of the form E(𝜃̂i |𝜃i ) = 𝜃i , 𝜓i (𝜃i ) = V(𝜃̂i |𝜃i ) = 𝜃i ∕ei and E(𝜃i ) = 𝜇i = 𝜇, V(𝜃i ) = 𝜎i2 = 𝜎 2 . The LB estimator is given by (9.9.1) with 𝜇i = 𝜇 and 𝛾i = 𝜎 2 ∕(𝜎 2 + 𝜇∕ei ). Note that the conditional first and second moments of yi are identical to the Poisson moments. We obtain moment esti∑ ̂ mators of 𝜇 and 𝜎 2 by equating the weighted sample mean 𝜃̂e⋅ = m−1 m i=1 (ei ∕e⋅ )𝜃i ∑ m 2 −1 2 ̂ ̂ and the weighted variance, se = m i=1 (ei ∕e⋅ )(𝜃i − 𝜃e⋅ ) to their expected values, as in the case of the Poisson-gamma model (Section 9.6.1). This leads to the moment estimators (9.9.5) 𝜇̂ = 𝜃̂e⋅ , 𝜎̂ 2 = s2e − 𝜃̂e⋅ ∕e⋅ . Example 9.9.1. Infant Mortality Rates. Marshall (1991) obtained ELB estimates of infant mortality rates in m = 167 census area units (CAUs) of Auckland, New Zealand, for the period 1977–1985. For this application, we change (𝜃i , 𝜃̂i , ei ) to (𝜏i , 𝜏̂i , ni ) and let E(𝜏i ) = 𝜇, V(𝜏i ) = 𝜎 2 , where 𝜏i is the mortality rate, 𝜏̂i = yi ∕ni is the crude rate, ni is the number of person-years at risk, and yi is the number of deaths in the ith area. The ni for a CAU was taken as nine times its recorded population in the 1981 census. This ni -value should be a good approximation to the true ni , because 1981 is the midpoint of the study period 1977–1985. “Global” ELB estimates of the 𝜏i ’s were obtained from (9.9.3), using the moment estimator based on (9.9.5). These estimators shrink the crude rates 𝜏̂i toward the overall mean 𝜇̂ = 2.63 deaths per thousand. Marshall (1991) also obtained “local” estimates by defining the neighbors of each CAU to be those sharing a common boundary; the smallest neighborhood contained 3 CAUs and the largest 13 CAUs. A local estimate of 𝜏i was obtained by using 2 of 𝜇 and 𝜎 2 for each CAU i. The estimates 𝜇̂ and 𝜎 2 ̂ (i) local estimates 𝜇̂ (i) and 𝜎̂ (i) (i) were obtained from (9.9.5) using only the neighborhood areas of i to calculate the weighted sample mean and variance. This heuristic method of local smoothing is similar to smoothing based on spatial modeling of the areas. Marshall (1991) did not report standard errors of the estimates. For a global estimate, the naive standard
321
EMPIRICAL LINEAR BAYES
error based on (9.9.4) should be adequate because m = 167 is large, but it may be inadequate as a measure of variability for local estimates based on 3–13 CAUs. Karunamuni and Zhang (2003) studied LB and ELB estimation of finite population ∑Ni yij ∕Ni , for unit level models, without covariates. Supsmall area means, Y i = j=1 pose that the population model is a two-stage model of the form yij |𝜃i iid
ind
𝜃(i , 𝜇2 (𝜃i ))
𝜇, ( 𝜎𝑣2 ), 𝜎e2
for each i (j = 1, , Ni ) and 𝜃i = E[𝜇2 (𝜃i )]. We assume that the model , ni , i = 1, , m}. We consider a linear class of holds for the sample {yij ; j = 1, estimators of the realized Y i of the form ai yi + bi , where yi is the ith area sample mean. Minimizing the unconditional MSE, E(ai yi + bi − Y i )2 , with respect to the constants ai and bi , the “optimal” LB estimator of Y i is obtained as ̂ LB Y i = fi yi + 1( − fi )[𝛾i yi + 1( − 𝛾i )𝜇],
(9.9.6)
2 2 2 where 𝑣 + 𝜎e ∕ni ) and fi = ni ∕Ni . If we replace 𝜇 by the BLUE estimator ∑𝛾i = 𝜎𝑣 ∕(𝜎 ∑m 𝜇̃ = m 𝛾 y ∕ 𝛾 , i=1 i i i=1 i we get a first-step ELB estimator similar to the BLUP estimator for the basic unit level model (7.1.1) without covariates and kij = 1. Ghosh and Lahiri (1987) used ANOVA-type estimators of 𝜎e2 and 𝜎𝑣2 given by m ni ∑ ∑
𝜎̂ e2
(yij − yi )2 ∕(n − m) =∶ s2𝑤
=
(9.9.7)
i=1 j=1
where n =
∑m
i=1 ni
and
(9.9.8) 𝜎̂ 𝑣2 = max{0, (s2b − s2𝑤 )(m − 1)g−1 }, ∑ ∑ ∑ m m 2 2 where s2b = m i=1 ni (yi − y⋅ ) ∕(m − 1), with y⋅ = i=1 ni yi ∕n and g = n − i=1 ni ∕n. 2 2 2 2 Replacing 𝜎e and 𝜎𝑣 by 𝜎̂ e and 𝜎̂ 𝑣 in the first-step estimator, we get an ELB estimator of Y i . Note that the first-step estimator depends only on the ratio 𝜏 = 𝜎𝑣2 ∕𝜎e2 . Then, instead of 𝜏̂ = 𝜎̂ 𝑣2 ∕𝜎̂ 2 , we can use an approximately unbiased estimator of 𝜏 along the lines of Morris (1983b): ] [ { } (m − 1)s2b ∗ −1 . (9.9.9) 𝜏 = max 0, − 1 (m − 1)g (m − 3)s2𝑤 The multiplier (m − 1)∕(m − 3) corrects the bias of 𝜏. ̂ The modified ELB estimator is obtained from the first-step estimator by substituting 𝜏 ∗ for 𝜏, and using 𝜇̂ = ∑m ∗ ∑m ∗ ∗ ≠ 0 and 𝜇̂ = ∑m n y ∕n if 𝜏 ∗ = 0. 𝛾 y ∕ 𝛾 if 𝜏 i=1 i i i=1 i i=1 i i ̂ LB The MSE of Y i is given by ̂ LB ̂ LB MSE(Y i ) = E(Y i − Y i )2 = (1 − fi )2 g1i (𝜎𝑣2 , 𝜎e2 ) + ≈ (1 − fi )2 g1i (𝜎𝑣2 , 𝜎e2 ),
1 (1 − fi )𝜎e2 Ni
(9.9.10) (9.9.11)
322
EMPIRICAL BAYES (EB) METHOD
where g1i (𝜎𝑣2 , 𝜎e2 ) = 𝛾i 𝜎e2 ∕ni
(9.9.12)
and 𝛾i = 𝜎𝑣2 ∕(𝜎𝑣2 + 𝜎e2 ∕ni ). The last term in (9.9.10) is negligible if Ni is large, leading to the approximation (9.9.11). Note that (9.9.10) has the same form as the MSE of the Bayes estimator of Y i under the basic unit level model with normal effects 𝑣i and ̂ ELB normal errors eij with equal variance 𝜎e2 . A naive estimator of MSE(Y i ) is obtained ̂ ELB by substituting (𝜎̂ 𝑣2 , 𝜎̂ e2 ) for (𝜎𝑣2 , 𝜎e2 ) in (9.9.10), but it underestimates MSE(Y i ) because it ignores the variability associated with (𝜎̂ 𝑣2 , 𝜎̂ e2 ). Again, it is difficult to find approximately unbiased estimators of MSE without further assumptions. 9.9.2
Posterior Linearity
A different approach, called linear EB (LEB), has also been used to estimate the small area means Y i under the two-stage unit level model studied in Section 9.9.1 (Ghosh and Lahiri 1987). The basic assumption underlying this approach is the posterior linearity condition (Goldstein 1975): E(𝜃i |yi , 𝝀) = ai yi + bi ,
(9.9.13)
where yi is the ni 1 vector of sample observations from the ith area and 𝝀 = (𝜇, 𝜎𝑣2 , 𝜎e2 )T . This condition holds for a variety of distributions on the 𝜃i ’s. However, the distribution of the 𝜃i ’s becomes a conjugate family if the conditional distribution of yij given 𝜃i belongs to the exponential family. For example, the model iid
yij |𝜃i Bernoulli(𝜃i ) with posterior linearity (9.9.13) implies that 𝜃i beta(𝛼, 𝛽), for 𝛼 > 0, 𝛽 > 0. The LEB approach assumes posterior linearity and uses the two-stage model on the yij ’s without distributional assumptions. It leads to an estimator of Y i ̂ LB identical to the estimator under the LB approach, namely, Y i given by (9.9.6). The two approaches are therefore similar, but the ELB approach is more appealing as it avoids further assumptions on the two-stage model by confining to the linear class of estimators. We briefly outline the LEB approach. Using the two-stage model and the posterior linearity condition (9.9.13), it follows from Section 9.9.1 that the optimal ai and bi that minimize E(𝜃i − ai yi − bi )2 are given by a∗i = 𝛾i and b∗i = (1 − 𝛾i )𝜇, where 𝛾i = 𝜎𝑣2 ∕(𝜎𝑣2 + 𝜎e2 ∕ni ) as in Section 9.9.1. Therefore, the Bayes estimator under posterior linearity is given by (9.9.14) E(𝜃i |yi , 𝝀) = 𝛾i yi + 1( − 𝛾i )𝜇, which is identical to the LB estimator. Using (9.9.14), we get E(Y i |yi , 𝝀) = fi yi + 1( − fi )[𝛾i yi + 1( − 𝛾i )𝜇],
(9.9.15)
noting that for any unit j in the nonsampled set ri , it holds that E(yij |yi , 𝝀) = E[E(yij |𝜃i , yi , 𝝀)|yi , 𝝀] = E(𝜃i |yi , 𝝀); see Ghosh and Lahiri (1987). It follows
323
EMPIRICAL LINEAR BAYES
̂ LB from (9.9.15) that E(Y i |yi , 𝝀) is identical to the LB estimator Y i given by (9.9.6). Substituting moment estimators 𝝀̂ in (9.9.15), we get the LEB estimator, ̂ LEB ̂ which is identical to the ELB estimator Ŷ ELB Y i = E(Y i |yi , 𝝀), . i It appears that nothing is gained over the ELB approach by making the posterior linearity assumption and then using the LEB approach. However, it permits the use of the jackknife to obtain an approximately unbiased MSE estimator, unlike the ELB approach, by making use of the orthogonal decomposition ̂ LB ̂ LEB ̂ LB ̂ LEB MSE(Y i ) = E(Y i − Y i )2 + E(Y i − Y i )2 ̂ LEB ̂ LB = g̃ 1i (𝜎𝑣2 , 𝜎e2 ) + E(Y i − Y i )2 = M1i + M2i ,
(9.9.16)
where g̃ 1i (𝜎𝑣2 , 𝜎e2 ) is given by (9.9.10). We obtain a jackknife estimator of the last term in (9.9.16) as m ∑ ̂ LEB ̂ LEB ̂ 2i = m − 1 (Y i,−𝓁 − Y i )2 , (9.9.17) M m 𝓁=1 ̂ LEB ̂ LEB where Y i,−𝓁 = E(Y i |yi , 𝝀̂ −𝓁 ) is obtained from Y i by substituting the delete−𝓁 estî The leading term, M1i , is estimated as mators 𝝀̂ −𝓁 for 𝝀. m
∑ ̂ 1i = g̃ 1i (𝜎̂ 𝑣2 , 𝜎̂ e2 ) − m − 1 M [̃g (𝜎̂ 2 , 𝜎̂ 2 ) − g̃ 1i (𝜎̂ 𝑣2 , 𝜎̂ e2 )]. m 𝓁=1 1i 𝑣,−𝓁 e,−𝓁
(9.9.18)
The jackknife MSE estimator is then given by ̂ LEB ̂ 1i + M ̂ 2i . mseJ (Y i ) = M
(9.9.19)
Results for the infinite population case are obtained by letting fi = 0. The above results are applicable to two-stage models without covariates. Raghunathan (1993) proposed two-stage models with area level covariates, by specifying only the first and second moments (see Section 4.6.5). He obtained a quasi-EB estimator of the small area parameter, 𝜃i , using the following steps: (i) Based on the mean and variance specifications, obtain the conditional quasi-posterior density of 𝜃i given the data and model parameters 𝝀. (ii) Evaluate the quasi-Bayes estimator and the conditional quasi-posterior variance by numerical integration, using the density from step (i). (iii) Use a generalized EM (GEM) algorithm to get quasi-ML estimator 𝝀̂ of the model parameters. (iv) Replace 𝝀 by 𝝀̂ in the quasi-Bayes estimator to obtain the quasi-EB estimator of 𝜃i . Raghunathan (1993) also proposed a jackknife method of estimating the MSE of the quasi-EB estimator, ̂ This method is different from the jackknife taking account of the variability of 𝝀. method of Jiang, Lahiri, and Wan (2002), and its asymptotic properties have not been studied.
324
9.10
EMPIRICAL BAYES (EB) METHOD
CONSTRAINED LB
ind iid ( 𝜎 2 ) and Consider the two-stage model (i) 𝜃̂i |𝜃i 𝜃(i , 𝜓i (𝜃i )) and (ii) 𝜃i 𝜇, 𝜓i = E(𝜓i (𝜃i )). We consider a linear class of estimators of the realized 𝜃i of the form ai 𝜃̂i + bi and determine the constants ai and bi to match the mean 𝜇 and the variance 𝜎 2 of 𝜃i :
E(ai 𝜃̂i + bi ) = 𝜇,
(9.10.1)
E(ai 𝜃̂i + bi − 𝜇)2 = 𝜎 2 .
(9.10.2)
Noting that E(𝜃̂i ) = 𝜇, we get bi = 𝜇 − (1 − ai ) from (9.10.1), and (9.10.2) reduces to a2i E(𝜃̂i − 𝜇)2 = a2i (𝜎 2 + 𝜓i ) = 𝜎 2 (9.10.3) 1∕2
or ai = 𝛾i , where 𝛾i = 𝜎 2 ∕(𝜎 2 + 𝜓i ). The resulting estimator is a constrained LB (CLB) estimator: 1∕2 1∕2 𝜃̂iCLB = 𝛾i 𝜃̂i + 1( − 𝛾i )𝜇 (9.10.4) (Spjøtvoll and Thomsen 1987). Method of moments may be used to estimate the model parameters. The resulting estimator is an empirical CLB estimator. Note that 𝜃̂iLB attaches a smaller weight, 𝛾i , to the direct estimator 𝜃̂i which leads to over-shrinking toward 𝜇 compared to 𝜃̂iCLB . ind As an example, consider the model for binary data: 𝜃̂i |𝜃i 𝜃(i , 𝜃i (1 − 𝜃i )∕ni ) and iid 𝜃i 𝜇, ( 𝜎 2 ), where 𝜃̂i is the sample proportion and ni is the sample size in the ith area. Noting that E[𝜃i (1 − 𝜃i )] = 𝜇(1 − 𝜇) − 𝜎 2 , we get 𝛾i = 𝜎 2
[( ) ]−1 𝜇(1 − 𝜇) 1 . 1− 𝜎2 + ni ni
(9.10.5)
Spjøtvoll and Thomsen (1987) used empirical CLB to study the distribution of m = 100 municipalities in Norway with respect to the proportion of persons not in the labor force. A comparison with the actual distribution in their example showed that the CLB method tracked the actual distribution much better than the LB method. The above CLB method ignores the simultaneous aspect of the problem, by considering each area separately. However, the CLB estimator (9.10.4) is similar to the CB estimator, 𝜃̂iCB . In fact, as shown in (9.8.9), 𝜃̂iCB ≈ 𝜃̂iCLB under the simple model iid iid 𝜃̂i = 𝜃i + ei with ei N(0, 𝜓) independent of 𝜃i N(𝜇, 𝜎𝑣2 ). To take account of the simultaneous aspect of the problem, we can use the method of Section 9.8.1 assuming posterior linearity E(𝜃i |𝜃̂i ) = ai 𝜃̂i + bi . We minimize the posterior expected squared error loss under the two-stage model subject to the constraints (9.8.4) and (9.8.5) on the ensemble mean and variance, respectively. The resulting CLB estimator is equal to (9.8.6) with 𝜃̂iB and V(𝜃i |𝜃̂i , 𝝀) changed to 𝜃̂iLB =
325
*SOFTWARE
𝛾i 𝜃̂i + 1( − 𝛾i )𝜇 and E[(𝜃i − 𝜃̂iLB )2 |𝜃̂i , 𝝀], respectively. This estimator avoids distributional assumptions. Lahiri (1990) called this method “adjusted” Bayes estimation. The posterior variance term E[(𝜃i − 𝜃̂iLB )2 |𝜃̂i , 𝝀] cannot be calculated without additional assumptions; Lahiri (1990) and Ghosh (1992b) incorrectly stated that it is equal ∑ ̂ LB 2 ̂ to 𝛾i 𝜓i . However, for large m, we can approximate m−1 m i=1 E[(𝜃i − 𝜃i ) |𝜃i , 𝝀] by ∑ ∑ m m LB −1 2 −1 ̂ its expectation m i=1 E[(𝜃i − 𝜃i ) |𝝀] = m i=1 𝛾i 𝜓i . Using this approximation, we get the following CLB estimator: ̂ 𝝀)(𝜃̂ LB − 𝜃̂⋅LB ), 𝜃̂iCLB (1) = 𝜃̂⋅LB + a∗ (𝜽, i
(9.10.6)
where { ̂ 𝝀) = a (𝜽, ∗
}1∕2 ∑ (1∕m) m i=1 𝛾i 𝜓i 1+ . ∑ ̂ LB ̂ LB 2 [1∕(m − 1)] m i=1 (𝜃i − 𝜃⋅ )
(9.10.7)
In general, 𝜃̂iCLB (1) differs from 𝜃̂iCLB given by (9.10.4), but for the special case of equal 𝜓i , we get 𝜃̂iCLB (1) ≈ 𝜃̂iCLB for large m, similar to the result (9.8.9). The CLB estimator (9.10.6) depends on the unknown model parameters ̂ we obtain an empirical CLB 𝝀 = (𝜇, 𝜎 2 )T . Replacing 𝝀 by moment estimators 𝝀, estimator.
9.11
*SOFTWARE
As already mentioned, under the basic area level model (6.1.1) with normality, the EB estimator of 𝜃i given in (9.2.3) equals the EBLUP of 𝜃i . Section 6.5 describes the functions of the R package sae that calculate EBLUP estimates and analytical MSE estimates based on the basic area level model. Similarly, equivalence of EB and EBLUP occurs when estimating a small area mean Y i under the basic unit level model (7.1.1) with kij = 1 and with normality. The corresponding functions of the R package sae are described in Section 7.7. For estimation of general nonlinear parameters 𝜏i = h(yPi ) under the basic unit level model (7.1.1) with kij = 1, EB estimates together with parametric bootstrap MSE estimates can be obtained using functions ebBHF() and pbmseebBHF(). The calls to these functions are as follows: ebBHF(formula, dom, data, transform pbmseebBHF(formula, data, transform
selectdom, Xnonsample, MC = 100, = "BoxCox", lambda = 0, constant = 0, indicator) dom, selectdom, Xnonsample, B = 100, MC = 100, = "BoxCox", lambda = 0, constant = 0, indicator)
These functions assume that the target variable for jth individual in ith area is Eij , but the response (or dependent) variable in the basic unit level model is yij = T(Eij ), where T(⋅) is a one-to-one transformation. A typical example of this situation is when
326
EMPIRICAL BAYES (EB) METHOD
Eij is income or expenditure, and we consider the model for yij = log(Eij + c). The , Ni }) = h({T(Eij ); j = 1, , Ni }). target area parameter is 𝜏i = h({yij ; j = 1, Thus, on the left-hand side of formula, we must write the name of the vector containing the sample data on the original target variables Eij . On the right-hand side of formula, we must write the auxiliary variables considered in the basic unit level model separated by “+”. Note that an intercept is assumed by default as in any usual R formula. The target variables Eij , placed on the left-hand side of formula, can be transformed by first adding a constant to it through the argument constant, and then applying a transformation chosen from either the Box–Cox or power families through the argument transform. The value of the parameter for the chosen family of transformations can be specified in the argument lambda. This parameter is set by default to 0, which corresponds to log transformation for both families. Setting lambda=1 means no transformation. Note that these functions fit the basic unit level model for the transformed variables yij = T(Eij ). The target parameter 𝜏i , expressed as , Ni }), an R function of the original target variables Eij , that is, as h({T(Eij ); j = 1, must be specified in argument indicator. For example, if we want to estimate the , Ni }, we just need to specify indicator=median. median of {Eij ; j = 1, The vector with the area codes for sample observations must be specified in dom. The functions allow to select a subset of the areas for estimation, just specifying the vector of selected (unique) area codes in selectdom. Moreover, to generate the nonsample vectors and apply the Monte Carlo approximation to the EB estimator of 𝜏i as given in (9.4.3), we need the values of the auxiliary variables for all nonsampled units. A matrix or data frame with those values must be specified in Xnonsample. Additionally, the desired number of Monte Carlo samples can be specified in MC. The function ebBHF() returns a list of two objects: the first one is called eb and contains the EB estimates for each selected area, and the second one is called fit and contains a summary of the model-fitting results. The function pbmseebBHF() obtains the parametric bootstrap MSE estimates together with EB estimates. It delivers a list with two objects. The first one, est, is another list containing itself two objects: the results of the point estimation process (eb) and a summary of the fitting results (fit). The second object, called mse, contains a data frame with the estimated MSEs for each selected area. Example 9.11.1. Poverty Mapping, with R. In this example, we estimate poverty incidences in Spanish provinces (areas) using the predefined data set incomedata, which contains synthetic unit level data on income Eij and other sociological variables for a sample of individuals, together with province identifiers. The following variables from the data set incomedata will be used: province name (provlab), province code (prov), income (income), sampling weight (weight), education level (educ), labor status (labor), and finally the indicators of each of the categories of educ and labor. We calculate EB estimates of province poverty incidences based on the basic unit level model (with kij = 1), for a transformation of the variable income and the categories of education level and of labor status as auxiliary variables. The EB method described in Section 9.4 requires (at least approximately) normality of the
327
0.4 0.0
0.2
Density
0.6
0.8
−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5
Residuals
*SOFTWARE
0
5,000
10,000
15,000
Index (a)
−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 Residuals (b)
Figure 9.4 Index Plot of Residuals (a) and Histogram of Residuals (b) from the Fitting of the Basic Unit Level Model with Response Variable log(income+constant).
response variable in the model. The histogram of income is severely right-skewed, but log(income + 3, 500) leads to model residuals with an approximately symmetric distribution (see Figure 9.4). The poverty incidence for province i is obtained by taking 𝛼 = 0 in the FGT poverty indicator F𝛼i given in (9.4.10), that is, F0i . In R, it can be calculated as the province mean of the indicator variable taking value 1 when the person’s income is below the poverty line z and 0 otherwise. The poverty line is defined as 0.6 the median income, which turns out to be z = 6,557.143. We first create the function povertyincidence() defining the target parameter 𝜏i = F0i as function of y=income: R > z povertyincidence data("Xoutsamp")
328
EMPIRICAL BAYES (EB) METHOD
Now we create the vector with the codes for the five selected provinces. These codes are taken as the unique province codes that appear in the first column of Xoutsamp data set. Then we construct the data frame Xoutsamp_AuxVar, containing the values of the selected auxiliary variables (education levels and labor status) for nonsampled individuals from these five selected provinces: R> provincecodes provincelabels Xoutsamp_AuxVar set.seed(123) R> EB EB$fit$summary Linear mixed-effects model fit by REML Data: NULL AIC BIC logLik 18,980.72 19,034.99 -9,483.361 Random effects: Formula: ̃1 | as.factor(dom) (Intercept) Residual StdDev: 0.09436138 0.4179426 Fixed effects: ys ̃ -1 + Xs Value Std.Error DF t-value p-value Xs(Intercept) 9.505176 0.014384770 17,143 660.7805 0 Xseduc1 -0.124043 0.007281270 17,143 -17.0359 0 Xseduc3 0.291927 0.010366323 17,143 28.1611 0 Xslabor1 0.145985 0.006915979 17,143 21.1084 0 Xslabor2 -0.081624 0.017082634 17,143 -4.7782 0 Correlation: Xs(In) Xsedc1 Xsedc3 Xslbr1 Xseduc1 -0.212 Xseduc3 -0.070 0.206 Xslabor1 -0.199 0.128 -0.228 Xslabor2 -0.079 0.039 -0.039 0.168
329
*SOFTWARE Standardized Within-Group Residuals: Min Q1 Med Q3 -4.2201202 -0.6617181 0.0203607 0.6881828
Max 3.5797393
Number of Observations: 17,199 Number of Groups: 52
Checking model assumptions is crucial in the EB method since the optimality properties of the EB estimates depend on the extent to which those model assumptions are true. To detect departures from normality of the transformed income, we can display the usual residual plots. The following R commands draw an index plot and a histogram of residuals: R> plot(EB$fit$residuals, xlab = "Index", ylab = "Residuals", + cex.axis = 1.5, cex.lab = 1.5) R> hist(EB$fit$residuals, prob = TRUE, xlab = "Residuals", + main = "", cex.axis = 1.5, cex.lab = 1.5)
Figure 9.4 displays the two mentioned plots, which show no evidence of serious model departure. Finally, we compute parametric bootstrap MSE estimates and calculate CVs of EB estimates. This process might be slow for large number of bootstrap or Monte Carlo iterations B and MC, respectively, large sample size or large number of auxiliary variables. Note that function pbmseebBHF() gives again the EB estimates: R> set.seed(123) R> pbmse.EB pbcv.EB results.EB results.EB ProvinceIndex ProvinceName SampleSize EB cv.EB 1 42 Soria 20 0.2104329 21.06776 2 5 Avila 58 0.1749877 19.49466 3 34 Palencia 72 0.2329916 11.57829 4 44 Teruel 72 0.2786618 11.89621 5 40 Segovia 58 0.2627178 13.21378
330
EMPIRICAL BAYES (EB) METHOD
Finally, we obtain direct estimates for comparison: R> poor DIR results.DIR 0) very small (say 0.001) to reflect lack of prior information on 𝜇, 𝜎𝑣2 , and 𝜎e2 . Here, G(a, b) denotes a gamma distribution with shape parameter a and scale parameter b and that the variance of G(a0 , a0 ) is 1∕a0 , which becomes very large as a0 → 0. The posterior resulting from the above priors remains proper as 𝜎02 → ∞, which is equivalent to choosing f (𝜇) ∝1, but it becomes improper as a0 → 0. Therefore, the posterior is nearly improper (or barely proper) for very small a0 , and this feature can affect the convergence of the Gibbs sampler. Alternative choices are 𝜎𝑣2 ∼ uniform(0, 1∕a0 ) and 𝜎e2 ∼ uniform(0, 1∕a0 ), which avoid this difficulty in the sense that the posterior remains proper as a0 → 0. Gelman (2006) studied the choice of prior on 𝜎𝑣2 for the simple nested error model without covariates. He noted that the posterior inferences based on the prior 𝜎𝑣 2 ∼ G(a0 , a0 ) with very small a0 > 0 becomes very sensitive to the choice of a0 for data sets in which small values of 𝜎𝑣2 are possible. He recommended the flat prior on 𝜎𝑣 , f (𝜎𝑣 ) ∝1 if m ≥ 5. He also suggested the use of a proper uniform(0, A) prior on 𝜎𝑣 with sufficiently large A, say A 100, if m ≥ 5. Resulting posterior inferences are not sensitive to the choice of A and they can be implemented in WinBUGS, which requires a proper prior. We have noted in Section 10.1 that it is desirable to choose diffuse priors that lead to well-calibrated inferences. Browne and Draper (2006) compared frequentist performances of posterior quantities under the simple nested error model and G(a0 , a0 ) or uniform(0, 1∕a0 ) priors on the variance parameters. In particular, for 𝜎𝑣2 they examined the bias of the posterior mean, median, and mode and the Bayesian interval coverage in repeated sampling. All the Gibbs conditionals have closed forms here, so Gibbs sampling was used to generate samples from the posterior. They found that the posterior quantities are generally insensitive to the specific choice of a0 around 0.001 (default setting used in BUGS with gamma prior). In terms of bias, the posterior median performed better than the posterior mean (HB estimator) for the gamma prior, while the posterior mode performed much better than the posterior mean for the uniform prior. Bayesian intervals for uniform or gamma priors did not attain nominal coverage when the number of small areas, m, and/or the variance ratio 𝜏 𝜎𝑣2 ∕𝜎e2 are small. Browne and Draper (2006) did not study the frequentist behavior of the posterior quantities associated with the small area means 𝜇i 𝜇 + 𝑣i . Datta, Ghosh, and Kim (2001) developed noninformative priors for the simple nested error model, called probability matching priors, and focused on the variance ratio 𝜏. These priors ensure that the frequentist coverage of Bayesian intervals for 𝜏 approaches the
MCMC METHODS
341
nominal level asymptotically, as m → ∞. In Section 10.3, we consider priors for the basic area level model that make the posterior variance of a small area mean, 𝜇i , approximately unbiased for the MSE of EB/HB estimators, as m → ∞. Single Run versus Multiple Runs Sections 10.2.2 and 10.2.3 discussed the genera, 𝜼(d+D) . One single long run may provide tion of one single long run 𝜼(d+1) , reliable Monte Carlo estimates of posterior quantities by choosing D sufficiently large, but it may leave a significant portion of the space generated by the posterior, f (𝜼|y), unexplored. To avoid this problem, Gelman and Rubin (1992) and others recommended the use of multiple parallel runs with different starting points, leading to parallel samples. Good starting values are required to generate multiple runs, and it may not be easy to find such values. On the other hand, for generating one single run, REML estimates of model parameters, 𝝀, and EB estimates of small area parameters, 𝝁, may work quite well as starting points. For multiple runs, Gelman and Rubin (1992) recommended generating starting points from an “overdispersed” distribution compared to the target distribution 𝜋(𝜼|y). They proposed some methods for generating overdispersed starting values, but general methods that lead to good multiple starting values are not available. Note that the normalizing factor of 𝜋(𝜼|y) is not known in a closed form. Multiple runs can be wasteful because initial burn-in periods are discarded from each run, although this may not be a serious limitation if parallel processors are used to generate the parallel samples. Gelfand and Smith (1990) used many short runs, each consisting of (d + 1) 𝜼-values, and kept only the last observation from each run. , 𝜼(d+1) (L), where 𝜼(d+1) (𝓁) This method provides independent samples 𝜼(d+1) (1), is the last 𝜼-value of the 𝓁th run (𝓁 1, , L), but it discards most of the generated values. Gelman and Rubin (1992) used small L (say 10) and generated 2d values for each run and kept the last d values from each run, leading to L independent sets of 𝜼-values. This approach facilitates convergence diagnostics (see later). Note that it is not necessary to generate independent samples for getting Monte Carlo estimates of posterior quantities because of the ergodic theorem for Markov chains. However, independent samples facilitate the calculation of Monte Carlo standard errors. Proponents of one single long run argue that comparing chains can never “prove” convergence. On the other hand, multiple runs’ proponents assert that comparing seemingly converged runs might disclose real differences if the runs have not yet attained stationarity. Burn-in Length The length of burn-in, d, depends on the starting point 𝜼(0) and the convergence rate of P(k) (⋅) to the stationary distribution 𝜋(⋅). Convergence rates have been studied in the literature (Roberts and Rosenthal 1998), but it is not easy to use these rates to determine d. Note that the convergence rates of different MCMC algorithms may vary significantly and also depend on the target distribution 𝜋(⋅). Convergence diagnostics, based on the MCMC output 𝜼(k) , are often used in practice to determine the burn-in length d. At least 13 convergence diagnostics have been proposed in the literature (see Cowles and Carlin 1996). A diagnostic
342
HIERARCHICAL BAYES (HB) METHOD
measure based on multiple runs and classical analysis of variance (ANOVA) is currently popular (Gelman and Rubin 1992). Suppose h(𝜼) denotes a scalar summary of 𝜼; for example, h(𝜼) 𝜇i , the mean of the ith small area. Denote the values of h(𝜼) for the 𝓁th run as h𝓁,d+1 , , h𝓁,2d , 𝓁 1, , L. Calculate ∑L 2 the between-run variance B d 𝓁 1 (h𝓁⋅ h⋅⋅ ) ∕(L 1) and the within-run ∑L ∑2d ∑2d variance W h𝓁⋅ )2 ∕ L(d 1) , where h𝓁⋅ 𝓁 1 k d+1 (h𝓁,k k d+1 h𝓁k ∕d ∑L and h⋅⋅ h ∕L. Using the two variance components B and W, we calculate an 𝓁 1 𝓁⋅ estimate of the variance of h(𝜼) in the target distribution as V̂
1
d d
1 W + B. d
(10.2.10)
Under stationarity, V̂ is unbiased but it is an overestimate if the L points h1,d+1 , , hL,d+1 are overdispersed. Using the latter property, we calculate the estimated “potential scale reduction” factor R̂
̂ V∕W.
(10.2.11)
If stationarity is attained, R̂ will be close to 1. Otherwise, R̂ will be significantly larger than 1, suggesting a larger burn-in length d. It is necessary to calculate R̂ for all scalar summaries of interest, for example, all small area means 𝜇i . Unfortunately, most of the proposed diagnostics have shortcomings. Cowles and Carlin (1996) compared the performance of 13 convergence diagnostics, including ̂ in two simple models and concluded that “all of the methods can fail to detect the R, sorts of convergence failure that they were designed to identify”. Methods of generating independent samples from the exact stationary distribution of a Markov chain, that is, 𝜋(𝜼) f (𝜼|y), have been proposed recently. These methods eliminate the difficulties noted above, but currently the algorithms are not easy to implement. We refer the reader to Casella, Lavine, and Robert (2001) for a brief introduction to the proposed methods, called perfect (or exact) samplings. 10.2.6
Model Determination
MCMC methods are also used for model determination. In particular, methods based on Bayes factors (BFs), posterior predictive densities and cross-validation predictive densities, are employed for model determination. These approaches require the specification of priors on the model parameters associated with the competing models. Typically, noninformative priors are employed with the methods based on posterior predictive and cross-validation predictive densities. Because of the dependence on priors, some authors have suggested a hybrid strategy, in which prior-free, frequentist methods are used in the model exploration phase, such as those given in Chapters 5–9 and Bayesian diffuse-prior MCMC methods for inference from the selected model (see, e.g., Browne and Draper 2006). We refer the reader to Gelfand (1996) for an excellent account of model determination procedures using MCMC methods.
343
MCMC METHODS
Bayes Factors Suppose that M1 and M2 denote two competing models with associated parameters (random effects and model parameters) 𝜼1 and 𝜼2 , respectively. We denote the marginal densities of the “observables” y under M1 and M2 by f (y|M1 ) and f (y|M2 ), respectively. Note that f (y|Mi )
∫
f (y|𝜼i , Mi )f (𝜼i |Mi )d𝜼i ,
where f (𝜼i |Mi ) is the density of 𝜼i under model Mi (i observations as yobs . The BF is defined as
1, 2). We denote the actual
f (yobs |M1 ) , f (yobs |M2 )
BF12
(10.2.12)
(10.2.13)
where yobs denotes the actual observations. It provides relative weight of evidence for M1 compared to M2 . BF12 may be interpreted as the ratio of the posterior odds for M1 versus M2 to prior odds for M1 versus M2 , noting that the posterior odds is f (M1 |y) ∕f (M2 |y) and the prior odds is f (M1 ) ∕f (M2 ), where f (Mi |y)
f (y|Mi )f (Mi ) ∕f (y),
(10.2.14)
f (Mi ) is the prior on Mi and f (y)
f (y|M1 )f (M1 ) +f (y|M2 )f (M2 ).
(10.2.15)
Kass and Raftery (1995) classified the evidence against M2 as follows: (i) BF12 between 1 and 3: not worth more than a bare mention; (ii) BF12 between 3 and 20: positive; (iii) BF12 between 20 and 50: strong; (iv) BF12 > 50: very strong. BF is appealing for model selection, but f (y|Mi ) is necessarily improper if the prior on the model parameters 𝝀i is improper, even if the posterior f (𝜼i |y, Mi ) is proper. A proper prior on 𝝀i (i 1, 2) is needed to implement BF. However, the use of BF with diffuse proper priors usually gives bad answers (Berger and Pericchi 2001). Gelfand (1996) noted several limitations of BF, including the above-mentioned impropriety of f (y|Mi ), and concluded that “use of the Bayes factor often seems inappropriate in real applications.” Sinharay and Stern (2002) showed that the BF can be very sensitive to the choice of prior distributions for the parameters of M1 and M2 . We refer the reader to Kass and Raftery (1995) for methods of calculating BFs. Posterior Predictive Densities The posterior predictive density, f (y|yobs ), is defined as the predictive density of new independent observations, y, under the model, given yobs . We have f (y|yobs )
∫
f (y|𝜼)f (𝜼|yobs )d𝜼,
(10.2.16)
344
HIERARCHICAL BAYES (HB) METHOD
and the marginal posterior predictive density of an element yr of y is given by f (yr |yobs )
∫
f (yr |𝜼)f (𝜼|yobs )d𝜼.
(10.2.17)
, d + D , we can draw a sample y(k) Using the MCMC output 𝜼(k) ; k d + 1, (k) from f (y|yobs ) as follows: For each 𝜼 , draw y(k) from f (y|𝜼(k) ). Furthermore, y(k) r constitutes a sample from the marginal density f (yr |yobs ), where y(k) r is the rth element of y(k) . To check the overall fit of a proposed model to data yobs , Gelman and Meng (1996) proposed the criterion of posterior predictive p-value, noting that the hypothetical replications y(k) should look similar to yobs if the assumed model is reasonably accurate. Let T(y, 𝜼) denote a measure of discrepancy between y and 𝜼. The posterior predictive p-value is then defined as pp
P T(y, 𝜼) ≥ T(yobs , 𝜼)|yobs .
(10.2.18)
The MCMC output 𝜼(k) may be used to approximate pp by d+D
p̂p
1 ∑ I T(y(k) , 𝜼(k) ) ≥ T(yobs , 𝜼(k) ) , D k d+1
(10.2.19)
where I(⋅) is the indicator function taking the value 1 when its argument is true and 0 otherwise. A limitation of the posterior predictive p-value is that it makes “double use” of the data yobs , first to generate y(k) from f (y|yobs ) and then to compute p̂p given by (10.2.19). This double use of the data can induce unnatural behavior, as demonstrated by Bayarri and Berger (2000). They proposed two alternative p-measures, named the partial posterior predictive p-value and the conditional predictive p-value, that attempt to avoid double use of the data. These measures, however, are more difficult to implement than the posterior predictive p-value, especially for small area models. If the model fits yobs , then the two values T(y(k) , 𝜼(k) ) and T(yobs , 𝜼(k) ) should tend to be similar for most k, and p̂p should be close to 0.5. Extreme value of p̂p (closer to 0 or 1) suggest poor fit. It is informative to plot the realized values T(yobs , 𝜼(k) ) versus the predictive values T(y(k) , 𝜼(k) ). If the model is a good fit, then about half the points in the scatter plot would fall above the 45∘ line and the remaining half below the line. Brooks, Catchpole, and Morgan (2000) studied competing animal survival models, and used the above scatter plots for four competing models, called discrepancy plots, to select a model. Another measure of fit is given by the posterior expected predictive deviance (EPD), E Δ(y, yobs )|yobs , where Δ(y, yobs ) is a measure of discrepancy between yobs and y. For count data yr ; r 1, , n , we can use a chi-squared measure given by n ∑
Δ(y, yobs )
(yr,obs r 1
yr )2 ∕(yr + 0.5).
(10.2.20)
345
MCMC METHODS
It is a general measure of agreement. Nandram, Sedransk, and Pickle (1999) studied (10.2.20) and some other measures in the context of disease mapping (see Section 10.11). Using the predictions y(k) , the posterior EPD may be estimated as d+D
Ê Δ(y, yobs )|yobs
1 ∑ Δ(y(k) , yobs ). D k d+1
(10.2.21)
Note that the deviance measure (10.2.21) also makes double use of the data. The deviance measure is useful for comparing the relative fits of candidate models. The model with the smallest deviance value may then be used to check its overall fit to the data yobs , using the posterior predictive p-value. Cross-validation Predictive Densities Let y(r) be the vector obtained by removing the rth element yr from y. The cross-validation predictive density of yr is given by f (yr |y(r) )
∫
f (yr |𝜼, y(r) )f (𝜼|y(r) )d𝜼.
(10.2.22)
This density suggests what values of yr are likely when the model is fitted to y(r) . The actual observation yr,obs may be compared to the hypothetical values yr in a variety of ways to see whether yr,obs , for each r, supports the model. In most applications, f (yr |𝜼, y(r) ) f (yr |𝜼), that is, yr and y(r) are conditionally independent given 𝜼. Also, f (𝜼|y(r) ) is usually proper if f (𝜼|y) is proper. As a result, f (yr |y(r) ) will remain proper, unlike f (y) used in the computation of the BF. If f (y) is proper, then the set f (yr |y(r) ) is equivalent to f (y) in the sense that f (y) is uniquely determined from f (yr |y(r) ) and vice versa. The product of the densities f (yr |y(r) ) is often used as a substitute for f (y) to avoid an improper marginal f (y). This leads to a pseudo-Bayes factor (PBF) PBF12
∏ f (yr,obs |y(r),obs , M1 ) r
,
f (yr,obs |y(r),obs , M2 )
(10.2.23)
which is often used as a substitute for BF. However, PBF cannot be interpreted as the ratio of posterior odds to prior odds, unlike BF. Global summary measures, such as the posterior predictive p-value (10.2.18) and the posterior expected predictive deviance (10.2.21), are useful for checking the overall fit of a model, but not helpful for discovering the reasons for poor global performance. The univariate density f (yr |y(r),obs ) can be used through a “checking” function c(yr , yr,obs ) to see whether yr,obs , for each r, supports the model. Gelfand (1996) proposed a variety of checking functions and calculated their expectations over f (yr |y(r),obs ). In particular, choosing c1 (yr , yr,obs )
1 Iy 2𝜖 r,obs
𝜖 ≤ yr ≤ yr,obs + 𝜖 ,
(10.2.24)
346
HIERARCHICAL BAYES (HB) METHOD
then taking its expectation, and letting 𝜖 → 0, we obtain the conditional predictive ordinate (CPO): (10.2.25) CPOr f (yr,obs |y(r),obs ). Models with larger CPO values provide better fit to the observed data. A plot of CPOr versus r for each model is useful for comparing the models visually. We can easily see from a CPO plot which models are better than the others, which models are indistinguishable, which points, yr,obs , are poorly fit under all the competing models, and so on. If two or more models are indistinguishable and provide good fits, then we should choose the simplest among the models. We approximate CPOr from the MCMC output 𝜼(k) as follows: [ ̂r CPO
]
d+D
1 1 ∑ D k d+1 f (yr,obs |y(r),obs , 𝜼(k) )
f̂ (yr,obs |y(r),obs )
1
.
(10.2.26)
̂r is the harmonic mean of the D ordinates f (yr,obs |y(r),obs , 𝜼(k) ), k d + That is, CPO 1, , d + D. For several small area models, yr and y(r) are conditionally independent given 𝜼, that is, f (yr,obs |y(r),obs , 𝜼(k) ) f (yr,obs |𝜼(k) ) in (10.2.26). A proof of (10.2.26) is given in Section 10.18. A checking function of a “residual” form is given by c2 (yr , yr,obs ) with expectation d2r
yr,obs
yr,obs
yr
(10.2.27)
E(yr |y(r),obs ). We can use a standardized form ∗ d2r
√
d2r
(10.2.28)
V(yr |y(r),obs )
∗ versus r, similar to standard residual analysis, where V(y |y and plot d2r r (r),obs ) is the variance of yr with respect to f (yr |y(r),obs ). Another choice of checking function is c3r I( ∞< yr ≤ yr,obs ) with expectation
d3r
P(yr ≤ yr,obs |y(r),obs ).
(10.2.29)
We can use d3r to measure how unlikely yr is under f (yr |y(r),obs ). If yr is discrete, then we can use P(yr yr,obs |y(r),obs ). Yet another choice is given by c4 (yr , y(r),obs ) I(yr ∈ Br ), where Br
yr ∶ f (yr |y(r),obs ) ≤ f (yr,obs |y(r),obs ) .
(10.2.30)
P(Br |y(r),obs ).
(10.2.31)
Its expectation is equal to d4r
We can use d4r to see how likely f (yr |y(r),obs ) will be smaller than the corresponding conditional predictive ordinate, CPOr .
347
BASIC AREA LEVEL MODEL
To estimate the measures d2r , d3r , and d4r , we need to evaluate expectations of the form E a(yr )|y(r),obs for specified functions a(yr ). Suppose that f (yr |y(r),obs , 𝜼) f (yr |𝜼). Then we can estimate the expectations as 1 f̂ (yr |y(r),obs ) Dk
Ê a(yr )|y(r),obs
d+D
∑
br (𝜼(k) ) , (k) d+1 f (yr,obs |𝜼 )
(10.2.32)
where br (𝜼) is the conditional expectation of a(yr ) over yr given 𝜼. A proof of (10.2.32) is given by Section 10.18. If a closed-form expression for br (𝜼) is not available, then we have to draw a sample from f (yr |y(r) ) and estimate E a(yr )|y(r),obs directly. Gelfand (1996) has given a method of drawing such a sample without rerunning the MCMC sampler, using only y(r),obs . 10.3
10.3 BASIC AREA LEVEL MODEL
In this section, we apply the HB approach to the basic area level model (6.1.1), assuming a prior distribution on the model parameters (𝜷, σ_v²). We first consider the case of known σ_v², assume a "flat" prior on 𝜷 given by f(𝜷) ∝ 1, and rewrite (6.1.1) as an HB model:
$$\text{(i)}\;\; \hat{\theta}_i \mid \theta_i, \boldsymbol{\beta}, \sigma_v^2 \overset{\text{ind}}{\sim} N(\theta_i, \psi_i), \quad i = 1, \ldots, m,$$
$$\text{(ii)}\; \theta_i \mid \boldsymbol{\beta}, \sigma_v^2 \overset{\text{ind}}{\sim} N(\mathbf{z}_i^T \boldsymbol{\beta},\, b_i^2 \sigma_v^2), \quad i = 1, \ldots, m, \qquad (10.3.1)$$
$$\text{(iii)}\; f(\boldsymbol{\beta}) \propto 1.$$
We then extend the results to the case of unknown σ_v² by replacing (iii) in (10.3.1) by
$$\text{(iii)}'\;\; f(\boldsymbol{\beta}, \sigma_v^2) = f(\boldsymbol{\beta}) f(\sigma_v^2) \propto f(\sigma_v^2), \qquad (10.3.2)$$
where f(σ_v²) is a prior on σ_v².

10.3.1 Known σ_v²
Straightforward calculations show that the posterior distribution of θ_i given the data 𝜽̂ = (θ̂_1, …, θ̂_m)^T and σ_v², under the HB model (10.3.1), is normal with mean equal to the best linear unbiased prediction (BLUP) estimator θ̃_i^H and variance equal to M_{1i}(σ_v²) given by (6.1.7). Thus, the HB estimator of θ_i is
$$\tilde{\theta}_i^{HB}(\sigma_v^2) = E(\theta_i \mid \hat{\boldsymbol{\theta}}, \sigma_v^2) = \tilde{\theta}_i^{H}, \qquad (10.3.3)$$
and the posterior variance of θ_i is
$$V(\theta_i \mid \hat{\boldsymbol{\theta}}, \sigma_v^2) = M_{1i}(\sigma_v^2) = \mathrm{MSE}(\tilde{\theta}_i^{H}). \qquad (10.3.4)$$
Hence, when 𝜎𝑣2 is assumed to be known and f (𝜷) ∝1, the HB and BLUP approaches under normality lead to identical point estimates and measures of variability.
10.3.2 *Unknown σ_v²: Numerical Integration
In practice, σ_v² is unknown and it is necessary to take account of the uncertainty about σ_v² by assuming a prior, f(σ_v²), on σ_v². The HB model is given by (i) and (ii) of (10.3.1) and (iii)′ given by (10.3.2). We obtain the HB estimator of θ_i as
$$\hat{\theta}_i^{HB} = E(\theta_i \mid \hat{\boldsymbol{\theta}}) = E_{\sigma_v^2}\bigl[\tilde{\theta}_i^{HB}(\sigma_v^2)\bigr], \qquad (10.3.5)$$
where E_{σ_v²} denotes the expectation with respect to the posterior distribution of σ_v², f(σ_v² | 𝜽̂). The posterior variance of θ_i is given by
$$V(\theta_i \mid \hat{\boldsymbol{\theta}}) = E_{\sigma_v^2}\bigl[M_{1i}(\sigma_v^2)\bigr] + V_{\sigma_v^2}\bigl[\tilde{\theta}_i^{HB}(\sigma_v^2)\bigr], \qquad (10.3.6)$$
where V_{σ_v²} denotes the variance with respect to f(σ_v² | 𝜽̂). It follows from (10.3.5) and (10.3.6) that the evaluation of θ̂_i^HB and V(θ_i | 𝜽̂) involves only single-dimensional integrations.
The posterior f(σ_v² | 𝜽̂) may be obtained from the restricted likelihood function L_R(σ_v²) as
$$f(\sigma_v^2 \mid \hat{\boldsymbol{\theta}}) \propto L_R(\sigma_v^2) f(\sigma_v^2), \qquad (10.3.7)$$
where
$$\log L_R(\sigma_v^2) = \mathrm{const} - \frac{1}{2} \sum_{i=1}^{m} \log(\sigma_v^2 b_i^2 + \psi_i) - \frac{1}{2} \log \left| \sum_{i=1}^{m} (\sigma_v^2 b_i^2 + \psi_i)^{-1} \mathbf{z}_i \mathbf{z}_i^T \right| - \frac{1}{2} \sum_{i=1}^{m} \bigl[\hat{\theta}_i - \mathbf{z}_i^T \tilde{\boldsymbol{\beta}}(\sigma_v^2)\bigr]^2 \big/ (\sigma_v^2 b_i^2 + \psi_i). \qquad (10.3.8)$$
Here, z_i is a p × 1 vector of covariates and 𝜷̃(σ_v²) = 𝜷̃ is the weighted least squares estimator of 𝜷 given in equation (6.1.5); see Harville (1977). Under a "flat" prior f(σ_v²) ∝ 1, the posterior f(σ_v² | 𝜽̂) is proper provided m > p + 2, and proportional to L_R(σ_v²); note that f(σ_v²) is improper. More generally, for any improper prior f(σ_v²) ∝ h(σ_v²), the posterior f(σ_v² | 𝜽̂) will be proper if h(σ_v²) is a bounded function of σ_v² and m > p + 2. Note that h(σ_v²) = 1 for the flat prior. Morris (1983a) and Ghosh (1992a) used the flat prior f(σ_v²) ∝ 1.
The posterior mean of σ_v² under the flat prior f(σ_v²) ∝ 1 may be expressed as
$$\hat{\sigma}_{vHB}^2 := E(\sigma_v^2 \mid \hat{\boldsymbol{\theta}}) = \int_0^{\infty} \sigma_v^2 L_R(\sigma_v^2)\, d\sigma_v^2 \Big/ \int_0^{\infty} L_R(\sigma_v^2)\, d\sigma_v^2. \qquad (10.3.9)$$
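The single-dimensional integrations in (10.3.5), (10.3.6), and (10.3.9) are straightforward to carry out on a grid of σ_v² values. The following Python/NumPy sketch does this for the special case b_i = 1 under the flat prior f(σ_v²) ∝ 1; it is a minimal illustration, and the grid limits, function name, and variable names are arbitrary choices rather than prescriptions from the text.

import numpy as np

def area_level_hb_flat_prior(theta_hat, psi, Z, grid=np.linspace(1e-6, 200.0, 4000)):
    """HB for the basic area level model with b_i = 1, flat priors on beta and
    sigma_v^2, using one-dimensional numerical integration over sigma_v^2
    (equations (10.3.5), (10.3.6), and (10.3.9))."""
    m, p = Z.shape

    def profile(sv2):
        V = sv2 + psi                                    # sigma_v^2 + psi_i
        W = 1.0 / V
        ZtWZ = Z.T @ (Z * W[:, None])
        beta = np.linalg.solve(ZtWZ, Z.T @ (W * theta_hat))   # beta_tilde(sigma_v^2)
        resid = theta_hat - Z @ beta
        # log restricted likelihood (10.3.8), up to an additive constant
        logLR = -0.5 * (np.sum(np.log(V)) + np.linalg.slogdet(ZtWZ)[1]
                        + np.sum(resid**2 * W))
        gamma = sv2 / V
        theta_B = gamma * theta_hat + (1.0 - gamma) * (Z @ beta)   # BLUP given sigma_v^2
        g1 = gamma * psi
        g2 = (1.0 - gamma)**2 * np.einsum('ij,jk,ik->i', Z, np.linalg.inv(ZtWZ), Z)
        return logLR, theta_B, g1 + g2                   # M_1i = g_1i + g_2i

    logL, thetaB, M1 = zip(*(profile(s) for s in grid))
    w = np.exp(np.array(logL) - max(logL)); w /= w.sum()         # posterior of sigma_v^2
    thetaB, M1 = np.array(thetaB), np.array(M1)
    sv2_hb = np.sum(w * grid)                                    # (10.3.9)
    theta_hb = w @ thetaB                                        # (10.3.5)
    post_var = w @ M1 + w @ (thetaB - theta_hb)**2               # (10.3.6)
    return sv2_hb, theta_hb, post_var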
The posterior mean estimator (10.3.9) is always unique and positive, unlike the posterior mode or the REML estimator of σ_v². Also, the HB estimator θ̂_i^HB attaches a positive weight γ̂_i^HB to the direct estimator θ̂_i, where γ̂_i^HB = E[γ_i(σ_v²) | 𝜽̂] is obtained from (10.3.9) by changing σ_v² to γ_i(σ_v²) = γ_i. On the other hand, if the REML estimator σ̂²_{vRE} = 0, the empirical best linear unbiased prediction (EBLUP) (or EB) estimator will give a zero weight to θ̂_i for all
the small areas, regardless of the area sample sizes. Because of the latter difficulty, the HB estimator θ̂_i^HB may be more appealing (Bell 1999).
The frequentist bias of σ̂²_{vHB} can be substantial when σ_v² is small (Browne and Draper 2006). However, θ̂_i^HB may not be affected by the bias of σ̂²_{vHB}, because the weighted average a_i θ̂_i + (1 − a_i) z_i^T 𝜷̂ is approximately unbiased for θ_i for any choice of the weight a_i (0 ≤ a_i ≤ 1), provided the linking model (ii) of the HB model (10.3.1) is correctly specified. The posterior variance (10.3.6), computed similarly to (10.3.9), is used as a measure of uncertainty associated with θ̂_i^HB.
As noted in Section 10.1, it is desirable to select a "matching" improper prior that leads to well-calibrated inferences. In particular, the posterior variance should be second-order unbiased for the MSE, that is, E[V(θ_i | 𝜽̂)] = MSE(θ̂_i^HB) + o(m^{-1}). Such a dual justification is very appealing to survey practitioners. Datta, Rao, and Smith (2005) showed that the matching prior for the special case of b_i = 1 is given by
$$f_i(\sigma_v^2) \propto (\sigma_v^2 + \psi_i)^2 \sum_{\ell=1}^{m} (\sigma_v^2 + \psi_\ell)^{-2}. \qquad (10.3.10)$$
The proof of (10.3.10) is quite technical, and we refer the reader to Datta et al. (2005) for details. The matching prior (10.3.10) is a bounded function of σ_v², so that the posterior is proper provided m > p + 2. Also, the prior (10.3.10) depends collectively on the sampling variances for all the areas as well as on the individual sampling variance, ψ_i, of the ith area. Strictly speaking, a prior on the common parameter σ_v² should not vary with area i. Hence, we should use the area-specific matching prior (10.3.10) when area i is of primary interest. The same prior is also employed for the remaining areas, although one could argue in favor of using different priors for different areas.
For the balanced case, ψ_i = ψ for all i, the matching prior (10.3.10) reduces to the flat prior f(σ_v²) ∝ 1 and the resulting HB inferences have dual justification. However, the use of a flat prior when the sampling variances vary significantly across areas could lead to posterior variances not tracking the MSE.
The matching prior (10.3.10) may be generalized to the case of matching a weighted average of expected posterior variances to the corresponding weighted average of MSEs in the following sense:
$$E\left[\sum_{\ell=1}^{m} a_\ell V(\theta_\ell \mid \hat{\boldsymbol{\theta}})\right] = \sum_{\ell=1}^{m} a_\ell\, \mathrm{MSE}(\hat{\theta}_\ell^{H}) + o(m^{-1}), \qquad (10.3.11)$$
where the a_ℓ are specified weights satisfying a_ℓ ≥ 0, ℓ = 1, …, m, and Σ_{ℓ=1}^m a_ℓ = 1. For this case, Ganesh and Lahiri (2008) obtained a matching prior given by
$$f_{GL}(\sigma_v^2) \propto \frac{\sum_{\ell=1}^{m} (\sigma_v^2 + \psi_\ell)^{-2}}{\sum_{\ell=1}^{m} a_\ell\, \psi_\ell^2\, (\sigma_v^2 + \psi_\ell)^{-2}}. \qquad (10.3.12)$$
By letting a_ℓ = ψ_ℓ^{-2} / Σ_{t=1}^m ψ_t^{-2}, the prior (10.3.12) reduces to the flat prior f(σ_v²) ∝ 1. The average moment-matching prior is obtained by letting a_ℓ = 1/m for ℓ = 1, …, m.
The area-specific matching prior (10.3.10) is also a particular case of (10.3.12), obtained by letting a_i = 1 and a_ℓ = 0 for ℓ ≠ i. For the special case of ψ_ℓ = ψ for all ℓ, the average moment-matching prior reduces to the flat prior f(σ_v²) ∝ 1.
Example 10.3.1. U.S. Poverty Counts. In Example 6.1.2, we considered EBLUP estimation of school-age children in poverty in the United States at the county and state levels. Bell (1999) used the state model to calculate maximum-likelihood (ML), REML, and HB estimates of σ_v² for 5 years, 1989–1993. Both ML and REML estimates are zero for the first 4 years, while σ̂²_{vHB}, based on the flat prior f(σ_v²) ∝ 1, varied from 1.6 to 3.4. The resulting EBLUP estimates of poverty rates attach zero weight to the direct estimates regardless of the Current Population Survey (CPS) state sample sizes n_i (number of households). Also, the leading term, g_{1i}(σ̂_v²) = γ̂_i ψ_i, of the MSE estimator (6.2.7) becomes zero when the estimate of σ_v² is zero. As a result, the contribution to (6.2.7) comes entirely from the terms g_{2i}(σ̂_v²) and g_{3i}(σ̂_v²), which account for the estimation of 𝜷 and σ_v², respectively.
Table 10.1 reports results for four states in increasing order of the sampling variance ψ_i: California (CA), North Carolina (NC), Indiana (IN), and Mississippi (MS). The table shows the sample sizes n_i, poverty rates θ̂_i as a percentage, sampling variances ψ_i, MSE estimates mse_ML and mse_RE based on ML and REML estimation of σ_v², and the posterior variances V(θ_i | 𝜽̂). Naive MSE estimates, using (6.2.7) and (6.2.3), based on the formula g_{1i}(σ̂_v²) + g_{2i}(σ̂_v²), are also included for ML and REML estimation (mse_{N,ML} and mse_{N,RE}). Results for 1992 (with σ̂²_{vML} = σ̂²_{vRE} = 0, σ̂²_{vHB} = 1.6) are compared in Table 10.1 to those for 1993 (with σ̂²_{vML} = 0.4, σ̂²_{vRE} = 1.7, σ̂²_{vHB} = 3.4).
Comparing mse_{N,ML} to mse_ML and mse_{N,RE} to mse_RE in Table 10.1, we note that the naive MSE estimates lead to significant underestimation. Here, σ_v² is not estimated precisely, and this fact is reflected in the contribution from 2g_{3i}(σ̂_v²).
TABLE 10.1  MSE Estimates and Posterior Variance for Four States

State   n_i     θ̂_i    ψ_i    mse_N,ML  mse_ML  mse_N,RE  mse_RE  V(θ_i|𝜽̂)

1992
CA      4,927   20.9    1.9    1.3       3.6     1.3       2.8     1.4
NC      2,400   23.0    5.5    0.6       2.0     0.6       1.2     2.0
IN        670   11.8    9.3    0.3       1.4     0.3       0.6     1.7
MS        796   29.6   12.4    2.8       3.8     2.8       3.0     4.0

1993
CA      4,639   23.8    2.3    1.5       3.2     1.6       2.2     1.7
NC      2,278   17.0    4.5    1.0       2.4     1.7       2.2     2.0
IN        650   10.3    8.5    0.8       1.9     1.8       2.2     3.0
MS        747   30.5   13.6    3.2       4.3     4.2       4.5     5.1

Source: Adapted from Tables 2 and 3 in Bell (1999).
For 1992, the leading term g_{1i}(σ̂_v²) is zero, and the contribution to the MSE estimates comes entirely from g_{2i}(σ̂_v²) and 2g_{3i}(σ̂_v²). The MSE estimates for NC and IN turned out to be smaller than the corresponding MSE estimates for CA, despite the smaller sample sizes and larger sampling variances compared to CA. The reason for this occurrence becomes evident by examining g_{2i}(σ̂_v²) and 2g_{3i}(σ̂_v²), which reduce to g_{2i}(σ̂_v²) = z_i^T (Σ_{i=1}^m z_i z_i^T / ψ_i)^{-1} z_i and 2g_{3i}(σ̂_v²) = 4 / (ψ_i Σ_{i=1}^m ψ_i^{-2}) for ML and REML when σ̂_v² = 0. The term 2g_{3i}(σ̂_v²) for CA is larger because the sampling variance ψ_CA = 1.9, which appears in the denominator, is much smaller than ψ_NC = 5.5 and ψ_IN = 9.3. The naive MSE estimator is also smaller for NC and IN compared to CA, and in this case only the g_{2i}-term contributes. It appears from the form of g_{2i}(σ̂_v²) that the covariates z_i for CA contribute to this increase.
Turning to the HB values for 1992, we see that the leading term of the posterior variance V(θ_i | 𝜽̂) is g_{1i}(σ̂²_{vHB}) = γ̂_i^{HB} ψ_i. Here, σ̂²_{vHB} = 1.6, and the leading term dominates the posterior variance. As a result, the posterior variance is the smallest for CA, although the value for NC is slightly larger than that for IN, despite NC's larger sample size and smaller sampling variance (5.5 vs 9.3). For 1993, with nonzero ML and REML estimates of σ_v², mse_RE exhibits a trend similar to V(θ_i | 𝜽̂), but the mse_ML values are similar to the 1992 values due to the small σ̂²_{vML} (= 0.4) compared to σ̂²_{vRE} (= 1.7) and σ̂²_{vHB} (= 3.4).
The occurrence of zero estimates of σ_v² in the frequentist approach is problematic (Hulting and Harville 1991), but it is not clear whether the HB approach based on a flat prior on σ_v² leads to well-calibrated inferences. We have already noted that the matching prior is different from the flat prior if the ψ_i-values vary significantly, as in the case of state CPS variance estimates with max(ψ_i)/min(ψ_i) as large as 20.

10.3.3 Unknown σ_v²: Gibbs Sampling
In this section, we apply Gibbs sampling to the basic area level model, given by (i) and (ii) of (10.3.1), assuming the prior (10.3.2) on 𝜷 and σ_v² with σ_v^{-2} ∼ G(a, b), a > 0, b > 0, that is, a gamma distribution with shape parameter a and scale parameter b. Note that σ_v² is then distributed as inverted gamma IG(a, b), with f(σ_v²) ∝ exp(−b/σ_v²)(1/σ_v²)^{a+1}. The positive constants a and b are set to very small values (BUGS uses a = b = 0.001 as the default setting). It is easy to verify that the Gibbs conditionals are then given by
$$\text{(i)}\;\; \boldsymbol{\beta} \mid \boldsymbol{\theta}, \sigma_v^2, \hat{\boldsymbol{\theta}} \sim N_p\!\left[\boldsymbol{\beta}^*,\; \sigma_v^2 \Bigl(\sum_{i=1}^{m} \tilde{\mathbf{z}}_i \tilde{\mathbf{z}}_i^T\Bigr)^{-1}\right], \qquad (10.3.13)$$
$$\text{(ii)}\; \theta_i \mid \boldsymbol{\beta}, \sigma_v^2, \hat{\boldsymbol{\theta}} \sim N\bigl[\hat{\theta}_i^B(\boldsymbol{\beta}, \sigma_v^2),\; \gamma_i \psi_i\bigr], \quad i = 1, \ldots, m, \qquad (10.3.14)$$
$$\text{(iii)}\; \sigma_v^{-2} \mid \boldsymbol{\beta}, \boldsymbol{\theta}, \hat{\boldsymbol{\theta}} \sim G\!\left[\frac{m}{2} + a,\; \frac{1}{2} \sum_{i=1}^{m} (\tilde{\theta}_i - \tilde{\mathbf{z}}_i^T \boldsymbol{\beta})^2 + b\right], \qquad (10.3.15)$$
where θ̃_i = θ_i / b_i, z̃_i = z_i / b_i, 𝜷* = (Σ_{i=1}^m z̃_i z̃_i^T)^{-1} (Σ_{i=1}^m z̃_i θ̃_i), N_p(⋅) denotes a p-variate normal, and θ̂_i^B(𝜷, σ_v²) = γ_i θ̂_i + (1 − γ_i) z_i^T 𝜷 is the Bayes estimator of θ_i. A proof of
(10.3.13)–(10.3.15) is sketched in Section 10.18.3. Note that all the Gibbs conditionals (i)–(iii) have closed forms, and hence the MCMC samples can be generated directly from them.
Denote the MCMC samples as {(𝜷^(k), 𝜽^(k), σ_v^{2(k)}), k = d + 1, …, d + D}. Using (10.3.14), we can obtain Rao–Blackwell estimators of the posterior mean and the posterior variance of θ_i as
$$\hat{\theta}_i^{HB} = \frac{1}{D} \sum_{k=d+1}^{d+D} \hat{\theta}_i^B(\boldsymbol{\beta}^{(k)}, \sigma_v^{2(k)}) =: \hat{\theta}_i^B(\cdot, \cdot) \qquad (10.3.16)$$
and
$$\hat{V}(\theta_i \mid \hat{\boldsymbol{\theta}}) = \frac{1}{D} \sum_{k=d+1}^{d+D} g_{1i}(\sigma_v^{2(k)}) + \frac{1}{D-1} \sum_{k=d+1}^{d+D} \bigl[\hat{\theta}_i^B(\boldsymbol{\beta}^{(k)}, \sigma_v^{2(k)}) - \hat{\theta}_i^B(\cdot, \cdot)\bigr]^2. \qquad (10.3.17)$$
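Because the conditionals (10.3.13)–(10.3.15) have closed forms, the sampler is only a few lines of code. The following Python/NumPy sketch draws from them directly and returns the Rao–Blackwell estimates (10.3.16) and (10.3.17); the default gamma hyperparameters mimic the BUGS defaults (a = b = 0.001) mentioned above, and the function and variable names are illustrative assumptions.

import numpy as np

def gibbs_area_level(theta_hat, psi, Z, b=None, a0=0.001, b0=0.001,
                     n_burn=1000, n_keep=5000, seed=1):
    """Gibbs sampler for the basic area level model using the closed-form
    conditionals (10.3.13)-(10.3.15), with Rao-Blackwell estimates (10.3.16)-(10.3.17)."""
    rng = np.random.default_rng(seed)
    m, p = Z.shape
    b = np.ones(m) if b is None else b
    Zt = Z / b[:, None]                       # z_tilde_i = z_i / b_i
    ZtZt_inv = np.linalg.inv(Zt.T @ Zt)

    theta, sv2 = theta_hat.copy(), np.var(theta_hat)
    g1_sum = np.zeros(m); tB_draws = np.zeros((n_keep, m))
    for k in range(n_burn + n_keep):
        # (10.3.13): beta | theta, sigma_v^2
        beta_star = ZtZt_inv @ (Zt.T @ (theta / b))
        beta = rng.multivariate_normal(beta_star, sv2 * ZtZt_inv)
        # (10.3.14): theta_i | beta, sigma_v^2, with gamma_i = b_i^2 sv2 / (b_i^2 sv2 + psi_i)
        gamma = b**2 * sv2 / (b**2 * sv2 + psi)
        theta_B = gamma * theta_hat + (1 - gamma) * (Z @ beta)
        theta = theta_B + rng.normal(size=m) * np.sqrt(gamma * psi)
        # (10.3.15): sigma_v^{-2} | beta, theta (gamma distribution, scale = 1/rate)
        rate = 0.5 * np.sum((theta / b - Zt @ beta)**2) + b0
        sv2 = 1.0 / rng.gamma(m / 2 + a0, 1.0 / rate)
        if k >= n_burn:
            g1_sum += gamma * psi
            tB_draws[k - n_burn] = theta_B
    theta_hb = tB_draws.mean(axis=0)                           # (10.3.16)
    post_var = g1_sum / n_keep + tB_draws.var(axis=0, ddof=1)  # (10.3.17)
    return theta_hb, post_var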
More efficient estimators may be obtained by exploiting the closed-form results of Section 10.3.1 for known σ_v². We have
$$\hat{\theta}_i^{HB} = \frac{1}{D} \sum_{k=d+1}^{d+D} \tilde{\theta}_i^H(\sigma_v^{2(k)}) =: \tilde{\theta}_i^H(\cdot) \qquad (10.3.18)$$
and
$$\hat{V}(\theta_i \mid \hat{\boldsymbol{\theta}}) = \frac{1}{D} \sum_{k=d+1}^{d+D} \bigl[g_{1i}(\sigma_v^{2(k)}) + g_{2i}(\sigma_v^{2(k)})\bigr] + \frac{1}{D-1} \sum_{k=d+1}^{d+D} \bigl[\tilde{\theta}_i^H(\sigma_v^{2(k)}) - \tilde{\theta}_i^H(\cdot)\bigr]^2. \qquad (10.3.19)$$
Based on the Rao–Blackwell estimator θ̂_i^HB, an estimator of the total Y_i = g^{-1}(θ_i) is given by g^{-1}(θ̂_i^HB), but it is not equal to the desired posterior mean E(Y_i | 𝜽̂). However, the marginal MCMC samples {Y_i^(k) = g^{-1}(θ_i^(k)); k = d + 1, …, d + D} can be used directly to estimate the posterior mean of Y_i as
$$\hat{Y}_i^{HB} = \frac{1}{D} \sum_{k=d+1}^{d+D} Y_i^{(k)} =: Y_i^{(\cdot)}. \qquad (10.3.20)$$
Similarly, the posterior variance of Y_i is estimated as
$$\hat{V}(Y_i \mid \hat{\boldsymbol{\theta}}) = \frac{1}{D-1} \sum_{k=d+1}^{d+D} \bigl(Y_i^{(k)} - Y_i^{(\cdot)}\bigr)^2. \qquad (10.3.21)$$
If L independent runs are generated, instead of a single long run, then the posterior mean is estimated as
$$\hat{Y}_i^{HB} = \frac{1}{Ld} \sum_{\ell=1}^{L} \sum_{k=d+1}^{2d} Y_i^{(\ell k)} = \frac{1}{L} \sum_{\ell=1}^{L} Y_i^{(\ell \cdot)} =: Y_i^{(\cdot \cdot)}, \qquad (10.3.22)$$
where Y_i^{(ℓk)} is the kth retained value in the ℓth run of length 2d with the first d burn-in iterations deleted. The posterior variance is estimated from (10.2.10):
$$\hat{V}(Y_i \mid \hat{\boldsymbol{\theta}}) = \frac{d-1}{d} W_i + \frac{1}{d} B_i, \qquad (10.3.23)$$
where B_i = d Σ_{ℓ=1}^L (Y_i^{(ℓ·)} − Y_i^{(··)})² / (L − 1) is the between-run variance and W_i = Σ_{ℓ=1}^L Σ_{k=d+1}^{2d} (Y_i^{(ℓk)} − Y_i^{(ℓ·)})² / [L(d − 1)] is the within-run variance.
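The pooling of L independent runs in (10.3.22) and (10.3.23) can be coded as follows (Python/NumPy); the input is simply the matrix of retained draws for one quantity of interest, and the names are illustrative.

import numpy as np

def pooled_posterior_summary(draws):
    """Combine L independent MCMC runs into the posterior mean (10.3.22) and the
    pooled posterior variance estimate (10.3.23).

    draws : array of shape (L, d) holding the retained values Y_i^{(lk)} of one
            parameter, i.e., the last d iterations of each of the L runs.
    """
    L, d = draws.shape
    run_means = draws.mean(axis=1)                               # Y_i^{(l.)}
    grand_mean = run_means.mean()                                # Y_i^{(..)}, eq. (10.3.22)
    B = d * np.sum((run_means - grand_mean)**2) / (L - 1)        # between-run variance
    W = np.sum((draws - run_means[:, None])**2) / (L * (d - 1))  # within-run variance
    post_var = (d - 1) / d * W + B / d                           # eq. (10.3.23)
    return grand_mean, post_var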
Example 10.3.2. Canadian Unemployment Rates. Example 4.4.4 mentioned the use of time series and cross-sectional models to estimate monthly unemployment rates for m = 62 Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs) (small areas) in Canada. We study this application in Example 10.9.1. Here we consider only the cross-sectional data for the "current" month, June 1999, to illustrate the Gibbs sampling HB method for the basic area level model.
Let θ̂_i be the Labor Force Survey (LFS) unemployment rate for the ith area. To obtain smoothed estimates ψ_i of the sampling variances of θ̂_i, You, Rao, and Gambino (2003) first computed the average CV (over time), CV_i, and then took ψ_i = (θ̂_i CV_i)². Employment Insurance (EI) beneficiary rates were used as auxiliary variables, z_i, in the linking model θ_i = β_0 + β_1 z_i + v_i, i = 1, …, m, where m = 62.
Gibbs sampling was implemented using L = 10 parallel runs, each of length 2d = 2,000. The first d = 1,000 "burn-in" iterations of each run were deleted. The convergence of the Gibbs sampler was monitored using the method of Gelman and Rubin (1992); see Section 10.2.5. The Gibbs sampler converged very well in terms of the estimated potential scale reduction factor R̂ given by (10.2.11).
To check the overall fit of the model, the posterior predictive p-value, p_p, was approximated from each run of the MCMC output {𝜼^{(ℓk)}}, ℓ = 1, …, 10, using the formula (10.2.19) with the measure of discrepancy given by T(𝜽̂, 𝜼) = Σ_{i=1}^{62} (θ̂_i − θ_i)² / ψ_i. The average of the L = 10 p-values, p̂_p = 0.59, indicates a good fit of the model to the current cross-sectional data. Note that the hypothetical replication θ̂_i^{(ℓk)}, used in (10.2.19), is generated from N(θ_i^{(ℓk)}, ψ_i) for each θ_i^{(ℓk)}.
Rao–Blackwell estimators (10.3.16) and (10.3.17) were used to calculate the estimated posterior mean, θ̂_i^HB, and the estimated posterior variance, V̂(θ_i | 𝜽̂), for each area i. Figure 10.1 displays the coefficients of variation (CVs) of the direct estimates θ̂_i and the HB estimates θ̂_i^HB; the CV of θ̂_i^HB is taken as V̂(θ_i | 𝜽̂)^{1/2} / θ̂_i^HB. It is clear from Figure 10.1 that the HB estimates lead to significant reduction in CV, especially for the areas with smaller population sizes.
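The posterior predictive check used in this example is also simple to reproduce from the MCMC output. The sketch below (Python/NumPy) computes the discrepancy T(𝜽̂, 𝜼) = Σ_i (θ̂_i − θ_i)²/ψ_i for the observed data and for hypothetical replications drawn from N(θ_i^{(k)}, ψ_i), and returns the proportion of draws for which the replicated discrepancy is at least as large as the observed one, as in (10.2.19); the function and array names are illustrative.

import numpy as np

def posterior_predictive_pvalue(theta_hat, psi, theta_draws, seed=0):
    """Posterior predictive p-value with discrepancy T = sum_i (theta_hat_i - theta_i)^2 / psi_i.

    theta_hat   : (m,) direct estimates
    psi         : (m,) known sampling variances
    theta_draws : (K, m) retained MCMC draws of theta
    """
    rng = np.random.default_rng(seed)
    # Observed discrepancy for each retained draw of theta
    T_obs = np.sum((theta_hat - theta_draws)**2 / psi, axis=1)
    # Hypothetical replications theta_hat^rep ~ N(theta^(k), psi)
    theta_rep = theta_draws + rng.normal(size=theta_draws.shape) * np.sqrt(psi)
    T_rep = np.sum((theta_rep - theta_draws)**2 / psi, axis=1)
    return np.mean(T_rep >= T_obs)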
[Figure 10.1 about here: coefficient of variation (y-axis, 0.0–1.0) plotted against the CMA/CAs ordered by population size (x-axis, 0–60), with separate curves for the direct estimates and the HB estimates.]
Figure 10.1 Coefficient of Variation (CV) of Direct and HB Estimates. Source: Adapted from Figure 3 in You, Rao, and Gambino (2003).
10.3.4 *Unknown Sampling Variances ψ_i
Suppose that simple random sampling within areas is assumed and that the available data are D = {(θ̂_i = ȳ_i, s_i²); i = 1, …, m}, where s_i² = (n_i − 1)^{-1} Σ_{j=1}^{n_i} (y_ij − ȳ_i)². The HB model is specified as (i) ȳ_i | θ_i, σ_i² ∼ N(θ_i, ψ_i = σ_i²/n_i), independently; (ii) (n_i − 1)s_i²/σ_i² | σ_i² ∼ χ²_{n_i−1}; and (iii) θ_i | 𝜷, σ_v² ∼ N(z_i^T 𝜷, σ_v²), independently, where σ_i² is an unknown parameter. Priors for the model parameters are specified as f(𝜷) ∝ 1, σ_i^{-2} ∼ iid G(a_1, b_1), and σ_v^{-2} ∼ G(a_0, b_0) with a_t > 0 and b_t > 0, t = 0, 1. Let 𝜽 = (θ_1, …, θ_m)^T, 𝜽̂ = (θ̂_1, …, θ̂_m)^T, and 𝝈² = (σ_1², …, σ_m²)^T. Under the specified HB model (denoted Model 1), the Gibbs conditionals are given by
$$\boldsymbol{\beta} \mid \boldsymbol{\theta}, \sigma_v^2, \boldsymbol{\sigma}^2, D \sim N_p\!\left[\Bigl(\sum_{i=1}^{m} \mathbf{z}_i \mathbf{z}_i^T\Bigr)^{-1} \sum_{i=1}^{m} \mathbf{z}_i \theta_i,\;\; \sigma_v^2 \Bigl(\sum_{i=1}^{m} \mathbf{z}_i \mathbf{z}_i^T\Bigr)^{-1}\right], \qquad (10.3.24)$$
$$\theta_i \mid \boldsymbol{\beta}, \sigma_v^2, \boldsymbol{\sigma}^2, D \sim N\bigl[\tilde{\theta}_i^B(\boldsymbol{\beta}, \sigma_v^2),\; \gamma_i \psi_i\bigr], \quad i = 1, \ldots, m, \qquad (10.3.25)$$
$$\sigma_i^{-2} \mid \boldsymbol{\beta}, \sigma_v^2, \boldsymbol{\theta}, D \sim G\!\left\{a_1 + \frac{n_i}{2},\; b_1 + \frac{1}{2}\bigl[n_i(\bar{y}_i - \theta_i)^2 + (n_i - 1)s_i^2\bigr]\right\}, \quad i = 1, \ldots, m, \qquad (10.3.26)$$
$$\sigma_v^{-2} \mid \boldsymbol{\beta}, \boldsymbol{\sigma}^2, \boldsymbol{\theta}, D \sim G\!\left[a_0 + \frac{m}{2},\; b_0 + \frac{1}{2} \sum_{i=1}^{m} (\theta_i - \mathbf{z}_i^T \boldsymbol{\beta})^2\right]. \qquad (10.3.27)$$
Proof of (10.3.24)–(10.3.27) follows along the lines of Section 10.18.3.
It follows from (10.3.24)–(10.3.27) that all the Gibbs conditionals have closed forms, and hence the MCMC samples can be generated directly from the conditionals. The posterior means and variances of θ_i are obtained from the MCMC samples, as in Section 10.3.3. You and Chapman (2006) obtained the Gibbs conditionals assuming that ψ_i^{-1} ∼ G(a_i, b_i), i = 1, …, m. They also derived the Gibbs conditionals under an alternative HB model (denoted Model 2) that assumes ψ_i = n_i^{-1} s_i². They compared the posterior variances of the θ_i's under the two different HB models, using a data set with small area sample sizes n_i (ranging from 3 to 5) and another data set with larger n_i (≥ 95). In the case of small n_i, the posterior variance of θ_i under Model 1 is significantly larger than the corresponding value under Model 2, because Model 1 accounts for the uncertainty associated with 𝝈², unlike Model 2. On the other hand, for the second data set, the two posterior variances are practically the same because of the much larger area sample sizes n_i. Model 2 should not be used in practice if n_i is small. You and Chapman (2006) also studied the sensitivity of posterior inferences to the choice of a_t and b_t in the priors for σ_v² and σ_i², by varying their values from 0.0001 to 0.1. Their results indicate that the HB estimates of the area means are not sensitive to the choice of proper priors specified by small values of a_t and b_t.
Arora and Lahiri (1997) proposed an HB method of incorporating the design weights, w_ij, into the model. Assuming the unit level model y_ij | θ_i, σ_i² ∼ iid N(θ_i, σ_i²)
and no sample selection bias, it holds that ȳ_iw | θ_i, σ_i² ∼ N(θ_i, ψ_i = δ_iw σ_i²), where δ_iw = Σ_{j=1}^{n_i} w̃_ij² with w̃_ij = w_ij / Σ_{j=1}^{n_i} w_ij, and ȳ_iw = Σ_{j=1}^{n_i} w̃_ij y_ij, as in Section 7.6.2. The HB model is then specified as (i) ȳ_iw | θ_i, σ_i² ∼ N(θ_i, δ_iw σ_i²); (ii) (n_i − 1)s_i²/σ_i² | σ_i² ∼ χ²_{n_i−1}; (iii) θ_i | 𝜷, σ_v² ∼ N(z_i^T 𝜷, σ_v²); and (iv) σ_i^{-2} ∼ iid G(a, b), a > 0, b > 0. Priors for the model parameters are specified as 𝜷 ∼ U_p, σ_v^{-2} ∼ U⁺, a ∼ U⁺, and b ∼ U⁺, where U⁺ denotes a uniform distribution over a subset of ℝ⁺ of large but finite length and U_p denotes a uniform distribution over a p-dimensional rectangle whose sides are of large but finite length. The M–H within Gibbs algorithm is used to generate MCMC samples from the joint posterior distribution. We refer the reader to Arora and Lahiri (1997) for technical details. Note that the data required to implement the proposed method are {(ȳ_iw, s_i², Σ_{j=1}^{n_i} w̃_ij²); i = 1, …, m}.
*Spatial Model
We now study an extension of the model introduced in Section 10.3.4 by allowing , 𝑣m )T . In particular, the spaspatial correlations among the area effects v (𝑣1 , ind ind tial model is specified as (i) yi |𝜃i , 𝜎i2 ∼ N(𝜃i , 𝜓i 𝜎i2 ∕ni ); (ii) (ni 1)s2i ∕𝜎i2 |𝜎i2 ∼ , 𝜃m )T |𝜷, 𝜎𝑣2 ∼ Nm (Z𝜷, 𝜎𝑣2 D 1 ), where Z (z1 , , zm )T n2 1 ; and (iii) 𝜽 (𝜃1 , i and D 𝝀R + (1 𝝀)I. Here, R is a neighborhood matrix with elements rii number
356
HIERARCHICAL BAYES (HB) METHOD
of neighbors of area i and ri𝓁 1 if area 𝓁 is a neighbor of area i and 0 otherwise, i ≠ 𝓁, and 𝝀 is a spatial autocorrelation parameter, 0 ≤ 𝝀 ≤ 1. Priors for the iid , 𝜎m2 ) are specified as (iv) f (𝜷) ∝ 1, 𝜎i 2 ∼ G(a, b), model parameters (𝜷, 𝝀, 𝜎𝑣2 , 𝜎12 , i 1, , m, 𝜎𝑣 2 ∼ G(a0 , b0 ) and 𝝀 ∼ U(0, 1). Model component (iii) corresponds to a conditional autoregressive (CAR) model on v (𝑣1 , , 𝑣m )T . Under the specified HB model (i)–(iv), all Gibbs conditionals excepting that of 𝝀|𝜽, 𝜷, 𝜎𝑣2 , 𝝈 2 have closed-form expressions. Therefore, we can employ M–H within Gibbs algorithm to generate MCMC from the joint posterior distribution of , 𝜎m2 ). Appendix A.4 of You and Zhou (2011) gives the full Gibbs (𝜽, 𝜷, 𝝀, 𝜎𝑣2 , 𝜎12 , conditionals assuming that 𝜓i 1 ∼ G(ai , bi ). You and Zhou (2011) applied their HB spatial model to estimate asthma rates, 𝜃i , for the m 20 health regions in British Columbia, Canada. For this purpose, they used data from the Canadian Community Health Survey (CCHS) and six area level auxiliary variables. Since the CCHS used a complex design, assumption (ii) of ind the spatial model is replaced by (di 𝜓̂ i ∕𝜓i )|𝜓i ∼ d2 and the prior on 𝜓i is taken as i 𝜓i 1 ∼ G(a, b), where 𝜓̂ i is a direct estimator of the sampling variance of 𝜃̂i and di is the degrees of freedom associated with 𝜓̂ i . You and Zhou (2011) took di ni 1, which ignores the design effect associated with 𝜓̂ i . Results based on the CCHS data indicated that the spatial model leads to significant reduction in CV relative to the model without spatial correlation when the number of neighbors to an area increases; for example, a 21% reduction in CV is achieved when the number of neighboring areas is seven. 10.4 *UNMATCHED SAMPLING AND LINKING AREA LEVEL MODELS As noted in Section 4.2, the assumption of zero design-bias, that is, Ep (ei |𝜃i ) 0, in the sampling model 𝜃̂i 𝜃i + ei may not be valid if the sample size ni in the ith area is small and 𝜃i g(Yi ) is a nonlinear function of the total Yi . A more realistic sampling model is given by Ŷ i
Yi + e∗i ,
i
1,
, m,
(10.4.1)
with Ep (e∗i |Yi ) 0, where Ŷ i is a design-unbiased (p-unbiased) estimator of Yi . A generalized regression (GREG) estimator, Ŷ iGR , of the form (2.4.7) may also be used as Ŷ i , since it is approximately p-unbiased if the overall sample size is large. However, the sampling model (10.4.1) is not matched with the linking model iid 𝜃i zTi 𝜷 + bi 𝑣i , where 𝑣i ∼ N(0, 𝜎𝑣2 ). As a result, the sampling and linking models cannot be combined to produce a basic area level model. The HB approach readily extends to unmatched sampling and linking models (You and Rao 2002b). We write the sampling model (10.4.1) under normality as ind Ŷ i ∼ N(Yi , 𝜙i ),
(10.4.2)
357
*UNMATCHED SAMPLING AND LINKING AREA LEVEL MODELS
where the sampling variance 𝜙i is known or a known function of Yi . We first consider the case of known 𝜙i . We combine (10.4.2) with the linking model (ii) of (10.3.1) and ̂ the prior (10.3.2) with 𝜎𝑣 2 ∼ G(a, b). The resulting Gibbs conditionals 𝜷|Y, 𝜎𝑣2 , Y ̂ are identical to (10.3.13) and (10.3.15), respectively, where Y and 𝜎𝑣 2 |𝜷, Y, Y ̂ (Ŷ 1 , ̂ (Y1 , , Ym )T and Y , Ŷ m )T . However, the Gibbs conditional Yi |𝜷, 𝜎𝑣2 , Y does not admit a closed form, unlike (10.3.14). It is easy to verify that ̂ ∝ h(Yi |𝜷, 𝜎𝑣2 )d(Yi ), f (Yi |𝜷, 𝜎𝑣2 , Y)
(10.4.3)
where d(Yi ) is the normal density N(Ŷ i , 𝜙i ), and { h(Yi |𝜷, 𝜎𝑣2 )
′
∝ g (Yi ) exp
(𝜃̃i
z̃ Ti 𝜷)2
} (10.4.4)
2𝜎𝑣2
is the density of Yi given 𝜷 and 𝜎𝑣2 and g′ (Yi ) 𝜕g(Yi )∕𝜕Yi , where 𝜃̃i 𝜃̂i ∕bi and z̃ i zi ∕bi . Using d(Yi ) as the “candidate” density to implement M–H within Gibbs, the acceptance probability (10.2.5) reduces to { a(𝜷
(k)
, 𝜎𝑣2(k) , Yi(k) , Yi∗ )
min
}
h(Yi∗ |𝜷 (k) , 𝜎𝑣2(k) ) h(Yi(k) |𝜷 (k) , 𝜎𝑣2(k) )
,1
,
(10.4.5)
where the candidate Yi∗ is drawn from N(Ŷ i , 𝜙i ), and Yi(k) , 𝜷 (k) , 𝜎𝑣2(k) are the values of Yi , 𝜷, and 𝜎𝑣2 after the kth cycle. The candidate Yi∗ is accepted with probability a(𝜷 (k) , 𝜎𝑣2(k) , Yi(k) , Yi∗ ); that is, Yi(k+1) Yi∗ if Yi∗ is accepted, and , m, Yi(k+1) Yi(k) otherwise. Repeating this updating procedure for i 1, (k+1) (k+1) T (k+1) (Y1 , , Ym ) , noting that the Gibbs conditional of Yi we get Y does not depend on Y𝓁 , 𝓁 ≠ i. Given Y(k+1) , we then generate 𝜷 (k+1) from ̂ and 𝜎𝑣2(k+1) from 𝜎𝑣2 |Y(k+1) , 𝜷 (k+1) , Y ̂ to complete the (k + 1)th 𝜷|Y(k+1) , 𝜎𝑣2(k) , Y (k+1) 2(k+1) (k+1) cycle Y ,𝜷 , 𝜎𝑣 . If 𝜙i is a known function of Yi , we use h(Yi |𝜷 (k) , 𝜎𝑣2(k) ) to draw the candidate Yi∗ , noting that Yi g 1 (𝜃i ) and 𝜃i |𝜷, 𝜎𝑣2 ∼ N(zTi 𝜷, b2i 𝜎𝑣2 ). In this case, the acceptance probability is given by { a(Yi(k) , Yi∗ )
min
d(Yi∗ ) d(Yi(k) )
} ,1
.
(10.4.6)
358
HIERARCHICAL BAYES (HB) METHOD
Example 10.4.1. Canadian Census Undercoverage. In Example 6.1.3, we provided a brief account of an application of the basic area level model to estimate domain undercount in the 1991 Canadian census, using the EBLUP method. You and Rao (2002b) applied the unmatched sampling and linking models to 1991 Canadian census data, using the HB method, and estimated the number of missing persons Mi ( Yi ) and the undercoverage rate Ui Mi ∕(Mi + Ci ) for each province i (i 1, , 10), where Ci is the census count. The sampling variances 𝜙i were estimated through a generalized variance function model of the form V(Ŷ i ) ∝ Ci𝛾 and then treated as known in the sampling model (10.4.2). The linking model is given by (ii) of (10.3.1) with 𝜃i log Mi ∕(Mi + Ci ) , zTi 𝜷 𝛽0 + 𝛽1 log(Ci ), and bi 1. The prior on the model parameters (𝛽0 , 𝛽1 , 𝜎𝑣2 ) is given by (10.3.2) with 𝜎𝑣 2 ∼ G(0.001, 0.001) to reflect lack of prior information. You and Rao (2002b) simulated L 8 parallel M–H within Gibbs runs independently, each of length 2d 10, 000. The first d 5, 000 “burn-in” iterations of each run were deleted. Furthermore, to reduce the autocorrelation within runs, they selected every 10th iteration of the remaining 5, 000 iterations of each run, leading to 500 iterations for each run. Thus, the total number of MCMC runs retained is Ld 4, 000. The convergence of the M–H within Gibbs sampler was monitored using the method of Gelman and Rubin (1992) described in Section 10.2.4. The MCMC sampler converged very well in terms of the potential scale reduction factor R̂ given by (10.2.11); values of R̂ for the 10 provinces were close to 1. We denote the marginal MCMC , 2d, 𝓁 1, ,L . sample as Mi(𝓁k) ; k d + 1, To check the overall fit of the model, the posterior predictive p-value was estimated , 8 using the formula (10.2.19) from each run of∑ the MCMC output 𝜼(𝓁k) , 𝓁 1, m ̂ ̂ i Yi )2 ∕𝜙i . The average of the L 8 p-values, p̂p 0.38, ( Y with T(Y, 𝜼) i 1 indicates a good fit of the model. Note that the hypothetical replications Ŷ i(𝓁k) , used in (10.2.18), are generated from N(Yi(𝓁k) , 𝜙i ), for each Yi(𝓁k) . To assess model fit at the individual province level, You and Rao (2002b) computed two statistics proposed by Daniels and Gatsonis (1999). The first statistic, given by pi
̂ obs ), P(Ŷ i < Ŷ i,obs |Y
(10.4.7)
provides information on the degree of consistent overestimation or underestimation of Ŷ i,obs . This statistic is similar to the cross-validation statistic (10.2.29) but uses the full ̂ obs ). As a result, it is computationally simpler than (10.2.29). predictive density f (Ŷ i |Y The pi -values, computed from the hypothetical replications Ŷ i(𝓁k) , ranged from 0.28 to 0.87 with a mean of 0.51 and a median of 0.49, indicating no consistent overestimation or underestimation. The second statistic is similar to the cross-validation standardized residual (10.2.28) but uses the full predictive density and it is given by √ ̂ ̂ obs ), ̂ ̂ E(Yi |Yobs ) Yi,obs ∕ V(Ŷ i |Y (10.4.8) di where the expectation and variance are with respect to the full predictive density. The estimated standardized residuals, di , ranged from 1.13 (New Brunswick) to 0.57 (Prince Edward Island), indicating adequate model fit for individual provinces.
359
*UNMATCHED SAMPLING AND LINKING AREA LEVEL MODELS
TABLE 10.2
1991 Canadian Census Undercount Estimates and Associated CVs
Prov
̂i M
̂ i) CV(M
̂ HB M i
̂ HB ) CV(M i
Û i (%)
CV(Û i )
Û iHB (%)
CV(Û iHB )
Nfld PEI NS NB Que Ont Man Sask Alta BC
11,566 1,220 17,329 24,280 184,473 381,104 20,691 18,106 51,825 92,236
0.16 0.30 0.20 0.14 0.08 0.08 0.21 0.19 0.15 0.10
10,782 1,486 17,412 18,948 189,599 368,424 21,504 18,822 55,392 89,929
0.14 0.19 0.14 0.17 0.08 0.08 0.14 0.14 0.12 0.09
1.99 0.93 1.89 3.25 2.58 3.64 1.86 1.80 2.01 2.73
0.16 0.30 0.20 0.13 0.08 0.08 0.20 0.18 0.14 0.10
1.86 1.13 1.90 2.55 2.65 3.52 1.93 1.87 2.13 2.67
0.13 0.19 0.14 0.17 0.08 0.08 0.14 0.13 0.12 0.09
Source: Adapted from Table 2 in You and Rao (2002b).
The posterior mean and the posterior variance of Mi ( Yi ) were estimated using (10.3.20) and (10.3.21) with L 8 and d 500. Table 10.2 reports the direct estî HB , and the associated CVs. For the direct estimate, the ̂ i , the HB estimates, M mates, M i ̂ HB is based on CV is based on the design-based variance estimate, while the CV for M i the estimated posterior variance. Table 10.2 also reports the estimates of undercoverage rates Ui , denoted Û i and Û iHB , and the associated CVs, where Û i is the direct estimate of Ui . The HB estimate Û iHB and the associated posterior variance are obtained from (10.3.22) and (10.3.23) by changing Yi(𝓁k) to Ui(𝓁k) Mi(𝓁k) ∕(Mi(𝓁k) + Ci ). ̂ HB , Û HB ) perform better than It is clear from Table 10.2 that the HB estimates (M i i ̂ i , Û i ) in terms of CV, except the estimate for New Brunswick. the direct estimates (M For Ontario, Quebec, and British Columbia, the CVs of HB and direct estimates are nearly equal due to larger sample sizes in these provinces. Note that the sampling model (10.4.1) assumes that the direct estimate Ŷ i is design-unbiased. In the context of census undercoverage estimation, this assumption may be restrictive because the estimate may be subject to nonsampling bias (Zaslavsky 1993). This potential bias was ignored due to lack of a reliable estimate of bias. Example 10.4.2. Adult Literacy Proportions in the United States. In Section 8.5 we studied a twofold subarea level linking model matching to the sampling model for the subareas, and developed EBLUP estimators of subarea and area means simultaneously. We now provide an application of the HB method to estimate the state and county proportions of adults in the United States at the lowest level of literacy, using unmatched sampling model and twofold subarea level linking model for the county proportions. Mohadjer et al. (2012) used data from the National Assessment of Adult Literacy (NAAL) to produce state and county HB estimates and associated credible intervals. In this application, subarea refers to county and area to state. The sampling model is given by 𝜃̂ij 𝜃ij + eij , where 𝜃ij is the population proportion of adults at the lowest level of literacy in county j belonging to state i, and
360
HIERARCHICAL BAYES (HB) METHOD
𝜃̂ij is the direct estimator of 𝜃ij . The sampling errors eij are assumed to be N(0, 𝜓ij ) 1∕2 with known coefficient of variation 𝜙ij 𝜓ij ∕𝜃ij . In the case of NAAL, “smoothed” estimates 𝜙̃ ij of 𝜙ij were obtained from the direct estimates 𝜙̂ ij and then used as proxies for the true 𝜙ij . Smoothed estimates 𝜙̃ ij are obtained in two steps. In step 1, predicted values 𝜃̃ij of 𝜃ij were obtained by fitting the model logit(𝜃̂ij ) 𝛾0 + 𝛾1 zij1 + · · · + 𝛾4 zij4 + 𝜀ij , where 𝜀ij ∼ N(0, 𝜎𝜀2 ) and zij1 , , zij4 are county-level covariates (see Mohadjer et al. 2012 for details). In the second step, predicted (smoothed) values 𝜙̃ ij of 𝜙ij are obtained by fitting the model log(𝜙̂ 2ij ) 𝜂0 + 𝜂1 log(𝜃̃ij ) + 𝜂2 log(1 𝜃̃ij ) + 𝜂3 log(nij ) + 𝜏ij , where 𝜏ij ∼ N(0, 𝜎𝜏2 ) and nij is the sample sizes in jth county in state i. The smoothed values 𝜙̃ ij are then treated as the true values 𝜙ij , leading to 𝜓ij 𝜙̃ 2ij 𝜃ij2 . The unmatched linking model is given by logit(𝜃ij ) zTij 𝜷 + 𝑣i + uij , where iid
iid
𝑣i ∼ N(0, 𝜎𝑣2 ) is the ith state random effect, uij ∼ N(0, 𝜎u2 ) is the random effect associated with the jth county in the ith state, and zij is a set of predictor variables that includes zij1 , , zij4 used in the model for smoothing the direct estimates 𝜃̂ij . County-level data from the 2000 Census of Population was the primary source for selecting the predictor variables. The census data contains variables related to adult literacy skills, such as country of birth, education, age, and disabilities. Importance of selecting a proper linking model is magnified for NAAL because the NAAL sample contains only about 10% of the U.S. counties. HB estimation of the county proportions, 𝜃ij , was implemented by assuming a flat prior for 𝜷 and gamma priors G(a, b) for 𝜎𝑣 2 and 𝜎u 2 with small values a 0.001 and b 0.001. Alternative noninformative priors were also examined and the resulting HB estimators were highly correlated with those obtained from the above choice, suggesting that the final estimates are not sensitive to the choice of priors. Let v and u denote the vectors of county and state effects 𝑣i and uij , respectively. Using L 3 independent runs, B 27, 000 MCMC samples (𝜷 (b) , v(b) , u(b) , 𝜎𝑣2(b) , 𝜎e2(b) ), b 1, , B, were generated from the joint posterior distribution of 𝜷, v and u, 𝜎𝑣2 , and 𝜎e2 . WinBUGS software (Lunn et al. 2000), based on M–H within Gibbs, was employed to generate the MCMC samples. Model-fitting methods, outlined in Section 10.2.6, were also implemented to assess the goodness of fit of the final model. The HB estimates (posterior means) for sampled counties are calculated as B
𝜃̂ijHB where 𝜃ij(b) is obtained from log(𝜃ij(b) )
1 ∑ (b) 𝜃 B b 1 ij zTij 𝜷 (b) + 𝑣(b) + u(b) . However, for nonsampled i ij
counties within a sampled state, the u(b) -values are not available. For a nonsample ij state, neither 𝑣(b) nor u(b) are available. In the former case, 𝜃ij(b) is calculated from i ij log(𝜃ij(b) )
zTij 𝜷 (b) + 𝑣(b) + u(b) , where u(b) is a random draw from N(0, 𝜎u2(b) ). i ij(RD) ij(RD)
Similarly, in the case of nonsampled state, 𝜃ij(b) is calculated from log(𝜃ij(b) )
zTij 𝜷 (b)
361
*UNMATCHED SAMPLING AND LINKING AREA LEVEL MODELS
+ 𝑣(b) + u(b) , where 𝑣(b) is a random draw from N(0, 𝜎𝑣2(b) ). A 95% posterior i(RD) ij(RD) i(RD) (credible) interval on 𝜃ij is calculated by determining the lower 2.5% and the upper 2.5% quantiles of the B values 𝜃ij(b) . ∑Ni W 𝜃 , where Ni is the number of counState proportions are given by 𝜃i⋅ j 1 ij ij ties in state i and Wij is the proportion of state i adult population in county j. The ∑Ni W 𝜃̂ HB . Credible intervals for 𝜃i corresponding HB estimate is given by 𝜃̂i⋅HB j 1 ij ij ∑ Ni are obtained from the B posterior values 𝜃i⋅(b) W 𝜃 (b) , b 1, , B. j 1 ij ij Example 10.4.3. Beta Sampling Model. Liu, Lahiri, and Kalton (2014) proposed a beta sampling model to estimate small area proportions, Pi , using design-weighted estimators, piw , based on stratified simple random sampling (SRS) within each area i. ∑n ∑Hi In particular, piw W p , where pih nih1 k ih1 yihk is the estimator of the h 1 ih ih proportion Pih associated with a binary variable yihk , Wih Nih ∕Ni is the stratum ∑H weight, h i 1 Nih Ni , and nih is the sample size in stratum h 1, , Hi within area i. The sampling model is given by ind
piw |Pi ∼ Beta(ai , bi ),
i
1,
(10.4.9)
, m,
where ( Pi
ai
ni deffiw
) 1 ,
( bi
(1
Pi )
ni deffiw
) 1
(10.4.10)
and deffiw is an approximation to the design effect associated with piw . In particular, assuming negligible sampling fractions fih nih ∕Nih and Pih (1 Pih ) ≈ Pi (1 Pi ), we have Hi ∑ Wih2 ∕nih , (10.4.11) deffiw ni h 1
where ni
∑Hi
n . h 1 ih
The resulting smoothed sampling variance 𝜓i is given by 𝜓i
Pi (1
Pi )∕ni deffiw ,
(10.4.12)
which is a function of the unknown Pi only because deffiw is known. The linking model is taken as ind
logit(Pi )|𝜷, 𝜎𝑣2 ∼ N(zTi 𝜷, 𝜎𝑣2 ).
(10.4.13)
HB inference on the proportions Pi is implemented by assuming a flat prior on 𝜷 and gamma prior on 𝜎𝑣 2 . MCMC samples from the joint posterior of (P1 , , Pm , 𝜷, 𝜎𝑣2 ) are generated using the M–H within Gibbs algorithm; Gibbs conditional of Pi does not have a closed-form expression. Liu et al. (2014) provide a WinBUGS code to implement HB inference.
362
HIERARCHICAL BAYES (HB) METHOD
Liu et al. (2014) also report design-based simulation results on the performance of the beta-logistic (BL) model relative to a normal-logistic (NL) model that assumes ind piw |Pi ∼ N(Pi , 𝜓i ), both using the sampling variances 𝜓i given by (10.4.12). The finite population studied consisted of records of live births weights (y) in 2002 in the 50 states of the United States and the District of Columbia (DC). The parameters of interest, Pi , are the state-level low birth weight rates, and the values of Pi ranged from 5% to 11% across the states. Simulation results indicated that the equal tail-area credible interval associated with the NL model leads to undercoverage, especially for small ni (< 30). The BL-based interval performed better in terms of coverage. , the NL model leads to larger bias but In terms of bias of the HB estimator, P̂ HB i smaller MSE. 10.5
BASIC UNIT LEVEL MODEL
In this section, we apply the HB approach to the basic unit level model 7.1.1 with equal error variances (that is, kij 1 for all i and j), assuming a prior distribution on the model parameters (𝜷, 𝜎𝑣2 , 𝜎e2 ). We first consider the case of known 𝜎𝑣2 and 𝜎e2 , and assume a “flat” prior on 𝜷: f (𝜷) ∝ 1. We rewrite 7.1.1 as an HB model: ind
(i) yij |𝜷, 𝑣i , 𝜎e2 ∼ N(xTij 𝜷 + 𝑣i , 𝜎e2 ), iid
(ii) 𝑣i |𝜎𝑣2 ∼ N(0, 𝜎𝑣2 ),
i
1,
j
1,
, ni ,
i
1,
, m,
, m,
(iii) f (𝜷) ∝ 1.
(10.5.1)
We then extend the results to the case of unknown 𝜎𝑣2 and 𝜎e2 by replacing (iii) in (10.5.1) by (iii)′ f (𝜷, 𝜎𝑣2 , 𝜎e2 )
f (𝜷)f (𝜎𝑣2 )f (𝜎e2 ) ∝ f (𝜎𝑣2 )f (𝜎e2 ),
(10.5.2)
where f (𝜎𝑣2 ) and f (𝜎e2 ) are the priors on 𝜎𝑣2 and 𝜎e2 . For simplicity, we take 𝜇i XTi 𝜷 + 𝑣i as the ith small area mean, assuming that the area population size, Ni , is large. 10.5.1
Known 𝝈𝒗2 and 𝝈e2
When 𝜎𝑣2 and 𝜎e2 are assumed to be known, the HB and BLUP approaches under normality lead to identical point estimates and measures of variability, assuming a flat prior on 𝜷. This result, in fact, is valid for general linear mixed models with known variance parameters. The HB estimator of 𝜇i is given by 𝜇̃ iHB (𝜎𝑣2 , 𝜎e2 )
E(𝜇i |y, 𝜎𝑣2 , 𝜎e2 )
𝜇̃ iH ,
(10.5.3)
where y is the vector of sample observations and 𝜇̃ iH is the BLUP estimator given by (7.1.6). Similarly, the posterior variance of 𝜇i is
363
BASIC UNIT LEVEL MODEL
V(𝜇i |𝜎𝑣2 , 𝜎e2 , y)
M1i (𝜎𝑣2 , 𝜎e2 )
MSE(𝜇̃ iH ),
(10.5.4)
where M1i (𝜎𝑣2 , 𝜎e2 ) is given by (7.1.12). 10.5.2
Unknown 𝝈𝒗2 and 𝝈e2 : Numerical Integration
In practice, 𝜎𝑣2 and 𝜎e2 are unknown and it is necessary to take account of the uncertainty about 𝜎𝑣2 and 𝜎e2 by assuming a prior on 𝜎𝑣2 and 𝜎e2 . The HB model is given by (i) and (ii) of (10.5.1) and (iii)′ given by (10.5.2). We obtain the HB estimator of 𝜇i and the posterior variance of 𝜇i as 𝜇̂ iHB and
V(𝜇i |y)
E(𝜇i |y)
E𝜎 2 ,𝜎 2 𝜇̃ iHB (𝜎𝑣2 , 𝜎e2 ) 𝑣
e
E𝜎 2 ,𝜎 2 M1i (𝜎𝑣2 , 𝜎e2 ) + V𝜎 2 ,𝜎 2 𝜇̃ iHB (𝜎𝑣2 , 𝜎e2 ) , 𝑣
e
𝑣
e
(10.5.5)
(10.5.6)
where E𝜎 2 ,𝜎 2 and V𝜎 2 ,𝜎 2 , respectively, denote the expectation and variance with 𝑣 e 𝑣 e respect to the posterior distribution f (𝜎𝑣2 , 𝜎e2 |y). As in Section 10.3, the posterior f (𝜎𝑣2 , 𝜎e2 |y) may be obtained from the restricted likelihood function LR (𝜎𝑣2 , 𝜎e2 ) as f (𝜎𝑣2 , 𝜎e2 |y) ∝ LR (𝜎𝑣2 , 𝜎e2 )f (𝜎𝑣2 )f (𝜎e2 ).
(10.5.7)
Under flat priors f (𝜎𝑣2 ) ∝ 1 and f (𝜎e2 ) ∝ 1, the posterior f (𝜎𝑣2 , 𝜎e2 |y) is proper (subject to a mild sample size restriction) and proportional to LR (𝜎𝑣2 , 𝜎e2 ). Evaluation of the posterior mean (10.5.5) and the posterior variance (10.5.6), using f (𝜎𝑣2 , 𝜎e2 |y) ∝ LR (𝜎𝑣2 , 𝜎e2 ), involves two-dimensional integrations. If we assume a diffuse gamma prior on 𝜎e 2 , that is, 𝜎e 2 ∼ G(ae , be ) with ae ≥ 0 and be > 0, then it is possible to integrate out 𝜎e2 with respect to f (𝜎e2 |𝜏𝑣 , y), where 𝜏𝑣 𝜎𝑣2 ∕𝜎e2 . The evaluation of (10.5.5) and (10.5.6) is now reduced to single-dimensional integration with respect to the posterior of 𝜏𝑣 , that is, f (𝜏𝑣 |y). Datta and Ghosh (1991) expressed f (𝜏𝑣 |y) as f (𝜏𝑣 |y) ∝ h(𝜏𝑣 ) and obtained an explicit expression for h(𝜏𝑣 ), assuming a gamma prior on 𝜏𝑣 1 , that is, 𝜏𝑣 1 ∼ G(a𝑣 , b𝑣 ) with a𝑣 ≥ 0 and b𝑣 > 0; note that a𝑣 is the shape parameter and b𝑣 is the scale parameter. Datta and Ghosh (1991) applied the numerical integration method to the data on county crop areas (Example 7.3.1) studied by Battese, Harter, and Fuller (1988). They calculated the HB estimate of mean hectares of soybeans and associated standard error (square root of posterior variance) for each of the m 12 counties in north-central Iowa, using flat priors on 𝜷 and gamma priors on 𝜎e 2 and 𝜏𝑣 1 with ae a𝑣 0 and be b𝑣 0.005 to reflect lack of prior information. Datta and Ghosh (1991) actually studied the more complex case of finite population means Y i instead of the means 𝜇i , but the sampling fractions ni ∕Ni are negligible in Example 7.3.1, so that Y i ≈ 𝜇i .
364
10.5.3
HIERARCHICAL BAYES (HB) METHOD
Unknown 𝝈𝒗2 and 𝝈e2 : Gibbs Sampling
In this section, we apply Gibbs sampling to the basic unit level model given by (i) and (ii) of (10.5.1), assuming the prior (10.5.2) on 𝜷, 𝜎𝑣2 , and 𝜎e2 with 𝜎𝑣 2 ∼ G(a𝑣 , b𝑣 ), a𝑣 ≥ 0, b𝑣 > 0 and 𝜎e 2 ∼ G(ae , be ), ae ≥ 0, be > 0. It is easy to verify that the Gibbs conditionals are given by )1 m n (m n ⎡ ∑ i i ∑ ∑∑ (i) 𝛽|v, 𝜎𝑣2 , 𝜎e2 , y ∼ Np⎢ xij xTij xij (yij ⎢ i 1j 1 i 1j 1 ⎣ (ii) 𝑣i |𝛽, 𝜎𝑣2 , 𝜎e2 , y ∼ N 𝛾i (yi
xTi 𝛽), g1i (𝜎𝑣2 , 𝜎e2 )
(m n i ∑∑ 𝑣i ), 𝜎e2 i 1 j
𝛾i 𝜎e2 ∕ni , i
, m, (10.5.9)
]
[
m ni 1 ∑∑ n 2 2 (y xTij 𝛽 + ae , (iii) 𝜎e |𝛽, v, 𝜎𝑣 , y ∼ G 2 2 i 1 j 1 ij ) ( m 1∑ 2 m 2 2 (iv) 𝜎𝑣 |𝛽, v, 𝜎e , y ∼ G 𝑣 + b𝑣 , + a𝑣 , 2 2 i 1 i
1,
)1 ⎤ xij xTij ⎥ , ⎥ 1 ⎦ (10.5.8)
𝑣i )2 + be ,
(10.5.10)
(10.5.11)
∑m ∑ni ∑ni where n (𝑣1 , , 𝑣m )T , yi y ∕n , xi x ∕n , and 𝛾i i 1 ni , v j 1 ij i j 1 ij i 𝜎𝑣2 ∕(𝜎𝑣2 + 𝜎e2 ∕ni ). The proof of (10.5.8)–(10.5.11) follows along the lines of the proof of (10.3.13)–(10.3.15) given in Section 10.18.3. Note that all the Gibbs conditionals have closed forms and hence the MCMC samples can be generated directly from (i)–(iv). WinBUGS can be used to generate samples from the above conditionals using either default inverted gamma priors on 𝜎𝑣2 and 𝜎e2 (with ae be 0.001 and a𝑣 b𝑣 0.001) or with specified values of ae , be , a𝑣 , and b𝑣 . CODA can be used to perform convergence diagnostics. Denote the MCMC samples from a single large run by 𝜷 (k) , v(k) , 𝜎𝑣2(k) , 𝜎e2(k) , k d + 1, , d + D . The marginal MCMC samples 𝜷 (k) , v(k) can be used directly to estimate the posterior mean of 𝜇i as d+D
𝜇̂ iHB
where 𝜇i(k)
1 ∑ (k) 𝜇 D k d+1 i
∶ 𝜇i(⋅) ,
(10.5.12)
XTi 𝜷 (k) + 𝑣(k) . Similarly, the posterior variance of 𝜇i is estimated as i
V(𝜇i |y)
d+D
∑
1 D
1k
d+1
(𝜇i(k)
𝜇i(⋅) )2 .
(10.5.13)
365
BASIC UNIT LEVEL MODEL
Alternatively, Rao–Blackwell estimators of the posterior mean and the posterior variance of 𝜇i may be used along the lines of (10.3.16) and (10.3.17): d+D ( ) 1 ∑ HB 2(k) 2(k) 𝜎𝑣 , 𝜎e ∶ 𝜇̃ iHB (⋅, ⋅) 𝜇̃ i D k d+1
𝜇̂ iHB
(10.5.14)
and V(𝜇i |y)
d+D [ ( ) ( )] 1 ∑ g1i 𝜎𝑣2(k) , 𝜎e2(k) + g2i 𝜎𝑣2(k) , 𝜎e2(k) D k d+1
+ D
( ) ∑ [ 𝜇̃ iHB 𝜎𝑣2(k) , 𝜎e2(k)
d+D
1 1k
]2 𝜇̃ iHB (⋅, ⋅) .
(10.5.15)
d+1
Example 10.5.1. County Crop Areas. The Gibbs sampling method was applied to the data on area under corn for each of m 12 counties in north-central Iowa, excluding the second sample segment in Hardin county (see Example 7.3.1). We fitted the nested error linear regression model, yij 𝛽0 + 𝛽1 x1ij + 𝛽2 x2ij + 𝑣i + eij , for corn (yij ) using the LANDSAT satellite readings of corn (x1ij ) and soybeans (x2ij ); see Example 7.3.1 for details. We used the BUGS program with default flat prior on 𝜷 and gamma priors on 𝜎𝑣 2 and 𝜎e 2 to generate samples from the joint posterior distribution of (𝜷, v, 𝜎𝑣2 , 𝜎e2 ). We generated a single long run of length D 5, 000 after discarding the first d 5, 000 “burn-in” iterations. We used CODA to implement convergence diagnostics and statistical and output analyses of the simulated samples. To check the overall fit of the model, the posterior predictive value, pp, was estimated,∑using the formula (10.2.19) with measure of discrepancy given by m ∑ni (y 𝛽0 𝛽1 x1ij 𝛽2 x2ij )2 ∕(𝜎𝑣2 + 𝜎e2 ). The estimated p-value, T(y, 𝜼) i 1 j 1 ij p 0.51, indicates a good fit of the model. We also fitted the simpler model yij 𝛽0 + 𝛽1 x1ij + 𝑣i + eij , using only the corn (x1ij ) satellite reading as an auxiliary variable. The resulting posterior predictive p-value, p̂p 0.497, indicates that the simpler model also fits the data well. We also compared the CPO plots for the two models using the formula (10.2.26). The CPO plots indicate that the full model with x1 and x2 is slightly better than the simpler model with x1 only. Table 10.3 gives the EBLUP estimates and associated standard errors (taken from Battese, Harter, and Fuller 1988), and the HB estimates and associated standard errors. It is clear from Table 10.3 that the EBLUP and HB estimates are similar. Standard errors are also similar except for Kossuth county (with ni 5) where the HB standard error (square root of posterior variance) is about 20% larger than the corresponding EBLUP standard error; in fact, the HB standard error is slightly larger than the corresponding sample regression (SR) standard error reported in Table 10.3. Reasons for this exception are not clear. 10.5.4
Pseudo-HB Estimation
H , of the small area In Section 7.6.2, we obtained a pseudo-EBLUP estimator, 𝜇̂ iw mean 𝜇i that makes use of survey weights 𝑤ij , unlike the EBLUP estimator 𝜇̂ iH .
366
HIERARCHICAL BAYES (HB) METHOD
TABLE 10.3 EBLUP and HB Estimates and Associated Standard Errors: County Corn Areas Estimate County
ni
EBLUP
Cerro Gordo Hamilton Worth Humboldt Franklin Pocahontas Winnebago Wright Webster Hancock Kossuth Hardin
1 1 1 2 3 3 3 3 4 5 5 5
122.2 126.3 106.2 108.0 145.0 112.6 112.4 122.1 115.8 124.3 106.3 143.6
Standard Error HB 122.2 126.1 108.1 109.5 142.8 111.2 113.8 122.0 114.5 124.8 108.4 142.2
EBLUP 9.6 9.5 9.3 8.1 6.5 6.6 6.6 6.7 5.8 5.3 5.2 5.7
HB 8.9 8.7 9.8 8.1 7.3 6.5 6.5 6.2 6.1 5.2 6.3 6.0
Source: Adapted from You and Rao (2002a) and You and Rao (2003).
H satisfies the benchmarking property without any adjustment, unlike 𝜇̂ H . Also, 𝜇̂ iw i HB , that is analogous to the In this section, we obtain a pseudo-HB estimator, 𝜇̂ iw H . The estimator 𝜇̂ HB also makes use of the survey pseudo-EBLUP estimator 𝜇̂ iw iw weights and satisfies the benchmarking property without any adjustment. We make use of the unit level MCMC samples 𝜎𝑣2(k) , 𝜎e2(k) generated from (i) and (ii) of (10.5.1) and the prior (10.5.2) with 𝜎𝑣 2 ∼ G(a𝑣 , b𝑣 ) and 𝜎e 2 ∼ G(ae , be ), but replace 𝜷 (k) by 𝜷 (k) 𝑤 that makes use of the design-weighted estimator 𝜷̃ 𝑤 (𝜎𝑣2 , 𝜎e2 ) 𝜷̃ 𝑤 given by (7.6.7). Now, assuming a flat prior f (𝜷) ∝ 1 and noting that 𝜷̃ 𝑤 |𝜷, 𝜎𝑣2 , 𝜎e2 ∼ Np (𝜷, Φ𝑤 ), we get the posterior distribution 𝜷|𝜷̃ 𝑤 , 𝜎𝑣2 , 𝜎e2 ∼ N(𝜷̃ 𝑤 , Φ𝑤 ), where Φ𝑤 Φ𝑤 (𝜎𝑣2 , 𝜎e2 ) is given by (7.6.14). We (k) 𝜷̃ 𝑤 (𝜎𝑣2(k) , 𝜎e2(k) ) and Φ𝑤 (𝜎𝑣2(k) , 𝜎e2(k) ) using 𝜎𝑣2(k) and 𝜎e2(k) and then calculate 𝜷̃ 𝑤 (k) (k) generate 𝜷 𝑤 from Np (𝜷̃ 𝑤 , Φ(k) 𝑤 ). Under the survey-weighted model 7.6.3, the conditional posterior mean H H (𝜷, 𝜎 2 , 𝜎 2 ) given by 𝜇̃ iw E(𝜇i |yiw , 𝜷, 𝜎𝑣2 , 𝜎e2 ) is identical to the BLUP estimator 𝜇̃ iw 𝑣 e ∑ni ∑ni ∑ni ̃ (7.6.2), where yiw 𝑤 y ∕ 𝑤 y . Similarly, the conditional 𝑤 ij ij ij ij ij j 1 j 1 j 1 posterior variance V(𝜇i |yiw , 𝜷, 𝜎𝑣2 , 𝜎e2 ) is equal to g1iw (𝜎𝑣2 , 𝜎e2 ) given by (7.6.11). Now , d + D , we using the generated MCMC samples 𝜷 (k) , 𝜎𝑣2(k) , 𝜎e2(k) ; k d + 1, get a pseudo-HB estimator of 𝜇i as
HB 𝜇̂ iw
d+D ( ) 1 ∑ 𝜇̃ iw 𝜷 (k) , 𝜎𝑣2(k) , 𝜎e2(k) D k d+1 (⋅) XTi 𝜷 (⋅) 𝑤 + 𝑣̃iw ,
(10.5.16)
367
BASIC UNIT LEVEL MODEL
∑d+D (k) (⋅) (k) 2(k) 2(k) where 𝜷 (⋅) 𝑤 k d+1 𝜷 𝑤 ∕D, 𝑣̃ iw is the average of 𝑣̃ iw (𝜷 , 𝜎𝑣 , 𝜎e ) over k, 2 2 and 𝑣̃iw (𝜷, 𝜎𝑣 , 𝜎e ) is given by (7.6.5). The pseudo-HB estimator (10.5.16) is design-consistent as ni increases. ∑m HB benchmarks to the direct survey It is easy to verify that i 1 Ni 𝜇̂ iw (⋅) ̂ 𝑤 )T 𝜷 𝑤 , where Ŷ 𝑤 ∑m ∑ni 𝑤̃ ij yij and regression estimator Ŷ 𝑤 + (X X i 1 j 1 ∑m ∑Ni ∑m ∑ni ̂ ̃ x are the direct estimators of the overall totals Y y 𝑤 X𝑤 ij ij i 1 i 1 j 1 j 1 ij ∑m ∑Ni and X+ x , respectively. Note that the direct survey regression estimai 1 j 1 ij tor here differs from the estimator in Section 7.6.2. The latter uses 𝜷̂ 𝑤 𝜷̃ 𝑤 (𝜎̂ 𝑣2 , 𝜎̂ e2 ), but the difference between 𝜷̂ 𝑤 and 𝜷 (⋅) 𝑤 should be very small. A pseudo-posterior variance of 𝜇i is obtained as V̂ PHB (𝜇i )
d+D ( ) 1 ∑ g1iw 𝜎𝑣2(k) , 𝜎e2(k) D k d+1
+ D
) ∑ [ ( H 𝜇̃ iw 𝜷 (k) , 𝜎𝑣2(k) , 𝜎e2(k)
d+D
1 1k
]2 HB 𝜇̂ iw .
(10.5.17)
d+1
The last term in (10.5.17) accounts for the uncertainty associated with 𝜷, 𝜎𝑣2 , and 𝜎e2 . We refer the reader to You and Rao (2003) for details of the pseudo-HB method. Example 10.5.2. County Corn Areas. In Example 7.3.1, we applied the pseudo-EBLUP method to county corn area data from Battese et al. (1988), assuming simple random sampling within areas, that is, 𝑤ij 𝑤i Ni ∕ni . We generated 2(k) 2(k) MCMC samples 𝜷 (k) from this data set, using diffuse priors on 𝜷, 𝑤 , 𝜎𝑣 , 𝜎e 2 2 𝜎𝑣 , and 𝜎e . Table 10.4 compares the results from the pseudo-HB method to those TABLE 10.4 Pseudo-HB and Pseudo-EBLUP Estimates and Associated Standard Errors: County Corn Areas
County
ni
Cerro Gordo Hamilton Worth Humboldt Franklin Pocahontas Winnebago Wright Webster Hancock Kossuth Hardin
1 1 1 2 3 3 3 3 4 5 5 5
Estimate PseudoPseudoHB EBLUP 120.6 125.2 107.5 108.4 142.5 110.6 113.2 121.1 114.2 124.8 108.0 142.3
120.5 125.2 106.4 107.4 143.7 111.5 112.1 121.3 115.0 124.5 106.6 143.5
Source: Adapted from Table 1 in You and Rao (2003).
Standard Error PseudoPseudoHB EBLUP 9.3 9.4 10.2 8.2 7.4 7.0 7.0 6.4 6.4 5.4 6.6 6.1
9.9 9.7 9.6 8.3 6.6 6.6 6.6 6.8 5.8 5.4 5.3 5.8
368
HIERARCHICAL BAYES (HB) METHOD
from the pseudo-EBLUP method (Table 10.3). It is clear from Table 10.4 that the pseudo-HB and the pseudo-EBLUP estimates are very similar. Standard errors are also similar except for Kossuth county (with ni 5), where the pseudo-HB standard error is significantly larger than the corresponding pseudo-EBLUP standard error, similar to the EBLUP and HB standard errors reported in Table 10.3. We assumed that the basic unit level model also holds for the sample , m in developing the pseudo-HB estimator (10.5.16) and (yij , xij ); j ∈ si , i 1, the pseudo-posterior variance (10.5.17). If the model holds for the sample, then the HB estimator (without weights) is optimal in the sense of providing the smallest posterior variance, but it is not design-consistent. As noted in Section 7.6.2, survey practitioners prefer design-consistent estimators as a form of insurance, and the sample size, ni , could be moderately large for some of the areas under consideration, in which case design-consistency becomes relevant.
10.6
GENERAL ANOVA MODEL
In this section, we apply the HB approach to the general ANOVA model 5.2.15, assuming a prior distribution on the model parameters (𝜷, 𝜹), where , 𝜎r2 )T is the vector of variance components. In particular, we assume 𝜹 (𝜎02 , ∏ that f (𝜷, 𝜹) f (𝜷) ri 0 f (𝜎i2 ) with f (𝜷) ∝ 1, f (𝜎i2 ) ∝ (𝜎i2 ) (ai +1) , i 1, , r and f (𝜎02 ) ∝ (𝜎02 ) (b+1) for specified values ai and b. Letting 𝜎02 𝜎e2 , the HB model may be written as ( ) r ∑ 2 2 Zi vi , 𝜎e I , (i) y|v, 𝜎e , 𝜷, ∼ N X𝜷 + i 1
(ii) vi |𝜎12 , (iii) f (𝜷) ∝ 1,
ind , 𝜎r2 ∼ Nhi (𝟎, 𝜎i2 Ihi ),
f (𝜎i2 ) ∝ (𝜎i2 )
(ai +1)
i ,
1,
, r,
f (𝜎e2 ) ∝ (𝜎e2 )
(b+1)
.
(10.6.1)
Under model (10.6.1), Hobert and Casella (1996) derived Gibbs conditionals and showed that the Gibbs conditionals are all proper if 2ai > hi for all i and 2b > n. These conditions are satisfied for diffuse priors with small values of ai and b. However, propriety of the conditionals does not imply propriety of the joint posterior , ar , b) simultaneously yield f (𝜷, 𝜹, v|y). In fact, many values of the vector (a1 , proper conditionals and an improper joint posterior. It is therefore important to verify that the chosen improper joint prior yields a proper joint posterior before proceeding with Gibbs sampling. Hobert and Casella (1996) derived conditions on the constants , ar , b) that ensure propriety of the joint posterior. Theorem 10.6.1 gives these (a1 , conditions. ∑m Theorem 10.6.1. Let t rank(PX Z) rank(ZT PX Z) ≤ h, where h i 1 hi , Z , Zr ) and PX In X(XT X) 1 XT . Consider the following two cases: (1) t (Z1 , h or r 1; (2) t < h and r > 1. For case 1, the following conditions are necessary
*HB ESTIMATION OF GENERAL FINITE POPULATION PARAMETERS
369
and sufficient for the propriety of the joint posterior: (a) ai < 0, (b) hi > h t 2ai , ∑ (c) n + 2 m p > 0. For case 2, conditions (a)–(c) are sufficient for the i 1 ai + 2b property of the joint posterior while necessary conditions result when (b) is replaced by (b′ ) hi > 2ai . The proof of Theorem 10.6.1 is very technical and we refer the reader to Hobert and Casella (1996) for details of the proof. If we choose ai 1 for all i and b 1, we get uniform (flat) priors on the variance components, that is, f (𝜎i2 ) ∝ 1 and f (𝜎e2 ) ∝ 1. This choice typically yields a proper joint posterior in practical applications, but not always. For example, for the balanced onefold random effects model, yij 𝜇 + 𝑣i + eij , j 1, , n, i 1, , m, a flat prior on 𝜎𝑣2 ( 𝜎12 ) violates condition (b) if m 3, noting that r 1, hi h 3, t rank(ZT PX Z) rank(I3 𝟏3 𝟏T3 ∕3) 2, and ai 1. As a result, the joint posterior is improper. PROC MIXED in SAS, Version 8.0, implements the HB method for the general ANOVA model. It uses flat priors on the variance components 𝜎e2 , 𝜎12 , , 𝜎r2 , and the regression parameters 𝜷. PRIOR option in SAS generates MCMC samples from the joint posterior of variance components and regression parameters, while RANDOM option generates samples from the marginal posterior of random effects v1 , , vr . The ANOVA model (10.6.1) covers the onefold and twofold nested error regression models as special cases, but not the two-level model (8.8.1), which is a special case of the general linear mixed model with a block-diagonal covariance structure (see 5.3.1). In Section 10.8, we apply the HB method to the two-level model (8.8.1). 10.7 *HB ESTIMATION OF GENERAL FINITE POPULATION PARAMETERS The HB approach is a good alternative to the EB method described in Section 9.4 for the estimation of general finite population quantities because it avoids the use of the bootstrap for MSE estimation. Note that the EB estimator of Section 9.4 is approximated empirically by Monte Carlo simulation. This is done by generating many nonsample vectors, attaching to each of them the sample data to get the full census vectors, and then calculating the simulated census parameter in each Monte Carlo replicate. Many censuses have to be generated in that way to get an accurate Monte Carlo approximation. Moreover, in the parametric bootstrap described in Section 9.4.4, each bootstrap replicate consists of generating a census from the fitted model. Then, for each bootstrap replicate, the sample part of the census is extracted and, from that sample, the Monte Carlo approximation of the EB estimator is obtained, which implies generating again many Monte Carlo censuses from each bootstrap sample as described earlier. This approach might be computationally unfeasible for very large populations or very complex parameters such as some poverty indicators that require sorting all census values. On the other hand, HB approaches provide approximations to the whole posterior distribution of the parameter of interest, and any summary of the posterior distribution can be directly obtained. In particular, the posterior variance
370
HIERARCHICAL BAYES (HB) METHOD
is used as measure of uncertainty of the HB estimator, and credible intervals can also be obtained without practically any additional effort. 10.7.1
HB Estimator under a Finite Population
Consider the general finite population parameter 𝜏 h(yP ), where yP (yTs , yTr )T is the (random) vector with the values of the study variable in the sampled and nonsampled units of the population, and h(⋅) is a given measurable function. HB models typically assume a density f (yP |𝝀) for the population vector yP given the vector of parameters 𝝀, and then a prior distribution f (𝝀) for 𝝀. The posterior distribution is then given by f (ys |𝝀)f (𝝀) , (10.7.1) f (𝝀|ys ) ∫ f (ys |𝝀)f (𝝀)d𝝀 The HB estimator of 𝜏
h(ys , yr ), under squared loss, is the posterior mean given by
𝜏̂ HB
E(𝜏|ys )
∫
(10.7.2)
h(ys , yr )f (yr |ys )dyr ,
where the predictive density is given by f (yr |ys )
∫
(10.7.3)
f (yr |𝝀)f (𝝀|ys )d𝝀.
Monte Carlo methods based on the generation of random values of 𝝀 from the posterior distribution (10.7.1), and of random values of yr from the predictive distribution (10.7.3), can be used to approximate the HB estimator (10.7.2). 10.7.2
Reparameterized Basic Unit Level Model
To avoid the use of MCMC methods, Molina, Nandram, and Rao (2014) considered a reparameterization of the basic unit level model (4.3.1), by using the intraclass correlation coefficient 𝜌 𝜎𝑣2 ∕(𝜎𝑣2 + 𝜎e2 ). They considered the following HB population model: ind
yij |𝑣i , 𝜷, 𝜎e2 ∼ N(xTij 𝜷, 𝜎e2 kij2 ), ( ) 𝜌 2 ind 2 𝑣i |𝜌, 𝜎e ∼ N 0, 𝜎 , 1 𝜌 e f (𝜷, 𝜌, 𝜎e2 ) ∝ 𝜎e 2 ,
𝜖≤𝜌≤1
j
1,
, Ni ,
i
1,
, M,
𝜖,
i
1,
, M,
(10.7.4)
where 𝜖 > 0 is a small number. , m, where A sample si of size ni < Ni is drawn from each area i, for i 1, , yTms )T be the last M m areas in the population are not sampled. Let ys (yT1s , the vector containing the sample data from the m sampled areas. Then, assuming that the population model (10.7.4) holds also for the sample, the posterior distribution of
371
*HB ESTIMATION OF GENERAL FINITE POPULATION PARAMETERS
(vT , 𝜷 T , 𝜎e2 , 𝜌)T , where v
the vector of parameters 𝝀 of area effects, is given by f (𝝀|ys )
, 𝑣m )T is the vector
(𝑣1 ,
( ) ( ) ( ) ( ) f1 v|𝜷, 𝜎e2 , 𝜌, ys f2 𝜷|𝜎e2 , 𝜌, ys f3 𝜎e2 |𝜌, ys f4 𝜌|ys
(10.7.5)
The conditional densities f1 , f2 , and f3 appearing in (10.7.5) all have simple forms, given by { ind 𝑣i |𝜷, 𝜎e2 , 𝜌, ys ∼
∼N
} ′
xi 𝜷), 1
𝝀i (𝜌)(yi
𝝀i (𝜌)
𝜌 1
𝜎e2
𝜌
(10.7.6)
,
̂ 𝜎e2 Q 1 (𝜌) , 𝜷|𝜎e2 , 𝜌, ys ∼ N 𝜷(𝜌), [ ] n p 𝛾(𝜌) 𝜎e 2 |𝜌, ys ∼ Gamma , , 2 2 where 𝝀i (𝜌)
ai⋅ ai⋅ + (1 m ∑ ∑
Q(𝜌)
1,
𝜌)∕𝜌
kij
1∕2
xi )(xij
(xij
(10.7.8) ̂ , m, and 𝜷(𝜌)
1,
i
(10.7.7)
x i )′ +
m
∑∑ p(𝜌)
kij
1∕2
xi )(yij
(xij
yi ) +
m
𝜌∑
1 𝜌
i 1 j∈si
′
𝝀i (𝜌) xi xi ,
i 1 m
𝜌∑
1 𝜌
i 1 j∈si
Q 1 (𝜌)p(𝜌), with
𝝀i (𝜌)xi yi
i 1
and m ∑ ∑
kij
𝛾(𝜌)
1∕2 [
yij
yi
(xij
̂ xi )′ 𝜷(𝜌)
]2
i 1 j∈si
+
m
𝜌∑
1 𝜌
[ 𝝀i (𝜌) yi
]2 ′̂ xi 𝜷(𝜌) .
i 1
Thus, random values of 𝑣i , 𝜷, and 𝜎e2 can be easily drawn from (10.7.6), (10.7.7), and (10.7.8). Unfortunately, f4 does not have a simple form: ( f4 (𝜌|ys ) ∝
1
𝜌 𝜌
)m∕2
m ∏
|Q(𝜌)|
1∕2
𝛾(𝜌)
(n p)∕2
1∕2
𝝀i (𝜌), i 1
𝜖≤𝜌≤1
𝜖.
(10.7.9) However, since 𝜌 is in the bounded interval 𝜖, 1 𝜖 , random values can be generated from (10.7.9) using a grid method. Thus, the considered reparameterization of the basic unit level model in terms of 𝜌, together with the considered priors, allows us to draw random values of 𝝀 from the posterior (10.7.5) avoiding the use of MCMC methods in this particular model.
372
HIERARCHICAL BAYES (HB) METHOD
10.7.3
HB Estimator of a General Area Parameter
In this section, we confine to the estimation of a general area parameter 𝜏i h(yPi ), where yPi (yTis , yTir )T is the vector with the values of the response variable for the sampled and nonsampled units from area i. By the HB model (10.7.4), given the vector of parameters 𝝀 that includes area effects, responses yij , i ∈ ri for nonsampled units are independent of sample responses ys , with ind
yij |𝝀 ∼ ∼ N(xTij 𝜷 + 𝑣i , 𝜎e2 kij2 ),
j ∈ ri ,
i
1,
(10.7.10)
, M.
Then, the posterior predictive density of yir is given by ∏ f (yir |ys )
∫
f (yij |𝝀)f (𝝀|ys )d𝝀. i∈ri
The HB estimator of the target parameter 𝜏i h(yPi ) is then given by the posterior mean 𝜏̂iHB E(𝜏i |ys ) h(yis , yir )f (yir |ys )dyir . (10.7.11) ∫ A Monte Carlo approximation of (10.7.11) is obtained by first generating samples from the posterior f (𝝀|ys ). For this, we first draw 𝜌 from f4 (𝜌|ys ), then 𝜎e2 from f3 (𝜎e2 |𝜌, ys ), then 𝜷 from f2 (𝜷|𝜎e2 , 𝜌, ys ), and finally v from f1 (v|𝜷, 𝜎e2 , 𝜌, ys ). We can repeat this procedure a large number, A, of times to get a random sample 𝝀(a) , a , A, from f (𝝀|ys ), we 1, , A, from f (𝝀|ys ). Then, for each generated 𝝀(a) , a 1, , j ∈ r , i 1, , m, from (10.7.10). Thus, for each samdraw nonsample values y(a) i ij pled area i 1, , m, we have generated a nonsample vector y(a) y(a) , i ∈ ri ir ij and we have also the sample data yis available. Attaching both vectors, we con(yTis , (y(a) )T )T . For areas i with zero sample size, struct the full census vector yP(a) i ir i m + 1, , M, the whole vector yP(a) y(a) is generated from (10.7.10), since i ir , we compute the area parameter 𝜏i(a) h(yP(a) ), i in that case ri Pi . Using yP(a) i i (a) , A, from the poste1, , M. In this way, we obtain a random sample 𝜏i , a 1, rior density of the target parameter 𝜏i . Finally, the HB estimator 𝜏̂iHB and its posterior variance are approximated as A
𝜏̂iHB
E(𝜏i |ys ) ≈
1 ∑ (a) 𝜏 , Aa 1 i
V(𝜏i |ys ) ≈
A ( 1 ∑ (a) 𝜏 Aa 1 i
𝜏̂iHB
)2 .
(10.7.12)
Other useful posterior summaries, such as credible intervals, can be computed in a straightforward manner. Similarly, as described at the end of Section 9.4.2 for the EB method, an HB approach can be implemented analogously to the fast EB approach introduced in Ferretti and Molina (2012). For this, from each Monte Carlo population vector yP(a) i we draw a sample s(a) and, with this sample, we obtain a design-based estimator i
373
*HB ESTIMATION OF GENERAL FINITE POPULATION PARAMETERS
∑ DB(a) 𝜏̂iDB(a) of 𝜏i(a) . Then, the fast HB estimator is given by 𝜏̂iFHB H 1 H , and h 1 𝜏̂i ∑ H DB(a) FHB 1 𝜏̂i )2 . its posterior variance can be approximated similarly by H h 1 (𝜏̂i In the particular case of estimating the FGT poverty measure 𝜏i F𝛼i , from the (yTis , (y(a) )T )T , we calculate census vector yP(a) i ir 𝛼
(a) F𝛼i
(a) )𝛼 ⎡ ( ⎤ ∑ ⎛ z Eij ⎞ 1 ⎢∑ z Eij ⎜ ⎟ I(E(a) < z)⎥ , I(Eij < z) + ij ⎜ z ⎟ ⎥ Ni ⎢ j∈s z i∈ri ⎝ ⎣ i ⎦ ⎠
(10.7.13)
), i ∈ ri , i 1, , M, for the selected where Eij T 1 (yij ), j ∈ si and Eij(a) T 1 (y(a) ij transformation T(⋅) of the welfare variables Eij . Then we average for a 1, , A and calculate the posterior variance as in (10.7.12). Molina, Nandram, and Rao (2014) conducted a simulation study to compare EB and HB estimates of poverty incidence and poverty gap under the frequentist setup. The setup of the simulation is exactly the same as in Molina and Rao (2010), which is also described in Section 9.4.6. Mean values of EB and HB estimates across Monte Carlo simulations turned out to be practically equal. Approximately, the same point estimates were also obtained in an application with data from the Spanish Survey on Income and Living Conditions from year 2006. Results in this application also show that posterior variances of HB estimators are of comparable magnitude to MSEs of frequentist EB estimators. Thus, HB estimates obtained from model (10.7.4) appear to have also good frequentist properties. Example 10.7.1. Poverty Mapping in Spain. Molina and Rao (2010) and Molina, Nandram, and Rao (2014) applied EB and HB methods to unit level data from the 2006 Spanish Survey on Income and Living Conditions to estimate poverty incidences and gaps for provinces by gender. They considered the basic unit level model for the log(income+constant), where the constant was selected to achieve symmetry of the distribution of model residuals. As explanatory variables in the model, they considered the indicators of five age groups, of having Spanish nationality, of three education levels and of the labor force status (unemployed, employed, or inactive). Table 10.5 reports the CVs of EBLUP estimators based on the FH model and
TABLE 10.5 Estimated % CVs of the Direct, EB, and HB Estimators of Poverty Incidence for Selected Provinces by Gender Province
Gender
ni
CV(dir.)
CV(EBLUP)
CV(EB)
CV(HB)
Soria Tarragona Córdoba Badajoz Barcelona
F M F M F
17 129 230 472 1,483
51.9 24.4 13.0 8.4 9.4
25.4 20.1 10.5 7.6 7.8
16.6 14.9 6.2 3.5 6.5
19.8 12.3 6.9 4.2 4.5
Source: Adapted from Table 17.1 in Rao and Molina (2015).
374
HIERARCHICAL BAYES (HB) METHOD
of the EB and HB estimators of poverty incidence for a selected set of domains (province×gender); for the HB estimator, the CV is computed from the posterior variance. Table 10.5 shows that both EB and HB estimators have much smaller CV than the direct estimators and the EBLUP estimators based on the FH model, especially for provinces with small sample sizes; for example, Soria with a sample size of 17 females.
10.8
TWO-LEVEL MODELS
In Section 8.8, we studied EBLUP estimation for the two-level model (8.8.1). In this section, we study three different HB models, including the HB version of (8.8.1), assuming priors on the model parameters. Model 1. The HB version of (8.8.1) with kij
1 may be written as
ind
(i) yij |𝛽i , 𝜎e2 ∼ N(xTij 𝛽i , 𝜎e2 ), ind
(ii) 𝛽i |𝛼, Σ𝑣 ∼ Np (Zi 𝛼, Σ𝑣 ), (iii) f (𝛼, 𝜎e2 , Σ𝑣 )
f (𝛼)f (𝜎e2 )f (Σ𝑣 ), 2
where 1
(10.8.1)
𝛼 ∼ Nq ( , D), 𝜎e ∼ G(a, b), Σ𝑣 ∼ Wp (d, 𝚫), and Wp (d, 𝚫) denotes a Wishart distribution with df d and scale matrix 𝚫: { f (𝚺𝑣 1 ) ∝ |𝚺𝑣 1 |(d
p 1)∕2
exp
} 1 tr(𝚫𝚺𝑣 1 ) , 2
d ≥ p.
(10.8.2)
The constants a ≥ 0, b > 0, d, and the elements of D and 𝚫 are chosen to reflect lack of prior information on the model parameters. In particular, using a diagonal matrix D with very large diagonal elements, say 104 , is roughly equivalent to a flat prior f (𝜶) ∝ 1. Similarly, d p, a 0.001, b 0.001, and a scale matrix 𝚫 with diagonal elements equal to 1 and off diagonals equal to 0.001 may be chosen. Daniels and Kass (1999) studied alternative priors for 𝚺𝑣 1 and discuss the limitations of Wishart prior when the number of small areas, m, is not large. Model 2. By relaxing the assumption of constant error variance, 𝜎e2 , we obtain a HB two-level model with unequal error variances: ind
(i) yij |𝛽i , 𝜎ei2 ∼ N(xTij 𝛽i , 𝜎ei2 ), (ii) Same as in (ii) of Model 1., (iii) Marginal priors on 𝜶 and 𝚺𝑣 1 same as in (iii) of Model 1 ind
and 𝜎ei2 ∼ G(ai , bi ),
i
1,
, m.
(10.8.3)
375
TWO-LEVEL MODELS
The constants ai and bi are chosen as ai 0.001, bi 0.001 to reflect lack of prior information. Model 3. In Section 7.6.1, we studied a simple random effects model given by yij 𝛽 + 𝑣i + eij with random error variance 𝜎ei2 . We also noted that the HB approach may be used to handle extensions to nested error regression models with random error variances. Here, we consider an HB version of a two-level model with random error variances: (i) Same as in (i) of Model 2, (ii) Same as in (ii) of Model 2, ind
(iii) 𝜎ei2 ∼ G(𝜂, 𝝀), (iv) Marginal priors on 𝜶 and 𝚺𝑣 1 same as in (iii) of Model 2, and 𝜂 and 𝝀 uniform over a large interval (0, 104 .
(10.8.4)
Note that (iii) of (10.8.4) is a component of the two-level model, and priors on model parameters are introduced only in (iv) of (10.8.4). The priors on 𝜂 and 𝝀 reflect vague prior knowledge on the model parameters 𝜂 > 0, 𝝀 > 0. You and Rao (2000) showed that all the Gibbs conditionals for Models 1–3 have closed forms. They also obtained Rao–Blackwell HB estimators of small area means 𝜇i XTi 𝜷 i and posterior variance V(𝜇i |y) XTi V(𝜷 i |y)Xi , where Xi is the vector of population x-means for the ith area. Example 10.8.1. Household Income. In Example 8.8.1, a two-level model with equal error variances 𝜎e2 was applied to data from 38,704 households in m 140 enumeration districts (small areas) in one county in Brazil. The two-level model is given by (8.8.2) and (8.8.3), where yij is the jth household’s income in the ith small area, (x1ij , x2ij ) are two unit level covariates: number of rooms in the (i, j)th household and educational attainment of head of the (i, j)th household centered around the means (x1i , x2i ). The area level covariate zi , the number of cars per household in the ith area, is related to the random slopes 𝜷 i , using (8.8.3.) You and Rao (2000) used a subset of m 10 small areas with ni 28 households each, obtained by simple random sampling in each area, to illustrate model selection and HB estimation. The Gibbs sampler for the three models was implemented using the BUGS program aided by CODA for convergence diagnostics. Using priors as specified above, the Gibbs sampler for each model was first run for a “burn-in” of d 2, 000 iterations. Then, D 5, 000 more iterations were run and kept for model selection and HB estimation. For model selection, You and Rao (2000) calculated CPO values for the three models, using (10.2.26). In particular, for Model 1, [ ̂ij CPO
d+D
1 1 ∑ (k) D k d+1 f (y |𝜷 , 𝜎e2(k) ) ij i
]
1
,
(10.8.5)
376
0.0
0.02
0.04
CPO
0.06
0.08
0.10
HIERARCHICAL BAYES (HB) METHOD
0
50
100
150
200
250
Figure 10.2 CPO Comparison Plot for Models 1–3. Source: Adapted from Figure 1 in You and Rao (2000).
where 𝜷 (k) , 𝜎e2(k) , 𝜶 (k) , 𝚺(k) d + 1, , d + D denote the MCMC samples. For 𝑣 ;k i 2(k) Models 2, and 3, 𝜎e in (10.8.5) is replaced by 𝜎ei2(k) . A CPO plot for the three models is given in Figure 10.2. This plot shows that a majority of CPO-values for Model 2 were significantly larger than those for Models 1 and 3, thus indicating Model 2 as the best fitting model among the three models. You and Rao (2000) calculated Rao–Blackwell HB estimates of small area means 𝜇i 𝛽0 + 𝛽1 X 1i + 𝛽2 X 2i under Model 2 and demonstrated the benefits of Rao–Blackwellization in terms of simulation standard errors. Example 10.8.2. Korean Unemployment. Cheng, Lee, and Kim (2001) applied Models 1 and 2 to adjust direct unemployment estimates, yij , associated with the jth month survey data in the ith small area (here j refers to May, July, and December 2000). Auxiliary variables xij were obtained from the Economically Active Population Survey (EAPS), census, and administrative records. HB analysis for these data was conducted using WinBUGS program (Lunn et al. 2000). ̂ij plot for the two models showed that CPO ̂ values for Model 2 are signifiA CPO cantly larger in every small area than those for Model 1. Chung et al. (2003) also calculated standardized residuals, dij , similar to (10.4.8), and two other measures of fit, ∑m ∑ni namely negative cross-validatory loglikelihood, log f̂ (yij,obs |y(ij),obs ) , i 1 ∑d+D j 1 ∑m ∑ni 1 (k) and posterior mean deviance, 2 i 1 j 1 D k d+1 log f (yij,obs |𝜼 ) , where f̂ (yij,obs |y(ij),obs ) is obtained from (10.2.26). These measures are computable directly in WinBUGS. Model 2 yielded a negative cross-validatory loglikelihood of 121.5
TIME SERIES AND CROSS-SECTIONAL MODELS
377
(compared to 188.7 for Model 1) and a posterior mean deviance of 243.0 (compared to 377.3 for Model 1), thus supporting Model 2 relative to Model 1. Using Model 2, Chung et al. (2003) calculated the posterior means and variances of the small area means 𝜇ij xTij 𝜷 i .
10.9
TIME SERIES AND CROSS-SECTIONAL MODELS
In Section 8.3.1, we studied EBLUP estimation from time series and cross-sectional data using the sampling model (4.4.6) and the linking model (4.4.7): (i) 𝜃̂it 𝜃it + eit ; (ii) 𝜃it zTit 𝜷 + 𝑣i + uit , where uit follows either an AR(1) model uit 𝜌ui,t 1 + 𝜖it , |𝜌| < 1, or a random walk model uit ui,t 1 + 𝜖it . In this section, we give a brief account of HB estimation under this model, assuming a prior distribution on the model parameters. The HB version of the model with random walk effects uit may be expressed as ind (i) 𝜽̂ i |𝜽i ∼ NT (𝜽i , Ψi ),
matrix of 𝜽̂ i
(𝜃̂i1 ,
whereΨi is the known sampling covariance , 𝜃̂iT )T ,
ind
(ii) 𝜃it |𝜷, uit , 𝜎𝑣2 ∼ N(zTit 𝜷 + uit , 𝜎𝑣2 ), ind
(iii) uit |ui,t 1 , 𝜎𝜖2 ∼ N(ui,t 1 , 𝜎𝜖2 ), (iv) f (𝜷, 𝜎𝑣2 , 𝜎 2 )
f (𝜷)f (𝜎𝑣2 )f (𝜎 2 )
f (𝜷) ∝ 1, 𝜎𝑣 2 ∼ G(a1 , b1 ), 𝜎
with 2
∼ G(a2 , b2 ).
(10.9.1)
For the AR(1) case with known 𝜌, replace ui,t 1 in (iii) of (10.9.1) by 𝜌 ui,t 1 . Datta et al. (1999) and You, Rao, and Gambino (2003) obtained Gibbs conditionals under the HB model (10.9.1) and showed that all the conditionals have closed forms. You et al. (2003) also obtained Rao–Blackwell estimators of the posterior ̂ and the posterior variance V(𝜃iT |𝜽) ̂ for the current time T. We refer mean E(𝜃iT |𝜽) the reader to the above papers for technical details. You (2008) extended the matched random walk model (10.9.1) to unmatched models by replacing the linking model (ii) by ind
log(𝜃it )|𝜷, uit , 𝜎𝑣2 ∼ N(zTit 𝜷 + uit , 𝜎𝑣2 )
(10.9.2)
and retaining (i), (iii), and (iv). In this case, the Gibbs conditional for 𝜽i does not have a closed form and the M–H within Gibbs algorithm is used to generate MCMC samples from the joint posterior distribution. Example 10.9.1. Canadian Unemployment Rates. You et al. (2003) used the HB model (10.9.1) to estimate unemployment rates, 𝜃iT , for m 62 Census Agglomerations (CAs) in Canada, using Canadian LFS direct estimates 𝜃̂it and auxiliary data zit .
378
HIERARCHICAL BAYES (HB) METHOD
Here, CA’s are treated as small areas. They used data 𝜃̂it , zit for T 6 consecutive months, from January 1999 to June 1999, and the parameters of interest are the true unemployment rates, 𝜃iT , in June 1999 for each of the small areas. The choice T 6 was motivated by the fact that the correlation between the estimates 𝜃̂it and 𝜃̂is (s ≠ t) is weak after a lag of 6 months because of the LFS sample rotation based on a 6-month cycle; each month, one-sixth of the sample is replaced. To obtain a smoothed estimate of the sampling covariance matrix 𝚿i used in the model (10.9.1), You et al. (2003) first computed the average CV (CVi ) for each CA i over time and the average lag correlations, ra , over time and over all CA’s. A smoothed estimate of 𝚿i was then obtained using those smoothed CVs and lag correlations: the tth diagonal element, 𝜓itt , of 𝚿i (i.e., the sampling variance of 𝜃̂it ) equals 𝜃̂it (CVi ) 2 and the (t, s)th element, 𝜓its , of 𝚿i (i.e., sampling covariance of 2 𝜃̂it and 𝜃̂is ) equals ra (𝜓itt 𝜓iss )1∕2 with a |t s|. The smoothed estimate of 𝚿i was treated as the true 𝚿i . You et al. (2003) used a divergence measure proposed by Laud and Ibrahim (1995) to compare the relative fits of the random walk model (10.9.1) and the corresponding AR(1) model with 𝜌 0.75 and 𝜌 0.50. This measure is given by d(𝜽̂ ∗ , 𝜽̂ obs ) E (mT) 1 ‖𝜽̂ ∗ 𝜽̂ obs ||2 |𝜽̂ obs , where 𝜽̂ obs is the (mt)-vector of direct estimates 𝜃̂it and the expectation is with respect to the posterior predictive distribution f (𝜽̂ ∗ |𝜽̂ obs ) of a new observation 𝜽̂ ∗ . Models yielding smaller values of this measure are preferred. The Gibbs output with L 10 parallel runs was used to generate samples 𝜽(𝓁k) ; 𝓁 1, , 10 from the posterior distribution, f (𝜽|𝜽̂ obs ), where 𝜽 is the (𝓁k) (mT)-vector of small area parameters 𝜃it . For each 𝜽(𝓁k) , a new observation 𝜽̂ ∗ was (𝓁k) ̂ (𝓁k) ). The new observations 𝜽̂ ∗ then generated from f (𝜽|𝜽 represent simulated ̂ ̂ samples from f (𝜽∗ |𝜽obs ). The measure d(𝜽̂ ∗ , 𝜽̂ obs ) was approximated by using these observations from the posterior predictive distribution. The following values of d(𝜽̂ ∗ , 𝜽̂ obs ) were obtained: (i) 13.36 for the random walk model; (ii) 14.62 for the AR(1) model with 𝜌 0.75; and (iii) 14.52 for the AR(1) model with 𝜌 0.5. Based on these values, the random walk model was selected. To check the overall fit of the random walk model, the simulated values (𝓁k) (𝜽(𝓁k) , 𝜽̂ ∗ ); 𝓁 1, , 10 were employed to approximate the posterior predictive p value from each run, 𝓁, using the formula (10.2.19) with measure of ̂ 𝜼) ∑62 (𝜽̂ i 𝜽i )T 𝚿 1 (𝜽̂ i 𝜽i ). The average of the discrepancy given by T(𝜽, i 1 i L 10 posterior predictive p-values, p̂p 0.615, indicated a good fit of the random walk model to the time series and cross-sectional data. HB , and the Rao–Blackwell method was used to calculate the posterior mean, 𝜃̂i6 ̂ posterior variance, V(𝜃i6 |𝜽), for each area i. Figure 10.3 displays the HB estimates under model (10.9.1), denoted HB1, the HB estimates under the Fay–Herriot model using only the current cross-sectional data, denoted HB2, and the direct LFS estimates, denoted DIRECT. It is clear from Figure 10.3 that the HB2 estimates tend to be smoother than HB1, whereas HB1 leads to moderate smoothing of the direct estimates. For the CAs with larger population sizes and therefore larger sample sizes, DIRECT and HB1 are very close to each other, whereas DIRECT differs substantially from HB1 for some smaller CAs.
379
TIME SERIES AND CROSS-SECTIONAL MODELS
10 5 0
Unemployment rate
15
Direct HB1 HB2
0
10
20
30
40
50
60
CMA/CAs by population size
0.8 0.2
0.4
0.6
HB1 HB2
0.0
Coefficient of variation
1.0
Figure 10.3 Direct, Cross-sectional HB (HB2) and Cross-Sectional and Time Series HB (HB1) Estimates. Source: Adapted from Figure 2 in You, Rao, and Gambino (2003).
0
10
20
30
40
50
60
CMA/CAs by population size
Figure 10.4 Coefficient of Variation of Cross-sectional HB (HB2) and Cross-Sectional and Time Series HB (HB1) Estimates. Source: Adapted from Figure 3 in You, Rao, and Gambino (2003).
Figure 10.4 displays the CVs of HB1 and HB2; note that we have already compared the CVs of HB2 and DIRECT in Figure 10.1. It is clear from Figure 10.4 that HB1 leads to significant reduction in CV relative to HB2, especially for the smaller CAs.
380
HIERARCHICAL BAYES (HB) METHOD
Example 10.9.2. U.S. Unemployment Rates. Datta et al. (1999) studied an extension of the random walk model (10.9.1) and applied their model to estimate monthly unemployment rate for 49 U.S. states and the District of Columbia (m 50), using CPS estimates as 𝜃̂it and Unemployment Insurance (UI) claims rate as zit ; New York state was excluded from the study because of very unreliable UI data. They considered the data for the period January 1985–December 1988 (T 48) to calculate HB HB , for the current time point T. The HB version of the Datta et al. (1999) estimates, 𝜃̂iT model may be written as ind (i) 𝜽̂ i |𝜽i ∼ NT (𝜽i , Ψi ), 12 ∑
𝛽0i + 𝛽1i zit + uit +
(ii) 𝜃it
4 ∑
aitu 𝛾iu + u 1
11 ∑
where 𝛾i,12
3 ∑
𝛾iu and 𝝀i4 u 1
iid
bitk 𝝀ik , k 1
𝝀ik , k 1
iid
(iii) 𝛽0i ∼ N(𝛽0 , 𝜎02 ), 𝛽1i ∼ N(𝛽1 , 𝜎12 ), 𝛾i ∼ N11 (𝛾, 𝚺𝛾 ), 𝝀i ∼ N3 (𝝀, Σ𝝀 ), uit |ui,t 1 , 𝜎 2 ∼ N(ui,t 1 , 𝜎 2 ), (iv) Independent gamma priors on 𝜎0 2 , 𝜎1 2 and 𝜎 2 , independent Wishart priors on 𝚺𝛾 1 and 𝚺𝝀 1 , and flat priors on 𝛽0 and 𝛽1 .
(10.9.3)
The last two terms in (ii) of model (10.9.3) account for seasonal variation in monthly unemployment rates, where aitu 1 if t u, aitu 0 otherwise; ait,12 1 if , 𝛾i,11 )T represents random t 12, 24, 36, 48, ait,12 0 otherwise, and 𝜸 i (𝛾i1 , month effects. Similarly, bitk 1 if 12(k 1) < t ≤ 12k, bitk 0 otherwise, and 𝝀i (𝝀i1 , 𝝀i2 , 𝝀i3 )T represent random year effects. The random effects 𝛽0i , 𝛽1i , 𝜸 i , 𝝀i , and uit are assumed to be mutually independent. Datta et al. (1999) used L 10 parallel runs to draw MCMC samples 𝜼(k) from the joint posterior distribution of the random effects 𝛽0i , 𝛽1i , uit , 𝜸 i , 𝝀i , and the model parameters. To check the overall fit of the model (10.9.3), they estimated ̂ 𝜼) of Example the posterior predictive p-value using the discrepancy measure T(𝜽, 10.3.2. The estimated posterior predictive p-value, p̂p 0.614, indicates a good fit of the model to the CPS time series and cross-sectional data. If the seasonal effects are deleted from the model (10.9.3), then the posterior predictive p-value becomes p̂p 0.758. This increase suggests that the inclusion of seasonal effects is important, probably because of the long time series used in the study (T 48). Note that in Example 10.3.2, the time series is short (T 6). Datta et al. (1999) also calculated the divergence measure proposed by Laud and Ibrahim (1995) and given by d(𝜽̂ ∗ , 𝜽̂ obs )
D0 1 ∑ ̂ (k) ‖𝜽 nD0 k 1 ∗
𝜽̂ obs ||2 ,
(10.9.4)
381
MULTIVARIATE MODELS
where n mT, D0 is the total number of MCMC samples, 𝜽̂ obs is the vector of (k) observed direct estimates with blocks 𝜽̂ i,obs , 𝜽̂ ∗ is the vector of hypothetical (k) (k) replications with blocks 𝜽̂ i∗ , with 𝜽̂ i∗ drawn from NT (𝜽(k) , 𝚿i ). For the CPS data, i ̂ ̂ d(𝜽∗ , 𝜽obs ) 0.0007 using model (10.9.3), while it increased to dFH (𝜽̂ ∗ , 𝜽̂ obs ) 1.63 using separate Fay–Herriot models. If the seasonal effects 𝛾iu and 𝝀ik are deleted from the model (10.9.3), then d(𝜽̂ ∗ , 𝜽̂ obs ) 0.0008, which is close to the value 0.0007 under the full model. HB , and the posterior variances, Datta et al. (1999) calculated the HB estimates, 𝜃̂iT ̂ V(𝜃iT |𝜽), for the current time point T, under model (10.9.3), and then compared the values to the CPS values, 𝜃̂iT and 𝜓iT , and the HB estimates for the Fay–Herriot model fitted only to the current month cross-sectional data. The HB standard errors, √ ̂ under the model (10.9.3) were significantly smaller than the correspondV(𝜃iT |𝜽), √ ing CPS standard errors, 𝜓iT , uniformly over time and across all states. However, HB was more effective for states with fewer sample households (“indirect-use” states). The reduction in standard error under the Fay–Herriot model for the current period relative to CPS standard error was sizeable for indirect-use states, but it is less than 10% (and as small as 0.1% in some months) for “direct-use” states with larger sample sizes.
10.10 10.10.1
MULTIVARIATE MODELS Area Level Model
In Section 8.1, we considered EBLUP estimation under the multivariate Fay–Herriot model (4.4.3). An HB version of (4.4.3) is given by ind (i) 𝜽̂ i |𝜽i ∼ Nr (𝜃i , Ψi ), where
𝜽̂ i
(𝜃̂i1 ,
, 𝜃̂ir )T is the vector of direct estimators andΨi is known,
(ii) 𝜽i |𝜷, Σ𝑣 ∼ Nr (Zi 𝜷, Σ𝑣 ), (iii) f (𝜷, Σ𝑣 )
f (𝜷)f (Σ𝑣 ) with f (𝜷) ∝ 1, f (Σ𝑣 ) ∝ 1.
(10.10.1)
Datta et al. (1996) derived the Gibbs conditionals, 𝜷|𝜽, 𝚺𝑣 , 𝜽̂ , 𝜽i |𝜷, 𝚺𝑣 , 𝜽̂ , and 𝚺𝑣 |𝜷, 𝜽, 𝜽̂ , under (10.10.1) and showed that all the conditionals have closed forms. ̂ and the They also obtained Rao–Blackwell estimators of the posterior means E(𝜽i |𝜽) ̂ posterior variances V(𝜽i |𝜽). Example 10.10.1. Median Income. Datta et al. (1996) used the multivariate Fay–Herriot model (10.10.1) to estimate the median incomes of four-person families in the U.S. states (small areas). Here 𝜽i (𝜃i1 , 𝜃i2 , 𝜃i3 )T with 𝜃i1 , 𝜃i2 , and 𝜃i3 denoting the true median income of four, three, and five-person families in state i, and 𝜃i1 ’s are the parameters of interest. The adjusted census median income, zij1 , and
382
HIERARCHICAL BAYES (HB) METHOD
base-year census median income, zij2 , for the three groups j 1, 2, 3 were used as explanatory variables. The matrix Zi consists of three rows (zTi1 , T3 , T3 ), ( T3 , zTi2 , T3 ), and ( T3 , T3 , zTi3 ), where zTij (1, zij1 , zij2 ) and T3 (0, 0, 0). Direct survey estimates 𝜽̂ i of 𝜽i and associated sampling covariance matrices 𝚿i were obtained from the 1979 Current Population Survey (CPS). HB estimates of 𝜃i1 were obtained using univariate, bivariate, and trivariate Fay–Herriot models, denoted as HB1 , HB2 , and HB3 , respectively. Two cases for the bivariate model were studied: 𝜽i (𝜃i1 , 𝜃i2 )T with the corresponding Zi and 𝜽i (𝜃i1 , 𝜃i3 )T with the corresponding Zi . Denote the HB estimates for the two cases as HB2a and HB2b , respectively. In terms of posterior variances, HB2b obtained from the bivariate model that uses only the median incomes of four- and five-person families performed better than the other estimates. HB estimates based on the multivariate models performed better than HB1 based on the univariate model. Datta et al. (1996) also conducted an external evaluation by treating the census estimates for 1979, available from the 1980 census data, as true values. Table 10.6 reports the absolute relative error averaged over the states (ARE); the direct estimates 𝜃̂i1 ; and the HB estimates HB3 , HB2a , HB2b , and HB1 . It is clear from Table 10.6 that all HB estimates performed similarly in terms of ARE and outperformed the direct estimates. 10.10.2
Unit Level Model
In Section 8.6, we studied EBLUP estimation under the multivariate nested error regression model (4.5.1). Datta, Day, and Maiti (1998) studied HB estimation for this model. An HB version of the model may be expressed as ind
(i) yij |B, 𝚺e ∼ Nr (Bxij + vi , 𝚺e ),
j
1,
, ni ,
i
1,
, m,
iid
(ii) vi |𝚺𝑣 ∼ Nr ( , 𝚺𝑣 ), (iii) Flat prior on B and independent Wishart priors on 𝚺𝑣 1 and 𝚺e 1 . (10.10.2) Datta et al. (1998) derived Gibbs conditionals under (10.10.2) and showed that all the conditionals have closed forms. They also considered improper priors of the form f (B, 𝚺𝑣 , 𝚺e ) ∝ |𝚺𝑣 | a𝑣 ∕2 |𝚺e | ae ∕2 under which Gibbs conditionals are proper but the
TABLE 10.6 Average Absolute Relative Error (ARE%): Median Income of Four-Person Families Direct
HB1
HB2a
HB2b
HB3
4.98
2.07
2.04
2.06
2.02
Source: Adapted from Table 11.2 in Datta et al. (1996).
383
DISEASE MAPPING MODELS
joint posterior may be improper. They obtained necessary and sufficient conditions on a𝑣 and ae that ensure property of the joint posterior. Datta et al. (1998) also obtained Rao–Blackwell estimators of the posterior means E(Yi |y) and the posterior variance matrix V(Yi |y), where Yi is the vector of finite population means for the ith area. The multivariate model was applied to the county corn and soybeans data, yij1 , yij2 of Battese et al. (1988). In terms of posterior variances, the HB estimates under the multivariate model performed better than the HB estimates under the univariate model for corn as well as soybeans. We refer the reader to Datta et al. (1998) for details of MCMC implementation and model fit.
10.11
DISEASE MAPPING MODELS
In Section 9.6, we studied EBLUP estimation for three models useful in disease mapping applications: Poisson-gamma, log-normal, and CAR-normal models. In this section, we study HB estimation for these models, assuming priors on the model parameters. We also consider extensions to two-level models.
10.11.1
Poisson-Gamma Model
Using the notation in Section 9.6, let 𝜃i , yi and ei denote, respectively, the relative risk (RR), observed and expected number of cases (deaths) over a given period in the ith area (i 1, , m). An HB version of the Poisson-gamma model, given in Section 9.6.1, may be written as ind
(i) yi |𝜃i ∼ Poisson(ei 𝜃i ), iid
(ii) 𝜃i |𝛼, 𝜈 ∼ G(𝜈, 𝛼), (iii) f (𝛼, 𝜈) ∝ f (𝛼)f (𝜈), with f (𝛼) ∝ 1∕𝛼;
𝜈 ∼ G(a
1∕2, b),
b > 0;
(10.11.1)
see Datta, Ghosh, and Waller (2000). The joint posterior f (𝜽, 𝛼, 𝜈|y) is proper if at least one yi is greater than zero. It is easy to verify that the Gibbs conditionals are given by (i) (ii)
ind
𝜃i |𝛼, 𝜈, y ∼ G(yi + 𝜈, ei + 𝛼), ) ( m ∑ 𝜃i , 𝛼|𝜽, 𝜈, y ∼ G m𝜈, i 1
( m ∏
) 𝜃i𝜈 1
(iii) f (𝜈|𝜽, 𝛼, y) ∝ i 1
exp ( b𝜈)𝛼 𝜈m ∕Γm (𝜈).
(10.11.2)
384
HIERARCHICAL BAYES (HB) METHOD
MCMC samples can be generated directly from (i) and (ii) of (10.11.2), but we need to use the M–H algorithm to generate samples from (iii) of (10.11.2). Using the MCMC samples (𝜃1(k) , , 𝜃m(k) , 𝜈 (k) , 𝛼 (k) ); k d + 1, , d + D , posterior quantities of interest may be computed; in particular, the posterior mean E(𝜃i |y) and posterior variance V(𝜃i |y) for each area i 1, , m.
10.11.2
Log-Normal Model
An HB version of the basic log-normal model, given in Section 9.6.2, may be written as ind
(i)
yi |𝜃i ∼ Poisson(ei 𝜃i ),
(ii)
𝜉i
(iii)
f (𝜇, 𝜎 2 ) ∝ f (𝜇)f (𝜎 2 ) with
iid
log(𝜃i )|𝜇, 𝜎 2 ∼ N(𝜇, 𝜎 2 ),
f (𝜇) ∝ 1;
𝜎
2
a ≥ 0,
∼ G(a, b),
b > 0.
(10.11.3)
The joint posterior f (𝜽, 𝜇, 𝜎 2 |y) is proper if at least one yi is greater than zero. It is easy to verify that the Gibbs conditionals are given by [ y 1
(i) f (𝜃i |𝜇, 𝜎 2 , y ∝ 𝜃i i
( (ii)
𝜇|𝜽, 𝜎 , y ∼ N (
(iii)
𝜎 |𝜽, 𝜇, y ∼ G
1 (𝜉 2𝜎 2 i
ei 𝜃i
] 𝜇)2 ,
)
m
1 ∑ 𝜎2 𝜉, m i 1 i m
2
2
exp
, )
m
1∑ m (𝜉 + a, 2 2 i 1 i
2
𝜇) + b ;
(10.11.4)
see Maiti (1998). MCMC samples can be generated directly from (ii) and (iii) of (10.11.4), but we need to use the M–H algorithm to generate samples from (i) of (10.11.4). We can express (i) as f (𝜃i |𝜇, 𝜎 2 , y) ∝ k(𝜃i )h(𝜃i |𝜇, 𝜎 2 ), y where k(𝜃i ) exp( ei 𝜃i )𝜃i i and h(𝜃i |𝜇, 𝜎 2 ) ∝ g′ (𝜃i ) exp (𝜉i 𝜇)2 ∕2𝜎 2 with g′ (𝜃i ) 𝜕g(𝜃i )∕𝜕𝜃i and g(𝜃i ) log(𝜃i ). We can use h(𝜃i |𝜇, 𝜎 2 ) to draw the candidate, 𝜃i∗ , noting that 𝜃i g 1 (𝜉i ) and 𝜉i |𝜇, 𝜎 2 ∼ N(𝜇, 𝜎 2 ). The acceptance probability used in the M–H algorithm is then given by a(𝜃 (k) , 𝜃i∗ ) min k(𝜃i∗ )∕k(𝜃i(k) ), 1 . As noted in Section 9.6.2, the basic log-normal model with Poisson counts, yi , readily extends to the case of covariates zi by changing (ii) of (10.11.3) to 𝜉i |𝜷, 𝜎 2 ∼ N(zTi 𝜷, 𝜎 2 ). Furthermore, we change (iii) of (10.11.3) to f (𝜷, 𝜎 2 ) ∝ f (𝜷)f (𝜎 2 ) with
385
DISEASE MAPPING MODELS
f (𝜷) ∝ 1 and 𝜎 2 ∼ G(a, b). Also, the basic model can be extended to allow spatial covariates. An HB version of the spatial CAR-normal model is given by (i) yi |ei ∼ Poisson(ei 𝜃i ), [
]
m ∑
2
(ii) 𝜉i |𝜉j(j≠i) , 𝜌, 𝜎 ∼ N 𝜇 + 𝜌
qi𝓁 (𝜉𝓁
𝜇), 𝜎
2
,
𝓁 1
(iii) f (𝜇, 𝜎 2 , 𝜌) ∝ f (𝜇)f (𝜎 2 )f (𝜌) with f (𝜇) ∝ 1;
𝜎
2
∼ G(a, b), a ≥ 0, b > 0;
𝜌 ∼ U(0, 𝜌0 ),
(10.11.5)
where 𝜌0 denotes the maximum value of 𝜌 in the CAR model, and Q (qi𝓁 ) is the “adjacency” matrix of the map with qi𝓁 q𝓁i , qi𝓁 1 if i and 𝓁 are adjacent areas and qi𝓁 0 otherwise. Maiti (1998) obtained the Gibbs conditionals. In particular, 𝜇|𝜽, 𝜎 2 , 𝜌, y is distributed as normal, 𝜎 2 |𝜽, 𝜇, 𝜌, y gamma, 𝜌|𝜽, 𝜇, 𝜎 2 , y truncated normal, and 𝜃i |𝜃j(j≠i) , 𝜇, 𝜎 2 , 𝜌, y does not admit a closed form in the sense that the conditional is known only up to a multiplicative constant. MCMC samples can be generated directly from the first three conditionals, but we need to use the M–H algorithm to generate samples from the conditionals 𝜃i |𝜃j(j≠i) , 𝜇, 𝜎 2 , 𝜌, y , i 1, , m. Example 10.11.1. Lip Cancer. In Example 9.6.1, EB estimation was applied to lip cancer counts, yi , in each of 56 counties of Scotland. Maiti (1998) applied HB estimator to the same data using the log-normal and the CAR-normal models. The HB estimates E(𝜃i |y)√of lip cancer incidence are very similar for the two models, but the standard errors, V(𝜃i |y), are smaller for the CAR-normal model as it exploits the spatial structure of the data. Ghosh et al. (1999) proposed a spatial log-normal model that allows covariates zi , given by ind
(i) yi |ei ∼ Poisson(ei 𝜃i ) zTi 𝜷 + ui + 𝑣i where zTi 𝜷 does not include an intercept term,
(ii) 𝜉i iid
𝑣i ∼ N(0, 𝜎𝑣2 ) and the ui ′ shave joint density [ m ] ∑∑ 2 m∕2 2 2 f (u) ∝ (𝜎u ) exp (ui u𝓁 ) qi𝓁 ∕(2𝜎 ) , where qi𝓁 ≥ 0 i 1 𝓁≠i
for all 1 ≤ i ≠ 𝓁 ≤ m, (iii) 𝜷, 𝜎u2 and 𝜎𝑣2 are mutually independent with f (𝜷) ∝ 1, 𝜎u 2 ∼ G(au , bu ) and 𝜎𝑣 2 ∼ G(a𝑣 , b𝑣 ).
(10.11.6)
386
HIERARCHICAL BAYES (HB) METHOD
Ghosh et al. (1999) showed that all the Gibbs conditionals admit closed forms except for 𝜃i | 𝜃𝓁(𝓁≠i) , 𝜷, 𝝁, 𝜎u2 , 𝜎𝑣2 , y . They also established conditions for the propriety of the joint posterior; in particular, we need bu > 0, b𝑣 > 0. Example 10.11.2. Leukemia Incidence. Ghosh et al. (1999) applied the HB method, based on the model (10.11.6), to leukemia incidence estimation for m 281 census tracts (small areas) in an eight-county region of upstate New York. Here qi𝓁 1 if i and 𝓁 are neighbors and qi𝓁 0 otherwise, and zi is a scalar (p 1) variable zi denoting the inverse distance of the centroid of the ith census tract from the nearest hazardous waste site containing trichloroethylene (TCE), a common contaminant of ground water. We refer the reader to Ghosh et al. (1999) for further details. 10.11.3
Two-Level Models
Let yij and nij denote, respectively, the number of cases (deaths) and the population at risk in the jth age class in the ith area (j 1, , J, i 1, , m). Using the data yij , nij , it is of interest to estimate the age-specific mortality rates 𝜏ij and ∑n the age-adjusted rates j i 1 aj 𝜏ij , where the aj ’s are specified constants. The basic assumption is ind yij |𝜏ij ∼ Poisson(nij 𝜏ij ). (10.11.7) Nandram, Sedransk, and Pickle (1999) studied HB estimation under different linking models: log(𝜏ij )
zTj 𝜷 + 𝑣i ,
log(𝜏ij )
zTj 𝜷 i ,
log(𝜏ij )
zTj 𝜷 i + 𝛿j ,
iid
𝑣i |𝜎𝑣2 ∼ N(0, 𝜎𝑣2 ),
(10.11.8)
iid
(10.11.9)
𝜷 i |𝜷, 𝚫 ∼ Np (𝜷, 𝚫), iid
iid
𝜷 i |𝜷, 𝚫 ∼ Np (𝜷, 𝚫), 𝛿j ∼ N(0, 𝜎 2 ),
(10.11.10)
where zj is a p × 1 vector of covariates and 𝛿j is an “offset” corresponding to age class j. Nandram et al. (1999) assumed the flat prior f (𝜷) ∝ 1 and proper diffuse (i.e., proper with very large variance) priors for 𝜎𝑣2 , 𝚫, and 𝜎 2 . For model selection, they used the posterior EPD, the posterior predictive value, and measures based on the cross-validation predictive densities (see Section 10). Example 10.11.3. Mortality Rates. Nandram et al. (1999) applied the HB method to estimate age-specific and age-adjusted mortality rates for U.S. Health Service Areas (HSAs). They studied one of the disease categories, all cancer for white males, presented in the 1996 Atlas of United States Mortality. The number of HSAs (small areas), m, is 798 and the number of age categories, J, is 10: 0 4, 5 14, , 75 84, 85 and higher, coded as 0.25, 1, , 9. The vector of auxiliary 1, j 1, (j 1)2 , (j 1)3 , max 0, ((d 1) knot)3 T for variables is given by zj j ≥ 2 and z1 1, 0.25, (0.25)2 , (0.25)3 , max 0, (0.25 knot)3 T , where the value
387
DISEASE MAPPING MODELS
of the “knot” was obtained by maximizing the likelihood based on marginal deaths, ∑m ∑m ind yj i 1 yij , and population at risk, nj i 1 nij , where yj |nj , 𝜏j ∼ Poisson(nj 𝜏j ) with log(𝜏j ) zTj 𝜷. The auxiliary vector zj was used in the Atlas model based on a normal approximation to log(rij ) with mean log(𝜏ij ) and matching linking model given by (10.11.9), where rij yij ∕nij is the crude rate. Nandram et al. (1999) used unmatched sampling and linking models based on the Poisson sampling model (10.11.7) and the linking models (10.11.8)–(10.11.10). We denote these models as Models 1, 2, and 3, respectively. Nandram et al. (1999) used the MCMC samples generated from the three models to calculate the values of the posterior expected predictive deviance E Δ(y; yobs )|yobs ∑m ∑ni (y yij,obs )2 ∕(yij + using the chi-square measure Δ(y, yobs ) i 1 j 1 ij 0.5). They also calculated the posterior predictive p-values, using T(y, 𝝉) ∑m ∑ni (y nij 𝜏ij )2 ∕(nij 𝜏ij ), the standardized cross-validation residuals i 1 j 1 ij ∗ d2,ij
rij,obs E(rij |y(ij),obs ) , √ V(rij |y(ij),obs )
(10.11.11)
where y(ij),obs denotes all elements of yobs except for yij,obs ; see (10.2.28). The resid∗ were summarized by counting (a) the number of elements (i, j) such that uals d2,ij ∗ ∗ | ≥ 3 for |d2,ij | ≥ 3, called “outliers”, and (b) the number of HSAs, i, such that |d2,ij at least one j, called “# of HSAs”. Table 10.7 reports the posterior EPD, the posterior predictive p-value, the number of outliers according to the above definition, and the number of HSAs for Models 1–3. It is clear from Table 10.7 that Model 1 performs poorly with respect to all the four measures. Overall, Model 3 with the random age coefficient, 𝛿j , provides the best fit to the data, although Models 2 and 3 are similar with respect to EPD. Based on Model 3, Nandram et al. (1999) produced cancer maps of HB estimates of age-specific mortality rates for each age group j. Note that the HB estimate of the mortality rate 𝜏ij is given by the posterior mean E(𝜏ij |y). The maps revealed that the mortality rates, for all age classes, are highest among the Appalachian mountain range (Mississippi to West Virginia) and the Ohio River Valley (Illinois to Ohio). Also, the highest rates formed more concentrated clusters in the middle and older age groups (e.g., 45–54), whereas the youngest and oldest age groups exhibited more scattered patterns.
TABLE 10.7
Comparison of Models 1–3: Mortality Rates
Model
EPD
1 2 3
22,307 16,920 16,270
p-value
Outliers
# of HSAs
0.00 0.00 0.32
284 136 59
190 93 54
Source: Adapted from Table 1 in Nandram et al. (1999).
388
HIERARCHICAL BAYES (HB) METHOD
Nandram, Sedransk, and Pickle (2000) used models and methods similar to those in Nandram et al. (1999) to estimate age-specific and age-adjusted mortality rates for chronic obstructive pulmonary disease for white males in HSAs.
10.12
*TWO-PART NESTED ERROR MODEL
Pfeffermann, Terryn, and Moura (2008) studied the case of a response variable, y, taking either the value 0 or a value drawn from a continuous distribution. For example, in the assessment of adult literacy, y is either zero indicating illiteracy or a positive score measuring the literacy level. Pfeffermann et al. (2008) used a twofold, two-part model to estimate the mean of the literacy scores and the proportion of positive scores in each village (subarea) belonging to a district (area) in Cambodia. For simplicity, we consider only a onefold, two-part model here. Let yij denote the response (e.g., literacy score) associated with the jth unit in ith , Ni , area, zij a vector of covariates for that unit, and pij Pr(yij > 0|zij , ui ), (j 1, i 1, , m). The probabilities pij are then modeled as logit(pij )
iid
ui ∼ N(0, 𝜎u2 ).
zTij 𝜸 + ui ,
(10.12.1)
Furthermore, consider another vector of covariates xij , possibly different from zij , such that (10.12.2) yij |xTij , 𝑣i , yij > 0 ∼ N(xTij 𝜷 + 𝑣i , 𝜎e2 ) iid
with 𝑣i ∼ N(0, 𝜎𝑣2 ). Note that (10.12.2) implies E(yij |xij , zij , 𝑣i , ui )
(xTij 𝜷 + 𝑣i )pij .
(10.12.3)
The conditional likelihood for the two-part model (10.12.1) and (10.12.2) may be written as m ∏ ∏
a
pijij f (yij |xij , 𝑣i , yij > 0)
L
aij
(1
pij )1
aij
,
(10.12.4)
i 1 j∈si
where aij 1 if yij > 0, aij 0 otherwise, and pij E(aij ). Pfeffermann et al. (2008) introduced a correlation between ui and 𝑣i by assuming that 2 ), ui |𝑣i ∼ N(K𝑣 𝑣i , 𝜎u|𝑣
i
1,
, m.
(10.12.5)
Note that the unknown model parameters are the regression coefficients 𝜷, 𝜸, and K𝑣 2 . and the variances 𝜎u2 , 𝜎e2 , and 𝜎u|𝑣 By specifying diffuse priors on all model parameters, MCMC samples 2(b) ); b 1, ,B are generated from (v(b) , u(b) , 𝜷 (b) , 𝜸 (b) , K𝑣(b) , 𝜎𝑣2(b) , 𝜎e2(b) , 𝜎u|𝑣 the joint posterior distribution using WinBUGS to implement M–H within Gibbs
389
BINARY DATA
algorithm, where u (u1 , , um )T and v (𝑣1 , , 𝑣m )T . It follows from (10.12.1) and (10.12.3) that the HB estimator of the area mean Y i is given by ̂ HB Yi
{ ∑ Ni
1
[ yij +
j∈si
B
]}
1 ∑ (b) y B b 1 ij
∑ j∈ri
B
∶
1 ∑ ̂ HB Y (b), Bb 1 i
(10.12.6)
where y(b) ij
(xTij 𝜷 (b) + 𝑣(b) )p(b) and logit(p(b) ) zTij 𝜸 (b) + u(b) . Similarly, the HB estii ij ij i ∑ Ni 1 a is given by mator of the area proportion Pi Ni j 1 ij { ∑ P̂ HB i
Ni
1
[ ∑ aij +
j∈si
j∈ri
B
1 ∑ (b) p B b 1 ij
]}
B
∶
1 ∑ ̂ HB P (b). Bb 1 i
(10.12.7)
Posterior variances of Y i and Pi and credible intervals on Y i and Pi are similarly ̂ HB obtained from the generated values Y i (b) and P̂ HB (b), b 1, , B. The HB estii mators (10.12.6) and (10.12.7) are more efficient than those given in Pfeffermann et al. (2008) because they make use of (10.12.1) and (10.12.3).
10.13
BINARY DATA
In Section 9.5, we studied EB models for binary responses yij . In this section, we study HB estimation for these models, assuming priors on the model parameters. We also consider extensions to logistic linear mixed models. 10.13.1
Beta-Binomial Model
An HB version of the beta-binomial model studied in Section 9.5.1 is given by ind
(i) yi |pi ∼ binomial(ni , pi ), iid
(ii) pi |𝛼, 𝛽 ∼ beta(𝛼, 𝛽), 𝛼 > 0, 𝛽 > 0, (iii) 𝛼 and 𝛽 mutually independent of 𝛼 ∼ G(a1 , b1 ), a1 > 0, b1 > 0, 𝛽 ∼ G(a2 , b2 ), a2 > 0, b2 > 0.
(10.13.1)
The Gibbs conditional pi |p𝓁(𝓁≠i) , 𝛼, 𝛽, y is beta(yi + 𝛼, ni yi + 𝛽) under model (10.13.1), but the remaining conditionals 𝛼|p, 𝛽, y and 𝛽|p, 𝛼, y do not admit closed forms. He and Sun (1998) showed that the latter two conditionals are log-concave if a1 ≥ 1 and a2 ≥ 1. Using this result, they used adaptive rejection sampling to generate samples from these conditionals. BUGS version 0.5 handles log-concave conditionals using adaptive rejection sampling (Section 10.2.4).
390
HIERARCHICAL BAYES (HB) METHOD
Example 10.13.1. Turkey Hunting. He and Sun (1998) applied the beta-binomial model (10.13.1) to data collected from a mail survey on turkey hunting conducted by Missouri Department of Conservation. The purpose of this survey was to estimate hunter success rates, pi , and other parameters of interest for m 114 counties in Missouri. Questionnaires were mailed to a random sample of 5151 permit buyers after the 1994 spring turkey hunting season; the response rate after these mailings was 69%. A hunter was allowed at most 14 one-day trips, and each respondent reported the number of trips, whether a trip was success or not, and the county where hunted for each trip. From this data, a direct estimate of pi is computed as p̂ i yi ∕ni , where ni is the total number of trips in county i and yi is the number of successful trips. The direct estimate is not reliable for counties with small ni ; for example, ni 11 and yi 1 for county 11. The direct estimate of success rate for the state, p̂ 10.1%, is reliable because of the large overall sample size; the average of HB estimates, p̂ H , i equals 10.2%. For the counties with small ni , the HB estimates shrink toward p̂ . For example, p̂ i 9.1% and p̂ HB 10.8% for county 11. i You and Reiss (1999) extended the beta-binomial HB model (10.13.1) to a twofold beta-binomial HB model. They applied the twofold model to estimate the proportions, pij , within groups, j, in each area i. They also derived Rao–Blackwell estimators of the posterior means and posterior variances. Using data from Statistics Canada’s 1997 Homeowner Repair and Renovation Survey, they obtained HB estimates of the response rates, pij , within response homogeneity groups (RHGs), j, in each province i. The number of RHGs varied widely across provinces; for example, 153 RHGs in Ontario compared to only 8 RHGs in Prince Edward Island. Sample sizes, nij , varied from 1 to 201. As a result, direct estimates of response rates pij are unreliable for the RHGs with small sample sizes. HB standard errors were substantially smaller than the direct standard errors, especially for the RHGs with small nij . For example, the sample size is only 4 for the RHG 17 in Newfoundland, and the direct standard error is 0.22 compared to HB standard error equal to 0.03.
10.13.2
Logit-Normal Model
As noted in Section 9.5.2, the logit-normal model readily allows covariates, unlike the beta-binomial model. We first consider the case of area-specific covariates, zi , and then study the case of unit level covariates, xij . Area Level Covariates An HB version of the logit-normal model with area level covariates may be expressed as ind
(i) yi |pi ∼ binomial(ni , pi ), (ii) 𝜉i
logit(pi )
iid
zTi 𝜷 + 𝑣i , with 𝑣i ∼ N(0, 𝜎𝑣2 )
(iii) 𝜷 and 𝜎𝑣2 are mutually independent of f (𝜷) ∝ 1 and 𝜎𝑣 2 ∼ G(a, b), a ≥ 0, b > 0.
(10.13.2)
391
BINARY DATA
It is easy to verify that the Gibbs conditionals corresponding to the HB model (10.13.2) are given by
(i)
(ii)
(m ) 1 ⎡ ⎤ ∑ ∗ 2 T ⎥ ∼ Np ⎢𝜷 , 𝜎𝑣 zi zi ⎢ ⎥ i 1 ⎣ ⎦ [ ] m 1∑ m 2 T 2 (𝜉 zi 𝜷) + b . 𝜎𝑣 |𝜷, p, y ∼ G + a, 2 2 i 1 i
𝜷|p, 𝜎𝑣2 , y
(iii) f (pi |𝜷, 𝜎𝑣2 , y) ∝ h(pi |𝜷, 𝜎𝑣2 )k(pi ), where 𝜷 ∗
∑ T ( m i 1 zi zi )
1 ∑m z 𝜉 , i 1 i i
(10.13.3) y
pi i (1
k(pi )
pi )ni
{ h(pi |𝜷, 𝜎𝑣2 )
∝ g (pi ) exp
and
zTi 𝜷)2
(𝜉i
′
yi
2𝜎𝑣2
} ,
(10.13.4)
where g′ (pi ) 𝜕g(pi )∕𝜕pi with g(pi ) logit(pi ). It is clear from (10.13.3) that the conditionals (i) and (ii) admit closed forms while (iii) has form similar to (10.4.3). Therefore, we can use h(pi |𝜷, 𝜎𝑣2 ) to draw the candidate p∗i , noting that pi g 1 (𝜉i ) and 𝜉i |𝜷, 𝜎𝑣2 ∼ N(zTi 𝜷, 𝜎𝑣2 ). In this case, the acceptance probability used in the M–H , p∗i ) min k(p∗i )∕k(p(k) ), 1 . algorithm is given by a(p(k) i i The HB estimate of pi and the posterior variance of pi are obtained directly from (k) 2(k) , , p(k) d + 1, , d + D generated the MCMC samples (p(k) m , 𝜷 , 𝜎𝑣 ); k 1 from the joint posterior f (p1 , , pm , 𝜷, 𝜎𝑣2 |y), as d+D
p̂ HB ≈ i
1 ∑ (k) p D k d+1 i
and V(pi |̂p) ≈
(10.13.5)
d+D
∑
1 D
p(⋅) i
1k
(p(k) i
p(⋅) )2 . i
(10.13.6)
d+1
Unit Level Covariates In Section 9.5.2, we studied EB estimation for a logistic linear mixed model with unit level covariates xij , assuming that the model holds , m . An HB version of this model may be for the sample (yij , xij ); j ∈ si , i 1, expressed as ind
(i) yij |pij ∼ Bernoulli(pij ), (ii) 𝜉ij
logit(pij )
xTij 𝜷 + 𝑣i ,
iid
with 𝑣i ∼ N(0, 𝜎𝑣2 )
(iii) 𝜷 and 𝜎𝑣2 are mutually independent with f (𝜷) ∝ 1 and 𝜎𝑣 2 ∼ G(a, b), a ≥ 0, b > 0.
(10.13.7)
392
HIERARCHICAL BAYES (HB) METHOD
An alternative prior on 𝜷 and 𝜎𝑣2 is the flat prior f (𝜷, 𝜎𝑣2 ) ∝ 1. (k) 2(k) , , 𝑣(k) d + 1, , d + D denote the MCMC samples Let (𝑣(k) m , 𝜷 , 𝜎𝑣 ); k 1 generated from the joint posterior f (𝑣1 , , 𝑣m , 𝜷, 𝜎𝑣2 |y). Then the HB estimate of the finite population proportion Pi is obtained as [ 1 P̂ HB ≈ i Ni
∑ j∈si
] d+D 1 ∑ ∑ (k) , yij + p D k d+1 𝓁∈r i𝓁
(10.13.8)
i
where logit(p(k) ) xTij 𝜷 (k) + 𝑣(k) and ri is the set of nonsampled units in the ith area. ij i Similarly, the posterior variance V(Pi |y) is obtained as )2
(
d+D ⎡ 1 ∑ ⎢ ∑ (k) p (1 V(Pi |y) ≈ Ni D k d+1 ⎢𝓁∈r i𝓁 ⎣ i [ d+D ]2 ∑ (k) 2 1 p ; Ni D k d+1 i𝓁
p(k) ) i𝓁
2
∑ +
p(k) i𝓁
𝓁∈ri
⎤ ⎥ ⎥ ⎦ (10.13.9)
see (9.5.21). The total Yi Ni Pi is estimated by Ni P̂ HB and its posterior variance i 2 is given by Ni V(Pi |y). Note that the xi𝓁 -values for 𝓁 ∈ ri are needed to implement (10.13.8) and (10.13.9). The Gibbs conditionals corresponding to the HB model (10.13.7) are given by m ∏ ∏
(i) f (𝛽1 |𝛽2 ,
y
pij )1
pijij (1
, 𝛽p , v, y) ∝
yij
,
i 1 j∈si m ∏ ∏
(ii) f (𝛽u |𝛽1 ,
y
pijij (1
, 𝛽p , v, y) ∝
, 𝛽u 1 , 𝛽u+1 ,
pij )1
yij
,
i 1 j∈si
(iii) f (𝑣i |𝜷, 𝜎𝑣2 , y, 𝑣1 ,
, 𝑣i 1 , 𝑣i+1 ,
, 𝑣m )
m ∏ ∏
( (iv)
2
𝜎𝑣 |𝜷, v, y ∼ G
y
pijij (1
∝
pij )1
yij
exp
𝑣2i ∕(2𝜎𝑣2 ) ,
i 1 j∈si
) m 1∑ 2 m 𝑣 +b . + a, 2 2 i 1 i
(10.13.10)
It is clear from (10.13.10) that (i)–(iii) do not admit closed forms. Farrell (2000) used the griddy-Gibbs sampler (Ritter and Tanner 1992) to generate MCMC samples from (i) to (iv). We refer the reader to Farrell (2000) for details of the griddy-Gibbs sampler with regard to the HB model (10.13.7). Alternatively, M–H within Gibbs may be used to generate MCMC samples from (i) to (iv) of (10.13.10).
393
BINARY DATA
Example 10.13.2. Simulation Study. Farrell (2000) conducted a simulation study on the frequentist performance of the HB method and the approximate EB method of MacGibbon and Tomberlin (1989); see Section 9.5.2 for the approximate EB method. He treated a public use microdata sample of the 1950 United States Census as the population and calculated the true local area female participation rates, Pi , for m 20 local areas selected with probability proportional to size and without replacement from M 52 local areas in the population. Auxiliary variables, xij , related to pij , were selected by a stepwise logistic regression procedure. Both unit (or individual) level and area level covariates were selected. Unit level variables included age, marital status, and whether the individual had children or not. Area level variables included average age, proportions of individuals in various marital status categories, and proportion of individuals having children. Treating the m 20 sampled areas as strata, R 500 independent stratified random samples with ni 50 individuals in each stratum were drawn. The HB estimates (r) and the approximate EB estimates P̂ EB (r) were then calculated from each P̂ HB i i simulation run r 1, , 500. Using these estimates, the absolute differences R
ADHB i
1 ∑ ̂ HB |P (r) Rr 1 i
R
Pi |,
1 ∑ ̂ EB |P (r) Rr 1 i
ADEB i
Pi |
HB EB ∑ HB and the mean absolute differences AD m 1 m and AD i 1 ADi ∑ EB were obtained. In terms of mean absolute difference, HB perm 1 m i 1 ADi HB
EB
formed better than EB, obtaining AD 0.0031 compared to AD 0.0056. EB was smaller than AD for 16 of the 20 Moreover, the absolute difference ADHB i i local areas. However, these results should be interpreted with caution because Farrell (2000) used the gamma prior on 𝜎𝑣 2 with both a 0 and b 0, which results in an improper joint posterior. 10.13.3
Logistic Linear Mixed Models
A logistic linear mixed HB model is given by ind
(i) yij |pij ∼ Bernoulli(pij ), (ii) 𝜉ij
logit(pij )
iid
xTij 𝜷 + zTij vi with vi ∼ Nq ( , 𝚺𝑣 ), (10.13.11)
(iii) f (𝜷, 𝚺𝑣 ) ∝ f (𝚺𝑣 ). Zeger and Karim (1991) proposed the Jeffrey’s prior f (𝚺𝑣 ) ∝ |𝚺𝑣 |
(q+1)∕2
.
(10.13.12)
Unfortunately, the choice (10.13.12) leads to an improper joint posterior distribution even though all the Gibbs conditionals are proper (Natarajan and Kass 2000). A simple choice that avoids this difficulty is the flat prior f (𝚺𝑣 ) ∝ 1. An alternative prior is
394
HIERARCHICAL BAYES (HB) METHOD
obtained by assuming a Wishart distribution on 𝚺𝑣 1 with parameters reflecting lack of information on 𝚺𝑣 . Natarajan and Kass (2000) proposed different priors on 𝚺𝑣 that ensure propriety of the joint posterior. In the context of small area estimation, Malec et al. (1997) studied a two-level HB model with class-specific covariates, xj , when the population is divided into classes j. Suppose there are M areas (say, counties) in the population and yij𝓁 denotes the binary response variable associated with the 𝓁th individual in class j and area i (𝓁 1, , Nij ). The two-level population HB model of Malec et al. (1997) may be expressed as ind
(i) yij𝓁 |pij ∼ Bernoulli(pij ) (ii) 𝜉ij
logit(pij )
xTj 𝜷 i
𝜷i
Zi 𝜶 + vi ;
vi ∼ Nq ( , 𝚺𝑣 )
iid
(iii) f (𝜶, 𝚺𝑣 ) ∝ 1,
(10.13.13)
where Zi is a p × q matrix of area level covariates. We assume that the model (10.13.13) holds for the sample (yij𝓁 , xj , Zi ); 𝓁 ∈ sij , i 1, , m , where m ≤ M is the number of sampled areas and sij is the sample of nij individuals in class j and sampled area i. (k) (k) Let (𝜷 (k) , , 𝜷 (k) d + 1, , d + D denote the MCMC samples m , 𝜶 , 𝚺𝑣 ); k 1 (k) (k) , 𝜷 m , 𝜶, 𝚺𝑣 |y). For a nonsampled area generated from the joint posterior f (𝜷 1 , , we generate 𝜷 (k) from N(Zi 𝜶 (k) , 𝚺(k) i m + 1, , M, given 𝜶 (k) , 𝚺(k) 𝑣 𝑣 ), assuming i , M is observed. Using this value of 𝜷 i , we calculate p(k) that Zi for i m + 1, ij exp (x(k) 𝜷 (k) )∕ 1 + exp (x(k) 𝜷 (k) ) for all areas i 1, , M. j i j i Suppose that we are interested in a finite population proportion corresponding to a collection of areas, say I ⊂ 1, , M , and a collection of classes, say J; that is, N
/∑∑
ij ∑∑∑
P(I, J)
yij i∈I j∈J 𝓁 1
Nij , i∈I j∈J
(10.13.14)
∶ Y(I, J)∕N(I, J).
Then the HB estimate of the proportion P(I, J) is given by P̂ HB (I, J) Ŷ HB (I, J)∕N(I, J), where Ŷ HB (I, J) is obtained as ∑∑ ∑ ̂ HB
Y
(I, J)
yij𝓁 i∈I j∈J 𝓁∈sij
[ d+D 1 ∑ ∑∑ + (N D k d+1 i∈I j∈J ij
] nij )p(k) ij
.
(10.13.15)
395
BINARY DATA
Similarly, the posterior variance is V P(I, J)|y V Y(I, J)|y is obtained as
V Y(I, J)|y ∕ N(I, J) 2 , where
[ ⎧ d+D ∑∑ 1 ∑ ⎪∑∑ (k) (k) (Nij nij )pij (1 pij ) + (Nij V Y(I, J)|y ≈ ⎨ D k d+1⎪i∈I j∈J i∈I j∈J ⎩ [ d+D ]2 1 ∑ ∑∑ (k) (N nij )pij . D k d+1 i∈I j∈J ij Note that nij
]2⎫ ⎪ ⎬ ⎪ ⎭
nij )p(k) ij
(10.13.16)
0 if an area i ∈ I is not sampled.
Example 10.13.3. Visits to Physicians. Malec et al. (1997) applied the two-level model (10.13.13) to estimate health-related proportions from the National Health Interview Survey (NHIS) data for the 50 states and the District of Columbia and for specified subpopulations within these 51 areas. The NHIS sample consisted of 200 primary sampling units (psu’s) and households sampled within each selected psu. Each psu is either a county or a group of contiguous counties, and the total sample consisted of approximately 50,000 households and 120,000 individuals. Selection of Variables Individual-level auxiliary variables included demographic variables such as race, age, and sex and socioeconomic variables such as highest education level attained. County-level covariates, such as mortality rates, counts of hospitals and hospital beds, and number of physicians by field of specialization, were also available. The population was partitioned into classes defined by the cross-classification of race, sex, and age (in 5-year groups). Reliable estimates of the counts, Nij , were available for each county i and class j. The vector of covariates, xj , used in (ii) of the HB model (10.13.13), is assumed to be the same for all individuals in class j. A particular binary variable y, denoting the presence/absence of at least one doctor visit within the past year, was used to illustrate the model fitting and HB estimation of proportions P(I, J). The auxiliary variables, xj , and the area level covariates in the matrix Zi , used in the linking model (ii), were selected in two steps using PROC LOGISTIC in SAS. In the first step, the variation in the 𝜷 i ’s was ignored by setting 𝜷 i 𝜷 and the elements of x were selected from a list of candidate variables and their ind interactions, using the model logit(pj ) xTj 𝜷 together with yij𝓁 ∼ Bernoulli(pj ). The following variables and interactions were selected by the SAS procedures: xj (1, x0,j , x15,j , x25,j , x55,j , aj x15,j , aj x25,j , bj )T , where aj and bj are (0,1) variables with aj 1 if class j corresponds to males, bj 1 if class j corresponds to whites, and xt,j max(0, cj t), t 0, 15, 20, 25 with cj denoting the center point of the age group in class j; for example, age cj 42.5 if class j corresponds to age group 40, 45 , and then x15,j max(0, 42.5 15) 27.5.
The variables xj selected in the first step were then used in the combined model logit(pij) = xjT Zi𝜶 and yij𝓁 ~ind Bernoulli(pij) to select the covariates Zi, using the same SAS procedure. Note that the combined model is obtained by setting 𝚺v = 0. For the particular binary variable y, the choice Zi = I captured between-county variation quite well, that is, county-level covariates are not needed; but for other binary variables, county-level variables may be needed. Based on the above two-step selection of variables, the final linking model is given by
$$
\begin{aligned}
\operatorname{logit}(p_{ij}) ={}& \beta_{1i} + \beta_{2i}x_{0,j} + \beta_{3i}x_{15,j} + \beta_{4i}x_{25,j} + \beta_{5i}x_{55,j}\\
&+ \beta_{6i}a_{j}x_{15,j} + \beta_{7i}a_{j}x_{25,j} + \beta_{8i}b_{j}, \qquad (10.13.17)
\end{aligned}
$$

where 𝜷i = (𝛽1i, ..., 𝛽8i)T = 𝜶 + vi.
Model Fit. To determine the model fit, two kinds of cross-validation were conducted. In the first kind, the sample individuals were randomly divided into five groups, while, in the second kind, the sample counties were randomly divided into groups. Let sih be the set of individuals in the hth group in county i, and ȳih be the corresponding sample mean. Denote the expectation and variance of ȳih with respect to the full predictive density by E2(ȳih) and V2(ȳih) = E2[ȳih − E2(ȳih)]², respectively. Malec et al. (1997) compared D²ih = [E2(ȳih) − ȳih]² to V2(ȳih) using the following summary measure:

$$
C = \left[\sum_{i=1}^{m}\sum_{h=1}^{5} D_{ih}^{2}\Big/\sum_{i=1}^{m}\sum_{h=1}^{5} V_{2}(\bar{y}_{ih})\right]^{1/2}. \qquad (10.13.18)
$$
If the assumed model provides an adequate fit, then |C − 1| should be reasonably small. The values of C were calculated for both types of cross-validation and for each of several subpopulations. These values clearly indicated adequacy of the model in the sense of C values close to 1. The measure C² may also be regarded as the ratio of the average posterior variance to the average MSE of the HB estimator, treating ȳih as the "true" value. Under this interpretation, the posterior variance is consistent with the MSE because the C values are close to 1.

Comparison of Estimators. The HB method was compared to the EB method and also to synthetic estimation. In the EB method, the posterior mean and posterior variance were obtained from (10.13.15) and (10.13.16) by setting 𝜶 and 𝚺v equal to the corresponding ML estimates and then generating the MCMC samples. Similarly, for synthetic estimation, the posterior mean and associated posterior variance were obtained from (10.13.15) and (10.13.16) by setting 𝜷i = Zi𝜶 and 𝚺v = 0 and then generating the MCMC samples. As expected, the EB estimates were close to the corresponding HB estimates, and the EB standard errors were considerably smaller than the corresponding HB standard errors. Synthetic standard errors were also much smaller than the corresponding HB standard errors, but the relative differences in the point estimates were small.
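Returning to the fit measure (10.13.18), a minimal sketch of its computation from posterior predictive draws of ȳih is given below; the array shapes and names are illustrative assumptions.

```python
import numpy as np

def cv_measure_C(ybar_obs, ybar_pred_draws):
    """Cross-validation measure C of (10.13.18).

    ybar_obs        : observed group means ybar_ih, shape (m, 5)
    ybar_pred_draws : posterior predictive draws of ybar_ih, shape (K, m, 5)
    """
    E2 = ybar_pred_draws.mean(axis=0)           # E_2(ybar_ih)
    V2 = ybar_pred_draws.var(axis=0)            # V_2(ybar_ih)
    D2 = (E2 - ybar_obs) ** 2                   # D^2_ih
    return np.sqrt(D2.sum() / V2.sum())         # |C - 1| small => adequate fit

# Toy check: predictive draws centered at the observed means give C well below 1
rng = np.random.default_rng(0)
ybar_obs = rng.uniform(0.4, 0.8, size=(10, 5))
draws = ybar_obs + rng.normal(0, 0.05, size=(500, 10, 5))
print(cv_measure_C(ybar_obs, draws))
```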
Malec et al. (1997) also conducted an external evaluation in a separate study. For this purpose, they used a binary variable, health-related partial work limitation, which was included in the 1990 U.S. Census of Population and Housing Long Form. Treating the small area census proportions for this variable as "true" values, they compared the estimates corresponding to alternative methods and models to the true values.

Folsom, Shah, and Vaish (1999) used a different logistic linear mixed HB model to produce HB estimates of small area prevalence rates for states and age groups for up to 20 drug use related binary outcomes, using data from pooled National Household Survey on Drug Abuse (NHSDA) surveys. Let yaij𝓁 denote the value of a binary variable, y, for the 𝓁th individual in age group a = 1, ..., 4 belonging to the jth cluster of the ith state. Assume that yaij𝓁|paij𝓁 ~ind Bernoulli(paij𝓁). A linking model of the logistic linear mixed model type was assumed:

$$
\operatorname{logit}(p_{aij\ell}) = \mathbf{x}_{aij\ell}^{T}\boldsymbol{\beta}_{a} + v_{ai} + u_{aij}, \qquad (10.13.19)
$$

where xaij𝓁 denotes a pa × 1 vector of auxiliary variables associated with age group a and 𝜷a is the associated vector of regression parameters. Furthermore, the vectors vi = (v1i, ..., v4i)T and uij = (u1ij, ..., u4ij)T are assumed to be mutually independent with vi ~ N4(0, 𝚺v) and uij ~ N4(0, 𝚺u). The model parameters 𝜷, 𝚺u, and 𝚺v are assumed to obey the following prior:

$$
f(\boldsymbol{\beta}, \boldsymbol{\Sigma}_{u}^{-1}, \boldsymbol{\Sigma}_{v}^{-1}) \propto f(\boldsymbol{\Sigma}_{u}^{-1})\, f(\boldsymbol{\Sigma}_{v}^{-1}), \qquad (10.13.20)
$$

where f(𝚺u⁻¹) and f(𝚺v⁻¹) are proper Wishart densities. The population model is assumed to hold for the sample (i.e., absence of sample selection bias), but survey weights, waij𝓁, were introduced to obtain pseudo-HB estimates and pseudo-HB standard errors, similarly as in Section 10.5.4. We refer the reader to Folsom et al. (1999) for details on MCMC sampling, selection of covariates, and validation studies.
10.14 *MISSING BINARY DATA
Nandram and Choi (2002) extended the results of Section 10.13.1 on the beta-binomial model to account for missing binary values yij. Let Rij denote the response indicator associated with the jth individual in the ith area, where Rij = 1 if the individual responds and Rij = 0 otherwise. The extended model assumes that

$$
\begin{aligned}
y_{ij}\,|\,p_i &\overset{\text{iid}}{\sim} \text{Bernoulli}(p_i), & (10.14.1)\\
R_{ij}\,|\,p_{Ri},\, y_{ij}=0 &\overset{\text{iid}}{\sim} \text{Bernoulli}(p_{Ri}), & (10.14.2)\\
R_{ij}\,|\,p_{Ri},\gamma_i,\, y_{ij}=1 &\overset{\text{iid}}{\sim} \text{Bernoulli}(\gamma_i p_{Ri}). & (10.14.3)
\end{aligned}
$$
This model allows the response indicator Rij to depend on yij. When 𝛾i = 1, we have ignorable nonresponse in the sense that the probability of response does not depend on yij. The parameter 𝛾i in (10.14.3) reflects uncertainty about ignorability of the response mechanism for the ith area. Nandram and Choi (2002) used a beta linking model, pi|𝛼, 𝛽 ~ beta(𝛼, 𝛽), 𝛼 > 0, 𝛽 > 0, on pi, as in Section 10.13.1. Furthermore, pRi is assumed to obey a beta distribution, beta(𝛼R, 𝛽R), and 𝛾i obeys a truncated G(𝜈, 𝜈) distribution on 0 < 𝛾i < pRi⁻¹. Suitable proper priors are specified for the model parameters. Let Ri = ∑_{j=1}^{ni} Rij be the number of respondents in area i, r be the vector of response indicators Rij, p = (p1, ..., pm)T, pR = (pR1, ..., pRm)T, 𝝅R = (𝜋R1, ..., 𝜋Rm)T, where 𝜋Ri = 𝛾i pRi, and z = (z1, ..., zm)T, where zi = ∑_{j=Ri+1}^{ni} yij is the unknown y-total among nonrespondents in area i. MCMC samples are then generated from the joint posterior of the model parameters 𝜽 = (𝛼, 𝛽, 𝛼R, 𝛽R, 𝜈)T and the random effects p, pR, 𝝅R, and z. Generation of a sample is done in three steps: (i) Integrate out p, pR, and 𝝅R from the joint posterior density f(𝜽, p, pR, 𝝅R, z|y, r) to obtain the marginal posterior density f(𝜽, z|y, r). (ii) Generate (𝜽, z) from f(𝜽, z|y, r) using an M–H algorithm and a sampling importance resampling (SIR) algorithm. (iii) Obtain (pi, pRi, 𝜋Ri), i = 1, ..., m, from the posterior conditional density of (pi, pRi, 𝜋Ri) given the value of (𝜽, z) generated in step (ii) and the data (y, r). Repeat steps (i)–(iii) a large number of times L to generate {pi(𝓁); 𝓁 = 1, ..., L} and take the mean L⁻¹ ∑_{𝓁=1}^{L} pi(𝓁) as the HB estimate of pi. The HB estimates of pRi and 𝛾i are obtained similarly. We refer the reader to Nandram and Choi (2002) for further details.

Nandram and Choi (2002) applied their HB method to binary data from the U.S. National Crime Survey (NCS) consisting of m = 10 domains (areas) determined by urbanization, type of place, and poverty level. Domain sample sizes ni ranged from 10 to 162. Nonresponse in these domains ranged from 9.4% to 16.9%. The parameter pi here refers to the proportion of households reporting at least one crime in domain i.

Nandram and Choi (2010) proposed a two-part model to estimate mean body mass index (BMI) for small domains from the U.S. National Health and Nutrition Examination Survey (NHANES III) in the presence of nonignorable nonresponse. Part 1 of the model specifies a Bernoulli(pRij) distribution for the response indicator Rij with logit(pRij) = 𝛽0i + 𝛽1i yij, where the (𝛽0i, 𝛽1i)T are iid and obey a bivariate normal distribution and yij is the BMI for the jth individual in area i. Part 2 of the model assumes a two-level linear mixed model based on covariates related to BMI. The model is extended to include sample selection probabilities in the nonignorable nonresponse model to reflect the higher selection probabilities for black non-Hispanics and Hispanic Americans in NHANES III. HB inferences on the mean BMI for each area are implemented using sophisticated M–H algorithms to draw MCMC samples.
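To illustrate the role of 𝛾i in (10.14.1)–(10.14.3), the following sketch simulates one area under this nonignorable response mechanism and compares the respondent mean with the true pi; all numerical settings are hypothetical.

```python
import numpy as np

def simulate_area(n_i, p_i, pR_i, gamma_i, rng):
    """Simulate (y_ij, R_ij) for one area under (10.14.1)-(10.14.3)."""
    y = rng.binomial(1, p_i, size=n_i)                       # y_ij | p_i ~ Bernoulli(p_i)
    resp_prob = np.where(y == 0, pR_i, gamma_i * pR_i)       # response prob depends on y_ij
    R = rng.binomial(1, resp_prob)                           # response indicators R_ij
    return y, R

rng = np.random.default_rng(123)
y, R = simulate_area(n_i=100_000, p_i=0.3, pR_i=0.8, gamma_i=0.6, rng=rng)
print("true p_i:", 0.3)
print("respondent mean:", y[R == 1].mean())   # biased downward because gamma_i < 1
print("full-sample mean:", y.mean())
```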
10.15 NATURAL EXPONENTIAL FAMILY MODELS
In Section 4.6.4 of Chapter 4, we assumed that the sample statistics yij (j = 1, ..., ni; i = 1, ..., m), given the 𝜃ij's, are independently distributed with probability density
function belonging to the natural exponential family with canonical parameters 𝜃ij and known scale parameters 𝜙ij (> 0):

$$
f(y_{ij}\,|\,\theta_{ij}) = \exp\left\{\frac{1}{\phi_{ij}}\big[\theta_{ij}y_{ij} - a(\theta_{ij})\big] + b(y_{ij},\phi_{ij})\right\}, \qquad (10.15.1)
$$

where a(⋅) and b(⋅) are known functions. For example, 𝜃ij = logit(pij) and 𝜙ij = 1 if yij ~ binomial(nij, pij). The linking model on the 𝜃ij's is given by

$$
\theta_{ij} = \mathbf{x}_{ij}^{T}\boldsymbol{\beta} + v_i + u_{ij}, \qquad (10.15.2)
$$
where the vi and uij are mutually independent with vi ~iid N(0, 𝜎v²) and uij ~iid N(0, 𝜎u²), xij is a p × 1 vector of covariates without the intercept term, and 𝜷 is the corresponding vector of regression coefficients. For the HB version of the model, we make the additional assumption that the model parameters 𝜷, 𝜎v², and 𝜎u² are mutually independent with f(𝜷) ∝ 1, 𝜎v⁻² ~ G(av, bv), and 𝜎u⁻² ~ G(au, bu). The objective here is to make inferences on the small area parameters; in particular, to evaluate the posterior quantities E(𝜃ij|y), V(𝜃ij|y), and Cov(𝜃ij, 𝜃𝓁k|y) for j ≠ k or i ≠ 𝓁. For example, 𝜃ij = logit(pij), where pij denotes the proportion associated with a binary variable in the jth age–sex group in the ith region.

Ghosh et al. (1998) gave sufficient conditions for the propriety of the joint posterior f(𝜽|y). In particular, if yij is either binomial or Poisson, the conditions are: bv > 0, bu > 0, ∑_{i=1}^{m} ni − p + au > 0, m + av > 0, and ∑_{j=1}^{ni} yij > 0 for each i. The posterior is not identifiable (that is, improper) if an intercept term is included in the linking model (10.15.2). The Gibbs conditionals are easy to derive. In particular, [𝜷|𝜽, v, 𝜎v², 𝜎u², y] is p-variate normal, [vi|𝜽, 𝜷, 𝜎v², 𝜎u², y] ~ind normal, [𝜎v⁻²|𝜽, 𝜷, v, 𝜎u², y] ~ gamma, and [𝜎u⁻²|𝜽, 𝜷, v, 𝜎v², y] ~ gamma, but [𝜃ij|𝜷, v, 𝜎v², 𝜎u², y] does not admit a closed-form density function. However, log f(𝜃ij|𝜷, v, 𝜎v², 𝜎u², y) is a concave function of 𝜃ij, and therefore one can use the adaptive rejection sampling scheme of Gilks and Wild (1992) to generate samples.

Ghosh et al. (1998) generalized the model given by (10.15.1) and (10.15.2) to handle multicategory data sets. They applied this model to a data set from Canada on exposures to health hazards. Sample respondents in 15 regions of Canada were asked whether they experienced any negative impact of health hazards in the workplace. Responses were classified into four categories: 1 = yes, 2 = no, 3 = not exposed, and 4 = not applicable or not stated. Here it was desired to make inferences on the category proportions within each age–sex class j in each region i.
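As a numerical illustration of the concavity that licenses adaptive rejection sampling, the sketch below writes down the log full conditional of 𝜃ij for the binomial case (𝜃ij = logit(pij)) combined with the normal linking model, and checks its concavity on a grid; the parameter values are arbitrary illustrations.

```python
import numpy as np

def log_full_conditional(theta, y_ij, n_ij, mu_ij, sigma2_u):
    """log f(theta_ij | beta, v, sigma_v^2, sigma_u^2, y) up to an additive constant,
    for the binomial case: y_ij ~ binomial(n_ij, p_ij), theta_ij = logit(p_ij),
    and theta_ij ~ N(mu_ij, sigma2_u) with mu_ij = x_ij' beta + v_i."""
    return (y_ij * theta - n_ij * np.log1p(np.exp(theta))
            - (theta - mu_ij) ** 2 / (2.0 * sigma2_u))

# Check concavity numerically on a grid (illustrative values)
theta = np.linspace(-6, 6, 2001)
lp = log_full_conditional(theta, y_ij=7, n_ij=20, mu_ij=-0.5, sigma2_u=0.4)
second_diff = np.diff(lp, 2)          # discrete second derivative
print("log-concave on grid:", np.all(second_diff < 0))   # True
```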
10.16 CONSTRAINED HB
In Section 9.8.1 of Chapter 9, we obtained the constrained Bayes (CB) estimator 𝜃̂iCB , given by (9.8.6), that matches the variability of the small area parameters 𝜃i . This estimator, 𝜃̂iCB (𝝀), depends on the unknown model parameters 𝝀 of the two-stage
model 𝜃̂i|𝜃i ~ind f(𝜃̂i|𝜃i, 𝝀1) and 𝜃i ~iid f(𝜃i|𝝀2). Replacing 𝝀 by a suitable estimator 𝝀̂, we obtained an empirical CB (ECB) estimator 𝜃̂iECB.

We can use a constrained HB (CHB) estimator instead of the ECB estimator. The CHB estimator, 𝜃̂iCHB, is obtained by minimizing the posterior squared error E[∑_{i=1}^{m}(𝜃i − ti)²|𝜽̂] subject to t̄⋅ = 𝜃̂⋅HB and (m − 1)⁻¹ ∑_{i=1}^{m}(ti − t̄⋅)² = E[(m − 1)⁻¹ ∑_{i=1}^{m}(𝜃i − 𝜃̄⋅)²|𝜽̂]. Solving this problem, we obtain
$$
t_i^{\mathrm{opt}} = \hat{\theta}_i^{\mathrm{CHB}} = \hat{\theta}_{\cdot}^{\mathrm{HB}} + a(\hat{\boldsymbol{\theta}})\big(\hat{\theta}_i^{\mathrm{HB}} - \hat{\theta}_{\cdot}^{\mathrm{HB}}\big), \qquad (10.16.1)
$$

where 𝜃̂iHB is the HB estimator for area i, 𝜃̂⋅HB = ∑_{i=1}^{m} 𝜃̂iHB / m, and

$$
a(\hat{\boldsymbol{\theta}}) = \left[1 + \frac{\sum_{i=1}^{m} V(\theta_i - \bar{\theta}_{\cdot}\,|\,\hat{\boldsymbol{\theta}})}{\sum_{i=1}^{m}\big(\hat{\theta}_i^{\mathrm{HB}} - \hat{\theta}_{\cdot}^{\mathrm{HB}}\big)^2}\right]^{1/2}, \qquad (10.16.2)
$$
where 𝜃̄⋅ = ∑_{i=1}^{m} 𝜃i / m. The proof of (10.16.1) follows along the lines of Section 9.12.3. The estimator 𝜃̂iCHB depends on the HB estimators 𝜃̂iHB and the posterior variances V(𝜃i − 𝜃̄⋅|𝜽̂), which can be evaluated using MCMC methods. The estimator 𝜃̂iCHB usually employs less shrinking toward the overall average compared to 𝜃̂iECB (Ghosh and Maiti 1999).

An advantage of the CHB approach is that it readily provides a measure of uncertainty associated with 𝜃̂iCHB. Similar to the posterior variance V(𝜃i|𝜽̂) associated with 𝜃̂iHB, we use the posterior MSE, E[(𝜃i − 𝜃̂iCHB)²|𝜽̂], as the measure of uncertainty associated with 𝜃̂iCHB. This posterior MSE can be decomposed as
$$
\begin{aligned}
E\big[(\theta_i - \hat{\theta}_i^{\mathrm{CHB}})^2\,\big|\,\hat{\boldsymbol{\theta}}\big] &= E\big[(\theta_i - \hat{\theta}_i^{\mathrm{HB}})^2\,\big|\,\hat{\boldsymbol{\theta}}\big] + \big(\hat{\theta}_i^{\mathrm{HB}} - \hat{\theta}_i^{\mathrm{CHB}}\big)^2\\
&= V(\theta_i\,|\,\hat{\boldsymbol{\theta}}) + \big(\hat{\theta}_i^{\mathrm{HB}} - \hat{\theta}_i^{\mathrm{CHB}}\big)^2. \qquad (10.16.3)
\end{aligned}
$$
It is clear from (10.16.3) that the posterior MSE is readily obtained from the posterior variance V(𝜃i|𝜽̂) and the estimators 𝜃̂iHB and 𝜃̂iCHB. On the other hand, it appears difficult to obtain a nearly unbiased estimator of the MSE of the ECB estimator 𝜃̂iECB. The jackknife method used in Chapter 9 is not readily applicable to 𝜃̂iECB.
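The quantities in (10.16.1)–(10.16.3) can all be computed directly from MCMC draws of 𝜽; the sketch below is a minimal illustration in which the draws, array names, and dimensions are hypothetical.

```python
import numpy as np

def constrained_hb(theta_draws):
    """Constrained HB estimators (10.16.1) and posterior MSEs (10.16.3).

    theta_draws : array of shape (R, m) with MCMC draws of (theta_1, ..., theta_m)
    """
    theta_hb = theta_draws.mean(axis=0)                  # HB estimators, area by area
    theta_bar_hb = theta_hb.mean()                       # average of the HB estimators
    # Posterior variances V(theta_i - theta_bar | data) from draws of theta_i - theta_bar
    dev_draws = theta_draws - theta_draws.mean(axis=1, keepdims=True)
    V_dev = dev_draws.var(axis=0)
    a_hat = np.sqrt(1.0 + V_dev.sum() / ((theta_hb - theta_bar_hb) ** 2).sum())   # (10.16.2)
    theta_chb = theta_bar_hb + a_hat * (theta_hb - theta_bar_hb)                  # (10.16.1)
    post_mse = theta_draws.var(axis=0) + (theta_hb - theta_chb) ** 2              # (10.16.3)
    return theta_chb, post_mse

# Toy illustration with m = 6 areas and R = 2000 draws
rng = np.random.default_rng(7)
draws = rng.normal(loc=np.linspace(0, 1, 6), scale=0.2, size=(2000, 6))
chb, mse = constrained_hb(draws)
print(np.round(chb, 3), np.round(mse, 3))
```

Because a(𝜽̂) ≥ 1, the CHB estimates spread the HB estimates away from their average, which is the "less shrinking" behavior noted above.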
10.17 *APPROXIMATE HB INFERENCE AND DATA CLONING
We return to the general two-stage modeling setup of Section 10.1, namely the conditional density f(y|𝝁, 𝝀1) of y given 𝝁 and 𝝀1, the conditional density f(𝝁|𝝀2) of 𝝁 given 𝝀2, and the prior density f(𝝀) of 𝝀 = (𝝀1T, 𝝀2T)T. The joint posterior density of 𝝁 and 𝝀 may be expressed as
$$
f(\boldsymbol{\mu}, \boldsymbol{\lambda}\,|\,\mathbf{y}) = f(\boldsymbol{\mu}\,|\,\boldsymbol{\lambda}, \mathbf{y})\, f(\boldsymbol{\lambda}\,|\,\mathbf{y}) = f(\mathbf{y}\,|\,\boldsymbol{\mu}, \boldsymbol{\lambda}_1)\, f(\boldsymbol{\mu}\,|\,\boldsymbol{\lambda}_2)\, f(\boldsymbol{\lambda})/f_1(\mathbf{y}), \qquad (10.17.1)
$$
where f1(y) is the marginal density of y. The posterior density of 𝝀 is then given by

$$
f(\boldsymbol{\lambda}\,|\,\mathbf{y}) = L(\boldsymbol{\lambda};\mathbf{y})\, f(\boldsymbol{\lambda})/f_1(\mathbf{y}), \qquad (10.17.2)
$$
where L(𝝀; y) is the likelihood function of the data y, given by

$$
L(\boldsymbol{\lambda};\mathbf{y}) = \int f(\mathbf{y}\,|\,\boldsymbol{\mu}, \boldsymbol{\lambda}_1)\, f(\boldsymbol{\mu}\,|\,\boldsymbol{\lambda}_2)\, d\boldsymbol{\mu}. \qquad (10.17.3)
$$
A first-order approximation to the posterior density f(𝝀|y) is given by the normal density with mean equal to the ML estimate 𝝀̂ of 𝝀 and covariance matrix I⁻¹(𝝀̂), the inverse of the Fisher information matrix evaluated at 𝝀̂. Note that this approximation does not depend on the prior density of 𝝀. Lele, Nadeem, and Schmuland (2010) suggested the use of this approximation in (10.1.1) to generate MCMC samples {(𝝁(r), 𝝀(r)); r = 1, ..., R} by employing WinBUGS, and to use the 𝝁(r)-values to make posterior inferences on 𝝁. In particular, credible intervals, posterior means, and posterior variances for 𝝁 may be computed. Kass and Steffey (1989) in fact used this approximation to get an analytical approximation to the posterior variance of 𝜃i in the basic area level model (10.3.1), leading to VKS(𝜃i|𝜽̂) given by (9.2.26).

For complex two-stage models, the computation of ML estimates of the model parameters 𝝀 can be difficult. Lele et al. (2010) proposed a "data cloning" method that exploits MCMC generation to compute the ML estimate 𝝀̂ and its asymptotic covariance matrix I⁻¹(𝝀̂) routinely. This method essentially creates a large number, K, of clones of the data, y, and takes the likelihood function of the cloned data y(K) = (yT, ..., yT)T to be [L(𝝀; y)]^K. The posterior density of 𝝀 given the cloned data y(K) is equal to
$$
f(\boldsymbol{\lambda}\,|\,\mathbf{y}(K)) = [L(\boldsymbol{\lambda};\mathbf{y})]^{K} f(\boldsymbol{\lambda})/f_1(\mathbf{y}(K)), \qquad (10.17.4)
$$
where f1(y(K)) is the marginal density of y(K). Under regularity conditions, f(𝝀|y(K)) is approximately the normal density with mean 𝝀̂ and covariance matrix K⁻¹I⁻¹(𝝀̂). Hence, this distribution is nearly degenerate at 𝝀̂ if K is large. The mean of the posterior distribution given in (10.17.4) is taken as the ML estimate 𝝀̂, and K times the posterior covariance matrix is taken as the asymptotic covariance matrix of 𝝀̂. We simply generate MCMC samples {𝝀(r); r = 1, ..., R} from f(𝝀|y(K)) and take 𝝀(⋅) = R⁻¹ ∑_{r=1}^{R} 𝝀(r) as 𝝀̂ and K times the sample covariance matrix R⁻¹ ∑_{r=1}^{R} (𝝀(r) − 𝝀(⋅))(𝝀(r) − 𝝀(⋅))T as the asymptotic covariance matrix of 𝝀̂, provided the parameter space is continuous (Lele et al. 2010).

Torabi and Shokoohi (2015) applied the above approximate HB and data cloning approach to complex two-stage models, involving semi-parametric linear, logistic, and Poisson mixed models based on spline approximations to the mean function. Their simulation results indicate good performance of the method in terms of coverage probability and absolute relative bias of the MSE estimator obtained from the MCMC samples.
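A minimal sketch of the data cloning idea for a toy model, y1, ..., yn iid N(λ, 1): the posterior based on K copies of the data concentrates at the ML estimate ȳ, and K times its variance recovers the asymptotic variance 1/n. The conjugate normal prior used here is an arbitrary illustration; a real application would obtain the cloned posterior by MCMC (e.g., via WinBUGS) rather than in closed form.

```python
import numpy as np

rng = np.random.default_rng(42)
n, K = 25, 200                         # sample size and number of clones
y = rng.normal(loc=2.0, scale=1.0, size=n)

# Toy model y_i ~ N(lambda, 1) with conjugate prior lambda ~ N(0, tau2).
tau2 = 100.0
# The cloned posterior uses [L(lambda; y)]^K, i.e., acts like n*K observations with mean ybar.
post_prec = n * K + 1.0 / tau2
post_mean = (n * K * y.mean()) / post_prec
post_var = 1.0 / post_prec

print("ML estimate (ybar):    ", y.mean())
print("data-cloning estimate: ", post_mean)      # ~ ybar for large K
print("asymptotic var (1/n):  ", 1.0 / n)
print("K * posterior variance:", K * post_var)   # ~ 1/n
```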
10.18 PROOFS

10.18.1 Proof of (10.2.26)
Noting that f(y) = f(y, 𝜼)/f(𝜼|y), we express f(yr|y(r)) as

$$
f(y_r\,|\,\mathbf{y}_{(r)}) = \frac{f(\mathbf{y})}{f(\mathbf{y}_{(r)})} = \frac{1}{\displaystyle\int \frac{f(\mathbf{y}_{(r)}, \boldsymbol{\eta})}{f(\mathbf{y}, \boldsymbol{\eta})}\, f(\boldsymbol{\eta}\,|\,\mathbf{y})\, d\boldsymbol{\eta}} = \frac{1}{\displaystyle\int \frac{1}{f(y_r\,|\,\mathbf{y}_{(r)}, \boldsymbol{\eta})}\, f(\boldsymbol{\eta}\,|\,\mathbf{y})\, d\boldsymbol{\eta}}. \qquad (10.18.1)
$$
The denominator of (10.18.1) is the expectation of 1/f(yr|y(r), 𝜼) with respect to f(𝜼|y). Hence, we can estimate (10.18.1) by (10.2.26) using the MCMC output {𝜼(k)}.
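A minimal sketch of the resulting Monte Carlo estimator: the denominator of (10.18.1) is approximated by the average of 1/f(yr|y(r), 𝜼(k)) over the MCMC output, so the conditional predictive density is estimated by a harmonic mean. The array names and numbers are illustrative assumptions.

```python
import numpy as np

def cpo_estimate(cond_dens_draws):
    """Estimate f(y_r | y_(r)) from MCMC output via (10.18.1).

    cond_dens_draws : values of f(y_r | y_(r), eta^(k)) at the K retained MCMC draws
    """
    # Harmonic mean: 1 / [ (1/K) * sum_k 1 / f(y_r | y_(r), eta^(k)) ]
    return 1.0 / np.mean(1.0 / cond_dens_draws)

# Toy illustration with hypothetical conditional density values
draws = np.array([0.12, 0.08, 0.10, 0.15, 0.09])
print(cpo_estimate(draws))
```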
10.18.2 Proof of (10.2.32)
We write E[a(yr)|y(r),obs] as

$$
E\big[a(y_r)\,\big|\,\mathbf{y}_{(r),\mathrm{obs}}\big] = E_1 E_2\big[a(y_r)\big] = E_1\big[b_r(\boldsymbol{\eta})\big],
$$
where E2 is the expectation over yr given 𝜼 and E1 is the expectation over 𝜼 given y(r),obs . Note that we have assumed conditional independence of yr and y(r) given 𝜼. Therefore, E1 br (𝜼)
$$
E_1\big[b_r(\boldsymbol{\eta})\big] = \int b_r(\boldsymbol{\eta})\, f(\boldsymbol{\eta}\,|\,\mathbf{y}_{(r),\mathrm{obs}})\, d\boldsymbol{\eta} = f(y_{r,\mathrm{obs}}\,|\,\mathbf{y}_{(r),\mathrm{obs}}) \int \frac{b_r(\boldsymbol{\eta})}{f(y_{r,\mathrm{obs}}\,|\,\boldsymbol{\eta})}\, f(\boldsymbol{\eta}\,|\,\mathbf{y}_{\mathrm{obs}})\, d\boldsymbol{\eta}, \qquad (10.18.2)
$$
noting that (i) f(𝜼|y(r)) = f(y(r)|𝜼)f(𝜼)/f(y(r)), (ii) f(𝜼) = f(y)f(𝜼|y)/f(y|𝜼), (iii) f(y)/f(y(r)) = f(yr|y(r)), and (iv) f(y|𝜼) = f(y(r), yr|𝜼) = f(y(r)|𝜼)f(yr|𝜼). The integral in (10.18.2) is the expectation of br(𝜼)/f(yr,obs|𝜼) with respect to f(𝜼|yobs). Hence, (10.18.2) may be estimated by (10.2.32) using the MCMC output {𝜼(k)}.

10.18.3 Proof of (10.3.13)–(10.3.15)
The Gibbs conditional [𝜷|𝜽, 𝜎v², 𝜽̂] may be expressed as

$$
f(\boldsymbol{\beta}\,|\,\boldsymbol{\theta}, \sigma_v^2, \hat{\boldsymbol{\theta}}) = \frac{f(\boldsymbol{\beta}, \boldsymbol{\theta}, \sigma_v^2\,|\,\hat{\boldsymbol{\theta}})}{\int f(\boldsymbol{\beta}, \boldsymbol{\theta}, \sigma_v^2\,|\,\hat{\boldsymbol{\theta}})\, d\boldsymbol{\beta}} \propto f(\boldsymbol{\beta}, \boldsymbol{\theta}, \sigma_v^2\,|\,\hat{\boldsymbol{\theta}}), \qquad (10.18.3)
$$
because the denominator of (10.18.3) is constant with respect to 𝜷. Retaining terms involving only 𝜷 in f(𝜷, 𝜽, 𝜎v²|𝜽̂) and letting z̃i = zi/bi and 𝜃̃i = 𝜃i/bi, we get

$$
\begin{aligned}
\log f(\boldsymbol{\beta}\,|\,\boldsymbol{\theta}, \sigma_v^2, \hat{\boldsymbol{\theta}}) &= \text{const} - \frac{1}{2\sigma_v^2}\left[\boldsymbol{\beta}^T\Big(\sum_{i=1}^{m}\tilde{\mathbf{z}}_i\tilde{\mathbf{z}}_i^T\Big)\boldsymbol{\beta} - 2\sum_{i=1}^{m}\tilde{\theta}_i\tilde{\mathbf{z}}_i^T\boldsymbol{\beta}\right]\\
&= \text{const} - \frac{1}{2\sigma_v^2}(\boldsymbol{\beta} - \boldsymbol{\beta}^{*})^T\Big(\sum_{i=1}^{m}\tilde{\mathbf{z}}_i\tilde{\mathbf{z}}_i^T\Big)(\boldsymbol{\beta} - \boldsymbol{\beta}^{*}), \qquad (10.18.4)
\end{aligned}
$$

where 𝜷* = (∑_{i=1}^{m} z̃i z̃iT)⁻¹ ∑_{i=1}^{m} z̃i 𝜃̃i. It follows from (10.18.4) that [𝜷|𝜽, 𝜎v², 𝜽̂] is Np(𝜷*, 𝜎v²(∑_{i=1}^{m} z̃i z̃iT)⁻¹). Similarly,
̂ f (𝜃i |𝜷, 𝜎𝑣2 , 𝜽)
𝜃i2
const
1 2
const
1 𝜃 2𝛾i 𝜓i i
𝛾i 𝜓 i
𝜃 𝜃̂ 2 i i 𝜓i
2
𝜃i zTi 𝜷
]
𝜎𝑣2 b2i
𝜃̂iB (𝜷, 𝜎𝑣2 ) 2 .
(10.18.5)
It now follows from (10.18.5) that [𝜃i|𝜷, 𝜎v², 𝜽̂] is N(𝜃̂iB(𝜷, 𝜎v²), 𝛾i𝜓i). Similarly,

$$
\log f(\sigma_v^2\,|\,\boldsymbol{\beta}, \boldsymbol{\theta}, \hat{\boldsymbol{\theta}}) = \text{const} - \frac{1}{2\sigma_v^2}\sum_{i=1}^{m}(\tilde{\theta}_i - \tilde{\mathbf{z}}_i^T\boldsymbol{\beta})^2 + \log\left[\frac{1}{(\sigma_v^2)^{m/2+a+1}}\exp\Big(-\frac{b}{\sigma_v^2}\Big)\right]. \qquad (10.18.6)
$$
It now follows from (10.18.6) that the Gibbs conditional [𝜎v⁻²|𝜷, 𝜽, 𝜽̂] is G(m/2 + a, ∑_{i=1}^{m}(𝜃̃i − z̃iT𝜷)²/2 + b).
REFERENCES
Ambrosio Flores, L. and Iglesias Martínez, L. (2000), Land Cover Estimation in Small Areas Using Ground Survey and Remote Sensing, Remote Sensing of Environment, 74, 240–248. Anderson, T.W. (1973), Asymptotically Efficient Estimation of Covariance Matrices with Linear Covariance Structure, Annals of Statistics, 1, 135–141. Anderson, T.W. and Hsiao, C. (1981), Formulation and Estimation of Dynamic Models Using Panel Data, Journal of Econometrics, 18, 67–82. Ansley, C.F. and Kohn, R. (1986), Prediction Mean Squared Error for State Space Models with Estimated Parameters, Biometrika, 73, 467–473. Arima, S., Datta, G.S., and Liseo, B. (2015), Accounting for Measurement Error in Covariates in SAE: An Overview, in M. Pratesi (Ed.), Analysis of Poverty Data by Small Area Methods, Hoboken, NJ: John Wiley & Sons, Inc., in print. Arora, V. and Lahiri, P. (1997), On the Superiority of the Bayes Method over the BLUP in Small Area Estimation Problems, Statistica Sinica, 7, 1053–1063. Baíllo, A. and Molina, I. (2009), Mean-Squared Errors of Small-Area Estimators Under a Unit-Level Multivariate Model, Statistics, 43, 553–569. Banerjee, M. and Frees, E.W. (1997), Influence Diagnostics for Linear Longitudinal Models, Journal of the American Statistical Association, 92, 999–1005. Banerjee, S., Carlin, B.P., and Gelfand, A.E. (2004), Hierarchical Modeling and Analysis for Spatial Data, New York: Chapman and Hall. Bankier, M. (1988), Power Allocation: Determining Sample Sizes for Sub-National Areas, American Statistician, 42, 174–177.
Bartoloni, E. (2008), Small Area Estimation and the Labour Market in Lombardy’s Industrial Districts: a Mathematical Approach, Scienze Regionali, 7, 27–54. Bates, D., Maechler, M., Bolker, B., Walker, S., Bojesen Christensen, R.H., Singmann, H., and Dai, B. (2014), lme4: Linear mixed-effects models using Eigen and S4, R package version 1.1-7. Battese, G.E., Harter, R.M., and Fuller, W.A. (1988), An Error Component Model for Prediction of County Crop Areas Using Survey and Satellite Data, Journal of the American Statistical Association, 83, 28–36. Bayarri, M.J. and Berger, J.O. (2000), P Values for Composite Null Models, Journal of the American Statistical Association, 95, 1127–1142. Beaumont, J.-F., Haziza, D., and Ruiz-Gazen, A. (2013), A Unified Approach to Robust Estimation in Finite Population Sampling, Biometrika, 100, 555–569. Beckman, R.J., Nachtsheim, C.J., and Cook, R.D. (1987), Diagnostics for Mixed-Model Analysis of Variance, Technometrics, 1987, 413–426. Béland, Y., Bailie, L., Catlin, G., and Singh, M.P. (2000), An Improved Health Survey Program at Statistics Canada, Proceedings of the Section on Survey Research Methods, Washington, DC: American Statistical Association, pp. 671–676. Bell, W.R. (1999), Accounting for Uncertainty About Variances in Small Area Estimation, Bulletin of the International Statistical Institute. Bell, W.R. (2008), Examining Sensitivity of Small Area Inferences to Uncertainty About Sampling Error Variances, Proceedings of the Survey Research Section, American Statistical Association, pp. 327–334. Bell, W.R., Datta, G.S., and Ghosh, M. (2013), Benchmarking Small Area Estimators, Biometrika, 100, 189–202. Bell, W.R. and Huang, E.T. (2006), Using the t-distribution to Deal with Outliers in Small Area Estimation, in Proceedings of Statistics, Canada Symposium on Methodological Issues in Measuring Population Health, Statistics Canada, Ottawa, Canada. Berg, E.J. and Fuller, W.A. (2014), Small Area Estimation of Proportions with Applications to the Canadian Labour Force Survey, Journal of Survey Statistics and Methodology, 2, 227–256. Berger, J.O. and Pericchi, L.R. (2001), Objective Bayesian Methods for Model Selection: Introduction and Comparison, in P. Lahiri (ed.), Model Selection, Notes - Monograph Series, Volume 38, Beachwood, OH: Institute of Mathematical Statistics. Besag, J.E. (1974), Spatial Interaction and the Statistical Analysis of Lattice Systems (with Discussion), Journal of the Royal Statistical Society, Series B, 35, 192–236. Bilodeau, M. and Srivastava, M.S. (1988), Estimation of the MSE Matrix of the Stein Estimator, Canadian Journal of Statistics, 16, 153–159. Brackstone, G.J. (1987), Small Area Data: Policy Issues and Technical Challenges, in R. Platek, J.N.K. Rao, C.-E. Särndal, and M.P. Singh (Eds.), Small Area Statistics, New York: John Wiley Sons, Inc., pp. 3–20. Brandwein, A.C. and Strawderman, W.E. (1990), Stein-Estimation: The Spherically Symmetric Case, Statistical Science, 5, 356–369. Breckling, J. and Chambers, R. (1988), M-quantiles, Biometrika, 75, 761–771. Breidt, F.J., Claeskens, G., and Opsomer, J.D. (2005), Model-Assisted Estimation for Complex Surveys Using Penalized Splines, Biometrika, 92, 831–846.
Breslow, N. and Clayton, D. (1993), Approximate Inference in Generalized Linear Mixed Models, Journal of the American Statistical Association, 88, 9–25. Brewer, K.R.W. (1963), Ratio Estimation and Finite Populations: Some Results Deducible from the Assumption of an Underlying Stochastic Process, Australian Journal of Statistics, 5, 5–13. Brooks, S.P. (1998), Markov Chain Monte Carlo Method and Its Applications, Statistician, 47, 69–100. Brooks, S.P., Catchpole, E.A., and Morgan, B.J. (2000), Bayesian Animal Survival Estimation, Statistical Science, 15, 357–376. Browne, W.J. and Draper, D. (2006), A Comparison of Bayesian and Likelihood-Based Methods for Fitting Multilevel Models, Bayesian Analysis, 1, 473–514. Butar, F.B. and Lahiri, P. (2003), On Measures of Uncertainty of Empirical Bayes Small-Area Estimators, Journal of Statistical Planning and Inference, 112, 63–76. Calvin, J.A. and Sedransk, J. (1991), Bayesian and Frequentist Predictive Inference for the Patterns of Care Studies, Journal of the American Statistical Association, 86, 36–48. Carlin, B.P. and Louis, T.A. (2008), Bayes and Empirical Bayes Methods for Data Analysis (3rd ed.), Boca Raton, FL: Chapman and Hall/CRC. Carroll, R.J., Ruppert, D., Stefanski, L.A., and Crainiceanu, C.M. (2006), Measurement Error in Nonlinear Models: A Modern Perspective. (2nd ed.), Boca Raton, FL: Chapman Hall/CRC. Casady, R.J. and Valliant, R. (1993), Conditional Properties of Post-stratified Estimators Under Normal Theory, Survey Methodology, 19, 183–192. Casella, G. and Berger, R.L. (1990), Statistical Inference, Belmonte, CA: Wadsworth Brooks/Cole. Casella, G. and Hwang, J.T. (1982), Limit Expressions for the Risk of James-Stein Estimators, Canadian Journal of Statistics, 10, 305–309. Casella, G., Lavine, M., and Robert, C.P. (2001), Explaining the Perfect Sampler, American Statistician, 55, 299–305. Chambers, R.L. (1986), Outlier Robust Finite Population Estimation. Journal of the American Statistical Association, 81, 1063–1069. Chambers, R.L. (2005), What If … ? Robust Prediction Intervals for Unbalanced Samples. S3RI Methodology Working Papers M05/05, Southampton Statistical Sciences Research Institute, Southampton, UK. Chambers, R., Chandra, H., Salvati, N., and Tzavidis, N. (2014), Outlier Robust Small Area Estimation, Journal of the Royal Statistical Society, Series B, 76, 47–69. Chambers, R., Chandra, H., and Tzavidis, N. (2011), On Bias-Robust Mean Squared Error Estimation for Pseudo-Linear Small Area Estimators, Survey Methodology, 37, 153–170. Chambers, R.L. and Clark, R.G. (2012), An Introduction to Model-Based Survey Sampling with Applications, Oxford: Oxford University Press. Chambers, R.L. and Feeney, G.A. (1977), Log Linear Models for Small Area Estimation, Unpublished paper, Australian Bureau of Statistics. Chambers, R. and Tzavidis, N. (2006), M-quantile Models for Small Area Estimation, Biometrika, 93, 255–268.
Chandra, H. and Chambers, R. (2009), Multipurpose Weighting for Small Area Estimation, Journal of Official Statistics, 25, 379–395. Chandra, H., Salvati, N., and Chambers, R. (2007), Small Area Estimation for Spatially Correlated Populations - A Comparison of Direct and Indirect Model-Based Estimators, Statistics in Transition, 8, 331-350. Chatrchi, G. (2012), Robust Estimation of Variance Components in Small Area Estimation, Masters Thesis, School of Mathematics and Statistics, Carleton University, Ottawa, Canada. Chatterjee, S., Lahiri, P., and Li, H. (2008), Parametric Bootstrap Approximation to the Distribution of EBLUP and Related Prediction Intervals in Linear Mixed Models, Annals of Statistics, 36, 1221–1245. Chattopadhyay, M., Lahiri, P., Larsen, M., and Reimnitz, J. (1999), Composite Estimation of Drug Prevalences for Sub-State Areas, Survey Methodology, 25, 81–86. Chaudhuri, A. (1994), Small Domain Statistics: A Review, Statistica Neerlandica, 48, 215–236. Chaudhuri, A. (2012), Developing Small Domain Statistics: Modelling in Survey Sampling, Saarbrücken: LAP LAMBERT Academic Publishing GMbH Co. KG. Chen, M-H., Shao, Q-M., and Ibrahim, J.G. (2000), Monte Carlo Methods in Bayesian Computation, New York: Springer-Verlag. Chen, S. and Lahiri, P. (2003), A Comparison of Different MSPE Estimators of EBLUP for the Fay-Herriot Model, Proceedings of the Section on Survey Research Methods, Washington, DC: American Statistical Association, pp. 903–911. Chen, S., Lahiri, P. and Rao, J.N.K. (2008). Robust mean squared prediction error estimators of EBLUP of a small area total under the Fay-Herriot model, Unpublished manuscript. Cheng, Y.S., Lee, K-O., and Kim, B.C. (2001), Adjustment of Unemployment Estimates Based on Small Area Estimation in Korea, Technical Report, Department of Mathematics, KAIST, Taejun, Korea. Chib, S. and Greenberg, E. (1995), Understanding the Metropolis-Hastings Algorithm, American Statistician, 49, 327–335. Choudhry, G.H., Rao, J.N.K., and Hidiroglou, M.A. (2012), On Sample Allocation for Efficient Domain Estimation, Survey Methodology, 38, 23–29. Christiansen, C.L. and Morris, C.N. (1997), Hierarchical Poisson Regression Modeling, Journal of the American Statistical Association, 92, 618–632. Christiansen, C.L., Pearson, L.M., and Johnson, W. (1992), Case-Deletion Diagnostics for Mixed Models, Technometrics, 34, 38–45. Chung, Y.S., Lee, K-O., and Kim, B.C. (2003), Adjustment of Unemployment Estimates Based on Small Area Estimation in Korea, Survey Methodology, 29, 45–52. Claeskens, G. and Hart, J.D. (2009), Goodness-of-fit tests in mixed models, Test, 18, 213–239. Clayton, D. and Bernardinelli, L. (1992), Bayesian Methods for Mapping Disease Risk, in P. Elliot, J. Cuzick, D. English, and R. Stern (Eds.), Geographical and Environmental Epidemiology: Methods for Small-Area Studies, London: Oxford University Press. Clayton, D. and Kaldor, J. (1987), Empirical Bayes Estimates of Age-Standardized Relative Risks for Use in Disease Mapping, Biometrics, 43, 671–681.
Cochran, W.G. (1977), Sampling Techniques (3rd ed.), New York: John Wiley Sons, Inc.. Cook, R.D. (1977), Detection of Influential Observations in Linear Regression, Technometrics, 19, 15–18. Cook, R.D. (1986), Assessment of Local Influence, Journal of the Royal Statistical Society, Series B, 48, 133–155. Costa, A., Satorra, A., and Ventura, E. (2004), Using Composite Estimators to Improve both Domain and Total Area Estimation, SORT, 28, 69–86. Cowles, M.K. and Carlin, B.P. (1996), Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review, Journal of the American Statistical Association, 91, 883–904. Cressie, N. (1989), Empirical Bayes Estimation of Undercount in the Decennial Census, Journal of the American Statistical Association, 84, 1033–1044. Cressie, N. (1991), Small-Area Prediction of Undercount Using the General Linear Model, Proceedings of Statistics Symposium 90: Measurement and Improvement of Data Quality, Ottawa, Canada: Statistics Canada, pp. 93–105. Cressie, N. (1992), REML Estimation in Empirical Bayes Smoothing of Census Undercount, Survey Methodology, 18, 75–94. Cressie, N. and Chan, N.H. (1989), Spatial Modelling of Regional Variables, Journal of the American Statistical Association, 84, 393–401. Daniels, M.J. and Gatsonis, C. (1999), Hierarchical Generalized Linear Models in the Analysis of Variations in Health Care Utilization, Journal of the American Statistical Association, 94, 29–42. Daniels, M.J. and Kass, R.E. (1999), Nonconjugate Bayesian Estimation of Covariance Matrices and its Use in Hierarchical Models, Journal of the American Statistical Association, 94, 1254–1263. Das, K., Jiang, J., and Rao, J.N.K. (2004), Mean Squared Error of Empirical Predictor, Annals of Statistics, 32, 818–840. Dass, S.C., Maiti, T., Ren, H., and Sinha, S. (2012), Confidence Interval Estimation of Small Area Parameters Shrinking Both Means and Variances, Survey Methodology, 38, 173–187. Datta, G.S. (2009), Model-Based Approach to Small Area Estimation, in D. Pfeffermann and C.R. Rao, (Eds.), Sample Surveys: Inference and Analysis, Handbook of Statistics, Volume 29B, Amsterdam: North-Holland, pp. 251–288. Datta, G.S., Day, B., and Basawa, I. (1999), Empirical Best Linear Unbiased and Empirical Bayes Prediction in Multivariate Small Area Estimation, Journal of Statistical Planning and Inference, 75, 169–179. Datta, G.S., Day, B., and Maiti, T. (1998), Multivariate Bayesian Small Area Estimation: An Application to Survey and Satellite Data, Sankhy¯a, Series A, 60, 1–19. Datta, G.S., Fay, R.E., and Ghosh, M. (1991), Hierarchical and Empirical Bayes Multivariate Analysis in Small Area Estimation, in Proceedings of Bureau of the Census 1991 Annual Research Conference, U.S. Bureau of the Census, Washington, DC, pp. 63–79. Datta, G.S. and Ghosh, M. (1991), Bayesian Prediction in Linear Models: Applications to Small Area Estimation, Annals of Statistics, 19, 1748–1770. Datta, G.S., Ghosh, M., Huang, E.T., Isaki, C.T., Schultz, L.K., and Tsay, J.H. (1992), Hierarchical and Empirical Bayes Methods for Adjustment of Census Undercount: The 1988 Missouri Dress Rehearsal Data, Survey Methodology, 18, 95–108.
Datta, G.S., Ghosh, M., and Kim, Y-H. (2001), Probability Matching Priors for One-Way Unbalanced Random Effect Models, Technical Report, Department of Statistics, University of Georgia, Athens. Datta, G.S., Ghosh, M., Nangia, N., and Natarajan, K. (1996), Estimation of Median Income of Four-Person Families: A Bayesian Approach, in W.A. Berry, K.M. Chaloner, and J.K. Geweke (Eds.), Bayesian Analysis in Statistics and Econometrics, New York: John Wiley Sons, Inc., pp. 129–140. Datta, G.S., Ghosh, M., Smith, D.D., and Lahiri, P. (2002), On the Asymptotic Theory of Conditional and Unconditional Coverage Probabilities of Empirical Bayes Confidence Intervals, Scandinavian Journal of Statistics, 29, 139–152. Datta, G.S., Ghosh, M., Steorts, R., and Maples, J.J. (2011), Bayesian Benchmarking with Applications to Small Area Estimation, Test, 20, 574–588. Datta, G.S., Ghosh, M., and Waller, L.A. (2000), Hierarchical and Empirical Bayes Methods for Environmental Risk Assessment, in P.K. Sen and C.R. Rao (Eds.), Handbook of Statistics, Volume 18, Amsterdam: Elsevier Science B.V., pp. 223–245. Datta, G.S., Hall, P., and Mandal, A. (2011), Model Selection and Testing for the Presence of Small-Area Effects, and Application to Area-Level Data, Journal of the American Statistical Association, 106, 362–374. Datta, G.S., Kubokawa, T., Molina, I., and Rao, J.N.K. (2011), Estimation of Mean Squared Error of Model-Based Small Area Estimators, Test, 20, 367–388. Datta, G.S. and Lahiri, P. (1995), Robust Hierarchical Bayes Estimation of Small Area Characteristics in the Presence of Covariates, Journal of Multivariate Analysis, 54, 310–328. Datta, G.S. and Lahiri, P. (2000), A Unified Measure of Uncertainty of Estimated Best Linear Unbiased Predictors in Small Area Estimation Problems, Statistica Sinica, 10, 613–627. Datta, G.S., Lahiri, P., and Maiti, T. (2002), Empirical Bayes Estimation of Median Income of Four-Person Families by State Using Time Series and Cross-Sectional Data, Journal of Statistical Planning and Inference, 102, 83–97. Datta, G.S., Lahiri, P., Maiti, T., and Lu, K.L. (1999), Hierarchical Bayes Estimation of Unemployment Rates for the U.S. States, Journal of the American Statistical Association, 94, 1074–1082. Datta, G.S. and Mandal, A. (2014), Small Area Estimation with Uncertain Random Effects, in Paper presented at SAE 2014 Conference, Poznan, Poland, September 2014. Datta, G.S., Rao, J.N.K., and Smith, D.D. (2005), On Measuring the Variability of Small Area Estimators under a Basic Area Level Model, Biometrika, 92, 183–196. Datta, G.S., Rao, J.N.K., and Torabi, M. (2010), Pseudo-Empirical Bayes Estimation of Small Area Means Under a Nested Error Linear Regression Model with Functional Measurement Errors, Journal of Statistical Planning and Inference, 140, 2952–2962. Dawid, A.P. (1985), Calibration-Based Empirical Probability, Annals of Statistics, 13, 1251–1274. Dean, C.B. and MacNab, Y.C. (2001), Modeling of Rates Over a Hierarchical Health Administrative Structure, Canadian Journal of Statistics, 29, 405–419. Deming, W.E. and Stephan, F.F. (1940), On a Least Squares Adjustment of a Sample Frequency Table When the Expected Marginal Totals are Known, Annals of Mathematical Statistics, 11, 427–444. Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977), Maximum Likelihood from Incomplete Data Via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 39, 1–38.
Dempster, A.P. and Ryan, L.M. (1985), Weighted Normal Plots, Journal of the American Statistical Association, 80, 845–850. DeSouza, C.M. (1992), An Appropriate Bivariate Bayesian Method for Analysing Small Frequencies, Biometrics, 48, 1113–1130. Deville, J.C. and Särndal, C.-E. (1992), Calibration Estimation in Survey Sampling, Journal of the American Statistical Association, 87, 376–382. Diallo, M. (2014), Small Area Estimation: Skew-Normal Distributions and Time Series, Unpublished Ph.D. Dissertation, Carleton University, Ottawa, Canada. Diallo, M. and Rao, J.N.K. (2014), Small Area Estimation of Complex Parameters Under Unit-level Models with Skew-Normal Errors, Proceedings of the Survey Research Section, Washington, DC: American Statistical Association. Diao, L., Smith, D.D., Datta, G.S., Maiti, T., and Opsomer, J.D. (2014), Accurate Confidence Interval Estimation of Small Area Parameters Under the Fay-Herriot Model. Scandinavian Journal of Statistics, 41, 497–515. Dick, P. (1995), Modelling Net Undercoverage in the 1991 Canadian Census, Survey Methodology, 21, 45–54. Draper, N.R. and Smith, H. (1981), Applied Regression Analysis (2nd ed.), New York: John Wiley Sons, Inc.. Drew, D., Singh, M.P., and Choudhry, G.H. (1982), Evaluation of Small Area Estimation Techniques for the Canadian Labour Force Survey, Survey Methodology, 8, 17–47. Efron, B. (1975), Biased Versus Unbiased Estimation, Advances in Mathematics, 16, 259–277. Efron, B. and Morris, C. (1972a), Limiting the Risk of Bayes and Empirical Bayes Estimators, Part II: The Empirical Bayes Case, Journal of the American Statistical Association, 67, 130–139. Efron, B. and Morris, C. (1972b), Empirical Bayes on Vector Observations: An Extension of Stein’s Method, Biometrika, 59, 335–347. Efron, B. and Morris, C. (1973), Stein’s Estimation Rule and its Competitors - An Empirical Bayes Approach, Journal of the American Statistical Association, 68, 117–130. Efron, B. and Morris, C. (1975), Data Analysis Using Stein’s Estimate and Its Generalizations, Journal of the American Statistical Association, 70, 311–319. Elbers, C., Lanjouw, J.O., and Lanjouw, P. (2003), Micro-Level Estimation of Poverty and Inequality. Econometrica, 71, 355–364. Erciulescu, A.L. and Fuller, W.A. (2014), Parametric Bootstrap Procedures for Small Area Prediction Variance, Proceedings of the Survey Research Methods Section, Washington, DC: American Statistical Association. Ericksen, E.P. and Kadane, J.B. (1985), Estimating the Population in Census Year: 1980 and Beyond (with discussion), Journal of the American Statistical Association, 80, 98–131. Ericksen, E.P., Kadane, J.B., and Tukey, J.W. (1989), Adjusting the 1981 Census of Population and Housing, Journal of the American Statistical Association, 84, 927–944. Esteban, M.D., Morales, D., Pérez, A., and Santamaría, L. (2012), Small Area Estimation of Poverty Proportions Under Area-Level Time Models, Computational Statistics and Data Analysis, 56, 2840–2855. Estevao, V., Hidiroglou, M.A., and Särndal, C.-E. (1995), Methodological Principles for a Generalized Estimation Systems at Statistics Canada, Journal of Official Statistics, 11, 181–204.
Estevao, V., Hidiroglou, M.A., and You Y. (2014), Area Level Model, Unit Level, and Hierarchical Bayes Methodology Specifications, Statistical Research and Innovation Division, Internal Statistics Canada Document, 240 pages. Fabrizi, E., Salvati, N., and Pratesi, M. (2012), Constrained Small Area Estimators Based on M-quantile Methods, Journal of Official Statistics, 28, 89–106. Falorsi, P.D., Falorsi, S., and Russo, A. (1994), Empirical Comparison of Small Area Estimation Methods for the Italian Labour Force Survey, Survey Methodology, 20, 171–176. Falorsi, P.D. and Righi, P. (2008), A Balanced Sampling Approach for Multi-Way Stratification Designs for Small Area Estimation, Survey Methodology, 34, 223–234. Farrell, P.J. (2000), Bayesian Inference for Small Area Proportions, Sankhy¯a, Series B, 62, 402–416. Farrell, P.J., MacGibbon, B., and Tomberlin, T.J. (1997a), Empirical Bayes Estimators of Small Area Proportions in Multistage Designs, Statistica Sinica, 7, 1065–1083. Farrell, P.J., MacGibbon, B., and Tomberlin, T.J. (1997b), Bootstrap Adjustments for Empirical Bayes Interval Estimates of Small Area Proportions, Canadian Journal of Statistics, 25, 75–89. Farrell, P.J., MacGibbon, B., and Tomberlin, T.J. (1997c), Empirical Bayes Small Area Estimation Using Logistic Regression Models and Summary Statistics, Journal of Business & Economic Statistics, 15, 101–108. Fay, R.E. (1987), Application of Multivariate Regression to Small Domain Estimation, in R. Platek, J.N.K. Rao, C.-E. Särndal, and M.P. Singh (Eds.), Small Area Statistics, New York: John Wiley Sons, Inc., pp. 91–102. Fay, R.E. (1992), Inferences for Small Domain Estimates from the 1990 Post Enumeration Survey, Unpublished Manuscript, U.S. Bureau of the Census. Fay, R.E. and Herriot, R.A. (1979), Estimation of Income from Small Places: An Application of James–Stein Procedures to Census Data, Journal of the American Statistical Association, 74, 269–277. Fellner, W.H. (1986), Robust Estimation of Variance Components, Technometrics, 28, 51–60. Ferretti, C. and Molina, I. (2012), Fast EB Method for Estimating Complex Poverty Indicators in Large Populations. Journal of the Indian Society of Agricultural Statistics, 66, 105–120. Folsom, R., Shah, B.V., and Vaish, A. (1999), Substance Abuse in States: A Methodological Report on Model Based Estimates from the 1994–1996 National Household Surveys on Drug Abuse, Proceedings of the Section on Survey Research Methods, Washington, DC: American Statistical Association, pp. 371–375. Foster, J., Greer, J., and Thorbecke, E. (1984), A Class of Decomposable Poverty Measures. Econometrica, 52, 761–766. Freedman, D.A. and Navidi, W.C. (1986), Regression Methods for Adjusting the 1980 Census (with discussion), Statistical Science, 18, 75–94. Fuller, W.A. (1975), Regression Analysis for Sample Surveys, Sankhy¯a, Series C, 37, 117–132. Fuller, W.A. (1987), Measurement Error Models, New York: John Wiley Sons, Inc.. Fuller, W.A. (1989), Prediction of True Values for the Measurement Error Model, in Conference on Statistical Analysis of Measurement Error Models and Applications, Humboldt State University. Fuller, W.A. (1999), Environmental Surveys Over Time, Journal of Agricultural, Biological and Environmental Statistics, 4, 331–345.
Fuller, W.A. (2009), Sampling Statistics, New York: John Wiley Sons, Inc.. Fuller, W.A. and Battese, G.E. (1973), Transformations for Estimation of Linear Models with Nested-Error Structure, Journal of the American Statistical Association, 68, 626–632. Fuller, W.A. and Goyeneche, J.J. (1998), Estimation of the State Variance Component, Unpublished manuscript. Fuller, W.A. and Harter, R.M. (1987), The Multivariate Components of Variance Model for Small Area Estimation, in R. Platek, J.N.K. Rao, C.-E. Särndal, and M.P. Singh (Eds.), Small Area Statistics, New York: John Wiley Sons, Inc., pp. 103–123. Ganesh, N. and Lahiri, P. (2008), A New Class of Average Moment Matching Priors, Biometrika, 95, 514–520. Gelfand, A.E. (1996), Model Determination Using Sample-Based Methods, in W.R. Gilks, S. Richardson, and D.J. Spiegelhalter (Eds.), Markov Chain Monte Carlo in Practice, London: Chapman and Hall, pp. 145–161. Gelfand, A.E. and Smith, A.F.M. (1990), Sample-Based Approaches to Calculating Marginal Densities, Journal of the American Statistical Association, 85, 972–985. Gelfand, A.E. and Smith, A.F.M. (1991), Gibbs Sampling for Marginal Posterior Expectations, Communications in Statistics - Theory and Methods, 20, 1747–1766. Gelman, A. (2006), Prior Distributions for Variance Parameters in Hierarchical Models, Bayesian Analysis, 1, 515–533. Gelman, A. and Meng, S-L. (1996), Model Checking and Model Improvement, in W.R. Gilks, S. Richardson, and D.J. Spiegelhalter (Eds.), Markov Chain Monte Carlo in Practice, London: Chapman and Hall, pp. 189–201. Gelman, A. and Rubin, D.B. (1992), Inference from Iterative Simulation Using Multiple Sequences, Statistical Science, 7, 457–472. Ghangurde, P.D. and Singh, M.P. (1977), Synthetic Estimators in Periodic Household Surveys, Survey Methodology, 3, 152–181 Ghosh, M. (1992a), Hierarchical and Empirical Bayes Multivariate Estimation, in M. Ghosh and P.K. Pathak (Eds.), Current Issues in Statistical Inference: Essays in Honor of D. Basu, IMS Lecture Notes - Monograph Series, Volume 17, Beachwood, OH: Institute of Mathematical Statistics. Ghosh, M. (1992b), Constrained Bayes Estimation with Applications, Journal of the American Statistical Association, 87, 533–540. Ghosh, M. and Auer, R. (1983), Simultaneous Estimation of Parameters, Annals of the Institute of Statistical Mathematics, Part A, 35, 379–387. Ghosh, M. and Lahiri, P. (1987), Robust Empirical Bayes Estimation of Means from Stratified Samples, Journal of the American Statistical Association, 82, 1153–1162. Ghosh, M. and Lahiri, P. (1998), Bayes and Empirical Bayes Analysis in Multistage Sampling, in S.S. Gupta and J.O. Berger (Eds.), Statistical Decision Theory and Related Topics IV, Volume 1, New York: Springer-Verlag, pp. 195–212. Ghosh, M. and Maiti, T. (1999), Adjusted Bayes Estimators with Applications to Small Area Estimation, Sankhy¯a, Series B, 61, 71–90. Ghosh, M. and Maiti, T. (2004), Small Area Estimation Based on Natural Exponential Family Quadratic Variance Function Models and Survey Weights, Biometrika, 91, 95–112. Ghosh, M., Maiti, T., and Roy, A. (2008), Influence Functions and Robust Bayes and Empirical Bayes Small Area Estimation, Biometrika, 95, 573–585.
Ghosh, M., Nangia, N., and Kim, D. (1996), Estimation of Median Income of Four-Person Families: A Bayesian Time Series Approach, Journal of the American Statistical Association, 91, 1423–1431. Ghosh, M., Natarajan, K., Stroud, T.W.F., and Carlin, B.P. (1998), Generalized Linear Models for Small Area Estimation, Journal of American Statistical Association, 93, 273–282. Ghosh, M., Natarajan, K., Waller, L.A., and Kim, D. (1999), Hierarchical Bayes GLMs for the Analysis of Spatial Data: An Application to Disease Mapping, Journal of Statistical Planing and Inference, 75, 305–318. Ghosh, M. and Rao, J.N.K. (1994), Small Area Estimation: an Appraisal (with Discussion), Statistical Science, 9, 55–93. Ghosh, M. and Sinha, K. (2007), Empirical Bayes Estimation in Finite Population Sampling Under Functional Measurement Error Models, Scandinavian Journal of Statistics, 33, 591–608. Ghosh, M., Sinha, K., and Kim, D. (2006), Empirical and Hierarchical Bayesian Estimation in Finite Population Sampling Under Structural Measurement Error Models, Journal of Statistical Planning and Inference, 137, 2759–2773. Ghosh, M. and Steorts, R. (2013), Two-Stage Bayesian Benchmarking as Applied to Small Area Estimation, Test, 22, 670–687. Gilks, W.R., Richardson, S., and Spiegelhalter, D.J. (Eds.) (1996), Markov Chain Monte Carlo in Practice, London: Chapman and Hall. Gilks, W.R. and Wild, P. (1992), Adaptive Rejection Sampling for Gibbs Sampling, Applied Statistics, 41, 337–348. Godambe, V.P. and Thompson, M.E. (1989), An Extension of Quasi-likelihood Estimation (with Discussion), Journal of Statistical Planning and Inference, 22, 137–152. Goldstein, H. (1975), A Note on Some Bayesian Nonparametric Estimates, Annals of Statistics, 3, 736–740. Goldstein, H. (1989), Restricted Unbiased Iterative Generalized Least Squares Estimation, Biometrika, 76, 622–623. Gonzalez, M.E. (1973), Use and Evaluation of Synthetic Estimates, Proceedings of the Social Statistics Section, American Statistical Association, pp. 33–36. Gonzalez, J.F., Placek, P.J., and Scott, C. (1996), Synthetic Estimation of Followback Surveys at the National Center for Health Statistics, in W.L. Schaible (ed.), Indirect Estimators in U.S. Federal Programs, New York: Springer–Verlag, pp. 16–27. Gonzalez, M.E. and Waksberg, J. (1973), Estimation of the Error of Synthetic Estimates, in Paper presented at the First Meeting of the International Association of Survey Statisticians, Vienna, Austria. González-Manteiga, W., Lombardía, M.J., Molina, I., Morales, D., and Santamaría, L. (2008a), Bootstrap Mean Squared Error of a Small-Area EBLUP, Journal of Statistical Computation and Simulation, 78, 443–462. González-Manteiga, W., Lombardía, M.J., Molina, I., Morales, D., and Santamaría, L. (2008b), Analytic and Bootstrap Approximations of Prediction Errors Under a Multivariate FayHerriot Model, Computational Statistics and Data Analysis, 52, 5242–5252. González-Manteiga, W., Lombardía, M.J., Molina, I., Morales, D., and Santamaría, L. (2010), Small Area Estimation Under Fay-Herriot Models with Nonparametric Estimation of Heteroscedasticity, Statistical Modelling, 10, 215–239.
Griffin, B. and Krutchkoff, R. (1971), Optimal Linear Estimation: An Empirical Bayes Version with Application to the Binomial Distribution, Biometrika, 58, 195–203. Griffiths, R. (1996), Current Population Survey Small Area Estimations for Congressional Districts, Proceedings of the Section on Survey Research Methods, Washington, DC: American Statistical Association, pp. 314–319. Groves, R.M. (1989), Survey Errors and Survey Costs, New York: John Wiley Sons, Inc.. Groves, R.M. (2011), Three Eras of Survey Research, Public Opinion Quarterly, 75, 861–871. Hall, P. and Maiti, T. (2006a), Nonparametric Estimation of Mean-Squared Prediction Error in Nested-Error Regression Models, Annals of Statistics, 34, 1733–1750. Hall, P. and Maiti, T. (2006b), On Parametric Bootstrap Methods for Small Area Prediction, Journal of the Royal Statistical Society, Series B, 68, 221–238. Han, B. (2013), Conditional Akaike Information Criterion in the Fay-Herriot Model, Survey Methodology, 11, 53–67. Han, C.-P. and Bancroft, T.A. (1968), On Pooling Means When Variance is Unknown, Journal of the American Statistical Association, 63, 1333–1342. Hansen, M.H., Hurwitz, W.N., and Madow, W.G. (1953), Sample Survey Methods and Theory I, New York: John Wiley Sons, Inc.. Hansen, M.H., Madow, W.G., and Tepping, B.J. (1983), An Evaluation of Model-Dependent and Probability Sampling Inferences in Sample Surveys, Journal of the American Statistical Association, 78, 776–793. Hartigan, J. (1969), Linear Bayes Methods, Journal of the Royal Statistical Society, Series B, 31, 446–454. Hartless, G., Booth, J.G., and Littell, R.C. (2000), Local Influence Diagnostics for Prediction of Linear Mixed Models, Unpublished manuscript. Hartley, H.O. (1959), Analytic Studies of Survey Data, in a Volume in Honor of Corrado Gini, Rome, Italy: Instituto di Statistica. Hartley, H.O. (1974), Multiple Frame Methodology and Selected Applications, Sankhy¯a, Series C, 36, 99–118. Hartley, H.O. and Rao, J.N.K. (1967), Maximum Likelihood Estimation for the Mixed Analysis of Variance Model, Biometrika, 54, 93–108. Harvey, A.C. (1990), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge: Cambridge University Press. Harville, D.A. (1977), Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems, Journal of the American Statistical Association, 72, 322–340. Harville, D.A. (1991), Comment, Statistical Science, 6, 35–39. Harville, D.A. and Jeske, D.R. (1992), Mean Squared Error of Estimation or Prediction Under General Linear Model, Journal of the American Statistical Association, 87, 724–731. Hastings, W.K. (1970), Monte Carlo Sampling Methods Using Markov Chains and Their Applications, Biometrika, 57, 97–109. He, X. (1997), Quantile Curves Without Crossing, American Statistician, 51, 186–192. He, Z. and Sun, D. (1998), Hierarchical Bayes Estimation of Hunting Success Rates, Environmental and Ecological Statistics, 5, 223–236. Hedayat, A.S. and Sinha, B.K. (1991), Design and Inference in Finite Population Sampling, New York: John Wiley Sons, Inc..
Henderson, C.R. (1950), Estimation of Genetic Parameters (Abstract), Annals of Mathematical Statistics, 21, 309–310. Henderson, C.R. (1953), Estimation of Variance and Covariance Components, Biometrics, 9, 226–252. Henderson, C.R. (1963), Selection Index and Expected Genetic Advance, in Statistical Genetics and Plant Breeding, Publication 982, Washington, DC: National Academy of Science, National Research Council, pp. 141–163. Henderson, C.R. (1973), Maximum Likelihood Estimation of Variance Components, Unpublished manuscript. Henderson, C.R. (1975), Best Linear Unbiased Estimation and Prediction Under a Selection Model, Biometrics, 31, 423–447. Henderson, C.R., Kempthorne, O., Searle, S.R., and von Krosigk, C.N. (1959), Estimation of Environmental and Genetic Trends from Records Subject to Culling, Biometrics, 13, 192–218. Hidiroglou, M.A. and Patak, Z. (2009), An Application of Small Area Estimation Techniques to the Canadian Labour Force Survey, Proceedings of the Survey Research Methods Section, Statistical Society of Canada. Hill, B.M. (1965), Inference About Variance Components in the One-Way Model, Journal of the American Statistical Association, 60, 806–825. Hobert, J.P. and Casella, G. (1996), The Effect of Improper Priors on Gibbs Sampling in Hierarchical Linear Mixed Models, Journal of the American Statistical Association, 9l, 1461–1473. Hocking, R.R. and Kutner, M.H. (1975), Some Analytical and Numerical Comparisons of Estimators for the Mixed A.O.V. Model, Biometrics, 31, 19–28. Holt, D. and Smith, T.M.F. (1979), Post-Stratification, Journal of the Royal Statistical Society, Series A, 142, 33–46. Holt, D., Smith, T.M.F., and Tomberlin, T.J. (1979), A Model-based Approach to Estimation for Small Subgroups of a Population, Journal of the American Statistical Association, 74, 405–410. Huang, E.T. and Fuller, W.A. (1978), Non-negative Regression Estimation for Sample Survey Data, Proceedings of the Social Statistics Section, Washington, DC: American Statistical Association, pp. 300–3005. Huber, P.J. (1972), The 1972 Wald Lecture Robust Statistics: A Review. Annals of Mathematical Statistics, 43, 1041–821. Huggins, R.M. (1993), A Robust Approach to the Analysis of Repeated Measures. Biometrics, 49, 715–720. Hulting, F.L. and Harville, D.A. (1991), Some Bayesian and Non-Bayesian Procedures for the Analysis of Comparative Experiments and for Small Area Estimation: Computational Aspects, Frequentist Properties and Relationships, Journal of the American Statistical Association, 86, 557–568. IASS Satellite Conference (1999), Small Area Estimation, Latvia: Central Statistical Office. Ireland, C.T. and Kullback, S. (1968), Contingency Tables with Given Marginals, Biometrika, 55, 179–188. Isaki, C.T., Huang, E.T., and Tsay, J.H. (1991), Smoothing Adjustment Factors from the 1990 Post Enumeration Survey, Proceedings of the Social Statistics Section, Washington, DC: American Statistical Association, pp. 338–343.
Isaki, C.T., Tsay, J.H., and Fuller, W.A. (2000), Estimation of Census Adjustment Factors, Survey Methodology, 26, 31–42. Jacqmin-Gadda, H., Sibillot, S., Proust, C., Molina, J.-M., and Thiébaut, R. (2007), Robustness of the Linear Mixed Model to Misspecified Error Distribution, Computational Statistics and Data Analysis, 51, 5142–5154. James, W. and Stein, C. (1961), Estimation with Quadratic Loss, Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, pp. 361–379. Jiang, J. (1996), REML Estimation: Asymptotic Behavior and Related Topics, Annals of Statistics, 24, 255–286. Jiang, J. (1997), A Derivation of BLUP – Best Linear Unbiased Predictor, Statistics & Probability Letters, 32, 321–324. Jiang, J. (1998), Consistent Estimators in Generalized Linear Mixed Models, Journal of the American Statistical Association, 93, 720–729. Jiang, J. (2001), Private Communication to J.N.K. Rao. Jiang, J. and Lahiri, P. (2001), Empirical Best Prediction for Small Area Inference with Binary data, Annals of the Institute of Statistical Mathematics, 53, 217–243. Jiang, J. and Lahiri, P. (2006), Mixed Model Prediction and Small Area Estimation, Test, 15, 1–96. Jiang, J., Lahiri, P., and Wan, S-M. (2002), A Unified Jackknife Theory, Annals of Statistics, 30, 1782–1810. Jiang, J., Lahiri, P., Wan, S-M., and Wu, C-H. (2001), Jackknifing the Fay-Herriot Model with an Example, Unpublished manuscript. Jiang, J., Lahiri, P., and Wu, C-H. (2001), A Generalization of Pearson’s Chi-Square Goodness-of-Fit Test with Estimated Cell Frequencies, Sankhy¯a, Series A, 63, 260–276. Jiang, J. and Nguyen, T. (2012), Small Area Estimation via Heteroscedastic Nested-Error Regression, Canadian Journal of Statistics, 40, 588–603. Jiang, J., Nguyen, T., and Rao, J.S. (2009), A Simplified Adaptive Fence Procedure, Statistics and Probability Letters, 79, 625–629. Jiang, J., Nguyen, T., and Rao, J.S. (2011), Best Predictive Small Area Estimation, Journal of the American Statistical Association, 106, 732–745. Jiang, J., Nguyen, T., and Rao, J.S. (2014), Observed Best Prediction Via Nested-error Regression with Potentially Misspecified Mean and Variance, Survey Methodology, in print. Jiang, J. and Rao, J.S. (2003), Consistent Procedures for Mixed Linear Model Selection, Sankhy¯a, Series A., 65, 23–42. Jiang, J., Rao, J.S., Gu, I., and Nguyen, T. (2008), Fence Methods for Mixed Model Selection, Annals of Statistics, 36, 1669–1692. Jiang, J. and Zhang, W. (2001), Robust Estimation in Generalized Linear Mixed Models, Biometrika, 88, 753–765. Jiongo, V.D., Haziza, D., and Duchesne, P. (2013), Controlling the Bias of Robust Small-area Estimation, Biometrika, 100, 843–858. Kackar, R.N. and Harville, D.A. (1981), Unbiasedness of Two-stage Estimation and Prediction Procedures for Mixed Linear Models, Communications in Statistics, Series A, 10, 1249–1261.
Kackar, R.N. and Harville, D.A. (1984), Approximations for Standard Errors of Estimators of Fixed and Random Effects in Mixed Linear Models, Journal of the American Statistical Association, 79, 853–862. Kalton, G., Kordos, J., and Platek, R. (1993), Small Area Statistics and Survey Designs, Volume I: Invited Papers; Volume II: Contributed Papers and Panel Discussion, Warsaw, Poland: Central Statistical Office. Karunamuni, R.J. and Zhang, S. (2003), Optimal Linear Bayes and Empirical Bayes Estimation and Prediction of the Finite Population Mean, Journal of Statistical Planning and Inference, 113, 505–525. Kass, R.E. and Raftery, A. (1995), Bayes Factors, Journal of the American Statistical Association, 90, 773–795. Kass, R.E. and Steffey, D. (1989), Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models), Journal of the American Statistical Association, 84, 717–726. Kim, H., Sun, D., and Tsutakawa, R.K. (2001), A Bivariate Bayes Method for Improving the Estimates of Mortality Rates with a Twofold Conditional Autoregressive Model, Journal of the American Statistical Association, 96, 1506–1521. Kish, L. (1999), Cumulating/Combining Population Surveys, Survey Methodology, 25, 129–138. Kleffe, J. and Rao, J.N.K. (1992), Estimation of Mean Square Error of Empirical Best Linear Unbiased Predictors Under a Random Error Variance Linear Model, Journal of Multivariate Analysis, 43, 1–15. Kleinman, J.C. (1973), Proportions with Extraneous Variance: Single and Independent Samples, Journal of the American Statistical Association, 68, 46–54. Koenker, R. (1984), A Note on L-Estimators for Linear Models, Statistics and Probability Letters, 2, 323–325. Koenker, R. and Bassett, G. (1978), Regression Quantiles, Econometrica, 46, 33–50. Kokic, P., Chambers, R., Breckling, J., and Beare, S. (1997), A Measure of Production Performance, Journal of Business and Economic Statistics, 15, 445–451. Kott, P.S. (1990), Robust Small Domain Estimation Using Random Effects Modelling, Survey Methodology, 15, 3–12. Lahiri, P. (1990), “Adjusted” Bayes and Empirical Bayes Estimation in Finite Population Sampling, Sankhyā, Series B, 52, 50–66. Lahiri, P. and Maiti, T. (2002), Empirical Bayes Estimation of Relative Risks in Disease Mapping, Calcutta Statistical Association Bulletin, 1, 53–211. Lahiri, P. and Pramanik, S. (2013), Estimation of Average Design-based Mean Squared Error of Synthetic Small Area Estimators, Unpublished paper. Lahiri, P. and Rao, J.N.K. (1995), Robust Estimation of Mean Squared Error of Small Area Estimators, Journal of the American Statistical Association, 90, 758–766. Lahiri, S.N., Maiti, T., Katzoff, M., and Parson, V. (2007), Resampling-Based Empirical Prediction: An Application to Small Area Estimation, Biometrika, 94, 469–485. Laird, N.M. (1978), Nonparametric Maximum Likelihood Estimation of a Mixing Distribution, Journal of the American Statistical Association, 73, 805–811. Laird, N.M. and Louis, T.A. (1987), Empirical Bayes Confidence Intervals Based on Bootstrap Samples, Journal of the American Statistical Association, 82, 739–750.
Laird, N.M. and Ware, J.H. (1982), Random-Effects Models for Longitudinal Data, Biometrics, 38, 963–974. Lange, N. and Ryan, L. (1989), Assessing Normality in Random Effects Models, Annals of Statistics, 17, 624–642. Langford, I.H., Leyland, A.H., Rasbash, J., and Goldstein, H. (1999), Multilevel Modelling of the Geographical Distribution of Diseases, Applied Statistics, 48, 253–268. Laud, P. and Ibrahim, J.G. (1995), Predictive Model Selection, Journal of the Royal Statistical Society, Series B, 57, 247–262. Lehtonen, R., Särndal, C.-E., and Veijanen, A. (2003), The Effect of Model Choice in Estimation for Domains Including Small Domains, Survey Methodology, 29, 33–44. Lehtonen, R. and Veijanen, A. (1999), Domain Estimation with Logistic Generalized Regression and Related Estimators, Proceedings of the IASS Satellite Conference on Small Area Estimation, Riga: Latvian Council of Science, pp. 121–128. Lehtonen, R. and Veijanen, A. (2009), Design-Based Methods of Estimation for Domains and Small Areas, in D. Pfeffermann and C.R. Rao, (Eds.), Sample Surveys: Inference and Analysis, Handbook of Statistics, Volume 29B, Amsterdam: North-Holland, pp. 219–249. Lele, S.R., Nadeem, K., and Schmuland, B. (2010), Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning, Journal of the American Statistical Association, 105, 1617–1625. Levy, P.S. (1971), The Use of Mortality Data in Evaluating Synthetic Estimates, Proceedings of the Social Statistics Section, American Statistical Association, pp. 328–331. Li, H. and Lahiri, P. (2010), An Adjusted Maximum Likelihood Method for Solving Small Area Estimation Problems, Journal of Multivariate Analysis, 101, 882–892. Liu, B., Lahiri, P., and Kalton, G. (2014), Hierarchical Bayes Modeling of Survey-Weighted Small Area Proportions, Survey Methodology, 40, 1–13. Lohr, S.L. (2010), Sampling: Design and Analysis, Pacific Grove, CA: Duxbury Press. Lohr, S.L. and Prasad, N.G.N. (2003), Small Area Estimation with Auxiliary Survey Data, Canadian Journal of Statistics, 31, 383–396. Lohr, S.L. and Rao, J.N.K. (2000), Inference for Dual Frame Surveys, Journal of the American Statistical Association, 95, 271–280. Lohr, S.L. and Rao, J.N.K.. (2009), Jackknife Estimation of Mean Squared Error of Small Area Predictors in Non-Linear Mixed Models, Biometrika, 96, 457–468. Longford, N.T. (2005), Missing Data and Small-Area Estimation, New York: Springer-Verlag. Longford, N.T. (2006), Sample Size Calculation for Small-area Estimation, Survey Methodology, 32, 87–96. López-Vizcaíno, E., Lombardía, M.J., and Morales, D. (2013), Multinomial-based Small Area Estimation of Labour Force Indicators, Statistical Modelling, 13, 153–178. López-Vizcaíno, E., Lombardía, M.J., and Morales, D. (2014), Small Area Estimation of Labour Force Indicators Under a Multinomial Model with Correlated Time and Area Effects, Journal of the Royal Statistical Society, Series A, doi: 10.1111/rssa.12085. Louis, T.A. (1984), Estimating a Population of Parameter Values Using Bayes and Empirical Bayes Methods, Journal of the American Statistical Association, 79, 393–398. Luery, D.M. (2011), Small Area Income and Poverty Estimates Program, in Proceedings of 27th SCORUS Conference, Jurmala, Latvia, pp. 93–107.
Lunn, D.J., Thomas, A., Best, N., and Spiegelhalter, D. (2000), WinBUGS – A Bayesian Modeling Framework: Concepts, Structures and Extensibility, Statistics and Computing, 10, 325–337. MacGibbon, B. and Tomberlin, T.J. (1989), Small Area Estimation of Proportions Via Empirical Bayes Techniques, Survey Methodology, 15, 237–252. Maiti, T. (1998), Hierarchical Bayes Estimation of Mortality Rates for Disease Mapping, Journal of Statistical Planning and Inference, 69, 339–348. Malec, D., Davis, W.W., and Cao, X. (1999), Model-Based Small Area Estimates of Overweight Prevalence Using Sample Selection Adjustment, Statistics in Medicine, 18, 3189–3200. Malec, D., Sedransk, J., Moriarity, C.L., and LeClere, F.B. (1997), Small Area Inference for Binary Variables in National Health Interview Survey, Journal of the American Statistical Association, 92, 815–826. Mantel, H.J., Singh, A.C., and Bureau, M. (1993), Benchmarking of Small Area Estimators, in Proceedings of International Conference on Establishment Surveys, Washington, DC: American Statistical Association, pp. 920–925. Marchetti, S., Giusti, C., Pratesi, M., Salvati, N., Giannotti, F., Pedreschi, D., Rinzivillo, S., Pappalardo, L., and Gabrielli, L. (2015), Small Area Model-based Estimators Using Big Data Sources, Journal of Official Statistics, in print. Marhuenda, Y., Molina, I., and Morales, D. (2013), Small Area Estimation with Spatio-Temporal Fay-Herriot Models, Computational Statistics and Data Analysis, 58, 308–325. Maritz, J.S. and Lwin, T. (1989), Empirical Bayes Methods (2nd ed.), London: Chapman and Hall. Marker, D.A. (1995), Small Area Estimation: A Bayesian Perspective, Unpublished Ph.D. Dissertation, University of Michigan, Ann Arbor, MI. Marker, D.A. (1999), Organization of Small Area Estimators Using a Generalized Linear Regression Framework, Journal of Official Statistics, 15, 1–24. Marker, D.A. (2001), Producing Small Area Estimates From National Surveys: Methods for Minimizing Use of Indirect Estimators, Survey Methodology, 27, 183–188. Marshall, R.J. (1991), Mapping Disease and Mortality Rates Using Empirical Bayes Estimators, Applied Statistics, 40, 283–294. McCulloch, C.E. and Searle, S.R. (2001), Generalized, Linear, and Mixed Models, New York: John Wiley & Sons, Inc. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. (1953), Equation of State Calculations by Fast Computing Machines, Journal of Chemical Physics, 21, 1087–1092. Meza, J.L. and Lahiri, P. (2005), A Note on the Cp Statistic Under the Nested Error Regression Model, Survey Methodology, 31, 105–109. Militino, A.F., Ugarte, M.D., and Goicoa, T. (2007), A BLUP Synthetic Versus an EBLUP Estimator: An Empirical Study of a Small Area Estimation Problem, Journal of Applied Statistics, 34, 153–165. Militino, A.F., Ugarte, M.D., Goicoa, T., and González-Audícana, M. (2006), Using Small Area Models to Estimate the Total Area Occupied by Olive Trees, Journal of Agricultural, Biological, and Environmental Statistics, 11, 450–461.
Mohadjer, L., Rao, J.N.K., Lin, B., Krenzke, T., and Van de Kerckove, W. (2012), Hierarchical Bayes Small Area Estimates of Adult Literacy Using Unmatched Sampling and Linking Models, Journal of the Indian Society of Agricultural Statistics, 66, 55–63. Molina, I. and Marhuenda, Y. (2013), sae: Small Area Estimation, R package version 1.0-2. Molina, I. and Marhuenda, Y. (2015), sae: An R Package for Small Area Estimation, R Journal, in print. Molina, I. and Morales, D. (2009), Small Area Estimation of Poverty Indicators, Boletín de Estadística e Investigación Operativa, 25, 218–225. Molina, I., Nandram, B., and Rao, J.N.K. (2014), Small Area Estimation of General Parameters with Application to Poverty Indicators: A Hierarchical Bayes Approach, Annals of Applied Statistics, 8, 852–885. Molina, I. and Rao, J.N.K. (2010), Small Area Estimation of Poverty Indicators, Canadian Journal of Statistics, 38, 369–385. Molina, I., Rao, J.N.K., and Datta, G.S. (2015), Small Area Estimation Under a Fay-Herriot Model with Preliminary Testing for the Presence of Random Effects, Survey Methodology, in print. Molina, I., Rao, J.N.K., and Hidiroglou, M.A. (2008), SPREE Techniques for Small Area Estimation of Cross-classifications, Unpublished manuscript. Molina, I., Saei, A., and Lombardía, M.J. (2007), Small Area Estimates of Labour Force Participation Under a Multinomial Logit Mixed Model, Journal of the Royal Statistical Society, Series A, 170, 975–1000. Molina, I., Salvati, N., and Pratesi, M. (2008), Bootstrap for Estimating the MSE of the Spatial EBLUP, Computational Statistics, 24, 441–458. Morris, C.N. (1983a), Parametric Empirical Bayes Intervals, in G.E.P. Box, T. Leonard, and C.F.J. Wu (Eds.), Scientific Inference, Data Analysis, and Robustness, New York: Academic Press, pp. 25–50. Morris, C.N. (1983b), Parametric Empirical Bayes Inference: Theory and Applications, Journal of the American Statistical Association, 78, 47–54. Morris, C.N. (1983c), Natural Exponential Families with Quadratic Variance Functions: Statistical Theory, Annals of Statistics, 11, 515–529. Morris, C. and Tang, R. (2011), Estimating Random Effects via Adjustment for Density Maximization, Statistical Science, 26, 271–287. Moura, F.A.S. and Holt, D. (1999), Small Area Estimation Using Multilevel Models, Survey Methodology, 25, 73–80. Mukhopadhyay, P. (1998), Small Area Estimation in Survey Sampling, New Delhi: Narosa Publishing House. Mukhopadhyay, P. and McDowell, A. (2011), Small Area Estimation for Survey Data Analysis Using SAS Software, Paper 336-2011, Cary, NC: SAS Institute. Müller, S., Scealy, J.L., and Welsh, A.H. (2013), Model Selection in Linear Mixed Models, Statistical Science, 28, 135–167. Nandram, B. and Choi, J.W. (2002), Hierarchical Bayes Nonresponse Models for Binary Data from Small Areas with Uncertainty About Ignorability, Journal of the American Statistical Association, 97, 381–388. Nandram, B. and Choi, J.W. (2010), A Bayesian Analysis of Body Mass Index Data from Small Domains Under Nonignorable Nonresponse and Selection, Journal of the American Statistical Association, 105, 120–135.
Nandram, B., Sedransk, J., and Pickle, L. (1999), Bayesian Analysis of Mortality Rates for U.S. Health Service Areas, Sankhyā, Series B, 61, 145–165. Nandram, B., Sedransk, J., and Pickle, L. (2000), Bayesian Analysis and Mapping of Mortality Rates for Chronic Obstructive Pulmonary Disease, Journal of the American Statistical Association, 95, 1110–1118. Natarajan, R. and Kass, R.E. (2000), Reference Bayesian Methods for Generalized Linear Mixed Models, Journal of the American Statistical Association, 95, 227–237. Natarajan, R. and McCulloch, C.E. (1995), A Note on the Existence of the Posterior Distribution for a Class of Mixed Models for Binomial Responses, Biometrika, 82, 639–643. National Center for Health Statistics (1968), Synthetic State Estimates of Disability, P.H.S. Publications 1759, Washington, DC: U.S. Government Printing Office. National Institute on Drug Abuse (1979), Synthetic Estimates for Small Areas, Research Monograph 24, Washington, DC: U.S. Government Printing Office. National Research Council (2000), Small-Area Estimates of School-Age Children in Poverty: Evaluation of Current Methodology, in C.F. Citro and G. Kalton (Eds.), Committee on National Statistics, Washington, DC: National Academy Press. Opsomer, J.D., Claeskens, G., Ranalli, M.G., Kauermann, G., and Breidt, F.J. (2008), Nonparametric Small Area Estimation Using Penalized Spline Regression, Journal of the Royal Statistical Society, Series B, 70, 265–286. Patterson, H.D. and Thompson, R. (1971), Recovery of Inter-Block Information When Block Sizes are Unequal, Biometrika, 58, 545–554. Peixoto, J.L. and Harville, D.A. (1986), Comparisons of Alternative Predictors Under the Balanced One-Way Random Model, Journal of the American Statistical Association, 81, 431–436. Petrucci, A. and Salvati, N. (2006), Small Area Estimation for Spatial Correlation in Watershed Erosion Assessment, Journal of Agricultural, Biological and Environmental Statistics, 11, 169–182. Pfeffermann, D. (2002), Small Area Estimation – New Developments and Directions, International Statistical Review, 70, 125–143. Pfeffermann, D. (2013), New Important Developments in Small Area Estimation, Statistical Science, 28, 40–68. Pfeffermann, D. and Barnard, C. (1991), Some New Estimators for Small Area Means with Applications to the Assessment of Farmland Values, Journal of Business and Economic Statistics, 9, 73–84. Pfeffermann, D. and Burck, L. (1990), Robust Small Area Estimation Combining Time Series and Cross-Sectional Data, Survey Methodology, 16, 217–237. Pfeffermann, D. and Correa, S. (2012), Empirical Bootstrap Bias Correction and Estimation of Prediction Mean Square Error in Small Area Estimation, Biometrika, 99, 457–472. Pfeffermann, D., Feder, M., and Signorelli, D. (1998), Estimation of Autocorrelations of Survey Errors with Application to Trend Estimation in Small Areas, Journal of Business and Economic Statistics, 16, 339–348. Pfeffermann, D., Sikov, A., and Tiller, R. (2014), Single- and Two-Stage Cross-Sectional and Time Series Benchmarking, Test, 23, 631–666. Pfeffermann, D. and Sverchkov, M. (2007), Small-Area Estimation Under Informative Probability Sampling of Areas and Within Selected Areas, Journal of the American Statistical Association, 102, 1427–1439.
Pfeffermann, D., Terryn, B., and Moura, F.A.S. (2008), Small Area Estimation Under a Two-Part Random Effects Model with Application to Estimation of Literacy in Developing Countries, Survey Methodology, 34, 235–249. Pfeffermann, D. and Tiller, R.B. (2005), Bootstrap Approximation to Prediction MSE for State-Space Models with Estimated Parameters, Journal of Time Series Analysis, 26, 893–916. Pfeffermann, D. and Tiller, R.B. (2006), Small Area Estimation with State-Space Models Subject to Benchmark Constraints, Journal of the American Statistical Association, 101, 1387–1397. Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., and R Core Team (2014), nlme: Linear and Nonlinear Mixed Effects Models, R package version 3.1-118. Platek, R., Rao, J.N.K., Särndal, C.-E., and Singh, M.P. (Eds.) (1987), Small Area Statistics, New York: John Wiley & Sons, Inc. Platek, R. and Singh, M.P. (Eds.) (1986), Small Area Statistics: Contributed Papers, Ottawa, Canada: Laboratory for Research in Statistics and Probability, Carleton University. Porter, A.T., Holan, S.H., Wikle, C.K., and Cressie, N. (2014), Spatial Fay-Herriot Model for Small Area Estimation with Functional Covariates, Spatial Statistics, 10, 27–42. Portnoy, S.L. (1982), Maximizing the Probability of Correctly Ordering Random Variables Using Linear Predictors, Journal of Multivariate Analysis, 12, 256–269. Prasad, N.G.N. and Rao, J.N.K. (1990), The Estimation of the Mean Squared Error of Small-Area Estimators, Journal of the American Statistical Association, 85, 163–171. Prasad, N.G.N. and Rao, J.N.K. (1999), On Robust Small Area Estimation Using a Simple Random Effects Model, Survey Methodology, 25, 67–72. Pratesi, M. and Salvati, N. (2008), Small Area Estimation: The EBLUP Estimator Based on Spatially Correlated Random Area Effects, Statistical Methods and Applications, 17, 113–141. Purcell, N.J. and Kish, L. (1979), Estimation for Small Domains, Biometrics, 35, 365–384. Purcell, N.J. and Kish, L. (1980), Postcensal Estimates for Local Areas (or Domains), International Statistical Review, 48, 3–18. Purcell, N.J. and Linacre, S. (1976), Techniques for the Estimation of Small Area Characteristics, Unpublished Paper, Canberra: Australian Bureau of Statistics. Raghunathan, T.E. (1993), A Quasi-Empirical Bayes Method for Small Area Estimation, Journal of the American Statistical Association, 88, 1444–1448. Randrianasolo, T. and Tillé, Y. (2013), Small Area Estimation by Splitting the Sampling Weights, Electronic Journal of Statistics, 7, 1835–1855. Rao, C.R. (1971), Estimation of Variance and Covariance Components – MINQUE Theory, Journal of Multivariate Analysis, 1, 257–275. Rao, C.R. (1976), Simultaneous Estimation of Parameters – A Compound Decision Problem, in S.S. Gupta and D.S. Moore (Eds.), Statistical Decision Theory and Related Topics II, New York: Academic Press, pp. 327–350. Rao, J.N.K. (1965), On Two Simple Schemes of Unequal Probability Sampling Without Replacement, Journal of the Indian Statistical Association, 3, 173–180. Rao, J.N.K. (1974), Private Communication to T.W. Anderson.
Rao, J.N.K. (1979), On Deriving Mean Square Errors and Their Non-Negative Unbiased Estimators in Finite Population Sampling, Journal of the Indian Statistical Association, 17, 125–136. Rao, J.N.K. (1985), Conditional Inference in Survey Sampling, Survey Methodology, 11, 15–31. Rao, J.N.K. (1986), Synthetic Estimators, SPREE and Best Model Based Predictors, in Proceedings of the Conference on Survey Research Methods in Agriculture, U.S. Department of Agriculture, Washington, DC, pp. 1–16. Rao, J.N.K. (1992), Estimating Totals and Distribution Functions Using Auxiliary Information at the Estimation Stage, in Proceedings of the Workshop on Uses of Auxiliary Information in Surveys, Statistics Sweden. Rao, J.N.K. (1994), Estimating Totals and Distribution Functions Using Auxiliary Information at the Estimation Stage, Journal of Official Statistics, 10, 153–165. Rao, J.N.K. (1999), Some Recent Advances in Model-based Small Area Estimation, Survey Methodology, 25, 175–186. Rao, J.N.K. (2001a), EB and EBLUP in Small Area Estimation, in S.E. Ahmed and N. Reid (Eds.), Empirical Bayes and Likelihood Inference, Lecture Notes in Statistics, Volume 148, New York: Springer-Verlag, pp. 33–43. Rao, J.N.K. (2001b), Small Area Estimation with Applications to Agriculture, in Proceedings of the Second Conference on Agricultural and Environmental Statistical Applications, Rome, Italy: ISTAT. Rao, J.N.K. (2003a), Small Area Estimation, Hoboken, NJ: John Wiley & Sons, Inc. Rao, J.N.K. (2003b), Some New Developments in Small Area Estimation, Journal of the Iranian Statistical Society, 2, 145–169. Rao, J.N.K. (2005), Inferential Issues in Small Area Estimation: Some New Developments, Statistics in Transition, 7, 523–526. Rao, J.N.K. (2008), Some Methods for Small Area Estimation, Rivista Internazionale di Scienze Sociali, 4, 387–406. Rao, J.N.K. and Choudhry, G.H. (1995), Small Area Estimation: Overview and Empirical Study, in B.G. Cox, D.A. Binder, B.N. Chinnappa, A. Christianson, M.J. Colledge, and P.S. Kott (Eds.), Business Survey Methods, New York: John Wiley & Sons, Inc., pp. 527–542. Rao, J.N.K. and Molina, I. (2015), Empirical Bayes and Hierarchical Bayes Estimation of Poverty Measures for Small Areas, in M. Pratesi (Ed.), Analysis of Poverty Data by Small Area Methods, Hoboken, NJ: John Wiley & Sons, Inc., in print. Rao, J.N.K. and Scott, A.J. (1981), The Analysis of Categorical Data from Complex Sample Surveys: Chi-squared Tests for Goodness of Fit and Independence in Two-Way Tables, Journal of the American Statistical Association, 76, 221–230. Rao, J.N.K. and Singh, A.C. (2009), Range Restricted Weight Calibration for Survey Data Using Ridge Regression, Pakistan Journal of Statistics, 25, 371–384. Rao, J.N.K., Sinha, S.K., and Dumitrescu, L. (2014), Robust Small Area Estimation Under Semi-Parametric Mixed Models, Canadian Journal of Statistics, 42, 126–141. Rao, J.N.K., Wu, C.F.J., and Yue, K. (1992), Some Recent Work on Resampling Methods for Complex Surveys, Survey Methodology, 18, 209–217. Rao, J.N.K. and Yu, M. (1992), Small Area Estimation by Combining Time Series and Cross-Sectional Data, in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 1–9.
Rao, J.N.K. and Yu, M. (1994), Small Area Estimation by Combining Time Series and Cross-Sectional Data, Canadian Journal of Statistics, 22, 511–528. Rao, P.S.R.S. (1997), Variance Component Estimation, London: Chapman and Hall. R Core Team (2013), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing, http://www.R-project.org/. Richardson, A.M. and Welsh, A.H. (1995), Robust Estimation in the Mixed Linear Model, Biometrics, 51, 1429–1439. Ritter, C. and Tanner, M.A. (1992), The Gibbs Stopper and the Griddy-Gibbs Sampler, Journal of the American Statistical Association, 87, 861–868. Rivest, L.-P. (1995), A Composite Estimator for Provincial Undercoverage in the Canadian Census, in Proceedings of the Survey Methods Section, Statistical Society of Canada, pp. 33–38. Rivest, L.-P. and Belmonte, E. (2000), A Conditional Mean Squared Error of Small Area Estimators, Survey Methodology, 26, 67–78. Rivest, L.-P. and Vandal, N. (2003), Mean Squared Error Estimation for Small Areas when the Small Area Variances are Estimated, in Proceedings of the International Conference on Recent Advances in Survey Sampling, Technical Report No. 386, Ottawa, Canada: Laboratory for Research in Statistics and Probability, Carleton University. Roberts, G.O. and Rosenthal, J.S. (1998), Markov-Chain Monte Carlo: Some Practical Implications of Theoretical Results, Canadian Journal of Statistics, 26, 5–31. Robinson, G.K. (1991), That BLUP Is a Good Thing: The Estimation of Random Effects, Statistical Science, 6, 15–31. Robinson, J. (1987), Conditioning Ratio Estimates Under Simple Random Sampling, Journal of the American Statistical Association, 82, 826–831. Royall, R.M. (1970), On Finite Population Sampling Theory Under Certain Linear Regression, Biometrika, 57, 377–387. Royall, R.M. (1976), The Linear Least-Squares Prediction Approach to Two-stage Sampling, Journal of the American Statistical Association, 71, 657–664. Rubin-Bleuer, S. and You, Y. (2013), A Positive Variance Estimator for the Fay-Herriot Small Area Model, SRID2-12-OOIE, Statistics Canada. Rueda, C., Menéndez, J.A., and Gómez, F. (2010), Small Area Estimators Based on Restricted Mixed Models, Test, 19, 558–579. Ruppert, D., Wand, M.P., and Carroll, R.J. (2003), Semiparametric Regression, New York: Cambridge University Press. Rust, K.F. and Rao, J.N.K. (1996), Variance Estimation for Complex Surveys Using Replication Techniques, Statistical Methods in Medical Research, 5, 283–310. Saei, A. and Chambers, R. (2003), Small Area Estimation Under Linear and Generalized Linear Mixed Models with Time and Area Effects, Working Paper M03/15, Southampton: Southampton Statistical Sciences Research Institute, University of Southampton. Saleh, A.K.Md.E. (2006), Theory of Preliminary Test and Stein-Type Estimation with Applications, New York: John Wiley & Sons, Inc. Salibian-Barrera, M. and Van Aelst, S. (2008), Robust Model Selection Using Fast and Robust Bootstrap, Computational Statistics and Data Analysis, 52, 5121–5135. Sampford, M.R. (1967), On Sampling Without Replacement with Unequal Probabilities of Selection, Biometrika, 54, 499–513.
Särndal, C.E. and Hidiroglou, M.A. (1989), Small Domain Estimation: A Conditional Analysis, Journal of the American Statistical Association, 84, 266–275. Särndal, C.E., Swensson, B., and Wretman, J.H. (1989), The Weighted Regression Technique for Estimating the Variance of Generalized Regression Estimator, Biometrika, 76, 527–537. Särndal, C.E., Swensson, B., and Wretman, J.H. (1992), Model Assisted Survey Sampling, New York: Springer-Verlag. SAS Institute Inc. (2013), SAS/STAT 13.1 User’s Guide, Cary, NC: SAS Institute. Schaible, W.L. (1978), Choosing Weights for Composite Estimators for Small Area Statistics, Proceedings of the Section on Survey Research Methods, Washington, DC: American Statistical Association, pp. 741–746. Schaible, W.L. (Ed.) (1996), Indirect Estimation in U.S. Federal Programs, New York: Springer-Verlag. Schall, R. (1991), Estimation in Generalized Linear Models with Random Effects, Biometrika, 78, 719–727. Schirm, A.L. and Zaslavsky, A.M. (1997), Reweighting Households to Develop Microsimulation Estimates for States, Proceedings of the 1997 Section on Survey Research Methods, American Statistical Association, pp. 306–311. Schmid, T. and Münnich, R.T. (2014), Spatial Robust Small Area Estimation, Statistical Papers, 55, 653–670. Schoch, T. (2012), Robust Unit Level Small Area Estimation: A Fast Algorithm for Large Data Sets, Austrian Journal of Statistics, 41, 243–265. Searle, S.R., Casella, G., and McCulloch, C.E. (1992), Variance Components, New York: John Wiley & Sons, Inc. Shao, J. and Tu, D. (1995), The Jackknife and Bootstrap, New York: Springer-Verlag. Shapiro, S.S. and Wilk, M.B. (1965), An Analysis of Variance Test for Normality (Complete Samples), Biometrika, 52, 591–611. Shen, W. and Louis, T.A. (1998), Triple-Goal Estimates in Two-Stage Hierarchical Models, Journal of the Royal Statistical Society, Series B, 60, 455–471. Singh, A.C. and Folsom, R.E. (2001), Benchmarking of Small Area Estimators in a Bayesian Framework, Unpublished manuscript. Singh, A.C., Mantel, H.J., and Thomas, B.W. (1994), Time Series EBLUPs for Small Areas Using Survey Data, Survey Methodology, 20, 33–43. Singh, A.C. and Mian, I.U.H. (1995), Generalized Sample Size Dependent Estimators for Small Areas, Proceedings of the 1995 Annual Research Conference, U.S. Bureau of the Census, Washington, DC, pp. 687–701. Singh, A.C., Stukel, D.M., and Pfeffermann, D. (1998), Bayesian Versus Frequentist Measures of Error in Small Area Estimation, Journal of the Royal Statistical Society, Series B, 60, 377–396. Singh, B., Shukla, G., and Kundu, D. (2005), Spatio-Temporal Models in Small Area Estimation, Survey Methodology, 31, 183–195. Singh, M.P., Gambino, J., and Mantel, H.J. (1994), Issues and Strategies for Small Area Data, Survey Methodology, 20, 3–22. Singh, M.P. and Tessier, R. (1976), Some Estimators for Domain Totals, Journal of the American Statistical Association, 71, 322–325.
Singh, R. and Goel, R.C. (2000), Use of Remote Sensing Satellite Data in Crop Surveys, Technical Report, Indian Agricultural Statistics Research Institute, New Delhi, India. Singh, R., Semwal, D.P., Rai, A., and Chhikara, R.S. (2002), Small Area Estimation of Crop Yield Using Remote Sensing Satellite Data, International Journal of Remote Sensing, 23, 49–56. Singh, T., Wang, S., and Carroll, R.J. (2015a), Efficient Small Area Estimation When Covariates are Measured with Error Using Simulation Extrapolation, Unpublished Technical Report. Singh, T., Wang, S., and Carroll, R.J. (2015b), Efficient Corrected Score Estimators for the Fay-Herriot Model with Measurement Error in Covariates, Unpublished Technical Report. Sinha, S.K. and Rao, J.N.K. (2009), Robust Small Area Estimation, Canadian Journal of Statistics, 37, 381–399. Sinharay, S. and Stern, H. (2002), On the Sensitivity of Bayes Factors to the Prior Distributions, American Statistician, 56, 196–201. Skinner, C.J. (1994), Sampling Models and Weights, Proceedings of the Section on Survey Research Methods, Washington, DC: American Statistical Association, pp. 133–142. Skinner, C.J. and Rao, J.N.K. (1996), Estimation in Dual Frame Surveys with Complex Designs, Journal of the American Statistical Association, 91, 349–356. Slud, E.V. and Maiti, T. (2006), Mean-squared Error Estimation in Transformed Fay-Herriot Models, Journal of the Royal Statistical Society, Series B, 68, 239–257. Smith, T.M.F. (1983), On the Validity of Inferences from Non-Random Samples, Journal of the Royal Statistical Society, Series A, 146, 394–403. Spiegelhalter, D.J., Best, N.G., Gilks, W.R., and Inskip, H (1996), Hepatitis B: A Case Study in MCMC Methods, in W.R. Gilks, S. Richardson, and D.J. Spiegelhalter (Eds.), Markov Chain Monte Carlo in Practice, London: Chapman and Hall, pp. 21–43. Spiegelhalter, D.J., Thomas, A., Best, N.G., and Gilks, W.R. (1997), BUGS: Bayesian Inference Using Gibbs Sampling, Version 6.0, Cambridge: Medical Research Council Biostatistics Unit. Spjøtvoll, E. and Thomsen, I. (1987), Application of Some Empirical Bayes Methods to Small Area Statistics, Bulletin of the International Statistical Institute, 2, 435–449. Srivastava, M.S. and Bilodeau, M. (1989), Stein Estimation Under Elliptical Distributions, Journal of Multivariate Analysis, 28, 247–259. Stasny, E., Goel, P.K., and Rumsey, D.J. (1991), County Estimates of Wheat Production, Survey Methodology, 17, 211–225. Stefan, M. (2005), Contributions à l’Estimation pour Petits Domains, Ph.D. Thesis, Université Libre de Bruxelles. Stein, C. (1981), Estimation of the Mean of a Multivariate Normal Distribution, Annals of Statistics, 9, 1135–1151. Steorts, R. and Ghosh, M. (2013), On Estimation of Mean Squared Error of Benchmarked Empirical Bayes Estimators, Statistica Sinica, 23, 749–767. Stukel, D.M. (1991), Small Area Estimation Under One and Two-fold Nested Error Regression Models, Unpublished Ph.D. Thesis, Carleton University, Ottawa, Canada. Stukel, D.M. and Rao, J.N.K. (1997), Estimation of Regression Models with Nested Error Structure and Unequal Error Variances Under Two and Three Stage Cluster Sampling, Statistics & Probability Letters, 35, 401–407.
Stukel, D.M. and Rao, J.N.K. (1999), Small-Area Estimation Under Two-Fold Nested Error Regression Models, Journal of Statistical Planning and Inference, 78, 131–147. Sutradhar, B.C. (2004), On Exact Quasi-Likelihood Inference in Generalized Linear Mixed Models, Sankhyā, 66, 263–291. Swamy, P.A.V.B. (1971), Statistical Inference in Random Coefficient Regression Models, Berlin: Springer-Verlag. Thompson, M.E. (1997), Theory of Sample Surveys, London: Chapman and Hall. Tillé, Y. (2006), Sampling Algorithms, Springer Series in Statistics, New York: Springer-Verlag. Tiller, R.B. (1992), Time Series Modeling of Sample Survey Data from the U.S. Current Population Survey, Journal of Official Statistics, 8, 149–166. Torabi, M., Datta, G.S., and Rao, J.N.K. (2009), Empirical Bayes Estimation of Small Area Means Under a Nested Error Linear Regression Model with Measurement Errors in the Covariates, Scandinavian Journal of Statistics, 36, 355–368. Torabi, M. and Rao, J.N.K. (2008), Small Area Estimation under a Two-Level Model, Survey Methodology, 34, 11–17. Torabi, M. and Rao, J.N.K. (2010), Mean Squared Error Estimators of Small Area Means Using Survey Weights, Canadian Journal of Statistics, 38, 598–608. Torabi, M. and Rao, J.N.K. (2014), On Small Area Estimation under a Sub-Area Model, Journal of Multivariate Analysis, 127, 36–55. Torabi, M. and Shokoohi, F. (2015), Non-Parametric Generalized Linear Mixed Models in Small Area Estimation, Canadian Journal of Statistics, 43, 82–96. Tsutakawa, R.K., Shoop, G.L., and Marienfield, C.J. (1985), Empirical Bayes Estimation of Cancer Mortality Rates, Statistics in Medicine, 4, 201–212. Tzavidis, N., Marchetti, S., and Chambers, R. (2010), Robust Estimation of Small-Area Means and Quantiles, Australian and New Zealand Journal of Statistics, 52, 167–186. Ugarte, M.D., Goicoa, T., Militino, A.F., and Durban, M. (2009), Spline Smoothing in Small Area Trend Estimation and Forecasting, Computational Statistics and Data Analysis, 53, 3616–3629. Vaida, F. and Blanchard, S. (2005), Conditional Akaike Information for Mixed-Effect Models, Biometrika, 92, 351–370. Valliant, R., Dorfman, A.H., and Royall, R.M. (2001), Finite Population Sampling and Inference: A Prediction Approach, New York: John Wiley & Sons, Inc. Verret, F., Rao, J.N.K., and Hidiroglou, M.A. (2014), Model-Based Small Area Estimation Under Informative Sampling, Survey Methodology, in print. Wang, J. (2000), Topics in Small Area Estimation with Applications to the National Resources Inventory, Unpublished Ph.D. Thesis, Iowa State University, Ames. Wang, J. and Fuller, W.A. (2003), The Mean Squared Error of Small Area Predictors Constructed with Estimated Area Variances, Journal of the American Statistical Association, 98, 718–723. Wang, J., Fuller, W.A., and Qu, Y. (2008), Small Area Estimation Under Restriction, Survey Methodology, 34, 29–36. Wolter, K.M. (2007), Introduction to Variance Estimation (2nd ed.), New York: Springer-Verlag.
Woodruff, R.S. (1966), Use of a Regression Technique to Produce Area Breakdowns of the Monthly National Estimates of Retail Trade, Journal of the American Statistical Association, 61, 496–504. Wu, C.F.J. (1986), Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis (with discussion), Annals of Statistics, 14, 1261–1350. Ybarra, L.M.R. and Lohr, S.L. (2008), Small Area Estimation When Auxiliary Information is Measured with Error, Biometrika, 95, 919–931. Yoshimori, M. and Lahiri, P. (2014a), A New Adjusted Maximum Likelihood Method for the Fay-Herriot Small Area Model, Journal of Multivariate Analysis, 124, 281–294. Yoshimori, M. and Lahiri, P. (2014b), Supplementary Material to Yoshimori, M. and Lahiri, P. (2014a), Unpublished note. Yoshimori, M. and Lahiri, P. (2014c), A Second-Order Efficient Empirical Bayes Confidence Interval, Annals of Statistics, 42, 1233–1261. You, Y. (1999), Hierarchical Bayes and Related Methods for Model-Based Small Area Estimation, Unpublished Ph.D. Thesis, Carleton University, Ottawa, Canada. You, Y. (2008), An Integrated Modeling Approach to Unemployment Rate Estimation for Sub-Provincial Areas of Canada, Survey Methodology, 34, 19–27. You, Y. and Chapman, B. (2006), Small Area Estimation Using Area Level Models and Estimated Sampling Variances, Survey Methodology, 32, 97–103. You, Y. and Rao, J.N.K. (2000), Hierarchical Bayes Estimation of Small Area Means Using Multi-Level Models, Survey Methodology, 26, 173–181. You, Y. and Rao, J.N.K. (2002a), A Pseudo-Empirical Best Linear Unbiased Prediction Approach to Small Area Estimation Using Survey Weights, Canadian Journal of Statistics, 30, 431–439. You, Y. and Rao, J.N.K. (2002b), Small Area Estimation Using Unmatched Sampling and Linking Models, Canadian Journal of Statistics, 30, 3–15. You, Y. and Rao, J.N.K. (2003), Pseudo Hierarchical Bayes Small Area Estimation Combining Unit Level Models and Survey Weights, Journal of Statistical Planning and Inference, 111, 197–208. You, Y., Rao, J.N.K., and Hidiroglou, M.A. (2013), On the Performance of Self-Benchmarked Small Area Estimators Under the Fay-Herriot Area Level Model, Survey Methodology, 39, 217–229. You, Y., Rao, J.N.K., and Gambino, J. (2003), Model-Based Unemployment Rate Estimation for the Canadian Labour Force Survey: A Hierarchical Bayes Approach, Survey Methodology, 29, 25–32. You, Y. and Reiss, P. (1999), Hierarchical Bayes Estimation of Response Rates for an Expenditure Survey, Proceedings of the Survey Methods Section, Statistical Society of Canada, pp. 123–128. You, Y. and Zhou, Q.M. (2011), Hierarchical Bayes Small Area Estimation Under a Spatial Model with Application to Health Survey Data, Survey Methodology, 37, 25–37. Zaslavsky, A.M. (1993), Combining Census, Dual-System, and Evaluation Study Data to Estimate Population Shares, Journal of the American Statistical Association, 88, 1092–1105. Zeger, S.L. and Karim, M.R. (1991), Generalized Linear Models with Random Effects: A Gibbs Sampling Approach, Journal of the American Statistical Association, 86, 79–86. Zewotir, T. and Galpin, J.S. (2007), A Unified Approach on Residuals, Leverages and Outliers in the Linear Mixed Model, Test, 16, 58–75.
Zhang, D. and Davidian, M. (2001), Linear Mixed Models with Flexible Distributions of Random Effects for Longitudinal Data, Biometrics, 57, 795–802. Zhang, L.-C. (2007), Finite Population Small Area Estimation, Journal of Official Statistics, 23, 223–237. Zhang, L.-C. and Chambers, R.L. (2004), Small Area Estimates for Cross-Classifications, Journal of the Royal Statistical Society, Series B, 66, 479–496.
INDEX
Adaptive rejection sampling, 336, 339, 389, 399 Adult literacy proportions, 359 Aggregate level model, see area level model Akaike information criterion (AIC) conditional, 112–13, 192 marginal, 111, 113, 118–19, 170, 265, 328 American Community Survey (ACS), 8, 26, 132–3, 159 Area level model, 4, 8, 304, 316 basic, 76–8, 112, 118–19, 123–71, 270–88, 299, 301, 336, 341, 347–55, 359, 373, 374, 401 multivariate, 81–2, 89, 235–37, 262–264 spatial, 86–88, 248–51, 264–8, 355–6 time series and cross-sectional, 83–86, 240–3, 377–81 with correlated sampling errors, 82–83, 237–40 Area occupied by olive trees, 192 BLUP, see best linear unbiased prediction estimator BUGS, 334, 339–40, 351, 360–1, 364–5, 375–6, 388–9, 401 Balanced repeated replication, 40 Balanced sampling, 31 Batting averages, 63, 66–8
Bayes factor, 342, 343, 345 Bayes theorem, 330, 333–4 Bayesian information criterion (BIC), 111–13, 118–19, 170, 265, 328 Benchmarking, 159–65, 204, 207, 208, 210, 366 difference, 160–1 “optimal”, 161–3 ratio, 42, 160 self-, 164–5 two-stage, 130, 163 Best estimator, see best prediction estimator Best linear unbiased estimator, 98, 124 Best linear unbiased prediction estimator, 98, 124 derivation of, 119–20 empirical, see empirical BLUP for area level model, 124–6 for unit level model, 174–7 survey weighted, 143 Best prediction (BP) estimator, 5, 99, 144, 212, 271 Best predictive estimator (BPE), 166, 220 Beta-binomial model, 303, 309, 389, 390, 397 Big data covariates, 159 Block diagonal covariance structure, 108–11, 114–18, 287–8
Bootstrap, parametric bootstrap, 114, 119, 141–2, 148, 168, 183–6, 198–9, 204, 210, 213, 215, 220, 224, 226–30, 237, 246–7, 250, 262, 264–7, 276, 281–5, 289, 294–8, 304, 306–7, 310, 325–6, 329, 369 Burn-in period, 341 CODA, 334, 339, 364–5, 375 Calibration estimator, see also generalized regression estimator, 13–4, 22, 61 Census undercoverage, Canada, 78, 134–5, 151, 358–9 Census undercoverage, U.S., 78, 83, 239–40 Clustering, 24 Coefficient of variation, 11 Community Health Survey, Canada, 25, 356 Composite estimator, 57–64 Conditional MSE, 136, 144–5, 149, 167, 179–81, 199, 273, 288, 302, 319, 321 Conditional autoregression spatial (CAR) model, 86–7, 94, 248–9 Conditional predictive ordinate, 346 Constrained HB estimation, 399–400 Constrained empirical Bayes, 316–18 Constrained linear Bayes, 324–5 Consumer expenditure, 250 Cook’s distance, 116–17 Correlated sampling errors, see area level models County crop, 7 areas, 80, 89, 117, 186–9, 209–10, 228–31, 363, 365 production, 41 Covariates subject to sampling error, 156–8 Credible interval, 6, 339, 359, 361–2, 370, 372, 389, 401 Cross-validation predictive density, 345 Current Population Survey (CPS), U.S., 8, 10, 26, 59, 78, 82–6, 88, 132, 134, 143, 237, 242–3, 247, 315, 350–1, 380–2 Danish Health and Morbidity Survey, 25 Data cloning, 400–1 Demographic methods, 3, 45 Department of Education, U.S., 8, 78, 131 Department of Health and Human Services, U.S., 82 Design issues, 10, 23 Design-based approach, 10, 11 Design-consistent estimator, 11 Design-unbiased estimator, 11 Diagnostics case-deletion, 117–18 influence, 116–17 Diagnostics, 114–17, 176, 186–7
Direct estimators, 1 Disease mapping, 7, 308–13, 383–8 empirical Bayes, 308–13 hierarchical Bayes, 383–8 Dual-frame survey, 25 EB, see empirical Bayes estimation EBLUP, see empirical best linear unbiased prediction estimator ELL estimation, 295–6 EM algorithm, 103, 118, 305, 307, 311, 323 Effective sample size, 24, 25, 26, 113 Empirical Bayes estimation, 5–8, 269–331 area level models, 270–87 binary data, 298–308 confidence intervals, 281–7 design-weighted, 313–15 disease mapping, 308–13 general finite population parameters, 289–98 linear mixed models, 287–9 Empirical best estimation, see empirical Bayes estimation Empirical best linear unbiased prediction estimator, 5, 8, 97–121, 223 for area level model, 123–6 for block diagonal covariance structure, 108–11 for unit level model, 173–7 for vectors of area proportions, 262–4 MSE estimation, 136–43, 181–6 MSE of, 136, 179–80 multivariate, 110–11 with survey weights, 206–10 Empirical constrained linear Bayes estimator, 324–5 Empirical distribution function, estimation of, 318 Empirical linear Bayes, 322–3 European Community Household Panel Survey, 25 Expansion estimator, 11 Exponential family models, 94, 313–5, 398–9 Extreme value, see outlier FGT poverty measures, 293 Fay-Herriot moment estimator, 126–9 Fence method, 113–14 Fitting-of-constants method, 177–8 Frame, 25 General finite population parameters, 289–98, 325–30, 369–74 General linear mixed model, 91, 98–111 with block diagonal covariance structure, 108–11 Generalized information criterion (GIC), 112–13 Generalized linear mixed model, 92–95, 261–2, 304–5
exponential family, 94, 313–15, 398–9 for mortality and disease rates, 93–4 logistic regression, 92–3, 261–3, 304–8, 393–7 semi-parametric, 95 Generalized linear structural model (GLSM), 49–54 Generalized regression estimator, 13–16 Gibbs sampling, 305, 334–47, 351–6, 364–5 see also Markov chain Monte Carlo methods HB, see hierarchical Bayes estimation Henderson’s estimators of variance components, 177 Heteroscedastic error variances, 205 Hierarchical Bayes estimation, 5–8, 333–403 binary data, 389–98 disease mapping models, 383–7 exponential family models, 398 general ANOVA model, 368–9 missing binary data, 397 model determination, 342 multivariate models, 381–2 pseudo-HB estimator, 366–8 time series and cross-sectional models, 377–80 two-level models, 386–8 unmatched sampling and linking models, 356–62 Highest posterior density interval, 339 Histogram estimation, 318 Horvitz-Thompson estimator, 12 Hospital admissions, 95 Housing Demand Survey, Netherlands, 25 Income for small places, 8, 77, 129–31 Indirect estimator, 2–5, 10, 35–74 Informative sampling, see sample selection bias Integration of surveys, 25 Jackknife, 158, 206, 218, 273–5, 281, 288, 301–4, 323 James-Stein estimator, 63–74 multivariate, 71 MSE estimator, 68–70 total MSE, 63–8 Kalman filter, state space models, 244–6 Labour Force Survey (Canada), 60, 83 Likelihood displacement, 117 Linear Bayes, 319–23 see also empirical linear Bayes Linear mixed model, 77, 91–2, 98–107 block diagonal covariance structure, 108–11 Linking model, 3–4, 6, 8, 77
unmatched with sampling model, 77, 118, 356–62 Lip cancer, counties in Scotland, 94, 308, 312, 385 List frame, 24–25, 248 Local influence, 186 Log-normal model, 310–12, 384–6 Logistic regression, 92–4, 304–8 Logit-normal models, 302–5 M-quantile estimation, 200–5 ML, see maximum likelihood estimation Markov Chain Monte Carlo methods, 305, 334–47, 351–6, 364–5 Gibbs sampler, 336 M-H within Gibbs, 336–7 burn-in length, 341–2 choice of prior, 339–41 convergence diagnostics, 341–2 model determination, 342–7 posterior quantities, 337–9 single run versus multiple runs, 341 Markov chain, 335–6, 338, 341–2 Matching prior, 340, 349–51 Maximum likelihood estimation, 97, 102–4, 109–10, 118, 127–9, 136–43, 169, 178, 180–1, 227, 242–3, 245–6, 249, 262, 264, 300–1, 305, 309–11, 323, 401 Mean cross product error, 144 Mean squared error, 43–5, 58, 68–70, 98, 100–1, 105–6, 121, 125, 136–46, 169 area specific estimator of, 44–5, 110, 136–8, 142, 167, 181, 220, 301–2, 310 estimator of, 43–5, 68–70, 106–7, 109–10, 108–9, 139–42, 169, 273–4, 288–9, 301–2, 310, 325 naive estimator of, 118, 178, 258, 351 conditional, 136, 144–6, 167, 199, 273, 288, 302 for non sampled areas, 139–40 for small area means, 140–1 of weighted estimator, 143 Measurement error in covariates, 156–8, 216–8 Median income of four-person households, 8, 82, 85, 237, 242, 381–2 Metropolis-Hastings algorithm, 334, 336–7, 339 Milk expenditure, 170 Minimum norm quadratic unbiased estimation (MINQUE), 104–5 Misspecified linking model, 165–9, 218–20 Model checking, see model diagnostics Model diagnostics, 114–18, 186–7 Model for multinomial counts, 93–4, 261–4 Model identification, see model diagnostics Model-based estimator, 4–5, 9, 61, 69
Modified GREG estimator, 21 Monthly Retail Trade Survey (MRTS), Canada, 28, 30
National Agricultural Statistical Service, U.S., 7 National Center for Health Statistics, U.S., 6, 36, 38, 40 National Center for Social and Economic Modeling (NATSEM), 55 National Health Interview Survey, U.S., 7, 24, 26, 36, 93, 156, 274, 395 National Health and Nutrition Examination Survey, U.S., 156, 398 National Household Survey on Drug Abuse, U.S., 25, 397 National Natality Survey, U.S., 40 Nested error regression model basic, 78–81, 113, 118–9, 173–233, 290–8, 325–30, 362–368, 370–4 multivariate, 88–9, 253–4, 382–3 semi-parametric, 95 two-fold, 89–90, 254–8 Nonnegligible sampling fraction, 182, 184 Normal curvature, 117
Observed best predictor (OBP), 165–9, 218–20 Operator notation, 12–13 Optimal design, 23 Optimal sample allocation, 26–32 Outlier robust EBLUP, 146–8, 193–200, 224–7, 250 Outlier, 114–15, 134–5, 146–8, 187, 193–9, 224–7, 229–230, 387 p-consistent, see design-consistent estimator Penalized likelihood, 99 Penalized quasi-likelihood, 262, 305 Perfect sampling, 342 Poisson-gamma model, 312–13, 383 Posterior density, 269–72, 311, 334, 400–1 propriety of, 339–340 Posterior linearity, 322–4 Posterior predictive density, 343–4, 372 Posterior predictive p-value, 344–5, 353, 358, 365, 378, 380, 387 Poststratification, 11, 39 Poverty counts, 8, 133, 315, 350 Poverty mapping, 8, 270, 326, 373 Prasad-Rao moment estimator, 177 Prediction mean squared error, 98 Preliminary test (PT), 154–6 Prior distribution choice of, 339–41 diffuse improper, 245 informative, 245
Pseudo-EBLUP estimation, 118, 206–10, 214–16, 313–5, 365–8 p-unbiased, see design-unbiased estimator, 10
Quantile-quantile plot, 134–5, 187
R package sae, 119, 169, 227, 264, 325 REBLUP, see outlier robust EBLUP REML, see restricted maximum likelihood estimation Radio listening, U.S., 36–7 Raking, two-step, 130, 133 Random error variances linear model EBLUP estimation, 205 HB estimation, 375 Random walk model, 84–6, 240–5, 377–8, 380 Ranks, Bayes estimator of, 318–19 Rao-Blackwell estimator, 338, 352–3, 365, 375–8, 381, 383, 390 Ratio adjustment, 61, 70, 163, 188–9 Ratio estimation, 14, 17–9, 21, 23, 29, 38–9, 60 Regression estimation, see also generalized regression estimator, 10, 13–22, 31–3, 37–8, 60, 260–1 Regression-adjusted synthetic estimator, 42–3 Regression-synthetic estimator, 41–2, 126, 129, 133, 139, 142, 154–6 Relative root mean squared error, defined, 40 Repeated surveys, 26 Resampling methods, 43, 61 Residual analysis, 114, 134, 346 Residuals conditional, 115 EBLUP, 116, 119, 187, 192, 229 transformed, 187, 192 Studentized, 115 Restricted iterative generalized least squares, 259 Restricted maximum likelihood estimation, 100, 102–5, 107, 109–10, 118, 127–9, 136, 139, 145–6, 164–5, 169, 178, 180–2, 227, 238, 264, 277, 280 Robustness of MSE estimator, 138 Rolling samples, 26 SAS, 28, 118, 178, 369, 395 SPREE, see Structure preserving estimation Sample size dependent estimator, 59–63 Sampling model, 77, 81, 88, 165–8, 263, 356–9, 361, 377, 387 Satellite data, 7, 80, 186, 228 Selection bias, sample, 79, 211–16 Semi-parametric model, 95 Shrinkage estimator, 64 see James-Stein estimator
Simulation extrapolation (SIMEX), 158 Simultaneously autoregression spatial (SAR) model, 87, 248–50 Small area, defined, 2 Software, 118–19, 169–71, 227–31, 264–8, 339, 340, 360 Spatial models, 86–8, 94, 248–51, 264–8, 313, 355–6 Spline model, see semi-parametric unit level model Standard error, 11 State space models, 84–5, 243–8, 250, 335 Stein’s lemma, 71–2 Stratified sampling, 24 Strictly positive estimators, 151–4 Structure preserving estimation, 45–54, 59 composite, 51–4 generalized, 49 Survey on Income and Living Conditions (SILC), 135, 243, 373 Survey regression estimator, 22, 35, 207–8, 257, 367 Synthetic estimation, 2, 6–7, 36–57 estimation of MSE, 43–45 ratio-synthetic estimator, 38 regression-adjusted synthetic estimator, 42–3 regression-synthetic estimator, 36, 41–2 with no auxiliary information, 36
Time series and cross-sectional models, 83–6, 240–3, 353, 377–81 Triple goal estimation, 6, 315–16 Turkey hunting, U.S., 390 Two-fold subarea level model, 88, 251–3 Two-level model, 90–1, 259–61, 369, 374–7 Two-part nested error model, 388–9 Undercount, see census undercoverage Unit level model basic, 78–81, 113, 118–9, 173–233, 290–8, 325–30, 362–368, 370–4 multivariate, 88–9, 253–4, 382–3 semi-parametric, 95, 220–7 two-fold, 89–90, 254–8 Unknown sampling error variances, 148–51 Variable selection, 111–14 Variance components, 97 estimation of, 102–5, 177–8, 207 Variance estimation, 12, 15, 61 Variance, design based, 4, 10 Visits to physicians, 93, 395 Wages in Nova Scotia, 20–1, 81, 189 Weight, sampling, 1, 11 Poisson model for, 55 Weight-sharing, 38, 53 Weighted normal probability plots, 116