Asymptotics in Statistics and Probability: Papers in Honor of George Gregory Roussas [Reprint 2018 ed.] 9783110942002, 9783110354744



English Pages 453 [456] Year 2000


Table of contents :
Contents
Preface
Contributors
George Gregory Roussas: Biographical Sketch
REACT Trend Estimation In Correlated Noise
Higher Order Analysis at Lebesgue Points
Regression Analysis for Multivariate Failure Time Observations
Local Estimation of a Biometric Function with Covariate Effects
The Estimation of Conditional Densities
Functional Limit Theorems for Induced Order Statistics of a Sample from a Domain of Attraction of α-Stable Law, α ∈(0,2)
Limit Laws for Kernel Density Estimators for Kernels with Unbounded Supports
Inequalities for a New Data-Based Method for Selecting Nonparametric Density Estimates
B-Fuzzy Stochastics
Detecting Jumps in Nonparametric Regression
Some Recent Results on Inference Based on Spacings
Extending Correlation and Regression from Multivariate to Functional Data
Estimation of Conditional Distribution Function and its Quantiles Involving Measurement Errors
On the Estimation of a Distribution Function from Noisy Observations in Time Series
Nonparametric Estimation of Conditional Survival Under Destructive Testing
Local Polynomial Fitting of Continuous-Time Processes From Discrete-Time Observations: Strong Consistency and Rates
A Central Limit Theorem for an Array of Strong Mixing Random Fields
The Continuous-Path Block-Bootstrap
Nonparametric Estimation of Partial Derivatives of a Multivariate Probability Density by the Method of Wavelets
Limit Theorems for Fuzzy Random Variables
Statistical Analysis of Doubly Censored Data Under Support Constraints
Limit Theorems for Exchangeable Random Elements and Random Sets
The Least Squares Method in Heteroscedastic Censored Regression Models
L1 Estimates for an Additive Regression Type Function Under Dependence
Estimating L1 Error of Kernel Estimator: Convergence Monitoring of Markov Samplers
APPENDIX: The Publications of George Gregory Roussas

Asymptotics in Statistics and Probability

George Roussas being installed as Rector (Chancellor) of the University of Patras, Greece, by the departing Rector, Professor George Maniatis

ASYMPTOTICS IN STATISTICS AND PROBABILITY PAPERS IN HONOR OF GEORGE GREGORY ROUSSAS

EDITED BY MADAN L. PURI

VSP Utrecht • Boston • Köln • Tokyo 2000

VSP BV, P.O. Box 346, 3700 AH Zeist, The Netherlands

Tel: +31 30 692 5790 Fax: +31 30 693 2081 [email protected] www.vsppub.com

© VSP BV 2000 First published in 2000 ISBN 90-6764-333-5

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

Printed in The Netherlands by Ridderprint bv, Ridderkerk.

CONTENTS Preface Contributors George Gregory Roussas: Biographical Sketch

xi xiii xvii

REACT Trend Estimation in Correlated Noise Rudolf Beran 1. Introduction 2. The Lumber-Thickness Data 3. Asymptotically Minimax Estimators 4. Proofs References

1 4 9 12 15

Higher Order Analysis at Lebesgue Points A. Berlinet and S. Levallois 1. Introduction 2. Definitions and Main Results 3. Two Examples with Infinite Derivatives 4. Example with Discontinuity of Second Kind 5. Conclusion References

17 18 23 30 31 32

Regression Analysis for Multivariate Failure Time Observations T. Cai and L. J. Wei 1. Introduction 2. Estimation of Regression Parameters 3. Simultaneous Predictions of Survival Probabilities and Related Quantities 4. Example 5. Remarks 6. Appendix: Asymptotic Joint Distribution of {/}£} with Discrete Covariates References

33 34 38 40 43 44 45

Local Estimation of a Biometric Function with Covariate Effects Zongwu Cai and Lianfen Qian 1. Introduction 2. Local Likelihood Method 3. An Exponential Regression Approach 4. Simulation Study

47 50 57 59


Appendix: Proofs References

62 69

The Estimation of Conditional Densities X. Chen, O. Linton and P. M. Robinson 1. Introduction 71 2. Kernel Conditional Density Estimates 73 3. Asymptotic Theory of Conditional Density Estimates and Bandwidth Choice 76 References 83

Functional Limit Theorems for Induced Order Statistics of a Sample from a Domain of Attraction of α-Stable Law, α ∈ (0,2) Yu. Davydov and V. Egorov 1. Introduction 2. Notation 3. Auxiliary Facts 4. Main Results 5. Lemmas 6. Proofs 7. Concluding Remarks References

85 87 88 100 103 106 113 115

Limit Laws for Kernel Density Estimators for Kernels with Unbounded Supports Paul Deheuvels 1. Introduction and Results 2. Proofs References

117 127 131

Inequalities for a New Data-Based Method for Selecting Nonparametric Density Estimates Luc Devroye, Gabor Lugosi and Frederic Udina 1. Introduction 2. The Basic Estimate 3. Standard Kernel Estimate: Riemann Kernels 4. Standard Kernel Estimates: General Kernels 5. Multiparameter Kernel Estimates - Product Kernels 6. Multiparameter Kernel Estimates - Ellipsoidal Kernels 7. The Transformed Kernel Estimate 8. Monte Carlo Simulations A. Proof of Theorem 1 References

133 134 136 138 141 143 143 147 150 153


B-Fuzzy Stochastics C. A. Drossos, G. Markakis and P. L. Theodoropoulos 1. Introduction 2. Preliminaries 3. Basic Results 4. B-Fuzzy Structures 5. B-Fuzzy Statistics References

155 156 160 161 165 169

Detecting Jumps in Nonparametric Regression Ch. Dubowik and U. Stadtmüller 1. Introduction 2. The Proposed Model 3. New Results 4. Simulations 5. Appendix References

171 172 174 177 177 183

Some Recent Results on Inference Based on Spacings Kaushik Ghosh and S. Rao Jammalamadaka 1. Introduction 2. Parameter Estimation 3. Goodness-of-fit Tests 4. Conclusions References

185 186 190 195 196

Extending Correlation and Regression from Multivariate to Functional Data G. He, H. G. Müller and J. L. Wang 1. Introduction 2. Preliminaries 3. Properties of Functional Canonical Correlation 4. The Functional Linear Regression Model References

197 198 202 206 210

Estimation of Conditional Distribution Function and its Quantiles Involving Measurement Errors D. A. Ioannides and E. Matzner-Løber

1. Introduction 2. Notations and Main Results 3. Numerical Example 4. Proofs References

211 214 216 218 221


On the Estimation of a Distribution Function from Noisy Observations in Time Series D. A. Ioannides, D. P. Papanastassiou and S. B. Fotopoulos 1. Introduction 2. Assumptions and Main Results 3. Some Auxiliary Results 4. Proofs References

223 226 227 235 240

Nonparametric Estimation of Conditional Survival Under Destructive Testing Richard A. Johnson and K. T. Wu 1. Introduction

243

2. Two Estimators for F_{Y|X>l}(·)

244

3. Some Technical Results 4. Proof of Theorem 2.1 References

246 252 257

Local Polynomial Fitting of Continuous-Time Processes From Discrete-Time Observations: Strong Consistency and Rates Elias Masry 1. Introduction 2. Preliminaries 3. Decomposition of the Estimation Error 4. Main Results 5. Derivations 6. Appendix References

259 261 264 268 276 285 287

A Central Limit Theorem for an Array of Strong Mixing Random Fields Tucker McElroy and Dimitris N. Politis 1. Introduction and Background 2. Central Limit Theorem 3. Proof of the Theorem References

289 290 292 302

The Continuous-Path Block-Bootstrap Efstathios Paparoditis and Dimitris N. Politis 1. Introduction 2. The Continuous-Path Block-Bootstrap (CBB) 3. Estimation of the Unit Root Distribution 4. Proofs References

305 307 310 311 319


Nonparametric Estimation of Partial Derivatives of a Multivariate Probability Density by the Method of Wavelets B. L. S. Prakasa Rao 1. Introduction 2. Preliminaries 3. Main Result References

321 322 323 330

Limit Theorems for Fuzzy Random Variables F. N. Proske and M. L. Puri 1. Introduction 2. Preliminaries 3. Strong Law of Large Numbers for Fuzzy Random Variables 4. Central Limit Theorem for Fuzzy Random Variables 5. Conclusion References

331 333 335 339 345 345

Statistical Analysis of Doubly Censored Data Under Support Constraints W. Stute and J. Reitze 1. Introduction and Main Results 2. Confidence Bands 3. Simulation Results 4. Proofs 5. Appendix: Some Selected Boundary Crossing Probabilities References

347 355 357 361 363 365

Limit Theorems for Exchangeable Random Elements and Random Sets Robert L. Taylor, A. N. Vidyashankar and Yinpu Chen 1. Introduction and Summary 2. Background and Preliminaries 3. Strong Laws of Large Numbers for Exchangeable Random Elements 4. Strong Laws of Large Numbers for Random Sets References

367 368 371 376 377

The Least Squares Method in Heteroscedastic Censored Regression Models Ingrid Van Keilegom and Michael G. Akritas 1. Introduction 2. Definitions and Assumptions 3. Main Results 4. Proofs References

379 382 387 389 390


L1 Estimates for an Additive Regression Type Function Under Dependence Yannis G. Yatracos 1. Introduction 2. Definitions, the Assumptions, the Tools 3. Estimates and Rates of Convergence References

393 394 396 399

Estimating L1 Error of Kernel Estimator: Convergence Monitoring of Markov Samplers Bin Yu 1. Introduction 2. Estimating the L1 Error of a Kernel Estimator 3. Using Estimated L1 Error to Monitor the Convergence of Markov Samplers 4. Examples: Bimodal Target Densities 5. Concluding Remarks 6. Acknowledgements 7. Appendix: Proofs References

401 403 408 410 415 416 416 420

APPENDIX: The Publications of George Gregory Roussas

423

PREFACE

This volume is presented as a tribute to George Gregory Roussas, who has served the statistical community with great distinction and dedication for over thirty-five years through his research, teaching, administrative skills and many other activities. These are documented elsewhere in this volume. His colleagues and friends welcome this opportunity to honor him with this collection of papers dedicated to him. The contributors were selected because of their association with George Roussas or because of their activity in areas of research where George has made contributions. The papers included here represent George's broad range of research interests.

A large number of individuals have helped make this volume possible. I am grateful to the authors for their contributions. I am also indebted to the referees for their kind assistance. My special thanks go to Yannis Yatracos for his unfailing assistance in compiling George Roussas' biographical sketch. My deepest gratitude goes to Mrs. Virginia K. Jones, who went over each manuscript with a fine-tooth comb and corresponded with the authors regarding technical details. Her enthusiastic assistance is deeply appreciated.

All of us who had a part in the preparation of this volume have done so to express our admiration and affection for George Gregory Roussas.

MADAN L. PURI
Indiana University
Bloomington, Indiana, U.S.A.

CONTRIBUTORS

AKRITAS, MICHAEL G., Department of Statistics, Penn State University, 326 Thomas Building, University Park, PA 16802, U.S.A.
BERAN, RUDOLF, Department of Statistics, University of California, Berkeley, CA 94720-3860, U.S.A.
BERLINET, ALAIN, Department of Mathematics, University of Montpellier II, place Eugène Bataillon, 34 095 Montpellier Cedex 5, France.
CAI, T., Department of Biostatistics, Harvard University, 655 Huntington Avenue, Boston, MA 02115, U.S.A.
CAI, ZONGWU, Department of Mathematics, University of North Carolina at Charlotte, Charlotte, NC 28223, U.S.A.
CHEN, X., Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom.
CHEN, YINPU, SCIREX Corporation, Bloomingdale, IL 60108, U.S.A.
DAVYDOV, YU., Laboratoire de Statistique et Probabilités, Université des Sciences et Technologies de Lille, 59655 Villeneuve d'Ascq, France.
DEHEUVELS, PAUL, L.S.T.A., Université Paris VI, Paris, France.
DEVROYE, LUC, School of Computer Science, McGill University, Montreal, Canada H3A 2A7.
DROSSOS, C. A., Department of Mathematics, University of Patras, GR 26500, Patras, Greece.
DUBOWIK, CH., Universität Ulm, Abt. Math. III, 89069 Ulm, Germany.
EGOROV, V., University of Electrical Engineering, 197376 St. Petersburg, Russia.
FOTOPOULOS, S. B., Department of Management and Decision Sciences and Program in Statistics, Washington State University, Pullman, WA 99164-4736, U.S.A.


GHOSH, KAUSHIK, Department of Statistics, George Washington University, Washington, DC 20052, U.S.A.
HE, G., Biometrics Unit, California Department of Fish and Game, 1807 13th Street, Suite 201, Sacramento, CA 95814, U.S.A.
IOANNIDES, D. A., Department of Economics, University of Macedonia, 54006 Thessaloniki, Greece.
JAMMALAMADAKA, S. RAO, Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93106, U.S.A.
JOHNSON, RICHARD A., Department of Statistics, University of Wisconsin-Madison, 1210 W. Dayton Street, Madison, WI 53706, U.S.A.
LEVALLOIS, S., Department of Mathematics, University of Montpellier II, place Eugène Bataillon, 34 095 Montpellier Cedex 5, France.
LINTON, O., Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom.
LUGOSI, GÁBOR, Department of Economics and Business, Universitat Pompeu Fabra, Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain.
MARKAKIS, G., T.E.I. of Heraklion, Stavromenos, Heraklion, GR 71500, Greece.
MASRY, ELIAS, Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA 92093-0407, U.S.A.
MATZNER-LØBER, E., Department of Statistics, University of Rennes II, 35043 Rennes Cedex, France.
McELROY, TUCKER, Department of Mathematics, University of California at San Diego, La Jolla, CA 92093-0112, U.S.A.
MÜLLER, H. G., Division of Statistics, University of California, Davis, CA 95616-8705, U.S.A.
PAPANASTASSIOU, D. P., Department of Applied Informatics, University of Macedonia, 54006 Thessaloniki, Greece.
PAPARODITIS, EFSTATHIOS, Department of Mathematics and Statistics, University of Cyprus, CY-1678 Nicosia, Cyprus.
POLITIS, DIMITRIS N., Department of Mathematics, University of California at San Diego, La Jolla, CA 92093-0112, U.S.A.


PRAKASA RAO, B. L. S., Indian Statistical Institute, 7, SJS Sansanwal Marg, New Delhi 110 016, India and Australian National University, Canberra, ACT 0200, Australia.
PROSKE, F. N., Universität Ulm, Abt. Math. III, 89069 Ulm, Germany.
PURI, MADAN L., Department of Mathematics, Indiana University, Bloomington, IN 47405, U.S.A.
QIAN, LIANFEN, Department of Mathematics, Florida Atlantic University, Boca Raton, FL 33431, U.S.A.
REITZE, J., Mathematical Institute, University of Giessen, Arndtstr. 2, D-35392 Giessen, Germany.
ROBINSON, P. M., Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom.
STADTMÜLLER, U., Universität Ulm, Abt. Math. III, 89069 Ulm, Germany.
STUTE, W., Mathematical Institute, University of Giessen, Arndtstr. 2, D-35392 Giessen, Germany.
TAYLOR, ROBERT L., Department of Statistics, University of Georgia, Athens, GA 30602-1952, U.S.A.
THEODOROPOULOS, P. L., 8 Theoklitou st., GR 22100, Tripolis, Greece.
UDINA, FREDERIC, Department of Economics and Business, Universitat Pompeu Fabra, Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain.
VAN KEILEGOM, INGRID, Department of Mathematics, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands.
VIDYASHANKAR, A. N., Department of Statistics, University of Georgia, Athens, GA 30602-1952, U.S.A.
WANG, J. L., Division of Statistics, University of California, Davis, CA 95616-8705, U.S.A.
WEI, L. J., Department of Biostatistics, Harvard University, Boston, MA 02115, U.S.A.
WU, K. T., Department of Statistics, Ming Chuan University, Guei-San, Tau-Yuen County, Taiwan.


YATRACOS, YANNIS G., Department of Statistics and Applied Probability, The National University of Singapore, Faculty of Science, 3 Science Drive 2, Singapore 117543.
YU, BIN, Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974 and University of California, Berkeley, CA 94720, U.S.A.

GEORGE GREGORY ROUSSAS: BIOGRAPHICAL SKETCH*

George Gregory Roussas was born on June 29, 1933, to a prominent family in the township of Marmara, in the heart of continental Greece. His parents were Maria and Gregory, and he is the youngest and only male child among four siblings. George's extended family consisted of educators and businessmen. His father established a prosperous business in the city of Thessaloniki, the second largest city in Greece (after Athens), located in Northern Greece, Macedonia. In the early 1920's, and throughout his life, George's father was both a millionaire and a pauper, subject to tough and unforgiving rules of life! He was also a vocal freethinker, the personification of a genuine liberal in its classical European sense. George's mother, Maria, was a strong-willed and beautiful woman who was instinctively endowed with the virtues of exhibiting deep, yet restrained, love combined with a fair dose of healthy disciplinary principles.

George's elementary education was completed in Marmara and Thessaloniki, and his secondary education (Gymnasium) in Thessaloniki and Athens. He entered the University of Athens in the Fall of 1953, and graduated with a degree in Mathematics and high honors in the Fall of 1956. The dream of his youth was to become an Air Force Officer, for which he needed both of his parents' consent. That consent was not to be given. Instead, he decided to compete for a position in the Engineering School of the Air Force Officers, where the rate of success at that time was about 2.7%. After a ten-month period of intensive study, George earned one of eight positions among the 300 candidates. Rumor had it that he was also the leader of the class. Unfortunately (or, perhaps, fortunately), due to political considerations, he was not admitted.

The early 1950's was the period of recovery after a bitter and devastating four-year war between the government forces and the communist insurrection.
As is usually the case in situations like this, there was an unabridged divide between "them" and "us" with no gray area allowed to exist. So, George's father's political ideology was entirely unsuitable for the case at hand. Thus, George became a mathematician by default.

*This biographical sketch has been compiled after the undersigned had a conversation with George. The publications of George Roussas are given in the Appendix.

George served his required military service as a private in the Greek Army for two years, improved his knowledge of the English language, and successfully passed an examination for a fellowship from the State Fellowships Foundation for postgraduate study abroad. At that time, a degree in Mathematics meant a one-way street to high school teaching, which George was not looking forward to. Fulfillment of the military obligation was a precondition for claiming a State Fellowship. After basic training, George was placed in the Army School of Telecommunications for specialization. Instead of specializing, he was put, in effect, in charge of the administration of about 30 classes of draftees, noncommissioned officers and officers, under the supervision of two career officers. It was through this experience that George discovered that he was endowed with considerable administrative abilities. Due to financial considerations, the Fellows who successfully passed the English language examination were automatically sent to the United Kingdom. It was at this point that Professor D. A. Kappos of the University of Athens effectively intervened with the leadership of the State Fellowships Foundation and persuaded them to allow George to go to the United States instead. Professor Kappos was a student of the great Carathéodory in Germany and held a faculty position there before he was elected to the University of Athens. In the core of mathematical subjects, he was the lone modern mathematician at the University of Athens. George served as his teaching assistant when he was a junior and senior, and also for about ten months between the time of his discharge from the military and his departure for the United States. At this point, honorable mention must be made of Dr.
Gerasimos Legatos, who was the Senior Lecturer in the Chair of Mathematical Analysis occupied by Professor Kappos, and who was instrumental in identifying the gifted and promising among the mathematics students.

In the Spring Semester of 1960, George joined the Department of Statistics at the University of California, Berkeley. He was immediately fascinated, as anyone would be, by the physical beauty of the UC-Berkeley campus and the immensely high academic standards of the department. George had no prior exposure to Statistics whatsoever and, as would be expected, things were tough going at times. He received a solid education there, as well as encouragement and support from individual faculty members and the department. To each and every faculty member of the department at the time, George feels a sense of gratitude. Furthermore, as is often the case, "some are more equal than others" in any given circumstance. Accordingly, special appreciation is given posthumously to Professor Edward W. Barankin, a
philosopher, classical scholar and exceptional individual, in addition to being a statistician of UC-Berkeley caliber. His untimely departure is regrettable. Professor Lucien Le Cam always had enough time and plenty of ideas to share with anyone. He was, and surely still is, the person who would answer just about all questions asked by any student. His presence is an invaluable asset to the department. Finally, Professor David Blackwell has been the rarest of blends, a great mathematical scientist and an absolutely superb human being. It is not an overstatement to say that he has been the single most remarkable individual that George had the privilege to meet in his entire long professional career.

George graduated from UC-Berkeley with a Ph.D. in Statistics in 1964, and although he had contractual obligations to return to Greece, he was not anxious to do so immediately. He carelessly did not pursue an invitation by Professor George E. P. Box of the University of Wisconsin, Madison, to submit an application there. Instead, he readily accepted a position as an Assistant Professor of Mathematics at California State University, San Jose. A year later, George responded to a renewed invitation by Professor Irwin Guttman, Acting Chair of the Department of Statistics at the University of Wisconsin, Madison. Following an interview, George was offered an appointment as an Assistant Professor of Statistics, which he accepted and commenced in the Fall of 1966. That same year, five assistant professors joined the department, of whom only one, Richard A. Johnson, is still on the faculty. In 1968, he was promoted to Associate Professor with tenure, and in 1972 to Full Professor. George's stay in Madison remains the happiest part of his professional career. He was impressed by the then superb academic climate at the university, and the natural beauty of the campus and the city of Madison.
And best of all, that is where he met his wife of twenty-nine years, Mary Louise Stewart. Mary was a graduate student in Food Administration, and a student in one of George's classes! She earned a Master's degree and was well on her way toward completing the requirements for a Ph.D. degree. They are the proud parents of three beautiful sons, Gregory (born in Madison), John (born in Bloomington, Indiana) and George-Alexander (born in Patras, Greece).

In 1970, while George was at UW-Madison, he received a summons from the Greek government to return to Greece to fulfil his contractual obligations. However, the problem was that Greece was under military rule at that time, and, naturally enough, George was not anxious to return there. Above all, he was concerned for the safety of his family and himself, as he was actively involved in anti-junta organizations in the United States and Canada. A combination of unusual circumstances forced him to submit his candidacy for a professorship at the then new University of Patras, which specialized in science and technology. He was elected twice as Full Professor between
1970 and 1971 and the election was finally approved by a newly appointed Minister of Education. George retained appointments at UW-Madison and the University of Patras from 1972 through September 1976, when he resigned from his Madison position against the advice of his colleagues there. In the meantime, the military regime collapsed in July 1974 and the situation in Greece looked very promising under the leadership of Konstantinos Karamanlis. George, in his professorial capacity, used his knowledge and expertise and devoted himself toward improving the University of Patras across all scientific subject matters. This effort was facilitated by his election, first as the Dean of the School of Physical and Mathematical Sciences, and later as the Chancellor of the University. Actually, he was the last chancellor elected by his colleagues (Full Professors), according to the continental European academic system. In 1982, with the formation of the socialist government under A. Papandreou, things changed radically. The election of all university authorities had been given, essentially, to the political parties through their proxies in the vastly enlarged faculty, students and staff. The same applied to all university matters, academic and nonacademic. Soon, it became painfully evident that the opportunity to have the Greek university system reorganized on a solid basis was lost, at least for the foreseeable future. The choice for George was either to stay in Greece and mark time or to expatriate. For the former, he was too young and still creative, so he chose the latter. Before changing the subject, George would wish to acknowledge the fact that, at the University of Patras, he came to be associated with a substantial number of outstanding scientists and to establish friendships with some truly good human beings, including some of his immediate collaborators, such as Professor C. Drossos, Dr. D. A. Ioannides and Dr. G. A. Stamatelos.
Also, he developed extreme confidence and great lasting respect for his top administrative aide, Mrs. Irene Triki-Petta. In closing this section, it would be an omission not to mention that in 1982, George accepted an appointment as Vice President of the Governing Committee of the University of Crete, which at the time existed only on paper. His main role was to chair all electoral bodies that would select the first faculty members in the mathematics and science departments. That goal was achieved with great success, but the details would take volumes to describe. Perhaps George will choose to do so after his retirement! It would definitely be worth the time and effort, as it would fully reveal the embodiment of political intrigue, duplicity, opportunism, expediency and self-righteousness of the new political class. One shudders at the thought that the reformers supposedly came to power with a claim to correct the existing shortcomings of political life. They only succeeded in copying everything contemptible in the book and then some!


These were the circumstances that led to the decision to leave Greece, and under which George found himself in the stimulating and hospitable environment of the Division of Statistics at the University of California, Davis. While there in a visiting capacity, George was advised to submit his candidacy for a vacant senior faculty position. The occupant of this position was slated to also be the Associate Dean of the Division. George took the advice, was selected through the prescribed procedure and, as of July 1, 1985, accepted the position of Professor and Associate Dean. In the latter capacity, he served for fourteen years, until June 30, 1999.

The Division of Statistics at UC-Davis was established in 1979, and its first leader, Julius Blum, passed away prematurely in April 1982. Between that time and July 1, 1985, the Division was led, in an acting capacity, by Professor P. K. Bhattacharya and then Professor R. H. Shumway. Neither had an interest in the permanent position of Associate Dean. In 1985, the Division was in good standing academically, and consisted of a fine group of faculty members. At that time, the faculty deliberated on new policies and procedures to help accelerate the further development of the Division and the rise of its profile at the national and international level. The ensuing concerted efforts were by and large highly successful. This was attested to by a thorough study carried out by the Natural Sciences and Engineering Research Council of Canada for its internal purposes over a five-year period, 1987-1992. In this study, 300 institutions from around the world producing statistical research were ranked. The basis of ranking was the number of suitably normed research papers published in 15 select statistical journals. This ranking classified Statistics at UC-Davis in 14th place worldwide and in 11th place among domestic institutions. Thus, Statistics at UC-Davis was in the top 4.7% of 300 institutions across the world.
This was subsequently reaffirmed by a follow-up study by Christian Genest, published in the Canadian Journal of Statistics, Vol. 25, pages 427-443, 1997. In this study, institutions were ranked on the basis of suitably normed publications over the ten-year period 1985-95, published in the 16 most frequently cited journals. Among the top 25 most productive institutions in statistical research (ranked on the condition that there were at least 9 contributors in the 16 journals under consideration over the study period), Statistics at UC-Davis was 10th worldwide, and 5th among United States institutions. Furthermore, Statistics at UC-Davis was recognized as one of four centers of excellence in statistics in California.

In concluding this section, George would like to express his appreciation to all members of the Division of Statistics, and in particular to those senior faculty who, in 1984, prompted him to apply for the vacant position. Throughout the fourteen years of his administrative duties, George had the support and
fruitful cooperation of the Management Services Officer, Ms. Aelise Houx. Her contribution in administering the Division was immeasurable.

Regarding research, George made significant contributions in the areas of parametric and nonparametric inference in stochastic processes. His early work at UC-Berkeley and later at UW-Madison was primarily in parametric inference in Markovian processes. The concepts and tools used were those of contiguity, developed by Le Cam in his 1960 paper "Locally asymptotically normal families of distributions," University of California Publications in Statistics, Vol. 3, pages 37-98. An elaboration on contiguity was presented by George in the monograph "Contiguity of Probability Measures: Some Applications in Statistics," Cambridge University Press, Cambridge, 1972. This book and its translation into Russian in 1975 made contiguity accessible to a wider audience, and helped disseminate these valuable tools. In the late 1960's, George turned his interest toward the nonparametric approach to Markov processes. His papers "Nonparametric estimation in Markov processes," published in the Annals of the Institute of Statistical Mathematics, Vol. 21, pages 73-87, 1969, and "Nonparametric estimation of the transition distribution function of a Markov process," published in the Annals of Mathematical Statistics, Vol. 40, pages 1386-1400, 1969, were probably the first papers in this area. Subsequently, there was an avalanche of papers on the same and similar themes. Upon George's arrival at UC-Davis, he expanded his interests to the class of mixing processes. The probabilistic aspect of such processes had been thoroughly investigated, but there was precious little available in terms of statistical inference. George pioneered this area of statistical research, and contributed significantly to the literature. Subsequently, statistical inference under mixing became a fairly popular research area worldwide.
A celebration of nonparametric statistical inference took place in a two-week long Advanced Study Institute organized by Professor Luc Devroye, Professor Peter Robinson and George on the Greek Island of Spetses. By all accounts, the Institute was highly successful and resulted in a 708 page volume ("Nonparametric Functional Estimation and Related Topics'' Kluwer Academic Publishers, Dordrecht, 1990) edited by George. The Institute was sponsored and financed by NATO. At this time, NATO was still a defense organization of free peoples against totalitarianism. This institute was followed up by an equally successful Symposium on Nonparametric Functional Estimation organized by Professors Luc Devroye, Yannis Yatracos and George in the Centre de Recherches Mathématiques, Université de Montréal, in October 13-24, 1997. In the early 1990's, George expanded his interest into a new class of stochastic processes known as associated processes. Such processes were introduced into the statistical literature by Esary et al. in the paper "Association of random variables with applications," Annals of Mathematical Statistics,


Vol. 38, pages 1466-1474, 1967, in the context of systems reliability. The same processes under a different name were already used in statistical mechanics, as described, for example, in the paper by Fortuin et al., "Correlation inequalities on some partially ordered sets," Communications in Mathematical Physics, Vol. 22, pages 89-103, 1971, and the paper by C. M. Newman, "Normal fluctuations and the FKG inequalities," Communications in Mathematical Physics, Vol. 74, pages 119-128, 1980. Once again, whereas the probabilistic aspect of association was well developed, there was little available in the literature in terms of statistical inference. George helped bridge this gap with a substantial number of publications. Presently, there are at least two other schools working in the general area of association, located in China and France. Apart from his research efforts, George contributed to the statistical community by way of books. In addition to the Cambridge monograph (and its translation into Russian) and the edited volume already cited, George has been the author of one textbook in English. This book was published in first edition by Addison-Wesley in 1973 under the title "A First Course in Mathematical Statistics," and in a second revised edition by Academic Press in 1997 under the title "A Course in Mathematical Statistics." He has also been the author of four books on probability and statistical inference written in Greek to help Greek-speaking audiences. Still on another front, George has been an associate editor and a member of the editorial board of four statistical journals since their inception. They are: Statistics and Probability Letters, the Journal of Nonparametric Statistics, Stochastic Modeling and Applications, and Statistical Inference for Stochastic Processes. George's multifaceted contributions to higher education in general, and to the statistical community in particular, have been recognized and appropriately rewarded by his colleagues.
He is an elected Fellow of the American Statistical Association and of the Institute of Mathematical Statistics. Also, he is an elected member of the International Statistical Institute, and a Fellow of the Royal Statistical Society. In conclusion and on a personal note, George wishes to express his deep gratitude to his wife and sisters for their continuous and undivided support throughout the years. Mary, having no Greek roots of her own, willingly, devotedly and cheerfully joined her husband in his expedition to Greece in an effort to help reform the ailing education in that country. Unfortunately, his parents have long since departed and cannot receive their share of gratitude.

MADAN L. PURI
YANNIS YATRACOS

Asymptotics in Statistics and Probability, pp. 1-16 M.L. Puri (Ed.) 2000 VSP

REACT TREND ESTIMATION IN CORRELATED NOISE RUDOLF BERAN * Department of Statistics, University of California, Berkeley, CA 94720-3860 E-mail: [email protected]


ABSTRACT Suppose that the data is modeled as replicated realizations of a p-dimensional random vector whose mean μ is a trend of interest and whose covariance matrix Σ is unknown, positive definite. REACT estimators for the trend involve transformation of the data to a new basis, estimating the risks of a class of candidate linear shrinkage estimators, and selecting the candidate estimator with smallest estimated risk. For Gaussian samples and quadratic loss, the maximum risks of REACT estimators proposed in this paper undercut that of the classically efficient sample mean vector. The superefficiency of the proposed estimators relative to the sample mean is most pronounced when the new basis provides an economical description of μ, the vector dimension p is not small, and the sample size is much larger than p. A case study illustrates how vague prior knowledge may guide choice of a basis that reduces risk substantially.

1. INTRODUCTION

The average of a sample of random vectors drawn from a N_p(μ, Σ) normal distribution is inadmissible, under suitable quadratic loss, as an estimator of the mean vector μ whenever the dimension p of the distribution exceeds two (see Stein [8]). The insistence of the sample mean on unbiasedness

*This research was supported at Universität Heidelberg by the Alexander von Humboldt Foundation and at Berkeley by National Science Foundation Grant DMS 99-70266. Dean Huber of the U.S. Forest Service in San Francisco provided the lumber-thickness data, both numbers and context.


can result in over-fitting of μ when p is not small. Recent work on model-selection, shrinkage, and thresholding estimators when Σ = σ²I_p has shown, in that case, that even uncertain prior knowledge about the nature of μ can be translated into major reductions in estimation risk (cf. Donoho and Johnstone [3], Efromovich [4], and Beran [1]). This paper develops REACT shrinkage estimators of μ and their risk properties for situations where the covariance matrix Σ is unknown, though possibly restricted as in spatial or time-series analysis. The superior performance of the proposed estimators is illustrated on a set of multivariate lumber-thickness measurements collected in a study of saw-mill operations. As data model, suppose that (x₁, x₂, ..., x_n) are independent random column vectors, each of which has a N_p(μ, Σ) distribution. The components of μ constitute a trend that is observed in correlated noise. The word trend indicates that component order matters. Both μ and the covariance matrix Σ are unknown, though the latter is assumed positive definite and may sometimes have further structure. It is tacitly assumed that observation dimension p is not small and that sample size n is much larger than p, in ways that will be made precise. Let μ̂ denote any estimator of μ. The quality of μ̂ is assessed through the quadratic loss

L_{n,p}(μ̂, μ, Σ) = (n/p)(μ̂ − μ)' Σ^{-1} (μ̂ − μ).   (1)
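For readers who want to experiment with the loss (1), it is a one-line computation once Σ is given. The function below is a hypothetical sketch (names are ours, not the paper's):

```python
import numpy as np

def react_loss(mu_hat, mu, Sigma, n):
    """Quadratic loss L_{n,p}(mu_hat, mu, Sigma) = (n/p)(mu_hat - mu)' Sigma^{-1} (mu_hat - mu)."""
    d = np.asarray(mu_hat, float) - np.asarray(mu, float)
    p = d.size
    # solve Sigma y = d instead of forming Sigma^{-1} explicitly
    return (n / p) * (d @ np.linalg.solve(Sigma, d))
```

With Σ = I_p and μ̂ − μ equal to a unit coordinate vector, the loss is n/p, which illustrates why the sample mean's risk is normalized to 1 in this setup.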

Figure 1. Thickness measurements on a sample of 25 boards. The symbols o and x denote opposed upper and lower edge measurements at the four pairs of sites on each board.

[Figure 2: six diagnostic panels — REACT vs Sample Mean; Canonical Mean Vector; Roughness of U; Best Monotone f; Normalized Residuals; All Residuals Q-Q — with axes labeled Pair, Column Number, Component, Measurement Site, and N(0,1) Quantiles.]

Figure 2. Cell (1,1) displays the REACT estimate μ̂_M (with interpolated lines) and the sample mean vector (points coded as in Figure 1). The other cells report diagnostic plots discussed in Section 2.


Σ^{-1/2}μ, we reorder the columns {w_i} of W from least to most rough. Such a reordered basis should be economical if the components of transformed mean thickness vary slowly as we move to adjacent measurement sites. The function

Rough(x) = Σ_{i=2}^{4} (x_i − x_{i−1})² + Σ_{i=6}^{8} (x_i − x_{i−1})² + Σ_{i=1}^{4} (x_{i+4} − x_i)²   (12)

is taken to measure the roughness of any vector x ∈ R⁸. Reordering the columns of W according to their Rough values generates the orthonormal basis matrix

U = (w₁, w₃, w₅, w₂, w₄, w₇, w₆, w₈).   (13)
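The reordering step can be sketched in a few lines. The sketch below assumes our reading of (12): squared first differences along the upper edge (sites 1-4) and lower edge (sites 5-8), plus squared cross-board differences between opposed sites; function names are ours.

```python
import numpy as np

def rough(x):
    """Roughness of a vector x in R^8 (our reading of display (12))."""
    x = np.asarray(x, float)
    upper = np.sum(np.diff(x[:4]) ** 2)    # along the upper edge
    lower = np.sum(np.diff(x[4:]) ** 2)    # along the lower edge
    across = np.sum((x[4:] - x[:4]) ** 2)  # across the board, opposed sites
    return upper + lower + across

def reorder_basis(W):
    """Reorder the columns of a basis matrix W from least to most rough."""
    order = np.argsort([rough(W[:, j]) for j in range(W.shape[1])], kind="stable")
    return W[:, order]
```

A constant column has roughness zero and is placed first, which is why w₁ (the flat column) leads the reordered basis in (13).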

Cell (2,1) in Figure 2 displays the Rough values for successive columns of U. The corresponding values of the canonical mean vector z̃, defined in (5), are plotted in cell (1,2). The small magnitudes of the higher order components of z̃ suggest that the basis U is, in fact, economical in representing the mean vector μ.

Computing μ̂_M. This is straightforward from (8) and the preceding definitions once we have found the empirically best monotone shrinkage vector f̂_M, which minimizes ρ̂(f) over f ∈ F_M. Let g̃ = 1 − 1/z̃². Then

ρ̂(f) = ave[(f − g̃)² z̃²] + ave(g̃).   (14)

Let H = {h ∈ R^p: h₁ ≥ h₂ ≥ ... ≥ h_p}. An argument in Beran and Dümbgen [2] deduces from (14) that f̂_M = f̃₊ with

f̃ = argmin_{h∈H} ave[(h − g̃)² z̃²].   (15)

The positive-part step arises in (15) because g̃ lies in [−∞, 1]^p rather than in [0, 1]^p. The pool-adjacent-violators algorithm, treated by Robertson, Wright and Dykstra [7], provides an effective technique for computing f̃ and hence f̂_M.
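The pool-adjacent-violators step for the antitonic (nonincreasing) weighted least-squares fit in (15) can be sketched as follows; per (15), the target is g̃ = 1 − 1/z̃² with weights z̃², followed by the positive-part clip. This is a sketch, not the authors' code:

```python
import numpy as np

def pava_nonincreasing(g, w):
    """Weighted least-squares fit of a nonincreasing vector to g with weights w,
    by pooling adjacent violators."""
    blocks = []  # each block: [fitted value, total weight, count]
    for gi, wi in zip(g, w):
        blocks.append([float(gi), float(wi), 1])
        # merge while the nonincreasing constraint is violated
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            v1, w1, c1 = blocks[-2]
            v2, w2, c2 = blocks[-1]
            blocks[-2:] = [[(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, c1 + c2]]
    return np.concatenate([np.full(c, v) for v, w, c in blocks])

def monotone_shrinkage(z):
    """Empirically best monotone shrinkage vector: PAVA fit of g = 1 - 1/z^2
    with weights z^2, then the positive-part step."""
    z = np.asarray(z, float)
    g = 1.0 - 1.0 / z ** 2
    return np.clip(pava_nonincreasing(g, z ** 2), 0.0, 1.0)
```

Large canonical components (|z̃_i| big) give g̃_i near 1 and heavy weight, so the fitted shrinkage factors stay near 1 there and drop toward 0 for the noisy high-order components, as in cell (2,2) of Figure 2.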

Cell (2,2) of Figure 2 displays the components of f̂_M for the lumber thickness case study. The first three components are very close to 1, the fourth is .89, the fifth is .20, and the last three components are zero. The estimated risk of μ̂_M is ρ̂(f̂_M) = .24, sharply lower than the risk or estimated risk of the sample mean x̄, which is 1. Cell (1,1) in Figure 2 plots the components of μ̂_M (with linear interpolation between adjacent sites along each edge) and the corresponding components of x̄. The plot of μ̂_M suggests that mean thickness decreases as we move down the length of a board; that upper edge means are consistently smaller


than corresponding lower edge means; and that the difference in cross-board mean thickness grows only slowly down the length of the board. The impression left by the plot of x̄ is more confused and does not bring out the last feature. In this particular case study, μ̂_M smooths x̄ through shrinkage and choice of the basis U, even though the primary goal is to reduce risk. As an incidental but useful consequence, μ̂_M is more intelligible than x̄. Cell (3,1) of Figure 2 displays, component by component, the normalized residual vectors Σ̂^{-1/2}(x_i − μ̂_M), where 1 ≤ i ≤ 25. The Q-Q plot in cell (3,2) compares all 200 residuals against the standard normal distribution. There is no evidence of serious departures from marginal normality of the lumber thickness measurements, from the postulated covariance structure (9), and from the fitted mean vector μ̂_M.

3. ASYMPTOTICALLY MINIMAX ESTIMATORS

This section begins with asymptotic minimax bounds for estimation of μ over certain subsets of the parameter space. Subsection 3.1 gives an oracle estimator that achieves these bounds. The oracle estimator is usually not realizable because its definition requires knowledge of ξ² and of Σ. However, the form of the oracle estimator motivates, in Subsection 3.2, the definition of the fully adaptive estimator μ̂_M and provides a path to establishing asymptotic minimaxity of the latter. The choice of the orthogonal basis U is discussed theoretically after Theorems 1 and 4 and is carried out in Section 2 for the lumber-thickness data.

3.1. Minimax Oracle Estimation

We begin by reparametrizing the estimation problem in the oracle world where Σ and ξ² are known. Let

z = n^{1/2} U' Σ^{-1/2} x̄,   ξ = Ez = n^{1/2} U' Σ^{-1/2} μ.   (16)

Any estimator μ̂ of μ induces the estimator ξ̂ = n^{1/2} U' Σ^{-1/2} μ̂ of ξ. The mapping between μ and ξ is one-to-one, as is the mapping between μ̂ and ξ̂. Risks are placed into correspondence through the loss identity

L_{n,p}(μ̂, μ, Σ) = p^{-1} |ξ̂ − ξ|².   (17)

In the oracle world, the problem of estimating μ under loss (1) is equivalent to estimating ξ under quadratic loss (17). To formulate the notion of basis economy, consider for every b ∈ [0, 1] and every r > 0 the ball

B(r, b) = {ξ: ave(ξ²) ≤ r and ξ_i = 0 for i > bp}.   (18)


Let u_i denote the i-th column of U. In the original parametrization, B(r, b) corresponds to the ellipsoid

D(r, b) = {μ: (n/p) μ' Σ^{-1} μ ≤ r and u_i' Σ^{-1/2} μ = 0 for i > bp}.   (19)

If μ lies in D(r, b), then Σ^{-1/2}μ lies in the subspace spanned by the first ⌊bp⌋ columns of U. Regression coefficients with respect to these orthonormal vectors provide a description of Σ^{-1/2}μ which is highly compressed when b is small. We then say that the basis is economical for estimating μ. Though overly idealized, this definition of economy leads to explicit results that link the economy of the basis with the superefficiency of μ̂_M. Consider candidate estimators for ξ of the form ξ̂(f) = fz, where f ∈ F_M. These correspond to the candidate estimators

μ̂(f, Σ) = Σ^{1/2} U diag(f) U' Σ^{-1/2} x̄ = n^{-1/2} Σ^{1/2} U diag(f) z   (20)

for μ. Because of (17), the risk of μ̂(f, Σ) is

R_{n,p}(μ̂(f, Σ), μ, Σ) = ρ(f, ξ²),   (21)

the function ρ being defined in (2). Let f_M = argmin_{f∈F_M} ρ(f, ξ²). The oracle estimator is μ̂(f_M, Σ), the candidate estimator that minimizes risk. The restriction to candidate estimators indexed by f ∈ F_M makes possible successful adaptation (see remarks preceding Theorem 2) as well as fine performance when the basis U is economical (see remarks following Theorems 1 and 4).

THEOREM 1. For every r > 0 and b ∈ [0, 1],

lim_{n,p→∞} sup_{μ∈D(r,b)} R_{n,p}(μ̂(f_M, Σ), μ, Σ) = rb/(r + b).   (22)

The asymptotic minimax risk over all estimators of μ is

lim_{n,p→∞} inf_{μ̂} sup_{μ∈D(r,b)} R_{n,p}(μ̂, μ, Σ) = rb/(r + b).   (23)

The asymptotic minimax bound in (23) is thus achieved by the oracle estimator. For fixed b, the asymptotic maximum risk of μ̂(f_M, Σ) increases monotonically in r but never exceeds b. In sharp contrast, the risk of x̄ is always 1 whatever the value of μ. The first message of Theorem 1 is that we can only gain, when p is not small, by using the oracle estimator in place of the sample mean x̄. The second message is that the reduction in maximum risk achieved by the oracle estimator can be remarkable if b is close to zero. This occurs when the basis U used to define the oracle estimator is highly economical. We note that the minimax asymptotics are uniform over subsets of the parameter space and thus are considerably more trustworthy than risk limits computed pointwise in μ.
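The bound rb/(r + b) in (22)-(23) is easy to tabulate against the constant risk 1 of the sample mean; a trivial sketch (function name ours):

```python
def oracle_max_risk(r, b):
    """Asymptotic maximum risk rb/(r + b) of the oracle REACT estimator over
    D(r, b); for comparison, the sample mean has risk 1 for every mu."""
    return r * b / (r + b)
```

For example, the bound is increasing in r but stays below b, so even with an arbitrarily large signal-to-noise bound r, an economical basis (small b) keeps the maximum risk far below 1.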


3.2. Successful Adaptation

The oracle estimator depends on ξ² and Σ, both of which are typically unknown. To devise a realizable estimator that does not depend on unknown parameters, we proceed as follows. Let Σ̂ be a consistent estimator of Σ. Then

z̃ = n^{1/2} U' Σ̂^{-1/2} x̄   (24)

plausibly estimates z = n^{1/2} U' Σ^{-1/2} x̄. Consider the realizable candidate estimators μ̂(f, Σ̂) where f ranges over F_M. In view of (21), the function ρ̂(f) defined in (6) estimates the risk of these candidate estimators. This risk estimator is suggested by the Mallows [5] C_L criterion or the Stein [9] unbiased risk estimator, with plug-in estimation of the unknown covariance matrix. By analogy with the construction of the oracle estimator, we minimize estimated risk over the candidate estimators to obtain the estimator μ̂_M defined in (8). We will show in Theorem 4 that μ̂_M shares the asymptotic minimaxity of the oracle estimator.

Let |·| denote the Euclidean matrix norm, which is defined by |A|² = tr[AA']. If A₁ and A₂ are both p × p matrices, then the Cauchy-Schwarz inequality for this norm asserts that |A₁A₂| ≤ |A₁||A₂|. The following consistency condition will be imposed upon the estimator Σ̂.

Condition C. The estimators Σ̂ and x̄ are independent. Let V = Σ^{-1/2} Σ̂^{1/2}. For every r > 0,

lim_{n,p→∞} sup_{μ∈D(r,1)} E|V − I_p|² = 0   and   lim_{n,p→∞} sup_{μ∈D(r,1)} E|V^{-1} − I_p|² = 0.   (25)

In this statement, the relative rates at which n and p tend to infinity will depend on the covariance estimator Σ̂. For instance, if Σ̂ is the sample covariance matrix based on the observed (x₁, x₂, ..., x_n), then Condition C holds provided p and n tend to infinity in such a way that p²/n tends to zero. In the lumber data example or in time-series contexts, restrictions may be imposed on the form of Σ. Condition C may then hold for suitably constructed Σ̂ under less severe limitations on the rate at which p increases with n. The next two theorems, proved in Section 4, show that the estimated risk function ρ̂(f) and the adaptive estimator μ̂_M serve asymptotically as valid surrogates for ρ(f, ξ²) and the oracle estimator μ̂(f_M, Σ). It is important to note that similar results do not hold if the class of monotone shrinkage vectors F_M, defined before display (3), is replaced by a much larger class of shrinkage vectors such as the global class F_G = [0, 1]^p. Adaptation over F_G produces an inadmissible estimator of μ, as shown in [2].

THEOREM 2. Suppose that Σ̂ satisfies Condition C. For every r > 0 and every positive definite Σ,

lim_{n,p→∞} sup_{μ∈D(r,1)} E sup_{f∈F_M} |L_{n,p}(μ̂(f, Σ̂), μ, Σ) − ρ(f, ξ²)| = 0   (26)

and

lim_{n,p→∞} sup_{μ∈D(r,1)} E sup_{f∈F_M} |ρ̂(f) − ρ(f, ξ²)| = 0.   (27)

Because τ_M(ξ²) = ρ(f_M, ξ²), a consequence of Theorem 2 is

THEOREM 3. Suppose that Σ̂ satisfies Condition C. For every r > 0 and every positive definite Σ,

lim_{n,p→∞} sup_{μ∈D(r,1)} E|T − τ_M(ξ²)| = 0,   (28)

where T can be any one of L_{n,p}(μ̂_M, μ, Σ), L_{n,p}(μ̂(f_M, Σ̂), μ, Σ), or ρ̂(f̂_M).

Theorem 3 implies the risk convergence (4) and

THEOREM 4. Suppose that Σ̂ satisfies Condition C. For every r > 0, every b ∈ [0, 1], and every positive definite Σ,

lim_{n,p→∞} sup_{μ∈D(r,b)} |R_{n,p}(μ̂_M, μ, Σ) − R_{n,p}(μ̂(f_M, Σ), μ, Σ)| = 0   (29)

and

lim_{n,p→∞} sup_{μ∈D(r,b)} R_{n,p}(μ̂_M, μ, Σ) = rb/(r + b).   (30)

By comparing (30) with (23), we see that the adaptive estimator μ̂_M is asymptotically minimax over D(r, b) and has small maximum risk when b is small, in which event the basis U represents Σ^{-1/2}μ economically. Moreover, (29) shows that the risk of μ̂_M mimics that of the oracle estimator μ̂(f_M, Σ), uniformly over ellipsoids in the parameter space that correspond to bounds on the signal-to-noise ratio. Theorem 4 thus establishes the success of the adaptation strategy over shrinkage vectors f ∈ F_M that is expressed in the definition of μ̂_M.

4. PROOFS

Pinsker's paper [6] yields two minimax theorems for the estimation of ξ from z in the oracle world. Let E = {a ∈ R^p: a_i ∈ [1, ∞], 1 ≤ i ≤ p}. For every a ∈ E, define the ellipsoid

E(r, a) = {ξ ∈ R^p: ave(aξ²) ≤ r}.   (31)


When ξ ∈ E(r, a) and a_i = ∞, it is to be understood that a_i^{-1} = 0. Let g₀ denote the minimax linear shrinkage vector and v_p(r, a) the corresponding minimax linear risk, defined in displays (32) and (33) for every r > 0 and every a ∈ E. The first theorem that can be specialized from Pinsker's reasoning identifies the linear estimator that is minimax among all linear estimators of ξ and finds the minimax risk for this class.

THEOREM 5. For every a ∈ E and every r > 0,

inf_f sup_{ξ∈E(r,a)} E|fz − ξ|² = v_p(r, a) = sup_{ξ∈E(r,a)} E|g₀z − ξ|².   (34)

The second theorem gives conditions under which the minimax linear estimator g₀z is asymptotically minimax among all estimators of ξ.

THEOREM 6. For every a ∈ E and every r > 0 such that lim_{p→∞} p v_p(r, a) = ∞,

lim_{p→∞} [inf_{ξ̂} sup_{ξ∈E(r,a)} E|ξ̂ − ξ|² / v_p(r, a)] = 1.   (35)

If lim inf_{p→∞} v_p(r, a) > 0, then also

lim_{p→∞} [inf_{ξ̂} sup_{ξ∈E(r,a)} E|ξ̂ − ξ|² − v_p(r, a)] = 0.   (36)

Because g₀ depends on r and a, the asymptotic minimaxity of g₀z is assured only over the one ellipsoid E(r, a). The following construction yields an oracle estimator that is asymptotically minimax over a class of such ellipsoids. Let E₀ ⊂ E and F be such that g₀(r, a) ∈ F for every a ∈ E₀ and every r > 0. To enable successful adaptation, we will require that the shrinkage class F be not too large. This requirement limits the choice of E₀. Let f̃ = argmin_{f∈F} ρ(f, ξ²). Because both f̃ and g₀ lie in F, it follows that

sup_{ξ∈E(r,a)} E|f̃z − ξ|² ≤ sup_{ξ∈E(r,a)} E|g₀z − ξ|² = v_p(r, a)   (37)

for every a ∈ E₀ and every r > 0. This implies the asymptotic minimaxity of f̃z over the class of ellipsoids E(r, a) that is generated as a ranges over E₀ and r ranges over the positive reals.

Proof of Theorem 1. In the transformed problem, the candidate estimator μ̂(f, Σ) corresponds to ξ̂(f) = fz. The ball B(r, b) defined in (18) is the specialization of E(r, a) when a_i = 1 for 1 ≤ i ≤ bp and a_i = ∞ otherwise. In this case, (32)


and (33) imply that lim_{p→∞} v_p(r, a) = rb/(r + b) and that g₀ has coefficients g_{0,i} = [1 − s^{-1/2}]₊ for 1 ≤ i ≤ bp and g_{0,i} = 0 otherwise. Consequently, g₀ ∈ F_M. The asymptotic minimax bound (23) is the specialization of (36), while (22) follows from (37) with F = F_M.

Proof of Theorem 2. If X and Y are non-negative random variables, then

E|X² − Y²| ≤ E|X − Y|² + 2E^{1/2}Y² · E^{1/2}|X − Y|².   (38)

We first prove (27). The definitions (16) and (24) of z and z̃ entail that

z̃ − z = n^{1/2} U'(V^{-1} − I_p) Σ^{-1/2} x̄.   (39)

From this, Condition C, and the Cauchy-Schwarz inequality for the matrix norm,

E|z̃ − z|² ≤ p[1 + ave(ξ²)] E|V^{-1} − I_p|².   (40)

Let

ρ̃(f) = ave[f² + (1 − f)²(z² − 1)].   (41)

It follows from the definition (6) of ρ̂(f), (38), (40) and Condition C that

lim_{n,p→∞} sup_{μ∈D(r,1)} E sup_{f∈[0,1]^p} |ρ̂(f) − ρ̃(f)|² = 0.   (42)

On the other hand, Lemmas 6.3 (first part) and 6.4 in [2] imply

lim_{n,p→∞} sup_{μ∈D(r,1)} E sup_{f∈F_M} |ρ̃(f) − ρ(f, ξ²)|² = 0.   (43)

In (43), the distribution of the difference does not depend on n; and it is not possible to replace f ∈ F_M with f ∈ [0, 1]^p for reasons discussed in [2]. Limit (27) is immediate from (42) and (43). Next, observe that

L_{n,p}(μ̂(f, Σ̂), μ, Σ) = p^{-1} |VU diag(f) z̃ − Uξ|²   (44)

and that |U diag(f) z̃ − Uξ|² = |f z̃ − ξ|². From these facts plus (38), (40) and Condition C follows

lim_{n,p→∞} sup_{ξ∈B(r,1)} E sup_{f∈[0,1]^p} |L_{n,p}(μ̂(f, Σ̂), μ, Σ) − p^{-1}|f z̃ − ξ|²| = 0.   (45)

On the other hand, Lemmas 6.3 (second part) and 6.4 in [2] entail

lim_{n,p→∞} sup_{ξ∈B(r,1)} E sup_{f∈F_M} |p^{-1}|f z̃ − ξ|² − ρ(f, ξ²)| = 0.   (46)

Limit (26) is the consequence of (45) and (46).

Proof of Theorem 3. Limit (27) implies that

lim_{n,p→∞} sup_{μ∈D(r,1)} E|ρ̂(f̂_M) − ρ(f̂_M, ξ²)| = 0   (47)

and

lim_{n,p→∞} sup_{μ∈D(r,1)} E|ρ̂(f_M) − ρ(f_M, ξ²)| = 0.   (48)

In view of (3), τ_M(ξ²) = ρ(f_M, ξ²), while ρ̂(f̂_M) ≤ ρ̂(f_M) by the definition of f̂_M. Consequently, limit (28) holds for T = ρ̂(f̂_M) and, in addition,

lim_{n,p→∞} sup_{μ∈D(r,1)} E|ρ(f̂_M, ξ²) − τ_M(ξ²)| = 0.   (49)

On the other hand, limit (26) implies that

lim_{n,p→∞} sup_{μ∈D(r,1)} E|L_{n,p}(μ̂_M, μ, Σ) − ρ(f̂_M, ξ²)| = 0.   (50)

Combining this result with (49) yields (28) for T = L_{n,p}(μ̂_M, μ, Σ). Because Σ̂ satisfies Condition C, it is also true that (28) holds for T = L_{n,p}(μ̂(f_M, Σ̂), μ, Σ).

Proof of Theorem 4. Theorem 3 yields (29) and hence

lim_{n,p→∞} |sup_{μ∈D(r,b)} R_{n,p}(μ̂_M, μ, Σ) − sup_{μ∈D(r,b)} R_{n,p}(μ̂(f_M, Σ), μ, Σ)| = 0.   (52)

This together with (22) implies (30).

REFERENCES
1. Beran, R. (2000). REACT scatterplot smoothers: superefficiency through basis economy. J. Amer. Statist. Assoc. 95, 155-171.
2. Beran, R. and Dümbgen, L. (1998). Modulation of estimators and confidence sets. Ann. Statist. 26, 1826-1856.
3. Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90, 1200-1224.
4. Efromovich, S. (1999). Quasi-linear wavelet estimation. J. Amer. Statist. Assoc. 94, 189-204.
5. Mallows, C. L. (1973). Some comments on Cp. Technometrics 15, 661-675.
6. Pinsker, M. S. (1980). Optimal filtration of square-integrable signals in Gaussian noise. Problems Inform. Transmission 16, 120-133.


7. Robertson, T., Wright, F. T. and Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley, New York.
8. Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Proc. Third Berkeley Symp. Math. Statist. Prob., pp. 197-206, J. Neyman (Ed.), Univ. Calif. Press, Berkeley.
9. Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9, 1135-1151.

Asymptotics in Statistics and Probability, pp. 17-32 M.L. Puri (Ed.) 2000 VSP

HIGHER ORDER ANALYSIS AT LEBESGUE POINTS A. BERLINET and S. LEVALLOIS Department of Mathematics, University of Montpellier II, Place Eugène Bataillon, 34 095 Montpellier cedex 5, France

ABSTRACT A number of convergence results in density estimation can be stated with no restriction on the function to be estimated. In contrast to these universal properties, asymptotic normality of estimates often requires hypotheses on derivatives together with additional conditions on the smoothing parameter. We take here the example of nearest neighbor estimates and consider points where standard assumptions fail to hold. In spite of bad local behavior of the density (it is not continuous or has infinite derivative), asymptotic normality of the estimates can still occur, which is desirable when confidence bands are required. We show that the conditions on the derivative of the density can be removed. On the contrary, the asymptotic distribution of the estimate may depart from the standard normal if the sufficient additional condition on the smoothing parameter is not satisfied. The examples also show that the apparent local behavior of the density can be misleading when analyzing locally the associated measure.

1. INTRODUCTION The subject of this paper is related to the general problem of derivation of measures (see for instance Rudin (1987) or Dudley (1989)). It finds its motivation in the study of estimation problems when standard regularity conditions are not satisfied. We consider here applications to the nearest neighbor density estimate. A classical result by Loftsgaarden and Quesenberry (1965) states the pointwise convergence in probability of this estimate at points where the density is positive and continuous, but their proof can easily


be extended to any Lebesgue point (see Bosq and Lecoutre, 1987). In 1977, Moore and Yackel proved the asymptotic normality under additional conditions on the smoothing parameter and on the derivatives of the density. It seems that no author reconsidered the question since that time. We address here the problem of asymptotic normality of the nearest neighbor density estimate in cases where the density has bad local behavior. This behavior can be misleading. What is important is the local behavior of the associated measure, more precisely the rate at which the local value of the density is approximated by ratios of ball measures. Three selected examples illustrate the theorems given in Section 2. In the first two the density has infinite derivative at the point of interest. In the third one the density has a discontinuity of second kind so one has first to determine towards which value the estimate may converge. The practical lessons on the choice of the smoothing parameter to be drawn from the theory and examples are summarized in the last section.
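For concreteness, the nearest neighbor density estimate under discussion takes, in one dimension, the form f_n(x) = (k_n/n) / λ(B(x, R_n(x))), where R_n(x) is the distance from x to its k_n-th nearest sample point. The sketch below uses this convention (normalizations vary slightly across authors; the function name is ours):

```python
import numpy as np

def knn_density(x, sample, k):
    """Nearest neighbor density estimate at x from a 1-D sample:
    (k/n) divided by the Lebesgue measure 2*R of the ball B(x, R),
    R being the distance from x to its k-th nearest observation."""
    sample = np.asarray(sample, float)
    n = sample.size
    r = np.sort(np.abs(sample - x))[k - 1]  # k-th nearest neighbor distance
    return (k / n) / (2 * r)
```

The estimate adapts its bandwidth R_n(x) to the local density: where the sample is dense, the ball is small; where it is sparse, the ball is large.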

2. DEFINITIONS AND MAIN RESULTS

In many contexts we are interested in the evaluation of the measure of "small sets". More precisely, a measure μ on a semi-metric space (E, d) being given, we would like to know how the measure μ(B(x, δ)) behaves in comparison with some quantity. If

lim_{n→∞} k_n^{1+1/(2α)} / n = 0

then, as n tends to infinity,

√k_n (f_n(x) − f(x)) / f(x)

tends in distribution to N(0, 1).

Proof. Writing

μ(B(x, δ)) = ∫_{B(x,δ)} f(t) dλ(t) = ∫_{B(x,δ)} (f(t) − f(x)) dλ(t) + 2δ f(x)

and the Lipschitz condition at x,

|f(x) − f(y)| ≤ C_x |x − y|^α,

one gets

|μ(B(x, δ)) / λ(B(x, δ)) − f(x)| ≤ (2δ)^{-1} ∫_{B(x,δ)} C_x |t − x|^α dλ(t) ≤ C_x δ^α.

This implies p-regularity with p(δ) = C_x δ^α.


The additional condition lim_{n→∞} k_n^{1+1/(2α)}/n = 0 cannot be removed. In the next section we give the example of a density satisfying a Lipschitz condition of order 1/2 for which the additional sufficient condition on (k_n) is also necessary.

3. TWO EXAMPLES WITH INFINITE DERIVATIVES

3.1. Lipschitz case

Let f₁ be the probability density function on R defined by

f₁(x) = (1 − √2/3 + √|x|) 1_{[−0.5,0.5]}(x).

Figure 1.

24

A. Berlinet and S. Levallois

It is plotted in Figure 1. The associated distribution function is given by

F₁(x) = 0 if x < −0.5,
F₁(x) = 1/2 + (1 − √2/3)x + (2/3) x √|x| if −0.5 ≤ x ≤ 0.5,
F₁(x) = 1 if x > 0.5.

On the interval (−0.5, 0.5) the density f₁ is lower bounded by (1 − √2/3) and it is differentiable except at the point 0. For 0 < x < y < 0.5 we have

f₁(x) − f₁(y) = (x − y) · 1/(2√ξ)

where ξ ∈ (x, y), and for y > 0 we have f₁(0) − f₁(y) = −√y. So f₁ satisfies a Lipschitz condition everywhere on (−0.5, 0.5). The order is 1/2 at 0 and 1 elsewhere. This implies p-regularity at any point x with p(δ) = C_x δ^{1/2}.
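Our reading of the scan's density and distribution function (the constant 1 − √2/3 is our reconstruction) can be checked numerically: F₁ should run from 0 to 1 over [−0.5, 0.5] and differentiate back to f₁ away from the cusp at 0.

```python
import math

def f1(x):
    """Reconstructed density: f1(x) = 1 - sqrt(2)/3 + sqrt(|x|) on [-0.5, 0.5]."""
    return (1 - math.sqrt(2) / 3 + math.sqrt(abs(x))) if -0.5 <= x <= 0.5 else 0.0

def F1(x):
    """Reconstructed distribution function of f1."""
    if x < -0.5:
        return 0.0
    if x > 0.5:
        return 1.0
    return 0.5 + (1 - math.sqrt(2) / 3) * x + (2 / 3) * x * math.sqrt(abs(x))
```

Near 0 the increment f₁(y) − f₁(0) = √y is of order y^{1/2}, which is exactly the Lipschitz order 1/2 exploited in the text.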

Under the standard conditions lim_{n→∞} k_n = ∞ and lim_{n→∞} k_n/n = 0, we have convergence in probability of f_n(x) to

lim_{δ→0+} μ(B(x, δ)) / λ(B(x, δ))

whenever this limit exists. So the points where the estimation is carried out should be Lebesgue points of the possible underlying unknown probability measures. For measures having smooth densities a value of k_n near √n gives good results. This practical remark was already put forward by Loftsgaarden and Quesenberry in 1965. Moreover this choice meets the additional condition

lim_{n→∞} k_n^{3/2} / n = 0

for

√k_n (f_n(x) − f(x)) / f(x)

to be asymptotically distributed as N(0, 1). Now, to preserve this asymptotic distribution property when smoothness cannot be assumed around the point of interest, the additional condition on the sequence (k_n) has to be strengthened. When a Lipschitz property of order α can be assumed for the function f at the point x, Corollary 2.1 states that a sufficient condition on (k_n) is

lim_{n→∞} k_n^{1+1/(2α)} / n = 0.

This condition can be also necessary (Theorem 3.1). When it is not reasonable to assume a Lipschitz property, the asymptotic normality result can be shown to hold by Theorem 2.2. Then, one has to prove that, for any ε > 0,

P(√k_n p(R_n(x)) > ε) → 0, as n → ∞.

This can be done by using a deviation inequality (as in Subsection 3.2) or by evaluating p(R_n(x)) as a function of μ(B_n(x)), the distribution of which is known (as in Section 4).

Acknowledgement

We thank the referee for his careful reading of the manuscript. His comments led to a substantial improvement in the presentation of the paper.

REFERENCES
Bosq, D. and Lecoutre, J. P. (1987). Théorie de l'Estimation Fonctionnelle. Economica, Paris.
Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York.
Dudley, R. M. (1989). Real Analysis and Probability. Chapman and Hall, New York.
Fix, E. and Hodges, J. L. Jr. (1951). Discriminatory analysis, nonparametric discrimination: consistency properties. Report No. 4, USAF School of Aviation Medicine, Randolph Field, Texas.
Lasota, A. and Mackey, M. C. (1994). Chaos, Fractals and Noise. Springer-Verlag, New York.
Loftsgaarden, D. O. and Quesenberry, C. P. (1965). A nonparametric estimate of a multivariate density function. Annals of Mathematical Statistics 36, 1049-1051.
Moore, D. S. and Yackel, J. W. (1977). Large sample properties of nearest neighbour density function estimates. In: Statistical Decision Theory and Related Topics II, S. S. Gupta and D. S. Moore (Eds), Academic Press, New York.
Rudin, W. (1987). Real and Complex Analysis. McGraw-Hill, New York.
Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.

Asymptotics in Statistics and Probability, pp. 33-45 M.L. Puri (Ed.) 2000 VSP

REGRESSION ANALYSIS FOR MULTIVARIATE FAILURE TIME OBSERVATIONS T. CAI and L. J. WEI Department of Biostatistics, Harvard University, 655 Huntington Ave., Boston, MA, 02115

ABSTRACT In a long-term observational or experimental survival study, each subject may have several distinct failure time observations. A commonly used regression method for analyzing multiple event times is based on a multivariate Cox model (Wei, Lin and Weissfeld, 1989; Cai and Prentice, 1995). Recently, Chen and Wei (1997) generalized this multivariate method by modeling each marginal failure time with a rich class of semi-parametric transformation models. However, when the support of the censoring variable is shorter than that of the failure time, their inference procedures may not be valid. To deal with this commonly encountered situation, in this article, we take a completely different approach to estimate the regression coefficients for the models studied by Chen and Wei (1997). The new proposals can handle the case with time-dependent covariates. Furthermore, simultaneous prediction procedures for survival probabilities and their related quantities for patients with a specific set of covariates are also proposed. The new methods are illustrated with data from a recent pediatric AIDS study that evaluated several antiretroviral treatments.

1. INTRODUCTION

Regression analysis for censored event time data with a single event per subject has been studied extensively (Fleming & Harrington, 1991; Andersen et al., 1995). Oftentimes, in long-term observational and experimental studies, the response may involve two or more distinct failure times from each individual subject. For example, in a recent pediatric AIDS clinical trial, ACTG 377, sponsored by the AIDS Clinical Trials Group to evaluate four different antiretroviral drug combinations, 181 patients were randomized to four arms: 41 to d4T/Nevirapine/Ritonavir, 52 to


d4T/3TC/Nelfinavir, 44 to d4T/Nevirapine/Nelfinavir, and 44 to d4T/3TC/Nevirapine/Nelfinavir. Each patient was scheduled to be treated for 96 weeks after randomization, and the quantitative plasma RNA value was measured periodically at predetermined time points during the study. The RNA value is the number of HIV-1 RNA copies per milliliter of plasma and was obtained using the Roche assay. One of the major goals of the trial is to use the repeated RNA measures to evaluate the efficacy of these drug combinations. Due to the limits of quantification of the assay, however, an observed RNA value below 400 copies/ml is highly unreliable and is conventionally treated as a left-censored observation at 400. Here, the RNA value is the event time, which is possibly censored, and the outcome for each subject is a vector of repeated RNA measurements. Several statistical methods have been proposed to analyze such multivariate "failure" time data. For example, Wei, Lin & Weissfeld (1989) used the Cox proportional hazards model to analyze the event times, taking a marginal approach that does not impose a specific correlation structure on the distinct failure times. Cai & Prentice (1995) tried to improve the efficiency of the procedure of Wei et al. (1989) by using a dependent working model for the Cox score function. The Cox model, however, may not fit the data well. For univariate failure time data, a class of semi-parametric transformation models, which includes Cox's model as a special case, has been studied, for example, by Dabrowska & Doksum (1988a, 1988b), Cheng et al. (1995, 1997), and Scharfstein et al. (1998). Recently, Chen & Wei (1997) used the same idea as Wei et al. (1989), but modeled each failure time with this rich class of survival models in order to handle multivariate survival time observations. However, the support of the censoring variable is often shorter than that of the survival time, in which case their procedure is not valid (Fine et al., 1998).
Moreover, the procedure of Chen & Wei (1997) cannot handle time-dependent covariates. In this article, we use the class of models studied by Chen & Wei (1997) with time-dependent covariates, but estimate the regression parameters of the transformation models with a completely different approach. Furthermore, procedures are proposed for predicting survival probabilities of future patients with a specific set of covariates. The new proposals are illustrated with the RNA data from the aforementioned AIDS trial.

2. ESTIMATION OF REGRESSION PARAMETERS

For the $k$th type of failure, $k = 1, \ldots, K$, let $T_{ki}$ be the failure time of the $i$th patient, $i = 1, \ldots, n$; $T_{ki}$ may not be observed completely. Instead, one observes a bivariate vector $(X_{ki}, \Delta_{ki})$, where $X_{ki} = \min(T_{ki}, C_{ki})$, $C_{ki}$ is the censoring time, and $\Delta_{ki} = 1$ if $X_{ki} = T_{ki}$, and $0$ otherwise.


At time $t$, let $Z_{ki}(t)$ be a $p \times 1$ vector of external, time-dependent and bounded covariates for the $i$th patient with respect to the $k$th failure, and let $Z_{ki}^* = \{Z_{ki}(t),\ t \ge 0\}$. We assume that the censoring variable does not depend on the covariates; this assumption can be relaxed for the case of discrete covariates. Furthermore, assume that $T_i = (T_{1i}, \ldots, T_{Ki})'$ and $C_i = (C_{1i}, \ldots, C_{Ki})'$ are independent vectors, and that $(T_i, C_i, Z_{1i}^*, \ldots, Z_{Ki}^*)$, $i = 1, \ldots, n$, are independent and identically distributed. Let $S_k(\cdot \mid Z_{ki}^*)$ be the survival function of $T_{ki}$ given $Z_{ki}^*$. The Cox model can be written as
$$S_k(t \mid Z_{ki}^*) = \exp[-\exp\{h_k(t) + \beta_k' Z_{ki}(t)\}],$$
where $h_k(t)$ is a completely unspecified strictly increasing function and $\beta_k$ is a $p \times 1$ vector of unknown regression coefficients. A natural generalization of the Cox model is
$$S_k(t \mid Z_{ki}^*) = g_k\{h_k(t) + \beta_k' Z_{ki}(t)\}, \qquad (2.1)$$
where $g_k(\cdot)$ is a known decreasing function. If we let $g_k(x)$ be $(1 + e^x)^{-1}$, (2.1) is the proportional odds model. It is important to note that we only model the event time for each type of failure marginally; we do not impose any parametric correlation structure among the $T_{ki}$, $k = 1, \ldots, K$.

Now, let $G_k(\cdot)$ be the survival function of $C_{k1}$ and let $\hat G_k$ be the corresponding Kaplan-Meier estimate of $G_k$. First, consider estimation of $h_k(\cdot)$ for any given $\beta_k$. Under Model (2.1), conditioning on $\{Z_{ki}\}$, the expectation of $I(X_{ki} \ge t)$ is $g_k\{h_{0k}(t) + \beta_{0k}' Z_{ki}(t)\} G_k(t)$, where $h_{0k}(\cdot)$ and $\beta_{0k}$ are the true values of $h_k(\cdot)$ and $\beta_k$, respectively. This motivates the following estimating equation for $h_{0k}(t)$:
$$\sum_{i=1}^n \bigl[ I(X_{ki} \ge t) - g_k\{h_k(t) + \beta_k' Z_{ki}(t)\}\, \hat G_k(t) \bigr] = 0, \qquad \tau_{ka} \le t \le \tau_{kb}, \qquad (2.2)$$
where $I(\cdot)$ is the indicator function and $\tau_{ka}$ and $\tau_{kb}$ are prespecified constants such that both $\operatorname{pr}(X_{k1} < \tau_{ka})$ and $\operatorname{pr}(X_{k1} > \tau_{kb})$ are positive. Let the resulting estimator of $h_{0k}(t)$ be denoted by $\hat h_k(t; \beta_k)$. Next, by mimicking the least squares estimation procedure for uncensored data, we consider a class of estimating equations for $\beta_{0k}$:
$$\sum_{i=1}^n \int_{\tau_{ka}}^{\tau_{kb}} Z_{ki}(t) \bigl[ I(X_{ki} \ge t) - g_k\{\hat h_k(t; \beta_k) + \beta_k' Z_{ki}(t)\}\, \hat G_k(t) \bigr]\, d\hat\nu_k(t) = 0, \qquad (2.3)$$
where $\hat\nu_k$ is a known increasing, but possibly data-dependent, weight function such that $\hat\nu_k(t)$ converges to a deterministic function $\nu_k(t)$ uniformly in $t \in [\tau_{ka}, \tau_{kb}]$. Let the root of (2.3) be denoted by $\hat\beta_k$, and let $\hat h_k(t) = \hat h_k(t; \hat\beta_k)$. It follows from Appendix 1 of Cai et al. (1999) that if the covariate vector $Z_{ki}(t)$ is not degenerate on a proper interval of $t$, then these two estimators are unique for large $n$ and are consistent.
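As a quick numerical illustration of how (2.2) can be solved in practice, the sketch below estimates the censoring survival by Kaplan-Meier and then finds the root in $h$ at a fixed $t$ by bisection, exploiting the monotonicity of the decreasing link. This is our own minimal sketch, not the authors' implementation: it assumes the proportional-odds link $g(x) = (1 + e^x)^{-1}$, time-independent covariates and no tied observation times, and all function names are hypothetical.

```python
import numpy as np

def km_censoring_survival(X, delta):
    """Kaplan-Meier estimate of the censoring survival G, treating a
    censored observation (delta == 0) as the 'event'.  Assumes no ties."""
    order = np.argsort(X)
    X, delta = X[order], delta[order]
    n = len(X)
    G, values = 1.0, []
    for i in range(n):
        if delta[i] == 0:             # censoring event at X[i]
            G *= 1.0 - 1.0 / (n - i)  # n - i subjects still at risk
        values.append(G)
    return X, np.array(values)

def G_hat(t, times, values):
    """Evaluate the right-continuous Kaplan-Meier step function at t."""
    idx = np.searchsorted(times, t, side="right") - 1
    return 1.0 if idx < 0 else values[idx]

def solve_h(t, X, Z, beta, times, values, g, lo=-30.0, hi=30.0, tol=1e-10):
    """Solve equation (2.2) at a fixed t for h:
       sum_i [ I(X_i >= t) - g(h + beta'Z_i) * G_hat(t) ] = 0.
    Since g is decreasing, the left side is increasing in h: bisection."""
    Gt = G_hat(t, times, values)
    psi = lambda h: np.sum((X >= t) - g(h + Z @ beta) * Gt)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With no censoring and $\beta = 0$, the solution reduces to $g^{-1}$ of the empirical survival at $t$, which gives a quick sanity check.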


Although $h_k(\cdot)$ is infinite-dimensional, it is easy to see that $\hat h_k(t)$ is constant between distinct jump points of the counting processes constructed from $\{X_{ki}\}$ and the processes $\{Z_{ki}\}$. Numerically, $\hat\beta_k$ and $\hat h_k(\cdot)$ can be obtained easily through the standard Newton-Raphson algorithm (Cai et al., 1999). From Appendices 2 and 3 of Cai et al. (1999), one can show that the marginal distributions of $\hat\beta_k$ and $\hat h_k(\cdot)$ can be approximated by a normal with mean $\beta_{0k}$ and by the distribution of a mean-zero Gaussian process, respectively. To make simultaneous and global inferences about $\beta_{0k}$, $k = 1, \ldots, K$, one needs the joint distribution of the $\hat\beta_k$. It follows from Appendix 1 of Cai et al. (1999) that
$$n^{1/2}(\hat\beta_k - \beta_{0k}) \approx n^{-1/2} A_k^{-1}(\beta_{0k}) \sum_{i=1}^n U_{ki}(\beta_{0k}),$$
where
$$U_{ki}(\beta) = \int_{\tau_{ka}}^{\tau_{kb}} \{Z_{ki}(t) - \bar z_k(t;\beta)\}\,[\,I(X_{ki} \ge t) - g_k\{h_{0k}(t) + \beta' Z_{ki}(t)\} G_k(t)\,]\, d\nu_k(t) + \int_{\tau_{ka}}^{\tau_{kb}} \{b_k(t) - \bar z_k(t;\beta)\, a_k(t)\}\, G_k(t) \Bigl\{\int_0^t \pi_k^{-1}(u)\, dM_{ki}(u)\Bigr\}\, d\nu_k(t).$$
Here $\bar z_k(t;\beta)$, $a_k(t)$, $b_k(t)$ and $\pi_k(t)$ are the limits of
$$\hat z_k(t;\beta) = \frac{\sum_{i=1}^n Z_{ki}(t)\, \dot g_k\{\hat h_k(t;\beta) + \beta' Z_{ki}(t)\}}{\sum_{i=1}^n \dot g_k\{\hat h_k(t;\beta) + \beta' Z_{ki}(t)\}}, \qquad \hat a_k(t) = n^{-1}\sum_{i=1}^n g_k\{\hat h_k(t) + \hat\beta_k' Z_{ki}(t)\},$$
$$\hat b_k(t) = n^{-1}\sum_{i=1}^n Z_{ki}(t)\, g_k\{\hat h_k(t) + \hat\beta_k' Z_{ki}(t)\} \qquad \text{and} \qquad \hat\pi_k(t) = n^{-1}\sum_{j=1}^n I(X_{kj} \ge t),$$
respectively, with $\dot g_k(x) = dg_k(x)/dx$; $M_{ki}(t) = I(X_{ki} \le t, \Delta_{ki} = 0) - \int_0^t I(X_{ki} \ge u)\, d\Lambda_k(u)$, where $\Lambda_k(\cdot)$ is the cumulative hazard function of $C_{k1}$; and
$$\hat A_k(\beta) = \frac{1}{n}\sum_{i=1}^n \int_{\tau_{ka}}^{\tau_{kb}} \{Z_{ki}(t) - \hat z_k(t;\beta)\}^{\otimes 2}\, \dot g_k\{\hat h_k(t;\beta) + \beta' Z_{ki}(t)\}\, \hat G_k(t)\, d\hat\nu_k(t).$$
Now let $\beta = (\beta_1', \ldots, \beta_K')'$, $\beta_0 = (\beta_{01}', \ldots, \beta_{0K}')'$, and let $\operatorname{diag}(\hat A^{-1}(\hat\beta))$ denote the block-diagonal matrix with blocks $\hat A_1^{-1}(\hat\beta_1), \ldots, \hat A_K^{-1}(\hat\beta_K)$. Then
$$n^{1/2}(\hat\beta - \beta_0) \approx n^{-1/2}\, \operatorname{diag}(\hat A^{-1}(\hat\beta)) \sum_{i=1}^n \bigl( U_{1i}(\beta_{01})', \ldots, U_{Ki}(\beta_{0K})' \bigr)'.$$
Since $\hat A_k(\cdot)$ converges to a deterministic function ($k = 1, \ldots, K$), it follows from the central limit theorem that the limiting distribution of $n^{1/2}(\hat\beta - \beta_0)$ is normal with mean $0$. The covariance matrix can be consistently estimated by $n\hat\Gamma$, where
$$\hat\Gamma = \frac{1}{n}\, \operatorname{diag}(\hat A^{-1}(\hat\beta)) \sum_{i=1}^n \bigl( \hat U_{1i}', \ldots, \hat U_{Ki}' \bigr)' \bigl( \hat U_{1i}', \ldots, \hat U_{Ki}' \bigr)\, \operatorname{diag}(\hat A^{-1}(\hat\beta)).$$
Here $\hat U_{ki}$ is obtained by replacing all the theoretical quantities in $U_{ki}(\beta_{0k})$ with their empirical counterparts. This results in
$$\hat U_{ki} = \int_{\tau_{ka}}^{\tau_{kb}} \{Z_{ki}(t) - \hat z_k(t; \hat\beta_k)\}\,[\,I(X_{ki} \ge t) - g_k\{\hat h_k(t) + \hat\beta_k' Z_{ki}(t)\}\,\hat G_k(t)\,]\, d\hat\nu_k(t) + \int_{\tau_{ka}}^{\tau_{kb}} \{\hat b_k(t) - \hat z_k(t; \hat\beta_k)\,\hat a_k(t)\}\,\hat G_k(t) \Bigl[ \frac{(1-\Delta_{ki})\, I(X_{ki} \le t)}{\hat\pi_k(X_{ki})} - \sum_{j=1}^n \frac{(1-\Delta_{kj})\, I(X_{kj} \le X_{ki} \wedge t)}{n\, \hat\pi_k^2(X_{kj})} \Bigr]\, d\hat\nu_k(t).$$
Note that $\Lambda_k$ is replaced by its one-sample Nelson estimator. With this large-sample approximation, various kinds of inferences about the regression parameters can be made. For example, suppose that we are interested in $\eta_{0k}$, a specific component of the covariate effect $\beta_{0k}$, $k = 1, \ldots, K$. Let the estimated covariance matrix of the corresponding estimator $(\hat\eta_1, \ldots, \hat\eta_K)'$ of $\{\eta_{0k}\}$ be denoted by $\hat\Gamma_\eta$, which can be obtained from $\hat\Gamma$ directly. Then a $(1-\alpha)$ simultaneous confidence interval for $(\eta_{01}, \ldots, \eta_{0K})'$ is $\{\hat\eta_1 \pm \hat\xi\hat\gamma_1, \ldots, \hat\eta_K \pm \hat\xi\hat\gamma_K\}$, where $\hat\xi$ satisfies
$$\operatorname{pr}\Bigl( \max_k \frac{|\hat\eta_k - \eta_{0k}|}{\hat\gamma_k} \le \hat\xi \Bigr) = 1 - \alpha,$$
and $\hat\gamma_k$ is the estimated standard error of $\hat\eta_k$. To test the joint null hypothesis $\{\eta_{0k} = 0,\ k = 1, \ldots, K\}$, one may use the quadratic form $Q = (\hat\eta_1, \ldots, \hat\eta_K)\, \hat\Gamma_\eta^{-1}\, (\hat\eta_1, \ldots, \hat\eta_K)'$, which is asymptotically chi-square with $K$ degrees of freedom under the null hypothesis. Furthermore, suppose that $\eta_{01} = \cdots = \eta_{0K} = \eta_0$, a common parameter; then one can construct an optimal linear combination $\hat\eta$ of the $\hat\eta_k$ to estimate $\eta_0$, where $\hat\eta = \sum_{k=1}^K d_k \hat\eta_k$ with $\sum_{k=1}^K d_k = 1$, $d = (d_1, \ldots, d_K)' = (e'\hat\Gamma_\eta^{-1}e)^{-1}\hat\Gamma_\eta^{-1}e$ and $e = (1, \ldots, 1)'$. An estimated standard error of $\hat\eta$ is $(d'\hat\Gamma_\eta d)^{1/2}$. It is important to note that if there


is no single link function $g$ in (2.1) which works for all $K$ types of failure, it may be difficult to interpret the combined estimate $\hat\eta$. On the other hand, in the next section we show how to make simultaneous and global predictions of, for example, survival probabilities for patients with a specific set of covariates, even when the link function varies from failure to failure.

The above procedures are valid when the censoring vector is independent of the covariate vector $Z$. This assumption can be relaxed when the covariate vector $Z$ has a finite number of possible values. One possible modification of the estimating equations (2.2) and (2.3) is
$$\sum_{i=1}^n \bigl[ I(X_{ki} \ge t) - g_k\{h_k(t) + \beta_k' Z_{ki}(t)\}\, \hat G_k(t \mid Z_{ki}) \bigr] = 0, \qquad \tau_{ka} \le t \le \tau_{kb}, \qquad (2.4)$$
$$\sum_{i=1}^n \int_{\tau_{ka}}^{\tau_{kb}} Z_{ki}(t) \bigl[ I(X_{ki} \ge t) - g_k\{\hat h_k^*(t; \beta_k) + \beta_k' Z_{ki}(t)\}\, \hat G_k(t \mid Z_{ki}) \bigr]\, d\hat\nu_k(t) = 0, \qquad (2.5)$$
where $\hat G_k(t \mid z)$ is the Kaplan-Meier estimator of the survival function of the censoring variable $C_{kl}$ based on the pairs $\{(X_{kl}, \Delta_{kl})\}$ with $Z_{kl} = z$, $l = 1, \ldots, n$. In the Appendix, we show that the distribution of the roots $\{\hat\beta_k^*, k = 1, \ldots, K\}$ of the above simultaneous equations is approximately normal with mean $\{\beta_{0k}, k = 1, \ldots, K\}$ and covariance matrix $\Gamma^*$. Inference procedures similar to $Q$ and $\hat\eta$ for $\{\beta_{0k}, k = 1, \ldots, K\}$, based on these large-sample properties of $\{\hat\beta_k^*, k = 1, \ldots, K\}$, can be obtained accordingly.
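The optimal linear combination $\hat\eta = \sum_k d_k \hat\eta_k$ used above is ordinary generalized-least-squares algebra. Below is a minimal sketch (our own helper function, not from the paper) that computes the weights $d = (e'\hat\Gamma_\eta^{-1}e)^{-1}\hat\Gamma_\eta^{-1}e$, the combined estimate and its standard error:

```python
import numpy as np

def optimal_combination(eta_hat, Gamma):
    """Minimum-variance weighted average of K estimates of a common
    parameter:  d = Gamma^{-1} e / (e' Gamma^{-1} e),  with sum(d) = 1.
    Returns the combined estimate, its standard error, and the weights."""
    eta_hat = np.asarray(eta_hat, dtype=float)
    Gamma = np.asarray(Gamma, dtype=float)
    e = np.ones(len(eta_hat))
    Ginv_e = np.linalg.solve(Gamma, e)
    d = Ginv_e / (e @ Ginv_e)
    return d @ eta_hat, np.sqrt(d @ Gamma @ d), d
```

When the $K$ estimates are uncorrelated with equal variances, the weights reduce to $1/K$, i.e. a simple average.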

3. SIMULTANEOUS PREDICTIONS OF SURVIVAL PROBABILITIES AND RELATED QUANTITIES

In this section, we show how to construct simultaneous confidence intervals for survival probabilities and their related quantities, over the $K$ types of failure, for future patients with a specific set of covariates $z_0$. Let $\hat S_k(t \mid z_0) = g_k\{\hat h_k(t) + \hat\beta_k' z_0\}$, a consistent estimator of $S_k(t \mid z_0)$. It follows from Cai et al. (1999) that the process
$$W_k(t \mid z_0) = n^{1/2}\bigl\{ g_k^{-1}(\hat S_k(t \mid z_0)) - g_k^{-1}(S_k(t \mid z_0)) \bigr\}$$
is asymptotically equivalent to $\tilde W_k(t \mid z_0) = n^{-1/2}\sum_{i=1}^n W_{ki}(t \mid z_0)$, where
$$W_{ki}(t \mid z_0) = \frac{1}{c_k(t)}\Bigl[ \frac{I(X_{ki} \ge t) - g_k\{h_{0k}(t) + \beta_{0k}' Z_{ki}(t)\}\, G_k(t)}{G_k(t)} + a_k(t)\int_0^t \pi_k^{-1}(u)\, dM_{ki}(u) \Bigr] + \{z_0 - \bar z_k(t; \beta_{0k})\}'\, A_k^{-1}(\beta_{0k})\, U_{ki}(\beta_{0k}),$$
and $c_k(t)$, $a_k(t)$ and $A_k(\beta)$ are the limits of $\hat c_k(t) = n^{-1}\sum_{i=1}^n \dot g_k\{\hat h_k(t) + \hat\beta_k' Z_{ki}(t)\}$, $\hat a_k(t)$ and $\hat A_k(\beta)$, respectively. Let $\mathbf{t} = (t_1, \ldots, t_K)'$. Then $W(\mathbf{t} \mid z_0) = (W_1(t_1 \mid z_0), \ldots, W_K(t_K \mid z_0))'$ is asymptotically equivalent to $\tilde W(\mathbf{t} \mid z_0) = (\tilde W_1(t_1 \mid z_0), \ldots, \tilde W_K(t_K \mid z_0))'$, which converges to a mean-zero Gaussian process as $n \to \infty$.

Now, let $\{L_i, i = 1, \ldots, n\}$ be a random sample from the standard normal distribution, independent of the data. Furthermore, let
$$\hat W(\mathbf{t} \mid z_0) = n^{-1/2}\sum_{i=1}^n \bigl( \hat W_{1i}(t_1 \mid z_0), \ldots, \hat W_{Ki}(t_K \mid z_0) \bigr)' L_i,$$
where $\hat W_{ki}(t \mid z_0)$ is obtained by replacing all the theoretical quantities in $W_{ki}(t \mid z_0)$ by their empirical counterparts; that is,
$$\hat W_{ki}(t \mid z_0) = \frac{1}{\hat c_k(t)}\Bigl[ \frac{I(X_{ki} \ge t)}{\hat G_k(t)} - g_k\{\hat h_k(t) + \hat\beta_k' Z_{ki}(t)\} + \hat a_k(t)\Bigl\{ \frac{(1-\Delta_{ki})\, I(X_{ki} \le t)}{\hat\pi_k(X_{ki})} - \sum_{j=1}^n \frac{(1-\Delta_{kj})\, I(X_{kj} \le X_{ki} \wedge t)}{n\, \hat\pi_k^2(X_{kj})} \Bigr\} \Bigr] + \{z_0 - \hat z_k(t; \hat\beta_k)\}'\, \hat A_k^{-1}(\hat\beta_k)\, \hat U_{ki}.$$
Using arguments similar to those in Cai et al. (1999), we can show that, conditionally on the data, the multi-time process $\hat W(\mathbf{t} \mid z_0)$ has the same limiting distribution as $W(\mathbf{t} \mid z_0)$.

Various inferences about $\{S_k(t \mid z_0), k = 1, \ldots, K\}$ can be made based on this approximation. For example, to construct simultaneous confidence intervals for $\{S_k(t_0 \mid z_0), k = 1, \ldots, K\}$ at a given time point $t_0$, we first compute $J$ realizations $\{\hat w^{(j)}\}$ of $\hat W(\mathbf{t}_0 \mid z_0)$ based on $J$ random samples of $\{L_i\}$, where $\mathbf{t}_0 = (t_0, \ldots, t_0)'$. Let $\hat\Sigma(\mathbf{t}_0) = J^{-1}\sum_{j=1}^J \hat w^{(j)} \hat w^{(j)\prime}$ be the estimate of the covariance matrix of $W(\mathbf{t}_0 \mid z_0)$. Define a critical value $\hat\xi(z_0)$ such that
$$\operatorname{pr}\Bigl( \max_k \frac{|\hat W_k(t_0 \mid z_0)|}{\hat\sigma_k} \le \hat\xi(z_0) \Bigr) = 1 - \alpha, \qquad (3.1)$$
where $\hat\sigma_k^2$ is the $k$th diagonal element of $\hat\Sigma(\mathbf{t}_0)$. The probability and $\hat\xi(z_0)$ in (3.1) can be approximated using the $J$ realizations $\{\hat w^{(j)}\}$. Then a $(1-\alpha)$ confidence band for $\{S_k(t_0 \mid z_0)\}$ is
$$\bigl\{ g_k\bigl( g_k^{-1}(\hat S_k(t_0 \mid z_0)) \pm \hat\xi(z_0)\, n^{-1/2}\hat\sigma_k \bigr) \bigr\}. \qquad (3.2)$$
Now, suppose that $S_1(t_0 \mid z_0) = \cdots = S_K(t_0 \mid z_0) = S(t_0 \mid z_0)$; then one can construct an optimal linear combination $\sum_{k=1}^K d_k(z_0)\, \hat S_k(t_0 \mid z_0)$ to estimate the common probability $S(t_0 \mid z_0)$, where
$$d(z_0) = (d_1(z_0), \ldots, d_K(z_0))' = (e'\, \tilde\Sigma(\mathbf{t}_0)^{-1} e)^{-1}\, \tilde\Sigma(\mathbf{t}_0)^{-1} e,$$


and
$$\tilde\Sigma(\mathbf{t}_0) = \operatorname{diag}\bigl\{ \dot g_1\bigl(g_1^{-1}(\hat S_1(t_0 \mid z_0))\bigr), \ldots, \dot g_K\bigl(g_K^{-1}(\hat S_K(t_0 \mid z_0))\bigr) \bigr\}\; \hat\Sigma(\mathbf{t}_0)\; \operatorname{diag}\bigl\{ \dot g_1\bigl(g_1^{-1}(\hat S_1(t_0 \mid z_0))\bigr), \ldots, \dot g_K\bigl(g_K^{-1}(\hat S_K(t_0 \mid z_0))\bigr) \bigr\}.$$
An estimated standard error of the combined estimate is $n^{-1/2}\{d'(z_0)\, \tilde\Sigma(\mathbf{t}_0)\, d(z_0)\}^{1/2}$.

One can also make inferences about certain percentiles of the failure time. Let $m_{\eta k}$ be the $100\eta\%$ percentile of the $k$th type of failure time for patients with covariate $z_0$, $k = 1, \ldots, K$. A consistent estimator $\hat m_{\eta k}$ of $m_{\eta k}$ is the root of the equation $\hat S_k(m \mid z_0) = 1 - \eta$. Using the same techniques as Keaney & Wei (1994), one can show that the distribution of $\hat m_\eta = (\hat m_{\eta 1}, \ldots, \hat m_{\eta K})'$ can be approximated by a normal with mean $m_\eta = (m_{\eta 1}, \ldots, m_{\eta K})'$. Its covariance matrix, however, is difficult to estimate well directly. Instead, we use the following resampling method to estimate this matrix. Note that the distribution of
$$n^{-1/2}\, W(m_\eta \mid z_0) = \bigl( g_1^{-1}(\hat S_1(m_{\eta 1} \mid z_0)) - g_1^{-1}(S_1(m_{\eta 1} \mid z_0)), \ldots, g_K^{-1}(\hat S_K(m_{\eta K} \mid z_0)) - g_K^{-1}(S_K(m_{\eta K} \mid z_0)) \bigr)'$$
can be approximated by that of the random vector $\tilde L = n^{-1/2}\, \hat W(\hat m_\eta \mid z_0)$, where $\hat m_\eta$ is the observed value of the estimator. Let $\tilde S_k(t \mid z_0)$ be the observed $\hat S_k(t \mid z_0)$ and let $m^* = (m_1^*, \ldots, m_K^*)'$ be the solution to the equation
$$\bigl( g_1^{-1}(\tilde S_1(m_1^* \mid z_0)) - g_1^{-1}(\tilde S_1(\hat m_{\eta 1} \mid z_0)), \ldots, g_K^{-1}(\tilde S_K(m_K^* \mid z_0)) - g_K^{-1}(\tilde S_K(\hat m_{\eta K} \mid z_0)) \bigr)' = \tilde L. \qquad (3.3)$$

Then it follows from Parzen et al. (1994) that the distribution of $(\hat m_\eta - m_\eta)$ can be approximated by that of $(m^* - \hat m_\eta)$. The latter distribution can be approximated easily by generating $\tilde L$ and solving for $m^*$ through (3.3) repeatedly. The limiting covariance matrix of $(\hat m_\eta - m_\eta)$ can be estimated by the sample covariance matrix obtained from a large number of realizations of $m^*$. Confidence intervals and bands such as (3.2) for $m_\eta$, and an optimal linear combination of $\{\hat m_{\eta k}\}$, can be obtained accordingly.

4. EXAMPLE

We illustrate the new proposals with the AIDS data described in the Introduction. Following convention, we use log RNA, the base-10 logarithm of RNA, in our analysis. For this AIDS study, RNA values for each patient were obtained at Weeks 4, 8, 12 and 24. Some observations, however, were


Table 1. Estimated treatment differences between A and B (estimated standard error) using WLW and new methods

                                     New method
Week                 WLW             Proportional      Proportional
                                     hazards model     odds model
4                    0.13 (0.29)     0.37 (0.35)       0.48 (0.44)
8                    0.49 (0.37)     0.60 (0.42)       0.71 (0.51)
12                   0.85 (0.35)     1.00 (0.43)       1.18 (0.51)
24                   0.89 (0.37)     1.17 (0.45)       1.37 (0.53)
Combined over time   0.40 (0.25)     0.66 (0.31)       0.84 (0.38)

missing, mainly due to administrative censoring. To simplify our illustration, we only consider the two groups of patients treated either by d4T/Nevirapine/Ritonavir (Arm A) or d4T/3TC/Nevirapine/Nelfinavir (Arm B). Since we deal with left-censored observations, we let the "event time" be $T = -\log \mathrm{RNA}$ and $C = -\log 400$. Note that modeling the survival function of $T$ is equivalent to modeling the distribution function of log RNA. To have a rich but manageable family of models (2.1) for our analysis, we consider the subset of models whose link function $g$ is indexed by $\lambda$, where $g_\lambda(s) = (1 + \lambda e^s)^{-1/\lambda}$ if $\lambda > 0$, and $\exp(-e^s)$ if $\lambda = 0$ (Dabrowska & Doksum, 1988a). This type of transformation is closely related to the standard Box-Cox transformation for the linear regression model. Note that $\lambda = 0$ corresponds to the proportional hazards model, and $\lambda = 1$ gives the proportional odds model. Now, we use a simple additive model with link function $g_{\lambda_k}$ and a time-independent covariate vector with two components: the first component is 1 if the patient is in Group A and 0 otherwise; the second component is the baseline log RNA value. For this study, we are mainly interested in the treatment difference between Groups A and B. Let $\eta_k$ be the treatment difference at the $k$th time point. In Table 1 we present estimates of $\eta_{0k}$, $k = 1, \ldots, 4$, using (2.2) and (2.3) with $\lambda = 0, 1$. Here, $\tau_{kb} = -\log 400$, $\tau_{ka}$ was chosen so that 5% of the $\{X_{ki}\}$ lie below it (to avoid numerical instability), and $\hat\nu_k(t)$ is the counting process which jumps at the $\{X_{ki}\}$, $k = 1, \ldots, 4$. For comparison, we provide the WLW estimates based on the commonly used procedure of Wei, Lin & Weissfeld (1989) for the Cox model. The optimal linear estimates $\hat\eta$ for the common $\eta_0$ are also reported. Globally, Treatment B is significantly better than A.
It is important to note that the estimates for $\lambda = 0$ based on our method are markedly different from those based on WLW. This suggests that the Cox model may not fit the data well marginally at the four time points.
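The Dabrowska-Doksum family used here interpolates between the proportional hazards model ($\lambda = 0$) and the proportional odds model ($\lambda = 1$). The following sketch of the link (our own function name, not from the paper) makes the continuity at $\lambda = 0$ easy to verify numerically:

```python
import numpy as np

def g_link(s, lam):
    """Transformation-model link g_lambda(s) = (1 + lam * e^s)^(-1/lam)
    for lam > 0, with the limit exp(-e^s) at lam = 0.
    lam = 0: proportional hazards;  lam = 1: proportional odds."""
    s = np.asarray(s, dtype=float)
    if lam == 0.0:
        return np.exp(-np.exp(s))
    return (1.0 + lam * np.exp(s)) ** (-1.0 / lam)
```

Since $(1 + \lambda e^s)^{-1/\lambda} \to \exp(-e^s)$ as $\lambda \downarrow 0$, a tiny positive $\lambda$ reproduces the $\lambda = 0$ branch to numerical precision.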


Figure 1. Simultaneous and pointwise 0.95 confidence intervals of pr(log RNA < 4) for patients treated by A (panel a) or B (panel b) with baseline RNA = 24,000, plotted against Week 4, 8, 12 and 24 (— estimates; pointwise; · · · simultaneous).

To choose an appropriate link function $g_{\lambda_k}$ at the $k$th time point, one may select the $\lambda_k$ which minimizes a quantity measuring the discrepancy between the observed and fitted values. For example, a reasonable choice of such a measure is
$$\hat Q_k(\lambda_k) = \sum_{i=1}^n \int_{\tau_{ka}}^{\tau_{kb}} \bigl[ I(X_{ki} \ge t) - g_{\lambda_k}\{\hat h_k(t) + \hat\beta_k' Z_{ki}(t)\}\, \hat G_k(t) \bigr]^2\, d\hat\nu_k(t),$$
with $\hat\nu_k(t)$ being the counting process of $\{X_{ki}\}$. For the RNA data, we find that the optimal $\lambda$ is 0, 1, 6 and 7 for Weeks 4, 8, 12 and 24, respectively. With link functions varying over time, it is difficult to interpret a common treatment difference $\eta_0$. On the other hand, even in this situation, one can make simultaneous predictions of certain quantities related to the


Table 2. Estimated upper quartile of log RNA (estimated standard errors) for patients treated by A or B with baseline log RNA = 5

Week                 Treatment A    Treatment B
4                    3.58 (0.39)    3.39 (0.23)
8                    4.03 (0.42)    3.12 (0.48)
12                   4.71 (0.30)    3.25 (0.50)
24                   4.27 (0.19)    3.53 (0.41)
Combined over time   4.31 (0.19)    3.40 (0.21)

survival functions over these four time points. For example, clinically, an RNA value exceeding 10,000 indicates that the patient is not doing well. To this end, let $t_0 = -\log_{10} 10{,}000 = -4$. Figure 1(a) gives 0.95 simultaneous and pointwise confidence intervals, based on (3.2), for $\hat S_k(4)$, the probability that log RNA < 4, $k = 1, 2, 3, 4$, for patients treated with A and with baseline RNA = 24,000, the median of the baseline RNA values for the 181 patients in the study. Figure 1(b) gives the corresponding intervals for patients treated by B. The optimal linear combinations of the $\hat S_k(4)$, $k = 1, 2, 3, 4$, are 0.86 and 0.93, with estimated standard errors of 0.039 and 0.034, for Groups A and B, respectively. Now, suppose that we are interested in the upper quartiles of the RNA values over the four time points. In Table 2 we present estimates of this percentile for patients treated by A or B with baseline log RNA = 5. For those patients who were not doing well at the time of randomization, the upper quartiles of the RNA values in Group B tend to be significantly lower than those in Group A during the 24-week follow-up.
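The simultaneous intervals in Figure 1 rest on the critical value $\hat\xi(z_0)$ of (3.1), which is read off the $J$ resampled realizations as a quantile of the standardized maximum. A minimal sketch of that step (function name ours; it assumes the $J \times K$ matrix of realizations is already available):

```python
import numpy as np

def simultaneous_critical_value(W, alpha=0.05):
    """W: (J, K) array whose rows are resampled realizations of the
    limiting process at K time points.  Returns the componentwise
    standard deviations and the critical value xi with
    pr( max_k |W_k| / sd_k <= xi ) ~= 1 - alpha, as in (3.1)."""
    W = np.asarray(W, dtype=float)
    sd = W.std(axis=0)
    max_stat = np.max(np.abs(W) / sd, axis=1)
    return sd, np.quantile(max_stat, 1.0 - alpha)
```

With $K = 1$ the procedure collapses to an ordinary pointwise interval; as $K$ grows, the maximum pushes $\hat\xi(z_0)$ above the usual normal quantile, which is what widens the simultaneous band relative to the pointwise one.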

5. REMARKS

It is important to note that our methods are not valid in cases where "failures" are terminal events, i.e., events that prevent future failures from occurring. Such a situation may occur if a patient has unacceptably high HIV RNA levels early in the study and must be prematurely removed from it. Recently, a simple version of the competing risks problem was revisited by Fine & Gray (1999) and Fine (1999). Using an inadequate model has detrimental effects on the estimation of parameters. For a general link function $g$ in model (2.1), however, it may be difficult to interpret the physical meaning of the regression parameter. On the other hand, one can always provide useful information on the prediction of survival probabilities and their related quantities for future patients under


general models. Although we present an ad hoc numerical method for model selection, more research is needed on model checking and selection even for univariate survival analysis.
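The ad hoc selection criterion $\hat Q_k(\lambda)$ of Section 4 is straightforward to compute once $\hat h_k$, $\hat\beta_k$ and $\hat G_k$ are in hand: it sums squared residuals between $I(X_i \ge t)$ and its fitted value over the jump points of the counting process. A rough sketch, with our own names and with the fitted $\hat h_k$ and $\hat G_k$ supplied as functions:

```python
import numpy as np

def discrepancy(X, Z, beta, h_fn, G_fn, g, tau_a, tau_b):
    """Q(lambda): sum over subjects and over the jump points of the
    counting process of [ I(X_i >= t) - g(h(t) + beta'Z_i) G(t) ]^2."""
    jumps = np.sort(X[(X >= tau_a) & (X <= tau_b)])
    Q = 0.0
    for t in jumps:
        resid = (X >= t).astype(float) - g(h_fn(t) + Z @ beta) * G_fn(t)
        Q += float(np.sum(resid ** 2))
    return Q
```

In the model-selection step one would evaluate this over a grid of $\lambda$ values, refitting $\hat h_k$ and $\hat\beta_k$ for each candidate link, and keep the minimizer.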

6. APPENDIX: ASYMPTOTIC JOINT DISTRIBUTION OF $\{\hat\beta_k^*\}$ WITH DISCRETE COVARIATES

Let $\hat h_k^*(t; \beta)$ be the root of equation (2.4) and let $\hat h_k^*(t) = \hat h_k^*(t; \hat\beta_k^*)$. It follows from arguments similar to those in Appendix 2 of Cai et al. (1999) that $n^{1/2}(\hat\beta_k^* - \beta_{0k})$ is asymptotically equivalent to
$$\{A_k^*(\beta_{0k})\}^{-1}\, n^{-1/2}\sum_{i=1}^n U_{ki}^*(\beta_{0k}),$$
where
$$U_{ki}^*(\beta) = \int_{\tau_{ka}}^{\tau_{kb}} \{Z_{ki}(t) - \bar z_k^*(t;\beta)\}\,[\,I(X_{ki} \ge t) - g_k\{h_{0k}(t) + \beta' Z_{ki}(t)\}\, G_k(t \mid Z_{ki})\,]\, d\nu_k(t) + \int_{\tau_{ka}}^{\tau_{kb}} \{b_k^*(t \mid Z_{ki}) - \bar z_k^*(t;\beta)\, a_k^*(t \mid Z_{ki})\}\, G_k(t \mid Z_{ki}) \Bigl\{\int_0^t \frac{dM_{ki}(u \mid Z_{ki})}{\pi_k^*(u \mid Z_{ki})}\Bigr\}\, d\nu_k(t).$$
Here $\bar z_k^*(t;\beta)$, $b_k^*(t \mid z)$, $a_k^*(t \mid z)$ and $\pi_k^*(t \mid z)$ are the limits of
$$\hat z_k^*(t;\beta) = \frac{\sum_{i=1}^n Z_{ki}(t)\,\dot g_k\{\hat h_k^*(t;\beta) + \beta' Z_{ki}(t)\}\,\hat G_k(t \mid Z_{ki})}{\sum_{i=1}^n \dot g_k\{\hat h_k^*(t;\beta) + \beta' Z_{ki}(t)\}\,\hat G_k(t \mid Z_{ki})},$$
$$\hat b_k^*(t \mid z) = n^{-1}\sum_{i=1}^n I(Z_{ki} = z)\, Z_{ki}(t)\, g_k\{\hat h_k^*(t) + \hat\beta_k^{*\prime} Z_{ki}(t)\}, \qquad \hat a_k^*(t \mid z) = n^{-1}\sum_{i=1}^n I(Z_{ki} = z)\, g_k\{\hat h_k^*(t) + \hat\beta_k^{*\prime} Z_{ki}(t)\}$$
and
$$\hat\pi_k^*(t \mid z) = n^{-1}\sum_{i=1}^n I(X_{ki} \ge t,\, Z_{ki} = z),$$
respectively; $M_{ki}(t \mid z) = I(X_{ki} \le t, \Delta_{ki} = 0) - \int_0^t I(X_{ki} \ge u)\, d\Lambda_{kz}(u)$, where $\Lambda_{kz}(\cdot)$ is the common cumulative hazard function of the $C_{ki}$ whose covariate vector is $z$; and
$$\hat A_k^*(\beta) = \frac{1}{n}\sum_{i=1}^n \int_{\tau_{ka}}^{\tau_{kb}} \{Z_{ki}(t) - \hat z_k^*(t;\beta)\}^{\otimes 2}\,\dot g_k\{\hat h_k^*(t;\beta) + \beta' Z_{ki}(t)\}\,\hat G_k(t \mid Z_{ki})\, d\hat\nu_k(t).$$
The empirical counterpart of $U_{ki}^*(\beta_{0k})$ is
$$\hat U_{ki}^* = \int_{\tau_{ka}}^{\tau_{kb}} \{Z_{ki}(t) - \hat z_k^*(t; \hat\beta_k^*)\}\,[\,I(X_{ki} \ge t) - g_k\{\hat h_k^*(t) + \hat\beta_k^{*\prime} Z_{ki}(t)\}\,\hat G_k(t \mid Z_{ki})\,]\, d\hat\nu_k(t) + \int_{\tau_{ka}}^{\tau_{kb}} \{\hat b_k^*(t \mid Z_{ki}) - \hat z_k^*(t; \hat\beta_k^*)\,\hat a_k^*(t \mid Z_{ki})\}\,\hat G_k(t \mid Z_{ki}) \Bigl[\frac{(1-\Delta_{ki})\, I(X_{ki} \le t)}{\hat\pi_k^*(X_{ki} \mid Z_{ki})} - \sum_{j=1}^n \frac{(1-\Delta_{kj})\, I(Z_{kj} = Z_{ki})\, I(X_{kj} \le X_{ki} \wedge t)}{n\, \hat\pi_k^{*2}(X_{kj} \mid Z_{kj})}\Bigr]\, d\hat\nu_k(t),$$
and the covariance matrix estimate of $\{\hat\beta_k^*\}$ is then formed exactly as $\hat\Gamma$ in Section 2, with $\hat U_{ki}^*$ and $\hat A_k^*$ in place of $\hat U_{ki}$ and $\hat A_k$.
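The covariate-stratified Kaplan-Meier estimate $\hat G_k(t \mid z)$ that replaces the pooled $\hat G_k(t)$ in (2.4)-(2.5) simply refits the censoring survival within each level of the discrete covariate. A minimal sketch (our own helper; assumes no tied times within a stratum):

```python
import numpy as np

def stratified_censoring_km(X, delta, Z):
    """Kaplan-Meier estimate of the censoring survival G(t | z), fitted
    separately within each level z of a discrete covariate.
    Returns a dict  z -> (sorted times, right-continuous KM values)."""
    out = {}
    for z in np.unique(Z):
        mask = (Z == z)
        Xz, dz = X[mask], delta[mask]
        order = np.argsort(Xz)
        Xz, dz = Xz[order], dz[order]
        n, G, vals = len(Xz), 1.0, []
        for i in range(n):
            if dz[i] == 0:                # censoring is the 'event'
                G *= 1.0 - 1.0 / (n - i)
            vals.append(G)
        out[z] = (Xz, np.array(vals))
    return out
```

Each stratum uses only its own risk set, which is exactly what makes the procedure valid when censoring depends on the (finitely many) covariate values.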

REFERENCES

Andersen, P. K., Borgan, Ø., Gill, R. D. and Keiding, N. (1995). Statistical Models Based on Counting Processes. Springer-Verlag.
Cai, J. and Prentice, R. L. (1995). Estimating equations for hazard ratio parameters based on correlated failure time data. Biometrika 82, 151-164.
Cai, T., Wei, L. J. and Wilcox, M. (1999). Semi-parametric regression analysis for clustered failure time data. Technical report, Dept. of Biostatistics, Harvard University.
Chen, L. and Wei, L. J. (1997). Analysis of multivariate survival times with non-proportional hazards models. In: Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis, pp. 23-36, Lin, D. Y. and Fleming, T. R. (Eds).
Cheng, S. C., Wei, L. J. and Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika 82, 835-845.
Cheng, S. C., Wei, L. J. and Ying, Z. (1997). Prediction of survival probabilities with semi-parametric transformation models. J. Am. Statist. Assoc. 92, 227-235.
Dabrowska, D. and Doksum, K. (1988a). Estimation and testing in a two-sample generalized odds-rate model. J. Am. Statist. Assoc. 83, 744-749.
Dabrowska, D. and Doksum, K. (1988b). Partial likelihood in transformation models with censored data. Scand. J. Statist. 15, 1-23.
Fine, J. (1999). Analyzing competing risks data with transformation models. J. R. Statist. Soc. B, to appear.
Fine, J. and Gray, R. (1999). A proportional hazards model for the subdistribution of a competing risk. J. Am. Statist. Assoc. 94, 496-509.
Fine, J. P., Ying, Z. and Wei, L. J. (1998). On the linear transformation model for censored data. Biometrika 85, 980-986.
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley.
Keaney, K. M. and Wei, L. J. (1994). Interim analyses based on median survival times. Biometrika 81, 279-286.
Parzen, M. I., Wei, L. J. and Ying, Z. (1994). A resampling method based on pivotal estimating functions. Biometrika 81, 341-350.
Scharfstein, D. O., Tsiatis, A. A. and Gilbert, P. (1998). Semi-parametric efficient estimation in the generalized odds-rate class of regression models for right-censored time-to-event data. Lifetime Data Analysis 4, 355-391.
Wei, L. J., Lin, D. Y. and Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J. Am. Statist. Assoc. 84, 1065-1073.

Asymptotics in Statistics and Probability, pp. 47-70. M. L. Puri (Ed.), VSP, 2000.

LOCAL ESTIMATION OF A BIOMETRIC FUNCTION WITH COVARIATE EFFECTS

ZONGWU CAI* and LIANFEN QIAN
Department of Mathematics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA, E-mail: [email protected]
Department of Mathematics, Florida Atlantic University, Boca Raton, FL 33431, USA, E-mail: [email protected]

ABSTRACT

In the analysis of life tables, one biometric function of interest is the life expectancy at age $t$, $e(t) = E(T - t \mid T > t)$. In this paper $e(t)$ is extended to a regression model, the proportional mean residual life regression model, with both censoring and explanatory variables. The model usually assumes that the covariate has a log-linear effect on the mean residual life. We consider the proportional mean residual life regression model with a nonparametric risk effect, and we discuss estimation of the risk function and its derivatives for two types of baseline mean residual life: parametrized and non-parametrized. In the parametric baseline mean residual life case, inference is based on a local likelihood function, while in the nonparametric baseline case we propose a simple approach, based on the exponential regression idea, to find the estimator. This simple method makes the implementation easier. Finally, the consistency and asymptotic normality of the resulting estimators are established. A simulation study is presented to illustrate the estimation procedures.

1. INTRODUCTION

Let $T$ be a non-negative random variable with a finite mean and a density function $s(\cdot)$. Let $S(t) = 1 - F(t)$ be the survival function. Define
$$e(t) = E(T - t \mid T > t) = S^{-1}(t) \int_t^\infty S(u)\, du.$$

*Cai's work was supported, in part, by the National Science Foundation grant DMS 0072400 and funds provided by the University of North Carolina at Charlotte.


For life tables, $e(t)$ is called the life expectancy at age $t$, or more generally a biometric function (Chiang, 1960). It is also called the mean residual life (MRL) function in the reliability literature. In biometry, $e(t)$ is defined via the force of mortality (hazard function) $\lambda(t) = s(t)/S(t)$:
$$e(t) = \int_0^\infty \exp\Bigl\{ -\int_0^v \lambda(t + y)\, dy \Bigr\}\, dv.$$
Like $\lambda(\cdot)$ and $s(\cdot)$, $e(t)$ also determines $S(t)$:
$$S(t) = \frac{e(0)}{e(t)} \exp\Bigl\{ -\int_0^t e^{-1}(u)\, du \Bigr\}.$$
For a detailed interpretation of the MRL function $e(t)$, we refer to the papers by Dasu (1991) and Maguluri and Zhang (1994). There is a vast literature on the nonparametric estimation of $e(t)$ for both uncensored and censored data; see Yang (1978), Csörgő, Csörgő and Horváth (1984), Lee, Park and Sohn (1993), and Li (1997), to name just a few. In demographic and reliability studies, the MRL function may be more important than the hazard function, since the former deals with the entire residual life distribution, whereas the latter relates only to the risk of immediate failure. The MRL is obviously an important function in actuarial work relating to life insurance.

During the past two decades, the biometric function $e(t)$ has been extended in several different directions. For example, Hall and Wellner (1984) introduced a class of survival distributions with linear MRL function, $e(t) = At + B$ ($A > -1$, $B > 0$), which covers the Pareto ($A > 0$), exponential ($A = 0$) and rescaled Beta ($-1 < A < 0$) distributions. Oakes and Dasu (1990) proposed a family of semiparametric proportional mean residual life (PMRL) models: two survivor functions $S(\cdot)$ and $S_0(\cdot)$ are said to have PMRL if
$$e(t) = \theta e_0(t) \qquad \text{for all } t \ge 0,\ \theta > 0. \qquad (1.1)$$
Recently, Maguluri and Zhang (1994) extended the model (1.1) to a more general framework with covariate $Z$:
$$e(t \mid z) = \exp(\psi' z)\, e_0(t). \qquad (1.2)$$
Here $e_0(\cdot)$ serves as the MRL corresponding to a baseline survivor function $S_0(t)$. They proposed two methods to estimate the parameter $\psi$, one of which is based on the maximum likelihood equation of the exponential regression model, and the other on the underlying proportional hazards structure of the model and Cox's estimating equation.
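The inversion formula $S(t) = \{e(0)/e(t)\} \exp\{-\int_0^t e^{-1}(u)\, du\}$ can be checked numerically for any candidate MRL function. The quadrature sketch below is our own illustration and assumes $e(\cdot)$ accepts NumPy arrays:

```python
import numpy as np

def survival_from_mrl(e, t, n_grid=20001):
    """Recover S(t) from a mean residual life function e(.) via
       S(t) = (e(0)/e(t)) * exp( - int_0^t du / e(u) ),
    using the trapezoidal rule on a uniform grid."""
    u = np.linspace(0.0, t, n_grid)
    ee = e(u)
    integral = float(np.sum((1.0 / ee[1:] + 1.0 / ee[:-1]) * 0.5 * np.diff(u)))
    return float(ee[0] / ee[-1] * np.exp(-integral))
```

For the linear MRL $e(t) = At + B$ of Hall and Wellner (1984), this reproduces the Pareto-type survival $S(t) = \{B/(At + B)\}^{1 + 1/A}$, and a constant MRL recovers the exponential distribution.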


In this article, we extend the model (1.2) to a more popular and general nonparametric regression model with covariate effect $Z$:
$$e(t \mid z) = \Psi(z)\, e_0(t), \qquad (1.3)$$
which we call the proportional mean residual life regression model. Clearly, $e(t \mid z)$ is the conditional mean residual life function of $T - t$ given $T > t$ and $Z = z$. When $\Psi(0) = 1$, the function $e_0(t)$ is the conditional MRL function of $T - t$ given $T > t$ and $Z = 0$, and is called the baseline mean residual function. The model (1.3) looks similar to the proportional hazards model (Cox's model). Following the lead of Cox's model, we consider the PMRL model in its reparametrized form $\Psi(z) = \exp(\psi(z))$. Here $\psi(\cdot)$ is called the risk function, a term common in the proportional hazards literature; see Fan, Gijbels, and King (1997). Thus the major interest for the PMRL model is to estimate the risk function $\psi(\cdot)$. We consider two estimation approaches: a local likelihood approach, and a local likelihood approach based on exponential regression fitting. The former, discussed in Section 2, is mainly inspired by Fan, Gijbels, and King (1997) for the Cox model; the latter, described in Section 3, comes from Prentice (1973) for exponential regression fitting.

In many applications the survival times of the studied subjects are not fully observed but are subject to right-censoring, due to the termination of the study or early withdrawal. Consider the bivariate data $\{(T_i, Z_i);\ i = 1, \ldots, n\}$, an i.i.d. sample from the population $(T, Z)$. Under the independent censoring model, in which the i.i.d. censoring times $C_1, \ldots, C_n$ are independent of the survival times given the covariates, one only observes the censored data $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$ as well as the associated covariate $Z_i$. For notational simplicity, it is assumed throughout this paper that the random variables $T$ and $C$ are positive and continuous and that the covariate $Z$ remains constant over time. In this paper, we consider the problem of estimating the risk function in the model (1.3).
Section 2 deals with the situation where the baseline mean residual life function is parametrized and discusses inference based on the local likelihood; the consistency and asymptotic normality of the resulting estimator are also investigated. For a nonparametric baseline mean residual life function, in Section 3, a simple method based on the exponential regression approach is proposed to estimate the risk function, and its large-sample properties are derived. In Section 4, we investigate the finite-sample behavior of the local likelihood method based on the exponential regression procedure.


2. LOCAL LIKELIHOOD METHOD

2.1. Local likelihood estimation

Let s(t | z) denote the conditional density function of T given Z = z, and let S(t | z) = P{T > 11 Z = z) be its conditional survivor function. Under the independent censoring scheme and the usual assumption about uninformative censoring, the conditional likelihood function is

L = ∏_u s(Yᵢ | Zᵢ) ∏_c S(Yᵢ | Zᵢ),

where ∏_u and ∏_c denote, respectively, the product over the uncensored and the censored individuals. This kind of likelihood is often seen in the survival analysis literature (Cox and Oakes, 1984, p. 81). We use the following notation. Denote E₀(t) = ∫₀ᵗ e₀⁻¹(u) du, e₁(t | z) = (∂/∂t) e(t | z), and g₀(t) = log{e₀(t)/e₀(0)}. Assume temporarily that the baseline mean residual function e₀(·) has been parametrized as e₀(t) = e₀(t; θ) and that ψ(z) has been parametrized as ψ(z) = ψ(z; β). Therefore, under model (1.3), we have

log(L) = Σᵢ₌₁ⁿ [ δᵢ g(Yᵢ; β, θ | Zᵢ) − exp{−ψ(Zᵢ; β)} E₀(Yᵢ; θ) − g₀(Yᵢ; θ) ],   (2.1)

where

g(t; β, θ | z) = log[e₁(t; θ) + exp{−ψ(z; β)}] − log e₀(t; θ)  and  e₁(t; θ) = (d/dt) e₀(t; θ).

Maximization of (2.1) leads to the maximum likelihood estimates (MLE) of θ and β. Suppose now that ψ(z) is of a nonparametric nature, and assume that the pth order derivative of ψ(Z) at z exists for p ≥ 1. Then, by Taylor's expansion, locally around z, ψ(Z) can be approximated by

ψ(Z) ≈ Zᵀβ,   (2.2)

where β = (β₀, ..., β_p)ᵀ and Z = (1, Z − z, ..., (Z − z)^p)ᵀ. Note that β depends on z. Using the local model (2.2), for the given data {(δᵢ, Yᵢ, Zᵢ)}, we obtain the local log-likelihood

ℓₙ(β, θ) = (1/n) Σᵢ₌₁ⁿ [ δᵢ ℓ(Yᵢ; β, θ | Zᵢ) − exp(−Zᵢᵀβ) E₀(Yᵢ; θ) − g₀(Yᵢ; θ) ] K_h(Zᵢ − z),   (2.3)

Estimate of Biometric Function

where Zᵢ = (1, Zᵢ − z, ..., (Zᵢ − z)^p)ᵀ,

ℓ(t; β, θ | Z) = log{e₁(t; θ) + exp(−Zᵀβ)} − log e₀(t; θ),

h = hₙ is the bandwidth parameter, K(·) is a kernel function, and K_h(·) = h⁻¹K(·/h). The local MLEs β̂ and θ̂ are the roots of the local likelihood equations

(1/n) Σᵢ₌₁ⁿ K_h(Zᵢ − z) [ E₀(Yᵢ; θ) − δᵢ / {e₁(Yᵢ; θ) + exp(−Zᵢᵀβ)} ] exp(−Zᵢᵀβ) Zᵢ = 0   (2.4)

and

(1/n) Σᵢ₌₁ⁿ K_h(Zᵢ − z) (∂/∂θ) [ δᵢ ℓ(Yᵢ; β, θ | Zᵢ) − exp(−Zᵢᵀβ) E₀(Yᵢ; θ) − g₀(Yᵢ; θ) ] = 0.   (2.5)

CONDITION A. ... ; (vii) nh → ∞.

CONDITION B. (i) There exists an η > 0 such that

E{E₀(Y; θ₀)^{2+η} | Z},  E{‖E′(Y; θ₀ | Z)‖^{2+η} | Z},  E{‖E₀′(Y; θ₀)‖^{2+η} | Z}  and  E{|e₁(Y; θ₀)|^{2+η} | Z}

are continuous at the point Z = z. (ii) nh^{2p+3} is bounded.

We now state the asymptotic consistency in Theorem 3 and the asymptotic normality in Theorem 4; their proofs are relegated to the Appendix.

THEOREM 3. Under Condition A, there exists a solution (β̂, θ̂) to the local likelihood equations (2.4) and (2.5) such that

H(β̂ − β₀) →_P 0  and  θ̂ − θ₀ →_P 0.

THEOREM 4. Under Conditions A and B, the solution given in Theorem 3 is asymptotically normal:

√(nh) {H(β̂ − β₀) − bₙ(z)} →_D N(0, σ²(z)),   (2.10)

where

bₙ(z) = [ψ^{(p+1)}(z)/(p+1)!] h^{p+1} ν⁻¹(K) ∫ u^{p+1} u K(u) du,   (2.11)

σ²(z) = f⁻¹(z) S₀(z; θ₀)⁻¹ Σ(z; θ₀) S₀(z; θ₀)⁻¹,  and  ν(K^j) = ∫ u uᵀ K^j(u) du, j = 1, 2.

Remark 1. Note that the bias term bₙ(z) of β̂ given in (2.11) has the same expression as that of least-squares nonparametric regression (Fan and Gijbels, 1996, p. 62). This is not surprising: the bias term is independent of the model, because it comes from the approximation error. As an application of Theorem 4, the asymptotic normality of the local polynomial estimator of ψ^{(ν)}(·) (ν = 0, ..., p) can be obtained easily from (2.10), which is stated as follows.
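The local polynomial building blocks of (2.2)–(2.3) — the design vector Zᵢ = (1, Zᵢ − z, ..., (Zᵢ − z)^p)ᵀ and the kernel weights K_h(Zᵢ − z) — can be sketched in code. This is a generic illustration under our own naming, not the authors' implementation.

```python
import numpy as np

def local_design(Zs, z, p):
    """Design matrix with rows (1, Z_i - z, ..., (Z_i - z)^p)."""
    return np.vander(Zs - z, N=p + 1, increasing=True)

def kernel_weights(Zs, z, h):
    """Epanechnikov kernel weights K_h(Z_i - z) = h^{-1} K((Z_i - z)/h)."""
    u = (Zs - z) / h
    return 0.75 * np.maximum(1.0 - u**2, 0.0) / h

Zs = np.linspace(0.0, 1.0, 11)
X = local_design(Zs, z=0.5, p=2)       # columns: 1, (Z - z), (Z - z)^2
w = kernel_weights(Zs, z=0.5, h=0.3)   # zero outside |Z - z| <= h
```

A weighted likelihood built from `X` and `w` then plays the role of (2.3).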

COROLLARY 5. Under the conditions of Theorem 4, and if K(·) is symmetric, then

√(nh) {H(β̂ − β₀) − bₙ(z)} →_D N(0, ... ν⁻¹(K) ν(K²) ν⁻¹(K)).

3. LOCAL LIKELIHOOD FOR EXPONENTIAL REGRESSION FITTING

The local likelihood estimating equation based on exponential regression fitting is

Uₙ(β) = (1/n) Σᵢ₌₁ⁿ δᵢ Zᵢ K_h(Zᵢ − z) − [ Σᵢ₌₁ⁿ δᵢ K_h(Zᵢ − z) / Σᵢ₌₁ⁿ exp(−Zᵢᵀβ) Yᵢ K_h(Zᵢ − z) ] (1/n) Σᵢ₌₁ⁿ exp(−Zᵢᵀβ) Yᵢ Zᵢ K_h(Zᵢ − z) = 0.   (3.1)

It is easy to see that U̇ₙ(β) is non-positively definite and that (3.1) has a unique solution, denoted by β̂ₛ. If it is known that S₀(t) = exp(−t/μ₀), then the equation (3.1) is the local MLE of this exponential regression model and β̂ₛ is asymptotically efficient (Prentice, 1973). For the general PMRL regression model (1.3), the estimating equation (3.1) is still asymptotically consistent, by the fact that Uₙ(β) → 0 in probability. Furthermore, we have asymptotic normality for β̂ₛ (see Theorem 6). Note that the function ψ(z) is not directly estimable, since (3.1) does not involve the intercept β₀ = ψ(z), due to the cancellation. This is not surprising, since from the PMRL model (1.3), ψ(z) is only identifiable to within a constant factor. The identifiability of ψ(z) is ensured by imposing the condition ψ(0) = 0. Then the function ψ(z) = ∫₀ᶻ ψ′(t) dt can be estimated by

ψ̂(z) = ∫₀ᶻ ψ̂′(t) dt.
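This integration step can be sketched with a cumulative trapezoidal rule; for illustration we integrate a known derivative, with the normalization ψ(0) = 0 built in.

```python
import numpy as np

def integrate_derivative(t_grid, dpsi_hat):
    """Cumulative trapezoidal rule: psi_hat(t_k) ~= integral_0^{t_k} psi_hat'(t) dt.

    Assumes t_grid starts at 0, so psi_hat(0) = 0 (the identifiability
    normalization psi(0) = 0).
    """
    dt = np.diff(t_grid)
    increments = 0.5 * (dpsi_hat[1:] + dpsi_hat[:-1]) * dt
    return np.concatenate(([0.0], np.cumsum(increments)))

t = np.linspace(0.0, 1.0, 201)
psi_hat = integrate_derivative(t, 1.0 - 2.0 * t)  # derivative of z(1 - z)
```

In practice `dpsi_hat` would hold the estimated derivative values ψ̂′(tₖ) on the grid.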

For practical implementation, Tibshirani and Hastie (1987) suggested approximating the integration by the trapezoidal rule. Since the equation (3.1) does not involve β₀ = ψ(z), we need to re-write (3.1). To this end, let

C = (0  I_p),  H* = C H Cᵀ,  β* = H* C β,  and  Z* = H*⁻¹ C Z.

Correspondingly, let β₀* = H* C β₀ and u* = (u, ..., u^p)ᵀ.


Then, (3.1) becomes

Uₙ(β*) = a₁ₙ − (a₂ₙ/a₃ₙ) (1/n) Σᵢ₌₁ⁿ exp(−Zᵢ*ᵀβ*) Yᵢ Zᵢ* K_h(Zᵢ − z) = 0,   (3.2)

where

a₁ₙ = (1/n) Σᵢ₌₁ⁿ δᵢ Zᵢ* K_h(Zᵢ − z),   a₂ₙ = (1/n) Σᵢ₌₁ⁿ δᵢ K_h(Zᵢ − z),

and

a₃ₙ = (1/n) Σᵢ₌₁ⁿ exp(−Zᵢ*ᵀβ*) Yᵢ K_h(Zᵢ − z).

Let β̂* be the solution of (3.2). Then ψ̂^{(l)}(z) = l! β̂*ₗ / hˡ, l ≥ 1, where β̂*ₗ is the l-th element of β̂*. Set, for j = 1, 2,

ν*(K^j) = ∫ u* u*ᵀ K^j(u) du  and  ν̄*(K) = ν*(K) − ( ∫ u* K(u) du )( ∫ u* K(u) du )ᵀ.
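An estimating equation of this local exponential-regression type can be solved numerically. The sketch below uses the unprofiled local score Σᵢ K_h(Zᵢ − z){exp(−Zᵢᵀβ)Yᵢ − δᵢ}Zᵢ = 0 solved by Newton's method; the data, bandwidth, and function names are our own illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def solve_local_exp_score(Zs, Y, delta, z, h, p=1, iters=50):
    """Newton iterations for the local censored-exponential score
    U(b) = sum_i w_i (exp(-x_i'b) Y_i - delta_i) x_i = 0,
    with x_i = (1, Z_i - z, ..., (Z_i - z)^p) and Epanechnikov weights w_i."""
    d = Zs - z
    X = np.vander(d, N=p + 1, increasing=True)
    u = d / h
    w = 0.75 * np.maximum(1.0 - u**2, 0.0) / h
    b = np.zeros(p + 1)
    for _ in range(iters):
        lam = np.exp(-X @ b)                       # exp(-x_i'b)
        U = X.T @ (w * (lam * Y - delta))          # score vector
        J = -(X * (w * lam * Y)[:, None]).T @ X    # Jacobian dU/db (<= 0)
        b -= np.linalg.solve(J, U)
    return b

rng = np.random.default_rng(1)
Zs = rng.uniform(0.0, 1.0, 400)
T = rng.exponential(np.exp(Zs))                    # hypothetical survival model
C = rng.uniform(0.0, 5.0, 400)
Y, delta = np.minimum(T, C), (T <= C).astype(float)
beta_hat = solve_local_exp_score(Zs, Y, delta, z=0.5, h=0.4)
```

The negative-semidefinite Jacobian mirrors the non-positive definiteness of U̇ₙ(β) noted above, so the Newton iteration is well behaved.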

To establish the asymptotic properties of β̂*, the following conditions are needed.

CONDITION C. (i) The functions E(Y | Z) and E(δ | Z) are continuous at Z = z. (ii) The kernel function K(·) is a bounded density with a compact support. (iii) The function ψ(·) has a continuous (p + 1)th derivative around the point z. (iv) The density f(·) of Z is continuous at the point z and f(z) > 0. (v) nh → ∞.

CONDITION D. (i) There exists an η > 0 such that E(|Y|^{2+η} | Z) is finite and continuous at Z = z. (ii) nh^{2p+3} is bounded.

THEOREM 6. Under Condition C, we have

H*(β̂* − β₀*) →_P 0.

Suppose in addition that Condition D holds; then

√(nh) {β̂* − β₀* − bₙ*(z)} →_D N(0, σ*(z)²),


where

bₙ*(z) = [ψ^{(p+1)}(z)/(p+1)!] h^{p+1} ν̄*⁻¹(K) ∫ u^{p+1} u* K(u) du,   (3.3)

and

σ*(z)² = [ E{ (δ − [E(δ | Z = z)/E(Y | Z = z)] Y)² | Z = z } / ( {E(δ | Z = z)}² f(z) ) ] ν̄*⁻¹(K) ν*(K²) ν̄*⁻¹(K).   (3.4)

A brief proof of Theorem 6 is given in the Appendix. It can be seen easily that the asymptotic variance is somewhat different from that in Theorem 4.

4. SIMULATION STUDY

In this section, we conduct a simulation study to illustrate the methods proposed in Sections 2 and 3. We consider the case of a single covariate Z and study two examples: one with the unit exponential as the baseline distribution, and the other with a Hall–Wellner-type baseline distribution with e₀(t) = 3t + 1. For the Hall–Wellner-type baseline mean residual life distribution, the proportional mean residual life regression model (1.3) is equivalent to the following transformed regression model:

log E₀(T) = −log(A + ψ(Z)) + log ε,

where ε has the standard exponential distribution and A is the slope of the baseline mean residual life function. In this simulation study, the risk function is taken to be ψ(z) = z(1 − z). The censoring random variable C is independent of Z and T, and its distribution is indicated in the following tables. Three sample sizes and three bandwidths are considered, and the Epanechnikov kernel K(u) = 0.75(1 − u²)I(|u| ≤ 1) is used. The estimator ψ̂^{(ν)}(·) is assessed via the square-root of average squared errors (RASE)

RASE_ν = [ (1/n_grid) Σ_{k=1}^{n_grid} {ψ̂^{(ν)}(zₖ) − ψ^{(ν)}(zₖ)}² ]^{1/2},
where {zₖ, k = 1, ..., n_grid} are the grid points at which the function ψ^{(ν)}(·) is estimated. The grid points are taken from 0.05 to 0.90 with increment 0.01. The covariate Z is generated from U(0, 1), and the censoring random variable C is simulated from U(0, 5) and U(0, 10) for the standard exponential baseline residual mean function, and from U(0, 1) and U(0, 5) for the Hall–Wellner baseline residual mean function. We simulate data and compute


Table 1. Simulation results for the exponential distribution with 100 replicates, three sample sizes and three bandwidths

                   n = 250                 n = 400                 n = 700
Censoring: U(0, 5), censoring rate: 15%-30%
h = 0.5    0.218 ± 0.164, 0.184    0.159 ± 0.120, 0.124    0.113 ± 0.089, 0.097
h = 1.0    0.147 ± 0.105, 0.116    0.120 ± 0.091, 0.097    0.095 ± 0.070, 0.080
h = 1.5    0.131 ± 0.098, 0.107    0.116 ± 0.090, 0.094    0.088 ± 0.064, 0.082
Censoring: U(0, 10), censoring rate: 15%-20%
h = 0.5    0.186 ± 0.151, 0.153    0.139 ± 0.092, 0.128    0.104 ± 0.089, 0.087
h = 1.0    0.146 ± 0.101, 0.126    0.116 ± 0.082, 0.097    0.083 ± 0.071, 0.065
h = 1.5    0.133 ± 0.112, 0.110    0.105 ± 0.083, 0.084    0.089 ± 0.057, 0.082

Figure 1. The estimated curves of ψ(z) and ψ′(z) for the exponential baseline residual mean life function. True function: solid curve; estimated function: dashed curve.

the estimates of ψ^{(ν)}(z) based on (3.2). For both examples, we repeat the simulation 100 times. For the standard exponential baseline residual mean function, Table 1 reports the RASEs of the local likelihood estimate of the risk function ψ(·) based on 100 replicates. In each cell, the first, second and third numbers represent the mean, standard deviation and median of RASE; the columns correspond to the three sample sizes n = 250, 400 and 700. Figure 1 presents the estimated ψ(·) and ψ′(·) from a random sample with n = 500, h = 0.5, Z ~ U(−3, 3), C ~ U(0, 5) and a censoring rate of 7.8%, which shows that the local estimation method performs reasonably well. For the Hall–Wellner baseline residual mean function, we consider three sample sizes, n = 500, 700, and 1000. A summary of the simulation results


Table 2. Simulation results for the Hall-Wellner distribution with 100 replicates, three sample sizes and three bandwidths

                   n = 500                 n = 700                 n = 1000
Censoring: U(0, 1), censoring rate: 30%-45%
h = 0.5    0.111 ± 0.093, 0.087    0.101 ± 0.070, 0.084    0.1072 ± 0.0745, 0.1038
h = 1.0    0.113 ± 0.069, 0.106    0.094 ± 0.058, 0.089    0.091 ± 0.053, 0.093
h = 1.5    0.137 ± 0.080, 0.128    0.124 ± 0.082, 0.118    0.140 ± 0.071, 0.146
Censoring: U(0, 5), censoring rate: 9%-20%
h = 0.5    0.173 ± 0.118, 0.159    0.147 ± 0.097, 0.122    0.141 ± 0.104, 0.119
h = 1.0    0.160 ± 0.123, 0.131    0.114 ± 0.076, 0.102    0.121 ± 0.081, 0.107
h = 1.5    0.185 ± 0.121, 0.186    0.148 ± 0.096, 0.141    0.147 ± 0.087, 0.144

Figure 2. The estimated curves of ψ(z) and ψ′(z) for the Hall-Wellner baseline residual mean life function. True function: solid curve; estimated function: dashed curve.

on the 100 RASE values of the local estimate of ψ^{(ν)}(·) based on 100 simulations is given in Table 2. The estimated curves of ψ(·) and ψ′(·) are depicted in Figure 2 for n = 700, h = 1.0, Z ~ U(−3, 3), and C ~ U(0, 5), with a censoring rate of 5.8%. Note that the risk function appears to be somewhat underestimated when the trapezoidal rule of Tibshirani and Hastie (1987) is used. From Tables 1 and 2, one can observe that the median of RASE is always less than the mean. Furthermore, as we can see from Tables 1 and 2 and Figures 1 and 2, the estimate for the exponential baseline mean residual function performs better than that for the Hall-Wellner case. This is natural, since the former estimate is efficient when the underlying baseline distribution is exponential.
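The RASE criterion and the Epanechnikov kernel used in these experiments can be sketched directly; the "estimate" below is a hypothetical stand-in for ψ̂, used only to exercise the computation.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) on |u| <= 1, zero elsewhere."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def rase(psi_hat_vals, psi_vals):
    """Square-root of the average squared error over the grid."""
    return np.sqrt(np.mean((psi_hat_vals - psi_vals) ** 2))

grid = np.arange(0.05, 0.90 + 1e-9, 0.01)   # grid used in the paper
psi = grid * (1.0 - grid)                    # true risk function
psi_hat = psi + 0.05                         # hypothetical estimate
err = rase(psi_hat, psi)
```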


APPENDIX: PROOFS

We still use the same notation as in Sections 2 and 3.

Proof of Proposition 1. We employ the martingale technique. Let N(t) = I(Y ≤ t, δ = 1), X(t) = I(Y ≥ t), and let F_t = σ{Z, N(u), X(u), 0 ≤ u ≤ t} be the history up to time t. Set

M(t) = N(t) − ∫₀ᵗ X(u) [e₁(u; θ₀) + Ψ⁻¹(Z)] e₀(u; θ₀)⁻¹ du.   (A.1)

Then M(t) is an F_t-martingale (Fleming and Harrington, 1991, p. 19). Since

∫₀^∞ [e₁(t; θ₀) + Ψ⁻¹(Z)]⁻¹ dM(t) = ∫₀^∞ [e₁(t; θ₀) + Ψ⁻¹(Z)]⁻¹ dN(t) − ∫₀^∞ I(Y ≥ t) e₀(t; θ₀)⁻¹ dt = δ/[e₁(Y; θ₀) + Ψ⁻¹(Z)] − E₀(Y; θ₀),

and {e₁(t; θ₀) + Ψ⁻¹(Z)}⁻¹ is F_{t−}-measurable, (2.6) follows by taking the conditional expectation of the above equality with respect to Z. Next, writing the score contribution as the stochastic integral

∫₀^∞ ( Ψ⁻¹(Z) [e₁(t; θ₀) + Ψ⁻¹(Z)]⁻¹, −E′(t; θ₀ | Z) )ᵀ dM(t),

and noting that, by (A.1), conditioning on Z one has

⟨M, M⟩(dt) = X(t) [e₁(t; θ₀) + Ψ⁻¹(Z)] e₀(t; θ₀)⁻¹ dt,

the first term on the right-hand side of (A.14) becomes

E [ K_h²(Z − z) ∫₀^∞ ( Ψ⁻¹(Z) [e₁(t; θ₀) + Ψ⁻¹(Z)]⁻¹, −E′(t; θ₀ | Z) )^{⊗2} [e₁(t; θ₀) + Ψ⁻¹(Z)] e₀(t; θ₀)⁻¹ X(t) dt ].


(ii) K(u) → 0 as |u| → ∞; (iii) ∫ u²K(u) du < ∞.

Assumption B. max(a, c) → 0, n × min(a², c) → ∞ as n → ∞.

The Estimation of Conditional Densities

Assumption D. The density f of Zᵢ is positive and has a continuous and bounded second derivative at z; the joint density f(·, ·) of (Yᵢ, Zᵢ) has continuous second partial derivatives at (y, z).

Remarks. (1) Assumptions P, M and D describe the dependence and marginal properties of Xₜ. P and M_β(i) are the same as Roussas's (1991a) A1(i)(ii). We replace Roussas's assumption A1(iii) on ρ-mixing, with ρ(j) = O(j^{−v}) for v > 1, by β-mixing with Σⱼ β(j) < ∞ in M_β(ii). By Davydov (1973), for a stationary homogeneous Markov process, the definition of β-mixing is equivalent to:

β(j) = ∫ |P_j(x, ·) − ν(·)|_var ν(dx),   (3.1)

where |·|_var denotes the total variation norm of a signed measure, ν(·) is the stationary invariant measure, and P_j(x, A) = Pr(X_{1+j} ∈ A | X₁ = x) is the j-step transition probability kernel. Under Assumptions P and M_β(i), this definition is equivalent to that in Assumption M_β(ii). Moreover, Assumption M_β(ii) implies that {Xₜ} is stationary β-mixing, and hence α-mixing. In addition to M_β(i), Roussas (1967, 1969) assumed condition (D₀), which is equivalent to uniform (φ-) mixing. However, by Bradley (1986), for a stationary Markov process, either the φ-mixing or ρ-mixing coefficient is identically one for all time lags or it decays exponentially. In general, α-mixing allows for more temporal dependence than β-mixing and ρ-mixing, which is why authors such as Robinson (1983), Rosenblatt (1985), Masry (1989) and Bosq (1996) have assumed α-mixing. However, under M_β(i), α-mixing has the same decay rate as β-mixing; see e.g. Rosenblatt (1985), Davydov (1973) and Bradley (1986). Another advantage of the β-mixing assumption is that it automatically implies an assumption often used in asymptotic theory for nonparametric density estimation based on other types of mixing processes:

|f_j(y, z) − f(y)f(z)| ≤ C < ∞,  for all j ≥ 1,

where f_j(·, ·) is the joint density of (X₁, X_{1+j}); see e.g. Roussas (1991a, Assumption A5(iii)), Masry (1989) and Bosq (1996). For these reasons one might prefer M_β(ii) over other mixing assumptions, given Assumptions P and M_β(i). However, the Markov assumption M_β(i) is not really needed for the results in this paper, though it does partially motivate interest in f(y|z). We shall only assume that the stationary sequence {Xₜ} satisfies P and M_α or M_β. Notice that even though M_β(i) implies M_α(i), M_β(ii) may not imply M_α(ii).

(2) Assumption K is the same as Assumptions A2(i), (ii), (iii) of Roussas (1991a), and imposes no serious practical restriction on the kernel.
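A kernel ratio estimator of the general class under discussion — a joint density estimate over a marginal density estimate, with separate bandwidths — can be sketched as follows. The Gaussian kernel, the bandwidth names, and the simulated data are our own illustrative assumptions.

```python
import numpy as np

def gauss(u):
    """Standard normal kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def cond_density_hat(y, z, Ys, Zs, a, b, c):
    """Ratio estimate fhat(y, z; a, b) / fhat(z; c) of f(y | z).
    Bandwidths: a for the y-direction and b for z in the numerator,
    c for z in the denominator (a generic sketch of the estimator class)."""
    num = np.mean(gauss((y - Ys) / a) * gauss((z - Zs) / b)) / (a * b)
    den = np.mean(gauss((z - Zs) / c)) / c
    return num / den

rng = np.random.default_rng(2)
Zs = rng.normal(size=2000)
Ys = 0.5 * Zs + np.sqrt(0.75) * rng.normal(size=2000)  # f(y|z) is N(z/2, 3/4)
fhat = cond_density_hat(0.0, 0.0, Ys, Zs, a=0.3, b=0.3, c=0.3)
```

At (y, z) = (0, 0) the true conditional density is 1/√(2π · 0.75) ≈ 0.46, so the estimate should land nearby.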


X. Chen, O. Linton and P. M. Robinson

(3) Assumption B is a minimal restriction on the bandwidth numbers for the central limit theorem to hold.

(4) Assumption D is the same as Assumption A5(i)(ii) of Roussas (1991a), and is again standard, though it would be possible to obtain results under milder smoothness assumptions, or indeed under stronger ones if Assumption K were relaxed to permit the use of higher-order kernels; in particular this would affect the order of magnitude of the asymptotic bias, see Lemma 1 below.

We first discuss the asymptotic bias AB(y, z), by which we mean the leading terms in the deviation of the ratio of expectations of numerator and denominator of f̂_aac(y|z) from f(y|z); further to our remarks of the previous section, there is no presumption that the expectation of f̂_aac(y|z) exists. Define

B₁(y, z) = [ ∫u²K(u) du / (2f(z)) ] { ∂²f(y, z)/∂y² + ∂²f(y, z)/∂z² },

B₂(y, z) = [ ∫u²K(u) du / (2f(z)) ] f(y|z) ∂²f(z)/∂z².

LEMMA 1. Under Assumptions P, K, D and a, c → 0 as n → ∞, we have:

AB(y, z) = a²B₁(y, z) − c²B₂(y, z) + o(max{a², c²}).

Hence for case (2.4), a = b = c^{1/2}, we have:

AB(y, z) = a²B₁(y, z) + o(a²),   (3.2)

and for case (2.5), a = b = c, we have:

AB(y, z) = a²{B₁(y, z) − B₂(y, z)} + o(a²).   (3.3)

This is proved by a standard Taylor series argument, as in Rosenblatt (1969, 1985) and Robinson (1983). We next consider AV(y, z), the asymptotic variance of f̂_aac(y|z), where this refers to the variance in the limit distribution, and makes no presumption that f̂_aac(y|z) has finite variance. Define

V₁(y, z) = ∫K(u)² du · f(y|z)²/f(z),

V₂(y, z) = ( ∫K(u)² du )² f(y, z)/f(z)²,

V₃(y, z, a, c) = 2 ∫K(u)K(au/c) du · f(y, z) f(y|z)/f(z)².

LEMMA 2. Under Assumptions P, M_α or M_β, K, B and D, we have:

AV(y, z) = V₁(y, z)/(nc) + V₂(y, z)/(na²) − V₃(y, z, a, c)/(nc) + o(1/(n × min{a², c})).

If, further, either

a = O(c^{1/2}),  c = o(a),   (3.4)

or

a = O(c),   (3.5)

then

AV(y, z) = V₁(y, z)/(nc) + V₂(y, z)/(na²) + o(1/(n × min{a², c})).   (3.6)

Hence for case (2.4), a = b = c^{1/2}, we have

AV(y, z) = V₁(y, z)/(na²) + V₂(y, z)/(na²) + o(1/(na²)),   (3.7)

while for case (2.5), a = b = c, we have

AV(y, z) = V₂(y, z)/(na²) + o(1/(na²)).   (3.8)

This is proved via a standard linearization argument (see Roussas (1967, 1969), Rosenblatt (1969)), along with the use of the α-mixing and other assumptions to show that the outcome is identical to that when the Xₜ are independent observations, as in Robinson (1983). The fact that V₃(y, z, a, c) is absent in (3.6) when we impose (3.4) or (3.5) follows from a dominated convergence argument. Note that (2.4) implies (3.4) and (2.5) implies (3.5). Compared with case (2.4), the variance is always smaller in case (2.5), because the contribution to the variance from the denominator is negligible. As indicated by Lemma 1, however, the bias may be more or less. For example, suppose that the Xₜ are independent standard normal random variables. Then in case (2.5) the variance is proportional to φ(y)/φ(z), where φ is the standard normal density function, while the bias is proportional to (y² − 1)φ(y). In case (2.4), the variance is proportional to [1 + φ(y)]φ(y)/φ(z), while the bias is proportional to (y² + z² − 2)/[2(1 − ρ²)^{5/2}] [...]; the Gaussian-reference bandwidth takes the form

[ ... ∫K(u)² du ... / ( n (15ρ⁴ − 50ρ² + 39) [ ∫u²K(u) du ]² ) ]^{1/6}.   (3.12)

Of course the intention is that these be used without the Gaussianity assumption, in which case they are not optimal, though they do have the optimal rate of convergence, adapt naturally to scale at least, and may hopefully be useful even when the actual process is not close to being Gaussian. Notice that both formulae (3.11) and (3.12) are invariant with respect to μ, but depend on the unknown σ and ρ, for which we may insert the sample standard deviation and lag-1 sample autocorrelation

σ̂² = (n + 1)⁻¹ Σᵢ₌₁ⁿ⁺¹ (Xᵢ − X̄)²,   ρ̂ = (n − 1)⁻¹ Σᵢ₌₁ⁿ (Xᵢ − X̄)(Xᵢ₊₁ − X̄) / σ̂²,

respectively, where X̄ = (n + 1)⁻¹ Σᵢ₌₁ⁿ⁺¹ Xᵢ. Due to the n^{1/2}-consistency of σ̂ and ρ̂ (under the assumptions listed above, along with finite fourth moments of Xᵢ), the consequent plug-in bandwidths should be fairly stable in moderate samples, even though they are not in general approximately optimal. A simplified version, suitable for independent Xᵢ, puts ρ = 0 in (3.11) and (3.12). We next consider central limit theory.

THEOREM 3. Let Assumptions P, M_α or M_β, K, B and D be satisfied, and let a, c > 0 be such that:

lim_{n→∞} [ n × min{a², c} × max{a⁴, c⁴} ] = 0,   (3.13)

and the limits lim_{n→∞} min(1, a²/c) and lim_{n→∞} min(1, c/a²) exist, with

lim_{n→∞} min(1, a²/c) + lim_{n→∞} min(1, c/a²) > 0.

Then we have

√(n × min{a², c}) ( f̂_aac(y|z) − f(y|z) ) ⇒ N(0, V(y, z)),

where

V(y, z) = min{1, lim_{n→∞}(a²/c)} V₁(y, z) + min{1, lim_{n→∞}(c/a²)} V₂(y, z).

Hence, with B simplified to na² → ∞, and (3.13) to na⁶ → 0, we have for case (2.4), a = b = c^{1/2},

√(na²) ( f̂_aac(y|z) − f(y|z) ) ⇒ N(0, V₁(y, z) + V₂(y, z)),

and for case (2.5), a = b = c,

√(na²) ( f̂_aac(y|z) − f(y|z) ) ⇒ N(0, V₂(y, z)).

The theorem is proved by applying Robinson (1983, Lemma 7.1) in the α-mixing case, and in the β-mixing case by proceeding similarly but employing results like those of Viennet (1996). As usual, the asymptotic variances are of the same type as those in the case of independent observations, in that there is no contribution from 'covariance' terms, though of course in the present instance, if we impose independence by writing f(y|z) = f(y), there is some slight simplification in our asymptotic variance formulae. As in Robinson (1983), we can consistently estimate the limiting variances by inserting smoothed nonparametric estimates of the unknown components, in order to carry out pointwise inferences. These are useful because, as in that reference and Rosenblatt (1970, 1971), it is possible to extend the result to a multivariate central limit theorem, indicating asymptotic independence of the √(na²) ( f̂_aac(yₖ|z) − f(yₖ|z) ) across finitely many distinct fixed points y₁, y₂, ...
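The plug-in quantities for the bandwidth formulae (3.11)–(3.12) — a sample standard deviation and a lag-1 sample autocorrelation — can be sketched as follows; the AR(1) test series is hypothetical, and the normalizations differ slightly from the paper's (n + 1) convention.

```python
import numpy as np

def plug_in_moments(X):
    """Sample standard deviation and lag-1 autocorrelation of a series X.
    (Simple mean-based normalizations; the paper divides by n+1 and n-1.)"""
    Xbar = X.mean()
    sigma2 = np.mean((X - Xbar) ** 2)
    rho = np.mean((X[:-1] - Xbar) * (X[1:] - Xbar)) / sigma2
    return np.sqrt(sigma2), rho

rng = np.random.default_rng(3)
n, rho0 = 5000, 0.5                      # hypothetical Gaussian AR(1), rho = 0.5
e = rng.normal(size=n)
X = np.empty(n)
X[0] = e[0]                              # stationary start: Var(X_0) = 1
for t in range(1, n):
    X[t] = rho0 * X[t - 1] + np.sqrt(1.0 - rho0**2) * e[t]
sigma_hat, rho_hat = plug_in_moments(X)
```

These estimates would then be substituted for σ and ρ in (3.11) and (3.12).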

REFERENCES

1. Bosq, D. (1996). Nonparametric Statistics for Stochastic Processes. Lecture Notes in Statistics, Springer-Verlag, New York.
2. Bradley, R. (1986). Basic properties of strong mixing conditions. In: Dependence in Probability and Statistics, pp. 165-192, E. Eberlein and M. S. Taqqu (Eds), Birkhäuser.
3. Cacoullos, T. (1966). Estimation of a multivariate density. Ann. Inst. Statist. Math. 18, 179-189.
4. Chanda, K. C. (1983). Density estimation for linear processes. Ann. Inst. Statist. Math. 35, 439-445.
5. Csörgő, S. and Mielniczuk, J. (1991). Density estimation under long-range dependence. Ann. Statist. 23, 990-999.
6. Davydov, Y. A. (1973). Mixing conditions for Markov chains. Theor. Probab. Appl. 18, 312-328.
7. Lee, M.-J. (1996). Methods of Moments and Semiparametric Econometrics for Limited Dependent Variable Models. Springer-Verlag, New York.
8. Masry, E. (1989). Nonparametric estimation of conditional probability densities and expectations of stationary processes: strong consistency and rates. Stoch. Proc. Appl. 32, 109-127.
9. Parzen, E. (1962). On the estimation of a probability density and mode. Ann. Math. Statist. 33, 1065-1076.
10. Robinson, P. M. (1983). Nonparametric estimators for time series. J. Time Series Anal. 4, 185-207.
11. Robinson, P. M. (1986). On the consistency and finite-sample properties of nonparametric kernel time series regression, autoregression and density estimators. Ann. Inst. Statist. Math. 38, 539-549.
12. Robinson, P. M. (1991). Nonparametric function estimation for long memory time series. In: Nonparametric and Semiparametric Methods in Econometrics and Statistics, pp. 437-457, W. A. Barnett, J. Powell and G. E. Tauchen (Eds). Cambridge University Press, Cambridge.
13. Rosenblatt, M. (1969). Conditional probability density and regression estimators. In: Multivariate Analysis II, pp. 25-31, P. R. Krishnaiah (Ed.), Academic Press, New York.
14. Rosenblatt, M. (1970). Density estimates and Markov processes. In: Nonparametric Techniques in Statistical Inference, pp. 199-210, M. L. Puri (Ed.), Cambridge University Press, Cambridge.
15. Rosenblatt, M. (1971). Curve estimates. Ann. Math. Statist. 42, 1815-1842.
16. Rosenblatt, M. (1985). Stationary Sequences and Random Fields. Birkhäuser, Boston.
17. Roussas, G. (1967). Nonparametric estimation in Markov processes. Ann. Inst. Statist. Math. 21, 73-87.
18. Roussas, G. (1969). Nonparametric estimation of the transition distribution function of a Markov process. Ann. Math. Statist. 40, 1386-1400.
19. Roussas, G. (1991a). Estimation of transition distribution function and its quantiles in Markov processes: strong consistency and asymptotic normality. In: Nonparametric Functional Estimation and Related Topics, pp. 443-462, G. Roussas (Ed.), Kluwer, Amsterdam.
20. Roussas, G. (1991b). Recursive estimation of the transition distribution function of a Markov process: asymptotic normality. Statist. Probab. Lett. 11, 435-447.
21. Samanta, M. (1989). Non-parametric estimation of conditional quantiles. Statist. Probab. Lett. 7, 407-412.
22. Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
23. Viennet, G. (1996). Inequalities for absolutely regular sequences: application to density estimation. Probab. Theory Relat. Fields 107, 467-492.
24. Yakowitz, S. (1979). Nonparametric estimation of Markov transition functions. Ann. Statist. 7, 671-679.

Asymptotics in Statistics and Probability, pp. 85-116. M. L. Puri (Ed.), 2000 VSP

FUNCTIONAL LIMIT THEOREMS FOR INDUCED ORDER STATISTICS OF A SAMPLE FROM A DOMAIN OF ATTRACTION OF α-STABLE LAW, α ∈ (0, 2)*

YU. DAVYDOV and V. EGOROV

Laboratoire de Statistique et Probabilités, Université des Sciences et Technologies de Lille, 59655 Villeneuve d'Ascq, France
Department of Mathematics, University of Electrical Engineering, 197376 St. Petersburg, Russia

ABSTRACT

We study the asymptotic behavior of empirical-type processes constructed from induced order statistics, corresponding to a d-dimensional sample from the domain of attraction of an α-stable law with α < 2. The functional central limit theorem is proved under conditions close to optimal ones. For the limit processes, integral representations with respect to certain stable stochastic measures are given.

1. INTRODUCTION

Let Zᵢ = (Xᵢ, Yᵢ), i = 1, 2, ..., be independent copies of a random vector Z = (X, Y) such that X ∈ R¹, Y ∈ R^d. Denote by X_{n,1} ≤ X_{n,2} ≤ ... ≤ X_{n,n} the order statistics of a sample X₁, ..., Xₙ, and by Y_{n,1}, Y_{n,2}, ..., Y_{n,n} the corresponding values of the vectors Y₁, ..., Yₙ. The random vectors (Y_{n,i}, i ≤ n) are called induced order statistics (IOS). This generalization of order statistics was first introduced by (David, 1973) under the name of concomitants of order statistics and simultaneously by

*Work partially supported by Russian Foundation of Fundamental Research N 99-01-00724, N 99-01-00112.


(Bhattacharya, 1974). During the last twenty years the asymptotic theory of induced order statistics has been intensively developed. Interested readers may refer to (Bhattacharya, 1984, David, 1991 or Davydov and Egorov, 1998) for extensive reviews. Without loss of generality, due to the Smirnov integral transformation, we can investigate the limit behavior of the sums of the Y_{n,i} for X having the standard uniform distribution. Therefore, in this paper we suppose that X is uniformly distributed on the interval [0, 1]. Denote

η_n(t) = b_n⁻¹ ( Σ_{i ≤ nt} Y_{n,i} − f_n(t) ),   (1)

ξ_n(t) = b_n⁻¹ ( Σ_{i: Xᵢ ≤ t} Yᵢ − f_n(t) ),   (2)

where b_n and f_n(t) are some normalizing and centering constants. In the case of the domain of attraction of a normal law, (Davydov and Egorov, 1998) proved the following functional central limit theorem (FCLT).

THEOREM 1.1. Let EY = 0, E|Y|² < ∞, where |Y| is the Euclidean norm in R^d. Set b_n = √n, f_n(t) = 0. Then

η_n →_{D^d([0,1])} η,

where

η(t) = ∫₀ᵗ σ(s) dW(s) + ∫₀ᵗ m(s) dν(s),

W is a d-dimensional standard Wiener process, ν is a Brownian bridge independent of W, m(t) = E{Y | X = t}, σ(t)σ(t)ᵀ = cov(Y | X = t), and D^d([0, 1]) is the Skorohod space of d-dimensional functions. If, in addition, m is continuous in the open interval (0, 1) and for some C > 0, a ∈ (0, 1/2),

|m(t)| ≤ C t^{−a}(1 − t)^{−a},   t ∈ (0, 1),

then

ξ_n →_{D^d([0,1])} ξ,

where ξ(t) = η(t) − m(t)ν(t).

The main goal of this work is to establish similar results for Y from the domain of attraction of a stable law with index α ∈ (0, 2). This problem is


more complicated than the previous one. Roughly speaking, for the FCLT to hold, the conditional distribution of Y given X = s should be attracted, with probability one, to a stable law with the same normalization. (Obviously, this condition holds if E|Y|² < ∞.) Therefore we prove our main results provided that the vector (X, Y) has some special type of distribution. It should be noted that the random processes (1), (2) are special cases of generalized empirical and quantile processes. Some versions of the FCLT for such processes (only for α > 1) were proved by (Andersen et al., 1988, Arcones, 1998). Our theorems do not follow from these general results. Moreover, they are valid for all α < 2. The paper is organized as follows. After notation (Section 2), in Section 3 we formulate and partially prove auxiliary facts about stable laws and stochastic integrals, some of which are known, others new. We also introduce some notation there. In Section 4 we formulate and discuss the main results. Section 5 contains some special lemmas used in the proofs. Section 6 is devoted to the proofs of our theorems, and Section 7 contains an application to the problem of convexification of random walks and some open questions.

2. NOTATION

(Ω, F, P)   main probability space.
X_{n,i}   order statistics of the sample (Xᵢ, i ≤ n).
Y_{n,i}   induced order statistics.
ξ_n, η_n   main random processes, which are the subject of study.
ξ, η   limit random processes.
Y ≅ L   the random vector Y is L-distributed.
X ≅ Y   random vectors X and Y have the same distribution.
σ_s   measure on S^{d−1} symmetric to the measure σ: σ_s(A) = σ(−A), A ∈ B_{S^{d−1}}.
L^d_p(Ω, F, P), p > 0   space of d-dimensional random vectors X such that E|X|^p < ∞.
‖X‖_p = max_{i≤d} (E|Xᵢ|^p)^{1/p}   norm in L^d_p(Ω, F, P), p ≥ 1.
L^k_p(E, ℰ, m), p > 0   space of k-dimensional functions f such that ∫_E |f|^p dm < ∞.
‖f‖_E = max_{i≤k} ( ∫_E |fᵢ|^p dm )^{1/p}, p ≥ 1   norm in L^k_p(E, ℰ, m).
L_p(E, ℰ, m) = L¹_p(E, ℰ, m).
D^d([0, 1])   Skorohod space of d-dimensional functions.
‖f‖* = sup_{[0,1]} |f(t)|.
|·|   Euclidean norm in R^d.
μ ∗ ν   convolution of the measures μ and ν.
L(d, α, σ)   d-dimensional strictly stable law with an index α ∈ (0, 2) and with a spectral measure σ.
Dom(L(d, α, σ))   domain of attraction of the law L(d, α, σ).
Dom(L(d, α, σ), h)   part of Dom(L(d, α, σ)) corresponding to the normalization with a fixed slowly varying function h.
E{Y; X ∈ A}   E{Y 1(X ∈ A)}.


Let φ_Y(θ), θ ∈ R^d, be the characteristic function of an α-stable vector Y in R^d. Then there exist a finite measure σ on the unit sphere S^{d−1} of R^d and a vector l in R^d such that:

a) if α ≠ 1, then

φ_Y(θ) = exp{ −∫_{S^{d−1}} |⟨θ, s⟩|^α ( 1 − i sign(⟨θ, s⟩) tan(πα/2) ) σ(ds) + i⟨θ, l⟩ };   (4)

b) if α = 1, then

φ_Y(θ) = exp{ −∫_{S^{d−1}} ( |⟨θ, s⟩| + i(2/π)⟨θ, s⟩ log|⟨θ, s⟩| ) σ(ds) + i⟨θ, l⟩ }.   (5)

If σ_n ⇒ σ, then

L(d, α, σ_n) ⇒ L(d, α, σ).   (9)
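Formula (4) can be evaluated numerically for a discrete spectral measure σ concentrated on finitely many points of S^{d−1}. The sketch below (d = 2, α = 1.5) uses hypothetical atoms and weights of our own choosing.

```python
import numpy as np

def stable_cf(theta, alpha, atoms, weights, shift):
    """Characteristic function (4), alpha != 1, for a discrete spectral
    measure sigma = sum_k weights[k] * delta_{atoms[k]} on S^{d-1}."""
    ip = atoms @ theta                                    # <theta, s_k>
    integrand = np.abs(ip)**alpha * (
        1.0 - 1j * np.sign(ip) * np.tan(np.pi * alpha / 2.0))
    return np.exp(-np.sum(weights * integrand) + 1j * (shift @ theta))

alpha = 1.5
atoms = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
weights = np.array([0.25, 0.25, 0.25, 0.25])              # symmetric sigma
shift = np.zeros(2)
phi = stable_cf(np.array([0.7, -0.3]), alpha, atoms, weights, shift)
```

For a symmetric σ (as here) the imaginary parts cancel pairwise, so φ_Y is real-valued, and |φ_Y(θ)| ≤ 1 always.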

Remark 3.2. In fact, if σ ≠ 0, then from σ_n ⇒ σ it follows that L(d, α, σ_n) converges in total variation to L(d, α, σ), but we do not use this fact (see Davydov and Nagaev, 1999, for the proof).

3.2. Stochastic measures

Let E ≠ ∅, let ℰ₀ be a ring of subsets of E, and let (H, B_H) be a complete separable linear metric space.

DEFINITION 3.1. A random process {ν(A), A ∈ ℰ₀} with values in H is called an H-valued independently scattered stochastic measure (i.s.s.m.) if:

1) for any pairwise disjoint sets A₁, A₂, ..., A_n ∈ ℰ₀ the random variables ν(A₁), ν(A₂), ..., ν(A_n) are independent;

2) for any sequence of pairwise disjoint sets A₁, A₂, ... from ℰ₀ such that ∪A_n ∈ ℰ₀, with probability 1,

ν(∪A_n) = Σ_n ν(A_n).   (10)

The following precision of Theorem 5.4 in (Kallenberg, 1983) is very useful for the construction of an i.s.s.m.

LEMMA 3.2. Let {Q_A, A ∈ ℰ₀} be a family of probabilities defined on B_H such that the following conditions are satisfied:

1) If A, B are disjoint, then

Q_{A∪B} = Q_A ∗ Q_B.   (11)

2) If A_n ∈ ℰ₀, A_n ↓ ∅, then

Q_{A_n} ⇒ δ₀.   (12)

Then there exists an H-valued i.s.s.m. ν such that ν(A) ≅ Q_A for any A ∈ ℰ₀.

Proof. To prove the lemma it suffices to construct a compatible system of finite-dimensional distributions and to apply the Kolmogorov theorem. If A₁, A₂, ..., A_n are disjoint, then we set

V_{A₁,A₂,...,A_n} = Q_{A₁} × Q_{A₂} × ... × Q_{A_n}.

If Aᵢ = ∪_{j∈Δᵢ} B_j, Δᵢ ⊂ {1, 2, ..., m}, i ≤ n, and the B_j, j = 1, ..., m, are disjoint, then we define the map π: R^m → R^n,

π(x₁, ..., x_m) = (y₁, ..., y_n),  where yᵢ = Σ_{j∈Δᵢ} x_j, i ≤ n,

and set

V_{A₁,A₂,...,A_n} = V_{B₁,B₂,...,B_m} π⁻¹.   (13)

Obviously, this definition is correct, and for the system of finite-dimensional distributions V_{A₁,A₂,...,A_n} the hypotheses of the Kolmogorov theorem are fulfilled. Hence, there exists a random process {ν(A), A ∈ ℰ₀} such that for any A₁, A₂, ..., A_n ∈ ℰ₀,

(ν(A₁), ν(A₂), ..., ν(A_n)) ≅ V_{A₁,A₂,...,A_n}.

Next, if A ∩ B = ∅, then ν(A) and ν(B) are independent, and

ν(A ∪ B) = ν(A) + ν(B).   (14)

Indeed, the joint distribution of the random variables ν(A), ν(B) and ν(A ∪ B) is V_{A,B} π̃⁻¹, where

π̃: R² → R³,   π̃(x, y) = (x, y, x + y),

we have (14). Now we prove a-additivity. Let (An) be a sequence of disjoint sets from So, and let A = UA„, A e £ 0 - We will show that v(A) = ^ v ( A „ )

(15)

a.s.

The variables (v(A„)) being independent, it suffices to prove that n -?> v ( A ) ,

J^v(Ak)

n

> oo,

(16)

k=l

where Bn = assumption

U" = ] Ak. Since A\Bn

e £0 and A\Bn

QA\B„

I

0 , we have by

So,

or, equivalently, v(A\Bn)

X

0,

(17)

which implies (15).



Let C(d, a, a) be a strictly stable ¿-dimensional distribution and (E, £, m) be a space with a c-finite measure m, £ 0 = {A e £ : m(A) < oo}, H = Rd. Denote QA

= C(d,a,m(A)a),

By stability property

QA*

QB — QAUB

A e

£0.

92

Yu. Davydov and V. Egorov

A n B = 0. From (9) it follows (12). Hence by Lemma 3.2 there exists i.s.s.m. v such that

if

v(A)

= C(d,a,m(A)o).

(18)

This measure is called an a-stable i.s.s.m. subordinate to the law C(d, a, a). The measure m is called a control measure for i.s.s.m. v. 3.3.

Stochastic

integrals

Let v be an a-stable i.s.s.m. on (E, £, m) subordinate to the law £(d, a, a). For any measurable function f : E Rl such that | f\adm

j

< oo

(19)

(in other words / e L a ( £ \ £, m)) define an integral l { f )=

j j d v .

Our construction is a slight modification of the one of (Samorodnitsky and Taqqu, 1994, pp. 122-124), and in the case d = 1 it is reduced to the integral constructed there with yS = 0. For a step-function / = J2"j=ì , Ai n Aj = 0 for i / j , set I ( f ) =

r JE

n

(20)

fdv = T a j v ( A j ) .

j=i

Obviously, (20) has with probability one the same meaning for different representations of 1(f), and 1 ( f ) is linear for step-functions. From independence of v(Aj), j < and from (8), (7) it follows that I ( f ) = C ( d , a , a

f

) ,

(21)

where a

f

= a

+

( f ) < j + a - ( f ) v s ,a

±

( f )= j j ± d m .

(22)

Let / e L a ( E , £, m), (/„) be a sequence of step-functions such that 1) / „

/

a.s.

2) 1/nOOI ^ om(C( 1, a, a;), h) for any I e Rd, where 07 is some finite measure on S°, and the s.v. function h is the same for any I. Then there exists a d-dimensional strictly stable law C(d, a, a) such that Y e Vom{C{d, a,

o),h),

and (32) holds. Clearly, (32) defines completely the measure a.

96

Yu. Davydov and V. Egorov

Proof. The first statement of this lemma follows from the definition of a domain of attraction and continuous mapping theorem. Now we prove (32). Let and fy are the characteristic functions of § and {£, /), respectively. Then for a ^ 1 1 the following statement holds:

+oo, +

(36)

(37)

97

Limit Theorems for Induced Order Statistics L E M M A 3.7. IfY

e Vom{C(d,a,

0, r

lim Proof. Remark

a),h)

„\

thenfor

any Ae

BSd-1,

ct(3A)

Y

y P | — e A,

(38)

\Y\ > r \ = a{A).

See (Araujo and Gine, 1994, p. 152). 3.3.

=



The normalizing constants in (30) can be taken such that ban = nh(bn).

(39)

The following interesting result seems to be new. LEMMA3.8.

Let Y G T>om(£(d,

8 > 0 , and the random

a,

CT), h), r) eLa+s(Q,

variables

Y and rj are independent.

Yt] G

Vom{C{d,a,a'),h)

T,

P) for

some

Then

where

a' = a+a

+ a- r ) =f L

(41)

implies the relation lim - ^ L p ( - 1 1 e l|F/?|

r->oo h(r)

A,

\Yr)\ > r,

rj >

oj J

= a+L

(42)

where a+ is defined in (40). Let 5] e (0, min{ r, 0 < h(r)

f ) 7,, J

98

Yu. Davydov and V. Egorov

= rrr

p

i

(T^

,aHr/\x h ( r ) J(o,r/TSlh{r) ]

where

€ A

I) ^(r, l m

'

\Y\>-iv^dx)

x)I(o, r / T s . l ^xi\^ )

Îr+ vp(r,;t) =

^

i

V k l /

By (41), for any

h(r/\x

y

1

— e A, \Y\

I)

(44)

r

> —

\x\

x

^(r,

x) —- L,

Lemma 3.11 implies that for r h

r

> TS[, x
0} o/(-l ) = /

\(l,s}\aa'(ds)=a+ol(l)+a-crl(-l),

K U ) r f f £ < d i ) = 0 + ^ - 1 ) +fl_or,(l).

(51)

Thus, the spectral measures of projections of a ' and a " coincide. Lemma 3.8 follows from (50), (51) and Lemma 3.6. • LEMMA 3.9. Suppose that Y e T>om(C( 1, a, a), h), a = p+&\ + p~S-\, and T] € Rk is a random vector. Let Y and r] be independent and for some 8 > 0 r , e L a + s ( Q , F , P). Then Yr) e

T>om{C{d,a,o'),h),

where a'(A)

= p + E { l / ? ! « ! ^ ^ ) } +P-E

.

(52)

The proof of this lemma is similar to one of Lemma 3.8 and we omit it. 3.5. Slowly varying (s. v. ) functions Remind that a measurable real function h defined on R+ is called slowly varying in infinity if for any x > 0 —— h(t)

-> 1,

as

t

oo.

(53)

One of the main instruments for studying s.v. functions is the Karamata representation (53) (see Karamata, 1933, or Ibragimov and Linnik, 1971). LEMMA 3.10. Let h be a s. v. function. Then there exist functions E(I) and c(x) such that e(t) — 0 for t —*• oo, c(x) —> c ^ 0 for x —> oo, and h(x) = c(x) exp { •

This lemma yields

-^dt}.

/ i

-

m

(54) -

100

Yu. Davydov and V. Egorov

Let h be a s.v. function. Then there exists a constant C > 0 such that for any S & (0, 1] and some constant T$ > 0 in the domain T > T$, xT ^ the following inequalities are valid LEMMA 3.11.

1 a h(Tx) x - y - s ( x ) ^ — — - < C * - 4 ( jwc ) , C h(T) where

v(55)

'

= max{;c, \/x}.

The following result is well known (see, for example, Feller, VIII, §9). LEMMA 3.12. Let for a distribution function F representations (36), (37) are valid. Then 1) there exists a constant B\ such that for any ¡3 > a I

J\x\ 0 such that I

J\x\>t

\xfF(dx)

~ B2tfi~ah(t),

t

oo.

(57)

4. MAIN RESULTS We study random processes of (1) and (2)-types with Y = VZ, where either V e Vom(C(d, a, a), h), and Z is a real random variable, or V e Vom{C{\,a,o),h) for some a, a, h, and Z is a ¿-dimensional random vector. Furthermore, we suppose that random vectors V and (Z, X) are independent and X has the standard uniform distribution. Stress that X and Z are dependent in general. Thus, we investigate the following random processes r,n(t) = b~l ^

VjZjI[0,t](Xj)

-

fn(t)

(

[«']

X X , - / „ « ) ,

(58)

where for t e [0, 1] JnK)

(0, i f o r ^ l , ~ \ nEV • E{ZI[0,(](X)}, if a > 1.

y

J

Limit Theorems for Induced Order Statistics

Note, that

(

101

[nt]

J2VjZnJ

-

fn(t)

Observe also that Inrjn =is Ka generalized empirical process because l Z j , V j ) - f (t)J , n

(60)

where ht(x, z, v) = t>zI[o,r](x). (See, for example, Andersen and al., 1988, Arcones, 1998.) Similarly, £„ is a generalized quantile process. The special type of the function ht in (60) allows us to establish a FCLT under condition which is close to optimal one. THEOREM 4.1. Let V, be a sequence of i.i.d. random vectors from the 0 < a < 2. Let domain of attraction of a strictly stable law £(d,a,a), (Z,, Xi), where X, has the standard uniform distribution, be a sequence of 2-dimensional i.i.d. random vectors which is independent from the sequence (V,). Assume that for some S > 0 E|Z,r+5[o,i], subordinate to the laws C(d, a, a) and C(d, a, a$) respectively, a±(s) = E{(Z 1 )1|X 1 = s],

s€ [0, 1],

(64)

Remark 4.1. If V,- belongs to the domain of normal attraction of the law C{d, a, (T.v), i.e. if h(t) — 1, then the hypothesis (61) can be replaced by E | Z , - r < 00.

(65)

Remark 4.2. If Z\ and X\ are independent then (63) can be represented as m

= f v(ds),

(66)

102

Yu. Davydov and V. Egorov

where v is a i.s.s.m. on ([0, 1], 0, X is uniformly distributed on the interval [0, 1], Consider the random processes 1 " yn(t) = - J 2 u j V > bn 7=1 where bn = n[/ah(n),

'e[0,l],

(71)

Uj(t) = Z j \ a A ( X j ) - a { f ) t

(72)

h is a j.v. function,

a(i) = E{ZI [ 0 ,,](*)}. Then llVnlli-^O,

n - > oo.

(73)

104

Yu. Davydov and V. Egorov

Proof. Let cn = nl/(a~&>_ Without loss of generality suppose that a + S < 2. Let ZnJ = ZjI[0iCn](\Zj\),

an{t) = E {Z„,iI [0ti ](Xi)},

8n(t) =

1

"

Y , U n " j=1

r b

J

( t ) ,

where UnJ(t)

= Z

n J

\

m

{ X j )-

an{t).

Obviously 1

"

iittiio < m\i + j- £ \

Z

J - z» 0 ^ P < max IZ/I > c„

^ nP{|Z| > cn] = o(nc~(a+S))

= o(l),

n

oo.

Therefore, it is sufficient to prove the relation (74) We prove (74) using Lemma 5.1. By Chebyshev and Marcinkiewicz inequalities (for Marcinkiewicz inequalities see Marcinkiewicz and Zygmund, 1937, or Petrov, 1987, chapter 3, §6, N29), for any t e [0, 1] and e > 0

U=1 C„ A n

as n ->• oo.

j=i

2CFn

Limit Theorems for Induced Order

105

Statistics

By Lemma 5.1 for n large enough P { | | 5 „ | | J > e } < 2P{||5„ — 5„|li > 8-\ 1

= 2P

e 2

> —

7

On

(76)

where (€j) is the Rademaher sequence (i.e. the random variables are independent, P{e ; = ±1} = 1/2) which is independent of the sequences (Uj),

(Uj).

The random processes (UnJ(t)-UnJ(t)), (ZnjI[Qj](Xj)-Z„jll0j](Xj)), t € [0, 1], are identically distributed. Therefore, (76) implies ' {ll^nllÒ > e } ^ 4P

(77)

> ^ J,

where 1

"

V„ = — ^ 2 z n j I "n j=i

[ 0 i

t](Xj)€j.

If (Xj) n j =i are fixed, then the conditional distribution of the processes v„ is the distribution of the cumulative sums of independent symmetric random values. Therefore, applying the Levy inequality to these conditional distributions and subsequently taking the expectation, we have P{lk.llo>

„| >

1

=2P

K

£

s Z

"J
4 } ^ C £

n E ^ t f -

P5 bl

=

0

I

, h2(n)

,

I =



Inequalities (78) and (77) imply Lemma 5.2. LEMMA

5.3. Let

,

• • •,

sup 0

j i=i + l

^ 2 ( P { sup

2 > 1=1

> e k=j+l

106

Yu. Davydov and V. Egorov

Proof. The proof is a simple modification of Lemma 1 of (Gikhman and Skorohod, 1965, Chapter IX, §6), which contains a similar inequality for i.i.d. random variables. • L E M M A 5.4.

Let d(x, y), x, y e

d(x, y) = inf {e > 0 : XeA

D * [ 0 , 1],

be the Skorohod metric, i.e.

- I\\l < e, ||x - y o k\\}> < e},

where I : Rl Rl is the identity mapping, y o X(t) = y(X(t)) for t e [0, 1], A is the class of continuous strictly increasing functions A such that A.(0) = 0, A(l) = 1. Let fi be a nondecreasing right-continuous function defined on the interval [0, 1] and such that 0 < P < 1, P(0) = 0, P( 1) = 1. Then for x e D*[0, 1] d(*,jcoi8) 0, there exists x e D^[0,1] with finite number of values, such that ll* — jc||J < 8,

||jCOyS —

.

For example, X(r) can be constructed in the intervals (r, , r, + ) ) by means of linear interpolation. Since x o X(t) = x o p(t) for t e [0, 1], (79) follows from the latter inequality and definition of d. •

6. PROOFS In this section we prove our main results.

Limit Theorems for Induced Order Statistics

107

Proof of Theorem 1.1. First we consider the processes rjn. Note that, by Lemma 5.2, for a > 1 1 " P IT f e[0,1] On SU

- E{Z;I [0 ,,](*,)}) 4« 0.

Therefore, in this case it is sufficient to study the limit behavior of the processes =

1

"

r

~ vvj^MXj). " 7=1

In other words, we can suppose that random variables Vj are centered by their expectations, i.e. EVj = 0. To prove convergence of finite-dimensional distributions, fix 0 ^ ?o < h < • • • < tm < 1, and denote Vn, V, respectively, the distributions of the random vectors U„ = (?7„(il) - I?n(io), • • • , r]„(tm) t/ = « ( f l ) - ?(io)

T]n(tm-0),

f (f™) - f Ci«-l)).

The components of U are independent, and it follows from (63), (64), (8), (21) and (22) that the distribution of the fc-th component of U is ¡C(d, a, crk), where ak=a

j a"(t)dt + as / a"(t)dt J At J Ak

= a I E{Za+\X = t)dt + as j E{Za_\X = t}dt J Ak J A* = crE{Za+lAk(x)}+asE{Za_IAk(x)},

(81)

and Ak = [tk^u tk). To prove the convergence Vn to V it suffices to establish the weak convergence of linear combinations m—1

j

n

Y l ° ^ n ( t k + 1 ) - »»„(ft)] = k=0

m— 1

VjZjJ^OMXj) " j=1

k=0

to the corresponding linear combinations of the increments of the vector Let us consider the i.i.d. random variables = Z7 Ylk=o j — 1, 2 , . . . By (61), E\iXj\ a+s < oo. Therefore, Lemma 3.8 implies m-1

^Sk[rin(tk+i) k=0

- T)n(tk)]

C(d,a,a§

A),

108

Yu. Davydov and V. Egorov

where m—1

m-1 o

l A

= a E

\zJ^e

k

l

(82)

(X)

A k

k=0

A=0

This implies that "P„ converges weakly to some limit measure V . We shall prove that V = V. By (82), for every k 1 Mrinih+l)

-

Vn(h)]

"

= T- ^

IjfyZ/I/^CYy)

£(d,

Of, Ofy.A*),

7=1 where fffc.A, - o o .

It follows from LLN.



Now, following (Gikhman and Skorohod, 1965, Chapter IX), define for D d [0, 1]

A: G

A c (;t) =

sup t-c^t%t^t" e, Acn U Bcn}

0,

n

oo.

Therefore it is sufficient to show that P { A c ( i 7 „ ) > e, An n Bn}

0.

(84)

Using transformations from (Gikhman and Skorohod, 1965, the proof of Theorem 2, Chapter IX) and Lemma 5.3, we have (85)

P { A c ( f j „ ) > e, An n Bn) < I + II + I I I ,

where I = P { sup |ij„(0| >

11=

III

= ¿2 E k|)L

"n I

oJ

I

.

(86)

Here and below denote summation for j ^ n, such that Xj e [kc, t). We denote also YT = Yl*k+3)c • The notation max* has analogous meaning. To estimate P' in (86), we apply once more the truncation method.

110

Yu. Davydov and V. Egorov

Set Vn,j

=

VjIl0,bn/]Zjn(\Vj\),

j =

1,2,....

Obviously, on the set A„

I I

sup kc^t^(k+3)c

+ 1"

*

I On T,

s

VJZJ

{ m a x | VJZJ \ >

J

bn)

£ sup ¿ W . J - EV„.,-)Z; I > 16 ic t>n

16

< I ( £ > o o ) (CA,, t ).

(90)

If a = 1, the random variable V is symmetric and EV N J = 0. Therefore in this case the third summand in (87) equals zero. Since EV = 0 for A > 1, we have EVNJ = - E { V - VNJ} and

"

t

"

t

Applying Lemma 3.12 again, we obtain inequality (89). It remains to estimate the second summand in (87). By the Kolmogorov inequality and

Limit Theorems for Induced Order Statistics

111

Lemma 3.12, this summand is less then (91) "'I

t

By (87)-(91), I I I ^ Ce

(ee{I,„

capBnDl

k

2P

) +

jCDn,k

> ^ J j

(92)

Remark that, with probability one, and in L{(£2, J 7 , P) D„ik

E { ( 1 + | Z \ a + s ) l [ k c A k + 3 c ] ( X ) } =7 Dk,

n ^ o o ,

and Dn k si 2A + 1 on the set Bn. Hence, for every k by the dominated convergence theorem, limsupE{IA„nB„D2n k ) oo

Estimation of the summands I and II is similar. Thus, for the processes r/n, Theorem 1.1 is proved. Now consider the processes Following (Davydov and Egorov, 1998), represent £„ in the form f „ ( i ) = T}n {Xn,[n,]) + b~ln(a(XnAni])

- a(t)),

(95)

where a(t) = 0 for a sC 1, and a(t)

=EV

f

m(s)ds

Jo

for a > 1 with m(s) = E{Z\X — 5}. By Lemma 5.4 and convergence of the quantile process, d(r]n, V o £„) ^ 0,

n —> 00,

(96)

112

Yu. Davydov and V. Egorov

where fin (r) = Xn,[nt] • We can estimate the second summand in (95) applying the Holder inequality \b-xn{a(XnAni])

< nb~l ( f \Jo =

nb~l I " - 7

- a{t))\

( sup \XnAnl] - t\] ye[0,l] f

\m(s)\a+sdsY+S /

0(nb~ln~{l~^))

Xn,[nt] m(s)ds

= o( 1),

n

(97)

oo.

The proved part of the theorem for the processes r/„ and the relations (96), (97) imply the statement of the theorem for the processes £„. • Proof of Theorem 4.2. The proof of the convergence of finite-dimensional distributions is similar to one of Theorem 4.1. Some additional work is necessary only to prove the coincidence of limit distributions of the increments rjn (?) — r}n (s) with the distributions of the increments of the process £ defined by (68). By Lemma 3.5, 1„(t)-rin(s)=

(98)

f zv(dz,dx), JRkx[s,t]

where v is i.s.s.m. on (E, £, m), subordinate to the stable law £(1, a, ct), a = p+S{ + pS_i, and E = Rk x [0, 1], £ = BE, m = V(Z,x). The distribution of (98) coincides with C(d, a, cr[S,t]), where a[t,t](A)

+p.

= P+ f k Izl" 1 a ( A ) V(z,x)(dz, jR x[s,t] VUI/

f \z\alA(-^]v(Z,x)(dz,dx), jRkx[s,t] V \Z\J

dx)

AeBgt-i.

(99)

On the other hand, by Lemma 3.8, the limiting distribution (as n — o o ) of the processes n Tln(t) - Vn(s) =

Kx

VjZjI{s,t](Xj) j=1

is C(d, a, a^t](A)

,j), where

= p+E {\Z\alA

+ p _ e { | Z | « % i ( - j f j ) } . A e BSk-\. (100)

Limit Theorems for Induced Order Statistics

113

Obviously, expressions (99), (100) are the same. Therefore the convergence of the finite-dimensional distributions is established. Tightness of the set (£„) (or (rj n )) follows directly from the proof of the tightness in Theorem 4.1, by the fact that the tightness of vector random processes is equivalent to the tightness of their components. •

7. CONCLUDING REMARKS 7.1. Connection

with convex

rearrangement

of random

walks

Let (Yj, j e N) be i.i.d. random vectors in R2. Let (pj, 0j) be the polar coordinates of (Yj). Denote by 0 n j , 6 n 2 , . . . , 9n,n the variables 6j rearranged by growth of their values and consider the polygonal continuous line L„ defined by the points SQ = 0, = & = 1, • • •, This line has the same origin and end as the initial random walk So = 0, Sk = 51?=i ^m ^ — 1, • • •, and represents exactly so-called convexification of the random walk (Sk). The asymptotic behavior of convexifications L„ and geometrical properties of their limits have been studied by (Davydov and Vershik, 1998). It is clear that Yn i are the order statistics induced by 6>,, i ^ n, and the lines Ln are analogues to our processes £„. Similarly, the processes r)n correspond, in this context, to the convexication defined by another parameterization. To give more precise construction define the two-dimensional random process r]n = (f]'nl>, jj®) as follows: n

^ ( f )=

cos^I[0i(](^),

t e [0, 2tt),

(101)

t € [0, 2jv).

(102)

7=1 n

= E

Pj sin0;I[o,,](0j),

i=i

This definition has simple geometrical interpretation: r j ^ i t ) , ^ 2 ) ( 0 are the coordinates of the points of tangency of the curve L„ with the line of support which is inclined at the angle t to the x-axis. Obviously, the functions (T]n(t), t e [0, 2n)) define the curve L„ identically. On the other hand, it is clear that rjn (t) are the processes of the same type as studied in Theorem 4.2. This theorem gives the following result. T H E O R E M 7.1. Let the random attraction p+&\

+

of a strictly p-8

j,

with

stable a s.v.

law

variables £ ( 1 , a, a),

function

(P;) a

h. Assume

belong €

to the domain

(0,2), that

a

^

( p j ) and

1, (6j)

a

of = are

114

Yu. Davydov and V. Egorov

independent,

and let Q be the common r)„(t)

where

bn = n{/ah(n),

a„(t)

distribution

= b;\i;n(t) — 0 for

-

a


1.

Then D 2 [0,1] rjn where

the limit process

==$•

r),

n

r] can be represented

r]t = = I/ ' (pdv,

oo, in the

form

oo L \ 2 log(l//i n

SU

P

(AM-2) and (HA-2-5), I

we have J

= j sup { / ( x ) * 2 « } f K2(t)dtV/2 a^x^b J—oo '

a.s.

(1.12)

The methods of Hall (1990) rely upon the evaluation of upper bounds for p„(e)

= P[

sup a 0 such that hn si n'11 = e~nlogn for all large n, whereas (H.3) allows sequences such as hn = e~login. It is well known (Deheuvels (1974), Bertrand-Retali (1978), Hall (1981), see, e.g. Section 4.2 in Deheuvels and Mason (1992) and Section 3.3 in Deheuvels (1992)) that (if. 1-2) are necessary and sufficient (for an arbitrary continuous / and suitable conditions on K) for the strong uniform consistency of f„tK on [a,b\. This gives a further motivation for the need of extending the validity of Theorems 1.1 and 1.2, as well as Corollary 1.1, to the case where (H.5) does not hold. In order to cover, in our forthcoming results, the full range corresponding to (HA-2), we must give an extended version of Theorem 1.1 holding when the assumption (H.3) in this theorem is replaced by {HA). This is achieved in Theorem 1.3 below, the latter being an easy consequence of the functional

122

P. Deheuvels

limit laws of Deheuvels and Mason ( 1 9 9 2 ) and Deheuvels ( 1 9 9 2 ) (see, e.g. Deheuvels and Einmahl ( 1 9 9 9 ) for versions of this result in the extended framework of randomly censored observations). Below, we will use the convention that d/(d + 1) = 1 under (H.3), i.e. when d = oo. We denote by "P" convergence in probability. THEOREM 1.3.Assume (F. 1 - 2 ) and (K. 1 - 2 - 3 ) . Then, under and (H.2), we have LIM

(O. "n'tu

S'

2

SU

= [ sup {f(x)^2{x)} 1

P

\y(x)Un,K{x)-Efn,K{x)}\ K2(t)dt\Xn

r

a^x^b

P.

1

J-co

I f , in addition, to the previous assumptions (H.l)(ii—iii) (H. 4) are fulfilled, we have lim sup ( , r)^2 oo V2{log(l//z„) +log2n}/

r h liminf( "] r)V2 n / ! n^oo V2{log(l/ft„)+ log2n}/

=

(1.15) and either (H.3) or

sup h(x){fn,K(x) I

f K2{t)dX'2 J — oo '

= j sup

(HA)(i)

-

a.s.

(1.16)

sup {*(*){/».*(x) 2 l/2 J-oo K (t)dt\ I

K T T T1/a^fc ) SUP

Efn,K(x)}\ J

-

Efn,K(x)}\ >

a.s.

(1.17)

The main results of the present paper may now be stated in Theorem 1.4 and Corollaries 1 . 2 - 1 . 4 below. Theorem 1.4 provides an extension of the results of Hall (1990), which turns out to be appropriate to derive the limit laws we seek. THEOREM 1.4.Assume (F. 1-2) and (K.l-2).

Let {L(t)

: t E K} be any

kernel in C\(R), with bounded total variation on M. Then, under and (H.2), we have, for any e > 0, lim

n^ 00

P

((oi

"/wt.

J

\\21og(l/h„)/

'

SU

P

a^x^b

> (l+£)(sup/w)1/2( r + 2j°°\d{K-L}(t)\))=0.

\fn,K-L(x)

-

(H.\)(i)

Efn,K-L(x)\

\K(t)-L(t)\dt (1.18)

Limit laws for kernel density estimators

123

If, in addition to the previous assumptions, (H.\)(ii-iii) (HA) are fulfilled, we have, almost surely,

and either (H.3) or

limsuP

/ nh„ (^—,mu \ i \

< (sup f(x))1/2(

x1/2 i)

f

SUP

-

1 fn,K-dx)

EfnJi-L(x)\

\K(t)-L(t)\dt

+ 2J

(1.19)

\d{K-LKt)\).

The proof of Theorem 1.4 is given in Section 2 below. We note that, if we set L = K m , where K M is defined, for M > 0, by j ( f \ m _— ij o L(t) — tKM{t)

for

for

1,1


then, for each specified s > 0, there exists an Ms > 0 such that, for all M>Me, (

s

u

p

/

«

)

1

/

2

(

\K(t)

- Km(t)\dt

+ 2 J°°

\d{K

< (supf(x))l/2(f \K(t)\dt + l\\K(M)\ + vxeR yJ\t\^M 1 '

-

tfM}|(f))

\K(-M)\

+ 2 fI \dK{t)\) W ) l ) < s, J\t\>M ' \t\>M

(1.21)

and [

J-oo

(K(t)-KM(t))2dt=

f

J\t\^M

K2(t)dt

< e.

(1.22)

By combining (1.18)-(1.19) with (1.20) and (1.21)-(1.22), we obtain readily the following corollary of Theorem 1.3, in the spirit of Theorem 1.2. COROLLARY 1.2 .Assume ( F . l - 2 ) and ( £ . 1 - 2 ) . Then, for each s > 0, there exists a compactly supported kernel L in £i(K), with bounded total variation on R, of the form L = Km for a suitably large M > 0, and such that the following properties hold. Under (H.l)(i) and (H. 2), lim p ( ( ) ' / 2 sup n^oo \ \ 2 l 0 g ( l / h n ) / {fn,L(x) - EfnX(x))|

\\fn,K(x)-Efn,K(x)\ j

^ e ) = 0.

(1.23)

124

P. Deheuvels

I f , in addition to the previous assumptions, (H.l)(ii-iii) (HA) are fulfilled, we have ( lim sup I

n

hn

\1/2 -)

sup

If

and either (H.3) or

/ a W - ^ / a W

- {/„,lW - £ / S | l w ) | < e

1 (1-24)

Remark 1.2. In the forthcoming Corollary 1.4, we will give a version of Corollary 1.2, with another choice of auxiliary kernel L than that given by KM- TO motivate the need of such an extension, we observe that the arguments above, based upon (1.19)—(1.20), do not enable us, in general, to select L in (1.23)-(1.24) within the class of differentiable functions. For example, in the case where K is the uniform kernel on with K(t) = 1 for — 5 ^ t < i and K(t) = 0 elsewhere, for any differentiable kernel L with bounded total variation, OO

/»OO

i*00

/ -00|d{K - L}(t)I = J— / 00\dK(t)\ +J—OO / \dL(t)\ > 2. Thus, it is impossible to achieve (1.21) with 0 < e < 2 for such an L. In that sense, Theorem 1.4 is only a partial extension of the Theorem of Hall (1990), cited in Theorem 1.2, by not allowing L to be arbitrarily smooth. This restriction, however, turns out to be minor in view of the applications we have in mind, since the differentiability of the kernel is not needed in Theorems 1.1 and 1.3. We may derive from these two theorems in combination with Theorem 1.4 and an easy argument based upon the triangle inequality, the following result. COROLLARY \3.Assume and (H.2), we have

(FA-2) and (A".1-2). Then, under

lim ( ) ' / 2 sup U(x){fn,K(x) n->-oo \ 2 1 o g ( l / h n ) ; I = j sup {/(j)*!/ 2 ^)} f K2(t)dt\V2 * ai^xH.b J—oo I f , in addition to the previous assumptions, (H.l)(ii-iii) (HA) are fulfilled, we have

-

(H.l)(i)

Efn,K(x)}\ J P.

(1.25) and either (H. 3) or

/ nhn \'/2 r i limsupl————— -) sup { ¥ ( * ) { / „ , * ( * ) - £ / „ , * ( * ) } [ „->00 V2{log(l//i n )+ log 2 «}/ I J = j sup {/(x)0/ 2 (jc)} f K2(t)dt\yl I a^x^b J-oo '

a.s.

(1.26)

Limit laws for kernel density estimators

tiinf (

V / 2 sup \^{x){fnK{x)



= K 7 T 7 ) SUP {fW*2(x)} l\d + 1/ a < x < b

f K2(t)dtV/2 y_oo J

125

-

EfnK{x)}\

a.s.

(1.27)

Remark 1.3. As follows from Corollary 1.3, the assumption {K.3) that K is compactly supported may be relaxed in Theorems 1.1 and 1.3. We now bring an answer to the query of Remark 1.2, asking whether it is possible or not to choose an L in (1.23)—(1.24) within the class of differentiable functions. The answer to this question turns out to be positive, and the corresponding proof is readily achieved by making use of the following simple argument. In view of (1.20) and (1.21)—(1.22), given any e > 0, we first choose, an M > 0 such that ( s u P / W ) 1 / 2 (v f | K ( t ) - K„{t)\dt ' J-oo

+ 2 f |d{K - * „ } ( 0 l ) J-oo ' < s/2,

and

/

(1.28)

r»00

(K(t) - KM(t))2du J -c We then select a kernel L in such a way that

< e/4.

(1.29)

(1) L is of bounded variation and infinitely differentiable on ] (2) sup (eR |L(i)l < sup (€R \KM(t)\ ^ sup (eR | A"(01; (3) iZ(KM(t)

- L{t))2dt < e/4;

(4) L ( 0 = 0 for |i| > 2M, and (sup/(*))V2( f

\KM(t) - L(t)\dt + 2 j°°

| d { K M - L}(i)|) < e/2. (1.30)

Note that (3) and (1.29) jointly imply that j

( K { t ) - L(t))2dt

si

( K { t ) - KM

it))2dt}1/2

-00

+( J
0, there exists a infinitely differentiable compactly supported function L of bounded variation on R, such that the following properties hold. In the first place, oo

/

(K(t) - L(t))2dt •00

Moreover, under (H.l)(i) and (H.2),

- | /,,'„) - Hixn)} = 00, in contradiction to the assumption that H is of bounded variation. We must have therefore Rx = R2 = lim,Hit). The only possible value of this

Limit laws for kernel density estimators

129

common limit allowing f^\H(t)\dt < oo is R\ — /?2 = 0. The proof is completed by a similar argument on the negative half-line. • The next lemma provides a description of the modulus of continuity the the uniform empirical process {an(u) : u e 1 ) . LEMMA 2.3. Under (H.l)(i) and (H.2), we have, for any specified C > 0 and s > 0, lim PI

sup |

sup

H ^ l ' o ^ C

.

= — } ^ (1 + e ) C ' ) — 0.

y/lhnlogil/hn)

J

' (2.12)

Under ( H . 1-2) and either (H.3) or (HA), we have, for any specified C > 0, ,. / f limsupl sup | sup n—>oo

\ctn(u + hnv) -an(u)\

O^m^I ^ O^u^C \J ¿tin{log(l/h

i\ 1/2 [I= C ' n ) + log 2 n) > '

a.s. (2.13)

Proof. See, e.g. Stute (1982a), Remark 2 in Mason, Shorack and Wellner (1983), Shorack and Wellner (1986), Section 4 in Deheuvels and Mason (1992) and Section 3 in Deheuvels (1992). • The following theorem provides a key inequality for the proof of Theorem 1.4. It will allow us to infer this theorem from the properties of the modulus of continuity of the uniform empirical process, previously given in Lemma 2.3. THEOREM 2.5 .Assume (FA), and let H be any function in £i(R), of bounded variation on R. Then, we have the inequality sup /»,hW xeR

-

Efn,H(x)\ sup I

sup

X ( y ° ° \H(t)\dt + 2

\an(u + hnv) -

an(u)\\)

\dH(t)\).

(2.14)

Proof. In the proof, we set H(t) = H(—t) for t e E and assume H to be right-continuous. In view of Lemmas 2.1 and 2.2, our assumptions on H imply that H e C\ (R) is a function of bounded variation in R, such that H(t) 0 when u ±oo. Thus, by (2.2)-(2.3), \fn,H(x) - EfnM(x)\

= I LH(X-Zl\d{i}n{F{z)) I J-oo hn \ hn / i f°° = «-[/V / H(t)dan(F(x J —OO

_ + h„t))

F(z)}

130

P. Deheuvels

= n-l/2h-l\

T

d\an(F{x

/

(2.15)

H{t)

+ hnt))

— an

oo =«

_ 1 /

v|

E

,

k=—oo

where, setting in (2.15) ** = x + khn and t = k + y for k = 0, ± 1 , . . . , the integration by parts convention (2.7) allows us to write A*,„(*)

= J

H(k

= H(k+

+ y)d\an(F{xk

+ h„y))

-

a„(F(jct))|

(2.16)

l)(a„(F(jc t + 1 )) - a „ ( F ( * t ) ) |

- jf

{ « „ ( F ^ t + M M - a n ^ t ) ) +

By combining (2.9) with (2.15) and (2.16), we obtain the inequalities sup jceR

fn,H(x)

- Ef

n M

(x)

< n '^„Msupl

sup

\an(F(x+hny))-an(F(x))\

oo x ( ]T

-i \H(k

+

< n-'^fsupf x ( f

J—00

\H{t)\dt

1)| sup

+

/

\dH(k

\an{F{x

+ 2 f

J — O0

+

+ h

y)\j

n

y))-

a„(F(*))| 1) (2.17)

\dH(t)\).

Now, by (F.l), obviously for any j ; e l and 0 < y < 1, 0

< F(x+hny)

-

F{x)

^ hny\

1

sup/Or)! = *eR '

(2.18)

hnyM.

In view of (2.17)—(2.18), the change of variables u — Fix) and v — My yields sup
R is a measurable function such that for all y e fRrJ K(x, y)dx = 1. We say that the additive estimate gn is regular if for each x, EX"(jc, X) < oo. We will use the notion of shatter coefficient as in the work of Vapnik and Chervonenkis (1971): s(.Ae,£)=

sup

IHyi

ftinAiAe^ll,

the maximal number of different subsets of a set of I points which can be intersected by sets in A&. This will be used to measure the richness of classes of density estimates. The first result upon which many of the subsequent results are built is the following non-asymptotic inequality: THEOREM 1. Let the set 0 determine a class of regular additive density estimates. Then for all n, m ^ n/2, ©, and f ,

E f \fn-m,en - f I < 3 inf E f | fnfi - f\ (1 + + 8 M ) J fe© J \ n—m Vn J +

l8\og(4e&s(A®, V m

m2))

Note that whevever 1) is bounded by a polynomial nk'ikl of n and £, 2 kl 2kl we have 5(^4.©, m ) ^ n m < nkl+2kl, and consequently y81og(4e*s(Ae,m2))

= Q

^

In the examples below, all bounds for s(An, I) will be polynomial in n and i. Furthermore, in this case, if m ~ « / log n, then Ef

\fn-m,en—f\

^

3ME

j\fn,e-f\

(l + o ( l / V ^ ) ) + 0



Because in most cases of interest, the optimal L i error tends to zero much slower than 1 />Jn, this bound essentially says that for polynomial shatter coefficients, we have asymptotically a performance that is guaranteed to be

L. Devroye et al.

136

within a factor of 3 of the optimal performance, and this without placing any restrictions on the density / . The proof of Theorem 1 is a minor modification of some arguments appearing in Devroye and Lugosi (1997). The details may be found in the Appendix below.

3. STANDARD KERNEL ESTIMATE: RIEMANN KERNELS A Borel set A of Mf' is called a star interval if for any y e Rd, {t e R : ty e A} is an interval. Thus, all convex sets are star intervals. A kernel K is said to be Riemann of order k if there exist star intervals A\,..., A* and real numbers a, such that k K(x) = ^aJAi(x), ¿=1 where IA denotes the indicator function of a set A. We require furthermore that f K = 1. We will call the smallest such k the Riemann order, which should not be confused with the order of a kernel, which is the smallest positive integer s such that / xs K(x)dx / 0, and is in this sense only defined for univariate kernels. The standard Akaike-Rosenblatt-Parzen kernel estimate is fn,K,h(x)

=

fn,h(x)

1 " = - V ] tffc (* - X,) .

When K is fixed and h is chosen by the method described above (so that 9 = h and ® = [9 e 1Z : 9 > 0}), Theorem 1 applies with the following shatter coefficient: L E M M A 1 (Devroye and Lugosi, Riemann kernel of order k,

s(A@,

1997).

For the kernel estimate with

1)(1 + 2kl(n - m)) 2 < 18k 2 n 2 £ 3 .

Let us now widen the scope a bit and pick a Riemann kernel from a finite class of N Riemann kernels, K, = {K\,..., KN], and choose the bandwidth h simultaneously as well. This is done by formally putting 0 = {{h, j) : h > 0, j e { 1 , . . . , N}}. Again, Theorem 1 aplies, but now with a slightly larger shatter coefficient: LEMMA 2. Consider the class © in which h > 0 and K e K, are the free parameters, and assume that all kernels in K, are Riemann of order not exceeding k. Then s(A&,£) ^ 18k 2 n 2 e 3 N 2 .

Selecting nonparametric density estimates

137

Proof. We generalize a proof from Devroye and Lugosi (1997). Set $r = n - m$. We first consider $N = 2$; let the kernels in $\mathcal{K}$ be $K$ and $L$, and assume without loss of generality that their Riemann orders are exactly $k$. Define the vector

$z_u := \Big( \sum_{i=1}^{r} K\big((y_1 - X_i)/u\big), \ldots, \sum_{i=1}^{r} K\big((y_\ell - X_i)/u\big) \Big), \quad u > 0.$

As $u \uparrow \infty$, each component of $z_u$ changes every time $(y_j - X_i)/u$ enters or leaves a set $A_l$, $1 \le l \le k$, for some $X_i$, $1 \le i \le r$, where the $A_l$'s are the star intervals in the definition of $K$. Note that for fixed $(y_j - X_i)$, the evolution is along an infinite ray anchored at the origin. By our assumption on the possible form of the sets $A_l$, the number of different values a component can take in its history (as $u \uparrow \infty$) is clearly bounded by $2kr$. As there are $\ell$ components, the cardinality of the set of different values of $z_u$ is bounded by

$|\{ z_u : u > 0 \}| \le 1 + 2k\ell r.$

If we define $z'_u$ similarly to $z_u$, but replace $K$ in the definition by $L$, then we have $|\{ z'_u : u > 0 \}| \le 1 + 2k\ell r$. Moreover, for fixed $(u, v)$ the comparison of the two estimates is determined by the pair $(z_u, z'_v)$, together with at most $\ell + 1$ further values. But then

$|\{ \{y_1, \ldots, y_\ell\} \cap A_{(u,v)} : (u, v) > 0 \}| \le (\ell + 1)(1 + 2k\ell r)^2,$

and the bound $18\,k^2 n^2 \ell^3 N^2$ follows by summing over the $N^2$ ordered pairs of kernels from $\mathcal{K}$.
The minimum over the data-splitting sample is close to the full-sample minimum:

$\min_{h > 0,\ L \in \mathcal{K}} E \int |f_{n-m,L,h} - f| \;\le\; \min_{h > 0,\ L \in \mathcal{K}} E \int |f_{n,L,h} - f| + \frac{c}{n};$

thus, a combination with Theorem 1 then yields, with the appropriate definition of $\Theta$, and writing

$B_{m,n} := \min_{h > 0,\ L \in \mathcal{K}} E \int |f_{n,L,h} - f|,$

a bound of the form

$E \int |f_n - f| \le 3\, B_{m,n} + 8 \sqrt{\frac{\log(\,\cdot\,)}{n - m}}\,,$

with the logarithmic term controlled by the shatter-coefficient bound of Lemma 2.


where $\theta_i = (h_{1,i}, \ldots, h_{d,i})$ for $i = 1, 2$. Within the set $U_{(w,w')}$, $z_u$ and $z'_v$ are fixed for all $t$, and therefore

$|\{ \{y_1, \ldots, y_\ell\} \cap A_{(u,v)} : (u, v) \in U_{(w,w')} \}| \le (\ell + 1)\, |U_{(w,w')}|.$




6. MULTIPARAMETER KERNEL ESTIMATES - ELLIPSOIDAL KERNELS

Next we consider the kernel estimate

$f_{n,\theta}(x) = \frac{1}{n} \sum_{i=1}^{n} K_\theta(x - X_i),$

where $\theta = \Sigma$, $\Sigma$ is a positive definite symmetric $d \times d$ matrix, and

$K_\theta(x) = v_\theta\, \mathbb{1}\{ x^T \Sigma^{-1} x \le 1 \}.$

Here $v_\theta$ is a normalizing factor such that $\int K_\theta = 1$, and $x^T$ denotes the transpose of the vector $x$. In this case, for $\ell(n - m) > d^2 + d + 2$, we have a bound on the shatter coefficient analogous to the previous cases.

The proof is exactly the same as for the case of product kernels, with the only difference that the shatter coefficient of the class $\mathcal{E}$ of ellipsoids (i.e., the class of sets of the form $E_\Sigma = \{x : x^T \Sigma^{-1} x \le 1\}$) is bounded whenever $\ell(n - m) \ge d^2 + d + 2$ (since the VC dimension of $\mathcal{E}$ is bounded by $d^2/2 + d/2 + 1$; see, e.g., Devroye, Györfi, and Lugosi (1996, p. 221)). Although it is computationally challenging to optimize all $d(d+1)/2$ entries in a matrix, at least in theory we can set up a method (by picking $m$) such that asymptotically the performance is about three times or less the best possible performance over all such matrices. Again, no conditions are placed on the density or on the values of the entries in the matrix. Similarly to the univariate case, the argument may be extended via Riemann approximations to the class of estimates with $K_\theta(x) = v_\theta L(x^T \Sigma^{-1} x)$, where $L : \mathbb{R}^+ \to \mathbb{R}$ is a fixed function. The details are omitted.
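A sketch of the ellipsoidal-kernel estimate just described (illustrative code, not from the paper). The normalizing factor $v_\Sigma$ is computed from the standard ellipsoid-volume formula, $\mathrm{vol}\{u : u^T \Sigma^{-1} u \le 1\} = V_d \sqrt{\det \Sigma}$ with $V_d$ the unit-ball volume.

```python
import numpy as np
from math import pi, gamma

def ellipsoid_kde(x, data, Sigma):
    """f_{n,Sigma}(x) = (1/n) sum_i K_Sigma(x - X_i), with the uniform
    ellipsoidal kernel K_Sigma(u) = v_Sigma * 1{u^T Sigma^{-1} u <= 1}."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.atleast_2d(np.asarray(data, dtype=float))
    d = data.shape[1]
    Sinv = np.linalg.inv(Sigma)
    # v_Sigma = 1 / (unit-ball volume * sqrt(det Sigma))
    vol = pi ** (d / 2) / gamma(d / 2 + 1) * np.sqrt(np.linalg.det(Sigma))
    diffs = x[:, None, :] - data[None, :, :]
    q = np.einsum('mni,ij,mnj->mn', diffs, Sinv, diffs)  # Mahalanobis forms
    return (q <= 1.0).sum(axis=1) / (len(data) * vol)
```

With $\Sigma = I$ and a single observation at the origin in $d = 2$, the estimate at the origin is $1/\pi$, the height of the normalized uniform disk kernel.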

7. THE TRANSFORMED KERNEL ESTIMATE

The transformed kernel estimate on the real line was introduced by Devroye and Györfi (1985) in an attempt to reduce the $L_1$ error in a relatively cheap manner. The data are transformed by a smooth monotone transform $y = T(x)$, the transformed density is estimated by the kernel estimate, and the estimate is then subjected to the inverse transformation. As this leaves the $L_1$ error unaltered, it suffices to study the $L_1$ error in the transformed space, and hence the interest of such estimates. In particular, it is known

144

L. Devroye et al.

that heavy tails are to be avoided for kernel estimates (Devroye and Györfi, 1985). Thus, transforms that compact and compress the data are called for. Ideally, the transformed density should be triangular. Thus, we consider the joint optimization over $(h, a)$, where $h$ is the smoothing factor and $a$ is a parameter of the transformation. For simplicity, we will consider the Box-Cox transformations, with which statisticians and data analysts are familiar. We will show that we can jointly pick $h$ and $a$ in an asymptotically optimal manner, still modulo a factor 3, without placing any restrictions on the density or the parameters. The transformations considered here are only useful to treat tail problems. A similar analysis may be carried out for piecewise linear transformations, the transformation being restricted to consist of a fixed number of segments, but otherwise arbitrary. Such estimators are close in spirit to variable kernel estimators. For practical data-based versions of other transformations, we refer to Wand, Marron and Ruppert (1991) and Ruppert and Cline (1994). In general, the transformed kernel estimate is

$f_{n,T}(x) = \frac{1}{n} \sum_{i=1}^{n} K\big( T(x) - T(X_i) \big)\, T'(x),$

where $K$ is a kernel with $\int K = 1$, and $T : \mathbb{R} \to \mathbb{R}$ is a strictly monotonically increasing, almost everywhere differentiable transformation. Clearly, $\int f_{n,T} = 1$. If $T = ax + b$ is linear, then $f_{n,T}$ is just the ordinary kernel estimate with smoothing factor $h = 1/a$. Here we are concerned with the data-based choice of $T$. Clearly, the collection of possible transformations has to be restricted somehow.

Box-Cox transformations. Consider now the family $\{T_a : a \in [0, 1]\}$ of transformations defined, for $x > 0$, by

$T_a(x) = \begin{cases} (x^a - 1)/a & \text{if } a > 0, \\ \log x & \text{if } a = 0. \end{cases}$

These functions are often used to transform the (nonnegative) data so that large tails become more manageable. We consider kernel estimates defined on the transformed data. In particular, we study the joint data-based selection of the transformation (i.e., the value of $a$) and the bandwidth. For simplicity, we again only consider the naive kernel $K = \mathbb{1}_{[-1/2, 1/2]}$. Therefore, the class of estimates $\{f_{n,\theta} : \theta \in \Theta\}$ is defined by

$f_{n,\theta}(x) = \frac{1}{nh} \sum_{i=1}^{n} \mathbb{1}\{ |T_a(x) - T_a(X_i)| \le h/2 \}\, x^{a-1},$
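The estimate above can be sketched directly (illustrative code; the Box-Cox form $(x^a - 1)/a$ is the standard one, and in either branch $T_a'(x) = x^{a-1}$, which is the factor appearing in the estimator).

```python
import numpy as np

def box_cox(x, a):
    """Box-Cox transform T_a: (x**a - 1)/a for a > 0, log x for a = 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if a == 0 else (x ** a - 1.0) / a

def transformed_kde(x, data, a, h):
    """f_{n,theta}(x) = (1/(n h)) sum_i 1{|T_a(x)-T_a(X_i)| <= h/2} * x**(a-1),
    the naive-kernel estimate on Box-Cox-transformed positive data."""
    x = np.asarray(x, dtype=float)
    tx, td = box_cox(x, a), box_cox(data, a)
    counts = (np.abs(tx[:, None] - td[None, :]) <= h / 2.0).sum(axis=1)
    return counts * x ** (a - 1.0) / (len(data) * h)
```

For heavy-tailed data such as a Pareto(1,1) sample (generated as $1/U$ with $U$ uniform), the choice $a = 0$ turns the transformed density into a standard exponential, which is far easier to estimate. The joint selection of $(a, h)$ by the method of Section 2 is not reproduced here.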


where $\theta = (a, h)$ and $\Theta = [0, 1] \times (0, \infty)$. Note that we assume that all data points are positive and that $f_{n,\theta}(x)$ is only defined for $x > 0$. Again, to see if the proposed parameter selection method works, it suffices to bound $s(\mathcal{A}_\Theta, \ell)$.

LEMMA 4. Let $\mathcal{A}_\Theta$ denote the Yatracos class corresponding to the family of kernel estimates on $\mathbb{R}^+$ based on all Box-Cox transformations $T_a$, $a \in [0, 1]$, and all smoothing factors $h > 0$. If $\ell \ge 2$ and $n - m \ge 2$, then

$s(\mathcal{A}_\Theta, \ell) \le \frac{1}{4}\, \ell^6 (n - m)^4.$

Proof. In the proof we use a simple lemma, which is an easy modification of Lemma 25.2 of Devroye, Györfi, and Lugosi (1996):

LEMMA 5. If $b_1, \ldots, b_k, c_1, \ldots, c_k \in \mathbb{R}$, then the function

$g(x) = \sum_{i=1}^{k} b_i e^{c_i x}$

is either identically zero or takes the value 0 at most $k - 1$ different places.

LEMMA 6 (Cover (1965)). Let $\mathcal{A}$ be the class of sets of the form $\{x : a^T x \le b\} \subset \mathbb{R}^d$, where $a \in \mathbb{R}^d$ and $b \in \mathbb{R}$ are arbitrary. Then

$s(\mathcal{A}, n) \le 2 \sum_{i=0}^{d} \binom{n-1}{i}.$
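Cover's counting bound is easy to check numerically. The form below is one standard statement of it (an assumption on my part, since the exact printed form varies across sources); for $d = 2$ it equals $n^2 - n + 2$, hence is at most $n^2$ once $n \ge 2$, which is the consequence used in the proof that follows.

```python
from math import comb

def cover_bound(n, d):
    """Cover's (1965) bound on the number of dichotomies of n points
    induced by halfspaces {x : a^T x <= b} in R^d:
    2 * sum_{i=0}^{d} C(n-1, i)."""
    return 2 * sum(comb(n - 1, i) for i in range(d + 1))
```

Note that `math.comb(n, k)` conveniently returns 0 when `k > n`, so the same formula works for small `n`.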

or equivalently, if and only if

$(a - 1)\log y_t - \log h + \log\Big( \sum_{i=1}^{n-m} w_i^{(t)} \Big) \;\ge\; (a' - 1)\log y_t - \log h' + \log\Big( \sum_{i=1}^{n-m} w_i'^{(t)} \Big).$

Therefore, denoting $W_t = \sum_i w_i^{(t)}$ and $W'_t = \sum_i w_i'^{(t)}$, the maximal number of different values of

$\big( \mathbb{1}_{A_{\theta,\theta'}}(y_1), \ldots, \mathbb{1}_{A_{\theta,\theta'}}(y_\ell) \big)$

is at most the number of different values the vector

$\Big( \mathbb{1}\{(a - a')\log y_1 - \log(h/h') + \log(W_1/W'_1) \ge 0\}, \ldots, \mathbb{1}\{(a - a')\log y_\ell - \log(h/h') + \log(W_\ell/W'_\ell) \ge 0\} \Big)$

takes as $a, a', h, h'$ all vary through $\mathbb{R}^+$. But this is not more than the maximal number of different ways of dichotomizing $\ell$ points by 2-dimensional hyperplanes, which, by Lemma 6, is at most $\ell^2$ (since $\ell \ge 2$). Collecting bounds, the proof is finished.

Having Lemma 4, Theorem 1 yields the following bound:

THEOREM 3. Assume that the basic estimate of Section 2 is used to simultaneously select the Box-Cox transformation $T_a$, $a \in [0, 1]$, and the smoothing factor $h > 0$ for the transformed kernel estimate

$f_{n,a,h}(x) = \frac{1}{nh} \sum_{i=1}^{n} \mathbb{1}\{ |T_a(x) - T_a(X_i)| \le h/2 \}\, x^{a-1}.$




If $f_n$ denotes the obtained density estimate, then for all $n$, all $m \le n/2$, and each density $f$ over $\mathbb{R}^+$,

$E \int |f_n - f| \le 3 \inf_{a \in [0,1],\, h > 0} E \int |f_{n,a,h} - f| \left( 1 + 8\sqrt{\frac{m}{n}} \right) + \sqrt{\frac{8 \log\!\big( 9 e^2\, m\, n\, (n - m)^4 \big)}{m}}\,.$

For example, if $n$ is even and we take $m = n/2$,

$E \int |f_n - f| \le 26 \inf_{a \in [0,1],\, h > 0} E \int |f_{n,a,h} - f| + 16\sqrt{\frac{\log n}{n}}\,.$

8. MONTE CARLO SIMULATIONS

For testing the behavior of the proposed parameter selectors we have conducted a series of Monte Carlo simulations. We describe here their graphical and numerical results.

Example 1: Pareto density with transformed kernel

First we consider as target a Pareto density and we use the Box-Cox family of transformations as described in Section 7. To avoid the problem of selecting among infinite sets of parameters, we take only some values of the parameter $a \in [0, 1]$ and some bandwidth values. Consider the Pareto(1, 1) density $f(x) = 1/x^2$, $x > 1$, and let the sample size be $N = 1000$. The number of samples is $B = 1000$. For every sample, we select among three values of the Box-Cox parameter (0, 0.35, 0.70) and three bandwidths 0.010, 0.084, and 0.700 in geometric sequence. This gives a total of 9 estimators, and we select the one that minimizes the distance described in Section 2 for the given sample and the nine-element parameter set. 539 times out of 1000 the selected estimator was the right one, the one

Table 1. Performance of transformed kernel estimator selection for Examples 1 and 2. Relative error is $(\mathrm{IAE}_s - \mathrm{IAE}_b)/\mathrm{IAE}_b$ as described in the text

                          Example 1    Example 2
Average error               0.196       0.0307
Average relative error      0.0563      0.097
Worst relative error        0.7716      0.7746
prob(rel. error > 10%)      0.218       0.328
prob(rel. error > 50%)      0.002       0.018
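A simplified, single-sample version of the Example 1 experiment can be sketched as follows. For brevity the sketch scores each of the nine estimators by its IAE against the true density on a grid (an oracle comparison), rather than by the data-splitting distance of Section 2; the parameter values are the ones quoted in the text, everything else is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def box_cox(x, a):
    return np.log(x) if a == 0 else (x ** a - 1.0) / a

def tkde(x, data, a, h):
    # transformed kernel estimate with the naive kernel, as in Section 7
    tx, td = box_cox(x, a), box_cox(data, a)
    counts = (np.abs(tx[:, None] - td[None, :]) <= h / 2.0).sum(axis=1)
    return counts * x ** (a - 1.0) / (len(data) * h)

# one sample of size 1000 from the Pareto(1,1) density f(x) = 1/x^2, x > 1
X = 1.0 / rng.uniform(size=1000)
grid = np.exp(np.linspace(0.0, np.log(200.0), 4000))
truth = 1.0 / grid ** 2

params = [(a, h) for a in (0.0, 0.35, 0.70) for h in (0.010, 0.084, 0.700)]
iae = {p: np.trapz(np.abs(tkde(grid, X, *p) - truth), grid) for p in params}
best = min(iae, key=iae.get)   # estimator minimizing the IAE on this sample
```

Repeating this over many samples, and replacing the oracle score by the data-based distance, reproduces the kind of comparison summarized in Table 1.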


Figure 1. Some of the estimators in competition shown for a particular sample. Upper part is for bandwidth 0.08, lower part for 0.7. In each figure, three Box-Cox parameter values are used: 0, 0.35, and 0.7. Note that the horizontal scale is logarithmic and that the left tail of the estimators in the lower graphs is not shown.

that minimized the IAE (integrated absolute error, or $L_1$ norm) for the sample. A summary of results is given in Table 1. The average error $\mathrm{IAE}_s - \mathrm{IAE}_b$ is quite large due to the use of very few bandwidth values, but note that the


average relative error (average over samples of $(\mathrm{IAE}_s - \mathrm{IAE}_b)/\mathrm{IAE}_b$, where the subscript $s$ denotes the selected estimator and $b$ the best) is small. In Figure 1 we show six of the nine estimators involved in a particular sample run (to simplify the figure we do not show the ones with bandwidth 0.01). As expected, it can be seen that bigger bandwidths yield better performance in the tails but are worse at the infinite peak. This shows the convenience of calibrating the transformation to use.

Example 2: Transformed

triangular densities

To see how our method can detect the right transformation to use, we take a triangular density

$t(x) = (1 - |x - 1|)_+$

and we back-transform it using the inverse of a Box-Cox($\beta$) transformation. So, we consider as target density the back-transform of $t$. If this density is transformed by a Box-Cox($a$) transformation, the resulting density will be triangular when $a = \beta$. Given that the triangular density is the easiest to estimate using the Epanechnikov kernel (see Wand-Devroye (1993)), a good selector might choose $a$ close to $\beta$ when given several possibilities. We take $\beta = 0.5$ and consider five values for the Box-Cox parameter (0, 0.250, 0.500, 0.750, 1.000) and seven bandwidth values from 0.01 to 2 in geometric steps. Note that $a = 1$ gives no transformation at all. For $N = 15000$ we obtained 1000 simulation samples and selected a parameter pair as before. The third column in Table 1 summarizes the $L_1$ error.

Table 2. In the back-transformed triangular density example, the number of times (out of 1000) each parameter pair was the optimal one and the number of these times it was selected

Bandwidth |                Box-Cox parameter
          |    0       0.250     0.500     0.750     1.00
0.010     |   0/0      0/0       0/0       0/0       0/0
0.024     |   0/0      0/0       0/0       0/0       0/0
0.058     |   26/6     2/1       0/0       0/0       0/0
0.141     |   27/6     232/59    476/120   200/30    26/5
0.342     |   0/0      0/0       0/0       0/0       11/2
0.827     |   0/0      0/0       0/0       0/0       0/0
2.000     |   0/0      0/0       0/0       0/0       0/0

$> 0$, and let $\tilde f$ be an estimate $f_{n-m,\theta}$ such that

for all $\theta \in \Theta$. Then

(by Scheffé's theorem),



$I_A(\omega) := \begin{cases} 1, & \omega \in A \\ 0, & \omega \notin A \end{cases} \qquad (1)$

describes $A \subset \Omega$ using $I_A$, which is a 2-valued property. Then the powerset Boolean algebra $(\mathcal{P}(\Omega), \cap, \cup, \emptyset, \Omega)$ is isomorphic with the Boolean algebra of indicators $(2^\Omega, \wedge, \vee, {}^c, I_\emptyset, I_\Omega)$.
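A finite toy check of this isomorphism (illustrative code, not from the paper): set operations on $\mathcal{P}(\Omega)$ agree with pointwise lattice operations on indicator functions.

```python
Omega = [0, 1, 2, 3]

def indicator(A):
    """I_A as a tuple of 0/1 values indexed by Omega."""
    return tuple(1 if w in A else 0 for w in Omega)

A, B = {0, 1}, {1, 2}
I_union = tuple(max(a, b) for a, b in zip(indicator(A), indicator(B)))  # I_A v I_B
I_inter = tuple(min(a, b) for a, b in zip(indicator(A), indicator(B)))  # I_A ^ I_B
I_compl = tuple(1 - a for a in indicator(A))                            # (I_A)^c
```

Each Boolean operation on sets corresponds exactly to max, min, and $1 - {}\cdot{}$ on the two-valued indicators; this is the trivial algebra $2$ that the paper then generalizes.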

156

C. A. Drossos, G. Markakis and P. L.

Theodoropoulos

Zadeh's fuzzy set theory is based on a direct generalization of the trivial Boolean algebra $2$ to the closed interval (a de Morgan algebra)

$([0, 1], \wedge, \vee, {}^c, 0, 1), \quad \text{where} \quad x^c := 1 - x. \qquad (2)$

Non-idempotent operations, expressed by t-norms and t-conorms (see, e.g., Hajek (1998)) and more generally by residuated lattices (cirl-monoids; see, e.g., Drossos (1997, 1999)), have also been used in order to describe this generalization. In this paper, as in the papers Drossos (1990), Drossos-Markakis (1992, 1993, 1994), Drossos-Theodoropoulos (1996) and Markakis (1999a, 1999b), we confine ourselves to a generalization of the trivial Boolean algebra $2$ to a general complete Boolean algebra. In a second step, the natural generalization is to extend the entire Boolean machinery to non-idempotent structures, such as MV-algebras, BL-algebras, etc.; see Cignoli et al. (1999), Hajek (1998), Drossos (1997) and Drossos-Karazeris (1997). This program will finally supply a mathematical foundation to some of Zadeh's ideas. The Boolean-valued models, based on probability algebras, provide a powerful way to study stochastic objects in general. One can even construct a universe of set theory consisting just of stochastic objects. This method gives a way to see random variables as elements of the Boolean power $\mathbb{R}[\mathbb{B}]$ and, in general, random elements taking values in any structure as $\mathbb{B}$-fuzzy elements of such a structure. In particular, discrete random sets would be elements of the Boolean power $\mathcal{P}(X)[\mathbb{B}]$. In the papers cited above, we have developed a general theory of stochastic objects, including $\mathbb{B}$-fuzzy probability. In this paper, we shall lay down the foundations of $\mathbb{B}$-fuzzy Statistics along with some definitions and results concerning random sets.

2. PRELIMINARIES

First, we review some of the basic notions of point-free probability theory; see Kappos (1969) for whatever is used but not defined here. Point-free probability theory is the study of $\sigma$-Boolean algebras of equivalence classes of events belonging to a $\sigma$-field of sets, together with the probability induced on the $\sigma$-Boolean algebra by the canonical mapping. More precisely, a probability $\sigma$-algebra consists of a pair $(\mathbb{B}, p)$, where $\mathbb{B}$ is a $\sigma$-complete Boolean algebra (notice that every measure algebra satisfies the countable chain condition (c.c.c.), so that $\sigma$-completeness essentially means completeness), and

$p : \mathbb{B} \to [0, 1] \qquad (3)$

is a probability, i.e. a function that is normed ($p(1_\mathbb{B}) = 1$), strictly positive ($p(a) > 0$ whenever $a > 0$) and $\sigma$-additive:

$p\Big( \bigvee_{n=1}^{\infty} a_n \Big) = \sum_{n=1}^{\infty} p(a_n) \quad \text{for pairwise disjoint } a_n \in \mathbb{B}.$

Given a partition of unity $T = \{t_i\}_{i \in I}$ in $\mathbb{B}$, an elementary stochastic element is a formal sum $x = \sum_{i \in I} x_i t_i$ with $x_i \in \mathbb{R}$, and $f_x$ denotes the associated $\mathbb{B}$-valued function. Relations are extended by

$[\![\, x = y \,]\!] := \bigvee_{\xi, \psi \in \mathbb{R}:\ \xi = \psi} \big[ f_x(\xi) \wedge f_y(\psi) \big] = [\![\, f_x = f_y \,]\!] \qquad (14)$

and

$[\![\, x \le y \,]\!] := \bigvee_{\xi, \psi \in \mathbb{R}:\ \xi \le \psi} \big[ f_x(\xi) \wedge f_y(\psi) \big] = [\![\, f_x \le f_y \,]\!]. \qquad (15)$

In a similar way we can extend any finitary relation on $\mathbb{R}$. We can also extend the algebraic operations on $\mathbb{R}$ by

$x + y := \sum_{i,j} (x_i + y_j)\,(t_i \wedge t'_j), \qquad x \cdot y := \sum_{i,j} (x_i \cdot y_j)\,(t_i \wedge t'_j). \qquad (16)$

By using these definitions the elementary stochastic space $(\mathcal{E}(\mathbb{B}), +, \cdot)$ becomes a Boolean-valued structure which, with the help of the logical apparatus associated with it, can be seen to be a Boolean extension of the real numbers. We may also consider the logical aspects of an elementary stochastic space, which are incorporated in the dual structure, called the Boolean power of $\mathbb{R}$, i.e. the set

$\mathbb{R}[\mathbb{B}] := \{ f_x \mid x \in \mathcal{E}(\mathbb{B}) \} = \Big\{ f \in \mathbb{B}^{\mathbb{R}} \;\Big|\; \bigvee_{\xi \in \mathbb{R}} f(\xi) = 1_\mathbb{B},\ f(\xi) \wedge f(\psi) = 0_\mathbb{B} \text{ for } \xi \ne \psi \Big\}, \qquad (17)$

supplied with the appropriate mathematical and logical structure; see Mansfield (1971), Drossos-Markakis (1994).


An element $f \in \mathbb{R}[\mathbb{B}]$ has a canonical representation, dual to that of r.v.'s reduced by indicators, as follows:

$f = \sum_{i \in I} \xi_i t_i \quad \text{with } t_i = f(\xi_i),\ i \in I. \qquad (18)$

In a similar way we may use the duals of the r.v. entities and define the extensions of relations and operations: if $f = \sum_{i \in I} \xi_i t_i$ and $g = \sum_{j \in J} \psi_j s_j$, then the extended relations and operations are computed on the common refinement $\{t_i \wedge s_j\}$;

B-Fuzzy Stochastics

161

i.e. $V_{[0,1]}$ is isomorphic with $D_{\mathbb{B}}(I)$, and therefore each $\mathbb{B}$-valued Dedekind cut uniquely determines a class of almost everywhere equal random variables on $(\Omega, \mathcal{A}, P)$ with values in $[0, 1]$.

THEOREM 4 (Drossos et al., 1992). Let $f : X \to [0, 1]$ be an arbitrary fuzzy set. Then there is a probability algebra $(\mathbb{B}, p)$ and a $\mathbb{B}$-valued function $\pi : X \to \mathbb{B}$ such that

$f = p \circ \pi. \qquad (21)$

4. B-FUZZY STRUCTURES

The Boolean-valued method is general and leads to the stochastization of a general mathematical structure; for this compare also Keisler (1999). Following exactly the same procedure as in the stochastization of the real numbers, we can have stochastic versions of any mathematical structure, and in fact the stochastization of any category. Thus, if $\mathcal{C} = (\mathcal{C}_0, \mathcal{C}_1)$ is a category, we may construct a nonstandard one, having as objects $\mathcal{C}_0[\mathbb{B}] := \{A[\mathbb{B}] \mid A \in \mathcal{C}_0\}$ and arrows $\mathcal{C}_1[\mathbb{B}] := \{f[\mathbb{B}] : A[\mathbb{B}] \to B[\mathbb{B}] \mid f : A \to B \in \mathcal{C}_1\}$. In this paper, we shall concentrate on the stochastization of the power-set Boolean algebra, which will provide a theory of qualitative random sets.

Random Sets and B-Fuzzy Sets

In the following two subsections, we shall show how $\mathbb{B}$-fuzzy sets can be viewed as Boolean "density functions" of random sets. Also, they form a basis for a Boolean theory of evidence. The intention-extension duality introduced in Drossos (1990) allows us to see that the $\mathbb{B}$-fuzzy real numbers are equivalent to classes of almost everywhere equal elementary random variables (see also Drossos-Markakis (1992, 1994)). Extending this duality to $\mathbb{B}$-fuzzy sets, we can see that a similar result can be obtained in such a framework. Suppose that $\Gamma$ is a random set taking values in $\mathcal{P}(X)$, where $X$ is countable:

$\Gamma : \Omega \to \mathcal{P}(X). \qquad (22)$

According to the intention-extension duality, to this function there corresponds a unique function $F : \mathcal{P}(X) \to \mathcal{P}(\Omega)$ that partitions $\Omega$ by

$F(A) = \{ \omega \in \Omega : \Gamma(\omega) = A \} \in \mathcal{A}. \qquad (23)$

By considering the measure algebra $\mathbb{B}$ induced by $\mathcal{A}$, it can be seen that this function may be reduced to a $\mathbb{B}$-valued function $\mathcal{P}(X) \to \mathbb{B}$. The procedure can be reversed, so that to any $\mathbb{B}$-valued function partitioning the unity of $\mathbb{B}$ there


corresponds a class of almost everywhere equal random sets taking values in $\mathcal{P}(X)$. The elements of the Boolean power of $\mathcal{P}(X)$ can be defined as the $\mathbb{B}$-fuzzy subsets of $X$ (Drossos-Markakis (1992)). So by the above arguments it follows that, in the particular case where $\mathbb{B}$ is a measure algebra, there is a one-to-one correspondence between $\mathbb{B}$-fuzzy sets and random sets. If $X$ is not countable, then the corresponding random sets must be elementary, i.e. they can only take countably many values, since the algebra $\mathbb{B}$ satisfies the c.c.c. Every $\mathbb{B}$-fuzzy subset of $X$ can be seen as a Boolean mixing with various weights. A convenient notation for $F$ is the "sum-product" notation

$F = \sum_{i \in I} A_i t_i, \qquad (24)$

where

$F(A) = \begin{cases} t_i & \text{if } A = A_i, \\ 0_\mathbb{B} & \text{else.} \end{cases}$

Note also that

$F(A) = \| F = A \|. \qquad (25)$

B-valued Measures and Evidence Theory

First, we recall some definitions and results from Markakis (1999a, 1999b). Let $X$ be a finite set and let $\mathcal{P}(X)$ be the set of all its subsets. We define a $\mathbb{B}$-possibility measure to be any function $\Pi : \mathcal{P}(X) \to \mathbb{B}$ satisfying

(i) $\Pi(X) = 1_\mathbb{B}$, $\Pi(\emptyset) = 0_\mathbb{B}$;
(ii) for any collection $\{A_i\}_{i \in I}$ of subsets of $X$,

$\Pi\Big( \bigcup_{i \in I} A_i \Big) = \bigvee_{i \in I} \Pi(A_i). \qquad (26)$

To every $\mathbb{B}$-possibility measure there corresponds a $\mathbb{B}$-necessity measure, defined by

$N(A) = 1_\mathbb{B} - \Pi(A^c). \qquad (27)$

It is also shown (see Markakis (1999a)) that any $\mathbb{B}$-fuzzy subset of $X$ defines a $\mathbb{B}$-possibility (and $\mathbb{B}$-necessity) measure on $\mathcal{P}(X)$ by the formula

$\Pi_F(A) = \bigvee_{x \in A} \| x \in F \|. \qquad (28)$

The following also holds and gives us an interpretation of the B-valued measures defined by a B-fuzzy set (equivalently, by a random set):

PROPOSITION 5. If $F$ is a $\mathbb{B}$-fuzzy subset of $X$ and $A$ any subset of $X$, then

$\Pi_F(A) = \| F \cap A \ne \emptyset \| \qquad (29)$

and

$N_F(A) = \| F \subseteq A \|. \qquad (30)$
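Proposition 5 can be checked on a finite toy model (illustrative code, not from the paper) in which $\mathbb{B}$ is the powerset algebra of a six-point $\Omega$, so that join is union and $1_\mathbb{B} = \Omega$; the random set $\Gamma$ below is an arbitrary choice with no empty focal element, so that $\Pi_F(X) = 1_\mathbb{B}$.

```python
# B = powerset of Omega; a B-fuzzy subset F of X is encoded by the
# random set Gamma : Omega -> P(X).
Omega = range(6)
X = {'a', 'b', 'c'}
Gamma = {0: {'a'}, 1: {'a'}, 2: {'a', 'b'}, 3: {'b'}, 4: {'c'}, 5: {'b', 'c'}}

def membership(x):
    """||x in F|| as an element of B (a subset of Omega)."""
    return frozenset(w for w in Omega if x in Gamma[w])

def possibility(A):
    """Pi_F(A) = join over x in A of ||x in F||."""
    out = frozenset()
    for x in A:
        out |= membership(x)
    return out

def necessity(A):
    """N_F(A) = 1_B - Pi_F(A^c)."""
    return frozenset(Omega) - possibility(X - A)
```

The checks below verify (29) and (30): the possibility of $A$ is the event "$\Gamma$ hits $A$", and its necessity is the event "$\Gamma \subseteq A$".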

For proof, see Markakis (1999a). The theory of evidence, otherwise called the Dempster-Shafer theory (see Dempster (1967), Shafer (1976)), is based on a basic probability assignment (b.p.a.), i.e. a function $d : \mathcal{P}(X) \to [0, 1]$ such that

$\sum_{A \subseteq X} d(A) = 1. \qquad (31)$

This function takes strictly positive values on a collection $\{A_i\}_{i \in I}$ of subsets of $X$, and the elements of this collection are called focal elements. It can also be interpreted as a probability "density function" of a random set $S$ taking values in $\mathcal{P}(X)$, so that $d(A)$ is the probability $P(S = A)$. This information induces a belief measure (lower probability) by

$P_*(C) = \sum_{A_i \subseteq C} d_i \qquad (32)$

and a plausibility measure (upper probability) by

$P^*(C) = \sum_{A_i \cap C \ne \emptyset} d_i. \qquad (33)$
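The classical numerical construction in (31)-(33) is easy to sketch (illustrative code; the b.p.a. below is an arbitrary choice).

```python
from itertools import combinations

def powerset(X):
    xs = list(X)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

# basic probability assignment on X = {a, b, c}; focal elements get mass > 0
X = frozenset('abc')
bpa = {frozenset('a'): 0.4, frozenset('ab'): 0.3, frozenset('abc'): 0.3}

def belief(C):
    """P_*(C): total mass of focal elements contained in C."""
    return sum(m for A, m in bpa.items() if A <= C)

def plausibility(C):
    """P^*(C): total mass of focal elements meeting C."""
    return sum(m for A, m in bpa.items() if A & C)
```

Belief never exceeds plausibility, and the two are dual: $P^*(C) = 1 - P_*(C^c)$, mirroring the possibility-necessity duality (27).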

According to the interpretation in terms of random sets (see Kallenberg (1997)), equation (32) defines the "distribution function" of the random set $S$, i.e. $P_*(A) = P(S \subseteq A)$. If we do not take the measure $P$ into account, we may interpret every $\mathbb{B}$-fuzzy subset of $X$ as a basic $\mathbb{B}$-probability assignment, since equation (31) becomes

$\bigvee_{A \subseteq X} F(A) = 1_\mathbb{B}, \qquad (34)$

which is true for all $\mathbb{B}$-fuzzy subsets of $X$. This function is an assignment of "qualitative" probabilities to the class $\{A_i\}_{i \in I}$ of $\mathbb{B}$-focal elements, such that the $\mathbb{B}$-probability of each $A_i$ is $t_i$ (see equation (25)). So, $F$ is the $\mathbb{B}$-density function of a random set taking values in $\mathcal{P}(X)$. Upper and lower $\mathbb{B}$-probabilities are defined by

$\mathbb{B}P^*(C) = \bigvee_{A_i \cap C \ne \emptyset} t_i \quad \text{and} \quad \mathbb{B}P_*(C) = \bigvee_{A_i \subseteq C} t_i, \qquad (35)$

which in Markakis (1999b) are proved to be $\mathbb{B}$-possibilities and $\mathbb{B}$-necessities, respectively. It is also clear from equation (30) that the lower $\mathbb{B}$-probability


function plays the role of the distribution function of a random set. The consideration of the measure $P$ gives us a way to quantify the above "densities" or "distribution functions". This is actually a two-step procedure: first we introduce the qualitative aspects of a random set, and then we have the possibility of quantification by taking $P$ on the measure algebra $\mathbb{B}$.

B-fuzzy Probability

For random sets as well as fuzzy random variables (especially quantitative ones) there is an approach different from the one presented in this paper, but mathematically involved; see, e.g., Kallenberg (1997), Puri-Ralescu (1986). Here, we shall follow again our approach (for more details see Drossos-Theodoropoulos (1996)).

DEFINITION 6. Let $(\Omega, \mathcal{A}, P)$ be an ordinary probability space. Then: (i) ...; (ii) $x = \sum_{i \in I} X_i t_i$, where $X_i$, $i \in I$, are ordinary r.v.'s.

For a proof see Drossos-Theodoropoulos (1996). The above representation of a $\mathbb{B}$-fuzzy r.v. allows us to lift all the ordinary concepts of probability theory. For example, if $x = \sum_{i \in I} X_i t_i$ is a $\mathbb{B}$-fuzzy r.v. and for every $i \in I$ the mathematical expectation $E(X_i)$ is defined, then we define the corresponding $\mathbb{B}$-fuzzy expected value of $x$ as

$E^{\#}(x) := \sum_{i \in I} E(X_i)\, t_i, \qquad (38)$

where $L_1^{\#}$ is the set of all internal $\mathbb{B}$-fuzzy r.v.'s that possess an $E^{\#}$-expected value. We can work in a similar way for the distribution function of a $\mathbb{B}$-fuzzy r.v., etc. The proofs of the Theorems below may be found in Drossos-Theodoropoulos (1996).

THEOREM 8 (Strong Law of Large Numbers). Let $(x_n)_{n=1}^{\infty}$ be a sequence of internal $\mathbb{B}$-fuzzy i.i.d. r.v.'s, and suppose $E^{\#}(x_1)$ is $\mathbb{B}$-finite. Then

$\frac{1}{n} \odot (x_1 \oplus \cdots \oplus x_n) \xrightarrow[n \to \infty]{} E^{\#}(x) \quad \text{a.s.} \qquad (39)$

THEOREM 9 (Central Limit Theorem). Let $(x_n)_{n=1}^{\infty}$ be a random sample from a $\mathbb{B}$-fuzzy r.v. $x$, such that $E^{\#}(x) = \mu$ and $V^{\#}(x) = \sigma^2(x) = \sigma^2$ are $\mathbb{B}$-finite. Then

$\sqrt{n} \odot \frac{\bar{x}_n \ominus \mu}{\sigma}$

converges in distribution to $N(0, 1)$.
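A toy finite illustration of the mixture formula (38) (illustrative code with made-up weights and values): the $\mathbb{B}$-fuzzy expected value is the formal sum $\sum_i E(X_i)\,t_i$, and applying the probability $p$ of the algebra collapses it to an ordinary expectation.

```python
from fractions import Fraction

# partition of unity {t1, t2, t3} with probabilities p(t_i);
# the ordinary r.v.'s X_i enter only through their means E(X_i)
p  = {'t1': Fraction(1, 2), 't2': Fraction(1, 3), 't3': Fraction(1, 6)}
EX = {'t1': Fraction(2), 't2': Fraction(-1), 't3': Fraction(4)}

E_sharp = {t: EX[t] for t in p}                 # E#(x) = sum_i E(X_i) t_i
collapsed = sum(p[t] * E_sharp[t] for t in p)   # applying p to E#(x)
```

The "qualitative" object `E_sharp` keeps the mixture structure; `collapsed` is the ordinary number obtained after quantification by $p$, here $\tfrac12\cdot2 - \tfrac13 + \tfrac16\cdot4 = \tfrac43$.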

5. B-FUZZY STATISTICS

In the previous subsection we reviewed the basic concepts and results on $\mathbb{B}$-fuzzy random variables and $\mathbb{B}$-fuzzy probabilities. In this section we lay down the foundations of $\mathbb{B}$-fuzzy Statistics. $\mathbb{B}$-fuzzy r.v.'s are needed in statistical research whenever our data involve fuzziness beyond randomness, i.e. whenever we have a vague perception of a random variable. We suppose that the vagueness is introduced by measurements and that the random experiment over which the r.v. is defined is an ordinary one. We consider thus the following definition.

DEFINITION 10. Let $X : \Omega \to \mathbb{R}$ be an ordinary r.v. defined on the probability space $(\Omega, \mathcal{A}, P)$, and let $x$ be an internal $\mathbb{B}$-fuzzy r.v. on a standard $\mathbb{B}$-fuzzy probability space $(\Omega^{\#}, \mathcal{A}^{\#}, P^{\#})$. A $\mathbb{B}$-fuzzy random sample of size $n$ from $x$ is a random vector

$\mathbf{x} : \Omega^n \to [\mathbb{R}[\mathbb{B}]]^n, \quad (\omega_1, \ldots, \omega_n) \mapsto (x_1, \ldots, x_n), \qquad (41)$


where

$x_k = x(\omega_k), \quad k = 1, \ldots, n. \qquad (42)$

Since the sample is always finite, in the case of $\mathbb{B}$-fuzzy statistical data the common refinement of the random experiments involved always exists. Thus we may write the $\mathbb{B}$-fuzzy statistical data in the form of a $\mathbb{B}$-fuzzy random vector, i.e. if $\omega := (\omega_1, \ldots, \omega_n)$, then ...

Let now

$x = \sum_{i \in I} X_i t_i \qquad (44)$

be a $\mathbb{B}$-fuzzy r.v., where the corresponding statistical space for $X_i$, $i \in I$, is the space

$(\mathcal{X}_i, \Sigma_{X_i}, P_{X_i}). \qquad (45)$

The $\mathbb{B}$-fuzzy statistical space of $x$ is the Boolean mixture of the spaces (45) over $i \in I$. \qquad (46)

Let $T : \mathcal{X} \to (Z, \Sigma_Z)$ be a statistic defined on the space $(\mathcal{X}, \Sigma_X, P_X)$. It is clear that the Boolean extension of $T$ is given by

$T[\mathbb{B}] : (\mathcal{X}[\mathbb{B}]^n, \Sigma_X[\mathbb{B}]^n) \to (Z[\mathbb{B}], \Sigma_Z[\mathbb{B}]),$

which for every $k = 0, 1, \ldots, n$ satisfies

$\Big\| F[\mathbb{B}]\big( x(\omega);\, t \big) = \frac{k}{n} \Big\| = \bigvee_{r_1, \ldots, r_n \in \mathbb{R}} \big[ A_{x_1}(r_1) \wedge \cdots \wedge A_{x_n}(r_n) \big]. \qquad (52)\text{--}(53)$

We have already seen that for every ordinary statistical concept there is a Boolean-valued analog, and in addition we have a formal machinery to transfer results from ordinary statistics into Boolean-valued ones. For this 'transfer' between the ordinary and the nonstandard setting, we have either to establish formally a 'Transfer Principle' (see e.g. Drossos-Theodoropoulos (1996)), or to work less formally and take advantage of the structure of Boolean extensions. The second option is more appropriate for people who do not feel at home with logical matters. In this way, one can lift all results of ordinary statistics (estimation, hypothesis testing, etc.). In the present work we choose to formulate the Boolean liftings of some basic notions of statistics. Suppose that we have the $\mathbb{B}$-fuzzy data $x_1, \ldots, x_n$, taken from a $\mathbb{B}$-fuzzy r.v. $x$. Then the estimators of the $\mathbb{B}$-fuzzy parameters involved are the Boolean extensions of the ordinary estimators of the corresponding parameters. Thus if $T$ is the estimator of an ordinary parameter, then the $\mathbb{B}$-fuzzy estimator of the corresponding $\mathbb{B}$-fuzzy parameter is:

$T[\mathbb{B}](x_1, \ldots, x_n) = \sum_{(i_1, \ldots, i_n) \in I_1 \times \cdots \times I_n} T(X_{i_1}, \ldots, X_{i_n})\, (t_{i_1} \wedge \cdots \wedge t_{i_n}). \qquad (54)$

Example 12. Suppose that we want to estimate the $\mathbb{B}$-expected value of a $\mathbb{B}$-fuzzy r.v. $x \in L_1[\mathbb{B}]$ on the basis of $\mathbb{B}$-fuzzy data $x_1, \ldots, x_n$. One can prove that in the case of $\mathbb{B}$-fuzzy data, the unbiased estimate of the $\mathbb{B}$-fuzzy expected value is the sampling mean:

$\bar{x}_n = \frac{1}{n} \odot (x_1 \oplus x_2 \oplus \cdots \oplus x_n), \qquad (55)$


where

$\bar{x}_n = \sum_{(i_1, \ldots, i_n) \in I_1 \times \cdots \times I_n} \frac{X_{i_1} + \cdots + X_{i_n}}{n}\, \big( t_{i_1} \wedge \cdots \wedge t_{i_n} \big). \qquad (56)$

An internal $\mathbb{B}$-fuzzy r.v. $x$ is said to be distributed with the $\mathbb{B}$-fuzzy normal distribution, $\mathbb{B}$-fuzzy exponential, $\mathbb{B}$-fuzzy Poisson, etc., if all the r.v.'s involved in the mixing in equation (44) are distributed correspondingly as normal, exponential, or Poisson. Following the example above, we can extend the whole Theory of Estimation to the Boolean case. The Boolean method is a formal method that makes statistics on function spaces look like ordinary statistics. Next, we see how the Neyman-Pearson Fundamental Lemma is lifted to the $\mathbb{B}$-fuzzy environment. Suppose we test the simple hypothesis

$H_0 : \vartheta_0 = \sum_{k \in K} \vartheta_k^0 w_k \quad \text{vs.} \quad H_1 : \vartheta_1 = \sum_{j \in J} \vartheta_j^1 w_j'. \qquad (57)$

Let $i := (i_1, \ldots, i_n)$ and $I := I_1 \times \cdots \times I_n$. We form the quotient

$\lambda[\mathbb{B}](\bar{x}) = \frac{f[\mathbb{B}](x_1, \ldots, x_n, \vartheta_0)}{f[\mathbb{B}](x_1, \ldots, x_n, \vartheta_1)}. \qquad (58)$

Every quotient of the form $\lambda_m = \lambda(x_{i_1}, \ldots, x_{i_n}; \vartheta_k^0, \vartheta_j^1)$, where $m = m(i_1, i_2, \ldots, i_n, j, k)$ and $(i_1, i_2, \ldots, i_n, j, k) \in I_1 \times I_2 \times \cdots \times I_n \times J \times K$, is an ordinary likelihood ratio. The set of values of $m$ is countable. Thus every value of $(x_{i_1}, \ldots, x_{i_n}, \vartheta_j^1, \vartheta_k^0)$ determines a value of $m$, which in turn determines a real constant $c_m$ according to Neyman-Pearson (see Roussas (1973)). The Boolean mixing of all these constants gives us a $\mathbb{B}$-fuzzy constant

$c = \sum_m c_m z_m \qquad (59)$

with the property

$P^{\#}_{\vartheta_0}\big( \lambda[\mathbb{B}](\bar{x}) > c \big) + \gamma\, P^{\#}_{\vartheta_0}\big( \lambda[\mathbb{B}](\bar{x}) = c \big) = \alpha. \qquad (60)$


Suppose now that we have the $\mathbb{B}$-fuzzy data

$x_k = \sum_{i \in I_k} x_{ik}\, t_{ik}, \quad k = 1, \ldots, n, \qquad (61)$

and want to carry out the testing stated above. We may have one of the following cases:

(i) If $[\![\, \lambda[\mathbb{B}](x_1, \ldots, x_n) > c \,]\!] = 1_\mathbb{B}$, then we reject the hypothesis $H_0$.
(ii) If $[\![\, \lambda[\mathbb{B}](x_1, \ldots, x_n) = c \,]\!] = 1_\mathbb{B}$, then we reject the hypothesis $H_0$ with $\mathbb{B}$-fuzzy probability $\gamma$.
(iii) If $[\![\, \lambda[\mathbb{B}](x_1, \ldots, x_n) < c \,]\!] = 1_\mathbb{B}$, then we accept $H_0$. If none of the three Boolean values $b_1, b_2, b_3$ above equals $1_\mathbb{B}$ (while $b_1 \vee b_2 \vee b_3 = 1_\mathbb{B}$), we may decide as follows:

1. Decision with the help of an ultrafilter $U$: if, e.g., $b_i \in U$ for some $i = 1, 2, 3$, we take the corresponding decision. \qquad (62)

2. Decision with the help of st( ) function. The st( ) function connects the standard ordinary statistics with the Boolean one. Thus taking the standard part, we reduce the B-fuzzy decision to the ordinary one.

REFERENCES Cignoli, R., D'Ottaviano, I. and Mundici, D. (1999). Algebraic Foundations of Many-valued Reasoning. Trends in Logic, Vol. 7, Kluwer Acad. Pubi., Dordrecht. Dempster, A. P. (1967). Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist. 38, 235-339. Drossos, C. A. (1990). Foundations of fuzzy sets: A nonstandard approach. Fuzzy Sets and Systems 37, 287-307. Drossos, C. A. (1997). Generalized Algebraic Structures of Many Valued Logics. Seminar Notes, Univ. Patras, Department of Math. Available from h t t p : / / w w w . math.upatras.gr/~cdrossos/Docs/monoids_ps.zip Drossos, C. A. (1999). Generalized t-norm structures. Fuzzy Sets and Systems 104, 53-59 (Special Issue on Tringular Norms).

170

C. A. Drossos, G. Markakis and P. L. Theodoropoulos

Drossos, C. A. and Karazeris, P. (1997). Coupling an MV-algebra with a Boolean Algebra. Journal ofApprox. Reasoning. Drossos, C. A. and Markakis, G. (1992). Boolean fuzzy sets. Fuzzy Sets and Systems 46, 81-95. Drossos, C. A. and Markakis, G. (1993). Boolean representations of fuzzy sets. Kybemetes 22 (3), 3 5 ^ 0 . Drossos, C. A. and Markakis, G. (1994). Boolean powers and stochastic spaces. Math. Slovaca 44(1), 1-19. Drossos, C. A., Markakis, G. and Shakhatreh, M. (1992). A Nonstandard Approach to Fuzzy Set Theory. Kybernetika 2 8 , 4 1 ^ 4 . Drossos, C. A. and Theodoropoulos, P. (1996), B-fuzzy probability. Fuzzy Sets and Systems 78, 355-369. Hajek, Petr (1998). Metamathematics of Fuzzy Logic. Kluwer. Fremlin, D. (1989). Measure algebras. In: Handbook of Boolean Algebras, Vol. 3, J. Donald Monk, R. Bonnet (Eds), North-Holland. Kallenberg, Olav (1997). Foundations of Modem Probability. Springer, N.Y. Kappos, D. (1969). Probability Algebras and Stochastic spaces. Academic Press. Jerome Keisler, H. (1999). Randomizing a model. Advances in Math. 143, 124-158. Mansfield, R. (1971). The theory of Boolean ultrapowers. Ann. Math. Logic 2, 297-323. Markakis, G. (1999). Boolean fuzzy sets and possibility measures. Fuzzy Sets and Systems 110, 279-285. Markakis, G. (1999). A Boolean generalization of the Dempster-Shafer construction of belief and plausibility functions. Tatra Mountains Math. Publ. 16, 117-125. Puri, M. L. and Ralescu, D. A. (1986). Fuzzy random variables. J. Math. Anal. Appl. 114, 409-422. Roussas, G. G. (1973). A First Course in Mathematical Statistics. Addison-Wesley. Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton Univ. Press, Princeton, NJ.

Asymptotics in Statistics and Probability, pp. 171-184 M.L. Puri (Ed.) 2000 VSP

DETECTING JUMPS IN NONPARAMETRIC REGRESSION* CH. DUBOWIK and U. STADTMÜLLER Universität Ulm, Abt. Math. III, 89069 Ulm, Germany

ABSTRACT Given measurements (£,-, y,), i = 1 , . . . , n, we discuss an asymptotic method to decide whether the underlying regression function is smooth or whether it does contain jumps. The proposed method is an improvement over an approach suggested in a recent paper by Miiller and Stadtmiiller, 1999. The local power of the underlying test is investigated as well. The results are based on a CLT for a quadratic plus a linear form in a tringular scheme of rowwise independent and identically distributed random variables.

1. INTRODUCTION Quite recently H.G. Miiller and the second author, 1999, have investigated a test procedure deciding whether a regression function is smooth or whether it does contain jumps. All methods based on smoothing in curve estimation depend critically on the smoothness of the underlying regression function. However, there are many examples from various fields where the underlying regression function is smooth only up to a certain number of jumps. See e.g., the Nile River data (Cobb, 1978), the coal mining desaster data (Jarrett, 1979), stock market data (Wang, 1995) and crown-heel lengths growth data of Lampl et al., 1992. Another crucial point in smoothing is the possible dependence of the data. All basic investigations and methods in nonparametric curve estimation were originally done for independent data. But many results on smoothing can be carried over from independent data *Dedicated to Professor George G. Roussas.


to dependent data. This important robustness question has been raised and investigated in great detail among others by G. G. Roussas and co-authors (see e.g. Roussas, 1989, 1990; Roussas and Tran, 1992; Roussas et al., 1992; Tran et al., 1996, and the references cited therein). We, however, want to concentrate on the smoothness assumption but consider independent data. Our investigations were motivated by a controversy about a 'saltation and stasis' hypothesis published in the journal Science by Lampl et al., 1992. The authors claimed that their study confirmed the saltatory growth of children, from daily measurements of crown-heel length data of children at ages between 67 and 97 days. This contradicts the dogma in biology 'natura non facit saltus', is difficult to explain physiologically, and has consequently started a discussion, see e.g. Heinrich et al., 1995. However, our methods also show that there is some evidence of pushes of growth within certain periods and slower growth in the rest of the time; for details see Müller & Stadtmüller, 1999. The purpose of this paper is to improve the statistical test suggested in that paper using a symmetrization argument and to study its local power. The results are based on a lemma on the asymptotic distribution of quadratic plus linear forms of iid random variables generalizing some earlier results given in, e.g., the papers of P. Hall, 1984, and De Jong, 1987.

2. THE PROPOSED MODEL

Given two parameters M, ξ > 0, we consider two classes of functions on the interval [0, 1] (for ease of notation). These are (with ||f||_∞ := sup_{[0,1]} |f(x)|):

    S_c(M) := {f : [0, 1] → R, f is continuously differentiable with ||f'||_∞ ≤ M}

and

    S_d(ξ) := {f : [0, 1] → R, f(x) = Σ_{i=0}^{m} c_i 1_{I_i}(x), with parameters m = 0, 1, 2, ..., reals 0 = c_0, c_1, ..., c_m, and intervals I_i = [τ_i, τ_{i+1}) for i = 0, 1, ..., m − 1, I_m = [τ_m, 1], based on the grid 0 =: τ_0 < τ_1 < ... < τ_{m+1} := 1 satisfying min_{i=1,...,m+1} (τ_i − τ_{i−1}) ≥ ξ}.

So, S_c(M) is a class of smooth functions whereas S_d(ξ) is a class of step functions with a finite number of, say, m jumps. If m = 0 then f is constant, f ≡ c_0 = 0, and hence f ≡ 0 is contained in S_d(ξ) as well. We shall consider the following nonparametric regression model

    Y_{i,n} = f(i/n) + ε_{i,n},  for i = 0, 1, ..., n,    (2.1)

with residuals ε_{i,n}, i = 0, 1, ..., n, n ∈ N, having the following properties:

    i)   for each n, ε_{0,n}, ..., ε_{n,n} are independent r.v.'s,    (2.2)
    ii)  ε_{1,1} =_d ε_{i,n} for all i and n,    (2.3)
    iii) E(ε_{1,1}) = 0,  σ² := E(ε²_{1,1}) > 0,  μ₄ := E(ε⁴_{1,1}) > σ⁴.    (2.4)

We assume now that the regression function in (2.1) satisfies

    f(x) = g(x) + h(x)  with some g ∈ S_c(M) and h ∈ S_d(ξ).    (2.5)

We want to know whether h ≡ 0, i.e., whether the regression function is smooth, or whether it contains jumps. This can be put in the following parametric frame. Setting

    γ := Σ_{i=0}^{m−1} (c_{i+1} − c_i)²,    (2.6)

we can reduce our problem to the following parametric hypothesis

    H_0 : γ = 0  versus  H_1 : γ > 0.    (2.7)

The problem with the inference is that we have to estimate γ and σ² simultaneously. To do so, we considered in Müller & Stadtmüller, 1999, the following quantities as new data for an (asymptotically) linear model:

    Z_{k,n} := (1/(n−L)) Σ_{j=1}^{n−L} (Y_{j+k,n} − Y_{j,n})²,  1 ≤ k ≤ L,    (2.8)

where L is a parameter subject to the choice of the statistician, similar to the choice of a smoothing parameter in curve estimation. For the asymptotics we will assume throughout that L = L(n) satisfies

    L(n) → ∞  but  L(n)/n → 0.    (2.9)

Then we shall consider the following linear regression model

    Z_{k,n} = 2σ² + (k/(n−L)) γ + η_{k,n}  for k = 1, ..., L,    (2.10)

with residuals η_{k,n} satisfying

    E(η_{k,n}) = O(L/n²),  Cov(η_{k,n}, η_{l,n}) = (4/(n−L)) ((μ₄ − σ⁴) + σ⁴ δ_{k,l}) + O(L/n²).    (2.11)

So, the residuals are only asymptotically centered at zero and they are not uncorrelated. Using the ordinary least squares method (which turns out to deliver the same estimates as the weighted least squares method based on the leading covariance matrix; see Müller & Stadtmüller, 1999) we


obtain the estimators

    (2σ̂², γ̂)ᵀ = (AᵀA)^{−1} Aᵀ Z,    (2.12)

where Z is the vector (Z_{1,n}, ..., Z_{L,n})ᵀ and the design matrix A has rows (1, k/(n−L)), k = 1, ..., L. Using the fact that σ̂², γ̂ are asymptotically equivalent to weighted U-statistics, the following result was shown in Müller & Stadtmüller, 1999.

THEOREM 0. If E(ε⁶_{1,1}) < ∞ and L(n) ~ βn^κ with κ ∈ (1/2, 2/3) and β > 0, then

    √L (σ̂² − σ², γ̂ − γ)ᵀ →_d N₂(0, Σ),

where the covariance matrix Σ can be given explicitly; in particular, under H_0 its second diagonal entry equals (12/5)(μ₄ − σ⁴).

Remarks: i) Actually for a wide range of κ's we have MSE(σ̂²) = O(1/n), whereas the rate-optimal MSE for γ̂ is attained for L = cn^{2/3}, leading to MSE(γ̂) = O(n^{−2/3}). The optimal constant can be given as well but depends on unknown parameters. A data-driven method to choose L is discussed in Müller & Stadtmüller, 1999. Using the optimal L in Theorem 0, a bias occurs in the asymptotic distribution, as happens, e.g., also in curve estimation using the bandwidth that is optimal with respect to the IMSE.

ii) The theorem leads to an asymptotic level-α test for the hypothesis H_0 : γ = 0 based on

    Γ_n := √L γ̂ / √((12/5)(μ̂₄ − σ̂⁴)),

using a suitable estimate μ̂₄ of μ₄. The null hypothesis is rejected if Γ_n > Φ^{−1}(1 − α). In Müller & Stadtmüller, 1999, the quality of this test was investigated with the help of a simulation study. The purpose of this note is to provide an improved test and to study its quality.

3. NEW RESULTS

In this section we are going to show that, using a symmetrization in the definition of the random variables Z_{k,n}, we can improve our test of γ = 0.


Consider now

    Z̃_{k,n} := (1/2)(Z'_{k,n} + Z''_{k,n}),  1 ≤ k ≤ L,    (3.1)

where

    Z'_{k,n} := (1/(n−L)) Σ_{j=1}^{n−L} (Y_{j+k,n} − Y_{j,n})²,
    Z''_{k,n} := (1/(n−L)) Σ_{j=1}^{n−L} (Y_{n−j+1−k,n} − Y_{n−j+1,n})².

Again these variables follow an asymptotically linear model

    Z̃_{k,n} = 2σ² + (k/(n−L)) γ + η̃_{k,n}  for k = 1, 2, ..., L,    (3.2)

with residuals η̃_{k,n} behaving essentially as the residuals η_{k,n}.

THEOREM 1. Under (2.9) we have for the LSE in (3.2)

    Var(σ̃²) = (μ₄ − σ⁴)/n + O(L/n²) + O(1/(nL)),
    Var(γ̃) = 48 σ⁴ n/L³ + O(γ/L) + O(1/L²) + O(L/n²),
    Cov(σ̃², γ̃) = O(1/n).

Proof. The results follow from the representation given in Lemma A2, combined with the moments, for 1 ≤ i, j ≤ n,

    Cov(ε²_{i,n}, ε²_{j,n}) = (μ₄ − σ⁴) δ_{ij},
    Cov(ε_{i,n} ε_{i+k,n}, ε_{j,n} ε_{j+l,n}) = σ⁴ δ_{ij} δ_{kl}  for 1 ≤ i + k, j + l ≤ n, l, k ≥ 1.

Remarks: i) For L(n) ~ βn^κ with 0 < κ ≤ 3/4 we have MSE(σ̃²) = O(1/n).

ii) For the estimation of γ the following choices of L(n) are rate-optimal with respect to the mean squared error:

    if γ = 0, then L*(n) = c₁ n^{3/5}, yielding MSE(γ̃) = O(n^{−4/5});
    if γ > 0, then L*(n) = c₂ n^{2/3}, yielding MSE(γ̃) = O(n^{−2/3}),

with constants c₁, c₂ > 0 which can be specified explicitly; however, they depend on unknown quantities and have to be chosen from the data. A suitable method can be found in Müller & Stadtmüller, 1999. In the case γ = 0 we get an improved rate over the old estimate. Using a suitable asymptotic representation and a limit theorem for a quadratic plus a linear form in iid random variables (see Theorem A1 in the Appendix) we obtain asymptotic normality again.

THEOREM 2. If L(n) ~ βn^κ, where we choose κ ∈ (0, 3/5) if γ = 0 and κ ∈ (0, 2/3) if γ > 0, and β > 0, then the standardized estimators are asymptotically normal:

    (Var(γ̃))^{−1/2} (γ̃ − γ) →_d N(0, 1)  and  (Var(σ̃²))^{−1/2} (σ̃² − σ²) →_d N(0, 1).    (3.3)

Consider now local alternatives γ_n = ρ² √(n/L³), ρ ∈ [0, ∞), so that γ_n → 0.

THEOREM 3. a) Under the conditions of Theorem 2, with γ replaced by γ_n,

    (48 σ⁴ n/L³)^{−1/2} (γ̃_n − γ_n) →_d N(0, 1).

b) If κ ∈ (1/3, 3/5), then, with Γ̃_n := √(L³/n) γ̃_n / √(48 σ̃⁴),

    P(Γ̃_n > Φ^{−1}(1 − α)) → 1 − Φ(Φ^{−1}(1 − α) − ρ²/√(48 σ⁴)).    (3.5)

Proof of Theorem 3. (a) By Theorem 1 we find that E(σ̃²) → σ² and Var(σ̃²) → 0. By Lemmas A1, A2 and Theorem A1 in the appendix the asymptotic normality follows from some tedious but not too difficult calculations, similarly as in the proof of Theorem 2.

(b) In this situation we have γ_n → 0. Using part a) and the fact that σ̃² − σ² → 0 we obtain

    √(L³/n) (γ̃_n − γ_n)/√(48 σ̃⁴) →_d N(0, 1)  and  √(L³/n) γ_n/√(48 σ̃⁴) →_p ρ²/√(48 σ⁴).

This together with Slutsky's result yields the desired statement.
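The symmetrized statistics (3.1) are cheap to compute: Z''_{k,n} is just the forward statistic evaluated on the time-reversed series. A minimal sketch (ours, not the authors' code):

```python
def forward_z(y, L):
    """Forward difference statistics Z'_k of (3.1)."""
    m = len(y) - L
    return [sum((y[j + k] - y[j]) ** 2 for j in range(m)) / m for k in range(1, L + 1)]

def symmetrized_z(y, L):
    """Z_k = (Z'_k + Z''_k)/2, where Z'' is the forward statistic of the
    reversed series, so both ends of the design contribute symmetrically."""
    zf = forward_z(y, L)
    zb = forward_z(y[::-1], L)
    return [(a + b) / 2.0 for a, b in zip(zf, zb)]
```

For a noiseless linear trend y_i = i the lag-k differences equal k, so the symmetrized statistics are exactly k².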

4. SIMULATIONS

We are going to report on a small simulation study comparing the test of H_0 : γ = 0 based on Theorem 0 with that of Theorem 2, i.e., the new test procedure. We generated 1000 repetitions of an iid sample of size 100 of standard normal pseudo random variables ε (i.e. σ² = 1) and produced data

    Y_k^{(i)} = f_i(k/n) + ε_k,  for k = 1, ..., 100.

Here we chose f₁ ≡ 0 (γ = 0), f₂ = 1_{(1/2,1]} (γ = 1), f₃ = 1_{(1/3,2/3]} (γ = 2) and f₄ = 1_{(1/4,1/2]} + 1_{(3/4,1]} (γ = 3). We calculated the estimates σ̂², γ̂ and σ̃², γ̃ and their average MSE's for different choices of L. Furthermore we calculated the empirical power of the tests at α = .1 based on Γ_n and Γ̃_n. The results are summarized in the table below.

    f    L     MSE(σ̃²)   MSE(σ̂²)   MSE(γ̃)   MSE(γ̂)   power Γ̃_n   power Γ_n

    f₁   10    0.02587    0.02678    4.452     5.054     0.112       0.220
         20    0.02457    0.02729    0.513     0.766     0.106       0.113
         30    0.02481    0.03102    0.105     0.291     0.061       0.064
         40    0.02499    0.03626    0.034     0.164     0.042       0.050
         50    0.02322    0.04434    0.014     0.115     0.039       0.041

    f₂   10    0.02612    0.02701    5.548     6.1753    0.239       0.381
         20    0.02537    0.02825    1.075     1.3791    0.463       0.421
         30    0.02669    0.03306    0.423     0.651     0.738       0.513
         40    0.02875    0.04000    0.261     0.416     0.912       0.598
         50    0.02963    0.05054    0.194     0.308     0.969       0.664

    f₃   10    0.02706    0.02792    6.662     7.330     0.370       0.518
         20    0.02689    0.02935    1.598     1.849     0.786       0.735
         30    0.02877    0.03474    1.044     1.227     0.943       0.808
         40    0.02800    0.04030    0.746     0.873     0.969       0.818
         50    0.02955    0.05469    1.374     1.520     0.930       0.586

    f₄   10    0.03089    0.03253    7.623     8.060     0.496       0.674
         20    0.03065    0.03391    2.747     2.971     0.882       0.849
         30    0.02931    0.03791    1.730     1.867     0.966       0.882
         40    0.04374    0.06172    4.210     4.429     0.896       0.549
         50    0.11261    0.14723    9.441     9.725     0.043       0.030

The results show the following: The estimation of the variance works well for both methods and is robust with respect to the choice of L. The new test procedure and the new estimators outperform the old ones under any reasonable choice of L.

5. APPENDIX

At first we have to provide a suitable representation of the estimators as a quadratic plus a linear form in the residuals.

LEMMA A1. For any fixed n ≥ 2 such that ξ ≥ 2L/n we have for weighted sums Σ_{k=1}^L a_k Z̃_{k,n}, where the Z̃_{k,n} are given in (3.1), the representation

    Σ_{k=1}^L a_k Z̃_{k,n} = 2σ² Σ_{k=1}^L a_k + (γ/(n−L)) Σ_{k=1}^L k a_k
                            + Σ_{j=1}^n b_j (ε²_{j,n} − σ²) + Σ_{i<j} d_{ij} ε_{i,n} ε_{j,n} + Σ_{j=1}^n c_j ε_{j,n} + r,

with a remainder r which is bounded by a constant, depending only on m, M and ξ, times (L/n) sup_{k≤L} |a_k|,


and with the following coefficients. Each b_j is, up to the factor 1/(2(n−L)), an explicit sum of the weights a_k whose form depends on whether 1 ≤ j ≤ L, L < j ≤ n − L, or n − L < j ≤ n. The off-diagonal coefficients satisfy d_{ij} = (2/(n−L)) a_{j−i} for 1 ≤ i ≤ n − L and max{i, L} < j ≤ i + L (suitably modified for n − L < i < j ≤ n), and d_{ij} = 0 otherwise. The linear coefficients c_j vanish except within distance L of a jump location; there

    c_j = (2(c_{i+1} − c_i)/(n−L)) Σ_{k=j−[nτ_i]}^{L} a_k,  if [nτ_i] < j ≤ [nτ_i] + L for some i ∈ {1, 2, ..., m},

with the analogous expression (summation Σ_{k=[nτ_i]−j+1}^{L} a_k) if [nτ_i] − L < j ≤ [nτ_i] for some i ∈ {1, 2, ..., m}, and c_j = 0 otherwise.

LEMMA A2. The LSE σ̃², γ̃ of (3.2) admit the corresponding representation as a quadratic plus a linear form in the residuals, obtained from Lemma A1 with the OLS weights a_k determined by (2.12).

For the asymptotic normality we need the following central limit theorem.

THEOREM A1. Let (X_{j,n}, 1 ≤ j ≤ n, n ∈ N) be a triangular array of rowwise iid random variables with

    E(X_{1,n}) = 0,  σ² := E(X²_{1,n}) > 0,  μ₄ := E(X⁴_{1,n}) < ∞.


Furthermore, let there be given a sequence of matrices A_n = (a_{ij,n}) ∈ R^{n×n} and a sequence of vectors b_n = (b_{j,n}) ∈ R^n. Then we have for

    W_n := Σ_{j=1}^n a_{jj,n} (X²_{j,n} − σ²) + Σ_{i≠j} a_{ij,n} X_{i,n} X_{j,n} + Σ_{j=1}^n b_{j,n} X_{j,n}

that

    (Var(W_n))^{−1/2} W_n →_d N(0, 1),

provided that Var(W_n) > 0 for all n large enough and that

    max_{1≤i≤n} ( Σ_{j=1}^n |a_{ij,n}| + |b_{i,n}| )² = o(Var(W_n)).    (5.1)

Remarks: i) If A_n = 0, the condition (5.1) is just Feller's condition for the CLT of a weighted sum of iid random variables, namely max_j b²_{j,n} = o(Σ_j b²_{j,n}).

ii) If the A_n are not null matrices but the diagonal elements are all vanishing and b_n = 0, then we obtain results of P. Hall, 1984, and De Jong, 1987.

Proof of Theorem A1. The proof follows the ideas of Hall, 1984, and De Jong, 1987. We write, for n ∈ N,

    W_{ij,n} := a_{jj,n} (X²_{j,n} − σ²) + b_{j,n} X_{j,n}  if i = j,  and  W_{ij,n} := a_{ij,n} X_{i,n} X_{j,n}  if i ≠ j,

and set Y_{j,n} := Σ_{i≤j} W_{ij,n}, which satisfies Σ_{j=1}^n Y_{j,n} = W_n. Now we have to verify the conditions for the asymptotic normality of a triangular scheme of a martingale difference sequence (see, e.g., Shiryaev, 1996). These are a conditional norming condition and a conditional Lindeberg condition. Furthermore, we use truncation of the W_{ij,n} at a level δ_n m_n. With the notation

    m_n := √(E(W²_n)),  J(n) := {(i, j) | 1 ≤ i ≤ j ≤ n, w_{ij,n} > 0},

this program can be carried out, provided there exists a sequence δ_n → 0 such that the truncated parts are asymptotically negligible. The assumptions in our theorem imply these conditions and the result follows. For details see Dubowik, 1998.
REFERENCES

Cobb, G. W. (1978). The problem of the Nile: conditional solution to a change-point problem. Biometrika 62, 243-251.
De Jong, P. (1987). A central limit theorem for generalized quadratic forms. Probab. Th. Rel. Fields 75, 261-277.
Dubowik, Ch. (1998). Entdeckung von Changepoints in nichtparametrischen Regressionsmodellen. Dissertation, Universität Ulm.
Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivar. Analysis 14, 1-16.
Heinrich, C., Munson, P. D., Counts, D. R., Butler, G. B. and Baron, J. (1995). Pattern of human growth. Science 268, 442-445.
Jarrett, R. G. (1979). Time intervals between coal mining disasters. Biometrika 66, 191-193.
Lampl, M., Veldhuis, J. D. and Johnson, M. L. (1992). Saltation and stasis: A model of human growth. Science 258, 801-803.
Müller, H. G. and Stadtmüller, U. (1999). Discontinuous versus smooth regression. Ann. Statist. 27, 299-337.
Roussas, G. G. (1989). Asymptotic normality of the kernel estimate under dependence conditions: Application to hazard rate. J. Statist. Plann. Inference 25, 81-104.
Roussas, G. G. (1990). Nonparametric regression estimation under mixing conditions. Stoch. Process. Appl. 36, 107-116.
Roussas, G. G. and Tran, L. T. (1992). Asymptotic normality of the recursive kernel regression estimates under dependence conditions, and time series. Ann. Statist. 20, 98-120.
Roussas, G. G., Tran, L. T. and Ioannides, D. A. (1992). Fixed design regression for time series: asymptotic normality. J. Multivariate Analysis 40, 262-291.
Shiryaev, A. N. (1996). Probability, 2nd edition. Springer Verlag, New York.
Tran, L. T., Roussas, G. G., Yakowitz, S. and Truong Van, B. (1996). Fixed-design regression for linear time series. Ann. Statist. 24, 975-991.
Wang, Y. (1995). Jump and sharp cusp detection by wavelets. Biometrika 82, 385-397.

Asymptotics in Statistics and Probability, pp. 185-196 M.L. Puri (Ed.) 2000 VSP

SOME RECENT RESULTS ON INFERENCE BASED ON SPACINGS

KAUSHIK GHOSH and S. RAO JAMMALAMADAKA*
Department of Statistics, George Washington University, Washington, DC 20052
Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93106

ABSTRACT In parameter estimation problems, a new method based on spacings is discussed. It is shown to produce a whole class of estimators that are consistent and asymptotically normal. The "best" estimator in this class is shown to be asymptotically efficient and possesses properties similar to the ML estimator. In testing for goodness-of-fit, tests based on spacings are often the natural choice and their use requires the sampling distribution of certain statistics based on uniform spacings. Methods of accurately approximating these distributions are investigated and compared. In particular, Edgeworth and saddlepoint approximations are studied.

1. INTRODUCTION

The two most important and fundamental problems in statistical inference are parameter estimation and hypothesis testing. In this article, we look at both from the point of view of spacings, which are the gaps between ordered observations. Suppose X₁, ..., X_{n−1} is a random sample from some unknown distribution F. Without loss of generality, we can assume that F has support on the interval (0, 1). Denoting the ordered observations by

    0 = X_{(0)} < X_{(1)} < ··· < X_{(n−1)} < X_{(n)} = 1,

* Research partially supported by NSF Grant DMS-9803600.

186

K. Ghosh and S. Rao

Jammalamadaka

we define the (1-step) spacings based on the above sample to be

    D_i = X_{(i)} − X_{(i−1)},  i = 1, ..., n.

For the particular case when F is the uniform distribution on (0, 1), we call the spacings uniform spacings and denote them by T_i instead of D_i. Using the change-of-variable formula, it is easy to find the joint distribution of the D_i's in terms of F. In particular, for uniform spacings, it is easily shown that the T_i's have a Dirichlet distribution. Note that the uniform spacings are exchangeable and hence we get

    E(T_i) = 1/n.    (1)

Simple calculations establish the following result (here =_d stands for equivalence in distribution):

THEOREM 1. If W₁, ..., W_n are independent Exponential random variables, then

    (T₁, ..., T_n) =_d (W₁/(nW̄), ..., W_n/(nW̄)),

where W̄ is the sample mean of the W_i's. Theorem 1 and (1) are the two most important results on uniform spacings which will be useful in defining the estimators and later in investigating their properties. For more on these and related results, see Pyke (1965).
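Both facts are easy to verify by simulation; the sketch below (function names are ours) generates uniform spacings directly from sorted uniforms and via the exponential representation of Theorem 1:

```python
import random

def uniform_spacings(n):
    """n uniform spacings from n-1 sorted U(0,1) observations,
    with the conventions X_(0) = 0 and X_(n) = 1."""
    xs = sorted(random.random() for _ in range(n - 1))
    xs = [0.0] + xs + [1.0]
    return [xs[i] - xs[i - 1] for i in range(1, n + 1)]

def spacings_from_exponentials(n):
    """Same distribution via Theorem 1: T_i = W_i / (n * Wbar)
    for i.i.d. standard exponentials W_1, ..., W_n."""
    ws = [random.expovariate(1.0) for _ in range(n)]
    total = sum(ws)
    return [w / total for w in ws]
```

Either construction yields nonnegative spacings summing to one, with E(T_i) = 1/n as in (1).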

2. PARAMETER ESTIMATION

Suppose we have a random sample X₁, ..., X_{n−1} from a parametric distribution F_θ, where the form of F_θ is known but the true value θ₀ of θ is unknown. The goal is to estimate θ₀ from the observed data. For technical reasons that will become apparent soon, we assume that F_θ is a continuous distribution throughout this section. Since our method makes use of spacings, which require us to be able to order the observations, we will only consider the case where the X_i's are scalar valued. The parameter θ can however be vector-valued. Note also that here we do not restrict F_θ to have support on (0, 1); it can possibly have support on the whole real line. Let the order statistics be denoted by X_{(1)} < X_{(2)} < ··· < X_{(n−1)}. Note that by the assumption of continuity, we have strict inequalities with probability 1. First, we construct the following "1-step" spacings:

    D_i(θ) = F_θ(X_{(i)}) − F_θ(X_{(i−1)}),  i = 1, ..., n,    (2)


where we define F_θ(X_{(0)}) = 0 and F_θ(X_{(n)}) = 1. The Generalized Spacings Estimator (GSE) of θ is defined to be the argument θ̂ which minimizes

    T(θ) := Σ_{i=1}^n h(n D_i(θ)),    (3)

where h : (0, ∞) → R is a strictly convex function. Some standard choices of h(·) are: h(x) = −log x, x log x, x², −√x, 1/x and |x − 1| (see Pyke (1965), Rao and Sethuraman (1975), Wells et al. (1993)). The Maximum Spacings Estimator (MSPE) discussed by Cheng and Amin (1983) and Ranneby (1984) estimates θ by maximizing the product

    Π_{i=1}^n {F_θ(X_{(i)}) − F_θ(X_{(i−1)})},

which corresponds to the special case of using h(x) = −log(x) in (3). See Shao and Hahn (1994) for a detailed discussion of the Maximum Spacing Method. The Generalized Spacings Estimator (GSE) that we propose thus generalizes the idea behind the MSPE, and hence the name. The above estimation procedure is motivated by the fact that when θ = θ₀, the spacings D_i(θ₀) are actually uniform spacings. So, θ₀ is estimated by finding that θ which brings the D_i(θ) "close" to the expected value of uniform spacings, i.e., 1/n. The convex function h(·) gives a measure of "closeness" of the two quantities. Hence, our proposed method may also be viewed as a minimum divergence method of estimation. In particular,

    h(x) = −log(x)    (4)

minimizes the Kullback-Leibler divergence, while

    h(x) = x log(x)    (5)

maximizes the entropy. One may also choose

    h(x) = −x^α if 0 < α < 1,  and  h(x) = x^α if α > 1 or −1/2 < α < 0;    (6)

when α = 1/2 this corresponds to minimizing the Hellinger distance.
EXAMPLE 2. Let X₁, X₂, ..., X_{n−1} be a random sample from U(0, θ), θ ∈ (0, ∞). It is well known that the MLE is X_{(n−1)}. The estimators corresponding to h(x) = x log x as in (5) and to (6) are also available in closed form, in terms of X_{(n−1)} and the gaps X_{(i)} − X_{(i−1)}. Finally, if we take h(x) = −log x as in (4), the resulting estimator (which is the MSPE) is given by

    θ̂ = (n/(n−1)) X_{(n−1)}.

This is the UMVUE for θ.

EXAMPLE 3. Suppose we have a random sample X₁, X₂, ..., X_{n−1} from the double exponential distribution given by the density

    f_θ(x) = (1/2) exp(−|x − θ|),  −∞ < x < ∞.

For odd sample size (say, n − 1 = 2k + 1), (4) gives the same estimator as the MLE: X_{(k+1)}. For n − 1 = 2k (even sample size), the MLE is not unique; it is any value in the interval (X_{(k)}, X_{(k+1)}). The estimator corresponding to (4) in this case is the midpoint of this interval, i.e. (X_{(k)} + X_{(k+1)})/2, which again corresponds to the UMVUE.

2.1. Properties of GSE

For simplicity, we will assume from now on that θ is a scalar. The results given below, however, apply to the vector parameter case with obvious modifications.

THEOREM 4. Let T(θ) be defined as in (3), where h(·) is a continuously differentiable (non-linear) convex function with h(1) = 0. Under certain regularity conditions on the function h(·) and the distribution F_θ,

    P_{θ₀}(T(θ₀) < T(θ)) → 1 as n → ∞.

Proof. See Ghosh and Jammalamadaka (2000). The above theorem thus justifies our estimation procedure.

COROLLARY 5. Under the assumptions of Theorem 4, if Θ is finite, the GSE θ̂_n exists, is unique with probability tending to 1, and is consistent.

The following theorem gives conditions for consistency of the GSE when the parameter space is not necessarily finite.

THEOREM 6. Suppose the assumptions of Theorem 4 are satisfied and that for almost all x, f_θ(x) is differentiable with respect to θ for θ ∈ ω, with derivative f'_θ(x).


Then, with probability tending to 1, the equation

    T'(θ) = 0    (7)

has a root θ̂_n which converges to the true value θ₀ in probability.

Proof. Let ε > 0 be small enough so that (θ₀ − ε, θ₀ + ε) ⊂ ω and let

    S_n = {x : T(θ₀) < T(θ₀ − ε) and T(θ₀) < T(θ₀ + ε)}.

By Theorem 4, P_{θ₀}(S_n) → 1. For any x ∈ S_n, there exists a value θ₀ − ε < θ̂_n < θ₀ + ε at which T(θ) has a local minimum. Hence, T'(θ̂_n) = 0. Hence, for any ε > 0 sufficiently small, there exists a sequence θ̂_n = θ̂_n(ε) of roots such that

    P_{θ₀}(|θ̂_n − θ₀| < ε) → 1.

Next, we show that there exists such a sequence which does not depend on ε. Let θ*_n be the root closest to θ₀. This exists because the limit of a sequence of roots is again a root, by the continuity of T(θ) in θ. Now, it is easy to see that P_{θ₀}(|θ*_n − θ₀| < ε) → 1, which proves the result. □

COROLLARY 7. Under the assumptions of Theorem 4, if the spacings equation (7) has a unique root θ̂_n for each n and for all x, {θ̂_n} is a consistent sequence of estimators of θ₀. If the parameter space is an open interval (θ̲, θ̄), then with probability tending to 1, θ̂_n minimizes the function T(θ). Hence, θ̂_n is the GSE and is consistent.

Proof. The first statement follows from Theorem 6. Suppose, if possible, that the probability of θ̂_n being the GSE does not tend to 1. Then, for large n, the function T(θ) must tend to an infimum as θ tends to θ̲ or θ̄ with positive probability. Now, with probability tending to 1, θ̂_n is a local minimum of the function T(θ), which must then also possess a local maximum. This contradicts the established uniqueness of the root. □

The above, however, gives local consistency, since taking derivatives does not guarantee the presence of a global minimum. The same method of estimation has been independently proposed by Ranneby and Ekström (1997), in which a proof of consistency under global conditions is given. The following two results, whose proofs can be found in Ghosh and Jammalamadaka (2000), establish the asymptotic normality of the GSEs and find the asymptotically optimal one among this class.

THEOREM 8. Let h(·) be a convex function (except, possibly, a straight line) which is thrice continuously differentiable. Let W denote an exponential random variable with mean 1. Then, under certain regularity conditions,

    √n (θ̂_n − θ₀) →_d N(0, σ²_h / I(θ₀)),    (8)


where

    σ²_h = ( E[{W h'(W)}²] − 2 E[W h'(W)] Cov[W h'(W), W] ) / [E{W² h''(W)}]²,    (9)

θ̂_n is a consistent root of T'(θ) = 0, and I(θ) is the Fisher information in a single observation.

THEOREM 9. Assuming that

    lim_{w→0} w² h'(w) e^{−w} = 0  and  lim_{w→∞} w² h'(w) e^{−w} = 0,

σ²_h given by (9) is minimized iff h(x) = a log(x) + bx + c, where a, b and c are constants.

Since we know that Σ_{i=1}^n D_i(θ) = 1 and that h(·) has to be convex, we can without loss of generality choose a = −1, b = 0, c = 0. Hence, asymptotically, the MSPE has the smallest variance in this class. This, incidentally, coincides with the Cramér-Rao lower bound, which is the asymptotic variance of the MLE when the latter exists. The MSPE is thus asymptotically equivalent to the MLE when the latter exists. Except in simple cases like the uniform distribution, the estimator is not explicitly obtained. We have thus presented results only on asymptotic properties. Simulation studies have however shown that h(x) = −log(x) (the "optimal" choice for large samples) is not necessarily the best for finite samples. Further studies are necessary and are in progress for finite sample properties.
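The variance factor σ²_h and the optimality of h(x) = −log x can be checked numerically; the sketch below (illustrative only) evaluates the expectations in (9) by Monte Carlo for a few standard choices of h:

```python
import math
import random

def sigma2_h(dh, d2h, ws):
    """Monte Carlo evaluation of sigma_h^2 in (9) for W ~ Exp(1):
    ( E[(W h'(W))^2] - 2 E[W h'(W)] Cov(W h'(W), W) ) / E[W^2 h''(W)]^2."""
    n = len(ws)
    g = [w * dh(w) for w in ws]                       # W h'(W)
    eg = sum(g) / n
    eg2 = sum(x * x for x in g) / n
    cov = sum(x * w for x, w in zip(g, ws)) / n - eg * (sum(ws) / n)
    den = sum(w * w * d2h(w) for w in ws) / n
    return (eg2 - 2.0 * eg * cov) / den ** 2

random.seed(1)
ws = [random.expovariate(1.0) for _ in range(200000)]
s_log = sigma2_h(lambda x: -1.0 / x, lambda x: x ** -2.0, ws)           # h(x) = -log x
s_sq = sigma2_h(lambda x: 2.0 * x, lambda x: 2.0, ws)                   # h(x) = x^2
s_xlogx = sigma2_h(lambda x: math.log(x) + 1.0, lambda x: 1.0 / x, ws)  # h(x) = x log x
```

For h = −log x one finds σ²_h = 1 (the Cramér-Rao factor), while convex alternatives such as x² and x log x give strictly larger values, in line with Theorem 9.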

3. GOODNESS-OF-FIT TESTS

Given a random sample from some unknown distribution, it is often necessary to check whether the sample comes from a particular completely specified continuous distribution F₀. By the probability integral transform x ↦ F₀(x), this is equivalent to testing for uniformity of the transformed sample. In this section, we assume that such a transformation has already been applied and our data X₁, X₂, ..., X_{n−1} are on the unit interval (0, 1). We are interested in testing H₀ : F = U(0, 1), where U(0, 1) is the uniform distribution on the interval (0, 1). Let X_{(1)} < X_{(2)} < ... < X_{(n−1)} be the corresponding order statistics for the data in hand. Define the spacings obtained from the sample by

    D_i = X_{(i)} − X_{(i−1)},  i = 1, 2, ..., n,

where X_{(0)} = 0 and X_{(n)} = 1. Under H₀, X₁, ..., X_{n−1} are i.i.d. U(0, 1) and the D_i's will be uniform spacings.


A large class of tests of uniformity based on spacings has the general form

    G_n = Σ_{i=1}^n h(n D_i),

with h as in (3). Some standard cases, which have been discussed in the literature, correspond to h(x) = x² (called the Greenwood statistic), h(x) = |x − 1| (called Rao's spacing statistic), etc. It has been shown by Sethuraman and Rao (1970) that under a smooth sequence of alternatives, the Greenwood statistic has the maximum efficacy in this class G_n. Tests based on spacings are the natural choices for directional data, where measurements are directions in 2 dimensions and spacings are the maximal invariants with respect to change of origin and sense of rotation. In order to use these tests based on spacings, it is necessary to know the null distributions of the corresponding statistics. It turns out that the exact/small sample null distributions are not known in most of the cases or are very difficult to obtain. Using the relation between uniform spacings and exponential observations, the asymptotic null distributions can be shown to be normal under mild conditions on h(·). However, this asymptotic normality is potentially misleading, since it is generally good only for considerably large sample sizes. Table 1 gives the skewness and kurtosis of the Greenwood statistic. In comparison with the skewness and kurtosis values of 0 and 3, respectively, for a normal distribution, these are quite "non-normal" even for n = 50 or 100. This shows that in some cases a normal approximation would be quite bad for small samples. Hence, there is a need to look for approximations to the exact distributions. In particular, we investigate Edgeworth expansions and saddlepoint approximations. See Ghosh and Jammalamadaka (1998) for more details.

Table 1. Skewness and kurtosis of the Greenwood statistic

    n       skewness   kurtosis
    5       1.587      6.827
    10      1.706      8.351
    20      1.584      8.201
    50      1.218      6.378
    100     0.926      5.026
    500     0.440      3.473
    1000    0.314      3.241
    ∞       0.000      3.000


For all our later calculations we will use the following equivalent form of the Greenwood statistic:

    G_n = Σ_{i=1}^n T_i²,    (10)

with the corresponding cumulative distribution function denoted by F_n(·). Burrows (1979) and Currie (1981) tabulated selected percentage points of the Greenwood statistic up to n = 20 using a recursive algorithm. The method breaks down for higher n due to the complicated nature of the algorithm, and hence no exact tables are available for such cases.

3.1. Edgeworth Expansion Method

From now on, we use the symbols Φ(·) and φ(·) to denote the standard normal distribution and density functions, respectively. Edgeworth expansions for statistics have a long history and we first quote the following general result from Hall (1992), pp. 46-48:

THEOREM 10. Suppose S_n is a statistic with a limiting standard normal distribution and is a "smooth" function of vector means. Then,

    P(S_n ≤ x) = Φ(x) + φ(x) ( p₁(x)/√n + p₂(x)/n + ... + p_j(x)/n^{j/2} ) + o(n^{−j/2}),

where

    p₁(x) = −( k_{1,2} + (1/6) k_{3,1} (x² − 1) ),
    p₂(x) = −x { (1/2)(k_{2,2} + k²_{1,2}) + (1/24)(k_{4,1} + 4 k_{1,2} k_{3,1})(x² − 3)
                 + (1/72) k²_{3,1} (x⁴ − 10x² + 15) },    (11)

and

    κ_{j,n} = n^{−(j−2)/2} ( k_{j,1} + n^{−1} k_{j,2} + n^{−2} k_{j,3} + ··· ),  j ≥ 1,

is the jth cumulant of S_n.

Using the characterization of spacings given in Theorem 1, Theorem 10 can be applied to obtain an Edgeworth expansion of the normalized Greenwood statistic

    S_n := (√n/2)(n G_n − 2).    (12)

(12)


Kabaila (1993) describes a method for computer calculation of Edgeworth expansions of smooth function models accurate to the o(n^{−1}) term, in which the cumulants of the statistic are asymptotically expanded and the relevant coefficients are collected to get p₁(·) and p₂(·). His method was implemented using Mathematica to obtain the polynomials p₁ and p₂ of (11) explicitly for the normalized Greenwood statistic S_n of (12).

Table 2 presents the results obtained for the Greenwood statistic. Note that t_α satisfies P(G_n ≤ t_α) = α. The exact quantiles are taken from Burrows (1979) and Currie (1981). Close examination suggests that the Edgeworth expansion provides a better approximation than the normal approximation, as was expected. A plot of the corresponding graph suggests, however, that the approximation fluctuates wildly in the tails. This is one inherent deficiency of the Edgeworth expansion and occurs due to the polynomial frame of approximation. Since our primary goal is to approximate tail probabilities, this deficiency is a serious one.

Table 2. Quantiles t_α of the Greenwood statistic using the Edgeworth expansion, n = 10

    α     Exact      Normal     Edgeworth O(n^{−1/2})   Edgeworth O(n^{−1})
    .01   .111694    .054287    .096327                 .130842
    .05   .121088    .091647    .104439                 .133684
    .1    .127248    .111563    .112076                 .137016
    .2    .136050    .13568     .12381                  .143213
    .3    .143499    .15307     .133414                 .149072
    .4    .150744    .16793     .142088                 .154819
    .5    .158375    .181818    .150434                 .160627
    .6    .166976    .195707    .15892                  .166661
    .7    .177436    .210566    .168109                 .17311
    .8    .191648    .227956    .179087                 .180226
    .9    .215717    .252073    .196186                 .188392
    .95   .240356    .271989    .31685                  .193063
    .99   .300793    .309349    .35937                  .197196


In the next subsection, we present an alternate method that does not suffer from this particular problem and show that it still produces reasonably accurate approximations.

3.2. Saddlepoint Approximation

Let T_n be a real-valued statistic and K_n(t) be its cumulant generating function. Let R_n(t) = K_n(nt)/n. Then the saddlepoint approximation of the density of T_n, with uniform error of order n^{−1}, is given by:

    g_n(x) = ( n / (2π R''_n(t₀)) )^{1/2} exp[ n{R_n(t₀) − t₀ x} ],    (14)

where t₀ is the saddlepoint, determined as the root of the equation

    R'_n(t₀) = x.    (15)

The saddlepoint approximation of the distribution of T_n is given by

    G_n(x) = Φ(·) + ···

L²(T₂) are defined in complete analogy to R_XX. We note that we do not assume stationarity of the processes as in Brillinger (1975, ch. 10) and that we are aiming at quantifying the dependency between pairs of processes. Thus our target is different from that of a body of work, exemplified by Roussas (1990), where the impact of dependency in a sequence of low-dimensional data on nonparametric regression and related approaches is considered. One of the basic problems which sets infinite-dimensional data analysis apart from multivariate statistical analysis is that the covariance operators are not invertible. The reason is that a covariance operator of an L²-process is a compact operator, which is not invertible in an infinite-dimensional Hilbert space. One can indeed show that canonical weight functions do not necessarily exist for a given L²-process (He, 1999). It is therefore of interest to provide a sufficient condition for the existence of canonical correlations and canonical weight functions for L²-processes. This is the purpose of Condition 2.1 below. Using the Karhunen-Loève decomposition, X and Y may be expanded as

    X(s) = E[X(s)] + Σ_{i=1}^∞ ξ_i θ_i(s),  s ∈ T₁,
    Y(t) = E[Y(t)] + Σ_{i=1}^∞ ζ_i φ_i(t),  t ∈ T₂,    (2.5)

with {ξ_i} a sequence of uncorrelated random variables with E(ξ_i) = 0, and {ζ_i} a sequence of uncorrelated random variables with E(ζ_i) = 0. Here, λ_{Xi} = E[ξ_i²], λ_{Yi} = E[ζ_i²], Σ_i λ_{Xi} < ∞, Σ_i λ_{Yi} < ∞, and {(λ_{Xi}, θ_i)}, {(λ_{Yj}, φ_j)} are the eigenvalues and eigenfunctions of the covariance operators


R_XX and R_YY. The following condition refers to expansion (2.5) and, as will be seen in Theorem 3.1 below, ensures that functional canonical correlation is well defined. This fact has been observed in He, Müller and Wang (1999).

Condition 2.1. The L²-processes X and Y satisfy

    Σ_{i=1}^∞ Σ_{j=1}^∞ (E[ξ_i ζ_j])² / (λ_{Xi} λ_{Yj}) < ∞.


With these representations we may define

    R_XX^{−1/2} u = Σ_{i=1}^∞ λ_{Xi}^{−1/2} ⟨u, θ_i⟩ θ_i,  for u ∈ H_XX,

and

    R_YY^{−1/2} v = Σ_{i=1}^∞ λ_{Yi}^{−1/2} ⟨v, φ_i⟩ φ_i,  for v ∈ H_YY,

referring to the Karhunen-Loève expansion (2.5).
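The Karhunen-Loève structure behind (2.5) can be illustrated numerically: simulate curves with two uncorrelated component scores, estimate the covariance operator on a grid, and recover the leading eigenvalue λ_{X1} by power iteration. The function names and the Fourier basis below are assumptions of this sketch, not part of the paper.

```python
import math
import random

def simulate_curves(n_curves, m, lam1=4.0, lam2=1.0):
    """Curves X = xi1*theta1 + xi2*theta2 on the grid s_k = (k+0.5)/m,
    with orthonormal theta_i and Var(xi_i) = lam_i (a 2-term version of (2.5))."""
    grid = [(k + 0.5) / m for k in range(m)]
    th1 = [math.sqrt(2.0) * math.sin(2.0 * math.pi * s) for s in grid]
    th2 = [math.sqrt(2.0) * math.cos(2.0 * math.pi * s) for s in grid]
    out = []
    for _ in range(n_curves):
        a = random.gauss(0.0, lam1 ** 0.5)
        b = random.gauss(0.0, lam2 ** 0.5)
        out.append([a * u + b * v for u, v in zip(th1, th2)])
    return out

def top_eigenvalue(curves, iters=100):
    """Leading eigenvalue of the empirical covariance operator
    (R u)(s) = int r(s, t) u(t) dt, discretized with quadrature weight 1/m."""
    n, m = len(curves), len(curves[0])
    cov = [[sum(c[i] * c[j] for c in curves) / n for j in range(m)] for i in range(m)]
    u = list(curves[0])                      # start inside the operator's range
    nrm = (sum(x * x for x in u) / m) ** 0.5
    u = [x / nrm for x in u]
    lam = 0.0
    for _ in range(iters):
        v = [sum(cov[i][j] * u[j] for j in range(m)) / m for i in range(m)]
        lam = (sum(x * x for x in v) / m) ** 0.5
        u = [x / lam for x in v]
    return lam
```

Because the covariance operator is compact, only finitely many eigenvalues exceed any positive threshold, which is exactly why R_XX has no bounded inverse and why a condition like Condition 2.1 is needed.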

THEOREM 3.1. Assume the L²-processes X and Y satisfy Condition 2.1. Then all canonical correlations are well defined. Specifically, let (λ_i, q_i), i ≥ 1, be the ith non-zero eigenvalue and orthonormal eigenvector of R*R, and let p_i = Rq_i/√λ_i. Then for i, j ≥ 1,
(a) p_i ∈ H_XX, q_i ∈ H_YY;
(b) u_i = R_XX^{−1/2} p_i, and v_i = R_YY^{−1/2} q_i;
(c) Corr(U_i, U_j) = ⟨u_i, R_XX u_j⟩ = ⟨p_i, p_j⟩ = δ_{ij};
(d) Corr(V_i, V_j) =