Vol. 2 Mathematical Statistics Theory and Applications [Reprint 2020 ed.] 9783112319086, 9783112307922



English Pages 871 [872] Year 1987


Proceedings of the 1st World Congress of the BERNOULLI SOCIETY

Tashkent, USSR, 8-14 September 1986

Volume 2: Mathematical Statistics Theory and Applications

Editors

Yu. A. Prohorov and V. V. Sazonov

VNU SCIENCE PRESS

Utrecht, The Netherlands 1987

VNU Science Press BV, P.O. Box 2093, 3500 GB Utrecht, The Netherlands. © 1987 VNU Science Press BV. First published in 1987. ISBN 90-6764-103-0 (set); ISBN 90-6764-104-9 (Vol. 1); ISBN 90-6764-105-7 (Vol. 2). All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

Printed in Great Britain by J. W. Arrowsmith Ltd, Bristol

CONTENTS

ABSTRACT INFERENCE (semi-parametric models...) (Session 1 - Chairman: P.J. Bickel) INVITED PAPERS Efficient testing in a class of transformation models: an outline P.J. Bickel

3

Abstract inference in image processing U. Grenander

13

CONTRIBUTED PAPERS Bayesian inference in semiparametric problems O. Bunke

27

Semi-parametric Bayes estimators N.L. Hjort

31

On estimating in models with infinitely many nuisance parameters A.W. van der Vaart

35

INFERENCE FOR STOCHASTIC PROCESSES (Session 2 - Chairman: A.N. Shiryaev) INVITED PAPERS Semimartingale convergence theory and conditional inference for stochastic processes P.D. Feigin

41

The foundations of finite sample estimation in stochastic processes - II V.P. Godambe CONTRIBUTED PAPERS Some uses of maximum-entropy methods for ill-posed problems in signal and crystallography theories D. Dacunha-Castelle

49

55

vi On asymptotic efficiency of the Cox estimator K. Dzhaparidze

59

Asymptotic properties of the maximum likelihood estimator, Ito-Ventzel's formula for semimartingales and its application to the recursive estimation in a general scheme of statistical models N.L. Lazrieva and T.A. Toronjadze

63

Maximum entropy selection of solutions to ill-posed martingale problems R. Rebolledo

67

Asymptotic inference for the Galton-Watson process without restriction to the super-critical case D.J. Scott

71

CROSS-VALIDATION (Session 3 - Chairman: D.V. Hinkley) INVITED PAPERS On resampling methods for confidence limits D.V. Hinkley

77

The interplay between cross-validation and smoothing methods B.W. Silverman

87

CONTRIBUTED PAPERS Bootstrap of the mean in the infinite variance case K.B. Athreya

95

Automatic curve smoothing W. Härdle

99

Non-parametric smoothing of the bootstrap A. Young

105

DATA ANALYSIS (projection pursuit, curve estimation . . .) (Session 4 - Chairman: E. Diday) INVITED PAPERS On constructing a general theory of automatic classification S.A. Aivazyan

111

Data analysis: geometric and algebraic structures B. Fichet

123

vii CONTRIBUTED PAPERS Generalized canonical analysis M. Tenenhaus

133

Detecting outliers and clusters in multivariate data based on projection pursuit I.S. Yenyukov

137

The inverse problem of the principal component analysis S.U. Zhanatauov

141

DESIGN OF EXPERIMENTS (nearest neighbour designs . . .) Session 5 - (Chairman: H.P. Wynn) INVITED PAPERS Numerical methods of optimal design construction V.V. Fedorov

147

Ordering experimental designs F. Pukelsheim

157

Observation and experimental design for autocorrelated processes H.P. Wynn

167

CONTRIBUTED PAPERS The design of experiments for model selection A.M. Herzberg and A.V. Tsukanov

175

The design and analysis of field trials in the presence of fertility effects C. Jennison

179

On the existence of multifactor designs with given marginals 0. Krafft

183

Local asymptotic normality in Gaussian model of variance components M.B. Maljutov

187

ASYMPTOTIC METHODS IN STATISTICS (second order asymptotics, saddle point methods, etc.) (Session 6 - Chairman: W. van Zwet) INVITED PAPERS Differential geometrical method in asymptotics of statistical inference S. Amari

195

viii Likelihood, ancillarity and strings O.E. Barndorff-Nielsen

205

On asymptotically complete classes of tests A.V. Bernstein

215

Tail probability approximations H.E. Daniels

223

On second order admissibility in simultaneous estimation B.Ya. Levit

225

Bounds for the asymptotic efficiency of estimators based on functional contractions; applications to the problem of estimation in the presence of random nuisance parameters J. Pfanzagl

237

CONTRIBUTED PAPERS On chi-squared goodness-of-fit tests for location-scale models F.C. Drost, J. Oosterhoff and W.C.M. Kallenberg

249

Some problems in statistics F. Hampel

253

Adaptive procedures for detection of change M. Hušková

257

On local and non-local measures of efficiency W.C.M. Kallenberg

263

Maximal deviations of gaussian processes and empirical density functions V.D. Konakov

267

Chi-squared test statistics based on subsamples M. Mirvaliev

271

On Hodges-Lehmann indices of nonparametric tests Ya.Yu. Nikitin

275

Differential geometry and statistical inference L.T. Skovgaard

279

Large sample properties for generalizations of the trimmed mean N. Veraverbeke

283

ix MULTIVARIATE ANALYSIS (large number of parameters . . .) (Session 7 - Chairman: Y. Escoufier) INVITED PAPERS Discriminant analysis for special parameter structures J. Läuter

289

Asymptotics of increasing dimensionality in classification L.D. Meshalkin

299

Extensions and asymptotic studies of multivariate analyses A. Pousse

307

Estimation of symmetric functions of parameters and estimation of covariance matrix K. Takeuchi and A. Takamura

317

CONTRIBUTED PAPERS Introduction in general statistical analysis V.L. Girko

327

Estimation and testing of hypotheses in multivariate general Gauss-Markoff model W. Oktaba

331

Symmetry groups and invariant statistical tests for families of multivariate Gaussian distributions E.A. Pukhal'skii

335

Prescribed conditional interaction models for binary contingency tables T. Rudas

339

Quadratic invariant estimators with maximally bounded mean square error F. Stulajter

343

TIME SERIES (long range dependence, non-linear processes, estimation of spectra) (Session 8 - Chairman: H. Tong) INVITED PAPERS Non-gaussian sequences and deconvolution M. Rosenblatt

349

Non-linear time series models of regular sampled data: a review H. Tong

355

X

Robust spectral estimation I.G. Zhurbenko

369

CONTRIBUTED PAPERS On marginal distributions of threshold models J. Anděl and A. Fuchs

379

On the boundary of the central limit theorem for stationary p-mixing sequences R.C. Bradley

385

Detection of parameter changes at unknown times in linear regression models V.K. Jandhyala and I.B. MacNeill

389

Some aspects of directionality in time series analysis A.J. Lawrance

393

The algorithm of maximum mutual information for model fitting and spectrum estimation Z. Xie

397

BOUNDARY CROSSING PROBLEMS AND SEQUENTIAL ANALYSIS (Session 11 - Chairman: D.O. Siegmund) INVITED PAPERS Optimal sequential tests for relative entropy cost functions H.R. Lerche

403

Asymptotic expansions in some problems of sequential testing V.I. Lotov and A.A. Novikov

411

CONTRIBUTED PAPERS Asymptotic methods for boundary crossings of vector processes K. Breitung

421

First passage densities of Gaussian and point processes to general boundaries with special reference to Kolmogorov-Smirnov tests when parameters are estimated J. Durbin

425

Converse results for existence of moments for stopped random walks A. Gut

429

Mathematical programming in sequential testing theory U. Müller-Funk

435

xi EXTREME VALUES AND APPLICATIONS (strength of materials) (Session 12 - Chairman: R.L. Smith) INVITED PAPERS Theory of extremes and its applications to mechanics of solids and structures V.V. Bolotin

443

Extremes, loads and strengths H. Rootzén

461

Statistical models for composite materials R.L. Smith

471

CONTRIBUTED PAPERS The distribution of bundle strength under general assumptions H.E. Daniels

485

An estimate of the rate of convergence in the law of large numbers for sums of order statistics and their applications M.U. Gafurov and I.M. Khamdamov

489

The index of the outstanding observation among n independent ones L. de Haan and I. Weissman

493

Rain flow cycle distributions for fatigue life prediction under Gaussian load processes G. Lindgren and I. Rychlik

495

High-level excursions of Gaussian fields: a geometrical approach based on convexity V.P. Nosko

501

EPIDEMIOLOGY (mainly observational studies) (Session 13 - Chairman: N.T.J. Bailey) INVITED PAPERS Epidemic prediction and public health control, with special reference to influenza and AIDS N.T.J. Bailey and J. Estreicher

507

Mathematical models for chronic disease epidemiology K.G. Mantón

517

xii

Global forecast and control of fast-spreading epidemic process V. Vasilyeva, L. Belova, D. Donovan, P. Fine, D. Fraser, M. Gregg, I. Longini, L.A. Rvachev, L.L. Rvachev and V. Shashkov

527

CONTRIBUTED PAPERS

A statistical analysis of the seasonality in sudden infant death syndrome (SIDS) H. Bay

535

Epidemiological models for sexually transmitted infections K. Dietz

539

Results and perspectives of the mathematical forecasting of influenza epidemics in the USSR Yu.G. Ivannikov

543

The generalized discrete-time epidemic model with immunity I.M. Longini

547

GEOLOGY AND GEOPHYSICS (Session 14 - Chairman: D. Vere-Jones) INVITED PAPERS

Stochastic model of mineral crystallization process from magmatic melt D.A. Rodionov

555

Applications of stochastic geometry in geology D. Stoyan

563

Classification and partitioning of igneous rocks E.H.T. Whitten

573

CONTRIBUTED PAPERS

Application of fuzzy sets theory to the solution of pattern recognition problems in oil and gas geology B.A. Bagirov, I.S. Djafarov and N.M. Djafarova

579

The application of multidimensional random functions for structural modelling of the platform cover J. Harff and G. Schwab

583

Prediction of rock types in oil wells from log data M. Homleid, V. Berteig, E. Bølviken, J. Helgeland and E. Mohn

589

On tests for outlying observations V.I. Pagurova and K.D. Rodionov

593

xiii The ideas of percolation theory in geophysics and failure theory V.F. Pisarenko and A.Ya. Reznikova

597

HYDROLOGY AND METEOROLOGY (Session 15 - Chairman: A.H. Murphy) INVITED PAPERS Some stochastic models of rainfall with particular reference to hydrological applications D.R. Cox and I. Rodrigues-Iturbe

605

Canonical correlations for random processes and their meteorological applications A.M. Obukhov, M.I. Fortus and A.M. Yaglom

611

Statistical decisions and problems of the optimum use of meteorological information E.E. Zhukovsky

625

CONTRIBUTED PAPERS Application of data analysis methods for the evaluations of efficiency of weather modification experiments G. Der Megreditchian

637

Predictor-counting confidence intervals for the value of effect in randomized rainfall enhancement experiments E.M. Kudlaev

643

On behaviour of sea surface temperature anomalies L.I. Piterbarg and D.D. Sokolov

647

The sampling variability of the autoregressive spectral estimates for two-variate hydrometeorological processes V.E. Privalsky, I.G. Protsenko and G.A. Fogel

651

On the forecasting of the fluctuations in levels of closed lakes M.I. Zelikin, L.F. Zelikina and J. Schultze

655

BIOLOGICAL MODELS AND GENETICS (Session 16 - Chairman: P. Jagers) INVITED PAPERS The equilibrium laws and dynamic processes in population genetics Yu.I. Lyubic

661

Community size and age at infection: how are they related? A.R. McLean

671

XIV

Branching processes and neutral mutations 0. Nerman

683

CONTRIBUTED PAPERS Limit theorem for some statistics of multitype Galton-Watson process I.S. Badalbaev and A.A. Mukhitdinov The regularity of metaphase chromosomes organization in cereals N.L. Bolsheva, N.S. Badaev, E.D. Badaeva, O.V. Muravenko and Yu.N. Turin

693

697

The genealogy of the infinite alleles model P. Donnelly and S. Tavaré

701

The relationship between the stochastic and deterministic version of a model for the growth of a plant cell population M.C.M. de Gunst

705

Gene action for agronomic characters in winter wheat W. Lone

709

Total progeny of a critical branching process S.M. Sagitov

713

Limit theorems for a critical branching Crump-Mode-Jagers processes V.A. Topchii

717

The behaviour of the prey-predator system in the neighbourhood of statistical equilibrium Ye.F. Tsarkov

721

Bellman-Harris branching processes and distributions of marks in proliferating cell populations A.Yu. Yakovlev, M.S. Tanushev and N.M. Yanev

725

STOCHASTIC SIMULATION (Session 18 - Chairman: G.A. Mikhailov) INVITED PAPERS Methods in Quantum Monte Carlo M.H. Kalos

731

CONTRIBUTED PAPERS The Monte Carlo method and asynchronic calculations S.M. Ermakov

739

XV

Increasing the efficiency of statistical sampling with the aid of infinite-dimensional uniformly distributed sequences I.M. Sobol'

743

Controlled unbiased estimators for certain functional integrals W. Wagner

747

Kac's model for a gas of n particles and Monte Carlo simulation in rarefied gas dynamics V.E. Yanitskii

751

STATISTICAL COMPUTING (Session 21 - Chairman: S. Mustonen) INVITED PAPERS Mathematical programming in statistics: an overview Y. Dodge

757

Model search in large model families T. Havránek

767

Programming languages and opportunities they offer to the statistical community P. Naeve

779

CONTRIBUTED PAPERS Statistical software for micro-computers R. Gilchrist

793

Data classification by comparing of computer implementations of statistical algorithms N.N. Lyashenko and M.S. Nikulin

797

Time discrete approximation of Ito processes E. Platen

801

Fast algorithm of peak location in spectrum I.I. Surina

805

EMPIRICAL PROCESSES (Session 23 - Chairman: E. Gine) INVITED PAPERS Approximations of weighted empirical processes with applications to extreme, trimmed and self-normalized sums S. Csörgő and D.M. Mason

811

xvi Minimization of expected risk based on empirical data V.N. Vapnik and A.Ja. Chervonenkis

821

CONTRIBUTED PAPERS Rates of convergence in the invariance principle for empirical measures I.S. Borisov

833

Almost sure behaviour of weighted empirical processes in the tails J.H.J. Einmahl and D.M. Mason

837

Convergence of the empirical characteristic functionals V.I. Kolchinskii

841

Sample approximation of the distribution by means of k points: a consistency result for separable metric spaces K. Pärna

845

Author index

849

ABSTRACT INFERENCE (semi-parametric models. . .) (Session 1) Chairman: P.J. Bickel

EFFICIENT TESTING IN A CLASS OF TRANSFORMATION MODELS: AN OUTLINE by P.J. Bickel, University of California, Berkeley. Transformation models of the following type have been discussed by Cox (1972), Clayton and Cuzick (1985) and Doksum (1985), among others. We observe (Z_i, Y_i) with Y_i in J_1, an open subinterval of R, which are a sample from a population characterized as follows. There exists an unknown transformation τ from J_0, an open subinterval of R, onto J_1 with τ' > 0 such that Y = τ(T), where (Z,T) follow a parametric model. The intervals J_j here may be proper or half-rays or R itself. Colloquially, if Y is expressed in the proper unknown scale, i.e. as T, then the joint behaviour of (Z,T) has some nice parametric form. The case considered by previous authors is log T = θ'Z + ε where ε is independent of Z. The distributions of ε considered so far include:

Cox (1972): e^ε has an exponential distribution.

Clayton and Cuzick (1985): e^ε has a Pareto distribution with density

(1) f(t) = (1 + tc)^(-(1 + 1/c)), t > 0, c ≥ 0,

where c = 0 is the Cox model. An important special case of (1) considered by Bennett (1983) is the log-logistic model, c = 1, which has the attractive proportional odds property. Doksum (1985): in generalization of the Box-Cox model, ε has a Gaussian distribution. It seems reasonable in these models to base inference about the parameters of the underlying parametric model, such as θ, c above, on the maximal invariant of the group of transformations generating this semiparametric model, {(z,t) → (z,τ(t))}. This maximal invariant is


just M = (Z,R) where Z = (Z,, • • • ,Z N ) and R = (Rx, • • • ,R N ) is the vector of ranks of the Y^. The likelihood of M or the conditional likelihood L(9) of R given Z = z can in general only be expressed as an N dimensional integral. It can be evaluated explicitly for the Cox model. Clayton and Cuzick propose some ingenious approximations and Doksum proposes that both the value of L and its distribution be calculated approximately by Monte Carlo. So far, however, the asymptotic behaviour of these procedures is not well understood. In this paper we specialize to Z = 0,1 as in Bickel (1985). Moreover we suppose, as did Clayton and Cuzick that the parameter 9 governing the conditional density of T = T(Y) given Z = j, denoted fj(-,0) is real, and in particular that the distribution of e is assumed known. In this context, for a subclass of transformation models, we indicate how to construct asymptotically efficient tests of H : 9 = 9 0 vs K : 9 > 9 0 . The proofs of our results and a detailed treatment are given in Bickel (1986). The subclass includes the Pareto model for c > 1. The testing problem as such is not very interesting save in the case where 9 0 corresponds to independence of Y and Z which is already well understood. However, the solution of the testing problem is a first step in the solution of the estimation problem whose importance is clear. The tests we propose are based on "quadratic rank statistics". (2)

T n = N- 1 L ^ Z O + N - 2 Z b ( ^ , ^ , Z i , Z j ) .

We interpret efficiency in this ^context conditionally on Z, or equivalently the two sample sizes ZZ, and N - Z Z . We show, i=l i=l i) If 9 n = 9 0 +tN- 1 / 2 , t > 0 , (3)

L 0 (/ "^N | Z)\ —> N(at,l) in probability for some a > 0 CT

N

ii)

where c N is a sequence of normalizing constants. If S N is any other sequence of statistics not necessarily depending on the ranks only such that pISnN supx P(0 X) [S N > s | Z] = a then, for each T, 9 n as above, plim N P (0NiT) [S N * s | Z] s 1 - 9 O is given by N 1 / 2 S n where S N = N- 1 EZioEeJcoCT;) |Z,R} + Z il E 9o {c 1 (T i ) |Z,R} Slop Cj(t)

= -^fj(t,e0) and Zy = I(Z4 = j). Equivalent^, if D = (Dj, • • • , D n ) are the antiranks defined by T q = TD. where T(j)< • • • < T ^ are the order statistics of the sample, then (4)

S N = N- 1 .E i |{Z D J E e o (c j (T ( i ) )|Z,p).

To get an approximation to the scores in (4) we write, fj(-,9 0 ) as fj(-) and define, N

m u = — = 1 -it-,. 1 i=i N We treat it} as deterministic constants in the sequel. Let, n = ZZ,-, m = N - n ,

h= ft0f0(-)+ftifi(-) with H the corresponding distribution function. Note that h and H depend on N and are random only through the ftj. Finally let, for 0 < t < 1, (5)

= Cj(H-1(t)),

tyt) gj (t)

= fj(H _1 (t))/h(H _1 (t))

the density of H(Tj) given Zj = j, and (6)

Yj(t) = —§-(0.

We can rewrite (4) as SN = Sni + SNO

6

where S N j = N- 1 |z D i j E(X j (U ( i ) )|Z,p) where (Z^U,) are i.i.d with U j given Zl = j having density gj and the marginal density of U j is uniform, (7)

ftogo+Aigi

The next step is to note that UQ = (8)

L

=

so that

SNj = N - 1 . | { Z D J a j ( - i - ) + V ( ^ - ) E [ ( U ( i ) - ^ - ) | Z , p ] }

plus terms we expect to be of order 0(N _ 1 ). The first term of the approximation is a linear rank statistic. For the second we use a heuristic argument of Clayton and Cuzick who argue that if Yi = E(U (i) |Z,D) then yj satisfies approximately the recurrence relation, (9)

Cyi+i - Yi)-1 - Cyi - Yi-i)-1 = ( i - Z D ^ o C y ^ + z ^ C y i ) .

Let G/t) = (NA j r 1 EI(U i t}. Q 0 is a distribution function with jumps of size mf 1 at

such that ZD. = 0 while Qj

1

jumps ( N - m ) " at with ZD. = 1". Evidently y{ is a function of i « » j—j — , Z, Q 0 , Qj only. Interpolate smoothly in some way between and

1 < i < N to obtain a function v on (0,1) such that, Yi = v ( ^ ) .

Any solution of (9) must satisfy, for some c,d 7i = d + .E(c+EZ^oYoCy^ + Z^jYxiyk))-1 j, °rf0rU=

i

i-1 — u

(12)

1

v(u) = d + J ( ^ + jY 0 (v(s))ft 0 dQo(s)+Yi(v(s))ftidQ 1 (s))~ dt.

0 iN t This is essentially the integral equation of Bickel (1985), save that we make the transformation H(-) and apply (8). Unfortunately, the hopes for analytic approximation of solutions to (12) expressed in Bickel (1985) have so far not been realized. However, suppose we (still formally) extend the definition of (12) to functions v( -,Q,Q') by replacing Q 0 Qi by arbitrary Q,Q' such that, ft0Q(t) + ft!Q'(t) = t, for t = 0,-^-, • • • , 1 with c,d depending on Q,Q'. Then, if Q = G 0 , Q' = Gj, — = 1 and d = 0, v(u) = u formally satisfies the extension of (12) since by (7) YoftoEo + Yiftigi = 0. Therefore, if Â(u) = v f a . Q ^ Q ^ - u , v = v(-Q 0 Q!) u 1 A(u) = d + J { c + {[Y0(v(s))ft0dQ0(s) + Y1(v(s))it1dQ1(s)]}-1dt o t 1 1 — J{ 1 + J[Y 0 (s)MG 0 (s)+YICS^IDG^s)] }-'dt o t We determine the constants c(Q0,Q!), d(Qo,Qi) formally by smooth fit at the boundaries, (13) Let,

Â(0) = A(l) = 0.

(14)

ON ESTIMATING IN MODELS WITH INFINITELY MANY NUISANCE PARAMETERS A.W. van der Vaart, University of Leiden. For each (θ, z) let ψ(·,θ,z) be a density with respect to a σ-finite measure ν on a measurable space (X,B), and let H be a collection of probability measures on a measurable space (Z,A). Suppose that ψ(x,θ,z) is measurable as a function of (x,z) and set

(1) p(x,θ,η) = ∫ ψ(x,θ,z) dη(z).

A mixture model is defined by: X_1, X_2, ..., X_n are i.i.d. random elements; (θ,η) ∈ Θ×H is unknown; X_j has density p(·,θ,η) of type (1). Mixture models are sometimes called structural models as opposed to functional models. The latter type of model is described by: X_1, X_2, ... are independent random elements; (θ, z_1, z_2, ...) ∈ Θ×Z^∞ is unknown; X_j has density ψ(·,θ,z_j) w.r.t. ν on (X,B). In fact it is possible to embed the two models in a single and more general model: for each n = 1, 2, ...

(2) X_{n1}, X_{n2}, ..., X_{nn} are independent random elements; (θ, η_{n1}, ..., η_{nn}) ∈ Θ×H^n is unknown; X_{nj} has density p(·,θ,η_{nj}) w.r.t. ν on (X,B), where p(·,θ,η) takes the form (1).

If H contains the degenerate distributions δ_z, then the functional model is a submodel of (2), since p(·,θ,δ_z) = ψ(·,θ,z). Next suppose that there exist measurable functions

h(-,8):(X,B) -KR

• ( - , 0 ) : ( X , B ) -KRm and jgC • , 8 ,n) iD?" -KR such that (3)

e(•,8,z)= h(•,8) g(0. In many examples which have the structure (4), the score

function

for

efficient

8 (cf. Begun et al.(1983)) is given by

i(-,B,u)= *(-,e,n)- E 0 (£(X 11 ,8,ti) |8,1nj)-

While the estimator T_n is often optimal in the i.i.d. model, it is difficult to make the same statement for its performance in the model (4), due to the fact that it is unclear how to define an optimality concept in models with infinitely many nuisance parameters. However,

the estimator improves on other constructions in the literature. As an example consider the model (1)-(2) with Z = (0,∞) and p(x,y,θ,z) = z e^{-zx} θz e^{-θzy} 1{x>0, y>0}. In the functional form of this model we have a sequence of pairs (X_j, Y_j) of independent, exponentially distributed random variables with hazard rates z_j and θz_j respectively, and the problem is to estimate the ratio θ of the hazard rates. Set η̄_n = n^{-1} Σ_{j=1}^{n} η_{nj}. Under the condition that the sequence of measures {η̄_n} is tight in such a way that all limit points η have η(0,∞) = 1 (no mass should

escape to either zero or infinity), the construction sketched above can be made rigorous and gives an asymptotically normal estimator with variance determined by (8). This follows from general results which will appear in van der Vaart(1987). REFERENCES Begun, J.M., Hall, W.J., Huang, W.M. and Wellner J. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 2, 435-452. Vaart, A.W. van der (1986). Estimating a Real Parameter in a Class of Semi-Parametric Models, Report 86-9, Dep. Math., Univ. Leiden. Vaart, A.W van der (1987). Statistical Estimation in Models with Large Parameter Spaces. Thesis.

INFERENCE FOR STOCHASTIC PROCESSES (Session 2) Chairman: A.N. Shiryaev

SEMIMARTINGALE CONVERGENCE THEORY AND CONDITIONAL INFERENCE FOR STOCHASTIC PROCESSES Feigin, P.D. Technion - Israel Institute of Technology, Haifa, Israel. 1.

CONVERGENCE OF EXPERIMENTS

We commence by giving some brief and non-rigorous background to the decision theoretic approach to asymptotics for statistical experiments . Suppose we have a parameter space servables

X

respectively. 6

=6

and

0

and two experiments with ob-

Y

and families of measures P = {P„} and Q = ÍQ.}, 8 0 Consider procedures (statistics) pp = 6p(X) and p^ =

(Y) for each experiment.

For the loss function

p,6) we may

define the risk function as Rp(pp,0)

= f

l(PP>0)j£Tdpo

(1.1)

;

*" = ÍQQ}} and a sequence of corresponding procedures {p_J , we need to show that

C[(p,{He-,ee@})\Po}

(i,4)

where {H } is a likelihood process (i.e. E_ (H ) = 1, for all 9 6 0). 0 "o « This would then allow us to give asymptotic bounds to "best" (e.g. "minimax") procedures for {£>t}.

In most cases, by a tightness arguw

ment, the joint weak convergence of (p^,

i H follow from the

weak convergence of the second component, at least along subsequences. Thus the essential aspect of verifying the convergence of experiments is in showing the weak convergence of the likelihood ratio processes. Remark 1.

If we think of

then typically

Q^

and

QQ

t(or call it n) as a sample size index will separate as

t -»• °° . So, for the

usual asymptotics to fit into this framework we are required to localize the original experiment about a given value

0q

and define:

Pt = {PtK = Qte0+Hat(e0)'heH} with

Oj.

0

Ci.5)

at rate fast enough to prevent separation of

P^

and

PQ , and slow enough to make them asymptotically distinguishable. Remark 2.

Given the convergence of {P^} to

P

in the sense of

convergence of likelihood ratios, together with weak convergence of procedures {p1"} to of

p

in

be Zocatty of some 2.

p , we define the asymptotic risk of {p^} as that

P . (The loss function asymptotic. p^

and

suAki

if

I p*

is given and fixed). and

P

t

These will

are localized versions

Q t ; see Feigin (1986) for more details.

LOCALLY ASYMPTOTIC MIXED NORMAL EXPERIMENTS

When the (localized) likelihood ratio process converges weakly to that of a normal shift experiment with a random variance then we say that the corresponding experiments are £ o c j x l t y asymptotic, (LAMN).

mixed YionmaJL

This behaviour occurs for many families of non-ergodic pro-

cesses and we will illustrate the counting process case below.

The

LAMN conditions were discussed by Jeganathan (1982), Basawa and Scott

A3 (1983), and others. Given {(?*} as in Section 1, we proceed to formally define the LAMN property at converges to

6 0 e e . We let 0

and

h 6 H .

9t(k) = 9o If

G

+

where {a t = a^Oo)}

is some (finite) vector space

then {&(.} may be a (matrix) linear operator sequence. 1

P" =

= Qg

ck)}

We define

as the local experiment and |At(fe) =

log

^jj-

k 6 H } as the logarithm of the likelihood ratio process. Definition.

{ot} is LAMN at

0q

if there exists a sequence {at}

as above and sequences of random variables {U^.}, {cr2} satisfying: (i)A t (fe) + ^ [ ( t f , - A ) 2 - D ? ] (ii) t [( t ^ ~ ~ by PQ the extension of {Pjj} to V F .

We denote

44 A Taylor expansion in the scalar

0

case will demonstrate the

semimartingale approach to verifying (i)

an

d (ii)•

We denote by [M]

the "square bracket" process of a martingale M, and by vative of

f

with respect to

h , and evaluated at

f

the deri-

h = 0 .

Then

At{h) « fcAt(0) + (l/2)fc2At(0).

(2.4)

We may readily show, under regularity conditions, that A« = { A s t ; 0 < 6 < l }

C2.5)

and R* = A* + [A*] = {A a t + [A]st ;0 < s < l} Gt = {F st ;0 0 n.ß(sup{

w™ , 0 « / « l } > 5 ) -> 0,

7=0,1,2,

as «->oo uniformly in /Se®. III (Asymptotic regularity). The function (0) and its first two derivatives (1,=(9/3/J)f.(0) are continuous in /9e® uniformly in ie[0,1]; they are bounded on ®X[0,1] and (a,ß)

=

-

m'(ß) 59

> 0.

60 Define the Cox estimator fi„ for fi by the condition i i sup f lnT*"s(P)dK = (lnT*"s(j})dM"s, I n * " = c o / f l n * " , / = 1, . . . ,r„) o i

(2)

with r,

i=l Before characterising asymptotic properties of fi„ we give the following definitions: Definition 1. Let / / " ( a , f i ) = (H"(a, ft),F] be an r„-variate predictable process such that - -

2

'

f(H\S")TdM" J » , } =» ¿,7 = 1,2]) (3) 1 0 where AT = Mn(a,fi) = Nn-A"(a,fi), while for each iiefi1 and aeL2(i0>da) S"=S"(a,b) = col{bZm+a, i = l, . . . ,r„}. Therefore concerning the second component solely the above requirement is met under the Conditions I-III with the limiting variance e{K„

c22= j [b2 T

^Co,

(f) constants ^>0, X>0)C>0 exist such that for any 0 e K and n>0 hQCuO>C 1 Lt^ where ^ @ (u) = -in E@ {exp-"X ( < [_,/• + + S Z ^ + a L ^ - I - B a L +Z.(e^ P K- P K + 0 , K = ( 1 + L>= L ( M e m « " ( e ) - M © , M© ). Then: 1) the family P Q satisfies the uniform LAW condition; uniformly w.r.t.Qon any compactum, 2)

Gn= S

, 3) trr,

{ ^ ( e ) (0-©)W(o y 0

2°. Ito-Ventzel's Formula for Semimartingales. Let on some stochastic basis [1} , F P) a semimartingale ^ and a family of semimartingales p(-t,x)~ M (-t ,x) + ACt ,x), X&R 1 ,

65

, P^ 1) the mapping p

: /

x)

be given. Assume that:

is twice continuously

differentiable in the sense of the norm i! • I!- ( i f

$

ia a aemimartingale and S= M +A

"then || SilT = E^.idAilt E ^[M])

with the second derivative p x x

being continuous in x

for all -fc

and co , the processes

F*(t,0) , F (-t, O )

" ^ p I Fx* (t,>Ol = ^ (-t) , \ X I ^ V 0 1 2) £ C T b^-p I x) iiaAsl£is concave a n d l ^ F s . all ueCb(D),a11 (s,x) in R + x F d ,

for

70

n ( u ) = s u p { F ( P ) - í u d P ; PeProb(s,x,L)} s

s

J D

=sup{H.(P)-JudP ; P€Prob(s,x,L)) D

With the notations of Theorem I we have then the following COROLLARY. For all (s,x) in R+xRd, H (P, s

,)=sup{F (Q) ; QeProb(s,x,L)}=F (P,

(s,x)

s

s

,)=0,

(s,x)

Thus the procedure presented in Theorem 1 gives the maxima of both the entropy and the free energy together with the Markovian property. ACKNOWLEDGEMENTS. The author wishes to express his gratitude to the Academy of Sciences of the USSR. This research was partially supported by a UNDP-UNESCO grant and FONDECYT grant 1087/86.

REFERENCES. BOBADILLA,Gladys (1986a).Problemas de Martingalas sin condiciones de unicidad.Existencia y Aproximación de soluciones Markovianas Fuertes.Doctoral Thesis,Universidad Católica de Chile. (1986b).Une méthode de sélection de probabilités liarkoviennes.C.R.Acad.Sci.París,t.303,Série 1/4,147-150. TAKAHASHI,Yo1ch1ro(1984).Entropy Functional for Dynamical Systems and their Random Perturbat1ons.ln:Stochastic Analysis,Proceedings of theTaniguchi Int.Symp.Katata and Kyoto,K. I to (ed.),NorthHolland, Amsterdam-N.Y.-Oxford,437-467.

ASYMPTOTIC INFERENCE FOR THE GALTON-WATSON PROCESS WITHOUT RESTRICTION TO THE SUPER-CRITICAL CASE David J. Scott, Department of Statistics, La Trobe University, Bundoora, Australia.

Consider the Galton-Watson process {Z_n; n ≥ 0} with offspring distribution given by

p_θ(k) = pr(Z_n = k | Z_{n-1} = 1) = a_k θ^k / A(θ), k = 0, 1, 2, ..., for θ ∈ Θ.

Then μ(θ) = E(Z_1 | Z_0 = 1) = θ A'(θ)/A(θ) and σ²(θ) = E[(Z_1 − μ(θ))² | Z_0 = 1] = θ μ'(θ).



Set Y_n = Z_0 + Z_1 + ... + Z_n. The maximum likelihood estimator (MLE) of μ(θ) is μ(θ̂_n) = (Y_n − Y_0)/Y_{n-1}, and θ̂_n, the MLE of θ, is the solution of μ(θ) = (Y_n − Y_0)/Y_{n-1}.
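As a quick illustration of the estimator just defined, the following sketch simulates a Galton-Watson process and computes μ(θ̂_n) = (Y_n − Y_0)/Y_{n-1} together with the normal-approximation probability of supercriticality discussed below. It is only a minimal sketch under illustrative assumptions: the Poisson offspring law, the random seed and the plug-in variance estimate (for Poisson offspring the variance equals the mean) are choices made here, not taken from the paper.

```python
import numpy as np
from math import erf, sqrt

def simulate_gw(n_gens, mean_offspring, z0=1, rng=None):
    """Simulate Z_0, ..., Z_n for a Galton-Watson process with Poisson offspring."""
    rng = np.random.default_rng() if rng is None else rng
    z = [z0]
    for _ in range(n_gens):
        z.append(int(rng.poisson(mean_offspring, size=z[-1]).sum()) if z[-1] > 0 else 0)
    return np.array(z)

def offspring_mean_mle(z):
    """MLE of mu(theta): (Y_n - Y_0)/Y_{n-1}, i.e. total offspring over total parents."""
    y = np.cumsum(z)
    return (y[-1] - y[0]) / y[-2], y[-2]

z = simulate_gw(n_gens=25, mean_offspring=1.1, rng=np.random.default_rng(0))
mu_hat, parents = offspring_mean_mle(z)
# Normal-approximation scale for mu_hat; the Poisson plug-in variance (= mean) is illustrative.
a_n = sqrt(mu_hat / parents)
Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
print("mu_hat =", round(mu_hat, 3), " P(supercritical) ~", round(1 - Phi((1 - mu_hat) / a_n), 3))
```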



Let the prior distribution of θ be denoted by Π(θ) and the posterior distribution by Π(θ | Z_0, ..., Z_n), and define

Φ(x) = (2π)^{-1/2} ∫_{-∞}^{x} e^{-u²/2} du,  a_n = (σ²(θ̂_n)/Y_{n-1})^{1/2},

and

u_n(a,b) = {θ : μ(θ̂_n) + a·a_n < μ(θ) < μ(θ̂_n) + b·a_n}.

Heyde (1979) proved the following. Theorem. If θ̂_n → θ_0, Y_{n-1} → ∞ in p_{θ_0}-probability, and π(θ) is continuous at θ_0 and non-degenerate, then

∫_{u_n(a,b)} π(θ | Z_0, ..., Z_n) dθ → Φ(b) − Φ(a).

From this Theorem, for n and Y_{n-1} large, letting P_Π denote the probability assuming the prior Π,

P_Π(μ(θ) > 1) = ∫_{u_n([1−μ(θ̂_n)]/a_n, ∞)} π(θ | Z_0, ..., Z_n) dθ ≈ 1 − Φ([1 − μ(θ̂_n)]/a_n),

which allows calculation of the posterior probability that the process is supercritical. Heyde's result is notable in that, in contrast to the well-known results using a frequentist approach, no restriction to the supercritical case is required.

A corresponding result may be obtained using a frequentist approach, the essential change being that the asymptotics are as Y_{n-1} converges to infinity. Theorem. For the Galton-Watson process, if θ_0 is the true value of θ,

Pr(|θ̂_n − θ_0| > ε | Y_{n-1} > N) → 0 as N → ∞

and

sup_{x∈R} |Pr([μ(θ̂_n) − μ(θ_0)]/a_n ≤ x | Y_{n-1} > N) − Φ(x)| → 0 as N → ∞.

This result justifies an asymptotic test of H_0: μ(θ) ≤ 1 vs H_1: μ(θ) > 1, which is to reject H_0 if μ(θ̂_n) > 1 + a_n Φ^{-1}(1−α). The P-value of the observation μ(θ̂_n) for this test is

1 − Φ([μ(θ̂_n) − 1]/a_n) = Φ([1 − μ(θ̂_n)]/a_n),

which is the same as the posterior probability as calculated using Heyde's result. Heyde's result is a form of asymptotic posterior normality and it is of interest to compare it to the usual Bernstein-von Mises Theorem, which also gives asymptotic posterior normality. Consider observations

X_1, X_2, ..., X_n from a stochastic process with density p_n(x_1, x_2, ..., x_n | θ). Let θ̂_n be the MLE of θ and let ℓ_n(θ) = log p_n(X_1, ..., X_n | θ) be the log-likelihood. Suppose π(θ) is a continuous prior density for θ and π(θ | X_1, X_2, ..., X_n) is the posterior density. A fairly recent version of the Bernstein-von Mises Theorem was given by Heyde and Johnstone (1979). Theorem. Under regularity conditions

∫_{θ̂_n + a·a_n}^{θ̂_n + b·a_n} π(θ | X_1, ..., X_n) dθ → Φ(b) − Φ(a)

(where a_n = [−ℓ''_n(θ̂_n)]^{-1/2}) in P_{θ_0}-probability.

It is important to note that the regularity conditions involve convergence of various quantities in

P. -probability and the result 0 O also gives convergence in P„e -probability. The Bernstein-von Mises 0 Theorem can actually be viewed as being a composition of two statements. The first is analytic, and is that for certain sequences of observed values asymptotic posterior normality holds.

The second is that with

P 0 -probability approaching one, the observed values of the process are such that asymptotic posterior normality holds.

Then Heyde's

Theorem concerning asymptotic posterior normality of the Galton-Watson process consists of the analytic part of the Bernstein-von Mises Theorem only.

This suggests it can be obtained by stripping off the

probabilistic aspects of Heyde and Johnstone's proof of the Bernsteinvon Mises Theorem.

This does indeed work, producing a new result

from which Heyde's Theorem may be obtained.

Theorem. If θ̂_n → θ_0, Y_{n-1} → ∞ in p_{θ_0}-probability and π(θ) is continuous at θ_0, then

∫_{θ̂_n + a·a_n}^{θ̂_n + b·a_n} π(θ | Z_0, ..., Z_n) dθ → Φ(b) − Φ(a),

where a_n = θ̂_n/(Y_{n-1} σ²(θ̂_n))^{1/2}. The connection between this result and Heyde's Theorem is quite simple. The theorem above states that asymptotically θ ~ N(θ̂_n, a_n²). Thus asymptotically μ(θ) ~ N(μ(θ̂_n), [μ'(θ̂_n) a_n]²),

which is Heyde's result. References Heyde, C.C. (1979). On assessing the potential severity of an outbreak of a rare infectious disease: a Bayesian approach. Austr. J. Statist. 21, 282-292. Heyde, C.C. and Johnstone, I.M. (1979). On asymptotic posterior normality for stochastic processes. J. Roy. Statist. Soc. B, 41, 184-189.

CROSS-VALIDATION (Session 3) Chairman: D.V. Hinkley

ON RESAMPLING METHODS FOR CONFIDENCE LIMITS David V. Hinkley Center for Statistical Sciences and Department of Mathematics The University of Texas at Austin SUMMARY Some recent research on bootstrap resampling methods is reviewed. Topics include: Monte Carlo and theoretical approximation as efficient alternatives to naive simulation; construction of approximate pivots; inversion of bootstrap tests; and conditional bootstraps. The majority of the discussion is addressed to statistics based on homogeneous random samples. Key words and phrases: ancillary statistic, bootstrap, double bootstrap, likelihood, Monte Carlo, pivot, saddlepoint method. 1.

INTRODUCTION This is a review of some bootstrap techniques associated with confidence

limit calculations. The objective is to be reasonably comprehensive, and to introduce some topics of current research interest. Our starting point is a summary and illustration of the basic bootstrap method for homogeneous, independent data; see Efron (1982). Suppose that x = (x_1, ..., x_n) is a random sample of fixed size n from an infinite population for which Pr(X ≤ x) = F(x) is the cumulative distribution function of a randomly sampled datum. The population quantity θ, which is a differentiable functional t(F), is of interest. It is assumed that θ is estimated by T = T(x_1, ..., x_n) = t(F̂), where F̂ is the empirical distribution function, defined by nF̂(x) = card{i : x_i ≤ x}, and the functional t(·) is assumed regular. For the purpose of calculating confidence limits, distributions of quantities D such as D = T − θ are required. Because F will be unknown, although possibly belonging to a known family indexed by θ and nuisance parameters, it will be usual to estimate this distribution by F̂, say, and thence to estimate


the distribution of T − θ. If the latter step is not amenable to theoretical calculation, then we can approximate the distribution by a Monte Carlo simulation method. To be specific, consider D = T − θ and its distribution function G(d) = G(d, F) = Pr(T − θ ≤ d | F), which is to be estimated by

(1) Ĝ(d) = G(d, F̂) = Pr(T − θ ≤ d | F̂).

The simplest Monte Carlo technique for approximating Ĝ is as follows:

Step 1°. Use a Monte Carlo simulation method to generate a random sample x* = (x*_1, ..., x*_n) from F̂.

Step 2°. Calculate T* = T(x*_1, ..., x*_n) and thence the simulated value T* − T, which is to F̂ what T − θ is to F.

Step 3°. Perform Steps 1° and 2° a total of B times and approximate Ĝ(d) by

(2) Ĝ_B(d) = B^{-1} freq{T* − T ≤ d}.
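A minimal sketch of Steps 1°-3° in the nonparametric case, where sampling from F̂ means resampling the data with replacement; the percentile limits T − d_{1−α} and T − d_α described in the next paragraph are read off the ordered T* − T values. The data vector below is invented for illustration (it is not the sample of Table 1, which is not reproduced here).

```python
import numpy as np

def bootstrap_limits(x, stat=np.mean, B=999, alpha=0.10, rng=None):
    """Steps 1-3: resample from the empirical distribution, tabulate T* - T,
    and read off confidence limits for theta as T - d_{1-alpha} and T - d_alpha."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    t = stat(x)
    d = np.sort([stat(rng.choice(x, size=x.size, replace=True)) - t for _ in range(B)])
    # G_B^{-1}(p) taken as the [(B+1)p]-th ordered value of T* - T
    lower = t - d[int((B + 1) * (1 - alpha)) - 1]
    upper = t - d[int((B + 1) * alpha) - 1]
    return lower, upper

# Illustrative data only (not the sample from Table 1)
x = np.array([11.2, 14.6, 25.0, 16.1, 19.3, 9.8, 22.4, 17.5, 21.0, 13.9])
print(bootstrap_limits(x, B=999, alpha=0.10))
```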

Superscript * will always denote a random variable generated from F̂. When the empirical distribution function F̂ is used in Step 1°, x* is obtained by uniform random sampling with replacement from x - hence the name "resampling method." While this nonparametric case is of most interest, many theoretical aspects of bootstrap methods are most easily discussed in the parametric case where F belongs to a known family. The estimated distribution Ĝ yields confidence limits in the usual way. Thus if d_p = Ĝ^{-1}(p), then the lower 1 − α confidence limit for θ is T − d_{1−α} and the upper 1 − α limit is T − d_α. If the Monte Carlo approximation (2) is used, then Ĝ_B^{-1}(p) is taken to be the [(B+1)p]th ordered value of T* − T. Example 1. Suppose that the first row of Table 1 is a random sample from a population whose mean is θ = ∫ x dF(x)

= μ, and that we estimate μ by T = x̄ = n^{-1} Σ x_i, whose value is 17.87. We wish to estimate the distribution of x̄ − μ and hence obtain an upper 90% confidence limit for μ. (Superior alternatives to use of x̄ − μ will be discussed later.) If F is assumed to be a normal distribution, then we estimate F by the N(T, σ̂²) distribution with σ̂² = n^{-1} Σ (x_i − x̄)², and thence calculate Ĝ theoretically to be Ĝ(d) = Φ(√n d/σ̂). Because σ̂² = 46.53, Ĝ(d) = Φ(0.46 d) for these data. Then the 90% upper confidence limit for μ is T − Ĝ^{-1}(0.10) = 17.87 − (0.46)^{-1} Φ^{-1}(0.10) = 20.63.
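A quick arithmetic check of this normal-theory limit; the only inputs are the values quoted above, and the small discrepancy from 20.63 presumably reflects the rounding of √n/σ̂ to 0.46.

```python
from statistics import NormalDist

t, scale = 17.87, 0.46                    # T = x-bar and the rounded sqrt(n)/sigma-hat
upper = t - NormalDist().inv_cdf(0.10) / scale
print(round(upper, 2))                    # about 20.66 with the rounded 0.46
```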


If nothing is assumed about F and we take F = F, then the estimate (1), now written G(d) = Pr(x* — x \ F), will be approximated by (2) using the simulation resampling procedure. A very small application with 5 = 9 is illustrated in Table 1, wherein each sample x* is given in the equivalent form of frequencies /* for data values x;. We approximate such that m n

1

-*• 0.

Let a and b be such that m,n m,n mn

where

T

a (x)

n

1

Xra (X ) - 1. j=l m,n, ' = x i f l < a < 2

m (6) Let H (x, oj) = P(( I m,n

and

nX. - b m n I Ta( 3 ' )= 0 j=l m,n

and x(x) of (2) for 0 < a < 1.

Y. - b ) a~ ] m,n m,n



< x |x") 1

98

Theorem 3 : Let a < 2 and m, n •+• " such that mn ^

0.

Let

H (.,.) be as in (6) . Then m,n sup|H

x

m.n

(X,OI) - G (x) | 5 0

a

where G (.) is the distribution function of a stable law of order a a whose characteristic function i(i(t) is given by (t) = exp (/(eltX-l-it xa(x))Aa(dx) where A (.) is as in (3). m -1 Notice that (,I,Y, - b ) a is the bootstrap version of the staj=l j m,n m,n

tistics (Σ_{j=1}^{n} X_j − L_n) a_n^{-1}, where L_n and a_n are as in case (i). This indicates that the bootstrap method works when α < 2 provided the resample size m is small compared to the original sample size n.
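A rough sketch of the point just made: for heavy-tailed data (α < 2) one resamples m ≪ n observations rather than n. The Pareto tail index, the choice m = √n and the simple centring by the sample mean are illustrative assumptions; the paper's exact centring and scaling constants b_{m,n} and a_{m,n} are not reproduced here.

```python
import numpy as np

def moon_bootstrap_sums(x, m, B=2000, rng=None):
    """Resamples of size m (m << n) from the data: returns the B centred
    resample sums sum(X*_j) - m * mean(x)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    return np.array([rng.choice(x, size=m, replace=True).sum() - m * x.mean()
                     for _ in range(B)])

rng = np.random.default_rng(1)
n = 10_000
x = rng.pareto(1.5, size=n) + 1.0      # heavy tail, alpha = 1.5 < 2 (infinite variance)
m = int(np.sqrt(n))                    # resample size with m/n -> 0
reps = moon_bootstrap_sums(x, m, rng=rng)
print(np.percentile(reps, [5, 50, 95]))
```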

Details of the proof of the results of this paper may be found in [1, 2].

REFERENCES: 1. Athreya, K.B. (1986) Bootstrap of the mean in the infinite variance Case-I and II, Technical Reports 86-21, 86-22 of the Department of Statistics, Iowa State University, Ames, Iowa, 50011. 2. Athreya, K.B. (1986) Bootstrap of the mean in the infinite variance case To appear in the Annals of Statistics (1987). 3. Bickel, P.J. and Freedman, D. (1981). Some asymptotic theory for the bootstrap. Annals of Statistics, 9 1196-1217. _ 4. Efron, B. (1979) Bootstrap methods - another look at the Jack knife. Annals of Statistics, 7, 1-26. 5. Feller, W (1971) An introduction to Probability Theory and Applications. John Wiley, N.Y. 6. Singh, K (1981) On the asymptotic efficiency of Efrfln's bootstrap, Annals of Statistics, 9, 1187-1195.

AUTOMATIC CURVE SMOOTHING

Wolfgang Härdle Institut Wirtschaftstheorie II Universität Bonn Adenauerallee 24-26 D-5300 Bonn, Federal Republic of Germany

1. INTRODUCTION Regression smoothing is a method for estimating the mean function from observations (x_1,Y_1), ..., (x_n,Y_n) of the form Y_i = m(x_i) + ε_i,

i = 1, ..., n,

where the observation errors are independent, identically distributed, mean zero random variables. There are a number of approaches for estimating the regression function m. Here we discuss nonparametric smoothing procedures, which are closely related to local averaging, i.e. to estimate m(x), average the Y_i's which are in some neighborhood of x. The width of this neighborhood, commonly called bandwidth or smoothing parameter, controls the smoothness of the curve estimate. Under weak conditions (bandwidth shrinks to zero not too rapidly as n increases) the curve smoothers consistently estimate the regression function m. In practice, however, one has to select a smoothing parameter in some way. A too small bandwidth, resulting in high variance, is not acceptable, and so is oversmoothing, which creates a large bias. It is therefore highly desirable to have some automatic curve smoothing procedure.
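As a small illustration of the local-averaging idea just described, here is a sketch of a kernel smoother of the form studied in the next section, evaluated at a few bandwidths to show under- and over-smoothing. The quartic kernel matches the one used in the paper's simulated example, but the regression function (λ = 1 in sin(λ2πx)), the evaluation grid and the bandwidth values are illustrative choices.

```python
import numpy as np

def kernel_smoother(x_grid, x, y, h):
    """Kernel regression smoother m_h(x) = (nh)^{-1} sum_i K((x - x_i)/h) Y_i
    for an equally spaced design x_i = i/n."""
    n = len(x)
    u = (x_grid[:, None] - x[None, :]) / h
    K = (15 / 8) * (1 - 4 * u ** 2) ** 2 * (np.abs(u) <= 0.5)  # quartic kernel on [-1/2, 1/2]
    return K @ y / (n * h)

rng = np.random.default_rng(0)
n = 75
x = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.05, n)
grid = np.linspace(0.05, 0.95, 19)
for h in (0.02, 0.10, 0.40):          # under-, moderately, and over-smoothed fits
    fit = kernel_smoother(grid, x, y, h)
    print(h, np.round(fit[:3], 2))
```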


Proposed methods for choosing the window size automatically are based on estimates of the prediction error or adjustments of the residual sum of squares. It has been shown by Härdle, Hall and Marron (1986) (HHM) that all these proposals are asymptotically equivalent but can be quite different in a practical situation. In this paper we highlight these difficulties with automatic curve smoothing and construct situations where some of the proposals seem to be preferable.

2. AUTOMATIC CURVE SMOOTHING

To simplify the presentation, assume the design points are equally spaced, i.e. x_i = i/n, and assume that the errors have equal variance, E ε_i² = σ². We study kernel smoothers

m̂_h(x) = n^{-1} h^{-1} Σ_{i=1}^{n} K((x − x_i)/h) Y_i,

where h is the bandwidth and K is a symmetric kernel function. It is certainly desirable to tailor the automatic curve smoothing so that the resulting regression estimate is close to the true curve. Most automatic bandwidth procedures are designed to optimize the averaged squared error

(ASE) d_A(h) = n^{-1} Σ_{i=1}^{n} [m̂_h(x_i) − m(x_i)]² w(x_i),

where w is some weight function. These automatic bandwidth selectors are defined by multiplying p(h) = n^{-1} Σ_{i=1}^{n} (Y_i − m̂_h(x_i))² w(x_i) by a correction factor Ξ(n^{-1}h^{-1}). The examples we treat here are:

Generalized Cross-Validation (Craven and Wahba 1979), Ξ_GCV(n^{-1}h^{-1}) = (1 − n^{-1}h^{-1}K(0))^{-2}.

Akaike's Information Criterion (Akaike 1970), Ξ_AIC(n^{-1}h^{-1}) = exp(2 n^{-1}h^{-1}K(0)).

Finite Prediction Error (Akaike 1974), Ξ_FPE(n^{-1}h^{-1}) = (1 + n^{-1}h^{-1}K(0))/(1 − n^{-1}h^{-1}K(0)).

A model selector of Shibata (1981), Ξ_S(n^{-1}h^{-1}) = 1 + 2 n^{-1}h^{-1}K(0).

The bandwidth selector T of Rice (1984), Ξ_T(n^{-1}h^{-1}) = (1 − 2 n^{-1}h^{-1}K(0))^{-1}.

Let ĥ denote the bandwidth that minimizes (p·Ξ)(h). The automatic curve smoother is defined as m̂_ĥ(x). This automatic curve smoothing procedure is asymptotically optimal for the above Ξ in the sense that d_A(ĥ)/d_A(ĥ_0) → 1,

where ĥ_0 denotes the minimizer of d_A. The relative differences are quantified in the following Theorem. Let ĥ_0 ~ n^{-1/5}; then n^{3/10}(ĥ − ĥ_0) → N(0, σ²) and n[d_A(ĥ) − d_A(ĥ_0)] → C·χ_1²

in distribution, where a 2 and C are defined in HHM. A very remarkable feature of this result is that the 2 constants a and C are ¿nd.zpe.nde.nt ofi E. In a simulated example we generated 100 samples of size n=75 with a=0.05 and m(x) = sin(X2irx). The kernel function was taken to be K (x) = (1 5/8) (1-4x2) 2 I (| x| ,(E(1),... ,E(K)), where E(k) = s

(k). If Z is an ordered set, this corresponds to a

preference or hierarchical classification. The fuzzy classification problem corresponds to Z=(z^,...»z^) where Zjj. are real numbers such that z ^ s O , £ z^=1, or, formally, in this case Z is a simplex whose vertices correspond to the numbers of classes and the coordinates of points to the probabilities that an object belongs to a class.Obviously, the availability of a priori information concerning

1 14

the type of an AC problem and the admissible classifications leads to constraints which pinpoint a set S(E) in the set of all mappings. For example, if there is a training (verified) subsample E₁(k) in E, then one obtains the conditions s(x) = k for x ∈ E₁(k). Thus, to specify S, one needs to formalize the problem, i.e. to set Z and to state the conditions which single out S in the set of all the mappings E → Z.

L(E) is the set of the descriptions of E in the framework of a given algorithm. This component relates to the choice of the classification means and corresponds to the representativeness subspace, which is understood in a broader sense than in Diday et al. (1979). The set L is regarded as a subset of the set of all mappings Z → Y, where Y is the set of the values which represent the classification results. For example, if E ⊂ R^p and Z = {1, ..., k} and if every class is supposed to be described by the sample mean, then Y = R^p and L(E) = {Z → Y} = R^p × ... × R^p. If a class is described by a standard, and if the standard of the k-th class is known to be in a δ_k-neighbourhood of the representative y_k (δ_k may be zero for some k), then L(E) = {(y'₁, ..., y'_k) : y'_j ∈ R^p, ||y'_j − y_j|| ≤ δ_j}. If the classes are described by standard sets, then the corresponding part of the set of all subsets of E is taken as Y. The set Y can be of a completely different nature than the space of the characters to be measured. Moreover, Y itself can consist of spaces which differ in nature, as required by the standards of different classes. Note that in this way we can incorporate many well-known algorithms which cannot be described by the above-mentioned nuées dynamiques method (for example, FOREL, see Aivazyan et al. (1974)) and construct some new algorithms.

R(E) is the set of certain finite subsets of E, the so-called 'portions', into which E is subdivided for classification. This component is introduced in order that one can treat algorithms both in parallel (R(E) consists


of a single element symbolizing the entire set E) and in series (for example, when objects are classified in a one-at-a-time manner, R(E) = E). Note that although Diday et al. (1979) describes only parallel procedures, the nuées dynamiques method for serial procedures was developed in Diday (1975).

K is an operator from S × L × R into S, called a classifier, since it shows how to apply the available means Y to the AC problem of type Z in order to pass from the state s_n of the sample E with the description l_n, given the current portion r_{n+1}, ...

... 0), then some basic statistical assumptions of the classical asymptotic analysis (which are valid for m → ∞ and p = const) are violated.


In situations like this it is required that the statistical properties of the employed rules and procedures be analyzed under the conditions of the above-mentioned asymptotics (often called the Kolmogorov asymptotics). Specifically, the paper of Tsibel' (1987) is devoted to these problems. In the theory and practice of automatic classification it is important that the dimension l of a mathematical model is chosen correctly depending on the sample size m. The formulation and solution of such problems can be found in Enyukov (1986).

2.4 The methods of constructing partitions stable to variation in the controlled free parameters of an AC algorithm

The idea of multiple solution (by different methods) of the same problem and subsequent selection of the most frequent variants has long been used in statistics. This idea underlies the approach (see Aivazyan et al. (1983) and Aivazyan (1980)) to developing statistical methods which enable us to obtain inference stable to variation in the initial conditions concerning the nature and accuracy of the data. Specifically, it is suggested that an AC problem (in its optimization statement) should be solved repeatedly for different objective functions, for example, for a parametric family of objectives. As a result we obtain a set of partitions into classes: every objective is associated with its best method and, conversely, every best method corresponds to its partition. In the set obtained one has to select one or several partitions which are relatively stable to changing the objectives. Obviously, a change in the values of the controlled free parameters of an AC algorithm, in particular the parameters which determine the specific form of an objective, is equivalent to varying the initial conditions concerning the nature and accuracy of the data under classification.
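As a purely illustrative sketch of this idea (it is not the procedure of Aivazyan et al.), the following code solves the same classification problem under a parametric family of objectives — k-means applied to differently weighted features — and scores each resulting partition by its average agreement with the others; the weight family and the agreement index are assumptions.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def partitions_over_objectives(X, n_classes, weights):
    """Solve the same AC problem for a family of objectives: each weight vector
    rescales the features and thereby changes the k-means objective."""
    labelings = []
    for w in weights:
        km = KMeans(n_clusters=n_classes, n_init=10, random_state=0)
        labelings.append(km.fit_predict(X * w))
    return labelings

def stability_scores(labelings):
    """Average pairwise agreement (adjusted Rand index) of each partition with the others."""
    k = len(labelings)
    scores = np.zeros(k)
    for i, j in combinations(range(k), 2):
        a = adjusted_rand_score(labelings[i], labelings[j])
        scores[i] += a
        scores[j] += a
    return scores / (k - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(m, 0.5, size=(50, 3)) for m in (0.0, 2.0, 4.0)])
    weights = [np.array([1.0, 1.0, 1.0]), np.array([2.0, 1.0, 0.5]), np.array([0.5, 1.0, 2.0])]
    labelings = partitions_over_objectives(X, n_classes=3, weights=weights)
    print(stability_scores(labelings))  # partitions with high scores are stable to the choice of objective
```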

2.5 The employment of training elements in the choice of an appropriate metric for AC problems

The definition of the distance between the objects (or the groups of objects) to be classified is a 'bottleneck' of AC theory. Usually, a priori information on the probabilistic and geometrical nature of the multivariate observations is lacking. In such a case the successful choice of a metric depends on the statistician's skill in formalizing the professional knowledge and intuition of the expert in the area where the AC problem arises. An interesting approach to using training elements for the 'adjustment' of an appropriate metric is suggested, for example, in Diday and Moreau (1984).

2.6 Estimation of the number of classes in AC problems

The problems related to estimating an integer parameter are traditionally difficult in mathematical statistics (for example, estimating the number of factors in factor analysis, the number of basis functions in regression analysis, etc.). In automatic classification the problem of estimating the unknown number of classes can be stated (in probabilistic terms) as a problem of determining the number of modes of a multivariate density (in the nonparametric formulation) or the number of components in the mixture of distributions characterizing the multivariate observations to be classified (in the parametric formulation). Some interesting results, both asymptotic (the number of observations grows infinitely) and nonasymptotic, have been obtained in the latter case (see Orlov (1983), Tsibel' (1987)).
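For the parametric (mixture) formulation, one standard finite-sample device — given here only as a hedged illustration, not as the method of Orlov or Tsibel' — is to fit mixtures with an increasing number of components and compare an information criterion:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_number_of_classes(X, k_max=8, random_state=0):
    """Fit Gaussian mixtures with k = 1..k_max components and return the k minimizing BIC."""
    bics = []
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, covariance_type="full",
                             random_state=random_state).fit(X)
        bics.append(gm.bic(X))
    return int(np.argmin(bics)) + 1, bics

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # three well-separated groups; the criterion should recover k = 3
    X = np.vstack([rng.normal(c, 0.4, size=(100, 2)) for c in (-2.0, 0.0, 2.0)])
    k_hat, _ = estimate_number_of_classes(X)
    print("estimated number of classes:", k_hat)
```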


REFERENCES
Aivazyan, S.A. (1979). Extremal formulation of the basic problems of applied statistics. In: National School on Applied Multivariate Statistical Analysis: Algorithms and Software. Computer Centre of the Planning Committee of the Armenian SSR, Erevan, pp. 24-49.
Aivazyan, S.A. (1980). Statistique mathématique appliquée et problème de la stabilité des inférences statistiques. In: Data Analysis and Informatics. North-Holland Publ. Comp.
Aivazyan, S.A., Bezhaeva, Z.I., and Staroverov, O.V. (1974). Classification of Multivariate Observations. Statistika, Moscow.
Aivazyan, S.A., and Bukhshtaber, V.M. (1985). Data analysis, applied statistics, and constructing a general theory of automatic classification. In: Diday et al. (1979) (Russian translation), pp. 5-22.
Aivazyan, S.A., Enyukov, I.S., and Meshalkin, L.D. (1983). Applied Statistics: Introduction to Modelling and Primary Data Processing. Finansy i Statistika, Moscow.
Bukhshtaber, V.M., and Maslov, V.K. (1977). Factor analysis and extremum problems on Grassmann varieties. In: Mathematical Methods of Solving Economic Problems, N 7, pp. 87-102.
Bukhshtaber, V.M., and Maslov, V.K. (1980). The problems of applied statistics as extremum problems on irregular domains. In: Algorithms and Software of Applied Statistical Analysis. Uchenye Zapiski po Statistike, v. 36, Nauka, Moscow, pp. 381-395.
Bukhshtaber, V.M., and Maslov, V.K. (1985). Tomography methods of multivariate data analysis. In: Statistics. Probability. Economics. Nauka, Moscow, pp. 108-116.
Demonchaux, E., Quinqueton, J., and Ralambondrainy, H. (1985). CLAVECIN: Un système expert en analyse de données. Rapports de Recherche, N 431. Institut National de Recherche en Informatique et en Automatique, Le Chesnay.
Diday, E., et al. (1979). Optimisation en classification automatique. Institut National de Recherche en Informatique et en Automatique, Le Chesnay.
Diday, E., and Moreau, J.V. (1984). Learning Hierarchical Clustering from Examples. Rapports de Recherche, N 289. Institut National de Recherche en Informatique et en Automatique, Le Chesnay.
Enyukov, I.S. (1986). Methods, Algorithms, and Programs of Multivariate Statistical Analysis. Finansy i Statistika, Moscow.
Enyukov, I.S. (1986). Projection pursuit in reconnaissance data analysis. Reviews of the First World Congress of the Bernoulli Society for Mathematical Statistics and Probability, Tashkent.
Friedman, J.H., and Tukey, J.W. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, C-23, pp. 881-890.
Girko, V.L. (1985). 'Struggle against dimension' in multivariate statistical analysis. In: Application of Multivariate Statistical Analysis in Economics and Quality of Product. Tartu, pp. 43-52.
Hahn, G.J. (1985). The American Statistician, v. 39, N 1, pp. 1-16.
Huber, P.J. (1985). Projection pursuit. The Annals of Statistics, v. 13, N 2, pp. 435-475.
McQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symp. Math. Stat. and Probab., v. 1, pp. 281-297.
Orlov, A.I. (1983). Some probabilistic aspects of classification theory. In: Applied Statistics. Nauka, Moscow, pp. 166-179.
Sebestyen, G.S. (1962). Decision Making Processes in Pattern Recognition. The Macmillan Company.
Shlezinger, M.I. (1965). On spontaneous pattern recognition. In: Reading Automata. Naukova Dumka, Kiev, pp. 88-106.
Tsibel', N.A. (1987). Statistical investigation into the properties of the estimates of multivariate analysis model dimension. Ekonomika i Matematicheskie Metody, the USSR Acad. of Sci. (in print).
Vapnik, V.N. (1979). Restoration of Dependences from Empirical Data. Nauka, Moscow.

DATA ANALYSIS : GEOMETRIC AND ALGEBRAIC STRUCTURES Fichet B. Laboratoire de Biomathématiques - Faculté de Médecine Université d'Aix - Marseille II.

Here we present a survey of mathematical structures which arise in data analysis. After recalling the fundamental triple of data analysis, we investigate the usual representations of this triple: Euclidean embeddings, Lp-embeddings, hierarchies, pyramidal representations, additive trees, star graphs... Each representation corresponds to special dissimilarities, and the set of these dissimilarities is shown to be a cone in a finite dimensional vector space. We examine the respective inclusions of the cones, as well as their geometric nature, especially convexity and closure. Then approximation problems may be studied: least squares approximation with respect to a given norm, subdominant (or submaximal) and superdominated (or superminimal) approximation, additive constants... For all these mathematical aspects, many problems remain unsolved and some conjectures have been made. Finally, we pay particular attention to monotone transformations of the data. They play an important role in data analysis and we will present their impact on the afore-mentioned representations.


1 - DATA STRUCTURES
The main basic concept in data analysis is dissimilarity. A dissimilarity d on a finite nonempty set I is a nonnegative real function defined on I² such that: ∀i ∈ I, d(i,i) = 0; ∀(i,j) ∈ I², d(i,j) = d(j,i). The finite dimensional vector space of real functions which satisfy the same conditions will be denoted by D, and a dissimilarity is an element of the positive orthant D₊. In practice, I represents individuals, quantitative variables, categories of a qualitative variable, logical or presence-absence characters... Let us assign a mass m_i (i.e. a positive number) to each unit i in I. These masses arise essentially in view of approximation problems. The fundamental triple (I, d, {m_i, i ∈ I}) will be called a data structure.
Here we recall some common data structures. For a set I of individuals and a set J of quantitative variables, let x_i^j be the observation of the variable j on the individual i. Denoting by σ_j and ρ_{jj'} respectively the standard deviation of the variable j and the correlation coefficient of the variables j and j', and supposing σ_j > 0 for every j, we generally consider the data structures (I, d_I, {m_i, i ∈ I}) and (J, d_J, {m_j, j ∈ J}) such that:
  ∀(i,i') ∈ I²,  d_I²(i,i') = Σ_{j∈J} [(x_i^j − x_{i'}^j)/σ_j]²,   with ∀i ∈ I, m_i = 1/|I|;
  ∀(j,j') ∈ J²,  (1/2) d_J²(j,j') = 1 − ρ_{jj'},   with ∀j ∈ J, m_j = 1.

Let I and J be two qualitative variables and let {f_{ij}, (i,j) ∈ I×J} be a frequency table (derived from a contingency table). We use the following notations: ∀i ∈ I, f_{i·} = Σ_j f_{ij}; ∀j ∈ J, f_{·j} = Σ_i f_{ij}. Supposing that the previous quantities are all strictly positive, we generally consider on I the data structure (I, d_I, {f_{i·}, i ∈ I}), where:
  ∀(i,i') ∈ I²,  d_I²(i,i') = Σ_{j∈J} (1/f_{·j}) (f_{ij}/f_{i·} − f_{i'j}/f_{i'·})²   (χ² metric),
and a symmetrical data structure (J, d_J, {f_{·j}, j ∈ J}) on J.

For a set I of individuals and a set J of presence-absence characters, let x_i^j be the observation of the character j on the individual i (x_i^j equals 1 (or 0) for presence (or absence) of j). We use the following notations: ∀(i,i') ∈ I², m_{ii'} = Σ_j x_i^j x_{i'}^j; ∀i ∈ I, m_i = Σ_j x_i^j. Then we may consider on I the data structure (I, d_I, {m_i, i ∈ I}), where:
  ∀(i,i') ∈ I²,  (1/2) d_I²(i,i') = 1 − m_{ii'}/√(m_i m_{i'})   (Ochiai's metric),
and a symmetrical data structure (J, d_J, {m_j, j ∈ J}) on J.
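The data structures above are easy to compute from raw tables; the following sketch (the formulas follow the displays above, the function names are mine) may make the definitions concrete.

```python
import numpy as np

def d_individuals(X):
    """Standardized Euclidean dissimilarity between rows of a quantitative table X (n x p)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # requires every standard deviation > 0
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(sq)

def d_chi2(F):
    """Chi-square metric between the row profiles of a frequency table F (all margins > 0)."""
    F = F / F.sum()
    fi, fj = F.sum(axis=1), F.sum(axis=0)             # margins f_i. and f_.j
    profiles = F / fi[:, None]
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / fj[None, None, :]).sum(axis=2))

def d_ochiai(B):
    """Ochiai dissimilarity between rows of a presence-absence (0/1) table B."""
    m = B @ B.T                                       # m_ii' = sum_j x_i^j x_i'^j
    diag = np.diag(m).astype(float)
    sim = m / np.sqrt(np.outer(diag, diag))
    return np.sqrt(2.0 * (1.0 - sim))                 # from (1/2) d^2 = 1 - m_ii'/sqrt(m_i m_i')

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(6, 4))
    F = rng.integers(1, 10, size=(5, 4)).astype(float)
    B = rng.integers(0, 2, size=(6, 5))
    B[:, 0] = 1                                       # ensure no empty row, so every m_i > 0
    print(d_individuals(X).round(2))
    print(d_chi2(F).round(2))
    print(d_ochiai(B).round(2))
```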

A dissimilarity d on I is said to be:
proper iff d(i,j) = 0 implies i = j;
semi-proper iff d(i,j) = 0 implies ∀k ∈ I, d(i,k) = d(j,k).
If d is semi-proper, an equivalence is introduced as follows: i ~ j iff d(i,j) = 0. Then a proper dissimilarity d̄ may be constructed on the quotient space Ī; we have d̄(ī, j̄) = d(i,j), where i (resp. j) is in ī (resp. j̄). Moreover, if masses are assigned to the units i in I, we put: ∀ī ∈ Ī, m_ī = Σ{m_i : i ∈ ī}. Then (Ī, d̄, {m_ī, ī ∈ Ī}) is called the induced quotient data structure. It is usual practice to aggregate units which are equivalent. In a mathematical sense, a property of d which is preserved on the quotient space has only to be proved when d is proper.

2 - REPRESENTATIONS IN DATA ANALYSIS
Given a data structure, different graphical representations may be proposed. Each of them corresponds to a particular dissimilarity. Here we recall some usual representations and their associated dissimilarities.

- Lp-spaces
A dissimilarity d on I is said to be L_p iff there exist an integer N and a family of real numbers {x_i^j, j = 1, ..., N; i ∈ I} satisfying:
  ∀(i,i') ∈ I²,  d^p(i,i') = Σ_{j=1}^N |x_i^j − x_{i'}^j|^p.


It admits an embedding in an L_p-space, and the set of such dissimilarities will also be denoted by L_p. Two particular cases are noteworthy: Euclidean dissimilarities (i.e. dissimilarities in L₂), which yield a Euclidean representation, and the city-block semi-distances (i.e. dissimilarities in L₁), which yield an L₁-representation.

(Figures: a Euclidean representation and an L₁-representation.)

Mathematically, it is useful to consider the sets L_p^(p), obtained from L_p by the transformation d ↦ d^p. In particular, L₂^(2) is the set of squared Euclidean dissimilarities. Moreover, every finite metric space is shown to be embedded in an L_∞-space. According to this property, the set of semi-distances on I will also be denoted by L_∞.
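Membership of L₂ can be tested, under the classical Torgerson construction, by double-centering the squared dissimilarities and checking positive semi-definiteness (Schoenberg's criterion). The sketch below is only an illustration with an assumed numerical tolerance, not a result of this survey.

```python
import numpy as np

def is_euclidean(D, tol=1e-9):
    """Schoenberg/Torgerson test: d is in L2 iff B = -1/2 * J D^2 J is positive semi-definite,
    where J = I - (1/n) 1 1' is the centering operator."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals = np.linalg.eigvalsh(B)
    return bool(eigvals.min() >= -tol * max(1.0, eigvals.max()))

if __name__ == "__main__":
    # pairwise distances of points on a line are Euclidean ...
    x = np.array([0.0, 1.0, 3.0, 7.0])
    D = np.abs(x[:, None] - x[None, :])
    print(is_euclidean(D))                 # True
    # ... while this dissimilarity is not (the triangle inequality already fails)
    D_bad = np.array([[0.0, 1.0, 1.0, 2.5],
                      [1.0, 0.0, 1.0, 1.0],
                      [1.0, 1.0, 0.0, 1.0],
                      [2.5, 1.0, 1.0, 0.0]])
    print(is_euclidean(D_bad))             # False
```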

- Hierarchical representation
A hierarchy on I is a class H of non-empty subsets satisfying:
i) I ∈ H;
ii) ∀(H,H') ∈ H², H ∩ H' ∈ {H, H', ∅};
iii) ∀H ∈ H, ∪{H' ∈ H : H' ⊂ H, H' ≠ H} ∈ {H, ∅}.
If f : H → R⁺ is such that:
iv) ...

... (p + 2) and a preordonance on I such that there exist figures satisfying the inequalities of the preordonance only in spaces with dimension p and (n − 1).

We end here our survey. Obviously, many other problems could have been raised in this field. However, we hope to have shown that, even within the context evoked here, a great number of questions remain open.
A note on bibliography: the field is so vast that we have been obliged to suppress all bibliographical references.


GENERALIZED CANONICAL ANALYSIS

Michel TENENHAUS, Centre HEC-ISA 1, rue de la Libération, 78350 JOUY-EN-JOSAS, France

Introduction
Canonical analysis has been considered for a long time as a method having real theoretical interest but few practical applications. The situation is changing, as we can see from the new book of R. Gittins (1985): "Canonical analysis: a review with applications in Ecology". Generalized canonical analysis (McKeon, 1965; Carroll, 1968; Kettenring, 1971) includes canonical analysis as a particular case (and, consequently, multiple regression, analysis of variance, discriminant analysis, correspondence analysis, ...) and also principal component analysis and multiple correspondence analysis. This method is also useful for studying a population described by several numerical or nominal variables and evolving in time. So Generalized Canonical Analysis represents a remarkable synthesis of multivariate linear methods. We present in this paper Generalized Canonical Analysis from a geometrical point of view, showing its relationship with Principal Component Analysis (Saporta, 1975; Pagès, Cailliez, Escoufier, 1979). This permits many numerical simplifications useful for writing a computer program.

I - GENERALIZED CANONICAL ANALYSIS

1. The data

We consider p data tables X₁, ..., X_t, ..., X_p. Each table X_t is formed with n rows corresponding to the same n subjects; the columns represent standardized numerical variables or dummy variables associated with the categories of nominal variables. In other words, the data can be numerical or nominal variables. If the variables are nominal, they are transformed into binary tables.

2. The problem

We look for standardized and uncorrelated variables z₁, ..., z_m maximizing the quantity
  (1)  Σ_{h=1}^m Σ_{t=1}^p R²(z_h, X_t),
where R²(z_h, X_t) represents the coefficient of determination between the variable z_h and the table X_t.

3. The solution

The centering operator P₀ is equal to I − (1/n) u u', where u is a column vector of n ones. We denote by P_t the projection operator onto the subspace L(X_t) of Rⁿ generated by the columns of the table X_t. The coefficient of determination R²(z_h, X_t) being equal to (1/n) z_h' P_t z_h, quantity (1) may be written
  (2)  (1/n) Σ_{h=1}^m z_h' P z_h,
where P = Σ_{t=1}^p P_t. The standardized and uncorrelated variables z₁, ..., z_m maximizing (2) are obtained as the eigenvectors of the matrix P₀P associated with the m largest eigenvalues λ₁, ..., λ_m ranked in decreasing order. Then the maximum of (2) is equal to Σ_{h=1}^m λ_h. In effect it is enough to diagonalize the matrix P. The vector u is an eigenvector of P associated with the eigenvalue λ₀ equal to the number of tables X_t containing a binary table. Consequently, the eigenvectors of the matrix P₀P are the eigenvectors of the matrix P other than the eigenvector u associated with the eigenvalue λ₀.

4. The associated principal component analysis

We denote by X = [X₁, ..., X_t, ..., X_p] the data table obtained by horizontally adjoining the several X_t, by M the block diagonal matrix formed with the generalized inverses n(X_t'X_t)⁻ of the matrices (1/n) X_t'X_t, and by N = (1/n) I. Generalized canonical analysis of the tables X₁, ..., X_p is a principal component analysis of the triplet (X, M, N) (Saporta, 1975). The standardized principal components of the triplet (X, M, N) are the canonical components z₁, ..., z_m. They are independent of the chosen generalized inverses n(X_t'X_t)⁻. The duality diagram (Pagès, Cailliez, Escoufier, 1979) associated with the triplet (X, M, N) (figure 1) immediately gives useful relations for numerical calculations: going from calculations in Rⁿ to calculations in R^k, where k is the number of columns of X.


(Figure 1: duality diagram associated with the triplet (X, M, N), linking E = R^k and F = Rⁿ through the maps X and X', with the metrics M on E and N = (1/n)I on F.)
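A hedged numerical sketch of the solution described above: build the projectors P_t onto the column spaces of the centered tables, sum them, and extract the leading eigenvectors of the resulting operator. The function names are mine, and Moore-Penrose pseudo-inverses stand in for the generalized inverses of the text.

```python
import numpy as np

def generalized_canonical_components(tables, m):
    """Return m standardized canonical components z_1..z_m maximizing sum_t R^2(z_h, X_t).

    tables : list of (n x k_t) arrays sharing the same n rows.
    """
    n = tables[0].shape[0]
    P0 = np.eye(n) - np.ones((n, n)) / n               # centering operator I - (1/n) u u'
    P = np.zeros((n, n))
    for Xt in tables:
        Xc = P0 @ Xt                                    # work with centered columns
        P += Xc @ np.linalg.pinv(Xc.T @ Xc) @ Xc.T      # projector onto L(X_t)
    vals, vecs = np.linalg.eigh(P0 @ P @ P0)            # symmetric version of P0 P
    order = np.argsort(vals)[::-1]
    Z = vecs[:, order[:m]]
    Z = np.sqrt(n) * Z / np.linalg.norm(Z, axis=0)      # standardize so that (1/n) z'z = 1
    return Z, vals[order[:m]]

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n = 40
    common = rng.normal(size=(n, 1))                    # a direction shared by both tables
    X1 = np.hstack([common + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
    X2 = np.hstack([common + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])
    Z, eig = generalized_canonical_components([X1, X2], m=2)
    print(eig)   # the first eigenvalue is close to 2: the shared direction is well explained by both tables
```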

The projection operator P_t onto the subspace L(X_t) may be written P_t = X_t (X_t'X_t)⁻ X_t', and so we obtain WN = P. The following relations are useful for writing a generalized canonical analysis computer program: 1) z_h = X ..., where the factors ...

... (3), where E_f is the averaging operator with the density f(z). Here the 'theoretical' quantity of the PI is designated by Q_g(U, X), in contrast to its sampling value. We give without proof the inequalities connecting (3) with the ratio t²(U): ... If the eigenvalues t_j, j > 2, are small enough in comparison with t₂, then, by continuity, the same two vectors extract the main part of such information. On the other hand, if all the eigenvalues t_j are approximately equal, it is indifferent which vectors are taken for projecting, provided only that they belong to R⁺.

REFERENCES

Friedman, J.H., and Tukey, J.W. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput., C-23, 881-889.
Huber, P.J. (1985). Projection pursuit. Ann. Statist., 13, 435-475.
Rao, C.R. (1965). Linear Statistical Inference and its Applications. Wiley, New York.
Yenyukov, I.S. (1986). Methods, algorithms and programmes of multivariate statistical analysis (in Russian). Finances and Statistics, Moscow.

THE INVERSE PROBLEM OF THE PRINCIPAL COMPONENT ANALYSIS
Zhanatauov S.U., Computing Center, Novosibirsk, USSR

In data analysis it is usually impossible to connect a single real multivariate sample with one of the theoretical distribution functions and to obtain additional samples from the same population. In this situation one obtains on the computer artificial samples which, in some way or other, are similar to the real multivariate sample. Let N(x̄, W) be the set of multivariate samples X° = {x_i°}_{i=1,...,m}, x_i° = (x_{i1}, ..., x_{in}) ∈ Eⁿ, generated from m independent observations of a multivariate random value with dependent components and having the given vector of sampling mean values x̄ = (x̄₁, ..., x̄ₙ), x̄_j = (1/m) Σ_{i=1}^m x_{ij}, and the given sampling covariance matrix W = (1/m) Σ_{i=1}^m (x_i − x̄)ᵀ(x_i − x̄); and let there be given the set of correlation matrices (c.m.) with the prescribed spectrum Λ = diag(λ₁, ..., λₙ). The functions f₁(Λ) = tr(Λ) = n, f(Λ, l) = (Σ_{i=1}^l λ_i)/n, ... will be called the main f-parameters of the spectrum Λ of the c.m.; they are stable and reliably calculated statistics. Problem: to obtain a multivariate sample of volume m > n ≥ 2 satisfying one of the following requirements: a) the sample should have a sampling c.m. exactly equal to the c.m. given; b) the sample should have a sampling c.m. whose spectrum is either exactly equal to the spectrum given, or whose main f-parameters are equal, with given accuracy, to the values given. In all cases it is required that the samples have the given vectors of mean values and dispersions.
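As a rough illustration of the inverse-model idea (this is not Zhanatauov's algorithm; the construction and the names are my assumptions), one can draw a sample whose sampling covariance matrix equals a prescribed matrix exactly and hence, by building that matrix from a random orthogonal basis, a sample whose sampling covariance spectrum is exactly the prescribed Λ. Making the target a genuine correlation matrix with unit diagonal requires an extra adjustment not shown here.

```python
import numpy as np
from scipy.linalg import sqrtm, qr

def sample_with_exact_covariance(W, m, rng):
    """Return an (m x n) centered sample whose sampling covariance (1/m) X'X equals W exactly."""
    n = W.shape[0]
    G = rng.standard_normal((m, n))
    G -= G.mean(axis=0)                                  # center the columns
    S = (G.T @ G) / m                                    # its sampling covariance (invertible a.s. for m > n)
    T = np.real(sqrtm(W)) @ np.linalg.inv(np.real(sqrtm(S)))
    return G @ T.T                                       # new sampling covariance = T S T' = W

def covariance_with_spectrum(lam, rng):
    """W = C diag(lam) C' for a random orthogonal C, so the spectrum of W is exactly lam."""
    n = len(lam)
    C, _ = qr(rng.standard_normal((n, n)))
    return C @ np.diag(lam) @ C.T

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    lam = np.array([2.5, 1.0, 0.3, 0.2])                 # tr = n = 4, as required of a c.m. spectrum
    W = covariance_with_spectrum(lam, rng)
    X = sample_with_exact_covariance(W, m=200, rng=rng)
    S = (X.T @ X) / 200
    print(np.allclose(S, W))                              # True
    print(np.sort(np.linalg.eigvalsh(S))[::-1])            # equals lam up to rounding
```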


For solving this problem it is sufficient to obtain on a computer samples whose sampling c.m. has one and the same given spectrum, as well as spectra whose main f-parameters are equal, with given accuracy, to the values given.

Theorem (Zhanatauov S.U., 1980). Let m > n ≥ 2 and let the elements of the diagonal matrix Λ = diag(λ₁, ..., λ_k, λ_{k+1}, ..., λ_n) satisfy the relations λ₁ + ... + λ_n = tr(Λ) = n, λ₁ ≥ ... ≥ λ_k ≥ λ_{k+1} ≥ ... ≥ λ_n ≥ 0, 1 ≤ k ≤ n. Then there exist infinite sets:
a) of orthogonal matrices C_nn^(l), numbered l = 1, 2, ...;
b) of correlation matrices having the spectrum Λ and eigenvectors located in the columns of the matrix C_nn^(l): R_nn^(l) = C_nn^(l) Λ C_nn^(l)ᵀ;
c) of multivariate samples Z_mn^(t,l) ∈ N_s(0, R^(l)) and Y_mn^(t) ∈ N_s(0, Λ), numbered t = 1, 2, ..., the matrices Λ, C_nn^(l) and the samples satisfying all the relations of the direct model of principal components of H. Hotelling (DM PC), and the Λ-samples of the inverse model of principal components (IM PC) having the properties:
1) with the number l fixed and t = 1, ..., k_t, M = m₁ + ... + m_{k_t}: if Y_{m_t n} ∈ N_s(0, Λ) and Z_{m_t n}^(t,l) ∈ N_s(0, R^(l)), then the stacked samples satisfy Y_Mn ∈ N_s(0, Λ) and Z_Mn = [Z_{m₁n}^(1)ᵀ : ... : Z_{m_{k_t}n}^(k_t)ᵀ]ᵀ ∈ N_s(0, R);
2) with the number t fixed and l = 1, ..., k_l, N = k_l·m: if Y_mn ∈ N_s(0, Λ) and Z_mn^(t,l) ∈ N_s(0, R^(l)), then the stacked sample belongs to N_s(0, S);
3) with the numbers l = t = 1, ..., k₁ and M = m₁ + ... + m_{k₁}: if Y_{m_l n} ∈ N_s(0, Λ) and Z_{m_l n}^(l,l) ∈ N_s(0, R^(l)), then Y_Mn ∈ N_s(0, Λ) and Z_Mn ∈ N_s(0, Σ_l β_l R^(l)), where 0 < β_l = m_l/M < 1 and Σ_l β_l = 1.

For computation of a spectrum with the above properties, algorithms have been developed, making use of the following relations. Let n ≥ k ≥ l ≥ 1 be integers and let a_i ≥ 1, i = 2, ..., k, be real numbers. Then the elements of the spectrum of the c.m. are uniquely defined: λ_j = f_j/B(k,k), f₁ = n, j = 1, ..., k−1, where B(t,k) = Σ_{i=1}^t Π_{j=i+1}^k a_j. The f-parameters of the spectrum are of the form f₁(Λ) = B(k,k)·λ_k, f₂(Λ) = D(k,k)·..., f(Λ) = B(1,k), f(Λ,l) = B(l,k)/B(k,k), f(Λ) = B(k)·..., ..., where D(t,k) = Σ_{i=1}^t (Π_{j=i+1}^k a_j)² and B(k) = Π_{i=2}^k a_i. These algorithms compute monotone successions of values of the f-parameters, obtained after increments of the form B̃(i,k) = ..., D̃(k,k) = D(k,k) + (γ_{i+1}² − 1)·D(i,k), B̃(k) = B(k)·γ_{i+1}.

With the use of the IM PC, a nonparametric algorithm of interval estimation of statistics characterizing the interrelations between sample variables (including missing values in variables) (Zhanatauov S.U., 1985a,b) and an algorithm for point estimation of missing values in a multivariate sample (Zhanatauov S.U., 1981) have been developed. The degree of adequacy of Λ-samples to a real sample and the accuracy of the estimates of the algorithm presented further are practically independent of the law of distribution of the standard population whose samples are transformed into Λ-samples. A multidimensional Gaussian distribution and the uniform distribution on a unit hypercube are used as standard distributions. Λ-samples were used both for comparison and elucidation


of the domains of preferable application of methods of incomplete data analysis. As this takes place, the algorithms for calculating the spectrum of the c.m. with given algebraic properties, together with the procedures of the IM PC and its applications, form part of the program package "Spectrum", which is a package for modelling multivariate samples (adequate to real ones) and for testing data analysis and data processing methods using the IM PC.

REFERENCES
Zhanatauov, S.U. (1980). Technique of computation of the sample with given eigenvalues of its sampling correlation matrix. In: Matematicheskie voprosy analiza dannykh, Drobyshev, Ju.P. (Ed.). VC SOAN SSSR, Novosibirsk, pp. 62-76.
Zhanatauov, S.U. (1985a). A nonparametrical algorithm of interval estimations. In: Vses. simp. "Metody i programmnoe obespechenie obrabotki informacii i prikl. stat. analiza dannykh na EVM", BGU, Minsk, pp. 53-54.
Zhanatauov, S.U. (1985b). Determination of confidence intervals for estimates of missing values of a real sample. In: Struktury i analiz dannykh, Drobyshev, Ju.P. (Ed.). VC SOAN SSSR, Novosibirsk, pp. 111-122.
Zhanatauov, S.U. (1981). The method of incomplete data analysis. VC SOAN SSSR, Novosibirsk. (Preprint No 257, 15 p.)

DESIGN OF EXPERIMENTS (nearest neighbour designs . . .) (Session 5) Chairman: H.P. Wynn

NUMERICAL METHODS OF OPTIMAL DESIGN CONSTRUCTION
V.V. Fedorov
International Institute for Applied Systems Analysis, Laxenburg, Austria

1. INTRODUCTION
In this paper numerical approaches for the construction of optimal designs will be considered for experiments described by the regression model
  y_i = ϑᵀ f(x_i) + ε_i,    (1)
where f(x) is a given set of basic functions, x ∈ X, and X is compact; at least some of the variables x can be controlled by an experimenter, ϑ ∈ R^m are estimated parameters, y_i ∈ R¹ is the i-th observation, and ε_i ∈ R¹ is the random error, E[ε_i] = 0, E[ε_i ε_j] = δ_ij. In practice, technically more complicated problems could be faced (for instance, y_i could be a vector or the errors could be correlated), but usually the methods are straightforward generalizations of the methods developed for problem (1).
The most elegant theoretical results and algorithms were created for the continuous (or approximate) design problem, when a design is considered to be a probabilistic measure ξ defined on X, and an information matrix is defined by an integral M(ξ) = ∫ f(x) fᵀ(x) ξ(dx). In this case, the optimal design of the experiment turns out to be an optimization problem in the space of probability measures:
  ξ* = arg min_ξ Φ[M(ξ)],   ∫_X ξ(dx) = 1,    (2)
where Φ is the objective function defined by an experimenter.
The first ideas on the numerical construction of optimal designs can be found in the pioneering works by Box and Hunter (1965) and Sokolov (1963), where some sequential designs were suggested. These procedures can be considered as very particular cases of some iterative procedures for optimal design construction, but nevertheless they implicitly contain the idea that one can get an optimal design by improving intermediate designs through transferring a finite measure to some given point in X at every step of the sequential design.
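Because the iterative step itself falls in a part of the text not reproduced here, the following is only a generic Wynn-type first-order sketch for D-optimality under model (1): at every step a small mass α_s is transferred to the point where the directional derivative (the variance function) is largest. The grid, the step-length sequence and the function names are my assumptions.

```python
import numpy as np

def wynn_first_order(f, X_grid, n_iter=500):
    """Generic first-order algorithm for the continuous D-optimal design problem (2).

    f      : map from a design point x to the vector of basis functions f(x)
    X_grid : candidate design points (a discretisation of the compact set X)
    """
    F = np.array([f(x) for x in X_grid])                # rows f(x)^T
    k = F.shape[1]
    w = np.full(len(X_grid), 1.0 / len(X_grid))         # start from the uniform design
    for s in range(n_iter):
        M = F.T @ (w[:, None] * F)                      # information matrix M(xi) = sum_i w_i f(x_i) f(x_i)'
        Minv = np.linalg.inv(M)
        d = np.einsum("ij,jk,ik->i", F, Minv, F)        # variance function d(x, xi) = f(x)' M^-1 f(x)
        j = int(np.argmax(d))                           # most "informative" candidate point
        alpha = 1.0 / (s + k + 1)                       # a standard diminishing step length
        w = (1.0 - alpha) * w
        w[j] += alpha                                   # transfer mass alpha to that point
    return X_grid, w

if __name__ == "__main__":
    # quadratic regression on [-1, 1]: f(x) = (1, x, x^2); the D-optimal design is known
    # to put equal mass on -1, 0 and 1, and the iteration approaches it.
    grid = np.linspace(-1.0, 1.0, 201)
    pts, w = wynn_first_order(lambda x: np.array([1.0, x, x * x]), grid)
    for p, wi in zip(pts, w):
        if wi > 0.05:
            print(round(p, 2), round(wi, 3))
```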


This idea was developed and clarified by many authors, and the majority of the algorithms presented in this survey (which does not pretend to be a historical one) are based on it.

2. FIRST-ORDER ITERATIVE PROCEDURES
It will be assumed that (a) the functions f(x) are continuous on the compact X, (b) Φ(M) is a convex function, (c) there exists q such that Ξ(q) = {ξ : Φ[M(ξ)] ≤ q < ∞} ≠ ∅, and (d) for any ξ ∈ Ξ(q) and any other design ξ̄,
  Φ[(1 − α) M(ξ) + α M(ξ̄)] = Φ[M(ξ)] + α ∫_X φ(x, ξ) ξ̄(dx) + o(α).    (3)
If these assumptions hold, then the following iterative procedure will converge to an optimal design: ...

... B ∈ convex hull of the orbit of A, for some B.

The corresponding information increasing ordering for information matrices will also be denoted by ≫. That these information preorderings nicely agree with the various levels of our problem is shown by the following.

Theorem. (Giovagnoli et al. (1986)) M ≫ A  ⟹  C(M) ≫ C(A)  ⟹  φ(C(M)) ≥ φ(C(A)), for all invariant φ.

3.3 Universal optimality vs. simultaneous optimality
The preceding theorem suggests discriminating between the notion of universal optimality, whenever C ≥ D for all competing D, and that of simultaneous optimality, whenever φ(C) ≥ φ(D) for all competing D and for all invariant φ. Frequently these notions will coincide, according to the following.


Theorem. (Giovagnoli et al. (1986)) If the underlying group is compact and the information matrix C is invariant, then C is universally optimal if and only if C is simultaneously optimal.

When the group fails to be compact or the matrix C is not invariant, it seems that the notion of simultaneous optimality is of greater bearing. The following table gives an overview of some known results and open problems.

  group              | ordering                                        | invariant functionals
  {I_s}              | Loewner                                         | all φ
  Perm(s)            | ?                                               | ?
  Orth(s)            | upper weak majorization of ordered eigenvalues  | symmetric functions of ordered eigenvalues
  Unim(s)            | ?                                               | determinant
  reflection groups  | ?                                               | ?
  ?                  | ?                                               | p-means

As an outstanding result we mention that this provides a further justification for the most popular criterion of D-optimality as being the sole invariant information functional under the group of unimodular linear transformations (i. e. those with determinant ±1). On the other hand it would be of interest to study finite reflection groups as they also arise in other aspects of multivariate analysis, or to find a group such that the invariant functionals are determined by the p-means.
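For concreteness, here is a short sketch of the p-means of the eigenvalues of an information matrix, the invariant family mentioned in the table (p → 0 gives the D-criterion, p = −1 the A-criterion, p → −∞ the E-criterion); the comparison routine and the example matrices are my own.

```python
import numpy as np

def phi_p(C, p):
    """Kiefer-type p-mean of the eigenvalues of a positive definite information matrix C."""
    lam = np.linalg.eigvalsh(C)
    if p == 0:                      # limit p -> 0: geometric mean (D-criterion)
        return float(np.exp(np.mean(np.log(lam))))
    if np.isneginf(p):              # limit p -> -inf: smallest eigenvalue (E-criterion)
        return float(lam.min())
    return float(np.mean(lam ** p) ** (1.0 / p))

def compare(C1, C2, ps=(1, 0, -1, -np.inf)):
    """Report which matrix each p-mean prefers (larger is better)."""
    out = {}
    for p in ps:
        a, b = phi_p(C1, p), phi_p(C2, p)
        out[p] = "C1" if a > b else "C2" if b > a else "tie"
    return out

if __name__ == "__main__":
    C1 = np.diag([3.0, 1.0, 1.0])             # same trace ...
    C2 = np.diag([2.0, 1.6, 1.4])             # ... but a more balanced spectrum
    print(compare(C1, C2))                     # the trace (p = 1) ties; D, A and E all prefer the balanced C2
```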

4 Quadratic regression; regression over the unit cube
As mentioned above, the model for quadratic regression over the symmetrized unit interval [−1, +1] is
  Y(t) = β₀ + β₁ t + β₂ t² + σ ε.
A design ξ is invariant under the sign-change group if and only if ξ is symmetric about 0. This reduces the corresponding moment matrices to a two-parameter subset. If we augment this with an improvement in the Loewner ordering, we obtain a reduction to the one-parameter family of symmetric three-point designs ξ_a which put mass a/2 on each of the points ±1 and mass 1 − a on 0
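To illustrate the reduction, a brief sketch (mine, using the quadratic model above) that computes the moment matrix of the symmetric three-point design with mass a/2 at ±1 and 1 − a at 0, and scans two criteria over a:

```python
import numpy as np

def moment_matrix(a):
    """Moment matrix of the design putting mass a/2 at t = -1 and t = +1 and mass 1 - a at t = 0,
    for the quadratic model f(t) = (1, t, t^2)."""
    pts = np.array([-1.0, 0.0, 1.0])
    wts = np.array([a / 2.0, 1.0 - a, a / 2.0])
    F = np.vstack([np.ones(3), pts, pts ** 2]).T
    return F.T @ (wts[:, None] * F)

if __name__ == "__main__":
    for a in np.linspace(0.1, 0.9, 9):
        M = moment_matrix(a)
        print(f"a={a:.1f}  det={np.linalg.det(M):.4f}  min eig={np.linalg.eigvalsh(M).min():.4f}")
    # det M(xi_a) = a^2 (1 - a), maximized at a = 2/3: the D-optimal member of the family.
```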
.

There is a 1 - 1

correspondence between such sequences and realizations (with probability one) of stationary binary sequences. balanced case, so that

tt1(0) = j.

The set of

Consider the c(r)

or

Trlx(r) (r = 1, ... p) is a closed convex polytope whose extreme points are given by very special periodic sequences. since (1) is a linear programme for fixed these sequences must be the optimum.

cr

Moreover

one (or more) of

Computational methods of

Martins de Carvalho and Clark (1983) and of Karakostas and Wynn (1986) give results up to

p = 5.

Thus theoretically and to

some extent computationally the design problem is solved.

Kiefer

and Wynn (1984) contains a full discussion of the k-treatment case. 3.

SAMPLING The most closely related sampling model is the following.

and observe Let

rs

Zt

if

X^ = 2

and do not observe

Z^

be the covariance matrix of the sample.

when

Let

X t = 0.

Then consider

various criteria, based on the observed values. (1)

min var (e) where

(2)

min E(T - T) 2

e

is the BLUE of

e.

N where

T =

l Z.

t=i

z

and

X

is the BLUE of

T

in the sense of prediction. (3)

min var size.

( )

where

= ^ z Zt s

and

n i N

is the sample

172 Under the conditions of Section (2) we see that (1) and (2) lead to the same criterion asymptotically: 1

elements of

r" .

the elements of

maximize the sum of the

Criterion (3) leads to minimizing the sum of rg.

We can compare these criteria with that

from Section (2) whdch with obvious notation can be reexpressed as minimizing the sum of the elements of _rs ~ r s , s where

s

r

?lrs,sl

(1)

is the complement of the sample.

A final class of criteria is obtained in the pure prediction problem when E(T - T) r

2

9=0.

In this case we may be interested again in

which is the sum of the elements of r

T/s =

7-

r

?,s

r

s_1

r

(2)

s,s •

the conditional covariance matrix of the unsampled sample

l^.

choice of

Z^

given the

This quantity is asymptotically independent of the {X^}

is to minimize det(r s ).

(for fixed det(rj| s )

n).

A criterion under investigation

which is equivalent to maximizing

We have referred elsewhere to this criterion as

"maximum entropy sampling" and there are analogies with statistical mechanics. form of

r-1

It is clear from consideration of the asymptotic that this criterion again only depends on the

structure of the

{X t }

process up to lag p.

173

REFERENCES Kiefer, J. (1960).

"Optimum experimental designs V. with

applications to systematic and rotatable designs". Proc. Fourth Berk. Symp., 2, 381-405. Kiefer, J. and Wynn, H.P. (1983). design of experiments". Robustness.

"Autocorrelation - robust

S c i e n t i f i c I n f . Data Analysis and

Academic Press, New York.

Kiefer, J. and Wynn, H.P. (1984).

"Optimum and minimax exact

treatment designs for one-dimensional autoregressive error processes".

Ann. S t a t i s t .

431-450.

Karakostas, K. and Wynn, H.P. (1976).

"Optimum systematic

sampling for autocorrelated superpopulations". Inf.

J. Stat. Plan.

submitted.

Martins de Carvallo, J.L. and Clark, J . M.-C. (1983). "Characterising the autocorrelation of binary sequences". IEEE. Trans. Inf. Th. 24, 502-508.

THE DESIGN OF EXPERIMENTS FOR MODEL SELECTION A.M. Herzberg Department of Mathematics, Imperial College of Science & Technology London, U.K. A.V. Tsukanov Sevastopol Instrument Making Institute, Sevastopol, U.S.S.R. 1.

INTRODUCTION

The problem of the selection of the optimal model has a large literature.

For example, Mallows (1973), Akaike (1974), Woodruffe (1982),

Shibata (1980, 1981), Allen (1974) and Vapnik (1982) were concerned with techniques for selecting an appropriate model to fit a given set of data; Andrews (1971) and Atkinson and Cox (1974) were concerned with the design of experiments for model selection. Herzberg and Tsukanov (1985a) discussed the design of experiments for linear model selection with the jackknife criterion.

In other

papers, Herzberg and Tsukanov (1985b, 1986) gave a Monte-Carlo comparison of the C^ criterion with that of the jackknife under different measures and considered modifications of the usual jackknife procedure to include the possibility of the removal of different numbers of observations at a time, and selected observations. In this paper, further considerations for the optimal design in the selection of models will be presented. 2.

THE CRITERION FOR THE SELECTION OF MODELS

Let the true functional relationships be represented by y. = n(x.) + e. i l l

(i=l,...,N)

,

(1)

where y^ is the ith observation of the dependent variable at the kdimensional design point, x^, the independent and controlled variable, r|( . ) is the true but unknown function, model, and e. is an indepen. 1 2 dent random variable with mean 0 and constant variance O .

For

ease in presentation and without loss of generality, suppose that the problem is to,choose one of two models 175

176

r|j(x,cO

(j= 1 ,2)

,

(2)

where a^ is a vector of unknown parameters to be determined by least squares. Consider as a measure for the adequacy of the jth model, the jackknife criterion N TJK. = i J "

V [y.-n.{x.,a(-i1}] 2 t—i J 1i=1

(j-1,2)

,

(3)

where a.(-i) is the least squares estimator of a. determined from J

. .

J

the N-l points consisting of all the design points except x^. model riji.) will be preferred to n 2 ( . ) if TJK^ < T J K ^ .

The

This and

other related criteria and measures for the discrimination among two or more models are given and elaborated on in the papers by Herzberg and Tsukanov. Mallows (1973) suggested choosing the model for which C

=

P

C./a2 J

-

N

(4)

is a minimum, where C . = RSS . + Jlp .O2 J J J

' 2 is •

and O

• 2 an estimate of (J .

Further, RSS. is the residual sum of J

squares and p^ is the number of unknown parameters for the jth model. The constant H may be changed; Mallows set il = 2. In particular, Herzberg and Tsukanov (1986) considered modifications of the usual jackknife procedure to include the possibility of the removal of different numbers of observations at a time and selected observations. 3.

THE DESIGN OF EXPERIMENTS FOR MODEL SELECTION

In order to determine the optimal design for model selection, the following method is used: (i)

(ii)

a function r^^ specifying the goodness of decisions is determined, where r.. is the price of the selection of the model ij from set S. when the true model is in set S.; i J the function of average risk is obtained, i.e. R= £ r..p.., . i.j 1 J 1 J where p ^ is the probability of the selection of the

177

model from the set S. when the true model is in the set S.; 1 .J (iii) the p^j are varied according to the criteria and the design used. The value of R depends on the vector of unknown parameters,«!.

In

this case, a minimax or Bayesian approach may be used. It is always possible to transform the response function in such a way that a < n(x) < b

and

c < x. < d

(i=l,...,k) .

(5)

i Consequently, a vector of unknown parameters a is restricted to a finite field ft. The function R can be investigated further for ft 2 and fixed variance of the error,O . 4.

A MONTE-CARLO EXAMPLE

Consider one-dimensional polynomial models.

In order to compare

the criteria and the design, it is necessary to choose a set of test

models.

and b = d = 1.

The restrictions of (5) are used with a = c = -1 It is possible to use a Tchebycheff system of

orthogonal polynomials as a network of models which apprimates the behaviour of risk in the region ft. One set of such polynomials is 2 2 4 2 n ] = x, n 2 = - i+2x , n 3 = 3x-4x , n^ = i-8x +8x , n 5 = - 5x+20x 3 -16x 5

.

(6)

Consider the following two designs with 12 points: Xj:

12 equally spaced points ±1, ±0.82, ±0.64, ±0.45, ±0.27, ±0.09;

X2:

the D-optimal design for a polynomial of degree five, i.e. 12 points, two replicates at each of ±1, ±0.77, ±0.29.

Table 1 gives the result of a computer simulation for two designs. Observations were generated from the models given in (6) with normal2 ly distributed errors with zero mean and variance C7 = 1 . The table gives the number of correct decisions out of 500 simulations. computing was done on the computer complex CDC Cyber 174 and CDC 6500 of the Imperial College of Science and Technology, London.

The

178

Table

1 :

F r e q u e n c y of c o r r e c t d e c i s i o n s 2 for X 1 a n d X 2 a n d 0 == 1

for C

D e s i g n of Design X

1

2

Criterion

P

and

TJK critei

true m o d e l

1

2

3

4

5

325

282

267

271

334

TJK

330

295

248

180

113

C

328

300

259

286

395

352

311

255

270

330

C

p

P

TJK If the m a t r i x of the r . . ' s ij

is k n o w n , t h e n the f u n c t i o n R c a n be

a p p r o x i m a t e d a n d the c h o i c e o f the d e s i g n a n d c r i t e r i o n m a d e

to-

gether . REFERENCES A k a i k e , H. ( 1 9 7 4 ) . A n e w look at s t a t i s t i c a l m o d e l i d e n t i f i c a t i o n . I E E E T r a n s . A u t o m a t i c C o n t r o l , 19, 7 1 6 - 7 2 3 . Allen, D.M. (1974). The relationship between variable selection a n d d a t a a u g m e n t a t i o n a n d a m e t h o d for p r e d i c t i o n . Technometrics, 16, 125-127. Andrews, D.F. (1971). S e q u e n t i a l l y d e s i g n e d e x p e r i m e n t s for s c r e e n ing o u t b a d m o d e l s w i t h F - t e s t s . B i o m e t r i k a , 58, 4 2 7 - 4 3 2 . Atkinson, A.C. and Cox, D.R. (1974). P l a n n i n g e x p e r i m e n t s for d i s J . R . S t a t i s t . Soc. criminating between models (with discussion). B36, 321-348. Herzberg, A.M. and Tsukanov, A.V. (1985a). T h e d e s i g n of e x p e r i m e n t s for l i n e a r m o d e l s e l e c t i o n w i t h the j a c k k n i f e c r i t e r i o n . U t i l i t a s M a t h e m a t i c a , 28, 2.43-253. Herzberg, A.M. and Tsukanov, A.V. (1985b). The Monte Carlo compari s o n of two c r i t e r i a for the s e l e c t i o n of m o d e l s . J. S t a t i s t . C o m p u t . S i m u l . , 22, 1 1 3 - 1 2 6 . H e r z b e r g , A . M . a n d T s u k a n o v , A . V . ( 1 9 8 6 ) . A n o t e o n m o d i f i c a t i o n s of the j a c k k n i f e c r i t e r i o n for m o d e l s e l e c t i o n . Utilitas Mathematica, 29, 2 0 9 - 2 1 6 . M a l l o w s , C.L. (1973). S o m e c o m m e n t s o n C . T e c h n o m e t r i c s , 15, P 661-675. S h i b a t a , R. ( 1 9 8 0 ) . A s y m p t o t i c a l l y e f f i c i e n t s e l e c t i o n of the o r d e r o f the m o d e l for e s t i m a t i n g p a r a m e t e r s o f a l i n e a r p r o c e s s . A n n . S t a t i s t . 8, 1 4 7 - 1 6 4 . S h i b a t a , R. ( 1 9 8 1 ) . A n optimal selection of regression v a r i a b l e s . B i o m e t r i k a , 68, 4 5 - 5 4 . V a p n i k , V. ( 1 9 8 2 ) . T r a n s l a t e d b y S. K o t z . Estimation of

Dependences Based on Empirical Data. Woodruffe, M. (1982). On model A n n . S t a t i s t . 10, 1 1 8 2 - 1 1 9 4 .

Springer-Verlag, New York. s e l e c t i o n a n d the arc sine laws.

THE DESIGN AND ANALYSIS OF FIELD TRIALS IN THE PRESENCE OF FERTILITY EFFECTS

C. Jennison School of Mathematical Sciences University of Bath BATH BA2 7AY United Kingdom

The recent interest in analyses of field trials incorporating adjustments for variations in fertility or other systematic effects can be traced back to the work of Papadakis (1937) who demonstrated how conventional treatment estimates can be improved by performing a second analysis using the average of the residuals of its neighbours as a covariate for each plot. Over thirty years later Atkinson (1969) investigated this unconventional use of a function of the response variables as a covariate and showed the resulting treatment estimates to be close to those obtained by fitting the first-order autoregressive model of Williams (1952). The use of spatial models for field experiments has since developed in its own right with major contributions from the work of Besag (1974,1977) and Bartlett (1978). In general, spatial models define a covariance structure for the observations, possibly involving variance ratios to be estimated from the data, and both treatment estimates and estimates of standard error are obtained by the usual methods for general linear models. A convincing model for one-dimensional layouts has recently been developed by Besag and Kempton (1986) and consists of a fertility process with independent first differences plus superimposed independent error for each plot; Williams (1986) proposes a similar model based on the relationship between correlation and inter-plot distance determined by Patterson and Hunter (1983) for a set of 166 cereal variety trials. The approach of Wilkinson, Eckert, Hancock and Mayo (1983) is more in the spirit of Papadakis although "adjustment" is by the yields rather than residuals of neighbouring plots. These authors propose a smooth trend plus independent error model Y=DT+5+TI,

where Y is the vector of yields, D the design matrix, x the vector of treatment effects, S, represents a trend term which varies smoothly within columns, and T) denotes independent errors. Let plots be indexed along columns and suppose i; is 179

180

locally approximately linear within each column so estimating equations are formed from adjusted yields, Y(=Y(-—jCY,-^ +Y i+I ), thereby removing almost completely the effect of trend, In the "least squares smoothing" method of Green, Jennison and Seheult (1985) this same model is fitted by the penalty function approach well known in nonparametric regression. Values of x, E, and 11 are found by minimizing the penalty function where A^ is the vector of second differences and X, a tuning t constant controlling the degree of smoothness of ths fitted An appropriate value of X must be chosen either by inspection of the fitted E, and T| or by an automatic method such as cross-validation - see Green (1985). A full decomposition of Y is obtained and the fitted trend and residuals rj,- can be inspected for features of interest. Note that minimizing the above penalty function is equivalent to solving the pair of simultaneous equations t=(D T D) _1 D T (Y-^) £=S(Y-Dx) where S=(I+XATA)_1. Thus, for given x, S, is obtained by applying the smoothing matrix S to Y-Dx, and for given x is the ordinary least squares estimate based on adjusted yields The form of these equations suggests extensions in which x is estimated robustly from Y-£ and % is obtained by applying a robust, nonlinear smoother to Y-Dx, a solution being found by iterating between the two equations. In a recent paper Papadakis (1984) describes modifications to his original method to deal with single abnormal observations and apparent discontinuities in fertility attributable to, say, changes in soil type or drainage pattern; both these problems can be handled by the simultaneous equation approach using a treatment estimate which downweights extreme values and a smoother that recognises jumps in fertility and does not smooth across them. The use of blocks in the design and analysis of field trials deserves some comment. Patterson and Hunter (1983) discuss incomplete block designs for cereal variety trials and demonstrate the substantial reduction in variance of treatment estimates from complete block designs (typically 35% for large blocks) but they show that the further improvements obtained by fitting a full spatial model are rather small (5 or 10%). The blocks in these designs are physically contiguous and divisions between blocks have no direct physical meaning, rather, the fitting of block effects allows a step function approximation to a smooth trend The inclusion of such artificial blocks is unnecessary in other methods of analysis (although they do appear in the model of Williams (1986), ostensibly as a means of curtailing long range correlations). Blocking by real physical criteria

181

is of course desirable and a method which can detect from the data where blocks should be introduced is most useful; there is greater scope for detection of "regions" when the experimental layout is two-dimensional and in this case there are interesting parallels with the identification of objects and distinct areas in image analysis. As the above discussion illustrates, recent research has led to a variety of analyses and the experimenter may be faced with a bewildering choice. Fortunately, different methods usually give very similar estimates of treatment effects - most give the least squares estimate for some assumed correlation structure of the observations and changes in this assumed structure tend to affect the estimates only slightly. Rather than find fault with particular methods we should recognise the potential of a selection of tools for data analysis: model based methods which provide both treatment estimates and estimates of standard errors, as long as we accept the assumptions of the model, and more exploratory methods which allow a full investigation of the data and have greater flexibility to adapt to features of the data as they are discovered. I would now like to turn to the problem of design. Firstly, it should be pointed out that special designs are in no way essential for a "neighbour" analysis, in fact, these methods can be used to retrieve a satisfactory analysis from a poorly designed experiment; for example, fitting an appropriate spatial model can remove the bias that would otherwise be introduced by an improperly randomized or even a completely systematic design. Good design will of course improve efficiency and several recent papers discuss optimal designs for correlated observations, see for example Gill and Shukla (1985) and references therein. The general conclusion is that designs should be balanced, i.e. treatments should be neighbours and possibly also second neighbours of each other an equal number of times but no treatment should appear next to itself. One aspect of the theory of optimal design that may need to adapt to new methods of analysis is the role of blocks. As mentioned previously, artificial blocking is no longer necessary for analysis and experiments with a small number of very large blocks may become more common: a typical variety trial can consist of three replicates of 50 varieties so if the replicates were physically separate we would have just three blocks of size 50. To assess the importance of optimal design I performed calculations for an example with four replicates of 20 varieties, comparing the average standard error of treatment differences from a second order balanced design and a design with treatments allocated randomly within each replicate. Using a variety of autoregressive and moving average processes for the true model and both correct and slightly incorrect models in the analysis I found the balanced design to be always superior but often only marginally and at most by 1 or 2%. Other factors may be of greater practical importance. When correlations are high it is noticeable that the variance of treatment estimates for treatments appearing on end plots is considerably higher than average. The suggestion by Wilkinson et al

182

(1983) of adding extra plots at the end of each column in order to give an "adjusted" yield for each internal plot has led to some confusion - clearly these plots are in many ways no different from other plots and they must certainly be counted when discussing efficiency - but such additional plots could be used to ensure that no single treatment estimate is too variable. Alternatively, one or more treatments may be given an extra replicate in return for appearing on several end plots, thereby equalising as nearly as possible the variances of treatment estimates. T o conclude, there is presently a great deal of practical and theoretical interest in the analysis and design of field trials and w e are seeing an influx of ideas from many different areas of statistics. Future work offers an exciting prospect as ideas not traditionally associated with field trials are developed and the areas of application are extended to the whole range of agricultural experiments. ACKNOWLEDGEMENTS My own work in this area has been in collaboration with Peter Green and Allan Seheult. I am particularly grateful to Julian Besag for stimulating my interest in this topic. REFERENCES Atkinson, A.C. (1969) The use of residuals as a concomitant variable. Biometrika, 56, 33-41. Bartlett, M.S. (1978) Nearest neighbour models in the analysis of field experiments (with Discussion). JJt.Statist.Soc.,B, 40, 147-174. Besag, J.E. (1974) Spatial interaction and the statistical analysis of lattice systems (with Discussion). JJt.Statist.Soc.,B, 36, 192-236. Besag, J.E. (1977) Errors-in-variables estimation for Gaussian lattice schemes. JM.Statist.Soc.,B, 39, 73-78. Besag, J.E. and Kempton, R.A. (1986) Statistical analysis of field experiments using neighbouring plots. Biometrics, 42, 231-251. Gill, P.S. and Shukla, G.K. (1985) Efficiency of nearest neighbour balanced block designs for correlated observations. Biometrika, 72, 539-544. Green, PJ.(1985) Linear models for field trials, smoothing and cross-validation. Biometrika, 72, 527-537. Green, PJ., Jennison, C. and Seheult, A.H. (1985) Analysis of field experiments by least squares smoothing. J.R.Statist.Soc.,B, 47, 299-315. Papadakis, J.S. (1937) Méthode statistique pour des expériences sur champ. Bull. Inst. Amél. Plantes â Salonique, 23. Papadakis, J.S. (1984) Advances in the analysis of field experiments. ripccKXtKOt XT|Ç AkoStihiocç A(hyv©v, 59, 326-342. Patterson, H.D. and Hunter, E.A. (1983) The efficiency of incomplete block designs in National List and Recommended List cereal variety trials. JAgric.Sci., 101, 427-433. Wilkinson, G.N., Eckert, S.R., Hancock, T.W. and Mayo, O. (1983) Nearest neighbour (NN) analysis of field experiments (with Discussion). JJi.Statist.Soc.fi, 45, 151-211. Williams, E.R. (1986) A neighbour model for field experiments. Biometrika, 73, 279-287. Williams, R.M. (1952) Experimental designs for serially correlated observations. Biometrika, 39, 151-167.

ON THE EXISTENCE OF MULTIFACTOR DESIGNS WITH GIVEN MARGINAIS

K r a f f t , Olaf I n s t i t u t für S t a t i s t i k der RWTH Aachen Federal Republic of Germany Consider the following row-and column-design with p=3 treatment levels (index k), m=3 row-factor levels (index i ) and n=3 columnfactor levels (index j )

j ^ 1 1 3

2

3

1 1

2

3

3

2

3

2 1 2

When performing an analysis of variance f o r such a design, i t turns out that only the row-frequencies r ^ and column-frequences c^^ of the treatment levels are relevant:

E.g. f o r investigations on design-optimality one i s thus i n c l i n e d to take these matrices as a basis. But changing R^ and C^ only s l i g h t l y into

1

183

0

2 \

2

1

0

.0

2

1

184

one easily sees that a design corresponding to Rg and C2 does not exist. Hence we have the problem: Given R = ( r ^ ) e

IN

mxp

C = (Cj^) e IN™' 3 . Which conditions on R and C guarantee the existence of a design corresponding to R and C ? Using indicators x ^ ^

( x . ^ = 1 iff treatment level k is combined

with row-and column levels i,j), this problem can formally be stated as of finding conditions for consistency of the system n X

•^

ijk

=

r

ik ' ^ ' K )

e

MxP

m I x... = c,. , ( j, k ) e NxP JK i=l 1 J k

(I)

xijk = 1

, (i,j) e MxN

k=l x.. k e {0,l},(i,j,k) e MxNxP , where M = {l,...,m}, N = {l,...,n}, P = {1

p}.

In case p=2 the solution is known as Gale-Ryser theorem: Let r . J

= |{i : r., ^ j}|, c. ' 0

obtained from c., by arrangement in J^ ^ '•

non-ascending order. Then (I) is consistent iff the vector c majorized by the vector r

is

in the Schur-ordering.

A generalization to the case p>3 is unknown: Neither an adequate generalization of the majorization ordering to matrices R and C has been found nor could the known proofs of the Gale-Ryser theorem (algorithm for explicit construction (Gale-Ryser), application of Hall's theorem on systems of distinct representatives (Higgins)) be carried over. We have the following conjecture: (I) is consistent iff

(1)

I r,-i, = n. 1 < i - •

Let us consider a classical mixed ANOVA model

Y = X\beta + \sum_{i=1}^{k} U_i a_i + e,

where Y is the (n \times 1) vector of measurements, X and the U_i are (n \times p)- and (n \times K_i)-matrices of known parameters, respectively, \beta is an unknown parameter, the a_i are (K_i \times 1) normally distributed random vectors, a_i \sim N(0, \sigma_i^2 I_{K_i}), e \sim N(0, \sigma_0^2 I_n), and I_n is an identity matrix. It is clear that

V = \mathrm{Cov}\, Y = \sum_{i=1}^{k} \sigma_i^2 U_i U_i^T + \sigma_0^2 I_n;

thus \sigma_0^2, \sigma_1^2, \dots, \sigma_k^2 are called variance components.
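A small simulation may help fix the notation. The following sketch (mine, not the author's) builds the covariance matrix V = sum_i sigma_i^2 U_i U_i' + sigma_0^2 I for a toy one-factor layout and draws one realisation of Y; the particular layout and variance values are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)

# toy layout: q = 3 groups with 4 observations each, fixed effect = common mean
q, reps = 3, 4
n = q * reps
X = np.ones((n, 1))                          # design matrix of the fixed effect
U = np.kron(np.eye(q), np.ones((reps, 1)))   # incidence matrix of the random group effects

beta = np.array([1.0])                       # fixed effect (assumed value)
sigma2_a, sigma2_e = 2.0, 1.0                # variance components (assumed values)

# covariance matrix V = sigma_a^2 U U' + sigma_e^2 I
V = sigma2_a * U @ U.T + sigma2_e * np.eye(n)

# one realisation of Y = X beta + U a + e
a = rng.normal(0.0, np.sqrt(sigma2_a), size=q)
e = rng.normal(0.0, np.sqrt(sigma2_e), size=n)
Y = X @ beta + U @ a + e

print(V[:5, :5])   # block structure: within-group covariance equals sigma_a^2
print(Y[:5])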

The pioneer work in which ANOVA methods were applied to testing hypotheses about variance components for a balanced model was Fisher (1918); later R. Fisher devoted some attention to those models in his famous book (1925). Important contributions to this theory were made later by F. Yates, A. Wald, C. Eisenhart, Scheffé, S.R. Searle, C.R. Rao and T.W. Anderson, among many others. Modern works on this subject may be classified into two main streams. In the first one (see Rao and Kleffe (1980)) invariant unbiased quadratic estimates (MINQUE and others) are examined. The second stream deals with maximum likelihood estimation (MLE) of the distribution parameters.
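As a simple illustration of the first stream (not part of this paper), the classical ANOVA estimator for the balanced one-way layout equates the within- and between-group mean squares to their expectations; MINQUE generalizes this idea to unbalanced multi-way layouts. The layout and true values below are assumptions made only for the sketch.

import numpy as np

rng = np.random.default_rng(2)
q, m = 8, 5                                   # q groups, m replicates per group (balanced)
sigma2_a, sigma2_e = 1.5, 1.0                 # true variance components (assumed)

a = rng.normal(0, np.sqrt(sigma2_a), size=q)
y = a[:, None] + rng.normal(0, np.sqrt(sigma2_e), size=(q, m))

group_means = y.mean(axis=1)
grand_mean = y.mean()

# mean squares and their expectations: E[MSE] = sigma_e^2,
# E[MSA] = sigma_e^2 + m * sigma_a^2
mse = ((y - group_means[:, None]) ** 2).sum() / (q * (m - 1))
msa = m * ((group_means - grand_mean) ** 2).sum() / (q - 1)

sigma2_e_hat = mse
sigma2_a_hat = (msa - mse) / m                # unbiased quadratic estimator of sigma_a^2
print(sigma2_e_hat, sigma2_a_hat)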

[The displayed regularity conditions (2)-(5) are not legible in the scan.] It is assumed that the relevant normalized limits exist and that some further conditions similar to those of Miller (1977) are fulfilled. Condition (4) (respectively, (5)) is suitable for deriving the asymptotic normality (AN) of Q (of the MLE). The method of deriving (3) is standard. The first term of (3) is a principal part of the first term of Taylor's

expansion; the second term is a principal part of the second-degree term. The l-th component of Δ, l ≤ k, is the limit of the corresponding normalized and centered expression. Components with indices i ≤ k and j > k are uncorrelated because of the symmetry of the Gaussian distribution. Uniform convergence of the residual can be proved by the methods of Maljutov (1983).

Our second aim is to investigate designs optimizing some function of the matrix in (3), which is simultaneously the normalized covariance matrix of the MLE or of Q. We begin with the simplest one-way mixed model, in which \beta is a common mean and the random effects a_i and the errors are independent; in matrix notation V = \sigma_1^2 U U^T + \sigma_0^2 I. We consider the asymptotics of the MLEs as the number of groups grows. Using the obvious equalities we obtain limiting equations [the displayed one-way model and limiting equations are not legible in the scan]. Thus, under suitable growth conditions on the group sizes, the limiting distribution of the vector of normalized estimates has a covariance matrix which does not depend on the mutual relationship between the group sizes x_1, x_2, \dots .
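To make the one-way setting concrete, here is a small numerical sketch (not from the paper) that maximizes the Gaussian likelihood of the one-way mixed model over (beta, sigma_a^2, sigma_e^2) for simulated data; the group sizes and true values are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
sizes = [3, 5, 4, 6]                      # assumed group sizes x_i
beta_true, s2a_true, s2e_true = 1.0, 2.0, 1.0

# simulate y_{ij} = beta + a_i + e_{ij}
y = np.concatenate([beta_true + rng.normal(0, np.sqrt(s2a_true))
                    + rng.normal(0, np.sqrt(s2e_true), size=m) for m in sizes])
groups = np.repeat(np.arange(len(sizes)), sizes)
U = (groups[:, None] == np.arange(len(sizes))[None, :]).astype(float)

def negloglik(par):
    # parametrize the variances on the log scale to keep them positive
    beta, log_s2a, log_s2e = par
    s2a, s2e = np.exp(log_s2a), np.exp(log_s2e)
    V = s2a * U @ U.T + s2e * np.eye(len(y))
    resid = y - beta
    sign, logdet = np.linalg.slogdet(V)
    return 0.5 * (logdet + resid @ np.linalg.solve(V, resid))

fit = minimize(negloglik, x0=np.array([0.0, 0.0, 0.0]), method="Nelder-Mead")
beta_hat, s2a_hat, s2e_hat = fit.x[0], np.exp(fit.x[1]), np.exp(fit.x[2])
print(beta_hat, s2a_hat, s2e_hat)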

Designs optimizing a convex differentiable function by choosing x_1, x_2, \dots may easily be found by the standard methods known for homoscedastic independent measurements. If the relevant normalized quantity converges to some D > 0, then the covariance matrix of the limiting distribution of the vector of centered and normalized estimates takes a correspondingly modified form [the displayed expression is not legible in the scan]. The generalization to the general case of the mean X\beta is obvious. A survey of non-asymptotic designs for estimating multi-way variance components is in Anderson (1975).

REFERENCES

Anderson, R.L. (1975). Designs and estimators for variance components. In: A Survey of Statistical Design and Linear Models, Srivastava, J.N. (Ed.). North-Holland Publ. Co., pp. 1-29.
Fisher, R.A. (1918). The correlation between relatives... Trans. R. Soc. Edinburgh, v. 52, 399-433.
Fisher, R.A. (1925). Statistical methods for research workers. Oliver & Boyd, London.
Goldstein, H. (1986). Multilevel mixed linear models analysis using iterative generalized least squares. Biometrika, v. 73, N 1, 43-56.
Hartley, H.O. and Rao, J.N.K. (1967). MLE for the mixed ANOVA model. Biometrika, v. 54, 93-108.
Ibragimov, I.A. and Khasminsky, R.Z. (1981). Asymptotic theory of estimation. Springer, N.Y.
Luanchi, M. (1983). Asymptotic investigation of iterative estimates (thesis). Moscow Lomonosov University, Depart. of Math. and Mech.
Maljutov, M.E. (1983). Lower bounds for average sample size... Izv. vuzov, Matematika, N 11, 19-41.


Maljutov, M.B. and Luanchi, M. (1985). Iterative quadratic estimates of mixed ANOVA models. Abstracts of the III-rd conference "Application of multivariate analysis...", part II, Tartu, pp. 49-51.
Miller, J.J. (1977). Asymptotics for MLE's in the mixed ANOVA model. Ann. Statist., v. 5, 746-762.

ASYMPTOTIC METHODS IN STATISTICS

(second order asymptotics, saddle point methods, etc.) (Session 6)

Chairman: W. van Zwet

DIFFERENTIAL GEOMETRICAL METHOD IN ASYMPTOTICS OF STATISTICAL INFERENCE

Shun-ichi AMARI
University of Tokyo, Tokyo, 113 Japan

Abstract. Differential geometry provides a new powerful method of asymptotics of statistical inference. Geometrical concepts are explained intuitively in the framework of a curved exponential family without technical details. We show some fundamental results of higher-order asymptotics of estimation and testing obtained by the geometrical method. Further prospects of the geometrical method are given.

Why geometry?

A typical statistical problem is to make some inference on the underlying probability distribution p(x) based on N independent observations x_1, ..., x_N therefrom. In many cases, statisticians do not directly treat the function space F = { p(x) } of all the possible distributions, but presume a parametric statistical model M = { p(x, u) }, where u is an n-dimensional vector parameter. Then, a model M is regarded as an n-dimensional manifold imbedded in F, and it is assumed that the true distribution p(x) is included in M or at least is close to M. Roughly speaking, a naive distribution p̂(x) is obtained in F from the observations as, for example, the empirical distribution or its smoothed version. Then, we infer based on this p̂ the true distribution which is supposed to belong to M. Hence, it is important to know the geometrical shape and the relative position of M inside F. When the number N of observations is sufficiently large, p̂ is very close to the true distribution p(x, u), so that one may use linear approximation at p of M in F in evaluating inferential procedures. Hence, linear geometry is sufficient for the first order asymptotic theory. This is the reason why one can construct a first order asymptotic theory in a unified manner.

In order to evaluate higher-order, for example second and third order, characteristics, linear approximation is insufficient. It is necessary to connect these tangent spaces or linear approximations of M obtained at various points, thus taking the non-linear effects into account. To this end, one needs to introduce invariant affine connections by which the curvatures are defined. However, this is not a trivial task. We introduce two dually coupled affine connections, the exponential or α = 1 connection and the mixture or α = -1 connection. Then it will be shown that the related exponential and mixture curvatures play a fundamental role in higher order asymptotic theory. The notion of a more general fibre bundle is useful for studying non-parametric or semi-parametric models.

family.

A

curved

exponential

family

is

very

tractable statistical m o d e l by the following

two reasons:

has

a

the

minimal

sufficient

statistic

x,

and

a It

enveloping

exponential family is flat w i t h respect to the o( — i 1 connections. This

makes

the

related

geometrical

theory

very

simple

transparent, and we can avoid technicality of differential by using this model. to

explain

should

be

theory

for

model

the noted

by

a

Therefore, we use a curved exponential

results that w e

general

using

of

the

can

method.

model

differential

or

even

geometric

family

However,

construct a differential

parametric

proper

geometrical

and

geometry

a

it

geometrical

non-parametric

notions

or

their

extentions. An

exponential

family

functions q(x, Q) = £

m

has

the

following

probability

density

1

0 x. - f($)

i=l

1

w i t h respect to a suitable measure o n the sample space, w h e r e x = (x^) and

0=

(0 1 ), i = 1,

, m , are m-dimensional vectors and

^(6) is the cumulant generating function.

The family S = {q(x, 0 ) }

is an m-dimensional manifold in F, where

the natural or

parameter Q defines a coordinate system of S. distribution

q(x, 6)

coordinate system

in

S

specified

is «(= 1 - affine

by (or

6.

canonical

A point 6 implies We

say

e-affine)

by

that

a

this

regarding

197

5 as an ck - 1 ( or exponentially) linear This

is a definition

define

the

introducing

e-linearity an e- or

coordinate

of the e-linearity. in

a

general

1-affine

system of

S.

(Obviously, we need to statistical

connection.)

parametrized by a scalar t is e-linear, when

it

model

by

A curve &=

0(t)

is linear

in

t.

More generally, a submanifold of S which is represented by a set of linear equations in 0 is e-linear.

An e-curvature of a submanifold

can be defined in an ordinary way, when it is not e-linear.

(This

curvature is a tensor, but we do avoid technical descriptions). There

is

another

coordinate

system

expectation parameter or the expectation

f =

called

coordinate

the

system, which

is defined by 1

= E[x.],

±

where E denotes the expectation with respect to q(x, 0). also

an

important

coordinate

system

dually

This

coupled with

is

6 , and

there is a one-to-one relation between them,

6

= 6(1),

1 = 1 ( 6 ) .

We may use this ^ to specify a distribution in S.

The 7 is said to

be m-affine or 0( = - 1-affine, and any submanifold

of S which

is

defined by a set of linear equations in 1 is said to be m- or ol = 1-flat.

We have thus defined the m- or OL = - 1 flatness.

submanifold

is

not

m-flat,

the

m-curvature

When a

can be defined

in a

similar manner. Let x^, ... , Xj| be N independent vector observations. arithmetic mean

is

a

minimal

x = (1/N)J£ x t=l sufficient

statistic.

(distribution) in S as follows. r

Their

N

This

x

defines

a

point

Let "7 be the point in S whose

?-coordinates are put equal to the sufficient statistic A

_

rf = x. A

We call the point ^ (or more precisely a distribution specified by A

A

*)) the observed point.

Its ©-coordinates

9=

A

0(7) is the m.l.e.

of d. A curved exponential family M = {p(x, u)} is a submanifold of S parametrized by an n-dimensional parameter u = , n), where n < m, such that

(u 3 ),

(a = 1,

198 p(x, u) = q{x, 0(u)}. The submanifold M is represented by 0 = 8(u)

or

"7 = *}(u)

in the respective coordinate systems. be

The e - and m-curvatures

calculated by differentiating Q{u) and 7 ( u )

twice w i t h

can

respect

to u.

Inferential procedures.

Estimation of the true parameter u is

stated geometrically as follows : Given an observed point TJ= x €. S which belongs to S but does not in general lie in M , find a point u e M or point ^(u) 6 M w h i c h is closest to u in some sense. u = etf)

Let *

be an estimator, w h i c h is a mapping

from S

to M ,

e

a

: 1h+u.

Let

A(u) be the inverse image of this mapping, i.e., A(u) = e _ 1 ( u ) = ( 1 6 S | u = Then, A(u)

forms an

a foliation of S.

e(D}.

(m - n)-dimensional submanifold, and {A(u)} is W e call this A(u)

the ancillary submanifold

estimating submanifold attached to u by the estimator e. of the estimator

or

The value

is u, w h e n and only w h e n the observed point 'J is

in A(u). When

an

estimator

eA is

point *l(u).€ M , because

consistent,

A(u)

passes

tends to *J(u) as

through

the

Let us introduce

a coordinate system v = (v K ) , w h i c h is (m - n)-dimensional, in each A(u) such that (u, v) is a coordinate system of S.

Then, any point

I C S is uniquely specified by (u, v) as T where

= u

T ( u , v), shows

relative

that

position

the in

point "I belongs

A(u) .

The

origin

to A(u) v

=

and 0

v

is

shows

put

at

its the

intersection of A(u) and M, so that v = 0 if and only if the ""J is in M. The sufficient statistics x or equivalently the observed point ^ is decomposed into two statistics (u, Q) by solving = T?(fi, 0). As

can be

most

of

easily Fisher

seen, the statistic u is an estimator information

(asymptotically ancillary)

in

x

including

and

v

is

including

rather

ancillary

little information

concerning

199

the

true

u.

We

distribution of geometric their

can obtain (u, 0) up

quantities

angle

of

the

to

related

Edgeworth

the to

expansion

third order the

intersection.

of the

the

curvatures of M and A(u)

and

elucidates

by

how

geometric

quantities are related to the performances of estimators. of

conditional

inference

and

joint

using

This

terms

ancillarity

can

also

be

Problems understood

from this geometric point of view. Testing analyzed

in

hypothesis a

similar

H^ way.

: u £

D

Since

against

H^

: u ^ D give

observations

an

can

be

observed

point 7 i n S, the critical region R of a test is set in S such that A

the hypothesis H ^ is rejected if, and only if,

"7 6 R.

Now let us

compose an ancillary family {A(u)} such that the critical region R is composed of some of these A(u)'s,

R = UA(U). uftR

M Then, the decomposed {A(u)}

has

function

the

of

u

statistics

(u,

following

meaning

only,

X(u).

with : The The

respect

test

statistic

information, but it can be used as a conditioning characteristics

of

various

tests

can

be

to

this A

statistic

analyzed

0

has

is

a

little

statistic. by

=

using

The the

geometric shape of the related A(u) through the Edgeworth expansion of

the

distribution

estimators

are

of

closely

(u, ). related

The to

characteristics

those

they can be analyzed in a similar manner.

Estimation

of

interval

of associated tests,

and

200 Asymptotic estimator.

theory Then,

of

estimation.

its

mean

Let

square

NE[(u - u)'(u - u)] = A l + N

_1

u

be

error

is

a

consistent

expanded

as

2

A 2 + 0(N~ ).

Then, the first-order error matrix A^ is given by ax = ( g - g

A

r\

where g is the Fisher information matrix of M, g^ represents the square of the cosine of the angle between A(u) and M, and the angle is defined with respect

to

the Fisher information matrix of S.

Therefore, as is well known, an estimator is efficient when the estimating manifolds A(u) are orthogonal to M, and A 1 = g ^ holds. * Let u be the one-step bias corrected version of an efficient estimator u.

Then, its A^ term is decomposed into the sum of three

non-negative terms as A 2 ^ Here, !"*„

is

+

coordinate

system

+ 2square of

the

u,

and

the e 2

mixrture

^^^

the

connection

square

of

of

the

the ( o( —

l)-curvature of M, both of which do not depend on the estimator. The third term (H™)2 is the square of the (0( = -1)-curvature the estimating because

manifold

A(u),

vanishes

for

the

m.l.e.,

Hence, the m.l.e. is second-order efficient.

Asymptotic theory of tests. a

it

the estimating manifolds of the m.l.e. are mixture-( o( =

-l)-flat.

of

and

of

test,

we

show

a

Before studying the power function

geometric

result

obtained

from

the

Neyman-Pearson fundamental lemma: The critical region R of the most powerful test of H^ : u = u^ against Hj : u = u^ is bounded by an m-flat

hypersurface

which

connecting u^ and u^.

is

orthogonal

to

the

e-flat

curve

The e-flat curve forms an exponential family

connecting p(x, u^) and p(x, u^), and the critical region R remains the same for any alternative H^ : u = u^1 if u^' is on the curve. This shows the reason why a uniformly most powerful test exists for an exponential family.

However, when M is curved, there are no

uniformly most powerful tests. We consider the simplest case of testing H^: u = UQ against H^ : u / UQ in a scalar parameter case where M = {p(x, u)} forms a curve in S.

The power function P_(t) of a test T is defined by the

201

probability of rejecting HQ w h e n the true distribution is at u = u Q + t/jNg, w h e r e g is the Fisher information of M at u^.

It

is well

uniformly powerful

most at

known

that

powerful

any

t.

there

tests

They

exist

in

are,

the

for

It is expanded as

0(N"3/2)

PT(t) = Pxl(t) + PT2(t)/jN + P t 3 / N +

a number sense

that

example,

test, Wald test, efficient score test, etc.

of

the

first

PT^(t)

is

likelihood

of

equivalently

at

A(u))

is

(asymptotically)

orthogonal

second-order efficient in the sense that

no third-order

to

It is also known that a test is

any t, whenever it is first-order efficient

most ratio

Geometrically, a test

is first-order efficient if and only if the boundary 9 R

intersecting point.

order

M

R

(or

their

automatically

most powerful at

efficient.

However,

there

exist

(uniformly most powerful) tests, implying

that a test can b e good at a specific t^ but not so good at

other

t's.

above

Then, what are the third order characteristics

mentioned The

widely

used

characteristics

tests?

depend

Geometry

on

the

can

cosine

answer

Let

us

define

the

this

problem.

of the asymptotic

between 3 R and M , w h i c h plays a role of canceling (non-exponentiality) of M .

of the

the

angle

e-curvature

We show the results. deficiency

or

third

order

power

loss

function A P T ( t ) of an efficient test T by APT(t) - l i m ^ N ^ t )

- PT3(t)},

where P^(t) is the third order term of the test T (t) w h i c h is most powerful

at

t

(but

not

at

t1).

other

Then

AP^,(t)

is

obtained

explicitly as 6 P T ( t ) = a(t, at) ic - b(t, o i ) } 2 r 2 , where

a(t, ot ) 2

level U , curvature)

Y

and is

and

c

b(t, ot ) are known

the is

square a

of

factor

the of

functions

depending

e-curvature compensating

of the

M

on

the

(Efron's

e-curvature

through the asymptotic angle between M and A(u) or ^ R . The values of c are calculated for various tests.

The results

are as follows: c = 0 for the W a l d test, c = 1/2 for the likelihood ratio test, c = 1 for the locally most powerful W e show the universal deficiency

test,etc.

curve for various

tests.

It

should b e noted that the results hold after b o t h the level and bias

202 of the corresponding test statistics are adjusted up to 0(N w e do in the given

from

Bartlett the

adjustment.

Edgeworth

The

expansion

adjustment

of

related

(û,

where us

define

estimator

is E an by

203

- u)2],

AV[S] = l i m j ^ N E U u where

g =

§2»

incidental

or

a n

•••)

nuisance

infinite sequence of values of

parameter.

An

estimator

u

is

said

the

to

be

optimal, w h e n its asymptotic variance AV[£] is not larger than that of

any

other

estimators

for

any

sequence § .

This

definition

of

optimality is very strong so that there might not exist the optimal estimator.

The

geometrical

obtaining a necessary existence

of

the

method

can

solve

the

problem

of

and sufficient condition that guarantees the

optimal

estimator

and

obtaining

the

estimating

function y(x, u) of the optimal estimator w h e n it exists. To this end, w e define a vector space R(u, § ) at

each

point

(u, 1 ) of M by the set of random variables R(u, S ) = ir(x)

| E[r] = 0, E [ r 2 ]

;a)

1(w;w,a)

indicates the operation of substituting

w

by

a). The discussion in the present paper evolves around the following formula for the conditional model function for a) given

(3)

(Barndorf f-Nielsen (1980,1983)) ^ i = p*(w;u)|a) ^ i p(o);o)|a)

(2) where

a

p*

is defined by * (o);w i| >a) = / »I^I^ 1 W - 1 M p* c(oi,a)|]Pe

205

206

Let j

t n

c = c(to,a) = log{ (2tt) ' c(w,a)}.

(4) This quantity

c

is often close to 0.

The interest of (2) is tied to the conditonality viewpoint, that is to cases where

a

is not only auxiliary,

in the above sense, but is also distribution constant, either exactly or approximately. We refer to a statistic having both these properties as an ancillary

(statistic).

In the case of ordinary repeated sampling with sample size

n,

if the approximation (2) is correct to order

0(n

(typically,

v = 2,3

or

then to that order

many developments traditionally requiring calculation of moments or cumulants can instead be carried out in terms of the mixed log model derivatives (1). For instance, if -3/2 (2) holds to 0(n ), confidence limits for a one-3/2

dimensional interest parameter, valid to ditionally on

a

0(n

)

con-

as well as unconditionally, can be thus

determined. Bartlett adjustment affords another instance, to be discussed later. (Barndorff-Nielsen (1986a,b)). Yet another exemplification is provided by formula (5) below. By Taylor expansion of the right hand side of (3) in around

a) one may derive an asymptotic expansion of

p*(ui;o)|a)

of the form

(5) where

p* (io;u) | a) = ip, (w-u;^) {1+R 1 +R 2 +. . . } ip^(•; A)

denotes the d-dimensional normal probabi-

lity density function with mean vector matrix

w

A,

where

i

0

and precision

is the observed information matrix

with elements (6) and where

R'v

is generally of order

0(n

under re-

207

peated sampling. In particular (w-oo)fS

R1 =

-1 - 1 / r s t + h r S t (w- U ;2i) /Jst} .

Here, and elsewhere, we employ Einstein's summation conrst vention. Furthermore, h ( • i s the tensorial Hermite polynomial of degree 3, corresponding to the precision -1 and

/

and

$,

31 /

are affine connections, in the sense

of differential geometry, on the parameter space of the a model M. For any real a the observed a-connection ? is defined by

(Barndorff-Nielsen

(7)

(1986a))

? = 1±2L v + 1z°l Y 'rst 2 "rs; t 2 ^tjrs

These connections are

'observed analogues' of the Chentsov-

a

Amari connections The expansion

T.

(Chentsov

(1972), Amari

(1985).)

(5) has some similarity to but is distinct

from the Edgeworth expansion for the distribution of Thus

oj .

(5) employs mixed log model derivatives instead of

(approximate) cumulants of

w.

(5) is valid as an asymptotic expansion

Note that

spective of the accuracy with which mates

p(w;w|a)

p*(w;w|a)

and irrespective of whether

irre-

approxia

is

a

there is

(approximately) distribution constant or not. For fixed value of the auxiliary statistic in general

(locally, at least) a smooth one-to-one

spondence between 14=

(1^ (w) , . . . ,

to

corre-

and the score vector

(u) ) .

Hence, by the usual formula for

transformation of probability densities,

(2) can be trans-

formed to a formula for the conditional distribution of 1 + • Equivalently, one may transform to fi ^

is the square root of the matrix

for the matrix with elements

1

r;s

= j! •

^

Writing

one finds A

(8)

P (If ~

2

| a) = c(o,,a){|j| | ^ | / | l . | } ^ e f

1(a))

-

1(a))

where 1# F

208

where on the right hand side oi has to be expressed in terms of 1 + (and a). The relation (2) is, in fact, exact for most transformation models as well as for a variety of other models (Barndorff-Nielsen (1 983), Blassild and Jensen (1984), Barndorff-Nielsen and Blaesild (1986a)). Outside these cases the best that can generally be achieved is an -3/2 asymptotic (relative) error of order 0(n ) . In particular, if M is an exponential model of -3/2 order k and if d = k then (2) is valid to order 0(n ' ). -3/2

Now suppose that (2) holds with error 0(n ' ) and let MQ be a submodel of M, having parametric dimension dp < d. Using the asymptotic normality of the score vector for M, under the hypothesis MQ, one may, by standardizing the part of the score vector orthogonal to MQ so as to have variance matrix equal to the unit matrix asymptotically, construct a supplementary auxiliary statistic of dimension d-dn so as to make (2) valid to -1

order 0(n ) under MQ- In fact, there is a considerable variety of approximately distribution constant statistics of dimension d-dn which could serve in the capacity of supplementary auxiliary and yielding accuracy 0(n- 1 ). However, demanding accuracy 0(n-3/2 ' ) of (2) narrows down the choice significantly. More specifically and supposing, for simplicity, that d n = d-1, it can be shown that accuracy 0(n-3/2 ' ) of (2) as applied to M Q may be achieved by taking as supplementary

auxiliary

+

r

= r - bias correction

where r = ±/2{1(£)-10(u0) }

(9)

is the signed log likelihood ratio statistic for testing Mn

against

M;

and that this choice of a

supplementary

209

auxiliary is unique to the asymptotic order concerned. (Barndorff-Nielsen The statistic

(1984,1986b).)

r^

is asymptotically

N(0,1)

distribu-

- 1

ted to order

0(n

).

By introducing a variance adjustment

it is possible to establish a statistic r* = r + /s.d. adjustment

(10)

which is asymptotically standard normal to order This may be used for a refined test of

Mq

0(n

versus

-3/2

M,

).

as

well as for the role of supplementary auxiliary. Moreover, -3/2 with error 0(n ' ), r* = r-r~1 log K ,

(11)

where K is a certain explicitly given function of the observations, and the right hand side of (11) is often simpler to calculate than (10). In case M is a (k,k) exponential model while Mq is a (k,k-1) exponential model we have - 1

_36 T 8«

K = |r 89 -¿PT

(u

0)

9 (w) -9 (o)0)

For details and examples, see Barndorff-Nielsen (1986b). Let b = d~1 E^w

(12) where

w

is the log likelihood ratio statistic for

210

testing a particular value of (13)

10 under

M,

i.e.

w = 2{1 (w) -1 (to) } .

The quantity

b

and suitable approximations thereof are

termed Bartlett adjustments for the log likelihood ratio statistic. The Bartlett adjusted version

w' = w/b of 2 is, in wide generality, asymptotically x -distributed

w on

d

degrees of freedom, the degree of approximation to 2 -3/2 the limiting x distribution being 0(n ' ) , or even - 2

0(n

-1

),

as opposed to

0(n

The norming quantity

c

)

for

w

itself.

is related to

b

by the

approximate relation bie-(2/d)5.

(14)

(Barndorff-Nielsen and Cox (1984).) Decomposition of the norming quantity

c

(or, equiva-

lently, of Bartlett adjustments) into invariant terms can be achieved by the use of strings, a differential geometric concept generalizing those of tensors, connections, and derivatives of scalars (functions).

(Barndorff-Nielsen

(1 986c), Barndorf f-Nielsen and Blaesild (1986b,c); see also McCullagh and Cox (1986) which provides and discusses the first example of such a decomposition.) (p,q)

A

a sequence M

string of length M

(m,n)

of multiarrays

M

(m
m

into

y

_

3~ma)t

c 11 SiJ;

The string

M

blocks and where

. cm . . . dip

is said to be a costring if

a contrastring if

m = 0.

n = 0

and

These types of strings can be

represented in terms of tensors and special, simple kinds of strings. In particular, any costring can be represented as the intertwining of a connection string, i.e. a (1,0) costring, and a sequence of tensors. Mixed log model derivatives provide examples of strings, and so do moments and cumulants of log likelihood derivatives. Tensors determined by associated intertwining operations may be used to obtain invariant decompositions. For example, to order 0(n-3/2 ) we have •¿r . + 12|< ) i r S ai t U t 24 (v(3K ' rstu 'rt;su" + (3J'drst' ^ uvw+2tf'rtv' . yi suw ) i"X S i°t U fü ™ \J , where the right hand side is a sum of four invariant (separately interpretable) terms, due to the fact that the quantities H are tensors. These tensors were obtained by means of intertwining, applied to the first few of the mixed log model derivatives. For further examples and an extensive study of the mathematical properties of strings, see Barndorff-Nielsen and Blassild (1986b,c).

212

REFERENCES Amari, S.-I. (1985). Differential-Geometric Methods in Statistics. Lecture Notes in Statistics 28. SpringerVerlag, Heidelberg. Barndorff-Nielsen, O.E. (1980). Conditionality resolutions. Biometrika 67, 293-310. Barndorff-Nielsen, O.E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343-365. Barndorff-Nielsen, O.E. (1984). On conditionality resolution and the likelihood ratio for curved exponential models. Scand. J. Statist. 11, 157-170. Corrigendum: 1985, Scand. J. Statist. 12, 191. Barndorff-Nielsen, O.E. (1986a). Likelihood and observed geometries. To appear in Ann. Statist. Barndorff-Nielsen, O.E. (1986b). Inference on full or partial parameters, based on the standardized signed log likelihood ratio. Biometrika 73, 307-322. Barndorff-Nielsen, O.E. (1986c). Strings, tensorial combinants, and Bartlett adjustments. Proc. Roy. Soc. London A 40j>, 1 27-1 37. Barndorff-Nielsen, O.E. and Blaesild, P. (1986a). Combination of reproductive models. Research Report 107, Dept. Theor. Statist., Aarhus University. To appear in Ann. Statist. Barndorff-Nielsen, O.E. and Blaesild, P. (1 986b). Strings: Mathematical theory and statistical examples. Research Report 146, Dept. Theor. Statist., Aarhus University. To appear in Proc. Roy. Soc. London. Barndorff-Nielsen, O.E. and Blaesild, P. (1986c). Strings: contravariant aspect. Research Report 152, Dept. Theor. Statist., Aarhus University. Barndorff-Nielsen, O.E. and Cox, D.R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. J.R. Statist. Soc. B 46, 483-495. Blaesild, P. and Jensen, J.L. (1985). Saddlepoint formulas for reproductive exponential models. Scand. J. Statist. 12, 193-202.

213

Chentsov, N.N. (1972). Statistical Decision Rules and Optimal Inference. (In Russian). Nauka, Moscow. English translation 1982. Translation of Mathematical Monographs Vol. 53. American Mathematical Society, Providence, Rhode Island. McCullagh, P. and Cox, D.R. (1986). Invariants and likelihood ratio statistics. To appear in Ann. Statist.

ON ASYMPTOTICALLY COMPLETE CLASSES OF TESTS Bernstein A.V. Moscow, USSR Let X'"')=(X1)...,XH,) bution P^ Let P^

be the sample from the distri^ eRk,

depending on the unknown parameter

have the density p(:x,{!) w.r.t. some

6"-finite

measure. Based on this sample the null hypothesis {• = 0

is tested against the sequence of the conti-

guous alternatives where

A

H0".

H iri ;

6

,

is the given subset of

R

k

A , and }

The decision rule in this asymptotic (as.) testing problem (t.p.) is determined by the test-sequence (t.s.) {(f^)

^(X'"" 1 )

where for each n. the test

on the sample X

(rL)

of size

n,

is based

. This test is characteri-

zed by their power function

where

to. -fold product measure of identical

denotes

components P^

=

. Denote by

the level of the test 0

(5^(0.).

Remark 2. The coefficient C(B) in fined as the

^

m | CJ €

(8) for &g ^

(4) the if IB)

in

3cC+l$-l(ç

./QY,oÇ+S

(11) reduces to

~(a))

and for the l.f. (5)

« J

M

-

d

W

^ d&j

+ lfadlM/

à

+

Therefore in this casegthe m.l.e. © g U) (4)

for

oTfôMl

y=i «

*

£1

where for the l.f.

«

3ot+£$-%•

a expl-^fauMaj(12) u V-r 0

S2 (D=UD (5)

i

d(r)'1

ilx(2)=

e x p { -

¡m(ti)du}.n3i

%

0

4. Some admissibility results. Definition.

^

An estimator

is said to

be second order admissible in there do not exist estimators

P'*

P3

P'*

-

coincides with

(9j)

X

and for the l.f.

*±lm(9j)) a

fa

a

(s.o.a.) if Gg

£ (§^(0)

with

P.

The next assertion is an immediate consequence from the theorems 1,2. Assertion 1.

The estimator

^fi^tj is s.o.a. iff there

does not exist nontrivial positive solution to the inequ• lity

S

,

,

LU>P = .Z

6 « 0.

Proof. Note that for any positive u)

u)-|

^ Lu) f LOlO-j

u), CO j € C with

^

=

(©) U)^ To" '

230 and that

L ^

\f ±

COnSt

implies L , T

Í 1/

with strict inequality at least in a point. Then apply theorem 2 and the remark 1. L e t W / ® ) be the class of piecewise continuously rentiable functions ©

^

diffe-

vanishing outside compacts in

, with square integrable first order derivatives. Theorem 3.

The estimator 0g 3 co

f €V(0), y

:

'-s

iff

1 } = 0 for any compact K c ©

where

J ú/(G>)Z

CL(0j)(M)*'d9 4

©

.

(14)

J

The proof is a straight generalization of that of Theorem 4.1 in Levit

)1985).

We present next a useful necessary condition for s.o.a. in the special case Theorem 4. of

Let for a given relatively open subset Q.

= { 6 | I6|= 1 }

be contained in

©

c

the

coneC={8|e=7^í>qi>eQ}

. Suppose that both the functions0.(2)

and n

(x) are regularly varying at the points U , 0 0

and

that the integral J C l ^ ^ ( ^M ? is ) ^s.o.a. c i l Then diverges at the points 0 and 0 0 . The proof is a straight generalization of that of Theorem 4.5 in Levit

(1985). One has only

any relatively open Gl^-

to note that for

not having any points in

common with the coordinate axis the expression 0.(2 l

>

uniformly in

1 > 0}

)

i

s

bounded from zero

Q. .

5. Some applications. It^is clear from the s.o.a. of the m.l.e. or of the functions of ©

(12), (13) that

depends only on the behavi-

LÍ1)} tn("l) andcL(Z) near the endpoints

(but not on the particular family

1)

at hand).

However for the illustrative purposes we consider here two classical families, the first being the normal density

231

the o t h e r b e e i n g the Poisson

distribution

L e t us c o n f i n e t o one of the l o s s f u n c t i o n s £ a 06/5

W

^

^ Z

=

R i 0c:

9:

jh

oi. ©: I ,

j

y

J

( o t > 0 +

'

(16)

J

We s t a t e f i r s t some a d m i s s i b i l i t y r e s u l t s postponing l a t e r a remark on t h e i r s t a t i s t i c a l A s s e r t i o n 2. The m . l . e . a) f o r b) f c)

o

iff r

i

f

is

^=3, S

forf^Wi

meaning.

s.o.a.

(

f

a

n

d

then f o r any oC > 0 );

or i c = 3 ^

iff

£ = -



P r o o f , a) From ( 6 ) ( 1 2 )

o r S ^ ^ ^ ^ O .

one e a s i l y

obtains

a(z) = const• 2\ q H*) = 1 \

= (ot+s){3-k)/(z (PWJ.

According t o Theorem 4 the necessary c o n d i t i o n is

c

o

Substituting

to

Kf) =

f

£

I

a

s

t

in

(14)

.

(

1

7

)

reduces

~

w

iff

for

= 0 , in which case act) =

Now i t

-

oC=Jt;

d) f o r f ^ VC^j, i f f

s.o.a.

till

,

(^T-J

.

dz

^

-

,

VCW(R$)

ljr>0j

k * 0 )

etc.

J

Remark 6. There is an abundant statistical literature on admissibility for the families ^ £ as well as for other distributions within the exponential family, main-

233

ly restricted to the case cC —Si in (15), (16); see the list of references below and further references therein. Analyzing s.o.a. of the m.l.e. for different oL exhibits the curiously sensitive way in which its admissibility depends on the particular choice of parametrization / ¿ j / y as well - as i$ the case with the families as on the loss function at hand, as in the case ^ ^ . The situation thus presented seems to demonstrate that the whole affair of relying on the admissibility of estimators has in a way to be reconsidered. But looking at it yet in another way forces one to admit the fruitfullness of relating the admissibility properties to the existence of corresponding non-trivial positive superharmonic functions and through this to some more involved mathematical fields of a running interest. 6. A sufficient condition for s.o.a. Turning back to Assertion 1, let iOfPj^Offy;..• n(B$lM6)=(itaga(&)) with both • Q 0 O and Q,(t) positive even functions, a i O e C ^ h & X m ^ C ^ Z * ) • Denote h j l ) = C K t ) Q ( ? ) ( J ' S l H u ) ^ ) ~ &nd provided the last C L ( t ) n H l ) { j ^ ( u H u . ) integral converges, A

L

0

S

0

J

*Z- /

Lemma 1. Let J (•¿^¿¿

-

C(P)=

Q

rG -v

o c L t l

k ~ H u )

as ij, —^

= P-d",r"

i m p l i e s tf =

d".

240

Let Q := {Qj,r '• & € ©, T € § } , where Q denotes the family of prior distributions T|C. It is easy to see that (see e.g. Pfanzagl and Wefelmeyer (1982), p. 227 and p. 228) Q) contains all functions (;x,ri) - > c r ( z , t ? , r ? ) + fc(»?) with c € R and

k e T ( T , g ) ,

and that k * ( ( x , r¡),Q „ ,

where

f { x , d , r ¡ )

:= ^

r

l o g p ( x ,

Finally, the level space is

)

=

1?,

f ( x , 0, n ) / Q o , r i ^ M , - )

2

) .

r¡).

—>

{ ( x , r ¡ )

k ( r ¡ ) :

G T(r,

k

Q ) } .

According to the general prescription outlined in Part 1, we have to determine conditional expectations, given U. Thanks to the special nature of the function U as the projection into the first component, this becomes particularly simple: For any function / : X x H —> R which is integrable with respect to Qi9,r, we have

{7)

{ Q

*'

r f ) [ x )

~

J

p

M

v

W

r

,

)

To obtain the canonical gradient, k * ( - , P * U expectations of the elements of the level space,

x e X

'

) ,

-

we determine the conditional

and the conditional expectation of £'(-, ti, •), which is (9)

/ e ' { x ^ , r

J

) p { x , ' d , T , ) r { d

S p ( x , d ,

V

) T ( d

V

n

)

)

Let (/(•, 1?, r ) denote the orthogonal component of this function with respect to K { d , T). With this notation, the canonical gradient can be expressed as (10)

r) = d(x)t?,r)/P„,r(d(-,t?,r)2).

From this we obtain the asymptotic variance bound (11)

l/P^v{d{-^Tf).

The application of the general result of P a r t 1 to the problem of unknown random nuisance parameters has led us to a result which was obtained earlier (see Pfanzagl and Wefelmeyer (1982), Section 14.3) in a direct way.

241

Now we apply this result to a more special model, namely (12)

p(x, i>, r?) = 9(1, t?)p 0 (S(x, tf), 1?, V)-

The representation (12) is not unique, and it is convenient in applications to have a certain freedom in choosing q and po- For some purposes, it is, however, advantageous to use a certain canonical form of (12), in which po(-,tf,»?) is a density of (with respect to an appropriate a-finite measure v * S(-, not depending on 1?). Whenever a representation (12) exists, there also exists a canonical representation of this type. (The argument brought forward in connection with (14.3.11) in Pfanzagl and Wefelmeyer (1982), p. 232, remains true if the sufficient statistic S depends on 1?.) The level space if (1?, T) (see (8)) now consists of all functions Jfc(r 7 )p 0 (S(x,i?),t?,> 7 )r(d7 ? ) fp0(s(x,0),#,v)r(dv)

'

All these functions are contractions of S(-,i?) (i.e. they depend on x through S(x, i?) only). Determining orthogonal components with respect to i f ( i ? , r ) becomes particularly simple if K(1?, T) is the class of all functions in A, -Ptf.r) with expectation zero which are contractions of 5(-,i?). This is the case if (see Pfanzagl and Wefelmeyer (1985), p. 95, Proposition 3.2.5) (») C2(H,C,

the family Q of prior distributions is full (i.e.

if T(T, §) = {k G

T) : T(fc) = 0 } ) , a n d

(it) the family {P#,v * S(-,i9), tj G H} is complete for every t? G 9 . Since many interesting applications are of this type, we consider it now in more detail. Up to now we have not yet introduced explicitly the image space of 5(-, i>). Assume this is a measurable space (Y,D). Then the class of all functions in ¿•2{X,A,Pt l T) which are contractions of S(-,i9) is {h o

: h G L2(Y, D,P»,t

* SM))}.

Hence our assumption about K{d, T) may be written as K{8, T) = {ho S{; 0): he

£2(Y,

D, Pr(a'(.,tf)))2.

The

243

Since f^ j r(a*(-,i9)) = —J > 1 > i r(a(-,$)d(-,i?,r)), this asymptotic variance is, in general, larger than the asymptotic variance bound (11). The asymptotic variance coincides with the asymptotic variance bound if S*(•, i?) belongs to K(1?, T), for in this case d(x,i?,T) = a(x,i?, T). S"M) € is trivially fulfilled if S(-,d) is in its canonical form, we have P

s