Statistical Sciences and Data Analysis: Proceedings of the Third Pacific Area Statistical Conference [Reprint 2020 ed.] 9783112318867, 9783112307595



English Pages 578 [580] Year 1993


Table of contents :
Contents
Preface
Kullback-Leibler Information for Ordering Genes Using Sperm Typing and Radiation Hybrid Mapping
Statistical Models For Forecasting Tornado Intensity
Incorporating Geographic Distribution into the Expected Number of Deaths in a Comparative Study
Fitting Random Coefficient Regression Models
Design and Analysis in Repeated Measurement Experiment: on Use of Preliminary Knowledge about Covariance Structure of Observations
Least Squares Estimators of Regression Coefficients by using Misspecified Covariance Structure for Error Process
Estimation of Regression Parameters When the Errors are Autocorrelated
Centering and Scaling in Ridge Regression
Reparametrization Methods in Linear Minimax Estimation
Robust Tests for Linear Models
CIRCULAR REGRESSION
Properties of Least Squares Methods for Choosing the Parameter of the Simple Exponential Smoothing Predictor
Characterization of MTV model and its diagnostic checking
Limit theorems for statistical inference on stationary processes with strong dependence
Two Sample Problem in Time Series Analysis
Malliavin Calculus and Higher Order Statistical Inference
Estimation of Levels of Intensity in a Simple Self-Correcting Point Process
Additional Information and Precision of Estimators in Multivariate Structural Models
Sensitivity Analysis in Covariance Structure Analysis: A Numerical Investigation in the Case of Confirmatory Factor Analysis
Likelihood Ratio Tests for Means and Covariances with Incomplete Multinormal Observations
Maximally orthogonally invariant higher order moments and their application to testing elliptically-contouredness
Asymptotic Theory for the Concentrated Matrix Langevin Distributions on the Grassmann Manifold
On Canonical Correlations and the Degrees of Non-Orthogonality in the Three-way Layout
Partial Canonical Correlations Associated with the Inverse and Some Generalized Inverses of a Partitioned Dispersion Matrix
Approximation to the Upper Percentiles of T^2_max-type Statistics
Curvature Measures in Data Analysis
Statistical Inference from Observations with Censoring and Grouping for Exponential Families
Pitman Closeness of the Equivariant Shrinkage Estimators of the Normal Variance
Normal Approximation to the Distribution of the Sample Mean in the Exponential Family
A New Approach to Asymptotic Distributions of Maximum Likelihood Ratio Statistics
Test of homogeneity of parameters
Determining the No-Observed-Adverse-Effect Level in Continuous Response
Asymptotics on the Statistics for a Family of Non-Regular Distributions
Error Bounds for Asymptotic Expansions of Some Distributions in a SUR Model
Second Order Asymptotic Bound for the Variance of Estimators for the Double Exponential Distribution
Asymptotic Expansions for E0 {min ( t , m )} and E0{xmin(t,m)}
Aspects of Goodness-of-Fit
On the Central Limit Theorem in Hilbert Space with Application to U-Statistics
Asymptotics of the Perturbed Sample Quantile for a Sequence of m-dependent Stationary Random Process
The L1 Complete Convergence of Recursive Kernel Density Estimators Under Weak Dependence
MULTIVARIATE L1-NORM ESTIMATION AND THE VULNERABLE BOOTSTRAP
Convergent Rates of M-Estimators for a Partly Linear Model
Discrete Distributions Related to Succession Events in a Two-State Markov Chain
How Large the Class of Waiting Distribution Can Be?
The use of the inverse Gaussian model for analysing the lognormal data
AN ANALYSIS OF DEGRADATION DATA OF A CARBON FILM AND THE PROPERTIES OF THE ESTIMATORS
Bayesian Analysis for Exponential Zero-failure Data
The Estimation of Distribution under a Particular Random Censoring
New Main Effect Plus One Plans For 2^7 Factorial Experiments And Their Robustness Property Against Deletion Of Runs
PROFILES OF 2^m FACTORIAL DESIGNS
Characterization and Optimality of Block Designs with Estimation of Parameters Under Mixed Models

Statistical Sciences and Data Analysis

Editorial Board K. Matusita, Editor-in-Chief (Teikyo University) M.L. Puri (Indiana University) T. Hayakawa (Hitotsubashi University) K. Hirano (The Institute of Statistical Mathematics) S. Konishi (The Institute of Statistical Mathematics) N. Kashiwagi, Managing Editor (The Institute of Statistical Mathematics)

Advisory Board S. Nisihara (Sophia University) M. Siotani (Meisei University) N. Inagaki (Osaka University)

STATISTICAL SCIENCES AND DATA ANALYSIS: PROCEEDINGS OF THE THIRD PACIFIC AREA STATISTICAL CONFERENCE

EDITORS KAMEO MATUSITA MADAN L. PURI TAKESI HAYAKAWA

VSP, Utrecht, The Netherlands, 1993

VSP BV, P.O. Box 346, 3700 AH Zeist, The Netherlands

© VSP BV 1993. First published in 1993. ISBN 90-6764-150-2

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

CIP-DATA KONINKLIJKE BIBLIOTHEEK, DEN HAAG
Statistical sciences and data analysis: proceedings of the third Pacific Area Statistical Conference / ed. by K. Matusita... [et al.]. - Utrecht [etc.]: VSP. Conference held in Makuhari on 11-13 December 1991. ISBN 90-6764-150-2 bound. NUGI 815. Subject headings: statistical sciences / data analysis

Printed in The Netherlands by Koninklijke Wöhrmann BV, Zutphen.

Contents

Kullback-Leibler information for ordering genes using sperm typing and radiation hybrid mapping
  H. Chernoff
Statistical models for forecasting tornado intensity
  J. F. Monahan, K. J. Schrab and C. E. Anderson
Incorporating geographic distribution into the expected number of deaths in a comparative study
  T. Yanagawa and D. G. Hoel
Fitting random coefficient regression models
  R. Beran
Design and analysis in repeated measurement experiment: on use of preliminary knowledge about covariance structure of observations
  Y. Uragari and M. Goto
Least squares estimators of regression coefficients by using misspecified covariance structure for error process
  Y. Usami and M. Huzii
Estimation of regression parameters when the errors are autocorrelated
  E. J. Chen and A. K. Md. E. Saleh
Centering and scaling in ridge regression
  M. Jimichi and N. Inagaki
Reparametrization methods in linear minimax estimation
  H. Drygas
Robust tests for linear models
  S. Morgenthaler
Circular regression
  Y. R. Sarma and S. R. Jammalamadaka
Properties of least squares methods for choosing the parameter of the simple exponential smoothing predictor
  C. H. Chen and M. Huzii
Characterization of MTV model and its diagnostic checking
  T. Kariya
Limit theorems for statistical inference on stationary processes with strong dependence
  Y. Hosoya
Two sample problem in time series analysis
  M. Kondo and M. Taniguchi
Malliavin calculus and higher order statistical inference
  N. Yoshida
Estimation of levels of intensity in a simple self-correcting point process
  T. Hayashi and N. Inagaki
Additional information and precision of estimators in multivariate structural models
  Y. Kano, P. M. Bentler and Ab Mooijaart
Sensitivity analysis in covariance structure analysis: a numerical investigation in the case of confirmatory factor analysis
  S. Watadani and Y. Tanaka
Likelihood ratio tests for means and covariances with incomplete multinormal observations
  S. B. Provost
Maximally orthogonally invariant higher order moments and their application to testing elliptically-contouredness
  A. Takemura
Asymptotic theory for the concentrated matrix Langevin distributions on the Grassmann manifold
  Y. Chikuse
On canonical correlations and the degrees of non-orthogonality in the three-way layout
  J. Berube, R. E. Hartwig and G. P. H. Styan
Partial canonical correlations associated with the inverse and some generalized inverses of a partitioned dispersion matrix
  H. Yanai and S. Puntanen
Approximation to the upper percentiles of T^2_max-type statistics
  T. Seo and M. Siotani
Curvature measures in data analysis
  R. E. Kass and E. H. Slate
Statistical inference from observations with censoring and grouping for exponential families
  S. Eguchi
Pitman closeness of the equivariant shrinkage estimators of the normal variance
  N. Sugiura
Normal approximation to the distribution of the sample mean in the exponential family
  R. Nishii and T. Yanagimoto
A new approach to asymptotic distributions of maximum likelihood ratio statistics
  J. Zhang and G. Y. Li
Test of homogeneity of parameters
  T. Hayakawa
Determining the no-observed-adverse-effect level in continuous response
  Y. Kikuchi, T. Yanagawa and H. Nishiyama
Asymptotics on the statistics for a family of non-regular distributions
  M. Akahira
Error bounds for asymptotic expansions of some distributions in a SUR model
  S. Mukaihata and Y. Fujikoshi
Second order asymptotic bound for the variance of estimators for the double exponential distribution
  M. Akahira and K. Takeuchi
Asymptotic expansions for E0{min(t, m)} and E0{xmin(t,m)}
  H. Takahashi
Aspects of goodness-of-fit
  M. A. Stephens
On the central limit theorem in Hilbert space with application to U-statistics
  M. L. Puri and V. V. Sazonov
Asymptotics of the perturbed sample quantile for a sequence of m-dependent stationary random process
  M. L. Puri and S. Sun
The L1 complete convergence of recursive kernel density estimators under weak dependence
  L. T. Tran
Multivariate L1-norm estimation and the vulnerable bootstrap
  P. K. Sen
Convergent rates of M-estimators for a partly linear model
  G. Y. Li and P. D. Shi
Discrete distributions related to succession events in a two-state Markov chain
  S. Aki and K. Hirano
How large the class of waiting distribution can be?
  P. D. Chen
The use of the inverse Gaussian model for analysing the lognormal data
  E. Yamamoto and T. Yanagimoto
An analysis of degradation data of a carbon film and the properties of the estimators
  K. Suzuki, K. Mala and S. Yokogawa
Bayesian analysis for exponential zero-failure data
  D. J. Tang and S. S. Mao
The estimation of distribution under a particular random censoring
  W. L. Lu
New main effect plus one plans for 2^7 factorial experiments and their robustness property against deletion of runs
  S. Ghosh
Profiles of 2^m factorial designs
  S. Yamamoto, Y. Fujii, Y. Hyodo and H. Yumiba
Characterization and optimality of block designs with estimation of parameters under mixed models
  S. Kageyama and R. Zmyslony

Preface

The Third Pacific Area Statistical Conference was held in Makuhari, on the outskirts of Tokyo, on 11-13 December 1991 under the auspices of the Pacific Statistical Institute and with the support and cooperation of the Foundation for Advancement of International Science, the Japan Statistical Society and the Institute of Statistical Mathematics. There were about 180 participants from the greater Pacific area as well as other continents. We are pleased to present herewith the Proceedings of this Conference.

The main theme of the Conference was "Statistical Sciences and Data Analysis." Its purpose was to bring together researchers in statistics and related fields from those areas to exchange results and problems in topics of mutual interest and concern. The papers comprising this volume were presented at the Conference. All papers were subsequently examined by referees before their inclusion here. These papers contain many recent developments in statistical sciences and data analysis and in application. Consequently, this book will be of interest both to statisticians and to researchers in other fields who apply statistical methods in their work.

The Conference has benefited greatly from the generous financial support of several industrial and commercial organizations, including those which are affiliated with the Japan Federation of Economic Organizations, the Japan World Exposition Commemorative Funds, the Kajima Foundation and Chiba Convention Bureau. To these organizations we express our deepest gratitude. In the preparation of this volume we had the help and cooperation of many people in the refereeing of papers. We thank them sincerely.

K. Matusita
October 1992

Stat. Sci. & Data Anal., pp. 1-11 K. Matusita et al. (Eds) © VSP 1993

Kullback-Leibler Information for Ordering Genes Using Sperm Typing and Radiation Hybrid Mapping

HERMAN CHERNOFF
Harvard University, Cambridge, MA 02138, USA and Mathematical Sciences Research Institute, Berkeley, CA 94720, USA

Abstract. Two technologies applicable to gene mapping are those of sperm typing and radiation hybrid mapping. They are used to determine the ordering of the genes. For each of these methods, the analysis grows in complexity as the number of genes being considered increases. At the same time the accuracy of the probabilistic models used in the analysis becomes more questionable. On the other hand the ability to determine the order of three genes may be enhanced by the inclusion, in the analysis, of the data on nearby genes. For both of these methods, Kullback-Leibler information numbers are derived to test hypotheses involving the order of m genes. These information numbers are computed for testing hypotheses concerning the ordering of three genes with and without considering the presence of data involving other nearby genes. The results suggest when it pays to incorporate the additional data and how much radiation to use in radiation hybrid mapping.

AMS 1980 subject classifications. Primary 62B10; secondary 92D20.

Keywords and phrases. Kullback-Leibler information, sperm typing, radiation hybrid mapping, gene ordering.

1. INTRODUCTION

Two technologies applicable to gene mapping are those of sperm typing and radiation hybrid mapping. Sperm typing makes use of the polymerase chain reaction, a biochemical technique which allows enormous amplification (production of multiple copies) of small, selected DNA fragments from a single chromosome. A sample of sperm from a single donor is analyzed to see which alleles (distinct forms of the various genes) are present in the individual sperms. The frequencies with which the various possibilities occur can be used to supply estimates of the ordering and of the recombination probabilities among the genes for which that donor is heterozygous (having different alleles of the same gene).
Radiation hybrid mapping employs a different technology where hybrid rodent cells containing a human chromosome are subjected to a dose of radiation, which leads to breaking the chromosome into segments, a fraction of which are retained in succeeding generations. The simultaneous presence or absence of various genes provides indirect information on how close together these genes are, and also on the ordering of these genes. The results suggest when it pays to incorporate the additional data and how much radiation to use in radiation hybrid mapping. For each of these methods, the analysis grows in complexity as the number of genes


being considered increases. At the same time the accuracy of the probabilistic models used in the analysis becomes more questionable. On the other hand the ability to determine the order of three genes may be enhanced by the inclusion, in the analysis, of the data on nearby genes. For both of these methods, we shall examine the relevant Kullback-Leibler information numbers for hypotheses concerning the ordering of three genes with and without considering the presence of data involving other nearby genes. The results suggest when it pays to incorporate the additional data, and how much radiation to use in radiation hybrid mapping.

In Section 2 we introduce the model for sperm typing and discuss the maximum likelihood estimates of the recombination probabilities. In Section 3 we derive expressions for the relevant Kullback-Leibler informations for sperm typing. In Section 4 we describe the model for radiation hybrids and derive the corresponding information numbers. The outcome of the calculations is described in Section 5.

We terminate this introduction with a brief discussion of the Kullback-Leibler (KL) information. Given two simple hypotheses concerning the (density) distribution $f(x)$ of the data $X$, $H_0 : f(x) = f_0(x)$ and $H_1 : f(x) = f_1(x)$, the KL information for discriminating between $H_0$ and $H_1$ is

$$K(f_0, f_1) = E_{f_0}\{\log[f_0(X)/f_1(X)]\}. \tag{1}$$

The subscript $f_0$ refers to the fact that the expectation is calculated for the case where the distribution of $X$ is governed by $f_0$. The information $K$ measures the exponential rate at which the posterior probability of $H_1$ approaches zero when $H_0$ is true, as independent observations on $X$ are obtained. It is particularly relevant in the design of sequential experiments, such as were discussed by Goradia and Lange [1].

Suppose now that under our model the density of $X$ can be described in terms of a parameter $\theta$, i.e. $f(x) = f(x, \theta)$, and the underlying probability distribution is governed by $\theta = \theta_0$, and we are interested in a composite alternative $H_1 : \theta \in \Omega_1$ to the true hypothesis $H_0 : \theta = \theta_0$. Then the appropriate measure is

$$K(H_0, H_1) = \inf_{\theta_1 \in \Omega_1} E_{\theta_0}\{\log[f(X, \theta_0)/f(X, \theta_1)]\} \tag{2}$$

which can be decomposed into the following difference if either term is finite:

$$K(H_0, H_1) = E_{\theta_0}\{\log f(X, \theta_0)\} - \sup_{\theta_1 \in \Omega_1} E_{\theta_0}\{\log f(X, \theta_1)\}.$$

We shall suppress the subscript $\theta_0$ when there is no danger of ambiguity.

2. THE SPERM TYPING MODEL AND MAXIMUM LIKELIHOOD

Consider first the case of three genes for which the donor is heterozygous, and his two chromosomes have genes ABC and abc respectively. A sperm will have a chromosome providing one of the 8 following observations, ABC, ABc, AbC, Abc, aBC, aBc, abC, abc, with probabilities depending on the recombination probabilities and the ordering of the three genes on the chromosome. Suppose that the genes appeared in the order ABC rather than ACB or BAC. Suppose also that the recombination probability (the probability that, in the reproduction process, the chromosomes would separate and recombine) between A and B is $\phi_{ab}$, and between B and C is $\phi_{bc}$. Finally suppose that the recombination events are independent. Then the probabilities associated with ABC, abc, and AbC would be $(1 - \phi_{ab})(1 - \phi_{bc})/2$, $(1 - \phi_{ab})(1 - \phi_{bc})/2$, and $\phi_{ab}\phi_{bc}/2$ respectively. The probabilities associated with the other 5 events can be calculated similarly.
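The probabilities of all 8 observations under the order ABC can be enumerated mechanically. The following is a minimal sketch, assuming parental chromosomes ABC and abc, independent recombination, and an equal chance of starting from either parental chromosome; the function name `sperm_type_probs` is hypothetical and is not taken from the paper.

```python
from itertools import product


def sperm_type_probs(phi_ab, phi_bc):
    """Probabilities of the 8 sperm types under gene order ABC.

    Assumes parental chromosomes ABC and abc and independent
    recombination in the intervals A-B and B-C.
    """
    probs = {}
    for a, b, c in product("Aa", "Bb", "Cc"):
        # A recombination occurred in an interval when the two flanking
        # alleles come from different parental chromosomes (case differs).
        rec_ab = a.isupper() != b.isupper()
        rec_bc = b.isupper() != c.isupper()
        p = 0.5  # either parental chromosome is the starting strand
        p *= phi_ab if rec_ab else 1.0 - phi_ab
        p *= phi_bc if rec_bc else 1.0 - phi_bc
        probs[a + b + c] = p
    return probs


if __name__ == "__main__":
    for sperm_type, p in sperm_type_probs(0.1, 0.2).items():
        print(sperm_type, round(p, 4))
```

Complementary types such as ABC and abc receive the same probability, which is why they can be pooled into a single category.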


While the estimation of $\phi_{ab}$ and $\phi_{bc}$ is of interest and relevant, our main focus in the next section will be on deciding which is the correct one of the three possible orderings ABC, ACB, BAC. Note that without reference to other parts of the chromosome the orderings ABC and CBA are equivalent and we need consider only three, or half of the six possible permutations of ABC. It is also evident that the relevant information in the observed categories ABC and abc is equivalent, and thus we may combine these two observations into one equivalent one, ABC, with probability $(1 - \phi_{ab})(1 - \phi_{bc})$ under the ordering ABC, probability $(1 - \phi_{ac})(1 - \phi_{bc})$ under the ordering ACB, and probability $(1 - \phi_{ab})(1 - \phi_{ac})$ under the ordering BAC. Thus we need only consider 4 possible observations, e.g. ABC, ABc, AbC, and Abc, each representing a pair of the original 8 categories.

In our analysis it would seem important to bear in mind that the statistician does not know which alleles appear on the original chromosomes. Thus, even with the order ABC, it might be that the original chromosomes of the donor have AbC and aBc. For our problem involving relatively small recombination probabilities, the data would quickly and easily determine the form of the chromosome, for an original chromosome with ABC would lead to a great preponderance of the ABC observations independent of the order. Nevertheless it turns out that symmetry aspects of the analysis make it unimportant to hypothesize or estimate which alleles appear on each chromosome.

Goradia and Lange [1] analyze two sequential methods of selecting the correct order. They do not analyze the sequential probability ratio method, since the two approaches that they use are much easier for them to analyze. One may wonder whether there is a substantial loss of efficiency in using their methods.
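To make the comparison of orderings concrete, the composite KL information of equation (2) can be evaluated for the four pooled observations. The sketch below is illustrative only and is not the paper's derivation: it assumes the true order is ABC, takes ACB as the alternative, and uses the observation that the expected log-likelihood under ACB factorizes over the two adjacent intervals, so the infimum is attained when the alternative's recombination probabilities equal the true frequencies with which A and C, and C and B, differ. All function names are hypothetical.

```python
import math


def pooled_probs_abc(phi_ab, phi_bc):
    """Pooled category probabilities when the true order is A-B-C."""
    return {
        "ABC": (1 - phi_ab) * (1 - phi_bc),
        "ABc": (1 - phi_ab) * phi_bc,
        "AbC": phi_ab * phi_bc,
        "Abc": phi_ab * (1 - phi_bc),
    }


def pooled_probs_acb(phi_ac, phi_cb):
    """Pooled category probabilities when the order is A-C-B."""
    return {
        "ABC": (1 - phi_ac) * (1 - phi_cb),
        "ABc": phi_ac * phi_cb,        # C differs from both A and B
        "AbC": (1 - phi_ac) * phi_cb,  # B differs from both A and C
        "Abc": phi_ac * (1 - phi_cb),  # A differs from both B and C
    }


def kl(p, q):
    """KL information E_p{log[p(X)/q(X)]} for discrete distributions."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


def kl_abc_vs_acb(phi_ab, phi_bc):
    """inf over ACB parameters of the KL information, true order ABC."""
    p = pooled_probs_abc(phi_ab, phi_bc)
    # Minimizing values: true probabilities that A,C differ and C,B differ.
    phi_ac = p["ABc"] + p["Abc"]
    phi_cb = p["ABc"] + p["AbC"]
    return kl(p, pooled_probs_acb(phi_ac, phi_cb))


if __name__ == "__main__":
    print(kl_abc_vs_acb(0.1, 0.2))
```

A larger value indicates that fewer observations are needed, on average, to rule out the wrong ordering. The inputs are assumed to lie strictly between 0 and 1.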
The related question that we address is whether there would be an increase in the efficiency of deciding the order of ABC if the analysis were extended to include 4 or 5 genes. Several complications arise in the use of KL numbers to address this question. One is that in ordering 4 (or 5) genes, there are 12 = 4!/2 (or 60 = 5!/2) possible orderings of concern. Another issue is that it is more difficult to find a donor who is heterozygous on four, rather than three, specified genes. Finally, technical problems in the technology may make the simple extension of the above probability model less reliable in the application to four or more genes. In any case, when the KL numbers indicate that there is little to be gained by introducing 4 or 5 genes, then it makes sense to confine attention to three at a time. In case there is a potential gain of a great amount of information, then one ought to consider the relative merit of doing the possibly more complicated analysis required to deal with more than 3 genes.

Assuming the order ABC, the likelihood, based on $n_{ABC}$, $n_{ABc}$, $n_{AbC}$, and $n_{Abc}$ observations of ABC, ABc, AbC, Abc respectively, is

$$L = [(1 - \phi_{ab})(1 - \phi_{bc})]^{n_{ABC}} [(1 - \phi_{ab})\phi_{bc}]^{n_{ABc}} [\phi_{ab}\phi_{bc}]^{n_{AbC}} [\phi_{ab}(1 - \phi_{bc})]^{n_{Abc}}$$

ID. • - xl» -' t ) ( 2 - r ) •J

r(i

_

rT)(i

_ ft)

vanishes only at r = 4>ij in the interval (0,1), and indeed, u>;j(r) attains its maximum value W(i,) =

r ( l - ra) log(l - ripa) + rrij l o g ( r r ^ ) + r( 1 - rij) log(l - rtf>ij)

(24)

at $r = \phi_{ij}$. Incidentally, this result could also be derived without calculating the derivative, by noting the relationship between $w_{ij}$ and a Kullback-Leibler number and that a KL number is always nonnegative, and hence an expression of the form $E_{\theta_0}[\log f(X, \theta)]$ attains its maximum value when $\theta = \theta_0$. We are now in position to calculate the KL information. Let Ho : 0 = 0o correspond to the permutation w° and = (^12, ^23, •• -,0m-l,m ), and Hi : 0 = 0i correspond to the permutation tt" and Then, with E$0 represented by E, we have K(H0,Hi)

= =

E\ogf(X,0o)-Elogf(X,O1) E log -JSlog

=

~V(r)

n m—1 + E t=l

which is minimized with respect to " by the ordering 7r, m —1 I