English Pages 578 [580] Year 1993
Statistical Sciences and Data Analysis
Editorial Board K. Matusita, Editor-in-Chief (Teikyo University) M.L. Puri (Indiana University) T. Hayakawa (Hitotsubashi University) K. Hirano (The Institute of Statistical Mathematics) S. Konishi (The Institute of Statistical Mathematics) N. Kashiwagi, Managing Editor (The Institute of Statistical Mathematics)
Advisory Board S. Nisihara (Sophia University) M. Siotani (Meisei University) N. Inagaki (Osaka University)
STATISTICAL SCIENCES AND DATA ANALYSIS: PROCEEDINGS OF THE THIRD PACIFIC AREA STATISTICAL CONFERENCE
EDITORS KAMEO MATUSITA MADAN L. PURI TAKESI HAYAKAWA
VSP, Utrecht, The Netherlands, 1993
VSP BV, P.O. Box 346, 3700 AH Zeist, The Netherlands
© VSP BV 1993. First published in 1993. ISBN 90-6764-150-2
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
CIP-DATA KONINKLIJKE BIBLIOTHEEK, DEN HAAG
Statistical sciences and data analysis: proceedings of the third Pacific Area Statistical Conference / ed. by K. Matusita ... [et al.]. - Utrecht [etc.]: VSP. Conference held in Makuhari on 11-13 December 1991.
ISBN 90-6764-150-2 bound
NUGI 815
Subject headings: statistical sciences / data analysis
Printed in The Netherlands by Koninklijke Wöhrmann BV, Zutphen.
Contents
Kullback-Leibler information for ordering genes using sperm typing and radiation hybrid mapping. H. Chernoff ... 1
Statistical models for forecasting tornado intensity. J. F. Monahan, K. J. Schrab and C. E. Anderson ... 13
Incorporating geographic distribution into the expected number of deaths in a comparative study. T. Yanagawa and D. G. Hoel ... 25
Fitting random coefficient regression models. R. Beran ... 33
Design and analysis in repeated measurement experiment: on use of preliminary knowledge about covariance structure of observations. Y. Uragari and M. Goto ... 41
Least squares estimators of regression coefficients by using misspecified covariance structure for error process. Y. Usami and M. Huzii ... 49
Estimation of regression parameters when the errors are autocorrelated. E. J. Chen and A. K. Md. E. Saleh ... 61
Centering and scaling in ridge regression. M. Jimichi and N. Inagaki ... 77
Reparametrization methods in linear minimax estimation. H. Drygas ... 87
Robust tests for linear models. S. Morgenthaler ... 97
Circular regression. Y. R. Sarma and S. R. Jammalamadaka ... 109
Properties of least squares methods for choosing the parameter of the simple exponential smoothing predictor. C. H. Chen and M. Huzii ... 129
Characterization of MTV model and its diagnostic checking. T. Kariya ... 143
Limit theorems for statistical inference on stationary processes with strong dependence. Y. Hosoya ... 151
Two sample problem in time series analysis. M. Kondo and M. Taniguchi ... 165
Malliavin calculus and higher order statistical inference. N. Yoshida ... 175
Estimation of levels of intensity in a simple self-correcting point process. T. Hayashi and N. Inagaki ... 181
Additional information and precision of estimators in multivariate structural models. Y. Kano, P. M. Bentler and Ab Mooijaart ... 187
Sensitivity analysis in covariance structure analysis: a numerical investigation in the case of confirmatory factor analysis. S. Watadani and Y. Tanaka ... 197
Likelihood ratio tests for means and covariances with incomplete multinormal observations. S. B. Provost ... 211
Maximally orthogonally invariant higher order moments and their application to testing elliptically-contouredness. A. Takemura ... 225
Asymptotic theory for the concentrated matrix Langevin distributions on the Grassmann manifold. Y. Chikuse ... 237
On canonical correlations and the degrees of non-orthogonality in the three-way layout. J. Berube, R. E. Hartwig and G. P. H. Styan ... 247
Partial canonical correlations associated with the inverse and some generalized inverses of a partitioned dispersion matrix. H. Yanai and S. Puntanen ... 253
Approximation to the upper percentiles of T^ a l -type statistics. T. Seo and M. Siotani ... 265
Curvature measures in data analysis. R. E. Kass and E. H. Slate ... 277
Statistical inference from observations with censoring and grouping for exponential families. S. Eguchi ... 291
Pitman closeness of the equivariant shrinkage estimators of the normal variance. N. Sugiura ... 301
Normal approximation to the distribution of the sample mean in the exponential family. R. Nishii and T. Yanagimoto ... 313
A new approach to asymptotic distributions of maximum likelihood ratio statistics. J. Zhang and G. Y. Li ... 325
Test of homogeneity of parameters. T. Hayakawa ... 337
Determining the no-observed-adverse-effect level in continuous response. Y. Kikuchi, T. Yanagawa and H. Nishiyama ... 345
Asymptotics on the statistics for a family of non-regular distributions. M. Akahira ... 357
Error bounds for asymptotic expansions of some distributions in a SUR model. S. Mukaihata and Y. Fujikoshi ... 365
Second order asymptotic bound for the variance of estimators for the double exponential distribution. M. Akahira and K. Takeuchi ... 375
Asymptotic expansions for J?j{min(i, m)} and -E»{íniin(ím)}. H. Takahashi ... 383
Aspects of goodness-of-fit. M. A. Stephens ... 395
On the central limit theorem in Hilbert space with application to U-statistics. M. L. Puri and V. V. Sazonov ... 407
Asymptotics of the perturbed sample quantile for a sequence of m-dependent stationary random process. M. L. Puri and S. Sun ... 415
The L1 complete convergence of recursive kernel density estimators under weak dependence. L. T. Tran ... 427
Multivariate Zq-norm estimation and the vulnerable bootstrap. P. K. Sen ... 441
Convergent rates of M-estimators for a partly linear model. G. Y. Li and P. D. Shi ... 451
Discrete distributions related to succession events in a two-state Markov chain. S. Aki and K. Hirano ... 467
How large the class of waiting distribution can be? P. D. Chen ... 475
The use of the inverse Gaussian model for analysing the lognormal data. E. Yamamoto and T. Yanagimoto ... 489
An analysis of degradation data of a carbon film and the properties of the estimators. K. Suzuki, K. Mala and S. Yokogawa ... 501
Bayesian analysis for exponential zero-failure data. D. J. Tang and S. S. Mao ... 513
The estimation of distribution under a particular random censoring. W. L. Lu ... 521
New main effect plus one plans for 2^7 factorial experiments and their robustness property against deletion of runs. S. Ghosh ... 529
Profiles of 2^m factorial designs. S. Yamamoto, Y. Fujii, Y. Hyodo and H. Yumiba ... 543
Characterization and optimality of block designs with estimation of parameters under mixed models. S. Kageyama and R. Zmyslony ... 559
Preface

The Third Pacific Area Statistical Conference was held in Makuhari, on the outskirts of Tokyo, on 11-13 December 1991 under the auspices of the Pacific Statistical Institute and with the support and cooperation of the Foundation for Advancement of International Science, the Japan Statistical Society and the Institute of Statistical Mathematics. There were about 180 participants from the greater Pacific area as well as from other continents. We are pleased to present herewith the Proceedings of this Conference.

The main theme of the Conference was "Statistical Sciences and Data Analysis." Its purpose was to bring together researchers in statistics and related fields from these areas to exchange results and problems on topics of mutual interest and concern. The papers comprising this volume were presented at the Conference, and all of them were subsequently examined by referees before their inclusion here. These papers contain many recent developments in statistical sciences and data analysis and in their applications. Consequently, this book will be of interest both to statisticians and to researchers in other fields who apply statistical methods in their work.

The Conference has benefited greatly from the generous financial support of several industrial and commercial organizations, including those affiliated with the Japan Federation of Economic Organizations, the Japan World Exposition Commemorative Funds, the Kajima Foundation and the Chiba Convention Bureau. To these organizations we express our deepest gratitude. In the preparation of this volume we had the help and cooperation of many people in the refereeing of papers. We thank them sincerely.
K. Matusita
October 1992
Stat. Sci. & Data Anal., pp. 1-11
K. Matusita et al. (Eds)
© VSP 1993
Kullback-Leibler Information for Ordering Genes Using Sperm Typing and Radiation Hybrid Mapping
HERMAN CHERNOFF
Harvard University, Cambridge, MA 02138, USA and Mathematical Sciences Research Institute, Berkeley, CA 94720, USA
Abstract. Two technologies applicable to gene mapping are those of sperm typing and radiation hybrid mapping. They are used to determine the ordering of the genes. For each of these methods, the analysis grows in complexity as the number of genes being considered increases. At the same time the accuracy of the probabilistic models used in the analysis becomes more questionable. On the other hand, the ability to determine the order of three genes may be enhanced by the inclusion, in the analysis, of the data on nearby genes. For both of these methods, Kullback-Leibler information numbers are derived to test hypotheses involving the order of m genes. These information numbers are computed for testing hypotheses concerning the ordering of three genes with and without considering the presence of data involving other nearby genes. The results suggest when it pays to incorporate the additional data and how much radiation to use in radiation hybrid mapping.

AMS 1980 subject classifications. Primary 62B10; secondary 92D20.
Keywords and phrases. Kullback-Leibler information, sperm typing, radiation hybrid mapping, gene ordering.

1. INTRODUCTION

Two technologies applicable to gene mapping are those of sperm typing and radiation hybrid mapping. Sperm typing makes use of the polymerase chain reaction, a biochemical technique which allows enormous amplification (production of multiple copies) of small, selected DNA fragments from a single chromosome. A sample of sperm from a single donor is analyzed to see which alleles (distinct forms of the various genes) are present in the individual sperm. The frequencies with which the various possibilities occur can be used to supply estimates of the ordering and of the recombination probabilities among the genes for which that donor is heterozygous (having different alleles of the same gene).
Radiation hybrid mapping employs a different technology, in which hybrid rodent cells containing a human chromosome are subjected to a dose of radiation, which breaks the chromosome into segments, a fraction of which are retained in succeeding generations. The simultaneous presence or absence of various genes provides indirect information on how close together these genes are, and also on the ordering of these genes.

For each of these methods, the analysis grows in complexity as the number of genes being considered increases. At the same time the accuracy of the probabilistic models used in the analysis becomes more questionable. On the other hand, the ability to determine the order of three genes may be enhanced by the inclusion, in the analysis, of the data on nearby genes. For both of these methods, we shall examine the relevant Kullback-Leibler information numbers for hypotheses concerning the ordering of three genes with and without considering the presence of data involving other nearby genes. The results suggest when it pays to incorporate the additional data, and how much radiation to use in radiation hybrid mapping.

In Section 2 we introduce the model for sperm typing and discuss the maximum likelihood estimates of the recombination probabilities. In Section 3 we derive expressions for the relevant Kullback-Leibler informations for sperm typing. In Section 4 we describe the model for radiation hybrids and derive the corresponding information numbers. The outcome of the calculations is described in Section 5.

We conclude this introduction with a brief discussion of the Kullback-Leibler (KL) information. Given two simple hypotheses concerning the (density) distribution $f(x)$ of the data $X$,
$$H_0 : f(x) = f_0(x) \quad\text{and}\quad H_1 : f(x) = f_1(x),$$
the KL information for discriminating between $H_0$ and $H_1$ is
$$K(f_0, f_1) = E_{f_0}\{\log[f_0(X)/f_1(X)]\}. \qquad (1)$$
The subscript $f_0$ refers to the fact that the expectation is calculated for the case where the distribution of $X$ is governed by $f_0$. The information $K$ measures the exponential rate at which the posterior probability of $H_1$ approaches zero when $H_0$ is true, as independent observations on $X$ are obtained. It is particularly relevant in the design of sequential experiments, such as those discussed by Goradia and Lange [1].

Suppose now that under our model the density of $X$ can be described in terms of a parameter $\theta$, i.e.
$f(x) = f(x, \theta)$, the underlying probability distribution is governed by $\theta = \theta_0$, and we are interested in a composite alternative $H_1 : \theta \in \Omega_1$ to the true hypothesis $H_0 : \theta = \theta_0$. Then the appropriate measure is
$$K(H_0, H_1) = \inf_{\theta_1 \in \Omega_1} E_{\theta_0}\{\log[f(X, \theta_0)/f(X, \theta_1)]\}, \qquad (2)$$
which can be decomposed into the following difference if either term is finite:
$$K(H_0, H_1) = E_{\theta_0}\{\log f(X, \theta_0)\} - \sup_{\theta_1 \in \Omega_1} E_{\theta_0}\{\log f(X, \theta_1)\}.$$
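To make (1) and (2) concrete, here is a minimal Python sketch. It is only an illustration under assumptions that are not taken from the paper: a Bernoulli family $f(x, \theta)$, a true value $\theta_0 = 0.1$, and a composite alternative $\Omega_1 = [0.3, 1)$ approximated by a finite grid. The helper names are hypothetical. The sketch checks numerically that the infimum in (2) agrees with the decomposition into $E \log f(X, \theta_0)$ minus the supremum of $E \log f(X, \theta_1)$.

```python
import math


def kl_discrete(f0, f1):
    """K(f0, f1) = E_{f0} log[f0(X)/f1(X)] on a finite sample space, as in (1)."""
    total = 0.0
    for p, q in zip(f0, f1):
        if p > 0.0:  # outcomes with f0(x) = 0 contribute nothing
            total += p * math.log(p / q)
    return total


def bernoulli(theta):
    """Hypothetical example family: (P[X = 0], P[X = 1]) for X ~ Bernoulli(theta)."""
    return (1.0 - theta, theta)


def expected_loglik(theta0, theta):
    """E_{theta0} log f(X, theta) for the Bernoulli family."""
    pairs = zip(bernoulli(theta0), bernoulli(theta))
    return sum(p * math.log(q) for p, q in pairs if p > 0.0)


theta0 = 0.1
# Grid approximation of the composite alternative Omega_1 = [0.3, 1).
omega1 = [0.3 + 0.001 * k for k in range(700)]

# (2): infimum over Omega_1 of the KL number.
k_inf = min(kl_discrete(bernoulli(theta0), bernoulli(t)) for t in omega1)

# Decomposition: E log f(X, theta0) - sup over Omega_1 of E log f(X, theta1).
k_dec = expected_loglik(theta0, theta0) - max(expected_loglik(theta0, t) for t in omega1)

print(f"inf form: {k_inf:.6f}  decomposition: {k_dec:.6f}")
```

Because the KL number of Bernoulli(0.1) against Bernoulli($\theta_1$) increases with $\theta_1$ beyond 0.1, both forms are attained at the boundary point $\theta_1 = 0.3$ in this example.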
We shall suppress the subscript $\theta_0$ when there is no danger of ambiguity.

2. THE SPERM TYPING MODEL AND MAXIMUM LIKELIHOOD

Consider first the case of three genes for which the donor is heterozygous, and his two chromosomes carry genes ABC and abc, respectively. A sperm will have a chromosome providing one of the following 8 observations: ABC, ABc, AbC, Abc, aBC, aBc, abC, abc, with probabilities depending on the recombination probabilities and on the ordering of the three genes on the chromosome. Suppose that the genes appear in the order ABC rather than ACB or BAC. Suppose also that the recombination probabilities (the probabilities that, in the reproduction process, the chromosomes separate and recombine) are $\phi_{ab}$ between A and B and $\phi_{bc}$ between B and C. Finally, suppose that the recombination events are independent. Then the probabilities associated with ABC, abc and AbC are $(1-\phi_{ab})(1-\phi_{bc})/2$, $(1-\phi_{ab})(1-\phi_{bc})/2$ and $\phi_{ab}\phi_{bc}/2$, respectively. The probabilities associated with the other 5 events can be calculated similarly.
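The eight probabilities can be enumerated mechanically. The sketch below assumes, consistently with the factor 1/2 in the formulas above, that a sperm starts on either donor chromosome with probability 1/2 and that recombination in the A-B and B-C intervals occurs independently; `sperm_probs` and the parameter values are hypothetical.

```python
import itertools


def sperm_probs(phi_ab, phi_bc):
    """Probabilities of the 8 sperm types under the order ABC for a donor ABC/abc.

    Assumes the sperm starts on either chromosome with probability 1/2 and that
    recombination in the A-B and B-C intervals occurs independently.
    """
    probs = {}
    for start, rec_ab, rec_bc in itertools.product((0, 1), repeat=3):
        a = start         # 0 = allele from the ABC chromosome, 1 = from abc
        b = a ^ rec_ab    # a recombination between A and B switches chromosomes
        c = b ^ rec_bc    # likewise between B and C
        label = "Aa"[a] + "Bb"[b] + "Cc"[c]
        p = 0.5
        p *= phi_ab if rec_ab else 1.0 - phi_ab
        p *= phi_bc if rec_bc else 1.0 - phi_bc
        probs[label] = p
    return probs


probs = sperm_probs(phi_ab=0.05, phi_bc=0.10)
for label in sorted(probs, key=lambda s: -probs[s]):
    print(label, round(probs[label], 5))
```

Complementary types such as ABC and abc receive equal probabilities under this model, which is what allows them to be pooled.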
While the estimation of $\phi_{ab}$ and $\phi_{bc}$ is of interest and relevant, our main focus in the next section will be on deciding which of the three possible orderings ABC, ACB, BAC is correct. Note that without reference to other parts of the chromosome the orderings ABC and CBA are equivalent, so we need consider only three, or half, of the six possible permutations of ABC. It is also evident that the information in the observed categories ABC and abc is equivalent, and thus we may combine these two observations into a single one, ABC, with probability $(1-\phi_{ab})(1-\phi_{bc})$ under the ordering ABC, $(1-\phi_{ac})(1-\phi_{cb})$ under the ordering ACB, and $(1-\phi_{ba})(1-\phi_{ac})$ under the ordering BAC. Thus we need only consider 4 possible observations, e.g. ABC, ABc, AbC and Abc, each representing a pair of the original 8 categories.

In our analysis it is important to bear in mind that the statistician does not know which alleles appear together on the original chromosomes. Thus, even with the order ABC, the donor's original chromosomes might be AbC and aBc. For our problem, involving relatively small recombination probabilities, the data would quickly and easily determine the form of the chromosomes, for an original chromosome ABC would lead to a great preponderance of ABC observations, independent of the order. Nevertheless, it turns out that symmetry aspects of the analysis make it unimportant to hypothesize or estimate which alleles appear on each chromosome.

Goradia and Lange [1] analyze two sequential methods of selecting the correct order. They do not analyze the sequential probability ratio method, since the two approaches they use are much easier for them to analyze. One may wonder whether there is a substantial loss of efficiency in using their methods.
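The pooled probabilities under each candidate ordering follow one rule: for every pair of genes adjacent in the hypothesized order, multiply by the recombination probability if the observation shows the pair separated (only one of the two carries the capital allele) and by its complement otherwise. A minimal sketch, assuming independent recombination in the two adjacent intervals; the function name, the frozenset keys and the numerical values (including 0.14 for the A-C pair) are illustrative choices only.

```python
def combined_probs(order, phi):
    """Probabilities of the 4 pooled categories ABC, ABc, AbC, Abc under a gene order.

    order: the hypothesized order along the chromosome, e.g. "ABC", "ACB" or "BAC".
    phi:   maps frozenset({x, y}) for each adjacent pair (x, y) of the order to its
           recombination probability.
    Each category is pooled with its complement (ABC with abc, and so on), so the
    factor 1/2 of the 8-category model disappears.
    """
    probs = {}
    for cat in ("ABC", "ABc", "AbC", "Abc"):
        p = 1.0
        for x, y in zip(order, order[1:]):
            r = phi[frozenset((x, y))]
            # x and y are separated when only one of them carries the capital allele
            separated = (x in cat) != (y in cat)
            p *= r if separated else 1.0 - r
        probs[cat] = p
    return probs


abc = combined_probs("ABC", {frozenset("AB"): 0.05, frozenset("BC"): 0.10})
acb = combined_probs("ACB", {frozenset("AC"): 0.14, frozenset("CB"): 0.10})
bac = combined_probs("BAC", {frozenset("BA"): 0.05, frozenset("AC"): 0.14})
for name, dist in (("ABC", abc), ("ACB", acb), ("BAC", bac)):
    print(name, {k: round(v, 4) for k, v in dist.items()})
```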
The related question that we address is whether the efficiency of deciding the order of ABC would increase if the analysis were extended to include 4 or 5 genes. Several complications arise in the use of KL numbers to address this question. One is that in ordering 4 (or 5) genes, there are 12 = 4!/2 (or 60 = 5!/2) possible orderings of concern. Another is that it is more difficult to find a donor who is heterozygous on four, rather than three, specified genes. Finally, technical problems in the technology may make the simple extension of the above probability model less reliable in the application to four or more genes. In any case, when the KL numbers indicate that there is little to be gained by introducing 4 or 5 genes, it makes sense to confine attention to three at a time. If there is a potential gain of a large amount of information, one ought to consider the relative merit of the possibly more complicated analysis required to deal with more than 3 genes.

Assuming the order ABC, the likelihood based on $n_{ABC}$, $n_{ABc}$, $n_{AbC}$ and $n_{Abc}$ observations of ABC, ABc, AbC and Abc, respectively, is
$$L = [(1-\phi_{ab})(1-\phi_{bc})]^{n_{ABC}}\,[(1-\phi_{ab})\phi_{bc}]^{n_{ABc}}\,[\phi_{ab}\phi_{bc}]^{n_{AbC}}\,[\phi_{ab}(1-\phi_{bc})]^{n_{Abc}}.$$
…
vanishes only at $r = \phi_{ij}$ in the interval (0, 1), and indeed $w_{ij}(r)$ attains its maximum value $W_{ij} =$
r ( l - ra) log(l - ripa) + rrij l o g ( r r ^ ) + r( 1 - rij) log(l - rtf>ij)
(24)
at $r = \phi_{ij}$. Incidentally, this result could also be derived without calculating the derivative, by noting the relationship between $W_{ij}$ and a Kullback-Leibler number, and that a KL number is always nonnegative; hence an expression of the form $E_{\theta_0}[\log f(X, \theta)]$ attains its maximum value when $\theta = \theta_0$.

We are now in a position to calculate the KL information. Let $H_0 : \theta = \theta_0$ correspond to the permutation $\pi^0$ and $\theta_0 = (\phi_{12}, \phi_{23}, \ldots, \phi_{m-1,m})$, and let $H_1 : \theta = \theta_1$ correspond to the permutation $\pi^*$ and … . Then, with $E_{\theta_0}$ represented by $E$, we have
$$K(H_0, H_1) = E \log f(X, \theta_0) - E \log f(X, \theta_1) = \cdots,$$
which is minimized with respect to … by the ordering $\pi$, …
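For three genes the minimization over the alternative's parameters can be sketched numerically with the pooled four-category model. Assuming (as the maximization of $w_{ij}(r)$ at $r = \phi_{ij}$ suggests) that the optimal parameter for each adjacent pair of the alternative order is the true probability that the two genes are observed separated, `kl_order` plugs in those values, and `kl_order_grid` checks the result by a crude grid search. All names and parameter values are hypothetical; this is not the paper's computation.

```python
import math

CATS = ("ABC", "ABc", "AbC", "Abc")  # each pooled with its complement


def combined_probs(order, phi):
    """Pooled category probabilities under `order`, with phi keyed by frozenset pairs."""
    probs = {}
    for cat in CATS:
        p = 1.0
        for x, y in zip(order, order[1:]):
            r = phi[frozenset((x, y))]
            p *= r if (x in cat) != (y in cat) else 1.0 - r
        probs[cat] = p
    return probs


def kl(p, q):
    """Discrete KL number: sum over categories of p log(p/q)."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in CATS if p[c] > 0.0)


def separation_prob(p, x, y):
    """True probability that genes x and y carry alleles from different chromosomes."""
    return sum(p[c] for c in CATS if (x in c) != (y in c))


def kl_order(phi_ab, phi_bc, alt):
    """K(H0, H1) for the true order ABC against the composite alternative order `alt`.

    Plugs in the maximizer of E log f(X, theta1): for each adjacent pair of `alt`,
    the true probability that the pair is observed separated.
    """
    p0 = combined_probs("ABC", {frozenset("AB"): phi_ab, frozenset("BC"): phi_bc})
    pairs = list(zip(alt, alt[1:]))
    phi1 = {frozenset(pair): separation_prob(p0, *pair) for pair in pairs}
    return kl(p0, combined_probs(alt, phi1))


def kl_order_grid(phi_ab, phi_bc, alt, step=0.005):
    """Brute-force infimum over a grid of alternative parameters (a check on kl_order)."""
    p0 = combined_probs("ABC", {frozenset("AB"): phi_ab, frozenset("BC"): phi_bc})
    pairs = [frozenset(pair) for pair in zip(alt, alt[1:])]
    grid = [step * k for k in range(1, round(1 / step))]
    best = math.inf
    for s in grid:
        for t in grid:
            best = min(best, kl(p0, combined_probs(alt, {pairs[0]: s, pairs[1]: t})))
    return best


results = {
    alt: (kl_order(0.05, 0.10, alt), kl_order_grid(0.05, 0.10, alt))
    for alt in ("ACB", "BAC")
}
for alt, (k_plug, k_grid) in results.items():
    print(f"ABC vs {alt}: plug-in {k_plug:.4f}, grid {k_grid:.4f}")
```

For these made-up values ($\phi_{ab} = 0.05$, $\phi_{bc} = 0.10$), the alternative BAC, which swaps the two most tightly linked genes, has the smaller KL number, so under this sketch it is the harder alternative to rule out.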