STATISTICAL ASTRONOMY

THE ANDROMEDA NEBULA, Lick Observatory photograph
A GALAXY SIMILAR TO OUR OWN

ROBERT J. TRUMPLER AND HAROLD F. WEAVER

UNIVERSITY OF CALIFORNIA PRESS, Berkeley and Los Angeles
1953

University of California Press, Berkeley and Los Angeles, California
Cambridge University Press, London, England
Copyright, 1953, by The Regents of the University of California
Printed in the United States of America

Dedicated to the Memory of
WILLIAM WALLACE CAMPBELL
1862 - 1938
A Pioneer Investigator of Stellar Motions
PREFACE
The present text is based essentially on a graduate course in Statistical Astronomy given regularly since 1935 at the University of California in Berkeley by the senior author. The purpose of the course and of this text is to introduce the student to the principal statistical problems in Astronomy, to their mathematical formulation, and to methods and techniques of their solution. Completeness in covering investigations or in citations of numerical results has not been attempted; the references at the end of each part are merely suggestions for further study and reading, not a complete bibliography. In the presentation of subject matter, logical arrangement seemed in general more appropriate to our aim than historical sequence. An effort has been made, however, to point out the historical order of developments in the text or in the Notes.

Statistical Astronomy in the widest sense of the term comprises all applications of statistics to astronomy. Undoubtedly the most fruitful field of such applications is the study of the structure, constitution, and dynamics of our galactic star system, and the larger part of our text (Parts II to VI) has been devoted to this more restricted field of stellar statistics. The general discussion of statistical methods in Part I, however, was planned on a sufficiently broad basis to provide the student with tools for solving some of the many other statistical problems in astronomy, such as those connected with observational errors, binary stars, variable stars, asteroids, comets, extragalactic nebulae, and so forth.

The earlier investigators in stellar statistics generally worked out solutions to their particular problems without recourse to general mathematical methods of statistics; in fact, these latter were often not available at the time. At present, however, it seemed desirable to relate the special problems and solutions of stellar statistics to the basic principles of modern statistical theory. As a consequence, our derivations of methods and formulae often differ from those of the original publications. The choice of terminology and of a consistent system of symbols has been particularly difficult. Some kind of compromise had to be made between the language of the theoretical statistician and the time-honored designations familiar to the astronomer.

To prepare a student for research in Statistical Astronomy it is necessary not only to give him a knowledge of the theory but to introduce him to the actual procedure of statistical calculations. Considerable attention has therefore been paid to numerical methods, and several numerical examples are worked out in full detail.

We are indebted to Professor Ronald A. Fisher, Cambridge, and to Messrs. Oliver and Boyd Ltd., Edinburgh, for permission to reprint Tables III and IV of their book "Statistical Methods for Research Workers". We gladly acknowledge our indebtedness to Dr. Elizabeth L. Scott, Assistant Professor of Mathematics and Research Associate in the Statistical Laboratory at the University of California. She has contributed the entire Chapter 1.8, has read the whole manuscript, and made many helpful suggestions. To Mrs. R. J. Trumpler and Mrs. H. F. Weaver we express our thanks for the typing of the manuscript. We are grateful also to the President of the University of California and to the Benjamin Apthorp Gould Fund of the National Academy of Sciences for subsidies facilitating the publication of this book.

Berkeley, August 15, 1951.
R. J. T.    H. F. W.
CONTENTS

List of Symbols

Part I  ELEMENTS OF STATISTICAL THEORY

Introduction
Chapter 1.1  Univariate Distributions
  1.11 Empirical Distribution Laws
  1.12 Continuous Frequency Functions
  1.13 Transformation of Variables
  1.14 Parameters of a Frequency Function
  1.15 The Normal Frequency Function
  1.16 Gram-Charlier Series
Chapter 1.2  Bivariate and Multivariate Frequency Functions
  1.21 Bivariate Distributions
  1.22 Marginal and Array Distributions; Means and Moments
  1.23 Stochastic Dependence
  1.24 Correlation
  1.25 The Bivariate Normal Frequency Function
  1.26 Bivariate Normal Distribution of Vector Components or Position Coordinates
  1.27 Trivariate Distributions
  1.28 The Trivariate Normal Frequency Function
Chapter 1.3  Transformation Problems in Multivariate Frequency Functions
  1.31 Transformation of Variables in a Bivariate Frequency Function
  1.32 Numerical Transformation of an Empirical Bivariate Distribution
  1.33 Distribution of a Function of Two Variables
  1.34 The Case of a Function of Two Independent Variables
  1.35 Transformation of Multivariate Frequency Functions
  1.36 Distribution of a Set of Functions of n Variables
Chapter 1.4  Integral Equations of Statistics
  1.41 Linear Integral Equations and Types of Solutions
  1.42 Direct Solution of Volterra's Equation of the First Kind with Kernel of the Form K(x − y)
  1.43 Eddington's Solution of Fredholm's Equation of the First Kind
  1.44 Solution by Fourier Transforms
  1.45 Solution by Interpolation Functions or Interpolation Series
  1.46 Solution by Numerical Integration
  1.47 Numerical Solutions by the Gaussian Integration Method
Chapter 1.5  Special Problems and Techniques
  1.51 Correction of a Univariate Distribution for Observational Errors
  1.52 Correction of a Bivariate Distribution for Observational Errors
  1.53 Elimination of Observational Errors when the Variable is Calculated from Observed Quantities
  1.54 Reduction of Grouped Frequencies to Central Frequencies
  1.55 Truncated, Limited, and Conditioned Distributions
  1.56 Statistical Determination of the Functional Relationship between Variables
Chapter 1.6  General Theory of Samples
  1.61 Relation of Sample and Parent Population
  1.62 Binomial Distribution
  1.63 Poisson Distribution
  1.64 Normal Distribution of Sample Counts
  1.65 Statistical Uncertainty
Chapter 1.7  Estimation of Statistical Parameters
  1.71 Representation of an Observed Distribution by a Mathematical Formula
  1.72 Probability of a Specified Sample
  1.73 Classification of Estimators and Methods of Estimation
  1.74 Estimation by the Principle of Least Squares
  1.75 Mean and Variance of the Probability Distribution of a Sample Parameter
  1.76 Calculation of Moments from Grouped Data (Sheppard Corrections)
  1.77 Numerical Example of Representing Sample Data by a Formula
Chapter 1.8  Testing Hypotheses
  1.81 General Statement of the Problem
  1.82 Distinction between Phenomenal Hypotheses and Statistical Hypotheses
  1.83 Background of Testing Statistical Hypotheses
  1.84 Hypothesis of Constant Radial Velocity
  1.85 The Likelihood Ratio Principle and Student's t-Test
  1.86 χ²-Tests
Notes and Bibliography to Part I

Part II  STATISTICAL DESCRIPTION OF THE GALACTIC SYSTEM: GENERAL ANALYSIS

Introduction
Chapter 2.1  The Problem of Description
  2.11 General Features of the Galactic System
  2.12 Descriptive Variables
  2.13 Formulation of the Galactic Distribution Law
Chapter 2.2  Observational Data
  2.21 Sources of Information and their Limitations
  2.22 Observable Quantities
  2.23 Functional Relationships between Observational Data and Descriptive Variables
Chapter 2.3  Partial Solutions of the Galactic Distribution Problem
  2.31 Difficulties of a General Solution
  2.32 The Stars in the Vicinity of the Sun
  2.33 Space Distribution of the Stars
  2.34 Galactic Rotation

Part III  STELLAR MOTIONS IN THE VICINITY OF THE SUN

Introduction
Chapter 3.1  Sample Population and Coordinate Systems
  3.11 Sample Population
  3.12 Velocity Components and Observational Data
  3.13 Local Standard of Rest and Solar Motion
Chapter 3.2  Principal Hypotheses Concerning Local Velocity Distribution
  3.21 Single Drift Hypothesis
  3.22 Two Star Stream Hypothesis
  3.23 Ellipsoidal Hypothesis
  3.24 Comparison of the Two Star Stream and Ellipsoidal Hypotheses
Chapter 3.3  Studies Based on Space Velocities
  3.31 Empirical Representation of the Velocity Distribution
  3.32 Determination of the Velocity Ellipsoid from Space Velocity Components
  3.33 Effects of Selection and of Observational Errors
Chapter 3.4  Analysis of Radial Velocity Data
  3.41 Solar Motion Determined from Radial Velocities
  3.42 Radial Velocity Distribution in a Small Area of the Sky
  3.43 Determination of the Velocity Ellipsoid from Radial Velocities
Chapter 3.5  Analysis of Proper Motion Data
  3.51 Distribution of Tangential Velocities in a Small Area of the Sky
  3.52 Proper Motion Distribution in a Small Area of the Sky
  3.53 Determination of the Solar Apex from Proper Motions
  3.54 Determination of the Velocity Ellipsoid from Proper Motions
Chapter 3.6  Relation of the Velocity Distribution to Spectral Type and Absolute Magnitude
  3.61 Subdivision of Stars into S, M Groups
  3.62 Variation of the Local Star Motion with S and M
  3.63 Variation of the Velocity Ellipsoid with S and M
  3.64 Asymmetry of Large Peculiar Velocities for Stars in the Solar Vicinity
Chapter 3.7  Statistical Determination of Stellar Distances from Proper Motions and Radial Velocity Data
  3.71 Determination of the Parallax Distribution within the Local Sphere
  3.72 Determination of Mean Parallaxes within the Local Sphere
  3.73 Determination of Mean Parallaxes for Distant Stars
Notes and Bibliography to Part III

Part IV  LUMINOSITY - SPECTRAL TYPE DISTRIBUTION

Introduction
Chapter 4.1  General Studies of Stellar Populations
  4.11 The Hertzsprung-Russell Diagram
  4.12 Population Diagrams of Galactic Star Clusters
  4.13 Population Diagrams of Globular Star Clusters
  4.14 Population Characteristics and Galactic Structure
Chapter 4.2  Luminosity Law in the Vicinity of the Sun
  4.21 Choice of the Volume Element
  4.22 Analysis of the Problem
  4.23 Method of Trigonometric Parallaxes: General Outline
  4.24 Method of Trigonometric Parallaxes: Practical Procedure
  4.25 Method of Mean Parallaxes
  4.26 Method of Mean Absolute Magnitudes
  4.27 Numerical Results
Chapter 4.3  Luminosity - Spectral Type Distribution in the Vicinity of the Sun
  4.31 The Hess Diagram and its Parameters
  4.32 Methods of Determining the M, S Distribution
  4.33 Apparent Luminosity - Spectral Type Distribution
  4.34 Determination of the Apparent M, S Distribution from Spectroscopic Absolute Magnitudes or Trigonometric Parallaxes
  4.35 Determination of the Apparent M, S Distribution from the Tau Components of Proper Motions
  4.36 Practical Application of the Tau Component Method
  4.37 Other Methods Based on Proper Motions
  4.38 Observational Results of the M, S Distribution for Stars Brighter than Apparent Magnitude Six
Notes and Bibliography to Part IV

Part V  SPACE DISTRIBUTION OF STARS

Introduction
Chapter 5.1  Apparent Distribution of the Stars
  5.11 Star Counts and Sources of Photometric Data
  5.12 Variation of Areal Star Density with l and b
  5.13 Determination of the Galactic Plane, of the Direction to the Galactic Center, and of the Sun's Distance from the Galactic Plane
Chapter 5.2  Determination of the Space Density Law when Interstellar Light Extinction is Neglected
  5.21 Integral Equation of the Space Density Law
  5.22 Distance Modulus
  5.23 Special Solutions
  5.24 Space Density Determinations with Interpolation Formulae
Chapter 5.3  Extinction of Light in Interstellar Space
  5.31 Observational Evidence of Interstellar Matter
  5.32 Processes of Extinction
  5.33 Wavelength Dependence of Interstellar Extinction
  5.34 Effect of Interstellar Extinction on Apparent Magnitude and Color Index
  5.35 The Extinction Ratio
  5.36 Space Distribution of Interstellar Matter
Chapter 5.4  Analysis of Star Counts According to Apparent Magnitude when Interstellar Extinction is Present
  5.41 Distribution of Distance Moduli
  5.42 Numerical Determination of Ω(y)
  5.43 Resolution and Indeterminateness of the Solution of the Integral Equation
  5.44 Determination of the Space Density Law when the Extinction Law is Known
  5.45 Determination of the Extinction Law when the Space Density Law is Known
  5.46 Studies of Localized Clouds of Interstellar Material
  5.47 Star Counts According to Visual and Photographic Magnitudes in the Same Area
  5.48 Determination of the Space Density Law and Extinction Law in the Direction of the Galactic Poles
  5.49 Combined Solutions Including the Determination of the Luminosity Law
Chapter 5.5  Studies of the Space Distribution of Stars Based on Magnitudes, Spectral Types, and Color Indices
  5.51 Magnitudes and Spectral Types with Luminosity Classifications
  5.52 Magnitudes and Spectral Types without Luminosity Classifications
  5.53 Apparent Magnitudes, Spectral Types, and Color Indices Observed
  5.54 Star Counts According to Apparent Magnitude and Color: Extinction Law Known
  5.55 Star Counts According to Apparent Magnitude and Color: Extinction Law Unknown
  5.56 Summary of Problems and Methods in the Statistical Study of Galactic Structure
Notes and Bibliography to Part V

Part VI  GALACTIC ROTATION

Introduction
Chapter 6.1  Basis of Galactic Rotation Theory
  6.11 The Simplified Model of the "Typical Star System"
  6.12 Sources of Observational Data for Galactic Rotation Studies
  6.13 Solar Motion Relative to the Galactic Standard of Rest
  6.14 Dynamical Conditions and Considerations of Equilibrium
  6.15 The Oort-Lindblad Theory
Chapter 6.2  Differential Effects of Galactic Rotation
  6.21 Differential Effect on Radial Velocities and Proper Motions for Objects Situated in the Galactic Plane
  6.22 Series Development
  6.23 Observable Effects Predicted by the Oort-Lindblad Theory
  6.24 Differential Effects for Objects not Situated in the Galactic Plane
  6.25 Comparison of Theory and Observations
  6.26 Determination of Precession Constants
  6.27 Dynamical Constants of the Galaxy
  6.28 Combined Effect of Radial Motion and Rotation
Chapter 6.3  Peculiar Motions of Stars in the Solar Neighborhood
  6.31 Peculiar Motions and Galactic Orbits
  6.32 The Bottlinger Diagram
  6.33 Peculiar Motions Interpreted in Terms of Orbital Elements
  6.34 Galactic Orbits for Empirical Force Law
  6.35 Stars of Large Peculiar Velocity
  6.36 Discrepancies between Theory and Observations
  6.37 Motions Perpendicular to the Galactic Plane
Notes and Bibliography to Part VI

Appendix: Tables A1 - A10
Index of Names
Index of Subjects
LIST OF SYMBOLS

log      logarithm to base 10
ln       natural logarithm (base e)
k        = 4.738, velocity in km/sec corresponding to 1 astronomical unit/year
ω        = 3.046 × 10⁻⁴, area of 1 square degree measured in radians
β        = 1/(5 log e) = 0.4605;  e^β = 10^0.2

General statistics

F(x), F(x,y), ··· (Latin capitals)    absolute frequency function (number of individuals per unit intervals of x, y, ···, for the population studied)
Φ(x), Φ(x,y), ··· (Greek capitals)    relative frequency function (fraction of the population per unit intervals of x, y, ···), or probability density at x; x, y; ···
u = u(x,y, ···) (lower-case Latin letters)    mathematical relation between variables
(1/√(2π)) ∫ e^(−t²/2) dt    normal probability integral
Φ(x | y)    frequency function of x for a fixed value of y (y ± dy/2)
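These constants can be reproduced in a few lines of Python — a minimal sketch, assuming modern values for the astronomical unit and the sidereal year (so k comes out near 4.74 rather than the tabulated 4.738):

```python
import math

# k: velocity in km/sec corresponding to a proper motion of 1 astronomical
# unit per year.  The AU and year values below are assumed modern figures.
AU_KM = 1.495979e8        # astronomical unit in km
YEAR_S = 3.155815e7       # sidereal year in seconds
k = AU_KM / YEAR_S        # close to the tabulated 4.738

# omega: area of 1 square degree measured in radians (steradians)
omega = (math.pi / 180.0) ** 2

# beta = 1/(5 log e) = 0.4605, chosen so that e**beta = 10**0.2
beta = 1.0 / (5.0 * math.log10(math.e))
```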
… parameters on which the representation of Φ(x,y) may be based are discussed in Chapter 1.7. In general the estimated values of these parameters are so chosen that the moments μ_ij of the first and second order of the mathematical frequency function Φ(x,y) become equal to the corresponding moments m_ij of the observed distribution. The estimated values of the parameters (marked by asterisks) are then, according to (1.92) and (1.96), calculated by the equations

    x₀* = m′₁₀        y₀* = m′₀₁
    σₓ*² = m₂₀        σᵧ*² = m₀₂
    r* = m₁₁ / (σₓ* σᵧ*) .
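As a numerical illustration of this moment-matching recipe (a sketch with assumed sample parameters, not an example from the text), the starred estimates can be recovered from a simulated sample:

```python
import math
import random

random.seed(1)
# Draw a large sample from a bivariate normal with known parameters
# (independent components here, so the true correlation is zero).
x0, y0, sx, sy = 2.0, -1.0, 1.5, 0.5
xs = [random.gauss(x0, sx) for _ in range(200_000)]
ys = [random.gauss(y0, sy) for _ in range(200_000)]

n = len(xs)
m10 = sum(xs) / n                                  # first-order moments m'10, m'01
m01 = sum(ys) / n
m20 = sum((x - m10) ** 2 for x in xs) / n          # central second-order moments
m02 = sum((y - m01) ** 2 for y in ys) / n
m11 = sum((x - m10) * (y - m01) for x, y in zip(xs, ys)) / n

# Moment-matching estimates (the starred quantities of the text)
x0_star, y0_star = m10, m01
sx_star, sy_star = math.sqrt(m20), math.sqrt(m02)
r_star = m11 / (sx_star * sy_star)
```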
Figure 1.13  Transformation of bivariate frequency function.
defined by the equations u(x, y) = const. and v(x, y) = const. An arbitrary infinitesimal cell u ± du/2, v ± dv/2, which appears as a rectangle in the u, v plane, becomes in the x, y plane an infinitesimal quadrilateral bounded by the curves

    u(x,y) = u ± du/2 ,        v(x,y) = v ± dv/2 .
Let us imagine our population plotted as a point diagram in both the x, y and the u, v coordinate systems. The expression Ψ(u,v) du dv, which we are trying to find, specifies the fraction of the population whose plotted points fall within the infinitesimal rectangle Q′R′S′T′. This population fraction must be the same as that whose plots in the x, y plane fall within the corresponding parallelogram Q R S T. The given frequency function Φ(x, y) defines the areal density of the plotted points (each counted with weight 1/N) at any position in the x, y system. The particular point P, which corresponds to an arbitrary point P′ with coordinates u, v, has the coordinates x(u, v), y(u, v). If we designate by dA the area of the limiting parallelogram Q R S T, the fraction of the population which this area
contains is Φ[x(u,v), y(u,v)] dA, and we have the fundamental relation

    Ψ(u,v) du dv = Φ[x(u,v), y(u,v)] dA .        (1.154)
The area dA must, of course, be expressed in terms of the differentials du and dv, and is in general also a function of u, v. For many common transformations the area element dA is easily found by direct geometrical considerations, as we shall show in the examples. In order to derive a general expression for dA we consider the limiting infinitesimal triangle P P₁ P₂ (Figure 1.13). By a well-known theorem of analytical geometry the area of a triangle with one apex at the origin is given in terms of the rectangular coordinates x₁, y₁ and x₂, y₂ of the other two apices by

    ½ |x₁ y₂ − x₂ y₁| .

In the triangle P P₁ P₂ the coordinates of P₁ and P₂ taken with respect to P as origin are

    x₁ = ½ ∂x(u,v)/∂u du        y₁ = ½ ∂y(u,v)/∂u du
    x₂ = ½ ∂x(u,v)/∂v dv        y₂ = ½ ∂y(u,v)/∂v dv .
The differential area of the parallelogram is 8 times that of the triangle and is thus

    dA = | ∂x(u,v)/∂u   ∂y(u,v)/∂u |
         | ∂x(u,v)/∂v   ∂y(u,v)/∂v |  du dv .        (1.155)

The determinant which occurs in this expression is termed the Jacobian or functional determinant of x and y with respect to u and v. We shall denote it by the customary symbol

    J (x,y)/(u,v) = | ∂x(u,v)/∂u   ∂y(u,v)/∂u |
                    | ∂x(u,v)/∂v   ∂y(u,v)/∂v | ,        (1.156)
and we shall normally assume that J(x,y)/(u,v) is expressed as a function of u, v. By means of (1.155) and (1.156) the transformation equation (1.154) may be written in the form:

    Ψ(u,v) = Φ[x(u,v), y(u,v)] | J (x,y)/(u,v) | .        (1.157)

Since frequency functions are never negative, we must use the absolute value of the Jacobian. To complete this transformation, we should, in general, also ascertain the limits of u and v corresponding to those of x and y. We note that the transformation of a bivariate frequency function is made exactly like a change of integration variables in a double integral. By the methods of determinants it can be shown that

    J (x,y)/(u,v) = 1 / [ J (u,v)/(x,y) ] .

Sometimes it is easier to calculate J(u,v)/(x,y), and to use for the transformation the equation

    Ψ(u,v) = Φ[x(u,v), y(u,v)] / | J (u,v)/(x,y) | ,        (1.158)
where the division of Φ(x, y) by |J(u,v)/(x,y)| is made before the variables x, y are expressed in terms of u, v.

When only one of the two variables of a bivariate frequency function is to be transformed, we have a particularly simple case. From a given frequency function Φ(x,y) we then have to find the frequency function Ψ(u,y), where u is defined by the equation

    u = u(x,y) .

Solved for x, this equation has the form

    x = x(u,y) ,

and the Jacobian becomes

    J (x,y)/(u,y) = ∂x(u,y)/∂u .

The transformation equation is thus

    Ψ(u,y) = Φ[x(u,y), y] | ∂x(u,y)/∂u | .        (1.159)
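The reciprocal relation between the two Jacobians is easy to verify numerically. The following sketch (polar coordinates chosen as an assumed illustration) differentiates both maps by central differences and checks that the product of the two determinants is unity:

```python
import math

def jac2x2(f, g, a, b, h=1e-6):
    """Numerical Jacobian determinant d(f,g)/d(a,b) at the point (a, b)."""
    dfa = (f(a + h, b) - f(a - h, b)) / (2 * h)
    dfb = (f(a, b + h) - f(a, b - h)) / (2 * h)
    dga = (g(a + h, b) - g(a - h, b)) / (2 * h)
    dgb = (g(a, b + h) - g(a, b - h)) / (2 * h)
    return dfa * dgb - dfb * dga

# Forward map (u,v) -> (x,y): polar coordinates x = u cos v, y = u sin v
x_of = lambda u, v: u * math.cos(v)
y_of = lambda u, v: u * math.sin(v)
# Inverse map (x,y) -> (u,v)
u_of = lambda x, y: math.hypot(x, y)
v_of = lambda x, y: math.atan2(y, x)

u0, v0 = 2.0, 0.7
x0, y0 = x_of(u0, v0), y_of(u0, v0)
J_xy_uv = jac2x2(x_of, y_of, u0, v0)       # should equal r = u0
J_uv_xy = jac2x2(u_of, v_of, x0, y0)       # should equal 1/r
product = J_xy_uv * J_uv_xy                # should equal 1
```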
Examples: 1. Transform the frequency function Φ(x,y) to Ψ(u,v) when u and v are both linear functions of x and y. In this case x and y are also linear functions of u and v:

    x = γ₁₀ + γ₁₁ u + γ₁₂ v
    y = γ₂₀ + γ₂₁ u + γ₂₂ v .

The Jacobian

    J (x,y)/(u,v) = | γ₁₁  γ₁₂ |
                    | γ₂₁  γ₂₂ | = γ₁₁ γ₂₂ − γ₁₂ γ₂₁

is a constant. The function Ψ(u,v) is given by the equation

    Ψ(u,v) = | γ₁₁ γ₂₂ − γ₁₂ γ₂₁ | Φ(γ₁₀ + γ₁₁ u + γ₁₂ v, γ₂₀ + γ₂₁ u + γ₂₂ v) .
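A Monte Carlo sketch can confirm the linear case: points drawn from Φ and mapped to the u, v plane should show the density predicted by the constant Jacobian. All coefficients below are arbitrary illustration values:

```python
import math
import random

random.seed(2)
# Linear map x = g10 + g11*u + g12*v,  y = g20 + g21*u + g22*v
g10, g11, g12 = 0.5, 2.0, 1.0
g20, g21, g22 = -1.0, 0.5, 1.5
detJ = g11 * g22 - g12 * g21              # constant Jacobian (= 2.5 here)

phi = lambda x, y: math.exp(-0.5 * (x * x + y * y)) / (2 * math.pi)  # Phi(x,y)

# Predicted density of (u,v) at a test point, by the transformation theorem
u0, v0 = 0.3, -0.2
psi_pred = abs(detJ) * phi(g10 + g11 * u0 + g12 * v0, g20 + g21 * u0 + g22 * v0)

# Empirical density: draw (x,y) from Phi, invert the linear map, and count
# the points falling in a small cell around (u0, v0).
inv = 1.0 / detJ
half = 0.1                                 # half-width of the counting cell
hits, n = 0, 400_000
for _ in range(n):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    a, b = x - g10, y - g20
    u = inv * (g22 * a - g12 * b)
    v = inv * (-g21 * a + g11 * b)
    if abs(u - u0) < half and abs(v - v0) < half:
        hits += 1
psi_emp = hits / (n * (2 * half) ** 2)
```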
At corresponding points of the u, v and x, y coordinate systems the values of the frequency functions Ψ(u,v) and Φ(x,y) differ only by a constant factor represented by the Jacobian. The most frequent application of a linear transformation is found in the rotation of the coordinate axes through an angle α. For such a transformation by rotation γ₁₁ = γ₂₂ = cos α and γ₂₁ = −γ₁₂ = sin α, so that the Jacobian equals cos²α + sin²α = 1.

2. Transform Φ(x,y) to the polar coordinates r, φ, with x = r cos φ, y = r sin φ and 0 ≤ φ ≤ 2π. Here

    J (x,y)/(r,φ) = | cos φ      sin φ   |
                    | −r sin φ   r cos φ | = r ,

and by (1.157) the transformation equation is Ψ(r,φ) = r Φ(r cos φ, r sin φ).
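For the circular normal Φ(x,y) = e^(−(x²+y²)/2)/(2π), the polar result Ψ(r,φ) = r Φ(r cos φ, r sin φ) integrates over φ to the density r e^(−r²/2); a short simulation can check this (a sketch, not an example from the text):

```python
import math
import random

random.seed(3)
# Draw (x, y) from the circular normal and histogram r = sqrt(x^2 + y^2);
# the empirical density near r = 1 should match r * exp(-r^2/2).
n = 200_000
rs = [math.hypot(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

r0, half = 1.0, 0.05                      # histogram cell around r = 1
frac = sum(1 for r in rs if abs(r - r0) < half) / n
dens_emp = frac / (2 * half)
dens_pred = r0 * math.exp(-r0 * r0 / 2.0)
```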
The integration limits are now functions of u. It is easy to see that in equation (1.163) the integration limits are a and b, while in (1.164) the limits are x(u, a) and x(u, b), where x = x(u, y) is the solution for x of the functional relation u = u(x, y). The problem that we are discussing in this section may be stated in a more general and more symmetrical form: If we have three variables x, y, z that are subject to a condition f(x, y, z) = 0, the frequency function Φ(x, y, z) of the three variables is also subject to certain conditions. We wish to formulate these conditions.
The equation f(x,y,z) = 0 defines a surface in x, y, z space; the condition between the variables means that the distribution of the points x, y, z is confined to this surface. For the variables x, y, z there exists a perfect correlation, and the equation f(x,y,z) = 0 represents the three identical regression surfaces. In the case of a given condition f(x,y,z) = 0 the trivariate frequency function Φ(x,y,z) evidently is completely specified by one of the bivariate marginal distributions. From any of the bivariate marginal distributions, say Φ_xy(x, y), we may thus find the univariate marginal distribution Φ_z(z) of the third variable:

    Φ_z(z) = ∫_{−∞}^{+∞} Φ_xy[x(y,z), y] | ∂x(y,z)/∂z | dy        (1.165)

    Φ_z(z) = ∫_{−∞}^{+∞} Φ_xy[x, y(x,z)] | ∂y(x,z)/∂z | dx .      (1.166)

These are the two equations (1.163) and (1.164) in which u has been replaced by z, while x = x(y,z) and y = y(x,z) are the solutions of the equation f(x, y, z) = 0 for x and for y. Two similar pairs of equations expressing conditions to which Φ(x, y, z) is subject are obtained by cyclical change of variables and subscripts. There are also simple relationships between pairs of bivariate marginal distributions, such as Φ_yz(y,z) and Φ_xz(x,z) …
§ 1.34 The case of a function of two independent variables

A bivariate frequency function of two stochastically independent variables has the form Φ(x,y) = Φ_x(x) Φ_y(y). In this case equation (1.162), which we developed in § 1.33, takes on the form

    Ψ(u) = ∫_{−∞}^{+∞} Φ_x[x(u,v)] Φ_y[y(u,v)] | J (x,y)/(u,v) | dv .        (1.167)
Since the Jacobian (which is determined from the given function u = u(x, y) and the arbitrarily assumed function v = v(x, y)) is known, equation (1.167) represents a relation between the frequency functions of three variables, of which two, x and y, are independent. Such a relation can be used to determine any one of the three functions provided the other two are known. The most frequently encountered astronomical applications are those in which Ψ(u), the distribution of the variable u (which is a function of the independent variables x and y), is established by observation, and Φ_y(y) is known either from theory or from observations. Our problem is then to determine the unknown function Φ_x(x) through equation (1.167). In the solution of this problem it is generally advantageous to use the form (1.164), namely,

    Ψ(u) = ∫_{−∞}^{+∞} Φ_x(x) Φ_y[y(x,u)] | ∂y(x,u)/∂u | dx .        (1.168)
For the determination of Φ_x(x) the relation (1.168) is to be considered as an integral equation, that is, an equation in which the unknown function appears under the sign of integration. If for physical reasons the variable x, whose frequency function is to be determined, can take only values within a limited range, the integration need be extended only over the actual range. For simplicity we shall always write the integration limits as −∞ and +∞ with the understanding that Φ_x(x) = 0 in the range where x does not occur. When the other variable y has a limited range a ≤ y ≤ b, the integration limits have to be determined as shown in § 1.33, and equation (1.168) becomes:
    Ψ(u) = ∫_{x(a,u)}^{x(b,u)} Φ_x(x) Φ_y[y(x,u)] | ∂y(x,u)/∂u | dx .        (1.169)
The integral equation now has integration limits which are functions of u. In most practical applications the functional relationship between the three variables u, x, and y can be set up in the form

    f₃(u) = f₁(x) + f₂(y) .        (1.170)

For convenience, we may then introduce three auxiliary variables:

    ξ = f₁(x) ,    η = f₂(y) ,    ζ = f₃(u) .        (1.171)
When transformed to these variables, the frequency functions Φ_x(x), Φ_y(y), and Ψ(u) become, according to (1.11),

    Φ_ξ(ξ) = Φ_x[x(ξ)] | dx(ξ)/dξ | ,
    Φ_η(η) = Φ_y[y(η)] | dy(η)/dη | ,        (1.172)
    Ψ_ζ(ζ) = Ψ[u(ζ)] | du(ζ)/dζ | .

Here

    x = x(ξ) ,    y = y(η) ,    u = u(ζ)        (1.173)

are the solutions of the three equations (1.171) for x, y, u. It is evident that ξ and η are independent whenever x and y are independent. In terms of the auxiliary variables, the dependent variable is simply the sum of the two independent variables,

    ζ = ξ + η ,        (1.174)

and the relationship between the three auxiliary frequency functions becomes

    Ψ_ζ(ζ) = ∫_{−∞}^{+∞} Φ_ξ(t) Φ_η(ζ − t) dt .        (1.175)
We shall speak of this as the standard form of the integral equation. It is a generally advantageous form to employ when numerical methods are used in the solution of the equation. While the roles of the two independent variables x and y (or ξ and η) are interchangeable, the frequency function on the left side of equations (1.168) or (1.175) must always be that of the dependent variable. If each side of equation (1.168) or (1.175) is multiplied by N, the number of individuals in the sample or finite population under study, the integral equation refers to absolute frequency functions. It should be noted, however, that the factor N transforms only one of the two functions appearing under the integral sign into an absolute frequency function. One of the functions under the integral sign must always be a relative frequency function.
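The standard form lends itself to a direct numerical check. In the sketch below the two independent frequency functions are assumed normal, so the convolution (1.175) can be compared with the known analytic result (a normal with mean a₁ + a₂ and variance s₁² + s₂²):

```python
import math

def normal(x, a, s):
    """Normal frequency function with mean a and dispersion s."""
    return math.exp(-0.5 * ((x - a) / s) ** 2) / (s * math.sqrt(2 * math.pi))

a1, s1 = 0.5, 1.0          # assumed parameters of Phi_xi
a2, s2 = -0.2, 0.7         # assumed parameters of Phi_eta
zeta = 1.1                 # point at which Psi_zeta is evaluated

# Riemann sum for Psi(zeta) = integral of Phi_xi(t) * Phi_eta(zeta - t) dt
h, L = 0.01, 8.0
ts = [-L + i * h for i in range(int(2 * L / h) + 1)]
psi_num = h * sum(normal(t, a1, s1) * normal(zeta - t, a2, s2) for t in ts)

# Analytic result: means add, variances add
psi_exact = normal(zeta, a1 + a2, math.hypot(s1, s2))
```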
The relationship between the frequency functions of three variables, two of which are independent, may be generalized to the case where no two of the three variables are stochastically independent, but where we consider the array distribution of one of the variables to be known. Even if x and y are not independent, we can write their bivariate distribution in the form

    Φ(x,y) = Φ_x(x) Φ_{y|x}(y|x) ,

where …
… u₁, u₂, u₃, ···, u_n; then the frequency function of the u's is given by the expression

    Ψ(u₁, u₂, ···, u_n) =
        Φ[x₁(u₁, u₂, ···, u_n), x₂(u₁, u₂, ···, u_n), ···, x_n(u₁, u₂, ···, u_n)]
        × | J (x₁, x₂, ···, x_n)/(u₁, u₂, ···, u_n) | .        (1.182)

The Jacobian

    J (x₁, x₂, x₃, ···, x_n)/(u₁, u₂, u₃, ···, u_n) =

        | ∂x₁/∂u₁    ∂x₂/∂u₁    ···    ∂x_n/∂u₁  |
        | ∂x₁/∂u₂    ∂x₂/∂u₂    ···    ∂x_n/∂u₂  |        (1.183)
        |   ···        ···       ···      ···     |
        | ∂x₁/∂u_n   ∂x₂/∂u_n   ···    ∂x_n/∂u_n |
must always be taken with its absolute value. The most important application of multivariate transformations in astronomy is the transformation of three-dimensional space coordinates or vector components from one set of coordinates to another. In this case it is often convenient to use a formula analogous to (1.154),

    Ψ(u,v,w) du dv dw = Φ[x(u,v,w), y(u,v,w), z(u,v,w)] dV ,        (1.184)

where dV is the volume element in the x, y, z space that is bounded by the surfaces corresponding to u ± du/2, v ± dv/2, w ± dw/2. The expression for this volume element in terms of u, v, w, du, dv, dw may be established directly from geometrical considerations.

Examples: 1. Evaluate the Jacobian occurring in a transformation of orthogonal space coordinates by rotation. The old coordinates x₁, x₂, x₃ are related to the new coordinates u₁, u₂, u₃ through the equations

    x_i = γ_{i1} u₁ + γ_{i2} u₂ + γ_{i3} u₃ ,
where γ_{ij} is the direction cosine between the x_i axis and the u_j axis. The Jacobian is the determinant of the direction cosines,

    J (x₁, x₂, x₃)/(u₁, u₂, u₃) = | γ₁₁  γ₁₂  γ₁₃ |
                                  | γ₂₁  γ₂₂  γ₂₃ |
                                  | γ₃₁  γ₃₂  γ₃₃ | ,

and its square is

    J² = | γ₁₁  γ₁₂  γ₁₃ |   | γ₁₁  γ₂₁  γ₃₁ |     | 1  0  0 |
         | γ₂₁  γ₂₂  γ₂₃ | × | γ₁₂  γ₂₂  γ₃₂ |  =  | 0  1  0 |  =  1 ,
         | γ₃₁  γ₃₂  γ₃₃ |   | γ₁₃  γ₂₃  γ₃₃ |     | 0  0  1 |

since the orthogonality of the coordinate system requires that

    Σ_i γ_{ij} γ_{ik} = 0   (j ≠ k)        and        Σ_i γ_{ij}² = 1 .
Our result J(x₁,x₂,x₃)/(u₁,u₂,u₃) = 1 is easily verified by geometrical considerations. The volume element dV in the x₁, x₂, x₃ space bounded by the planes corresponding to u₁ ± du₁/2, u₂ ± du₂/2, u₃ ± du₃/2 is of rectangular shape with sides du₁, du₂, du₃. We have therefore

    J (x₁,x₂,x₃)/(u₁,u₂,u₃) du₁ du₂ du₃ = dV = du₁ du₂ du₃ ,
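The unit Jacobian of a rotation can be confirmed in a few lines; the rotation below (about the x₃ axis, by an arbitrary angle) is an assumed illustration:

```python
import math

# Direction-cosine matrix of a rotation about the x3 axis by angle a
a = 0.6
gamma = [
    [math.cos(a), -math.sin(a), 0.0],
    [math.sin(a),  math.cos(a), 0.0],
    [0.0,          0.0,         1.0],
]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

J = det3(gamma)

# Orthogonality: each row of direction cosines has unit norm
row_norms = [sum(c * c for c in row) for row in gamma]
```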
which shows that the Jacobian equals unity.

2. Transform the frequency function Φ(x,y,z), where x, y, z form an orthogonal system of space coordinates, to a frequency function Ψ(r, l, b), where r, l, b refer to a system of spherical polar coordinates so oriented that the positive z axis of the x, y, z system points to the pole of the spherical system, and the positive x axis toward the point l = 0, b = 0. Under these conditions

    x = r cos l cos b
    y = r sin l cos b        (1.185)
    z = r sin b ,

and

    J (x₁,x₂,x₃)/(r, l, b) = | cos l cos b    −r sin l cos b    −r cos l sin b |
                             | sin l cos b     r cos l cos b    −r sin l sin b |
                             | sin b           0                 r cos b       |  =  r² cos b .
We might also proceed from geometrical considerations (see Figure 1.17) and use equation (1.184). The surfaces in x, y, z space corresponding to r ± dr/2 are concentric spheres, those corresponding to l ± dl/2 are planes, those corresponding to b ± db/2 are coaxial cones. All these surfaces intersect each other
at right angles; the sides of the rectangular volume element are dr, r cos b dl, r db, and the volume is

    dV = r² cos b dr dl db .

By (1.184) the transformation equation is thus

    Ψ(r, l, b) = Φ(r cos l cos b, r sin l cos b, r sin b) r² cos b .        (1.186)
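Equation (1.186) implies, for any spherically symmetric Φ, that the latitude b is distributed as ½ cos b; hence the fraction of points with |b| < 30° should be sin 30° = 0.5. A quick simulated check (an illustration, not from the text):

```python
import math
import random

random.seed(5)
# Sample an isotropic trivariate normal and convert to latitude b;
# count the fraction of points with |b| < 30 degrees.
n = 200_000
inside = 0
for _ in range(n):
    x, y, z = (random.gauss(0, 1) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    s = max(-1.0, min(1.0, z / r))      # clamp against rounding
    b = math.asin(s)
    if abs(b) < math.radians(30.0):
        inside += 1
frac = inside / n                        # should be close to 0.5
```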
3. A trivariate normal frequency function with equal dispersion in all directions (Maxwellian distribution) has the equation

    Φ(x,y,z) = (1/(π^(3/2) w³)) exp[−(x² + y² + z²)/w²] .

In terms of spherical polar coordinates as defined by equations (1.185) such a Maxwellian distribution takes the form
    Ψ(r, l, b) = (1/(π^(3/2) w³)) exp[−(r/w)²] r² cos b ,
as we can see from equation (1.186).

§ 1.36 Distribution of a set of functions of n variables
The problems discussed in § 1.33 and § 1.34 can easily be generalized for the case of multivariate distributions. Given the frequency function Φ(x₁, x₂, ···, x_n) of the n variables x₁, x₂, ···, x_n, we wish to find the frequency function Ψ(u₁, u₂, ···, u_i) of i known functions of the variables x₁, x₂, ···, x_n:

    u₁ = u₁(x₁, x₂, ···, x_n)
    u₂ = u₂(x₁, x₂, ···, x_n)
        ···
    u_i = u_i(x₁, x₂, ···, x_n) ,
where i ^.n. The first step is to transform the distribution $ to a new set of n variables that consist of the i variables ult u2, ' ' ', u. and n — i arbitrary variables which we define as functions of the x's. The second step consists in integrating the transformed function over the n—i arbitrary variables to obtain the desired distribution of the i variables u. In general, the transformation may be effected most readily by choosing any i of the original x's, for instance xJt x2, ' ' ', x., and determining their values in terms of the u's and the remaining n—i variables x; x1
= X1(u1, U2, ' '
Uit xin,
x2
- x2(ult
li£,
U2,
X.+2,
'
xn)
, X£+2 . • • • , Xn)
$$\cdots$$
$$x_i = x_i(u_1, u_2, \cdots, u_i, x_{i+1}, x_{i+2}, \cdots, x_n)$$

Leaving the variables x_{i+1}, x_{i+2}, ⋯, xₙ unchanged, the transformation gives the frequency function

$$\Phi\left[x_1(u_1, \cdots, u_i, x_{i+1}, \cdots, x_n), \cdots, x_i(u_1, \cdots, u_i, x_{i+1}, \cdots, x_n),\, x_{i+1}, \cdots, x_n\right] \left| J\!\left(\frac{x_1, x_2, \cdots, x_i}{u_1, u_2, \cdots, u_i}\right) \right| .$$
Since the variables x_{i+1}, ⋯, xₙ are independent of the u's, they may be omitted from the Jacobian. By integration over the variables x_{i+1}, ⋯, xₙ we obtain the desired frequency function of the i variables u:
$$\Psi(u_1, u_2, \cdots, u_i) = \int_{-\infty}^{+\infty} \!\!\cdots\! \int_{-\infty}^{+\infty} \Phi\left[x_1(u_1, \cdots, u_i, x_{i+1}, \cdots, x_n), \cdots, x_i(u_1, \cdots, u_i, x_{i+1}, \cdots, x_n),\, x_{i+1}, \cdots, x_n\right] \left| J\!\left(\frac{x_1, x_2, \cdots, x_i}{u_1, u_2, \cdots, u_i}\right) \right| dx_{i+1}\, dx_{i+2} \cdots dx_n \,. \tag{1.187}$$
If among the n variables x there is a set of i variables x₁, x₂, ⋯, xᵢ which are stochastically independent of the remaining n − i variables, the frequency function Φ factors into a product and the integrations are correspondingly simplified.

As an example, suppose that the true values x, y of two observed quantities have the bivariate distribution F_t(x, y), and that the observational errors ε₁, ε₂ are normally distributed,

$$\phi_1(\varepsilon_1) = \frac{1}{\sigma_1 \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\varepsilon_1}{\sigma_1}\right)^2} \,, \qquad \phi_2(\varepsilon_2) = \frac{1}{\sigma_2 \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\varepsilon_2}{\sigma_2}\right)^2} \,,$$

with the mean errors σ₁ and σ₂. The observed values ξ, η are functions of these four variables defined by

$$\xi = x + \varepsilon_1 \,, \qquad \eta = y + \varepsilon_2 \,.$$

If we transform the distribution of x, y, ε₁, ε₂ to the variables ξ, η, ε₁, ε₂ and integrate over ε₁, ε₂, we obtain for the bivariate distribution F₀(ξ, η) the expression

$$F_0(\xi, \eta) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} F_t(\xi - \varepsilon_1,\, \eta - \varepsilon_2)\, \phi_1(\varepsilon_1)\, \phi_2(\varepsilon_2)\, d\varepsilon_1\, d\varepsilon_2 \,. \tag{1.188}$$
Of course, we could also transform from the variables x, y, ε₁, ε₂ to x, y, ξ, η and integrate over x and y, which leads to a second form of the equation:

$$F_0(\xi, \eta) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} F_t(x, y)\, \phi_1(\xi - x)\, \phi_2(\eta - y)\, dx\, dy \,. \tag{1.189}$$
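A one-dimensional analogue of this convolution may be evaluated numerically. The sketch below is not in the original (NumPy assumed, dispersions chosen arbitrarily); it convolves a true distribution with an error distribution and recovers the quadratic addition of dispersions, σ_obs² = σ_x² + σ₁².

```python
import numpy as np

dx = 0.01
x = np.arange(-10, 10, dx)

def normal(u, s):
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2 * np.pi))

sigma_x, sigma_1 = 1.0, 0.6        # true dispersion and mean error (arbitrary values)
F_t = normal(x, sigma_x)           # true distribution
phi = normal(x, sigma_1)           # error distribution
F_0 = np.convolve(F_t, phi, mode="same") * dx   # discrete form of the convolution integral

# Variance of the observed (convolved) distribution:
mean = np.sum(x * F_0) / np.sum(F_0)
var_obs = np.sum((x - mean) ** 2 * F_0) / np.sum(F_0)
print(var_obs, sigma_x**2 + sigma_1**2)   # approximately equal
```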
When the mean errors σ₁ and σ₂, which define the error distributions φ₁ and φ₂, are known, the dispersions of the observed and of the true distributions are related by

$$\sigma_\xi^2 = \sigma_x^2 + \sigma_1^2 \,, \qquad \sigma_\eta^2 = \sigma_y^2 + \sigma_2^2 \,.$$

The characteristic function of a normal frequency function with mean y₀ and dispersion σ_y,

$$\lambda(t) = \int_{-\infty}^{+\infty} e^{ity}\, \Phi(y)\, dy = e^{i y_0 t - \frac{1}{2}\sigma_y^2 t^2} \,, \tag{1.221}$$

shows this directly: for stochastically independent variables the characteristic function of the sum is the product of the individual characteristic functions, so that for x = y + z we find

$$x_0 = y_0 + z_0 \,, \qquad \sigma_x^2 = \sigma_y^2 + \sigma_z^2 \,.$$
When two variables y, z are stochastically independent and normally distributed, their sum x = y + z also has a normal distribution.

§ 1.45 Solution by interpolation functions or interpolation series

Integral equations either of Volterra's first kind or of Fredholm's first kind are
often solved in practice by introducing a general interpolation formula with a number of unknown parameters as representing the unknown function U(y). If the kernel is given in analytic form or represented by another interpolation formula, the integral equation

$$F(x) = \int_{-\infty}^{+\infty} U(y)\, K(x, y)\, dy$$
will furnish an interpolation formula for F(x), provided the integration can be performed. The interpolation formula for F(x) thus obtained will contain the unknown parameters of the formula adopted for U(y), and these parameters may then be so determined that the formula for F(x) fits the given data of F(x) as closely as possible. If a sufficiently close fit is not possible with any values of the parameters, the formula adopted for U(y) is not suitable and must be replaced by another. If a satisfactory fit of F(x) is obtained, the numerical values of the parameters when substituted in the adopted formula of U(y) give an approximate representation of the unknown function U(y). To illustrate the procedure let us consider the following example. We assume the integral equation to be of the form

$$F(x) = \int_{-\infty}^{+\infty} U(y)\, K(x - y)\, dy \,, \tag{1.224}$$
where the kernel is a normal frequency function

$$K(z) = \frac{1}{\sigma_z \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{z - z_0}{\sigma_z}\right)^2} \tag{1.225}$$

with the known parameters z₀ and σ_z. For the unknown function U(y) we adopt as an interpolation formula the sum of two normal frequency functions,

$$U(y) = \frac{\nu}{\sigma_{y1}\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - y_{01}}{\sigma_{y1}}\right)^2} + \frac{1 - \nu}{\sigma_{y2}\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - y_{02}}{\sigma_{y2}}\right)^2} \,, \tag{1.226}$$

with the unknown parameters ν, y₀₁, σ_{y1}, y₀₂, σ_{y2}. Substituting the expressions (1.225) for K(z) and (1.226) for U(y) in the integral equation (1.224), we find two integrals, each the convolution of two normal frequency functions.
According to equation (1.223) the interpolation formula for F(x) takes the form

$$F(x) = \frac{\nu}{\sigma_{x1}}\, \phi\!\left(\frac{x - x_{01}}{\sigma_{x1}}\right) + \frac{1 - \nu}{\sigma_{x2}}\, \phi\!\left(\frac{x - x_{02}}{\sigma_{x2}}\right) \,. \tag{1.227}$$
The question now arises whether it is possible with this formula to obtain a satisfactory representation of the given function F(x). The latter may, for instance, consist of an observed distribution table. If we can find a set of numerical values of the parameters ν, x₀₁, σ_{x1}, x₀₂, σ_{x2} that represent the observed distribution table of F(x) satisfactorily, the parameters ν, y₀₁, σ_{y1}, y₀₂, σ_{y2} of formula (1.226) can be derived from the known values of z₀, σ_z, ν, x₀₁, σ_{x1}, x₀₂, σ_{x2} by the relations

$$y_{01} = x_{01} - z_0 \,, \qquad \sigma_{y1}^2 = \sigma_{x1}^2 - \sigma_z^2 \,,$$
$$y_{02} = x_{02} - z_0 \,, \qquad \sigma_{y2}^2 = \sigma_{x2}^2 - \sigma_z^2 \,. \tag{1.228}$$
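The two-component solution can be verified numerically. In the sketch below (not in the original; NumPy assumed, all parameter values arbitrary) the convolution of a two-Gaussian U(y) with a normal kernel is compared with the closed two-Gaussian form whose parameters follow the relations (1.228).

```python
import numpy as np

dx = 0.01
y = np.arange(-15, 15, dx)

def normal(u, u0, s):
    return np.exp(-0.5 * ((u - u0) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Arbitrary parameter values for U(y) of (1.226) and the kernel (1.225):
nu, y01, sy1, y02, sy2 = 0.4, -2.0, 0.8, 1.5, 1.2
z0, sz = 0.5, 0.6

U = nu * normal(y, y01, sy1) + (1 - nu) * normal(y, y02, sy2)
K = normal(y, z0, sz)
F_numeric = np.convolve(U, K, mode="same") * dx   # left-hand side of (1.224)

# Closed form of type (1.227), parameters related as in (1.228):
x01, sx1 = y01 + z0, np.hypot(sy1, sz)
x02, sx2 = y02 + z0, np.hypot(sy2, sz)
F_analytic = nu * normal(y, x01, sx1) + (1 - nu) * normal(y, x02, sx2)

print(np.max(np.abs(F_numeric - F_analytic)))   # only discretization error remains
```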
The formula (1.226) with these parameter values is then taken as a representation of the function U(y). Evidently we could proceed in a similar manner when the interpolation formula (1.226) consists of a sum of more than two normal frequency functions. We draw attention to the fact that in the method of Fourier transforms of § 1.44 we started with analytic expressions or interpolation formulae for the two known functions F(x) and K(z), while in the method now under discussion we represent the unknown function U(y) and the kernel K(z) by interpolation formulae. In practical problems the latter procedure often has an advantage for the following reasons.

(i) In statistical problems the function U(y) is usually a frequency function which must always be positive. From physical considerations we may often infer the general nature of the unknown function U(y), such as degree of continuity, frequency of maxima and minima, range of the variable where the frequency is appreciable, and so forth. Such general information can be taken into account by the choice of an appropriate interpolation formula for U(y).

(ii) When the function F(x) is an empirically established frequency function, the frequencies F(x) tabulated for successive intervals of x are subject to statistical irregularities which we may call "statistical errors". In such a case the solution U(y) when substituted in the integral on the right hand side of equation (1.224) does not have to match the observed data of F(x) exactly; it merely has to represent them within the limits of statistical errors. A formal solution U(y) which reproduces the given data of F(x) exactly with all their "statistical errors" may take negative values and show large fluctuations that have no physical significance.

Instead of closed formulae like (1.225) and (1.226) we may use infinite series to represent the unknown function U(y) and the kernel K(z).
We take as an example the Gram-Charlier series by which we may approximate any frequency function
that approaches 0 at ± ∞, provided we include a sufficient number of terms. We set:

$$U(y) = \phi_y(y) + A_{3y}\, \phi_y'''(y) + A_{4y}\, \phi_y^{\mathrm{iv}}(y) + \cdots \tag{1.229}$$
$$K(z) = \phi_z(z) + A_{3z}\, \phi_z'''(z) + A_{4z}\, \phi_z^{\mathrm{iv}}(z) + \cdots \,,$$

where

$$\phi_y(y) = \frac{1}{\sigma_y \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - y_0}{\sigma_y}\right)^2} \,, \qquad \phi_z(z) = \frac{1}{\sigma_z \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{z - z_0}{\sigma_z}\right)^2} \,.$$
These series substituted into the integral equation (1.224) give

$$F(x) = \int_{-\infty}^{+\infty} \left[\phi_y(y) + A_{3y}\, \phi_y'''(y) + \cdots\right] \left[\phi_z(x - y) + A_{3z}\, \phi_z'''(x - y) + A_{4z}\, \phi_z^{\mathrm{iv}}(x - y) + \cdots\right] dy \,.$$

We integrate the individual terms and make use of the fact that

$$\int_{-\infty}^{+\infty} \phi_y^{(m)}(y)\, \phi_z^{(n)}(x - y)\, dy = \phi_x^{(m+n)}(x) \,, \qquad x_0 = y_0 + z_0 \,, \quad \sigma_x^2 = \sigma_y^2 + \sigma_z^2 \,.$$
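For a standard normal φ the derivatives in (1.229) reduce to Hermite polynomials, φ‴(y) = −(y³ − 3y)φ(y) and φⁱᵛ(y) = (y⁴ − 6y² + 3)φ(y), so a truncated series is easy to evaluate. The sketch below is not in the original (NumPy assumed, coefficients A₃ and A₄ arbitrary); it also checks that the derivative terms leave the total frequency equal to 1.

```python
import numpy as np

def phi(y):
    # Standard normal frequency function.
    return np.exp(-0.5 * y * y) / np.sqrt(2 * np.pi)

def gram_charlier(y, A3, A4):
    # Truncated series of type (1.229): phi + A3*phi''' + A4*phi''''.
    return phi(y) * (1.0 - A3 * (y**3 - 3*y) + A4 * (y**4 - 6*y**2 + 3))

y = np.arange(-10, 10, 0.001)
U = gram_charlier(y, A3=0.05, A4=0.01)
print(U.sum() * 0.001)   # total frequency stays 1: derivative terms integrate to zero
```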
where y₁, y₂, ⋯, yₙ are the roots of the equation Pₙ(y) = 0. At these n points the interpolation formula G₁(x, y) represents the function G(x, y) exactly:

$$G_1(x, y_k) = G(x, y_k) \,.$$
In the integral (1.236) we now substitute the interpolation formula (1.237) for G(x, y):

$$\int_a^b K(x, y)\, U(y)\, dy \approx \int_a^b w(y) \sum_{k=1}^{n} G(x, y_k)\, \frac{P_n(y)}{(y - y_k)\, P_n'(y_k)}\, dy \,. \tag{1.238}$$

The coefficients

$$A_k = \int_a^b w(y)\, \frac{P_n(y)}{(y - y_k)\, P_n'(y_k)}\, dy \tag{1.239}$$

depend on the choice of the function w(y) and on the values of the roots of the polynomial Pₙ(y) used for the interpolation. By means of (1.238) we can express our integral equation (1.235) as a sum

$$F(x) = \int_a^b K(x, y)\, U(y)\, dy \approx \sum_{k=1}^{n} W_k\, K(x, y_k)\, U(y_k) \,,$$

where the weight factors Wₖ follow from the Aₖ by (1.241). Written down for each of the observed arguments x₁, x₂, ⋯, xₘ, this gives the set of linear equations

$$F(x_m) = W_1 K(x_m, y_1)\, U(y_1) + \cdots + W_n K(x_m, y_n)\, U(y_n) \,. \tag{1.242}$$
This set of linear equations differs from that (1.232) which we used in § 1.46 by the weight factors Wₖ and by the fact that the arguments yₖ in (1.242) are not equally spaced, but are particularly selected points. The order n of the polynomial used for the interpolation function (1.237) determines the number of unknowns U(yₖ) introduced; it can always be chosen so that the number (n) of unknowns is equal to or smaller than the number (m) of equations. The solution of equations (1.242) is, of course, obtained by a procedure quite similar to that discussed for the solution of equations (1.232). The advantage that we have gained over the method of § 1.46 is that by the Gaussian method we need to introduce only a relatively small number of unknown values U(yₖ) in order to obtain a relatively high accuracy in the evaluation of the integral. If the kernel K(x, y) is given in numerical form, the values of K for the particular arguments xᵢ and yₖ must be derived by interpolation. The results for U(yₖ) obtained from the equations (1.242) apply to arguments yₖ of unequal spacing. A table of the function U(y) with equal intervals may be constructed by interpolation. Graphical interpolation by plotting the values of U(yₖ) against yₖ and joining the plots by a smooth curve will generally be satisfactory. If
desirable, a Lagrangian interpolation formula

$$U(y) = \sum_{k=1}^{n} U(y_k)\, \frac{P_n(y)}{(y - y_k)\, P_n'(y_k)}$$
can also be used for tabulating the function U(y). We have now to discuss the choice of the arguments yₖ and the calculation of the weight factors Wₖ. The function w(y) and the values of yₖ are specified for each order n in such a way that the polynomials Pₙ(y) are orthogonal with respect to the function w(y) in the integration interval a, b. Thus

$$\int_a^b w(y)\, P_n(y)\, P_m(y)\, dy = \begin{cases} 0 & n \neq m \\ g(n) & n = m \,. \end{cases}$$
The functions w(y) and the types of polynomials which fulfill these conditions depend on the integration limits; hence for the various cases of integration limits the values of the roots yₖ and the weight factors Wₖ can be tabulated for polynomials of different orders. Methods for evaluating the integrals (1.239) which furnish the Aₖ are developed in the theory of orthogonal functions. Once the values Aₖ have been evaluated, the weight factors can be found by (1.241).
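The scheme can be sketched for the simplest case w(y) = 1 on a finite interval, where the Pₙ are the Legendre polynomials and the roots and weights are tabulated (here taken from NumPy). The example below is not in the original; the values F(xⱼ) are synthetic, generated from the same quadrature rule so that a linear system of the form (1.242) is exactly consistent, and the kernel and U are arbitrary.

```python
import numpy as np

n = 8
yk, Wk = np.polynomial.legendre.leggauss(n)   # roots y_k of P_n(y) and weights W_k

# Quadrature check on a smooth integrand over [-1, +1]:
print(np.sum(Wk * np.cos(yk)), 2 * np.sin(1.0))   # nearly equal

def K(x, y):
    # Arbitrary smooth kernel.
    return np.exp(-0.5 * ((x - y) / 0.2) ** 2)

U_true = 1.0 - yk**2                          # assumed "unknown" function at the nodes
A = Wk[np.newaxis, :] * K(yk[:, np.newaxis], yk[np.newaxis, :])
F = A @ U_true                                # F(x_j) = sum_k W_k K(x_j, y_k) U(y_k)

U_rec = np.linalg.solve(A, F)                 # solve the linear system for U(y_k)
print(np.max(np.abs(U_rec - U_true)))         # recovery at the nodes
```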
A conditioned distribution of one variable x,

$$\Phi(x \mid u) \,, \qquad u = u(x, y) \,,$$
is evidently related to the bivariate distribution Ψ(x, y) which includes as variables x as well as the additional variable (or variables) contained in the condition. An empirical determination of a conditioned distribution Φ(x|u) may thus be useful, in combination with other statistical data, for the study of the bivariate distribution Ψ(x, y). Since the complete, unconditioned distribution of x is the marginal distribution Ψ(x) of the bivariate distribution Ψ(x, y), we may through the intermediary of Ψ(x, y) establish a relation between the conditioned and the non-conditioned distributions. This relation will serve to reduce the conditioned distribution to a complete one when the necessary additional information is obtainable. From the bivariate distribution Φ(x, u) we find the number of objects in the selected population of the interval u ± Δu/2 to a first approximation as N Φₐ(u) Δu, where Φₐ(u) is the marginal distribution of u, and N is the total number of individuals in the population. The absolute conditioned distribution F(x | u ± Δu/2) is then related to the bivariate distribution Φ(x, u).

If Φ(x) represents the relative frequency function of grain weights x for this infinite population, the probability that a single grain picked out at random has a weight x between the limits x ± Δx/2 is
$$p = \int_{x - \Delta x/2}^{\,x + \Delta x/2} \Phi(x)\, dx \,. \tag{1.328}$$
This probability corresponds, for instance, to the probability p = 1/13 that a card drawn at random from a deck of 52 cards is an ace. The probability that a wheat grain chosen at random does not have a weight within the specified limits is

$$q = 1 - p \,. \tag{1.329}$$
We consider now a sample of N grains drawn one after the other. The probability that the first n of these fulfill the weight specification, while the following N − n grains do not, is

$$p^n\, q^{N-n} \,.$$
We do not care, however, in what order the n grains of proper weight appear. The number of ways in which n objects can be ordered among N places is given by the binomial coefficient

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!} \,.$$
The probability Π(n) that among N grains selected at random we should obtain, in any order, just n grains of the specified weight is therefore

$$\Pi(n) = \frac{N!}{n!\,(N-n)!}\, p^n\, q^{N-n} \,. \tag{1.330}$$
This is the so-called Bernoulli or binomial distribution. The sample count n may be any integer from 0 to N. The parameters p and q = 1 − p defined by (1.328) and (1.329) depend upon the chosen weight interval x ± Δx/2 and upon the frequency function of weights Φ(x) in the parent population. Instead of the probability p we may introduce the expectation n₀, the mean sample count for an infinite number of samples, which according to (1.327) and (1.328) is

$$n_0 = N p \,. \tag{1.331}$$

With the two parameters N and n₀ the binomial distribution takes the form

$$\Pi(n) = \frac{N!}{n!\,(N-n)!} \left(\frac{n_0}{N}\right)^{n} \left(1 - \frac{n_0}{N}\right)^{N-n} \,. \tag{1.332}$$

On account of the factorials the formulae (1.330) or (1.332) are inconvenient for calculation when N is large. For large samples it is, however, generally possible to approximate the binomial distribution by simpler formulae.

§ 1.63 Poisson distribution

Of great practical interest is the case in which the sample is large and the interval of the variable studied is sufficiently narrow so that it contains only a very small fraction p of the parent population. We shall consider the limit to which the binomial distribution converges as N increases toward infinity while, at the same time, p decreases in such a manner that the expectation n₀ = Np remains finite. We write formula (1.332) as
$$\Pi(n) = \frac{n_0^n}{n!} \left(1 - \frac{n_0}{N}\right)^{N} \left[\frac{N(N-1)\cdots(N-n+1)}{N^n} \left(1 - \frac{n_0}{N}\right)^{-n}\right] .$$
When N increases toward infinity while n₀ and n remain finite, the square bracket on the right hand side converges to the value 1, and

$$\lim_{N \to \infty} \left(1 - \frac{n_0}{N}\right)^{N} = 1 - n_0 + \frac{n_0^2}{2!} - \cdots = e^{-n_0} \,.$$

Combining these limiting values we obtain the Poisson distribution

$$\Pi(n) = \frac{n_0^n}{n!}\, e^{-n_0} \,, \tag{1.333}$$
which is applicable when the sample is large, while the expectation n₀ of the sample count is very much smaller than N. The Poisson distribution is often called the "Law of Small Numbers."

Examples: (1) We have an astronomical photograph on which 4590 stars are recorded. The field of the plate is subdivided into 900 small squares. If we assume the stars to be uniformly distributed over the field covered by the plate, what is the probability of finding 10 stars in a particular square, and how many squares would we "expect" to contain 10 stars? The term "uniform distribution" refers to a continuous mathematical model. We may imagine this model as consisting of points or stars (the parent population) distributed over the plate so that Q(x, y) = constant.
$$\Pi(x_1, x_2, x_3) = \frac{1}{\sigma^3 (2\pi)^{3/2}}\, e^{-\frac{1}{2\sigma^2}\left[(x_1 - x_0)^2 + (x_2 - x_0)^2 + (x_3 - x_0)^2\right]} \,.$$
This probability function indicates what kinds of samples are highly probable and will occur frequently among a large number of samples. We recognize our formula as being that of a trivariate normal frequency function with equal dispersion in all directions, a spherical or Maxwellian distribution (see § 1.28). The center of the distribution with maximum probability density is at the point S₀, which has the coordinates x₁ = x₂ = x₃ = x₀. The surfaces of equal probability density are spheres centered at S₀; the dispersion sphere has the radius σ. If the parameters (x₀ and σ in our example) contained in the frequency function Φ(x) are a priori fixed and known, the probability density Π(x₁, x₂, ⋯, x_N) in the sample space is completely determined by (1.342). In general, however, only the form of Φ(x) will be prescribed, while the numerical values of the parameters are unknown. The probability density at any point S in the sample space will then depend on the values of certain unknown population parameters θ₁, θ₂, ⋯, θ_l:

$$\Pi(x_1, x_2, \cdots, x_N \mid \theta_1, \theta_2, \cdots, \theta_l) \,.$$
When the frequency function Φ(x) contains more than one parameter, the probability of drawing a sample like that observed must be represented as a multivariate function of the different parameters θ₁, θ₂, ⋯, θ_l. The setting up and the discussion of the probability density function Π(x₁, x₂, ⋯, x_N | θ₁, θ₂, ⋯, θ_l) frequently offers great mathematical difficulties, and sometimes becomes quite intractable.

§ 1.73 Classification of estimators and methods of estimation

Classification of estimators: Let us assume that we know or postulate the form of the frequency function of some variable x in a specified parent population, but that we do not know the numerical values of the parameters θ₁, θ₂, ⋯, θ_l of the function. By observation we obtain a sample of values x₁, x₂, ⋯, x_N drawn from the parent population; we wish to estimate the numerical values of the θ's. For clarity of meaning we shall speak of the mathematical function f(x₁, x₂, ⋯, x_N) by which we estimate the numerical value of θ as the estimator. The numerical value derived for θ will be termed an estimate. For even the simplest case, where we have a sample of N values x₁, x₂, ⋯, x_N and wish to estimate the value of a single parameter θ, there are infinitely many ways in which we can choose the estimating function

$$t = \text{estimate of } \theta = f(x_1, x_2, \cdots, x_N) \,.$$
The estimator t is a function of the random variables x₁, x₂, ⋯, x_N which describe a sample; t itself is thus a random variable and possesses a probability distribution. Our problem is to choose from among the many possible estimators t a "good" one, by which we mean that, for the estimator we choose, the probability distribution of t should be as compact as possible around θ. We want our estimator to be (if possible) consistent, unbiased, efficient, and sufficient.

(i) A consistent estimator t = f(x₁, x₂, ⋯, x_N) is one that converges in probability to θ. More precisely, if for a given fixed quantity ε we can find some value of N such that, for all samples as large as or larger than N, the probability that t differs from θ by more than ε is as small as we please, t is said to be consistent.

(ii) An unbiased estimator t is one for which

$$\mathcal{E}(t) = \mathcal{E}\left[f(x_1, x_2, \cdots, x_N)\right] = \theta$$

for all N.

(iii) The estimator t = f(x₁, x₂, ⋯, x_N) is a function of a set of random variables and is itself a random variable possessing a distribution. If for two consistent estimators t₁ and t₂ we find that the variance of the distribution of t₁ is greater than the variance of t₂, then t₂ is the more efficient of the two, and may be regarded as the better estimator.

(iv) An estimator t(x₁, x₂, ⋯, x_N) is sufficient if it gives us all the information the sample can supply about θ. In general, such sufficient estimators, though highly desirable, cannot be found.

Methods of estimation: There exist a number of theories or principles by which estimates can be made.

(i) The maximum likelihood method of estimation can be traced to Gauss, was used by Helmert in 1876 and by Pearson in 1895, and has been extensively developed by R. A. Fisher since 1912. We assume that the distribution of some attribute x in an infinite parent population is of a specified mathematical form

$$\Phi(x \mid \theta_1, \theta_2, \cdots, \theta_l) \,,$$
but that the parameters θ have unknown numerical values. A sample of N individuals drawn randomly from this parent population has yielded the values x₁, x₂, ⋯, x_N of the variable. The estimates of the parameters θ₁, θ₂, ⋯, θ_l are made so that the probability or "likelihood" of drawing the observed sample (probability density at the sample point) becomes a maximum. In the case of a single parameter we have illustrated (Figure 1.22) the probability density Π at the sample point as a function of the parameter value θ; the maximum likelihood method selects as an estimate the value of θ for which the curve Π(x₁, x₂, ⋯, x_N | θ) has a maximum. According to (1.343) the probability density at the sample point (Fisher's "likelihood function") is

$$\Pi(x_1, x_2, \cdots, x_N \mid \theta_1, \theta_2, \cdots, \theta_l) = \Phi(x_1 \mid \theta_1, \theta_2, \cdots, \theta_l)\, \Phi(x_2 \mid \theta_1, \theta_2, \cdots, \theta_l) \cdots \Phi(x_N \mid \theta_1, \theta_2, \cdots, \theta_l) \,.$$
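As a crude numerical sketch of the method (not in the original; NumPy assumed, with a normal parent population and a brute-force grid search in place of the analytic maximum conditions):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(2.0, 0.5, size=400)   # sample from a hypothetical parent population

def log_likelihood(x0, sigma, x=sample):
    # Logarithm of the likelihood function: ln Pi = sum_i ln Phi(x_i | x0, sigma).
    return np.sum(-0.5 * ((x - x0) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)))

x0_grid = np.linspace(1.5, 2.5, 101)
s_grid = np.linspace(0.3, 0.8, 101)
L = np.array([[log_likelihood(a, s) for s in s_grid] for a in x0_grid])
i, j = np.unravel_index(np.argmax(L), L.shape)
print(x0_grid[i], s_grid[j])
# For the normal law the maximum likelihood estimates are the sample mean and
# the root of the mean squared deviation:
print(sample.mean(), np.sqrt(np.mean((sample - sample.mean())**2)))
```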
The experimental values x₁, x₂, ⋯, x_N are fixed by our observations, while the parameter values θ₁, θ₂, ⋯, θ_l are to be thought of as variables. The conditions for a maximum of Π are then

$$\frac{\partial \Pi}{\partial \theta_j} = 0 \,, \qquad j = 1, 2, \cdots, l \,.$$

(ii) The method of least squares applies to observed quantities xᵢ whose expectations are linear functions of the parameters:

$$\mathcal{E}(x_i) = a_{i1}\theta_1 + a_{i2}\theta_2 + \cdots + a_{il}\theta_l \,. \tag{1.346}$$
Here the a_{ij}'s are coefficients having known numerical values for each xᵢ. We wish to obtain the "best" unbiased estimates of the θ's, where "best" is used in the sense of least squares. The Gauss — Markoff theorem (which we shall not prove) states that the best unbiased linear least squares estimates θ̂ are those values of the θ's that minimize the sum of squares

$$S = \sum_{i=1}^{N} \left(x_i - a_{i1}\theta_1 - a_{i2}\theta_2 - \cdots - a_{il}\theta_l\right)^2 w_i \,. \tag{1.347}$$
The quantities wᵢ that appear in this expression are known positive constants termed weights that are given by an expression of the form

$$w_i = \frac{\sigma^2}{\sigma_i^2} \,, \tag{1.348}$$
where σᵢ² is the variance of xᵢ. The value of σ² may be unknown. To find the estimates of the θ's minimizing the sum of squares S, we consider the θ's as independent variables and set up the conditions

$$\frac{\partial S}{\partial \theta_j} = -2 \sum_{i=1}^{N} w_i\, a_{ij}\, \left(x_i - a_{i1}\theta_1 - a_{i2}\theta_2 - \cdots - a_{il}\theta_l\right) = 0 \,, \qquad j = 1, 2, \cdots, l \,, \tag{1.349}$$
or, more specifically,

$$\theta_1 \sum_{i=1}^{N} a_{i1}\, a_{ij}\, w_i + \theta_2 \sum_{i=1}^{N} a_{i2}\, a_{ij}\, w_i + \cdots + \theta_l \sum_{i=1}^{N} a_{il}\, a_{ij}\, w_i = \sum_{i=1}^{N} x_i\, a_{ij}\, w_i \,, \qquad j = 1, 2, \cdots, l \,. \tag{1.350}$$
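The normal equations are conveniently solved in matrix form. A minimal sketch (not in the original; NumPy assumed, with two parameters, simulated error-free coefficients a_{i1} = 1 and a_{i2}, and arbitrary true values):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200
A = np.column_stack([np.ones(N), rng.uniform(-1, 1, N)])   # known coefficients a_ij
theta_true = np.array([1.0, -2.5])
sigma_i = rng.uniform(0.1, 0.5, N)                         # unequal mean errors
x = A @ theta_true + rng.normal(0.0, sigma_i)
w = 1.0 / sigma_i**2                                       # weights of the form (1.348)

# Normal equations (1.350) in matrix form: (A^T W A) theta = A^T W x
AtW = A.T * w
theta_hat = np.linalg.solve(AtW @ A, AtW @ x)
print(theta_hat)   # close to theta_true
```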
This set of l linear equations (generally known as the normal equations) can be solved for the least squares estimates θ̂ⱼ, provided the determinant of the coefficients is not zero. While the above formulation of the least squares method is made on quite a general basis, we wish to call attention to the need for some caution in practical applications. There exist in astronomical literature many applications of least squares that are, in fact, outside the range of the theory. In the formulation of the Gauss — Markoff theorem we did not specify any form of the distribution of the x's. However, we did specify that the coefficients a_{ij} were known. In many practical applications these coefficients are not accurately known; they are not error-free†; we have only observed values for them. Such a case does not fall within the province of the Gauss — Markoff theorem. As an example, let us assume that we know two variables ξ and η to be related by the linear equation
$$\xi = \alpha + \beta \eta \,. \tag{1.351}$$
Mathematically, there is what might be termed a strict structural relation between the variables ξ and η. We have observed values of ξ = ξ₁, ξ₂, ⋯, ξ_N, as well as the corresponding values of η = η₁, η₂, ⋯, η_N; both sets of values are affected by observational errors. We wish to estimate the values of the parameters α and β. Identifying the quantities in equation (1.351) with the symbols used in the Gauss — Markoff theorem, we find ξᵢ = xᵢ, α = θ₁, β = θ₂, a_{i1} = 1, a_{i2} = ηᵢ. Clearly the Gauss — Markoff theorem does not apply in this case, for we do not know the true values of the a_{i2}. If we neglect the errors in the ηᵢ and adopt the observed ηᵢ for the a_{i2}, equation (1.346) becomes

$$\mathcal{E}(\xi_i) = \alpha + \beta\, \eta_i \,. \tag{1.351a}$$
The parameters α and β determined from this equation are those of the regression line of ξ on η. We could with equal advantage write equation (1.351) in the form

$$\eta = -\frac{\alpha}{\beta} + \frac{1}{\beta}\, \xi \,, \tag{1.351b}$$

where both ξᵢ and ηᵢ are affected by observational errors. For equation (1.351b) we make the following identification with the symbols
† Actually the values of the coefficients are never known with absolute accuracy. We mean by "error-free" that the errors of the coefficients are so small that they do not appreciably affect the result of the solution. In the example of the solar motion determination discussed above we may consider the coefficients a₁, a₂, a₃ as error-free if they are known with an accuracy of ± 0.01. An error of this amount will affect the expected radial velocity by less than ± 0.2 km/sec, which is small compared with the standard deviation of individual radial velocities in any direction.
of the Gauss — Markoff theorem: ηᵢ = xᵢ, −α/β = θ₁, 1/β = θ₂, a_{i1} = 1, a_{i2} = ξᵢ. If we neglect the errors in the ξᵢ, equation (1.346) becomes

$$\mathcal{E}(\eta_i) = -\frac{\alpha}{\beta} + \frac{1}{\beta}\, \xi_i \,, \tag{1.351c}$$
which is the equation of the regression line of η on ξ. We have seen in § 1.24 that the two regression lines generally differ. The value of β, for example, derived from (1.351c) will therefore differ from that derived from (1.351a). Considered as estimators of the parameter β in the structural relation ξ = α + βη, the values of β derived from (1.351a) and (1.351c) are biased, for they refer only to the two regression lines derivable from our data, not to the structural relation itself, which lies somewhere between the regression lines. In this problem involving errors in two quantities the equation analogous to (1.346) should read

$$\mathcal{E}(\xi_i) = \alpha + \beta\, \mathcal{E}(\eta_i) \,.$$
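The bias of the two regression estimates is easily exhibited by simulation (not in the original; NumPy assumed, with an arbitrary structural relation and equal error dispersions on both variables):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta = 1.0, 2.0
N = 100_000
eta_true = rng.normal(0.0, 1.0, N)
xi = alpha + beta * eta_true + rng.normal(0.0, 0.5, N)   # errors in xi
eta = eta_true + rng.normal(0.0, 0.5, N)                 # errors in eta

cov = np.cov(xi, eta)[0, 1]
b1 = cov / np.var(eta)     # slope of the regression of xi on eta, as in (1.351a)
b2 = np.var(xi) / cov      # slope implied by the regression of eta on xi, as in (1.351c)
print(b1, b2)              # the structural beta = 2 lies between the two biased values
```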
Unfortunately, there is no counterpart of the Gauss — Markoff theorem on least squares when there are errors (or when there is a distribution) in more than one of the quantities appearing in the equation of condition. Most of the solutions of such "two-error" problems so far worked out are concerned with the very simple problem of estimating the numerical values of α and β in equation (1.351). We note that parameter estimates in such cases depend in general upon the distribution of the ξ's and η's; in particular, the value of β can be estimated only if the ratio of the standard deviations of the two sets of errors is known. The more general and difficult problems of parameter estimation by least squares, when more than one of the quantities involved is subject to error or has a statistical distribution, have not yet been satisfactorily solved. Generally, the fact that such "multi-error" least squares solutions do not fall under the Gauss — Markoff theorem has been ignored by most astronomical investigators. We must thus bear in mind that because of such improper (or at least very questionable) application of least squares, the results obtained in many important and fundamental astronomical investigations are biased and are in need of rediscussion by more adequate methods. In the discussion of astronomical problems in Parts III to VI of this text we shall point out various cases in which ordinary least squares solutions formerly employed are inadequate and lead to biased results.

χ² minimum method of estimation: Suppose that we have drawn a sample of N individuals from a parent population for which we assume the distribution of some variable x to be of the form

$$\Phi(x \mid \theta_1, \theta_2, \cdots, \theta_l) \,. \tag{1.352}$$
The l parameters θ are of unknown numerical values; we wish to estimate their values from the data of our sample. If the sample is relatively large, we may advantageously use not the individual values x₁, x₂, ⋯, x_N, but rather the grouped data, that is, the numbers nᵢ of individuals for which x falls in the interval xᵢ ± Δxᵢ/2. If the intervals are not too wide, we may presume that the numbers nᵢ give an adequate description of our sample, and we may consider the nᵢ as a set of random variables describing the sample of size N taken from the parent population. The expectations n₀ᵢ of the variables nᵢ depend upon the frequency function Φ(x | θ₁, θ₂, ⋯, θ_l):

$$n_{0i} = n_{0i}(\theta_1, \theta_2, \cdots, \theta_l) = N \int_{x_i - \Delta x_i/2}^{\,x_i + \Delta x_i/2} \Phi(x \mid \theta_1, \theta_2, \cdots, \theta_l)\, dx \,. \tag{1.353}$$

The quantity

$$\chi^2 = \sum_{i} \frac{(n_i - n_{0i})^2}{n_{0i}} \tag{1.354}$$

is, as we shall see in chapter 1.8, often taken as a measure of the goodness of fit between a model and the sample. The smaller the numerical value of χ², the better the fit. In the χ² minimum method of parameter estimation, we determine the values of the θ's in n₀ᵢ(θ₁, ⋯, θ_l) so that χ² is a minimum. We note that there are analogies between the χ² minimum method of parameter estimation and the Gauss — Markoff least squares method; χ² is the counterpart of S in equation (1.347). Thus, the numerator in (1.354) is similar in nature to the term (xᵢ − a_{i1}θ₁ − a_{i2}θ₂ − ⋯ − a_{il}θ_l)² in (1.347). If we recall that an approximate value of the variance σᵢ² of an observed sample number nᵢ is n₀ᵢ (eq. (1.339)), we find

$$w_i = \frac{\sigma^2}{\sigma_i^2} \approx \frac{\sigma^2}{n_{0i}} \,; \tag{1.355}$$
hence the term 1/n₀ᵢ in (1.354) is similar to the term wᵢ in (1.347). The values θ̂ that will minimize χ² can be found from the equations
$$\frac{\partial \chi^2}{\partial \theta_j} = -\sum_{i} \left[\frac{2\,(n_i - n_{0i})}{n_{0i}} + \frac{(n_i - n_{0i})^2}{n_{0i}^2}\right] \frac{\partial n_{0i}}{\partial \theta_j} = 0 \,, \qquad j = 1, 2, \cdots, l \,. \tag{1.356}$$

In practice this set of l equations is difficult to form and to solve because of the complications involved in expressing the values n₀ᵢ in terms of the θ's. If the intervals Δxᵢ are chosen so that the n₀ᵢ are all reasonably large, the differences nᵢ − n₀ᵢ are all much smaller than the n₀ᵢ. In this case the numerical value of the second term in the square bracket is small compared to that of the first term and can often be neglected. The modified χ² minimum method thus sets up the conditions

$$\sum_{i} \frac{n_i - n_{0i}}{n_{0i}}\, \frac{\partial n_{0i}}{\partial \theta_j} = 0 \,, \qquad j = 1, 2, \cdots, l \,. \tag{1.357}$$
This is the result we would obtain if in the differentiation of χ² we treated the denominator of equation (1.354) as a constant, independent of θ. These modified χ² minimum conditions (1.357) can be shown to be equivalent to those of the maximum likelihood method when the latter is applied to grouped data. Direct calculation of the χ² minimum estimates of the θ's is frequently difficult because of the non-linearity of the equations generally needed to express n₀ᵢ as a function of the θ's. In this case of non-linear equations the calculations must be made by a series of approximations. We start with a set of approximate values of the θ's. The differences between the nᵢ and the n₀ᵢ calculated with the approximate θ's are expressed in terms of differential corrections to the approximate θ's. The equations (1.356) or (1.357) are then made linear in the differential corrections and solved by familiar techniques. It may be necessary to go through this process several times before satisfactory results are obtained. The χ² minimum method of estimation has the advantage that it can be used for truncated distributions when the sample data do not cover the whole range of the variable, a case in which other methods are not applicable.

§ 1.75 Mean and variance of the probability distribution of a sample parameter

In § 1.73 we introduced the function t = f(x₁, x₂, ⋯, x_N) by which the numerical value of a parameter θ is estimated. To form some intuitive judgement of the probable precision of t, we can examine the probability or sampling distribution of the estimator t, Π_t(t). The more compact the distribution of the estimator, the more efficient the estimator is and the more reliance we place in the estimated value. The determination of the sampling distribution Π_t(t) is thus a problem of some interest. A sample is characterized by the attribute values x₁, x₂, ⋯, x_N of its N members. We have remarked in § 1.72 that a particular sample may be represented
as a point S in an N-dimensional sample space, and that the distribution of an infinite number of such sample points is defined by an N-variate frequency function

$$\Pi(x_1, x_2, \cdots, x_N \mid \theta_1, \theta_2, \cdots, \theta_l) \,,$$

which gives the probability density at each point in the sample space. If Φ(x | θ₁, θ₂, ⋯, θ_l) is the distribution law of the attribute x in the parent population, then

$$\Pi(x_1, x_2, \cdots, x_N \mid \theta_1, \theta_2, \cdots, \theta_l) = \Phi(x_1 \mid \theta_1, \theta_2, \cdots, \theta_l)\, \Phi(x_2 \mid \theta_1, \theta_2, \cdots, \theta_l) \cdots \Phi(x_N \mid \theta_1, \theta_2, \cdots, \theta_l) \,. \tag{1.358}$$
The estimator t of one of the parameters θ is a specified function t = f(x₁, x₂, ⋯, x_N) of the N descriptive variables x₁, x₂, ⋯, x_N of the sample. In § 1.36 we have discussed the procedure for finding the distribution of a quantity which is a specified function of several variables from the multivariate distribution of these variables. By this procedure we obtain

$$\Pi_t(t \mid \theta_1, \theta_2, \cdots, \theta_l) = \int_{-\infty}^{+\infty} \!\!\cdots\! \int_{-\infty}^{+\infty} \Pi\!\left[x_1(t, x_2, \cdots, x_N),\, x_2, \cdots, x_N \mid \theta_1, \cdots, \theta_l\right] \left|\frac{\partial x_1(t, x_2, \cdots, x_N)}{\partial t}\right| dx_2\, dx_3 \cdots dx_N \,, \tag{1.359}$$

where x₁ = x₁(t, x₂, ⋯, x_N) is the solution of the equation t = f(x₁, x₂, ⋯, x_N) for x₁. We note that the probability density function Π_t(t) depends on the values of the population parameters θ₁, θ₂, ⋯, θ_l. The derivation of this function by equation (1.359) is often difficult and sometimes so complicated as to be intractable. In many cases, however, it is a relatively simple matter to find some of the parameters of Π_t(t), as, for example, the expectation or the variance of t. The expectation of the estimator t, that is, the mean value of t for an infinite number of samples of size N, is generally obtained in terms of the population moments of Φ(x | θ₁, θ₂, ⋯, θ_l). These moments in turn are functions of the population parameters θ₁, θ₂, ⋯, θ_l. The following examples will illustrate the procedure.
A. Expectation and variance of the sample mean x̄

We consider first the sample mean

x̄ = (1/N) Σ_{i=1}^{N} x_i.

1. The average of an infinite number of sample means, that is, the expectation of x̄, is

E(x̄) = E[(1/N) Σ_{i=1}^{N} x_i] = (1/N) Σ_{i=1}^{N} E(x_i).

Since each of the attributes x_i of the sample has the same theoretical distribution φ(x), E(x_i) is equal to the mean x_0 of φ(x) for all i. The result is thus

E(x̄) = x_0.   (1.360)

2. The variance σ_x̄² of the probability distribution of x̄ is the central moment μ_2(x̄) of the second order:

σ_x̄² = μ_2(x̄) = E(x̄ − x_0)² = E[(1/N) Σ_{i=1}^{N} (x_i − x_0)]².

We develop the square and take the expectation of each term. Since the variables x_1, x_2, ⋯, x_N are stochastically independent we have, when i ≠ k,

E[(x_i − x_0)(x_k − x_0)] = [E(x_i − x_0)][E(x_k − x_0)] = 0,

and

σ_x̄² = (1/N²) Σ_{i=1}^{N} E(x_i − x_0)².

For each value of i, E(x_i − x_0)² represents the central moment of second order μ_2(x) of the distribution φ(x); therefore

σ_x̄² = σ²/N,   (1.361)

where σ² is the variance of φ(x).
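Results (1.360) and (1.361) are easy to verify by simulation. The following sketch, with an arbitrarily chosen parent population, draws many samples of size N and compares the empirical mean and variance of the sample means with x_0 and σ²/N:

```python
import numpy as np

# Monte Carlo check of (1.360) and (1.361): the sample mean has expectation
# x0 and variance sigma^2 / N.  The parent population used here (normal with
# x0 = 5, sigma = 2) is an arbitrary illustration; the two results hold for
# any parent distribution with finite variance.
rng = np.random.default_rng(0)
x0, sigma, N, n_samples = 5.0, 2.0, 25, 200_000

samples = rng.normal(x0, sigma, size=(n_samples, N))
means = samples.mean(axis=1)

print(means.mean())   # close to x0 = 5.0
print(means.var())    # close to sigma^2 / N = 4 / 25 = 0.16
```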
B. Expectation and variance of the sample variance s²

The sample variance expressed in terms of the sample values is

s² = (1/N) Σ_{i=1}^{N} (x_i − x̄)².
§ 1.76 Calculation of moments from grouped data (Sheppard Corrections)

Formulae for estimating population parameters are often derived under the assumption that the values x_1, x_2, ⋯, x_N of the variable are available for each individual of the sample, and that these individual values are used in the calculation. In the case of large samples, however, it is more economical to condense the great mass of data first by counting the number of individuals in the sample for successive intervals of the variable, as shown in Table 1.1 or Table 1.3. In fact, frequently the statistical investigator does not have direct access to the original individual data, and he therefore must base his studies on published tables of such grouped data. The formula for calculating a sample moment m_k' from the original data is

m_k' = (1/N) Σ_{i=1}^{N} x_i^k.

In the case of grouped data an approximation to this is obtained if we assume the n_i individuals contained in an interval x_i ± w/2 to have the same value of x, namely that of the midpoint of the interval, x_i:

*m_k' = (1/N) Σ_i n_i x_i^k.   (1.363)
To avoid confusion we shall designate "grouped moments" calculated by (1.363) by an asterisk at the upper left. When the group intervals are fairly wide, the difference between *m_k' and m_k' may be appreciable. For a theoretical continuous frequency function φ(x) the true population moments are given by

μ_k' = ∫ x^k φ(x) dx,

while the grouped moments for the intervals x_i ± w/2 would be formed by

*μ_k' = Σ_i x_i^k ∫_{x_i − w/2}^{x_i + w/2} φ(x) dx.

From these two expressions relations between the moments *μ_k' and μ_k' can be derived. When the frequency function φ(x) satisfies the conditions:

(i) φ(x) ≥ 0;
(ii) x^k d^{2n}φ(x)/dx^{2n} is finite and continuous throughout the range (k = 1, 2, ⋯; n = 1, 2, ⋯);
(iii) φ(x) and d^n φ(x)/dx^n vanish at the limits of the variable,

these relations take the form

*μ_k' = Σ_{r = 0, 2, 4, ⋯} C_r^k [w^r / 2^r (r + 1)] μ_{k−r}',

where C_r^k = k!/[r!(k − r)!] for 0 ≤ r ≤ k, and C_r^k = 0 for r > k. In particular,

*μ_1' = μ_1',
*μ_2' = μ_2' + w²/12,
*μ_3' = μ_3' + (w²/4) μ_1',
*μ_4' = μ_4' + (w²/2) μ_2' + w⁴/80.
We may solve these equations for the μ_k' and substitute the sample moments for the population moments. We thus obtain the so-called "Sheppard Corrections" which have to be applied to grouped sample moments:

m_1' = *m_1',
m_2' = *m_2' − w²/12,
m_3' = *m_3' − (w²/4) *m_1',
m_4' = *m_4' − (w²/2) *m_2' + (7/240) w⁴.   (1.364)

When we apply the relations between the central moments and the general moments given in Appendix Table A1 we obtain the corresponding corrections for grouped central moments
m_2 = *m_2 − w²/12,
m_3 = *m_3,
m_4 = *m_4 − (w²/2) *m_2 + (7/240) w⁴.   (1.365)

Caution must be exercised in the application of these formulae. Our three conditions imply that the frequency function φ(x) is not only continuous over the whole range in which φ(x) > 0, but also that it tapers off gradually to zero at the ends of this range. If the function φ(x) has a high order of terminal contact, the values of the moments obtained from grouped moments with the Sheppard Corrections are quite good even if the interval w is fairly large. For frequency functions that approach zero abruptly at the ends of their ranges, however, the Sheppard corrections should not be used. Correction formulae analogous to (1.365) have been derived for such abrupt-ending functions, but the discussion of these more complicated formulae is beyond our scope.
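The size of the grouping effect, and the action of the correction (1.365) on the second moment, can be seen numerically. The data below are synthetic (a normal sample, which tapers gradually at its ends, so the conditions above are satisfied):

```python
import numpy as np

# Grouping data into intervals of width w inflates the sample variance by
# roughly w^2/12; subtracting that amount (the Sheppard correction for m2)
# restores it.  The sample is synthetic and purely illustrative.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 10.0, size=100_000)

w = 5.0                                  # group interval width
x_grouped = w * np.round(x / w)          # replace each value by its interval midpoint

m2_true = x.var()                        # second central moment of the raw values
m2_grouped = x_grouped.var()             # grouped moment *m2
m2_corrected = m2_grouped - w**2 / 12.0  # Sheppard-corrected value (1.365)

# The corrected value lies much closer to the raw-data moment.
print(m2_true, m2_grouped, m2_corrected)
assert abs(m2_corrected - m2_true) < abs(m2_grouped - m2_true)
```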
§ 1.77 Numerical example of representing sample data by a formula

In Table 1.5, columns 1–3, we have tabulated the observed distribution of the radial velocities v of stars brighter than apparent magnitude 7.0 within the area α = 17ʰ14ᵐ to 18ʰ40ᵐ, δ = 20° to 40°.† As an example of the procedures described in the previous sections, we shall attempt to represent these data by a normal frequency function

φ(v) = [1/(σ√(2π))] e^{−(v − v_0)²/2σ²}.

The maximum likelihood method (as well as Pearson's method of moments) gives for the estimates of the two parameters v_0, σ the formulae

v_0* = m_1'(v) = v̄ = (1/N) Σ_{i=1}^{N} v_i,

σ²* = m_2(v) = s_v² = (1/N) Σ_{i=1}^{N} (v_i − v̄)².

As our data are grouped we have to employ the formulae (1.363) for calculating the grouped moments *m_1' and *m_2; from these we obtain the true moments by means of the Sheppard corrections. The use of the Sheppard corrections is in

† Data from Lick Observatory Publications, Vol. 18.
TABLE 1.5
(Columns: interval midpoints v_i in km/sec; number of stars n_i; auxiliary variable; standard variable t; normal frequency function. The final row gives the sums. The individual entries are not legible in this copy.)
(of saying K > 0 when K = 0) is P_I = α, and (ii) P_II, the probability of accepting H when H is false (of saying K = 0 when K > 0), is a minimum no matter what the orbit may be. The details of the search for such a test are outside the scope of this text, although we can make use of the test that has been devised. The test of H that minimizes P_II for some possible Keplerian orbits maximizes it for some others. We know nothing about the variability of the stars being tested, however, and hence nothing of their orbits if, indeed, such orbits exist. We would like a test which would be at least moderately powerful against all possible orbits simultaneously. Thus, we are forced to compromise. Since we are especially concerned with situations in which it is not at all obvious whether K = 0 or K > 0, and since we cannot get a uniformly most powerful test, let us look for a test which minimizes P_II (and, therefore, maximizes β) in the vicinity of K = 0. Even here difficulties arise. It turns out that the
best we can do is (i) fulfil the condition P_I = α, and (ii) under the restriction that β ≥ α, maximize the value of β in the vicinity of K = 0. The compromise thus consists of allowing the value of β for some alternatives to be less than it might have been, in order to require that, no matter what the orbit may be, the probability of declaring that K > 0 when actually K > 0 shall be at least as large as the probability of declaring K > 0 when actually K = 0. The criterion for rejection arrived at under these restrictions is: Reject the hypothesis that K = 0 whenever

(1/σ²) Σ_{j=1}^{n} (x_j − x̄)² ≥ χ²_{α, n−1},   (1.366)

where χ²_{α, n−1} is the value of χ² given in Table A5 in the Appendix for n − 1 degrees of freedom and probability α (enter the table with probability P = α). The distribution of the criterion on the left hand side of (1.366), the so-called χ² distribution, is found by applying the principles of transforming distributions described in §§ 1.35 and 1.36. Starting with the multivariate normal distribution of x_1, x_2, ⋯, x_n, we transform to new variables u_1, u_2, ⋯, u_n, where u_1 is the criterion on the left hand side of (1.366) and u_2, u_3, ⋯, u_n are arbitrary. We then integrate out u_2, u_3, ⋯, u_n to find the distribution of u_1 alone. The details are too lengthy to be given here; see [4], [5]. We illustrate the working of the test on a particular example. In Table 1.6 are listed 29 measurements of the radial velocity of BD 24°1123 by R. J. Trumpler. Since the estimated σ² = 45.6 and

Σ_{j=1}^{29} (x_j − x̄)² = 2423.66,

we have

χ² = 2423.66/45.6 = 53.15.

Adopting α = 0.10 we find from the table χ²_{0.10, 28} = 37.92; the hypothesis that K = 0 for BD 24°1123 is rejected. We conclude that, at the level of significance α = 0.10 or less, this star should be listed as having variable radial velocity. We have neglected the fact that not all of the observations have equal weight. It is not clear what effect this may have on the result. Since there are but two such observations we may hope that the effect is small. The test criterion used was designed to be powerful, in a particular sense, against Keplerian variation. However, it should be of value in testing against other similar types of variation. The power of the test described is very sensitive to the elements of the parti-
TABLE 1.6 RADIAL VELOCITIES OF STAR BD 24°1123, SPECTRAL TYPE B3

(Columns: year of observation, 1924 through 1946-7; measured radial velocity x_j; residuals x_j − x̄. The individual entries are not legible in this copy.)

29 observations, x̄ = −9.87, Σ_{j=1}^{29} (x_j − x̄)² = 2423.66.

The two spectrograms given half weight are underexposed and have only half as many lines measured as the others. The probable error of a single radial velocity measure (full weight) for a star of this type is approximately ±4.5 km/sec. Then estimated σ² = 45.6.
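The χ² computation quoted in the text follows directly from the summary values of Table 1.6; the critical value 37.92 is the tabulated χ² point for 28 degrees of freedom at P = 0.10:

```python
# The test (1.366) for BD 24°1123, using the summary values of Table 1.6.
n = 29
sum_sq = 2423.66   # sum of (x_j - xbar)^2 over the 29 measures
sigma2 = 45.6      # estimated variance of a single full-weight measure

statistic = sum_sq / sigma2
critical = 37.92   # tabulated chi-square for n - 1 = 28 degrees of freedom, P = 0.10

print(round(statistic, 2))    # 53.15, as in the text
print(statistic >= critical)  # True: the hypothesis K = 0 is rejected
```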
cular orbit. The dependence is quite complicated and is illustrated in Figure 1.24 and Table 1.7, which give the power function of the test. The uses of the power function may be illustrated by two examples.

Example 1. Suppose that with n = 10 observations, an observer accepted the hypothesis H that K = 0. A glance at the figure or at the table indicates that the words "accept the hypothesis H" should be taken with caution. In fact, the radial velocity may have quite a large variation, with K/σ equal to 5, and yet have a good chance of escaping detection.

Example 2. Suppose that a catalogue is to be compiled from observations of 1000 stars, of which 300 actually have variable radial velocity with K/σ equal to 5 while the remaining 700 are constant, and adopt α = 0.10, so that about 70 constant stars will falsely be declared variable. If the eccentricity e of the orbit is 0.9 and ω = 45°, and if n = 5 observations were made of each star, then, as will be found in Table 1.7 or in Figure 1.24, β = 0.88, so that approximately 264 of the variable stars will be detected and listed as variable stars. It is seen that the situation is not very satisfactory. The catalogue will list about 334 stars as variables, of which 70 are falsely labeled; it will fail to list 36 stars which are variables. The only way to improve the situation is to decrease α and increase β. It is true that α is at our disposal, and may be taken as low as desired, with a corresponding decrease in the proportion of constant radial velocity stars listed as variables. However, if α is decreased but σ and n remain constant, then β will decrease also. Thus, real improvement can be obtained only by also decreasing σ or increasing n or both. The lower right portion of Figure 1.24 illustrates that, when n is fairly large, it is more efficient to decrease σ than to increase n correspondingly. For example, β is increased more if we (a) halve σ by obtaining on n occasions four independent observations almost simultaneously and combining them, than if we (b) make n four times as large (by obtaining 4n single observations). We should emphasize a further consequence, because similar remarks apply to many astronomical catalogues. The probability that a binary star will be listed as having variable radial velocity depends very much on the particular elements of its orbit, and thus so does the probability that some astronomer will undertake the computation of its orbit. This produces a bias in the catalogue distribution of the elements of variable radial velocity stars. For example, compared to the actual distribution of eccentricity of binary systems, the catalogue distribution of eccentricity will tend to have too few stars with high eccentricity. Any discussion of the catalogue distribution of eccentricity which does not take cognizance of this fact may produce misleading conclusions. The catalogue distribution may be rectified by solving an integral equation similar to that used to correct an observed distribution for errors of observation (1.243), except that the power function is substituted for the error function. The purpose of this example is two-fold:
(i) to illustrate that astronomical catalogues generally do not provide us with a random sample of the stellar population, but that the amount of selective identifiability may be estimated and also may be corrected if the approximate power function of the method of selecting catalogue material is known, and (ii) to illustrate that real improvement of the catalogue can often be obtained only by increasing the precision or the number of observations on each star. Actually, how large n must be should be considered (with the aid of the power function) before any observational program is started.

§ 1.85 The likelihood ratio principle and Student's t-test

We now consider some astronomical hypotheses for which no special statistical tests have been devised. For some of these hypotheses, however, the standard statistical techniques are applicable immediately and produce a "best" test. In this section we take up a simple example, explain the concepts of the likelihood ratio principle involved in deriving its test, and then use this principle to derive the so-called Student's t-test. Suppose that an observer wants to check his method of measuring the wavelengths of spectral lines with the interferometer. In order to do this, he proposes to make n ≥ 2 measurements, X_1, X_2, ⋯, X_n, and test the hypothesis that these measurements are on the same wavelength system as the "standard" wavelength 6438.4722 Å of the measured line. The observer realizes that each of his measurements is subject to random error (and this is why he considers making n measures rather than only one). However, it is not unreasonable to assume that the errors are completely independent and normally distributed about zero with unknown common variance σ². Using this assumption we construct the statistical counterpart H of the phenomenal hypothesis.
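Numerically, the question is whether the mean of the n measures is compatible with ξ = 6438.4722 Å. The criterion derived in this section is Student's t; a minimal sketch (the measured wavelengths below are invented for illustration) computes it and compares it with the tabulated t point:

```python
import statistics

# Student's t for the wavelength problem.  The measurements are assumed,
# invented values; the standard wavelength is the one quoted in the text.
standard = 6438.4722
measures = [6438.4719, 6438.4725, 6438.4721, 6438.4727, 6438.4716, 6438.4723]

n = len(measures)
xbar = statistics.fmean(measures)
s = statistics.stdev(measures)          # sample standard deviation (divisor n - 1)
t = (xbar - standard) / (s / n ** 0.5)

print(round(t, 3))
# |t| is compared with the tabulated point for n - 1 = 5 degrees of freedom
# (2.571 at the 5 per cent level); a small |t| leaves H standing.
```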
According to the hypothesis H, the X_1, X_2, ⋯, X_n are independent normal random variables each with expected value ξ = 6438.4722 and with common unknown variance σ². Thus,
σ > 0 are unspecified numbers. We shall let the new set of admissible hypotheses be as broad as possible by letting it include every hypothesis ascribing any non-negative value to the probability that the radial velocity will fall in each cell, with the sole restriction that the sum of these probabilities taken over the s cells must add to one. One may wonder whether or not the introduction of the cells and of the hypothesis H* has changed the situation. To answer this, we point out that the hypothesis H* may well be true and yet, at the same time, the original hypothesis H may be false. This is because there are many different functions such that their integrals over a finite number of intervals are the same. Thus, a decision to substitute H* for H means that we decide to ignore the differences between the normal distribution and any other distribution that agrees with a normal distribution on each and every cell (a_{j−1}, a_j); we agree to look for a test which will be sensitive to those deviations from normality which transgress the boundary of at least one cell. Thus, the substitution of the hypothesis H* for H and the consideration of the altered set of admissible hypotheses means a substantial limitation. We accept this limitation because it enables us to make a reasonable test of the hypothesis H*. Situations in which the class of admissible hypotheses is not specified occur often in astronomical problems. Also frequent are cases in which the original data consist of the numbers of objects or values falling in particular cells. It thus seems useful to describe the situation and the appropriate test in general terms, and return to the ellipsoidal hypothesis as an example. Consider cases where the random variables are the frequencies of the occurrences of certain events or values. Then consider T sequences — say T observational programs — of independent observations. Assume that each observation of the kth sequence (where k = 1, 2, ⋯, T) may produce any one of s_k mutually exclusive results, say

R_{k1}, R_{k2}, ⋯, R_{k s_k}.

(In the case of the radial velocity measurements, result R_{kj} is that in the kth program the value of the stellar velocity fell in cell j.) Also assume that the probability of yielding the result R_{kj} is the same for each of the observations of the kth sequence and that it is a known function, say
p_{kj} = f_{kj}(θ_1, θ_2, ⋯, θ_l),

of a certain number l of independent parameters θ_1, θ_2, ⋯, θ_l. Necessarily, for every k = 1, 2, ⋯, T,

Σ_{j=1}^{s_k} p_{kj} = 1.
The theory applies when each function f_{kj} possesses continuous partial derivatives up to the second order with respect to all the parameters θ. The set Ω of admissible hypotheses is, then, composed of all hypotheses specifying the values of the parameters θ_1, θ_2, ⋯, θ_l. With this set of admissible hypotheses we shall test a hypothesis H which specifies the values of some r functions of the parameters θ_1, θ_2, ⋯, θ_l or, occasionally, of the probabilities p_{kj}. Let N_k be the total number of observations in the kth sequence, let n_{kj} be the number of these that gave the result R_{kj}, that is, the number of observed values falling in cell j, and let

N = Σ_{k=1}^{T} N_k.

The ratio

q_{kj} = n_{kj}/N_k

is the relative frequency of the result R_{kj} in the course of the N_k observations forming the kth sequence. We have now specified the probability density of the observable random variables (either the n_{kj}'s or the q_{kj}'s, whichever is more convenient) under the hypothesis H to be tested and under the class Ω of admissible hypotheses. We are thus ready to apply the likelihood ratio principle to deduce a test. The actual application of the λ-principle often leads to difficulties. First, the functions f_{kj} may be so awkward that there are algebraic difficulties in solving for the values of the parameters θ which maximize the probability of the observed values of the q_{kj}, either with respect to Ω or with respect to H. However, the main difficulty occurs in finding the probability distribution of the criterion λ, and this is because the observable random variables q_{kj} are discrete, with consecutive differences equal to 1/N_k. Efforts have therefore been made to devise tests that will give results almost always coinciding with the result of the λ-test but are easier to apply. These simplified tests are the very useful χ² tests, which we now describe. In order to apply the χ² test we

(i) compute the maximum likelihood (§ 1.73) estimates (see footnote next page) p̂_{kj} of the probabilities p_{kj} consistent with H, the hypothesis tested;
(ii) form the sum

χ²_H = Σ_{k=1}^{T} Σ_{j=1}^{s_k} (n_{kj} − N_k p̂_{kj})² / (N_k p̂_{kj});   (1.370)

(iii) compute the maximum likelihood estimates p̃_{kj} of the probabilities p_{kj} consistent with Ω, the set of admissible hypotheses;

(iv) form the second sum

χ²_Ω = Σ_{k=1}^{T} Σ_{j=1}^{s_k} (n_{kj} − N_k p̃_{kj})² / (N_k p̃_{kj});   (1.371)

(v) compute the difference χ²_H − χ²_Ω and reject H, the hypothesis tested, whenever the difference exceeds χ²_α, the value of χ² given in the Appendix for r degrees of freedom and probability α.
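In the simplest case of a single sequence (T = 1) whose hypothesis H fixes the cell probabilities completely, the estimates consistent with the admissible set are the observed frequencies themselves, so the second sum vanishes and the recipe reduces to the classical χ² sum. A minimal sketch, with assumed counts and probabilities:

```python
# One sequence (T = 1), s = 4 cells, H fixing the cell probabilities p_j.
# Under the unrestricted admissible set the fitted frequencies equal the
# observed ones, so chi2_Omega = 0.  Counts and probabilities are invented.
p_H = [0.2, 0.3, 0.3, 0.2]   # cell probabilities under the tested hypothesis H
counts = [18, 33, 29, 20]    # observed numbers n_j
N = sum(counts)

chi2_H = sum((n - N * p) ** 2 / (N * p) for n, p in zip(counts, p_H))
chi2_Omega = 0.0             # observed frequencies reproduce the counts exactly

difference = chi2_H - chi2_Omega
print(round(difference, 3))  # 0.533, far below the tabulated 6.25 for
                             # 3 degrees of freedom at the 10 per cent level
```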
The two parameters of this frequency function — the mean v̄_r and the dispersion σ_r — both vary with the position α, δ of the area. The mean v̄_r is found by equations (3.30) in the same manner as for the single drift hypothesis. The quantity σ_r²(α, δ) is the central moment of the second order of the ψ(t_α, t_δ, v_r) distribution; this moment may be derived by transformation from the second order central moments μ_{ijk}(x, y, z) of the Φ(x, y, z) distribution. According to (3.27) and (3.30) we have

v_r − v_{r0} = γ_{13}(x − x_0) + γ_{23}(y − y_0) + γ_{33}(z − z_0),

and

σ_r² = E(v_r − v_{r0})² = γ_{13}² μ_{200} + γ_{23}² μ_{020} + γ_{33}² μ_{002} + 2γ_{13}γ_{23} μ_{110} + 2γ_{13}γ_{33} μ_{101} + 2γ_{23}γ_{33} μ_{011}.

Different forms of the space velocity distribution Φ(x, y, z) lead, of course, to different forms of the distribution ψ_{αδ}(t_α, t_δ).

Single drift hypothesis: The Maxwellian or spherical velocity distribution
Stellar Motions in the Vicinity of the Sun
Φ(x, y, z) postulated by the single drift hypothesis is defined by four parameters: the components x_0, y_0, z_0 of the mean motion of the nearby stars with respect to the sun, that is, the coordinates of the center of the velocity sphere, and the dispersion σ, the radius of the velocity sphere. The transformed distribution ψ(t_α, t_δ, v_r) obtained by rotation of the coordinate axes is also Maxwellian; we merely have to transform the parameters to the t_α, t_δ, v_r coordinate system. The marginal distribution resulting from an integration over v_r is a bivariate normal distribution of the circular type represented by the formula

ψ_{αδ}(t_α, t_δ) = [1/(2πσ²)] e^{−[(t_α − t_{α0})² + (t_δ − t_{δ0})²]/2σ²}.   (3.41)

The radius σ of the dispersion circle of (3.41) is equal to the radius of the dispersion sphere of Φ(x, y, z). The coordinates t_{α0}, t_{δ0} of the center of the dispersion circle may be calculated from x_0, y_0, z_0 by the transformation equations (3.4):

t_{α0} = γ_{11} x_0 + γ_{21} y_0 + γ_{31} z_0,
t_{δ0} = γ_{12} x_0 + γ_{22} y_0 + γ_{32} z_0.   (3.42)
where the γ_{i1} and γ_{i2} are the direction cosines between the t_α axis or the t_δ axis and the x, y, z axes; the matrix (3.5) expresses these direction cosines explicitly in terms of the α, δ of the center of the area under consideration. Figure 3.10 illustrates the geometrical relation between the components t_{α0}, t_{δ0} and the vector SC, which represents the mean motion of the nearby stars with respect to the sun. The vector SC is of the same magnitude as, but opposite to, the local solar motion vector; its length is designated by s_⊙. The projection of SC on the tangential plane of the area is t_0, and we have

t_0 = s_⊙ sin Λ,   (3.43)

where Λ is the angular distance between the center of the area and the antapex of the local solar motion. If we designate by χ the position angle of the great circle drawn from the area center to the antapex (χ is also the angle between t_0 and the t_δ axis) we obtain the formulae

t_{α0} = t_0 sin χ = s_⊙ sin Λ sin χ,
t_{δ0} = t_0 cos χ = s_⊙ sin Λ cos χ.   (3.44)
The mean tangential velocity components t_{α0}, t_{δ0} of the stars in a given area may thus be found either by (3.42) from the rectangular components of the mean
Figure 3.10 Mean motion of stars projected on tangential plane.
motion of the nearer stars with respect to the sun, or by (3.44) from the speed of the local solar motion and the adopted position of its antapex. In Figure 3.11a we show for an area centered at α = 0ʰ, δ = 0° the dispersion circle of the tangential velocity distribution on the basis of the single drift hypothesis. We note that the coordinates of the center of the dispersion circle change from area to area while the radius remains the same.

Two star stream hypothesis: In the case of the two star stream hypothesis we treat each of the streams separately by the method just described and then take the weighted sum of the two distributions in the t_α, t_δ plane:

ψ_{αδ}(t_α, t_δ) = [ν/(2πσ²)] e^{−[(t_α − t_{α1})² + (t_δ − t_{δ1})²]/2σ²} + [(1 − ν)/(2πσ²)] e^{−[(t_α − t_{α2})² + (t_δ − t_{δ2})²]/2σ²}.   (3.45)

The two parameters ν (fraction of the population belonging to stream I) and σ
(the dispersions of the two streams, which are assumed to be equal) are the same in all areas and are also parameters of the space velocity distribution Φ(x, y, z).

Figure 3.11 Distribution of tangential velocities in area at α = 0ʰ, δ = 0° according to: a) single drift hypothesis; b) two star stream hypothesis; c) ellipsoidal hypothesis.
The quantities t_{α1}, t_{δ1}, t_{α2}, and t_{δ2} are the components of the projected mean motions of the two streams relative to the sun; they can be calculated by the equations

t_{α1} = s_1 sin Λ_1 sin χ_1,    t_{α2} = s_2 sin Λ_2 sin χ_2,
t_{δ1} = s_1 sin Λ_1 cos χ_1,    t_{δ2} = s_2 sin Λ_2 cos χ_2,   (3.46)

where s_1 is the speed of the mean motion of stream I relative to the sun, while Λ_1 and χ_1 represent the angular distance and the position angle of the apex of stream I from the center of the area. The corresponding data for stream II are s_2, Λ_2, χ_2.
The distribution of tangential velocities for our area derived on the basis of the two star stream hypothesis is illustrated in Figure 3.11b. The coordinates and separations of the centers of the two dispersion circles in the t_α, t_δ plane vary, of course, with the position of the area on the celestial sphere.

Ellipsoidal hypothesis: The ellipsoidal hypothesis of stellar motions states that the distribution Φ(x, y, z) is a general trivariate normal frequency function defined by nine parameters. We might use as parameters the three general moments of the first order

μ'_{100}(x, y, z) = x_0,   μ'_{010}(x, y, z) = y_0,   μ'_{001}(x, y, z) = z_0,

and the six central moments of the second order

μ_{200}(x, y, z), μ_{020}(x, y, z), μ_{002}(x, y, z), μ_{110}(x, y, z), μ_{101}(x, y, z), μ_{011}(x, y, z).
By rotation of the coordinate system to the t_α, t_δ, v_r axes, oriented to the center of our area, the general type of the velocity distribution is not changed; only the numerical values of the nine parameters will be different. By integration over v_r we obtain the marginal distribution, which is a bivariate normal frequency function of the standard form:

ψ_{αδ}(t_α, t_δ) = [1/(2πσ_α σ_δ √(1 − ρ²))] e^{−[ (t_α − t_{α0})²/σ_α² − 2ρ(t_α − t_{α0})(t_δ − t_{δ0})/(σ_α σ_δ) + (t_δ − t_{δ0})²/σ_δ² ] / 2(1 − ρ²)}.   (3.47)

This formula contains five parameters t_{α0}, t_{δ0}, σ_α, σ_δ, ρ. The first two of these, t_{α0} and t_{δ0}, are the general moments of the first order and are found by equations (3.42) or (3.44) exactly as in the case of the single drift hypothesis. The remaining parameters σ_α, σ_δ, and ρ are related to the three second order
central moments of the bivariate t_α, t_δ distribution by

σ_α² = E(t_α − t_{α0})²,   σ_δ² = E(t_δ − t_{δ0})²,   ρσ_ασ_δ = E[(t_α − t_{α0})(t_δ − t_{δ0})].

These expressions represent three of the second order moments of the trivariate distribution ψ(t_α, t_δ, v_r); they are obtained by transformation from the six second order central moments μ_{ijk} of Φ(x, y, z). The transformation equations (3.4), which express t_α, t_δ in terms of x, y, z, furnish the relations

t_α − t_{α0} = γ_{11}(x − x_0) + γ_{21}(y − y_0) + γ_{31}(z − z_0),
t_δ − t_{δ0} = γ_{12}(x − x_0) + γ_{22}(y − y_0) + γ_{32}(z − z_0),

from which we derive (see example 4 of § 1.31)

σ_α² = γ_{11}² μ_{200} + γ_{21}² μ_{020} + γ_{31}² μ_{002} + 2γ_{11}γ_{21} μ_{110} + 2γ_{11}γ_{31} μ_{101} + 2γ_{21}γ_{31} μ_{011},

σ_δ² = γ_{12}² μ_{200} + γ_{22}² μ_{020} + γ_{32}² μ_{002} + 2γ_{12}γ_{22} μ_{110} + 2γ_{12}γ_{32} μ_{101} + 2γ_{22}γ_{32} μ_{011},   (3.48)

ρσ_ασ_δ = γ_{11}γ_{12} μ_{200} + γ_{21}γ_{22} μ_{020} + γ_{31}γ_{32} μ_{002} + (γ_{11}γ_{22} + γ_{21}γ_{12}) μ_{110} + (γ_{11}γ_{32} + γ_{31}γ_{12}) μ_{101} + (γ_{21}γ_{32} + γ_{31}γ_{22}) μ_{011},

by the procedure discussed in § 1.26. As we go from one area of the sky to another, the center of the dispersion ellipse shifts its position; its maximum distance from the origin is equal to the speed s_⊙ of the local solar motion. The dispersion ellipse changes in size, shape, and orientation with the position of the area; the semiprincipal axes of the dispersion ellipse vary in length between the lengths of the largest and smallest axes of the velocity ellipsoid.

§ 3.52 Proper motion distribution in a small area of the sky
The proper motion of a star is proportional to the tangential velocity and the parallax of the star:

μ_α = p t_α / κ,   μ_δ = p t_δ / κ   (κ = 4.738).   (3.50)
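Equation (3.50) is immediate to apply; in the sketch below the tangential velocity and parallax are illustrative values:

```python
# Proper motion from tangential velocity and parallax by (3.50):
# mu = p * t / kappa, with t in km/sec, p in seconds of arc, and mu in
# seconds of arc per year.  Input values are illustrative.
KAPPA = 4.738

def proper_motion(t_kms, parallax_arcsec):
    return parallax_arcsec * t_kms / KAPPA

mu = proper_motion(20.0, 0.05)   # 20 km/sec at p = 0."05
print(round(mu, 4))              # 0.2111 seconds of arc per year

# The inverse relation t = kappa * mu / p recovers the velocity.
assert abs(KAPPA * mu / 0.05 - 20.0) < 1e-9
```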
If we select from among the stars in a given area only those having the same parallax, we will find the distribution of their proper motions to be identical with the distribution of their tangential velocities except for a change in the scale of the coordinates. The proper motion scale (seconds of arc per year) would be derived from the scale of tangential velocities (km/sec) by multiplication by the factor p/κ. The stars ordinarily observed in a small area of the sky, however, have parallaxes covering a considerable range of values. Suppose we divide the stars of
the area into groups according to their parallaxes. We may then, to a first approximation, consider all the stars in a group to have the same parallax, the group average. The proper motion distribution of each group reproduces the tangential velocity distribution on a scale which is proportional to the mean parallax of the group, and which varies from one group to another. To obtain the proper motion distribution of all stars observed in the area we have to sum up the distributions of successive parallax groups. In this summation each group distribution must receive a weight φ_p(p) Δp which represents the fraction of all stars with parallaxes falling into the interval p ± Δp/2. According to the single drift hypothesis the proper motion distribution for a group of stars with the same parallax in a small area of the sky is a bivariate normal distribution which may be illustrated by the dispersion circle. The center C(p) of this circle has the coordinates

μ_{α0}(p) = (p/κ) t_{α0},   μ_{δ0}(p) = (p/κ) t_{δ0}.   (3.51)

The quantity σ_μ(p), the radius of the dispersion circle of the proper motions, is derived from the radius σ of the dispersion sphere of the space velocities by the expression

σ_μ(p) = (p/κ) σ.   (3.52)

For groups of stars with decreasing parallax the dispersion circles decrease in radius while their centers move along a straight line toward the origin of coordinates. In Figure 3.12a we have drawn the dispersion circles of the proper motion distribution for the area at α = 0ʰ, δ = 0°, and for groups of stars with parallaxes 0."10, 0."08, 0."06, 0."04, 0."02. In similar fashion we show in Figure 3.12b for the same area and the same parallaxes the curves of constant frequency for the proper motion distributions derived on the basis of the ellipsoidal hypothesis. To derive the mathematical form of the proper motion distribution Γ(μ_α, μ_δ) in a small area, we recall that μ_α, μ_δ are two specified functions (3.50) of the three variables t_α, t_δ, p. By the procedure discussed in § 1.36 the distribution of μ_α, μ_δ may be obtained from the trivariate distribution Ω(t_α, t_δ, p) in two steps: (i) we transform t_α, t_δ, p in Ω(t_α, t_δ, p) to the variables μ_α, μ_δ, p; and (ii) integrate over the variable p to obtain the equation
Figure 3.12a Distribution of proper motions in area at α = 0ʰ, δ = 0° according to single drift hypothesis.
Γ(μ_α, μ_δ) = ∫_0^∞ Ω(κμ_α/p, κμ_δ/p, p) (κ/p)² dp.

If we assume, as we have heretofore, that the velocity distribution of the brighter stars is space invariant, the distribution of the tangential velocities is stochastically independent of the parallax. We can thus write

Ω(t_α, t_δ, p) = ψ_{αδ}(t_α, t_δ) φ_p(p),

and the expression for the proper motion distribution becomes

Γ(μ_α, μ_δ) = ∫_0^∞ φ_p(p) ψ_{αδ}(κμ_α/p, κμ_δ/p) (κ/p)² dp.   (3.53)
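The construction of (3.53) can be checked numerically: draw parallaxes and tangential velocities independently, convert to proper motions by (3.50), and compare the empirical density of (μ_α, μ_δ) with a quadrature of the integral. All distributional choices below (a flat parallax law, a circular velocity law with assumed parameters) are illustrations made for the sketch:

```python
import numpy as np

KAPPA = 4.738
t0 = np.array([10.0, -5.0])   # assumed (t_alpha0, t_delta0) in km/sec
sigma = 15.0                  # dispersion of the circular law (3.41)

rng = np.random.default_rng(3)
n = 2_000_000
p = rng.uniform(0.01, 0.05, size=n)       # flat parallax law phi_p on [0.01, 0.05]
t = rng.normal(t0, sigma, size=(n, 2))    # tangential velocities, independent of p
mu = (p[:, None] / KAPPA) * t             # proper motions by (3.50)

point = np.array([0.05, -0.02])           # evaluate the proper motion density here
h = 0.01                                  # side of the counting box
inside = np.all(np.abs(mu - point) < h / 2, axis=1)
empirical = inside.mean() / h**2

# Quadrature of (3.53) with the same ingredients.
ps = np.linspace(0.01, 0.05, 2001)
phi_p = 1.0 / 0.04                        # constant density of the flat parallax law
ta = KAPPA * point / ps[:, None]          # t = kappa * mu / p at each parallax
psi = np.exp(-((ta - t0) ** 2).sum(axis=1) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
integrand = phi_p * psi * (KAPPA / ps) ** 2
quadrature = float(((integrand[:-1] + integrand[1:]) / 2 * np.diff(ps)).sum())

print(empirical, quadrature)              # the two values agree closely
```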
Figure 3.12b Distribution of proper motions in area at α = 0ʰ, δ = 0° according to ellipsoidal hypothesis.

On the basis of the single drift hypothesis the distribution of tangential velocities is defined by (3.41), and the equation for the proper motion distribution according to (3.53) is

Γ(μ_α, μ_δ) = ∫_0^∞ φ_p(p) (κ/p)² [1/(2πσ²)] e^{−[(κμ_α/p − t_{α0})² + (κμ_δ/p − t_{δ0})²]/2σ²} dp.   (3.54)

This integral