DETECTION THEORY
Detection Theory IVAN SELIN THE RAND CORPORATION
PRINCETON, NEW JERSEY PRINCETON UNIVERSITY PRESS 1965
Copyright © 1965, by The RAND Corporation Published, 1965, by Princeton University Press L. C. Card 65-17158 All Rights Reserved Printed in the United States of America
TO MY WIFE, NINA
Preface

In the last ten years a well-developed theory has grown to cover testing for the presence of a desired random process (signal) in the presence of an undesirable process (noise). This theory, the subject of the present book, is an adaptation of the statistical theory of hypothesis testing to problems of the type that arise in radar, communication, and control engineering.

This book is drawn from a course of lectures given in the graduate school of the University of California at Los Angeles (to engineers) and at The RAND Corporation (to a mixed audience of engineers, mathematicians, and mathematical statisticians). The object of the lectures and the text is to demonstrate a canonical class of interesting engineering problems to the mathematicians while presenting a unified statistical theory to the engineers. This volume attempts to provide an overview of the field of signal detection through a selection of topics and computational material.

The book is not a reference work. Rather it is an exposition of a number of topics (including some very recent work) written to appeal to those readers in engineering, mathematics, statistics, and physics who are as much interested in a knowledge of the problems and methods as in the results of detection theory.

Although the exposition is introductory in the sense that no knowledge of detection theory or of statistical inference is assumed, a good background in what is usually called "engineering probability theory" is required. A familiarity with probability density functions and distributions, the one-dimensional normal distribution, covariance functions and spectral densities, weighting functions and transfer functions, and a little bit of function theory is assumed at various points in the text.† Given such a background, the course can be covered in one semester. However, since the background of the class is rarely this good (in particular, some time usually has to be devoted to the definition of a stochastic process and to the relation between covariance functions and spectral densities), in practice it is difficult to complete the course in a single semester.

The first goal of the book is to expound the theory and problems arising in detecting signals, particularly of the type that arise in radar and electrical communication practices. The second goal is to furnish the student with a number of mathematical tools that are useful in diverse engineering problems involving continuous parameter processes. For this reason, rather long sections are devoted to the complex representation of real signals, integral equations with symmetric kernels, and the Karhunen-Loève expansion. The third goal is to acquaint the engineering student with related work in mathematical statistics. To this end, a number of terms and several proofs have been included that would appear more familiar to the statistician than to the engineer. Since engineering jargon is avoided, statisticians, mathematicians, and physicists may find this interesting engineering field readily accessible.

The material is framed in a rather general context, but all of the topics treated in detail deal with normal noise. Many of the results are valid or can be extended to random variables with arbitrary distributions, although often at the cost of much longer and tedious proofs. The reader interested in other engineering topics (for example, signals of other unknown parameters, resolution, or waveform estimation) is referred to C. W. Helstrom's book,‡ while he who wishes the statistical theory in a more general form will be well served by E. L. Lehmann's text.§

I acknowledge with gratitude the advice of Leland Attaway, Julian Bussgang, Carl Helstrom, John Hogan, David Middleton, Gaylord Northrup, Irving Reed, and Michael Warshaw. I am particularly grateful to Franz Tuteur, who has read and criticized the entire manuscript in detail. I wish also to thank Dorothy Stewart and Ann Greene for their excellent editorial aid.

The study was prepared as part of the continuing research program undertaken for the United States Air Force by The RAND Corporation.

Ivan Selin
Santa Monica, Calif.
March, 1965

† This material is available in Chapters 2, 3, 4, 6, and 8 of Davenport and Root, An Introduction to the Theory of Random Signals and Noise, McGraw-Hill Book Company, Inc., New York, 1958.
‡ Statistical Theory of Signal Detection, Pergamon Press, Oxford, 1960.
§ Testing Statistical Hypotheses, John Wiley & Sons, Inc., New York, 1959.
Contents

PREFACE
GLOSSARY

1. INTRODUCTION
   Outline of the Problem
   Singular Tests

2. SIMPLE HYPOTHESIS TESTING
   Optimum Decision Rule, Neyman-Pearson Criterion
   Bayes Criterion
   Minimax Test
   Sufficiency
   Receiver Operating Characteristics

3. SIMPLE HYPOTHESES, MULTIPLE OBSERVATIONS
   Multiple but Correlated Observations
   Two Observations
   Multiple Observations
   Detection of a Known Signal in White Noise

4. SIGNAL KNOWN EXCEPT FOR SOME UNKNOWN PARAMETERS
   Introduction
   Analytic Representation of Signals
   Distribution of the Likelihood Ratio

5. SYNTHESIS OF FILTERS
   Signal Known Exactly
   Signal of Unknown Phase
   A Sequence of Pulses of Incoherent Phase

6. TESTS IN THE ABSENCE OF AN A PRIORI DISTRIBUTION

7. DETECTION OF KNOWN SIGNALS IN COLORED NOISE
   The Karhunen-Loève Expansion
   Definitions
   Theorems
   The Representation
   The Detection Problem: Known Signal in Gaussian Colored Noise
   Distribution of the Likelihood Ratio
   Heuristic Discussion of the Test Function

8. DETECTION OF STOCHASTIC SIGNALS IN NOISE
   Introduction
   Derivation of the Likelihood Ratio
   Probabilities of Error
   First Method
   Second Method

9. SEQUENTIAL DETECTION
   Test for a Known Signal in White Noise with Continuous Observations
   Known Signals in Colored Noise
   Optimum Property of the SPRT
   Signals of Unknown Strength
   Signals with Nuisance Parameters
   Sequential Estimation of Parameters
APPENDIX A. SAMPLING ANALYSIS
   Definition
   Existence of the Limit
   Covariance Function of n(

$P(H_1)$. Although the cost $C_0$ of a false alarm may be much less than that of a miss $C_1$, in general, $P(H_0)C_0 \gg P(H_1)C_1$. If we wish to design a test that will keep down the total expected cost, the decision rule should be strongly weighted to reduce the probability of false alarm. In many cases, the largest acceptable value of $\alpha$ is chosen arbitrarily and then the detector is designed to produce a minimum $\beta$ for that value of $\alpha$. A common test is

where $K$ is chosen such that the probability that $(1/T)\int y^2\,dt$ is less than p, if $H_0$ is true, equals $\alpha$:
A communication or radar system may be represented by the four blocks shown in Fig. 1. (In an actual receiver, circuit noise enters at the first stage of the receiver.) In designing a system, one usually attempts to choose an ensemble of signal functions $\{s(t)\}$ that will make $P(y \mid H_0)$ differ as much as possible from $P(y \mid H_1)$. However, as far as the signal detection problem is concerned, $s(t)$ and $n(t)$ are taken as given. The detection problem may be resolved into two components: the computation of the likelihoods, and the choice of a decision rule to act upon these likelihoods.

[Fig. 1. Block diagram of a generalized communication system. Blocks: Transmitter, Channel, Receiver, Decision-making device. Labeled waveforms: s(t), n(t), y(t).]

SINGULAR TESTS

A test is said to be singular if one of the conditional error probabilities can be made arbitrarily small while the other remains bounded at some value less than unity. Loosely speaking, this will not happen if (1) the signal-to-noise ratio is finite and (2) the signal does not occupy a frequency band devoid of noise. In terms of the Karhunen-Loève expansion to be introduced in Chapter 7, both of these conditions may be subsumed under the requirement that the sum of the signal coefficients $s_i$ is $< \infty$.

Singular tests do not arise in physical problems with limited observation times. The concept is useful principally as a check on the consistency of the mathematical models of the physical processes.
CHAPTER 2
Simple Hypothesis Testing

In the simplest case, the ensemble of possible signals contains only one member, which we shall denote by $s(t)$. In this case, the alternatives (1.1) and (1.2) may be read: "$H_0$ is the hypothesis that $y(t)$ is a realization from a process with zero mean and random part $n(t)$, while $H_1$ is the hypothesis that $y(t)$ is a realization from a process with mean s(

for small signal-to-noise ratio,
for large signal-to-noise ratio. This is the well-known result in radar that for small signals, the optimum receiver is a square law detector followed by an integrator; for large signals, the linear detector, followed by an integrator, is optimum. In radio engineering, the term linear detector is applied to a rectifier followed by a filter that passes the low-frequency components of the rectified wave.
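The square-law and linear regimes can be checked numerically. The sketch below assumes, as is standard for a signal of unknown phase but is not shown in the surviving text, that the test function involves $\log I_0(x)$, where $I_0$ is the modified Bessel function of order zero and $x$ is proportional to the envelope. The function names are illustrative only.

```python
import math

def log_i0(x, terms=200):
    """Logarithm of the modified Bessel function I0(x), computed from the
    power series I0(x) = sum over k of (x/2)^(2k) / (k!)^2."""
    term = 1.0
    total = 1.0
    for k in range(1, terms):
        # Ratio of successive series terms is (x/2)^2 / k^2.
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return math.log(total)

def square_law(x):
    """Small-argument approximation: log I0(x) is close to x^2 / 4."""
    return x * x / 4.0

def linear_law(x):
    """Large-argument approximation: log I0(x) grows like x."""
    return x

if __name__ == "__main__":
    for x in (0.1, 0.5, 10.0, 50.0):
        print(x, log_i0(x), square_law(x), linear_law(x))
```

For small arguments the exact curve follows the square-law column; for large arguments it follows the linear column up to a slowly varying logarithmic correction.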
CHAPTER 6
Tests in the Absence of an A Priori Distribution

In the previous chapter, we discussed the design and evaluation of receivers for the detection of signals known except for a parameter. If the parameter is governed by a known a priori distribution, $P(\theta)$, the most complete possible description of the experiment is given by the a posteriori probability $P(\theta \mid y) = kP(y \mid \theta)P(\theta)$. In this chapter we discuss techniques that are applied when $P(\theta)$ is not known.

The next best thing to the a posteriori probability $P(\theta \mid y)$ is an estimate $\hat\theta$ of $\theta$. Given this, we can act as if $\hat\theta$ were the true value of $\theta$ and proceed as if we had a case of signal known exactly. Since, in general, $\hat\theta$ is not the true value of $\theta$, an error results. Therefore we wish to find a desirable estimate of $\theta$, that is, one that keeps this error small under a wide range of true values of $\theta$.

To gain some insight into the approach, we can state this problem from a slightly different point of view. Let $H_0$ and $H_1$ be the hypotheses to be tested:

$H_0$: $y(t) = n(t)$,
$H_1$: $y(t) = n(t) + s(t, \theta)$, where $\theta \in \Omega$,

and $\Omega$ is a known range of possible values of $\theta$. Since no measure is defined on $\Omega$ (that is, $P(\theta)$ is unknown or does not exist), our problem is one of multiple hypotheses: $\{n(t) + s(t, \theta)\}$ is an ensemble of alternatives, since different signals correspond to different values of $\theta$. Only when $\theta$ is fixed is it clear to the observer which alternative he is considering.

For each true value of $\theta$, there exists a Neyman-Pearson test; that is, there is a test $\varphi(y, \theta)$ for each $\theta$, such that $E_0[\varphi(y, \theta)] \le \alpha$, and $E_1[\varphi(y, \theta)]$ is maximized subject to this constraint. If it happens that $\varphi(y, \theta)$ does not depend on $\theta$, then we say that there exists a uniformly most powerful (UMP) test (in $\theta$). This is often the case if the alternatives are one-sided, that is, if $\Omega$ is an ordered set and all $\theta \in \Omega$ are larger (or smaller) than the value of $\theta$ that corresponds to $H_0$.
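A one-sided alternative of this kind can be sketched numerically. The example assumes a single observation $x = n + \theta$ with unit-variance normal noise and $\theta > 0$; the model and the function names are illustrative assumptions, not taken from the text. The threshold is fixed by $\alpha$ alone, so the same test serves every positive $\theta$, and only the power changes with $\theta$.

```python
from statistics import NormalDist

def one_sided_threshold(alpha):
    """Threshold K with P(x > K | H0) = alpha when x is normal (0, 1) under H0."""
    return NormalDist().inv_cdf(1.0 - alpha)

def power(theta, alpha):
    """Probability of detection P(x > K | theta) for x normal (theta, 1)."""
    k = one_sided_threshold(alpha)
    return 1.0 - NormalDist(mu=theta, sigma=1.0).cdf(k)

if __name__ == "__main__":
    alpha = 0.05
    print("threshold:", one_sided_threshold(alpha))
    for theta in (0.5, 1.0, 2.0, 3.0):
        # The threshold above does not depend on theta; only the power does.
        print(theta, power(theta, alpha))
```

If $\theta$ could also be negative, no single threshold of this form would be best for every $\theta$, which is the situation in which a UMP test fails to exist.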
Example D

$H_0$: $x = n$, where $n$ is normal (0, 1),
$H_1$: $x = n + \theta$.

(i) If $\theta \in \Omega$, where $\Omega$ is the closed interval $[a, b]$ and $a, b > 0$, then a UMP test exists: one sets a threshold $K$ such that $\int_K^\infty p(x)\,dx = \alpha$, and accepts if $x > K$.

(ii) If $\theta$ can assume negative as well as positive values, however, this does not work. For positive $\theta$, we would like to take $D_1$ if $x > K$; for negative $\theta$, $D_0$ if $x > K$. Zero is the value of $\theta$ that corresponds to $H_0$.

More generally, if $R[\varphi^*(y, \theta)] \le R[\varphi(y, \theta)]$ for all $\theta \in \Omega$, we say that $\varphi^*$ is UMP in $\theta$. In general, no one test is UMP in $\theta$. A more usual situation is shown in Fig. 7. R[

0) $\to 1$ as $n \to \infty$. An unbiased estimator whose variance disappears in the limit is consistent; a biased estimator can be consistent if (a) its efficiency is high and (b) its bias disappears as $n \to \infty$. This is not an essential property since we are really concerned with $\hat\theta$ for finite $n$. Even a very inefficient estimator can be consistent; if $\hat\theta = 1$ for $n = 0, 1, 2, \ldots, m$, and $\hat\theta = \sum_{i=m}^{n} x_i/n$ for $n > m$, then we have a useless estimator for $n < m$, but for large $n$, we get consistency.

5. If $\hat\theta$ is normal, and $E(\hat\theta - \theta)^2 \le E(\tilde\theta - \theta)^2$ for every other normal estimator $\tilde\theta$, we say that $\hat\theta$ is asymptotically efficient.

6. If $P(y \mid \theta)$ does not depend on $\theta$ except through $\hat\theta$, then $\hat\theta$ is sufficient. This is extremely important, since in this case $P(y \mid \theta) = P(y \mid \hat\theta) f(\hat\theta, \theta)$. Through $f(\hat\theta, \theta)$, $\hat\theta$ summarizes all the information about $\theta$ that is available to the observer. Then the observations $y$ may be discarded, since they add only to $P(y \mid \hat\theta)$, and $P(y \mid \hat\theta)$ contains only extraneous data about $y$ that do not bear on $\theta$ once $f(\hat\theta, \theta)$ and $\hat\theta$ are given. Sufficiency has been discussed in Chapter 2.

Example E

Let $y$ be normal ($\theta$, 1). Given observations $y_1, y_2, \ldots, y_n$, the problem is to estimate $\theta$. A sufficient estimator is
$\hat\theta = \sum_{i=1}^{n} y_i/n$, and any other data are irrelevant (for example, the order in which the $y$'s occur).

Various estimators have some but not all of those properties, leaving the choice of an estimator open. What is normally done in practice when $P(\theta)$ is not available is to form $\max_\theta L(y \mid \theta)$, and then use the maximum of the likelihood ratio as if it were the true likelihood ratio. The justification for this procedure is based on the number of desirable properties possessed by the maximum likelihood estimator, although there is no simple definition of optimality satisfied by this estimator.

1. Although max $L$ estimates need not be unbiased, at least max $L$ estimates are invariant. If $\hat\theta$ is the maximum likelihood estimate of $\theta$, and $u(\theta)$ is single valued, then $u(\hat\theta)$ is the maximum likelihood estimate of $u(\theta)$. For instance, if $y$ is normal, with zero mean and unknown variance, the maximum likelihood estimator of the variance is $\hat\sigma^2 = \sum_{i=1}^{n} y_i^2/n$. For a normal distribution, the fourth central moment is equal to three times the square of the variance. If we wish to estimate $E y^4$, the maximum likelihood estimator is $3\left(\sum y_i^2/n\right)^2$, not $\sum y_i^4/n$.

2. If any sufficient statistic exists, the maximum likelihood estimator is sufficient.

3. Often, maximum likelihood estimates are consistent and asymptotically efficient.

4. The estimators are often easy to form: $\hat\theta$ is given as a solution to $\partial \log L(y \mid \theta)/\partial\theta = 0$.

5. If $P(\theta)$ is uniform, then the maximum likelihood ratio, as a function of $\theta$, is proportional to the a posteriori probability: $L(y \mid \theta) = c \cdot P(\theta \mid y)$. Thus the maximum likelihood estimator maximizes the a posteriori probability of $\theta$; that is, it produces the most probable value of $\theta$ given the observation only. In this case, where a priori information does not introduce a prejudice, the maximum likelihood estimate maximizes a posteriori probability.

Let us consider item 4 somewhat more closely. The estimate of the parameter $\theta$ is formed by evaluating $L(y \mid \theta)$ for all $\theta$ and taking the maximum of $L$. In practice, if $\theta$ assumes a continuum of values, $L(y \mid \theta)$ can be computed for only a finite number of values of $\theta$. In general, this is not a difficult procedure to mechanize. However, the evaluation of the performance of the estimator is a much more difficult problem. To find the probability distribution of $\hat\theta$, $\hat\theta$ must be expressed as a suitable analytical function of $y$, usually as a solution to

$$\frac{\partial}{\partial\theta} L(y \mid \theta) = 0. \tag{6.1}$$
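However, solution of Eq. (6.1) for $\theta$ will in general result in several values of $\theta$, from which it is then necessary to choose the maximizing value. If the noise is gaussian, we have seen that the likelihood ratio takes the form

$$L(y \mid \theta) = c \cdot \exp\left\{-\frac{1}{2N_0}\int \left[\,y(t) - s(t, \theta)\,\right]^2 dt\right\}, \qquad y(t) = n(t) + s(t, \theta_0), \tag{6.2}$$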
where $\theta_0$ is the true value of $\theta$. In the engineering literature it is usually assumed that the signal-to-noise energy ratio is sufficiently large that $L(y \mid \theta)$ is almost certainly small, except for values of $\theta$ in the neighborhood of $\theta_0$. If such an assumption is valid, it is argued, then only one zero of Eq. (6.1) will produce a large value of Eq. (6.2), and it is clear which zero in Eq. (6.1) corresponds to the maximum likelihood estimate. In evaluating the distribution of $\hat\theta$ (that is, for each $\theta$, the probability that $\theta$ maximizes $L(y \mid \theta)$), a further assumption is usually made involving large signal-to-noise ratios. (For examples of such computations, see Woodward.)

These assumptions are reasonable enough and are usually proffered on the basis of two facts:

1. The exponential function of Eq. (6.2) is sharply peaked in the neighborhood of the zero(s) of its exponent.

2. If accurate estimation is to be made, a large signal-to-noise ratio is required anyway.

Unfortunately, the system designer is very often faced with the problem of detecting very weak (threshold) signals, and he may be forced to form one or a series of very poor estimates before detection.† In that case, the problem of ambiguities arises (the apparent value of $\theta$ may not be in the neighborhood of the true value), and the solution of Eq. (6.1) may be only the first of several steps in evaluating the detector. (Woodward gives a clear illustration of the behavior of $L(y \mid \theta)$ as a function of signal-to-noise energy ratio.) When large signal approximations can no longer be made, the evaluation of the distribution of $\hat\theta$ becomes a difficult problem. It is then necessary to find the distribution of zero-crossings of the random variable $\partial L(y \mid \theta)/\partial\theta$ or of the derivative of some monotonic function of $L(y \mid \theta)$, which are difficult and as yet unsolved problems.

† Weak signals are of interest in radar, for instance, when the target approaches the radar. Then the cumulative probability of detection is increased if the long-range returns are handled properly.
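The maximum likelihood procedure discussed in this chapter can be sketched for the zero-mean normal example with unknown variance. The code below is a minimal illustration under that assumption; the function names, the sample values, and the grid are hypothetical. It forms the estimate in closed form, repeats the search over a finite grid of parameter values (as one would when mechanizing the procedure), and uses the invariance property to estimate the fourth moment.

```python
import math

def ml_variance(samples):
    """Closed-form ML estimate of the variance for zero-mean normal data."""
    return sum(y * y for y in samples) / len(samples)

def ml_fourth_moment(samples):
    """By invariance, the ML estimate of E[y^4] is 3 * (ML variance)^2."""
    return 3.0 * ml_variance(samples) ** 2

def grid_ml_variance(samples, grid):
    """Evaluate the log-likelihood at a finite set of candidate variances
    and return the candidate that maximizes it."""
    n = len(samples)
    sum_sq = sum(y * y for y in samples)

    def log_likelihood(v):
        # Log-likelihood of zero-mean normal data, constants dropped.
        return -0.5 * n * math.log(v) - 0.5 * sum_sq / v

    return max(grid, key=log_likelihood)

samples = [1.0, -1.0, 2.0, -2.0]        # hypothetical observations
grid = [0.5 * k for k in range(1, 21)]  # candidate variances 0.5, 1.0, ..., 10.0

if __name__ == "__main__":
    print(ml_variance(samples))
    print(grid_ml_variance(samples, grid))
    print(ml_fourth_moment(samples))
```

Here the grid happens to contain the closed-form maximizer, so both routes agree; with a coarser grid the search returns only the nearest candidate in likelihood.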
CHAPTER 7
Detection of Known Signals in Colored Noise

The basic test is to decide between the hypotheses $H_0$: $y(t) = n(t)$ and $H_1$: $y(t) = s(t) + n(t)$. In this case, we shall assume that $s(t)$ is known exactly and that $n(t)$ is a stationary normal process, but that the spectral density of $n(t)$ is not constant throughout the frequency range occupied by the signal. If we attempt to apply the Shannon sampling technique, we can write

$$n(t) = \sum_{k=-K}^{K} n_k \, \frac{\sin 2\pi W (t - k/2W)}{2\pi W (t - k/2W)},$$

but unfortunately the $\{n_k\}$ are not mutually independent random variables. Thus we find that

$$P[y(t) = n(t)] = P(y_k = n_k,\ \text{all } k) \ne \prod_k P(y_k = n_k).$$

In order to use the sampling theorem to write the probability measure for the continuous waveshape $y(t)$ in terms of a countable product of known probability density functions, the correlation of the samples must be taken into account. This produces some inconveniences as we take the samples more and more closely together. Instead we shall devote considerable space to obtaining a representation (the Karhunen-Loève expansion) that fills the same role for colored noise as the Shannon technique fills for white noise. This expansion is very useful in a wide class of detection problems and is not discussed in depth in many engineering textbooks.
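A discrete analogue may make the difficulty, and the remedy, more concrete. The sketch below assumes a covariance function $R(\tau) = e^{-|\tau|}$, chosen only for illustration, and uses NumPy. It builds the covariance matrix of closely spaced samples, shows that neighboring samples are strongly correlated, and then rotates to the eigenvector basis of the matrix, in which the coefficients are uncorrelated. This is a finite-dimensional stand-in for the expansion, not the continuous-time construction itself.

```python
import numpy as np

def sample_covariance_matrix(num_samples, spacing):
    """Covariance matrix of equally spaced samples of a zero-mean process
    with the assumed covariance function R(tau) = exp(-|tau|)."""
    t = np.arange(num_samples) * spacing
    return np.exp(-np.abs(t[:, None] - t[None, :]))

def decorrelate(cov):
    """Eigenvalues and orthonormal eigenvectors of a symmetric covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals, eigvecs

cov = sample_covariance_matrix(num_samples=8, spacing=0.1)
eigvals, eigvecs = decorrelate(cov)

# Covariance of the coefficients obtained by projecting the samples
# onto the eigenvectors; it is diagonal, so the coefficients are uncorrelated.
coeff_cov = eigvecs.T @ cov @ eigvecs

if __name__ == "__main__":
    print("correlation of adjacent samples:", cov[0, 1])
    print("largest off-diagonal after rotation:",
          np.max(np.abs(coeff_cov - np.diag(np.diag(coeff_cov)))))
```

For gaussian noise, uncorrelated coefficients are also independent, so the joint density factors into a product in the rotated coordinates.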
THE KARHUNEN-LOÈVE EXPANSION

We wish a representation of $n(t)$ in the form

$$n(t) = \operatorname*{l.i.m.}_{K \to \infty} \sum_{k=1}^{K} a_k n_k \varphi_k(t), \qquad 0 \le t \le T,$$

with the following properties:†

1. The equality should hold for all $t$ in $(0, T)$.
2. The $\{\varphi_k(t)\}$ should be deterministic functions of $t$.
3. The time functions should be orthonormal over $(0, T)$: $\int_0^T \varphi_k(t)\varphi_r^*(t)\,dt = \delta_{kr}$.‡
4. The deterministic coefficients $\{a_k\}$ should not be functions of $t$.
5. The random coefficients $\{n_k\}$ should not be functions of $t$.
6. The random coefficients should be normalized and uncorrelated: $E(n_k n_r^*) = \delta_{kr}$. (This property is also sometimes called orthonormality, but the sense of the term differs slightly from that of condition 3, above.)
7. If $n(t)$ is gaussian, then the $\{n_k\}$ should also be gaussian.

† The term "l.i.m." is a probability limit in this definition: $\lim_{K \to \infty} E\left|n(t) - \sum_{k=1}^{K} a_k n_k \varphi_k(t)\right|^2$ is to equal zero for every value of $t$.
‡ Here and below, the asterisk is used to denote the complex conjugate.

Condition 1 is necessary if the distribution of $n(t)$ is to be described throughout the observation interval by means of the representation. Conditions 2, 4, and 5 permit the time variation and the randomness of the process to be separated. The normalization of condition 3 is merely a convenience, but the orthogonality is essential so that only one linear combination of the $\{\varphi_k(t)\}$ can result in a given realization of $n(t)$. Conditions 6 and 7 permit us to write the probability of the joint event (the occurrence of one of a class of realizations of $n(t)$) as the product of the probabilities of the component events (the occurrence of the corresponding realizations of the $\{n_k\}$).

From these desiderata, we can construct the expansion after first finding a necessary condition for the existence of such an expansion. Let $n(t)$ be a gaussian process with mean identically equal to zero.§ Then, if the expansion exists,

$$R(t, s) = E[n(t)n^*(s)] = \sum_k \sum_r a_k a_r^* E(n_k n_r^*)\,\varphi_k(t)\varphi_r^*(s) = \sum_k |a_k|^2 \varphi_k(t)\varphi_k^*(s) \tag{7.1}$$

because $E(n_k n_r^*)$ is to equal $\delta_{kr}$. Thus, if the representation exists for $n(t)$, then the covariance function of $n(t)$ must have the above representation. Furthermore,

$$\int_0^T R(t, s)\varphi_r(s)\,ds = \sum_k |a_k|^2 \varphi_k(t) \int_0^T \varphi_k^*(s)\varphi_r(s)\,ds = |a_r|^2 \varphi_r(t) \tag{7.2}$$

if the summation and integration may be interchanged. We have used the orthonormality of the $\{\varphi_k\}$ in evaluating the integral. Thus, we have an integral equation to be satisfied by the $\{\varphi_k(t)\}$ and the $\{|a_k|^2\}$ if they are to have the desired properties on $(0, T)$.¶ In the language of integral equations, the $\{\varphi_k(t)\}$ are called the characteristic functions or eigenfunctions, and the $\{|a_k|^2\}$ are called the characteristic values or eigenvalues of the kernel $R(t, s)$. A characteristic value is associated with each characteristic function. We shall consider this equation in detail to see under what conditions solutions with the desired properties exist. First we need some facts about integral equations of this, the second, kind.

§ If $n(t)$ is not gaussian, the following steps may still be performed, resulting in an expansion in terms of uncorrelated, but not necessarily independent, coefficients. If the coefficients are not independent, the expansion does not suffice as a basis for the likelihood ratio.
¶ The reader proficient in integral equations can safely proceed to the section entitled "The Detection Problem: Known Signal in Gaussian Colored Noise."

DEFINITIONS
1. A function f(

$1 > B > 0$. Then $L[y(t)]$ is compared with these thresholds, either continuously or at discrete points in time. Since $L[y(0)] = 1$ at time zero, $L[y(0)]$ lies between $A$ and $B$. As the test proceeds, if $L$ exceeds $A$ at one of the comparison instants, the test terminates with the decision $D_1$. If $L$ has decreased below $B$, the test terminates with the decision $D_0$. If $L$ remains between $A$ and $B$, another observation is made. The test continues until one or the other of the thresholds is crossed.

As we shall prove later, if the signal is known exactly, if independent observations are made on the $y(t)$ process, and if the average test time is finite, this test will achieve given conditional error probabilities in a shorter average time than any other test. In particular, the average test time is less than that required in the conventional fixed sample size test for the same error probabilities.

TEST FOR A KNOWN SIGNAL IN WHITE NOISE WITH CONTINUOUS OBSERVATIONS

The likelihood ratio for a known signal in white noise, as developed in Chapter 3, may be written

The remaining question in designing this sequential test is the choice of thresholds. These may be obtained easily in terms of the desired conditional error probabilities. The test ends with the decision $D_1$ when the likelihood ratio crosses the upper threshold $A$. At this point

$$P(y \mid H_1) = A\,P(y \mid H_0).$$

This equation will hold at the boundary for all waveshapes that result in the decision $D_1$. The integral of $P(y \mid H_1)$ over the ensemble of such waveshapes will then be equal to the probability of deciding $D_1$ when $H_1$ is true. If the test must end with probability one, then by definition

$$P(D_1 \mid H_1) = 1 - \beta.$$

Similarly, integrating $P(y \mid H_0)$ over the same ensemble results in the probability of deciding $D_1$ when in fact $H_0$ is true, which is $\alpha$. Thus

$$1 - \beta = A\alpha, \qquad A = \frac{1 - \beta}{\alpha}.$$

Similarly, at the lower threshold, $P(y \mid H_1) = B\,P(y \mid H_0)$, and an integration of both sides over the ensemble of waveshapes leading to $D_0$ results in

$$\beta = B(1 - \alpha), \qquad B = \frac{\beta}{1 - \alpha}.$$
The next topic is the distribution of the test duration, and in particular the mean duration of the test. The easiest way to obtain this mean value is through the Shannon sampling technique. The notation is that of Chapter 3. Consider the sequence of independent observations $y_1, y_2, \ldots$ spaced at time intervals $1/(2W)$ seconds apart, which is to be tested sequentially for the presence of a constant signal $s$ in noise of variance $2N_0W$. The problem is to find the average time taken for the test to end. If we define

$$z_i = \log \frac{P(y_i \mid H_1)}{P(y_i \mid H_0)}$$

(all of the $z_i$ are identically distributed), then the logarithm of the likelihood ratio based on a sequence of $\nu$ independent observations may be written

$$\log L_\nu = \sum_{i=1}^{\nu} z_i.$$

We next prove a very convenient formula for finding the average test time.

Theorem 9.1. If the test ends at the $N$th observation ($N$ is a random variable), and if $Z_N = \sum_{i=1}^{N} z_i$, then†

$$E(Z_N) = E(z)E(N). \tag{9.2}$$

Proof. Since $N$ is itself a random variable,

$$E(Z_N) = \sum_{n=1}^{\infty} P(N = n) \sum_{i=1}^{n} E(z_i \mid N = n) = \sum_{i=1}^{\infty} P(N \ge i)\,E(z_i \mid N \ge i).$$

The event $N \ge i$ can occur only if the test has not ended by the $(i - 1)$th observation, and hence this event is independent of $z_i$.‡ Therefore,

$$E(Z_N) = E(z) \sum_{i=1}^{\infty} P(N \ge i) = E(z)E(N),$$

which is the desired result.

† The expression $E(z)$ is known as the Kullback-Leibler information number in statistics and is a measure of the amount of information gained from each observation.
‡ Note that the event $N \ge i$ is not independent of $z_j$ for $j < i$. For instance, for $i = 2$, whether or not the test reaches the second stage obviously depends on what happens at the first stage but not on what will happen at the second stage.

Note two devices occurring in this computation that are useful in evaluating sums and, in particular, expectations of integer-valued random variables. The first device is

$$\sum_{n=1}^{\infty} \sum_{i=1}^{n} a_{in} = \sum_{i=1}^{\infty} \sum_{n=i}^{\infty} a_{in},$$

provided both sums are absolutely convergent. The second is

$$\sum_{i=1}^{\infty} P(N \ge i) = \sum_{n=1}^{\infty} n\,P(N = n) = E(N).$$

The average test length will depend on whether $H_0$ or $H_1$ is true, so we shall consider the two possibilities separately:

$$E(N \mid H_0) = \frac{E(Z_N \mid H_0)}{E(z \mid H_0)} \approx \frac{\alpha \log A + (1 - \alpha) \log B}{E(z \mid H_0)}; \tag{9.3}$$

similarly,

$$E(N \mid H_1) = \frac{E(Z_N \mid H_1)}{E(z \mid H_1)} \approx \frac{(1 - \beta) \log A + \beta \log B}{E(z \mid H_1)}. \tag{9.4}$$

Note that $E(z \mid H_0)$ is negative. For the detection of a known signal in white noise,

$$E(z \mid H_1) = -E(z \mid H_0) = \frac{s^2}{4N_0W},$$

and we have

$$E(N \mid H_0) \approx \frac{4N_0W}{s^2}\left[(1 - \alpha) \log (1/B) - \alpha \log A\right], \tag{9.5a}$$

$$E(N \mid H_1) \approx \frac{4N_0W}{s^2}\left[(1 - \beta) \log A - \beta \log (1/B)\right]. \tag{9.5b}$$
At several points in the preceding discussion it was assumed that the test had to end with probability one, or, more precisely,

$$\lim_{n \to \infty} P(N > n) = 0.$$

This is of course necessary if $E(N)$ is to be finite, but it is further required if we are to be able to write $\alpha = P(D_1 \mid H_0) = 1 - P(D_0 \mid H_0)$ and $\beta = P(D_0 \mid H_1) = 1 - P(D_1 \mid H_1)$. (Otherwise neither decision $D_0$ nor $D_1$ might be reached, and the pairs of probabilities would not add to unity.) We will now show that the test will almost certainly end, and from there we can easily show that $E(N)$ is finite.

The probability of the test's not ending is certainly less than the probability that none of the $|z_i|$ will exceed the distance between the upper and lower boundaries, and the latter probability goes to zero as the number of observations goes to infinity. Let $C = \log A - \log B$. In general, $P(|z_i| < C) = p < 1$, and thus the probability that the first $n$ observations all lead to values of $|z_i| < C$ equals $p^n$, which approaches zero as $n$ approaches infinity. (This reasoning applies directly to any independent observations, normal or not, as long as $p < 1$, and it can be easily extended to cases where $p = 1$, provided $P(z_i = 0) < 1$. However, for very highly correlated observations the theorem can be violated.)

This argument is also easily extended to show that all the moments exist. The mean test length is written

$$E(N) = \sum_{n=1}^{\infty} n\,P(N = n) = \sum_{n=0}^{\infty} P(N > n) \le \sum_{n=0}^{\infty} p^n = \frac{1}{1 - p} < \infty.$$

The moments of $N$ will all exist if $E(e^{\lambda N})$ exists for some $\lambda > 0$. All the terms of the series are positive if $\lambda > 0$, and therefore each term must be finite if $E(e^{\lambda N})$ exists. This expectation is finite provided $p\,e^{\lambda} < 1$.

The quantity of direct interest in evaluating the SPRT for continuous time is not the number of samples (which is a function of the arbitrarily chosen bandwidth), but the time duration of the test. Since the observed waveshape is supposed to be sampled every $1/(2W)$ seconds, the test duration $T$ is given by $T = N/(2W)$, and, therefore,†

$$E(T \mid H_0) \approx \frac{2N_0}{s^2}\left[(1 - \alpha) \log (1/B) - \alpha \log A\right], \qquad E(T \mid H_1) \approx \frac{2N_0}{s^2}\left[(1 - \beta) \log A - \beta \log (1/B)\right].$$

Note that this result does not depend on $W$, the assumed noise bandwidth. This is to be expected, since by using the sampling technique on finite observation intervals we have already assumed that $1/W$ is negligibly small compared with the observation time.

† This result can be obtained rigorously for a properly defined signal-in-white-noise process without use of the approximations of a sampling technique (see the notes in Appendix B).
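The sequential test of this section can be sketched in discrete time. The example below assumes independent normal observations with a constant signal and a known noise standard deviation; the function names, parameter values, and the use of a seeded pseudo-random generator are illustrative assumptions. The thresholds are the usual Wald choices $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$, and the approximate mean sample numbers neglect the overshoot of the boundaries, so simulated averages come out somewhat larger.

```python
import math
import random

def wald_thresholds(alpha, beta):
    """Upper and lower likelihood-ratio thresholds A and B."""
    return (1.0 - beta) / alpha, beta / (1.0 - alpha)

def approx_mean_samples(alpha, beta, mean_z_h0, mean_z_h1):
    """Approximate E(N | H0) and E(N | H1), neglecting boundary overshoot.
    mean_z_h0 and mean_z_h1 are the per-observation means of the
    log-likelihood-ratio increment under each hypothesis."""
    a, b = wald_thresholds(alpha, beta)
    n_h0 = (alpha * math.log(a) + (1.0 - alpha) * math.log(b)) / mean_z_h0
    n_h1 = ((1.0 - beta) * math.log(a) + beta * math.log(b)) / mean_z_h1
    return n_h0, n_h1

def run_sprt(signal, sigma, alpha, beta, signal_present, rng):
    """Run one sequential test; return (decided_signal_present, samples_used)."""
    a, b = wald_thresholds(alpha, beta)
    log_a, log_b = math.log(a), math.log(b)
    log_l, n = 0.0, 0
    while log_b < log_l < log_a:
        y = rng.gauss(signal if signal_present else 0.0, sigma)
        # Log-likelihood-ratio increment for a constant signal in normal noise.
        log_l += (signal * y - 0.5 * signal ** 2) / sigma ** 2
        n += 1
    return log_l >= log_a, n

if __name__ == "__main__":
    signal, sigma, alpha, beta = 1.0, 1.0, 0.05, 0.05
    mean_z = signal ** 2 / (2.0 * sigma ** 2)
    print(approx_mean_samples(alpha, beta, -mean_z, mean_z))
    rng = random.Random(0)
    runs = [run_sprt(signal, sigma, alpha, beta, True, rng) for _ in range(2000)]
    print(sum(n for _, n in runs) / len(runs))
    print(sum(1 for decided, _ in runs if not decided) / len(runs))
```

Converting the sample counts to test durations requires only division by the sampling rate, as in the text.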
These various proofs, as well as the proof of the optimality of the SPRT, suppose that the observations are identically as well as independently distributed. This corresponds to the case of a constant signal in white noise. Modifications are possible to cover a time-varying signal in white noise, a case that gives rise to independent observations with identical variances but different mean values. For instance, if the signal is sinusoidal instead of constant, the above results hold almost exactly provided the period of the signal is much shorter than the expected test length; in the formulas above, $(1/T)\int_T s(t)^2\,dt$ replaces $s^2$. If the signal consists of a sequence of pulses of duration $\tau$ (or a pulse-modulated carrier signal), then the discrete analysis can be used to find $E(N)$: in this case $E(z)$ is obtained by considering the likelihood ratio for a fixed, $\tau$-second continuous observation of a signal in white noise (see Chapter 3). However, for a pulsed signal and a small $E(N)$, the approximate formulas (9.5) may seriously underestimate $E(N)$, since, for instance, $E(Z_N \mid D_1)$ may exceed $\log A$ significantly.

Slowly varying signals, or signals that have only finite total energy over the infinite time interval, give rise to analytical problems for which solutions are unknown. The latter case is particularly troublesome in theory, because it is then no longer true that the test must end with probability one. No practical problem occurs, however, provided signal amplitude does not markedly decrease for a time much longer than $E(T)$.

KNOWN SIGNALS IN COLORED NOISE

If the noise is not white, the Shannon samples are correlated and the observations in the corresponding problem in discrete time are no longer independent. Two approaches to the analysis are possible.

1. It may be possible to find a transformation, involving no more than the information available in the observed segment of $y(t)$, that results in an equivalent detection problem for a signal in white noise. For instance, consider the problem of detecting a constant signal $S$ in noise with spectral density $N_0/(1 + \omega^2)$. If the received waveshape is passed through a filter with transfer function $1 + i\omega$ (that is, a filter that adds the input and its first derivative), then the problem at the output is to detect a signal of strength $S$ in white noise of spectral density $N_0$. The signal-to-noise energy ratio of the white noise case is $S^2/N_0$. The problems are equivalent because the observer can perform this operation without delay and without need of the unobserved portion of $y(t)$ at $t < 0$; he has neither added nor destroyed information. Note that if the signal to be detected were $A \sin(\omega_0 t + \theta)$, the equivalent test would be for the presence of the signal $(1 + \omega_0^2)^{1/2} A \cos(\omega_0 t + \theta)$ in white noise of spectral density $N_0$, resulting in a signal-to-noise energy ratio of

$$\frac{A^2}{2}\,\frac{1 + \omega_0^2}{N_0}.$$

In both cases the signal-to-noise energy ratio of the equivalent test is equal to the signal-to-noise energy ratio of the original test, if the value of the noise spectral density is taken at the signal frequency.

2. If no whitening operation is possible, it is necessary to consider the likelihood ratio itself as a random process and attempt to find the distribution of its first passage times across the boundaries. This can be done, since it turns out that the likelihood ratio is a Markov process and may be considered stationary if a suitable time scale is chosen. Unfortunately it is no longer quite true that the SPRT with constant thresholds is still an optimum test: the optimum test involves initially time-varying thresholds that soon approach asymptotically constant values. As an example of a spectral density that does not lead to an equivalent test through use of a whitening filter, consider $N_0(1 + \omega^2)/(1 + 2b^2\omega^2 + b^4\omega^4)$. The whitening filter has the transfer function $[1 + (ib\omega)^2]/(1 + i\omega)$. The numerator calls for adding the input to its properly scaled second derivative, but the denominator corresponds to a filter with transfer function $1/(1 + i\omega)$, the latter filter involving memory. Thus if the output is to be a white noise process, the entire past of the input must be available, and this is unknown to the observer.

OPTIMUM PROPERTY OF THE SPRT

The next topic is a proof of an optimum property of the sequential probability ratio test. Of all tests that achieve any
SEQUENTIAL DETECTION
pair of preassigned values of $\alpha$ and $\beta$, the SPRT minimizes both $E(N|H_0)$ and $E(N|H_1)$.

There are three parts to the proof. In Lemma 9.1 we will extend the notion of cost to include a cost of experimentation proportional to the number of observations made. The risk is defined as the average of the cost of the false decisions plus the cost of experimentation. Then we show that for any choice of values for the three different costs ($C_{01}$, cost of a miss; $C_{10}$, cost of a false alarm; and $C_N$, the cost of an observation) and any a priori probability $\pi = P(H_0)$, there exists a unique SPRT that is a Bayes test (that is, that minimizes the risk). In Lemma 9.2 it is stated that we can choose $P(H_0)$ and the cost coefficients so that the Bayes SPRT will achieve any arbitrary set of values of $\alpha$ and $\beta$. Lemma 9.3 combines the results of the other two to prove the desired theorem.

Theorem 9.2. If observations are independent and identically distributed, then among all tests for which $\alpha \le \alpha_0$ and $\beta \le \beta_0$, the SPRT with $\alpha = \alpha_0$ and $\beta = \beta_0$ minimizes both $E(N|H_0)$ and $E(N|H_1)$.

If $H_0$ is true, the risk associated with a test procedure $\delta$ may be written as $\alpha_\delta C_{10} + C_N E_\delta(N|H_0)$; if $H_1$ is true, the risk is $\beta_\delta C_{01} + C_N E_\delta(N|H_1)$. The average (Bayes) risk associated with the procedure $\delta$ and the a priori probability $\pi$ is

$$r(\pi, \delta) = \pi[\alpha_\delta C_{10} + C_N E_\delta(N|H_0)] + (1 - \pi)[\beta_\delta C_{01} + C_N E_\delta(N|H_1)]. \tag{9.6}$$

Lemma 9.1. There exists an SPRT that minimizes the Bayes risk (Eq. (9.6)).

In a sequential test, one of three decisions is made after each observation: to accept $H_1$, to accept $H_0$, or to take another observation. From the Bayes point of view, the decision to continue testing is made if it appears that the increase in observation cost will be compensated by a reduction in the expected cost of making a false decision. Let $\rho_i(\pi, \delta)$ be the additional risk incurred by the test $\delta$ if it is decided to continue testing after the $i$th observation.
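As a numerical illustration of Eq. (9.6), the following Python sketch evaluates the average risk of a test from its error probabilities and average sample numbers. The function name `bayes_risk` and its argument names are chosen here for illustration only, and the values in the usage line are arbitrary.

```python
def bayes_risk(prior_h0, alpha, beta, en_h0, en_h1, c10, c01, c_n):
    """Average (Bayes) risk in the form of Eq. (9.6).

    prior_h0 : a priori probability of H0
    alpha    : false-alarm probability of the test
    beta     : miss probability of the test
    en_h0    : expected number of observations when H0 is true
    en_h1    : expected number of observations when H1 is true
    c10      : cost of a false alarm
    c01      : cost of a miss
    c_n      : cost of one observation
    """
    risk_h0 = alpha * c10 + c_n * en_h0  # risk when H0 is true
    risk_h1 = beta * c01 + c_n * en_h1   # risk when H1 is true
    return prior_h0 * risk_h0 + (1.0 - prior_h0) * risk_h1


# Arbitrary illustrative values (not taken from the text).
r = bayes_risk(0.5, alpha=0.1, beta=0.2, en_h0=10, en_h1=12,
               c10=1.0, c01=2.0, c_n=0.01)
```

Because the risk is linear in the prior probability for a fixed test, comparing two tests at a given prior reduces to comparing two such weighted sums.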
Let $D_{0i}$, $D_{1i}$, $D_{ei}$ be the decisions to accept, to reject, and to experiment, respectively, after the $i$th observation. The risk of stopping is
We can extend this concept to include the option of stopping without any observations, which would happen if $C_N$ were high enough compared to $C_{01}$ and $C_{10}$.

Proof. The lemma is proved in two parts. First it is decided whether to take a first observation at all, which turns out to depend on a test in which $L(0)$ is compared with two thresholds. Then, by induction, it is shown that the same two thresholds will be used at every observation, provided the first observation is taken.

First Step: The average risk of deciding $D_0$ without an observation is $(1 - \pi)C_{01}$; if $D_1$ is decided, it is $\pi C_{10}$. Let

$$\rho(\pi) = \inf_{\delta \in C} r(\pi, \delta),$$

where $C$ is the class of tests involving at least one observation. If $\rho(\pi)$ is less than both $\pi C_{10}$ and $(1 - \pi)C_{01}$, one observation is made. Otherwise a decision is made without an observation. Since all of these risks depend on $\pi$, we wish to see if the decision can be characterized in terms of $\pi$. We must investigate $\rho(\pi)$ as a function of $\pi$.

1. For any two probabilities $\pi_1$ and $\pi_2$ and any $\lambda$ in (0, 1),

$$\rho(\lambda\pi_1 + (1 - \lambda)\pi_2) \ge \lambda\rho(\pi_1) + (1 - \lambda)\rho(\pi_2).$$
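Hence $\rho(\pi)$ is concave.

2. The risk is bounded below (by zero). Therefore $\rho(\pi)$ is continuous in (0, 1). (See G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, Cambridge University Press, New York, 1934, Sec. 3.18.)

Consider Fig. 9: The risk functions $\pi C_{10}$ and $(1 - \pi)C_{01}$ intersect at the point $\pi = C_{01}/(C_{01} + C_{10})$; at this point the risk of deciding without an observation is $C_{01}C_{10}/(C_{01} + C_{10})$. For all other values of $\pi$, the risk is less, since either $\pi C_{10}$ or $(1 - \pi)C_{01}$ will be less than this risk. Since $\rho(\pi)$ is concave and never negative, if $\rho$ at the intersection point is greater than $C_{01}C_{10}/(C_{01} + C_{10})$, then $\rho(\pi)$ is greater than at least one of the two risks for all values of $\pi$ in (0, 1). (See Fig. 9(a).) In this case, the average risk is minimized for all $\pi$ by performing no observation. However, if $\rho$ at the intersection point is less than $C_{01}C_{10}/(C_{01} + C_{10})$, then there will be an interval $(\pi', \pi'')$ such that an observation should be made if $\pi$ lies within it. (See Fig. 9(b).) Obviously $\pi'$ and $\pi''$ satisfy the equations

$$\rho(\pi') = \pi' C_{10}, \qquad \rho(\pi'') = (1 - \pi'')C_{01},$$

and $\pi'$ and $\pi''$ will depend on $C_{01}$, $C_{10}$, and $C_N$ in some as yet undetermined manner.

Fig. 9: Average risk as a function of $\pi$

Thus, for the latter case, the first step is

$$\text{decide } D_1 \text{ if } \pi \le \pi'; \qquad \text{take an observation if } \pi' < \pi < \pi''; \qquad \text{decide } D_0 \text{ if } \pi \ge \pi''.$$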
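The first step can be sketched numerically. The short Python example below computes the two risks of deciding without an observation, their intersection point, and the resulting three-way rule. The names `pi_low` and `pi_high` stand for the endpoints of the observation interval and are hypothetical labels; the treatment of ties at the endpoints is an arbitrary choice made here, and the numbers in the usage lines are illustrative only.

```python
def immediate_risks(prior_h0, c10, c01):
    """Risks of deciding without an observation.

    Returns (risk of accepting H0, risk of rejecting H0).
    Accepting H0 is wrong only if H1 is true (a miss, cost c01);
    rejecting H0 is wrong only if H0 is true (a false alarm, cost c10).
    """
    risk_accept = (1.0 - prior_h0) * c01
    risk_reject = prior_h0 * c10
    return risk_accept, risk_reject


def crossover(c10, c01):
    """Prior at which the two immediate risks are equal, and their common value."""
    point = c01 / (c01 + c10)
    value = c01 * c10 / (c01 + c10)
    return point, value


def first_step(prior_h0, pi_low, pi_high):
    """Three-way rule before any observation (endpoints assumed given)."""
    if prior_h0 <= pi_low:
        return "reject H0"
    if prior_h0 >= pi_high:
        return "accept H0"
    return "observe"


# Illustrative costs and interval endpoints (assumed, not from the text).
point, value = crossover(c10=1.0, c01=3.0)
risks_at_point = immediate_risks(point, c10=1.0, c01=3.0)
decision = first_step(0.5, pi_low=0.2, pi_high=0.8)
```

The endpoints themselves depend on the minimum risk over tests that take at least one observation, which this sketch does not compute.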
Later Steps: Suppose that i observations {yn} have been taken and that the test has not yet terminated. The procedure is then the same as before, since the expected additional risk is not affected by the costs of the experimentation already incurred:
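However, $P(H_0)$ is no longer just $\pi$ but rather has been modified by $i$ observations:

$$\pi_i = P(H_0 | y_1, \ldots, y_i) = \frac{\pi}{\pi + (1 - \pi)L(i)},$$

with $L(i)$ the likelihood ratio of $H_1$ to $H_0$ after $i$ observations. Thus the Bayes procedure is to continue testing as long as

$$\pi' < \pi_i < \pi''$$

or

$$B < L(i) < A,$$

where

$$A = \frac{\pi}{1 - \pi}\,\frac{1 - \pi'}{\pi'}, \qquad B = \frac{\pi}{1 - \pi}\,\frac{1 - \pi''}{\pi''}.$$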
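The equivalence between the rule on the a posteriori probability and the rule on the likelihood ratio can be checked with a small Python sketch. It assumes that the likelihood ratio is taken as the density under $H_1$ divided by the density under $H_0$, and that the observation interval for the probability of $H_0$ has endpoints called `pi_low` and `pi_high` here; these names and the numbers used are illustrative assumptions.

```python
def posterior_h0(prior_h0, lr):
    """A posteriori probability of H0 given likelihood ratio lr = p1/p0 (Bayes' rule)."""
    return prior_h0 / (prior_h0 + (1.0 - prior_h0) * lr)


def lr_thresholds(prior_h0, pi_low, pi_high):
    """Constant likelihood-ratio thresholds (lower, upper) equivalent to
    pi_low < posterior < pi_high."""
    odds = prior_h0 / (1.0 - prior_h0)
    upper = odds * (1.0 - pi_low) / pi_low    # posterior falls to pi_low here
    lower = odds * (1.0 - pi_high) / pi_high  # posterior rises to pi_high here
    return lower, upper


def continue_testing(prior_h0, lr, pi_low, pi_high):
    """True while the likelihood ratio stays strictly between the thresholds."""
    lower, upper = lr_thresholds(prior_h0, pi_low, pi_high)
    return lower < lr < upper


# Illustrative values (assumed): prior 0.5, observation interval (0.2, 0.8).
lower, upper = lr_thresholds(0.5, pi_low=0.2, pi_high=0.8)
post_at_upper = posterior_h0(0.5, upper)
post_at_lower = posterior_h0(0.5, lower)
```

With no observations the likelihood ratio equals 1, so `continue_testing(prior, 1.0, pi_low, pi_high)` is true exactly when the prior lies inside the interval.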
Note that the same rule holds for $i = 0$. This proves Lemma 9.1, but it does not settle the question of setting the thresholds in terms of the weights $C_N$, $C_{01}$, and $C_{10}$, which are reflected in the values $\pi'$ and $\pi''$. We know how to set the thresholds in terms of $\pi'$ and $\pi''$. The next lemma states that for arbitrary $\pi'$ and $\pi''$ there exists a set of cost coefficients $C_N$, $C_{01}$, and $C_{10}$ for which the Bayes test will achieve $\pi'$ and $\pi''$. It is given without proof, since the proof is readily available and, although ingenious, adds little to the understanding of the problem.

Lemma 9.2. For any $\pi'$ and $\pi''$ there exist $C$ and $C_N$ such that the Bayes test for cost coefficients $C_N$, $C_{01} = 1 - C$, $C_{10} = C$, and an a priori probability $\pi$ is the sequential probability ratio test as given in Eq. (9.1). (Lemma stated without proof.)

Proof of Theorem 9.2. Finally, consider an SPRT with the thresholds $A$ and $B$ and any constant $\pi$. Solve from Eq. (9.7) for $\pi'$ and $\pi''$:

$$\pi' = \frac{\pi}{\pi + (1 - \pi)A}, \qquad \pi'' = \frac{\pi}{\pi + (1 - \pi)B}.$$

These constants satisfy the relationship $0 < \pi' < \pi'' < 1$. Lemma 9.2 states that there is a set of cost coefficients for which the present test is a Bayes solution (the cost coefficients depend on $\pi'$ and $\pi''$ and therefore on $\pi$ for fixed $A$ and $B$). Let
the present test achieve values $\alpha_0$, $\beta_0$, $E(N|H_0)$, $E(N|H_1)$, and consider any other test with $\alpha^*$, $\beta^*$, $E^*(N|H_0)$, $E^*(N|H_1)$. Since the present test is Bayes, it satisfies
π[C_10 α_0 + C_N E(N|H_0)] + (1 − π)[C_01 β_0 + C_N E(N|H_1)]
≤ π[C_10 α* + C_N E*(N|H