Studies in Big Data Volume 126
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Big Data” (SBD) publishes new developments and advances in the various areas of Big Data, quickly and with high quality. The intent is to cover the theory, research, development, and applications of Big Data, as embedded in the fields of engineering, computer science, physics, economics and life sciences. The books of the series refer to the analysis and understanding of large, complex, and/or distributed data sets generated from recent digital sources coming from sensors or other physical instruments as well as simulations, crowd sourcing, social networks or other internet transactions, such as emails or video click streams, and others. The series contains monographs, lecture notes and edited volumes in Big Data spanning the areas of computational intelligence including neural networks, evolutionary computation, soft computing, fuzzy systems, as well as artificial intelligence, data mining, modern statistics and operations research, as well as self-organizing systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are reviewed in a single blind peer review process. Indexed by SCOPUS, EI Compendex, SCIMAGO and zbMATH. All books published in the series are submitted for consideration in Web of Science.
Lawrence D. Stone · Roy L. Streit · Stephen L. Anderson
Introduction to Bayesian Tracking and Particle Filters
Lawrence D. Stone Metron Reston, VA, USA
Roy L. Streit Metron Reston, VA, USA
Stephen L. Anderson Metron Reston, VA, USA
ISSN 2197-6503 ISSN 2197-6511 (electronic)
Studies in Big Data
ISBN 978-3-031-32241-9 ISBN 978-3-031-32242-6 (eBook)
https://doi.org/10.1007/978-3-031-32242-6

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Contents

1 Introduction

2 Bayesian Single Target Tracking
2.1 Bayesian Inference
2.1.1 Prior Distribution
2.1.2 Likelihood Function
2.1.3 Posterior Distribution
2.1.4 Basic Bayesian Recursion
2.1.5 Examples of Priors, Posteriors, and Likelihood Functions
2.2 Tracking a Moving Target
2.2.1 Prior Distribution on Target Motion
2.2.2 Single Target Tracking Problem
2.2.3 Bayes-Markov Recursion
2.3 Kalman Filtering
2.3.1 Discrete Time Kalman Filtering Equations
2.3.2 Examples of Discrete-Time Gaussian Motion Models
2.3.3 Continuous-Discrete Kalman Filtering Equations
2.3.4 Kalman Filtering Examples
2.3.5 Nonlinear Extensions of Kalman Filtering
Appendix: Standard Kalman Filter Equations
References

3 Bayesian Particle Filtering
3.1 Introduction
3.2 Particle Filter Tracking
3.2.1 Motion Model
3.2.2 Bayesian Recursion
3.2.3 Bayesian Particle Filter Recursion
3.2.4 Additional Considerations
3.2.5 Tracking Examples
3.3 Bayesian Particle Filtering Applied to Other Nonlinear Estimation Problems
3.3.1 Nonlinear Time Series Example
3.4 Smoothing Particle Filters
3.4.1 Repeated Filtering
3.4.2 Smoothing Examples
3.5 Notes
References

4 Simple Multiple Target Tracking
4.1 Introduction
4.2 Association Probabilities
4.3 Soft Association
4.4 Simplified JPDA
4.4.1 Particle Filter Implementation of Simplified Nonlinear JPDA
4.4.2 Crossing Targets Example
4.4.3 Feature-Aided Tracking
4.5 More Complex Multiple Target Tracking Problems
References

5 Intensity Filters
5.1 Introduction
5.2 Point Process Model of Multitarget State
5.2.1 Basic Properties of PPPs
5.2.2 Probability Distribution Function for a PPP
5.2.3 Superposition of Point Processes
5.2.4 Target Motion Process
5.2.5 Sensor Measurement Process
5.2.6 Thinning a Process
5.2.7 Augmented Spaces
5.3 iFilter
5.3.1 Augmented State Space Modeling
5.3.2 Predicted Detected and Undetected Target Processes
5.3.3 Measurement Process
5.3.4 Bayes Posterior Point Process (Information Update)
5.3.5 PPP Approximation
5.3.6 Correlation Losses in the PPP Approximation
5.3.7 The iFilter Recursion
5.4 Example
5.5 Notes
References
Chapter 1
Introduction
This book provides a quick but insightful introduction to Bayesian tracking and particle filtering for a person who has some background in probability and statistics and wishes to learn the basics of single target tracking. We also introduce the reader to multiple target tracking by presenting useful approximate methods that are easy to implement compared to a full-blown multiple target tracker.

Filtering is the process of estimating the present target state from the measurements received up to the present time. We take an approach to filtering that is naturally suited to the computational resources and methods now available. We view the specification of the target’s motion model as a recipe that allows us to produce a large number of independently drawn sample paths from the model. We use these paths as a discrete approximation to the target motion model and perform Bayesian filtering on them. This leads in a natural fashion to particle filtering and provides insight into the nature and power of Bayesian filtering. The background provided here will allow a person to quickly become a productive member of a project team using Bayesian filtering and to develop new methods and techniques for problems the team may face.

Summary of chapters. The following paragraphs provide a quick review of the chapters in this book.

Chapter 2 begins by answering the question of why we use Bayesian methods for tracking. The Bayesian inference framework starts with a prior distribution on a parameter to be estimated. Bayes rule is applied to measurements (observations) related to this parameter to obtain the posterior distribution given these measurements. Given a new measurement, the chapter shows how to use the posterior as the prior to compute the updated posterior given this and all previous measurements. This basic Bayesian recursion is applied to obtain the single target Bayesian filtering recursion. The prior distribution for this recursion is a stochastic process representing the target’s motion model. The chapter discusses Kalman filtering as the special case of Bayesian filtering where the target motion model is Gaussian and the measurements are linear in target state with additive Gaussian measurement errors.
Chapter 3 presents particle filtering. It discusses target motion models with examples drawn from tracking applications. The emphasis is on developing natural motion models that represent our best understanding of how the target moves. This frees the analyst from the usual Gaussian and Markov process constraints put on target motion models. The chapter introduces the notion of approximating the stochastic process modeling target motion by using a large number of independently drawn sample paths from that process. The Bayesian filtering process is applied to this discrete approximation, which leads naturally to particle filtering. This in turn leads to the need for resampling the paths to preserve the resolution of the filter. The chapter presents examples of particle filtering applied to problems that are challenging to solve using Kalman filtering. It also discusses the application of particle filters to estimating a parameter that varies over time according to a stochastic process and provides an example illustrating this application. The chapter finishes by discussing smoothing a particle filter. Fixed interval smoothing is the process of estimating the posterior distribution on target paths over a fixed time interval given the measurements received in that interval. The chapter describes the repeated filtering method for computing this smoothed estimate. This simple but effective method is particularly suited to the computational resources now commonly available to an analyst. Repeated filtering is a very general method for smoothing a particle filter. It can be applied to almost any particle filter. It does not require the Gaussian or Markov process assumptions made by other methods. Moreover, repeated filtering is not limited to single target tracking. It can be applied to multiple target tracking problems such as those discussed in Chap. 4 and to any filtering problem where a particle filter is used to approximate a stochastic process and perform filtering.

Chapter 4 discusses multiple target tracking. The difficulty in multiple target tracking arises when the origin of a measurement is ambiguous. This occurs when we are not sure which target generated the measurement or whether it is a spurious or false measurement, i.e., not generated by any target. By contrast, previous chapters have assumed that all measurements are generated by a single target and that there are no false measurements. To deal with ambiguity in assigning measurements to targets, we present a Bayesian method of computing association probabilities. The association probability between a target and a measurement is the probability that the target generated the measurement. Using association probabilities, we develop a simple multiple target tracker that allows for the possibility of false measurements. The chapter concludes with a discussion of algorithms designed for more complicated multiple target tracking problems.

Chapter 5 takes a different view of multiple target tracking. Instead of computing a target state distribution for each target present, it settles for the more modest goal of estimating the target intensity (density) function over a common state space. If we integrate the target intensity function over any subset of the state space, we obtain the expected number of targets in that subset. Thus, we are estimating the spatial distribution of the number of targets as opposed to target state distributions. Generally, peaks in the intensity function correspond to the locations of targets.
To estimate the intensity function, the chapter develops point process models of target state. It then derives the iFilter recursion for calculating target intensity functions in a Bayesian fashion. In contrast to the multiple target trackers described in Chap. 4, the iFilter recursion does not require associating measurements to targets and is computationally simpler than the algorithms discussed in Sect. 4.5 of Chapter 4. The chapter concludes with an example that shows the iFilter “tracking” four targets that appear and disappear over time in the presence of clutter (false measurements).
Chapter 2
Bayesian Single Target Tracking
2.1 Bayesian Inference

Bayesian inference combines a prior probability distribution on the possible values of a parameter with likelihood functions derived from measurements of the parameter to compute the posterior probability distribution on the parameter.

Why Bayesian Inference? There are many ways to estimate a parameter. Why have we chosen Bayesian inference? We believe Bayesian inference to be the most powerful and general inference method available because it can incorporate all the information available to the analyst, including subjective information. It is the only method known to us that can do this in a principled fashion and that has a consistent record of success. In addition, the method is conceptually simple. Stone et al. (2014a) describes how Bayesian inference made a significant difference in the success of a major search operation. This article is one of several “Big Bayes” stories in the February 2014 issue of Statistical Science, which features articles describing cases in which the use of Bayesian statistics made a big difference.

There are three keys to unlocking the power of Bayesian inference: subjective probability, likelihood functions, and the Bayesian recursion. These key elements are presented in this chapter.

Subjective probability. Information is often available about the parameter to be estimated that is not derived from measurements, data, or other objective methods. It is often valuable information which should be used to help estimate the parameter. In the 1960s and 70s, subjective probability and Bayesian inference were put on a solid theoretical foundation by statisticians such as Lindley (1965), DeGroot (1970), and deFinetti (1974–1975). In comparison, more “objective” notions of probability, such as relative frequency, can create problems when used as the basis for statistical inference, as is discussed in Chap. 1 of Berger (1985). As a simple example of the use of subjective probability, consider estimating the probability pH of obtaining heads when tossing a coin. To start, we know that 0 ≤ pH ≤ 1. However, if the coin is a standard US coin, such as
a penny, we know more than this. We know from experience that the value of pH is likely to be near 1/2, which means that values near 1/2 have higher probability than those near 0 or 1. Bayesian inference allows us to incorporate this (subjective) information into a prior distribution. For example, to represent our knowledge or belief about the value of pH, we can choose the prior on pH from the class of beta distributions with density functions fαβ given in (2.1):

fαβ(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1) for 0 ≤ x ≤ 1 and α, β > 0, (2.1)

where x denotes a possible value of pH and Γ is the gamma function, which for integers n ≥ 1 takes the values Γ(n) = (n − 1)!. Graphs of several beta densities are shown in Fig. 2.1. By selecting appropriate values for α and β, we can obtain a uniform [0, 1] prior indicating that all values of pH between 0 and 1 are equally likely (α = 1, β = 1), a prior where probabilities near 1/2 are more likely (α = β = 3), a prior that favors probabilities near 0 (α = 2, β = 6), or one that favors probabilities near 1 (α = 6, β = 2). Using one of these densities as our prior, we can flip the coin a number of times and use Bayes rule presented below to estimate the posterior distribution on pH. If we choose a beta prior that peaks at 0.5 as it does for α = 3, β = 3, then, as we will see, the uncertainty in our estimate of pH will be smaller than with a uniform [0, 1] prior (α = 1, β = 1). This may not be a concern with coin tosses where it is easy to toss the coin a few more times to reduce uncertainty, but it is a concern in applications where it is difficult, expensive, or even unethical to obtain more data or measurements. This is often the case in medical trials when evaluating the effectiveness of a new pharmaceutical drug or procedure.
Fig. 2.1 Beta probability density functions
Likelihood functions. A likelihood function converts a measurement or piece of data into a function that specifies probabilities on the state space of the parameter to be estimated. The method of doing this is given in (2.3) below. This method enables us to combine the information in that measurement with the prior distribution on the parameter to produce the posterior distribution using Bayes rule. Likelihood functions play a key role because they convert all types of measurements into functions on the state space of the parameter. For this reason, we say that likelihood functions are the common currency of information in Bayesian inference.

Bayesian recursion. As new data become available, the Bayesian recursion enables us to revise our estimate of the parameter of interest in a principled recursive fashion. Suppose we have computed the posterior distribution on a parameter based on n measurements or pieces of data and then receive a new measurement, an (n + 1)st piece of data. The Bayesian recursion allows us to compute the posterior given all n + 1 measurements by using the posterior given the first n measurements as the prior and combining it with the likelihood function for the (n + 1)st measurement using Bayes rule to obtain the posterior given all n + 1 measurements. This greatly simplifies the estimation process in dynamic situations where data are being received over time and we wish to revise the posterior as new data arrives.

Enabling technologies. The theoretical reasons given above make Bayesian inference general and powerful. Moreover, the existence of highly capable, low-cost computers coupled with the ready availability of data allows us to perform Bayesian inference using realistic models and assumptions. Often one or more of the steps involved in Bayesian inference require evaluating integrals that must be computed numerically. The requirement to use numerical methods to perform Bayesian inference to estimate time-varying stochastic quantities has driven the development of particle filters, which are the subject of Chap. 3.
2.1.1 Prior Distribution

Let X denote the unknown parameter we wish to estimate. Bayesian inference takes the point of view that we can quantify our knowledge about X in terms of a prior probability distribution on its value. In this usage, prior means before we have received an estimate or measurement of X. Let S be the space of possible values of X and let

p(x) = Pr{X = x} for x ∈ S (2.2)

denote the prior distribution on X. Note, we use Pr to indicate either probability or probability density, whichever is appropriate. Similarly, we use the phrase probability function to mean either probability function or probability density function as appropriate. In addition, we use upper-case letters for random variables and lower-case ones for values of a random variable. Thus in (2.2), x is a value of the random variable X. We use the terminology random variable even though the probability
distribution in (2.2) may represent our uncertainty in the value of X rather than some basic randomness.

Comment. Some might object to treating the unknown parameter as a random variable. This objection is usually based on the classical mechanical view of the world in which these parameters are deterministic quantities that are simply unknown. They are not random. The Bayesian view is that the probability distribution in (2.2) represents our uncertainty about the value of the unknown parameter. This uncertainty can result from lack of knowledge or basic randomness. The prior in (2.2) can be based on subjective information and judgements. Such probabilities are called subjective probabilities. The use of subjective judgements and probabilities in a principled manner is one of the strengths of Bayesian inference. If the reader is skeptical about the use of subjective information and probabilities in Bayesian inference, we ask them to withhold judgement until they have seen the power and effectiveness of the Bayesian approach as it unfolds in this book.
2.1.2 Likelihood Function

We assume that information about the value of X is obtained through observations or measurements Y whose values fall in a measurement space M, which may be different than the space S of possible values for X. The value of Y depends on X in a probabilistic fashion that is captured in the likelihood function L, which is defined as

L(y|x) = Pr{Y = y|X = x} for x ∈ S. (2.3)

The likelihood function L gives the probability of receiving the measurement Y = y as a function of x. Note that the measurement y is fixed and known, but the value of the parameter x is unknown and can vary over S. So, we view L as a function of x. We may indicate this by writing L(y|·) to show that we are considering y fixed and letting x vary. If one integrates L(y|·) over x ∈ S, the integral may not equal 1. That is, L(y|·) may not be a probability function.

Suppose we have two measurements Y1 = y1 and Y2 = y2 with likelihood functions L1 and L2. Suppose further that the probability distributions on Y1 and Y2 are independent conditioned on the state X = x. Then we say the likelihood functions are conditionally independent and write the joint likelihood function for the two measurements as

L(y1, y2|x) = Pr{Y1 = y1, Y2 = y2|X = x} = L1(y1|x)L2(y2|x) for x ∈ S. (2.4)

This simple statement is extremely powerful. It tells us how to combine measurements from two sensors with conditionally independent likelihood functions even when the sensors differ in type. Just calculate their likelihood functions and multiply as in
(2.4). Heterogeneous sensors such as radar and sonar will generally have conditionally independent likelihood functions.
2.1.3 Posterior Distribution

Let

p(x) = Pr{X = x} for x ∈ S (2.5)

be the prior distribution on the parameter X that we wish to estimate. Suppose we receive the measurement Y1 = y1. Then the posterior distribution p(x|y1) on X given Y1 = y1 is

p(x|y1) = Pr{X = x|Y1 = y1} = Pr{X = x, Y1 = y1} / Pr{Y1 = y1}
= Pr{Y1 = y1|X = x} Pr{X = x} / Pr{Y1 = y1} for x ∈ S. (2.6)

The first and second lines of (2.6) follow from the definition of conditional probability.

Bayes Theorem or rule. The second line of (2.6) is the statement of Bayes theorem or rule. Specifically,

Pr{X = x|Y1 = y1} = Pr{Y1 = y1|X = x} Pr{X = x} / Pr{Y1 = y1} for x ∈ S. (2.7)

From (2.3), (2.5), and the fact that

∫_S Pr{Y1 = y1|X = x} Pr{X = x} dx = ∫_S L1(y1|x) p(x) dx = Pr{Y1 = y1}, (2.8)

we obtain

p(x|y1) = L1(y1|x) p(x) / ∫_S L1(y1|x) p(x) dx for x ∈ S, (2.9)

which restates Bayes theorem or rule in terms of the prior p and the likelihood function L1.
2.1.3.1 Estimating the Probability of Heads from Coin Tosses
As an example of the application of Bayes rule, we consider the estimation of pH, the probability that a coin will show heads when tossed. We wish to estimate this probability by tossing the coin n times and observing the number of times H the coin shows heads. We choose a beta distribution with density fαβ as given in (2.1) for the prior on pH. We assume that H is a binomial random variable with probability pH of heads occurring on each trial. Suppose we toss the coin n times and observe h heads. The likelihood function for the observation H = h is

Ln(h|x) = Pr{H = h|pH = x} = C(n, h) x^h (1 − x)^(n−h) for 0 ≤ x ≤ 1, (2.10)

where C(n, h) = n!/(h!(n − h)!) is the binomial coefficient. From Bayes rule (2.9), we have

Pr{pH = x|H = h} = Ln(h|x) fαβ(x) / ∫_0^1 Ln(h|x) fαβ(x) dx
= C(n, h) x^h (1 − x)^(n−h) [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1) / ∫_0^1 C(n, h) x^h (1 − x)^(n−h) [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1) dx
= x^(α+h−1) (1 − x)^(β+n−h−1) / ∫_0^1 x^(α+h−1) (1 − x)^(β+n−h−1) dx
= f(α+h, β+n−h)(x) for 0 ≤ x ≤ 1, (2.11)
where the last equality follows because the function in the numerator of the line above it is proportional to f(α+h, β+n−h). Notice that the posterior is again a beta distribution with new values of α and β, namely α′ and β′, which are obtained by adding the number of heads to α and the number of tails to β. The mean and variance of a beta distribution with density fαβ are

μ(α, β) = α/(α + β), var(α, β) = αβ/((α + β)²(α + β + 1)). (2.12)
Suppose we choose a prior with α = 2, β = 2 as shown in Fig. 2.2 and perform n = 5 coin tosses with h = 3 heads. Then from (2.11) and (2.12) we have

α′ = 5, β′ = 4, μ(5, 4) = 0.556, var(5, 4) = 0.0247. (2.13)

The density function for this distribution is shown in Fig. 2.2.
Fig. 2.2 Prior and posterior beta densities
2.1.4 Basic Bayesian Recursion

Suppose we receive another measurement Y2 = y2, and we wish to compute the posterior distribution p(x|y1, y2) on X given Y1 = y1 and Y2 = y2. Then

p(x|y1, y2) = Pr{X = x|Y1 = y1, Y2 = y2} = Pr{X = x, Y1 = y1, Y2 = y2} / Pr{Y1 = y1, Y2 = y2}
= Pr{Y1 = y1, Y2 = y2|X = x} Pr{X = x} / Pr{Y1 = y1, Y2 = y2}
= L12(y1, y2|x) p(x) / ∫_S L12(y1, y2|x) p(x) dx, (2.14)

where L12(y1, y2|·) is the joint likelihood function for Y1 = y1, Y2 = y2. If the measurement random variables Y1 and Y2 are independent conditioned on X = x, then

L12(y1, y2|x) = L1(y1|x)L2(y2|x) for x ∈ S. (2.15)

Recall that when (2.15) holds, the likelihood functions are conditionally independent. This will be true if the measurement error in Y1 is independent of (does not affect) the measurement error in Y2. Using (2.15) we write

p(x|y1, y2) = L1(y1|x)L2(y2|x) p(x) / ∫_S L1(y1|x)L2(y2|x) p(x) dx for x ∈ S. (2.16)

From (2.16) we see that p(x|y1, y2) is proportional to L2(y2|x)L1(y1|x) p(x), which we write as
p(x|y1, y2) ∝ L2(y2|x)L1(y1|x) p(x) for x ∈ S. (2.17)

Since L1(y1|x) p(x) ∝ Pr{X = x|Y1 = y1} = p(x|y1), we see from (2.17) that p(x|y1, y2) ∝ L2(y2|x) p(x|y1) and

p(x|y1, y2) = L2(y2|x) p(x|y1) / ∫_S L2(y2|x) p(x|y1) dx for x ∈ S. (2.18)

The reader can check that if we compute p(x|y2) first and then

p(x|y2, y1) = L1(y1|x) p(x|y2) / ∫_S L1(y1|x) p(x|y2) dx,
we obtain p(x|y2, y1) = p(x|y1, y2). That is, the posterior in (2.18) is independent of the order in which we apply the likelihood functions. We can incorporate the measurement information in any order that is convenient.

Bayesian Recursion. Equation (2.18) states the basic Bayesian recursion, which we may express in words as follows. Suppose you have computed a posterior distribution based on a set of measurements and you receive a new measurement whose likelihood function is conditionally independent of the previous ones. Then you may compute the new posterior given all the measurements by using the old posterior as the prior and multiplying it by the likelihood function for the new measurement as is done in (2.18). Thus, posterior distributions may be computed recursively as new information (measurements) becomes available, provided the likelihood functions of the new measurements are conditionally independent of the previous ones. Because of this recursion, incorporating or updating for new information is conceptually easy using Bayesian inference. This is one reason Bayesian inference provides a simple but powerful method for performing statistical estimation.

Example. As an example of the Bayesian recursion, suppose we make one more toss in the example in 2.1.3.1, and the coin comes up tails so that n = 1 and h = 0. From (2.11), we know that we can calculate new values of α and β for the posterior by adding 0 to α′ and 1 to β′ in (2.13), to obtain α″ = 5 and β″ = 5 with the result that the new posterior, whose density is shown in Fig. 2.2, has mean and variance

μ(5, 5) = 0.5, var(5, 5) = 0.0227. (2.19)

The estimate (mean) of the posterior has moved down to 0.5 as expected, and the variance is smaller than in (2.13). Thus, there is less uncertainty in this new estimate.

To further illustrate the Bayesian recursion, suppose we start with the same prior as above, which has α = β = 2. We make three pairs of coin tosses and obtain the following results: {H, T}, {H, H}, and {T, T}, where H stands for heads and T for tails. If we compute the posterior distribution after each pair of tosses using the Bayesian recursion applied to the beta density, we obtain the sequence of posterior densities shown in Fig. 2.3.
Fig. 2.3 Sequence of beta posterior densities obtained from Bayesian recursion
Note that the final posterior α = β = 5 is the same as the one we obtained in Fig. 2.2.

Effect of the Prior on the Posterior. We now illustrate the effect of the prior distribution on the posterior. We return to the coin tossing example above, but instead of using the prior α = β = 2, we consider the effect of using a prior with α = β = m where m > 2. According to (2.12), the mean and variance of this prior distribution are

μ(m, m) = 1/2, var(m, m) = 1/(8m + 4) for m > 0. (2.20)
One can see that a value of m > 2 produces a prior with smaller variance than that for α = β = 2. Smaller variance indicates less uncertainty about the value of pH. If we tossed the coin 6 times as above and obtained 3 heads and 3 tails, the posterior would have

μ(m + 3, m + 3) = 1/2, var(m + 3, m + 3) = 1/(8(m + 3) + 4) for m > 0. (2.21)
Suppose we had chosen α = β = 4 for our prior, indicating less uncertainty about 1/2 being the likely value of pH than in the coin tossing example above. Then the posterior after tossing the coin six times would have

μ(7, 7) = 1/2, var(7, 7) = 1/60 = 0.0167, (2.22)
which yields a smaller variance than the value 0.0227 in (2.19) that we obtained for the prior α = β = 2.
We see that a prior with less uncertainty produces a posterior with less uncertainty. As we mentioned above, this may not provide much benefit in the case of coin tossing but can provide a major benefit in a case where performing more trials is expensive or not possible. If the use of subjective information, such as expert opinion, can produce a prior with smaller uncertainty, then it is beneficial to use this information. Bayesian inference allows us to use this information in a principled fashion whereas classical statistical estimation does not.
2.1.5 Examples of Priors, Posteriors, and Likelihood Functions

This section provides examples of priors, posteriors, and likelihood functions that arise frequently in tracking problems.
2.1.5.1 Gaussian Prior

Circular Gaussian Prior. Suppose we wish to estimate the location of an object in the plane. In this case, S is the plane and X is the location we wish to estimate. We denote locations in S by x = (x1, x2)^T, where (x1, x2)^T indicates the transpose of (x1, x2). Suppose the prior distribution on the location is circular Gaussian with mean μ and standard deviation σ. A circular Gaussian distribution in 2 dimensions has covariance matrix Σ = σ²I2, where I2 is the 2-dimensional identity matrix. The probability density of this prior is

p(x) = (2πσ²)^(−1) exp(−(1/2)(μ − x)^T Σ^(−1)(μ − x)) for x ∈ S. (2.23)

Figure 2.4 shows a plot of this prior for μ = (0, 0)^T and Σ = 2I2.

General Gaussian Prior. The density function of a general n-dimensional Gaussian distribution with mean μ and covariance matrix Σ is given by

gn(x, μ, Σ) = (2π)^(−n/2) |Σ|^(−1/2) exp(−(1/2)(x − μ)^T Σ^(−1)(x − μ)) for x ∈ R^n, (2.24)

where x and μ are n-dimensional column vectors, Σ is an n-dimensional covariance matrix, and |Σ| is the determinant of Σ.
Fig. 2.4 Circular Gaussian density with mean μ = (0, 0)^T and Σ = 2I2
2.1.5.2 Gaussian Likelihood Function

Suppose the measurement y = (1, 1)^T is an estimate of the position of the object in the plane, and suppose the error ε in the measurement is circular Gaussian with mean (0, 0)^T and covariance Σ_M = I2. Then the likelihood function for this measurement is

L_G(y|x) = Pr{Y = y|X = x} = Pr{ε = y − x} = (2πσ²)^(−1) exp(−(1/2)(y − x)^T Σ_M^(−1)(y − x)). (2.25)

Figure 2.5 shows a plot of this function. Because this Gaussian density remains unchanged if one interchanges x and y, the Gaussian likelihood function is a Gaussian density over x ∈ S.
Fig. 2.5 Gaussian likelihood function with y = (1, 1)^T and Σ_M = I2
2.1.5.3 Line of Bearing Likelihood Functions

A common measurement available to an acoustic tracking system is the bearing to a target. For example, a passive acoustic sensor often produces line of bearing measurements. In this case, the sensor estimates the direction of the object from the sensor. In this example both the sensor and target are in the plane R², so we model the error in the bearing measurement as a one-dimensional Gaussian random variable. This is appropriate if the measurement errors are small and the target is far enough from the sensor to be considered a point target. Let α = (α1, α2)^T be the location of the sensor making the bearing measurement. Let σd be the standard deviation of the bearing measurement error in degrees. Let θ be the measured bearing to the target in degrees. Under our assumption that the difference between the true bearing to the target and the measured bearing θ has a Gaussian distribution, the likelihood function for the bearing measurement is

L_B(θ|x) = (2πσd²)^(−1/2) exp(−(θ − b(α, x))²/(2σd²)) for x ∈ S, (2.26)

where b(α, x) is the bearing in degrees from the sensor location α to the point x.
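As a concrete sketch of (2.26), using the sensor geometry of Figs. 2.6 and 2.7 (the function names and wrapping convention are ours, not code from the book), the likelihood can be evaluated at any candidate position x:

```python
import numpy as np

def bearing_deg(sensor, x):
    """b(alpha, x): bearing in degrees from the sensor to the point x,
    measured counterclockwise from the positive x1 axis."""
    d = np.asarray(x, dtype=float) - np.asarray(sensor, dtype=float)
    return np.degrees(np.arctan2(d[1], d[0])) % 360.0

def bearing_likelihood(theta, sensor, x, sigma_d=3.0):
    """Gaussian line-of-bearing likelihood L_B(theta | x) of (2.26)."""
    diff = (theta - bearing_deg(sensor, x) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return np.exp(-0.5 * (diff / sigma_d) ** 2) / np.sqrt(2.0 * np.pi * sigma_d**2)

# Conditionally independent bearings multiply, per (2.15); this product,
# evaluated over a grid of x values, produces the cross fix of Fig. 2.8.
x = (0.0, 0.0)
L = bearing_likelihood(132.0, (7.0, -7.0), x) * bearing_likelihood(44.0, (-7.0, -7.0), x)
```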
Figure 2.6 shows the likelihood function for a bearing measurement with a Gaussian bearing error. The sensor is located at (7, −7)^T; the bearing measurement is 132° measured counterclockwise from the positive x1 axis; and the standard deviation of the bearing error is 3°. Note, the sensor is located outside the region of the figure. The likelihood function L_B(θ|·) is not a probability density function on the (x1, x2) plane because it does not integrate to 1. In fact, it integrates to infinity. This is in contrast to the Gaussian likelihood function in Fig. 2.5. Figure 2.7 shows the line of bearing likelihood function for a similar sensor located at (−7, −7)^T with a bearing measurement of 44°. Assuming that the measurement errors in these two bearings are independent conditioned on the target’s location, we can multiply the likelihood functions as in (2.15) to obtain the combined (joint) likelihood function for the two measurements as shown in Fig. 2.8. If we have two bearings that cross, as the ones in Figs. 2.6 and 2.7 do, then we have a cross fix. The likelihood function in Fig. 2.8 represents that cross fix.
Fig. 2.6 Line of bearing likelihood function for a sensor at (7, −7)^T, bearing measurement = 132°, and σd = 3°
Fig. 2.7 Line of bearing likelihood function for a sensor at (−7, −7)^T, bearing measurement = 44°, and σd = 3°
2.1.5.4 Posterior Distributions

In this section we use the prior and likelihood functions presented above to compute two Gaussian posteriors and one approximately Gaussian posterior.

Gaussian Posterior. If we combine the prior in Fig. 2.4 with the likelihood function for the Gaussian measurement in Fig. 2.5 according to Bayes rule in (2.9), we obtain the posterior density, which is proportional to the one shown in Fig. 2.9. (The density shown in Fig. 2.9 is not normalized by the denominator of (2.9).) The pointwise product of two Gaussian densities is proportional to a Gaussian density (DeGroot 2004). If the densities have means μ1 and μ2 along with covariance matrices Σ1 and Σ2, then the mean and covariance of the normalized product density are

Σ = (Σ1^(−1) + Σ2^(−1))^(−1) and μ = Σ(Σ1^(−1)μ1 + Σ2^(−1)μ2). (2.27)

Using (2.27), we see that the density in Fig. 2.9 is proportional to the density for a circular Gaussian distribution with mean (2/3, 2/3)^T and standard deviation σ = √(2/3).
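For reference, (2.27) is easy to apply numerically. A minimal sketch (the function name is ours) that reproduces the numbers just given:

```python
import numpy as np

def fuse_gaussians(mu1, Sigma1, mu2, Sigma2):
    """Mean and covariance of the normalized pointwise product of two
    Gaussian densities, per (2.27)."""
    S1inv, S2inv = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
    Sigma = np.linalg.inv(S1inv + S2inv)
    mu = Sigma @ (S1inv @ mu1 + S2inv @ mu2)
    return mu, Sigma

# Prior of Fig. 2.4 (mean 0, covariance 2*I2) combined with the Gaussian
# likelihood of Fig. 2.5 (y = (1, 1)^T, covariance I2):
mu, Sigma = fuse_gaussians(np.zeros(2), 2.0 * np.eye(2),
                           np.array([1.0, 1.0]), np.eye(2))
# mu = (2/3, 2/3)^T and Sigma = (2/3)*I2, i.e., sigma = sqrt(2/3).
```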
Fig. 2.8 Combined line of bearings likelihood function
Gaussian Posterior from a Bayesian Recursion. Suppose we obtain a second Gaussian measurement of (1/2, 1/2)^T that has a circular normal measurement error with mean (0, 0)^T and Σ_M = I2. Multiplying the density function in Fig. 2.9 by the likelihood function of the second measurement, we obtain a density function proportional to the posterior density function given the first and second measurements as shown in Fig. 2.10. By (2.27), this density is proportional to the density function for a circular normal distribution with mean (3/5, 3/5)^T and standard deviation σ = √(2/5). It is left as an exercise for the reader to show, using (2.27), that this is the same distribution that would result from calculating the joint likelihood function of the two measurements and combining it with the prior.

Posterior distribution for crossed bearing measurements. If the prior distribution for the target location is uniform over a large region of the plane, then the combined likelihood function in Fig. 2.8 is proportional to the posterior density function for the target location given the two bearing measurements. A Gaussian distribution is often a good approximation for two crossed bearing measurements, especially when the bearings cross at an angle that is close to 90°, as is illustrated in Fig. 2.8.
Fig. 2.9 Posterior distribution from combining a Gaussian prior and a Gaussian likelihood function
2.2 Tracking a Moving Target

In this section, we apply the Bayesian recursion to tracking a single moving target. We assume there is one and only one target and that all measurements are generated by this target, i.e., there are no false measurements or detections.

Filtering, Prediction, and Smoothing. Tracking a target means estimating its state as a function of time. In the typical situation, we receive a sequence of time ordered measurements. As each measurement is received, we revise our estimate of target state, i.e., compute its posterior distribution at the time of the measurement. The process of doing this is called filtering. If we estimate target state at some time in the future, this is called prediction. Suppose we have received a set of measurements over the interval [0, T]. The process of estimating target state for times t ∈ [0, T] given the measurements we have received in [0, T] is called fixed interval smoothing.
Fig. 2.10 Posterior resulting from a Gaussian prior and two Gaussian measurements
2.2.1 Prior Distribution on Target Motion

The prior distribution on the target’s motion is specified by a target motion model. This model is a probabilistic representation of our knowledge of the target’s motion. The motion model should represent our best estimate (often subjective) of how the target of interest will be moving in this situation. Ideally, it includes whatever information (including subjective information) we have about the target’s behavior and goals.
2.2.1.1 Stochastic Process Motion Model

The target motion model is represented by a stochastic process on the target state space. This process need not be Markovian as is often assumed in the tracking literature. In fact, we will see that a non-Markovian process poses no problem for the particle filter method of performing single target tracking described in Chap. 3. For a particle filter, we require only that we be able to make a large number of independent draws of sample paths from the model. The practical significance of this observation is that when using a particle filter, we are free to specify a motion model that reflects
our knowledge of the behavior and goals of the target without being constrained by the requirement that the probability functions resulting from the model have a convenient analytic form or approximation. This flexibility contributes significantly to the power of the particle filter method.
2.2.2 Single Target Tracking Problem

For the single target tracking problem, we assume the target’s motion takes place in continuous time. Our knowledge about its motion is represented by the probability distribution that we impose on the stochastic process {X(t), t ≥ 0}, where X(t) ∈ S for t ≥ 0. The target state space S can be continuous, discrete, or a combination of the two. We receive measurements or observations on the target state at a discrete set of possibly random times 0 ≤ t1 < t2 < ···. Let Yk be the random variable representing the measurement at time tk and Yk = yk be the value of the measurement. We assume the likelihood functions

Lk(yk|s) = Pr{Yk = yk|X(tk) = s} for s ∈ S and k = 1, 2, ...

are conditionally independent of the likelihood functions for all other measurements and depend only on the target state at time tk. If we receive two or more measurements at the same time, we combine them into a single measurement via their combined likelihood function.
2.2.2.1 Computing the Posterior

Suppose we have received measurements Yk = yk at times 0 ≤ t1 < t2 < ··· < tK. Let

Y(K) = (Y1, Y2, ..., YK)
y(K) = (y1, y2, ..., yK)
X(K) = (X(t1), X(t2), ..., X(tK))
s(K) = (s1, s2, ..., sK)
q(s(K)) = Pr{X(K) = s(K)}, (2.28)

and

p(t, s|y(K)) = Pr{X(t) = s|Y(K) = y(K)} for s ∈ S. (2.29)
Thus p(t, s|y(K)) is the posterior probability (density) that X(t) = s given the observations y(K) = (y1, y2, ..., yK) at times 0 ≤ t1 < t2 < ··· < tK ≤ t. We may compute p(tK, sK|y(K)) using Bayes rule as follows:

p(tK, sK|y(K)) = Pr{Y(K) = y(K), X(tK) = sK} / Pr{Y(K) = y(K)}. (2.30)

The numerator of (2.30) is the joint probability of receiving the measurements Y(K) = y(K) at the times t1, t2, ..., tK and the target being in state X(tK) = sK at time tK. One can calculate this by computing the probability of receiving the measurements Y(K) = y(K) given the target is in states X(K) = s(K) at the times t1, ..., tK, multiplying this by the probability q(s(K)) of being in those states, and then integrating this product over all possible values of the first K − 1 target states (s1, s2, ..., sK−1). Thus the numerator in (2.30) can be written as the numerator in (2.31) below. The denominator in (2.30) is simply the integral of the numerator over sK. Thus

p(tK, sK|y(K)) = [∫ Pr{Y(K) = y(K)|X(K) = s(K)} q(s(K)) ds1 ds2 ··· dsK−1] / [∫ Pr{Y(K) = y(K)|X(K) = s(K)} q(s(K)) ds1 ds2 ··· dsK−1 dsK]. (2.31)

Note, the numerator is integrated over s1, ..., sK−1 but not sK.
2.2.3 Bayes-Markov Recursion

If the target motion process {X(t), t ≥ 0} is Markovian in the target state space and the likelihood functions of the measurements are conditionally independent, then we can calculate the posterior distribution in (2.31) in a recursive manner. Let

q0(s) = Pr{X(0) = s} (2.32)

and define the Markov transition function

qk(sk|sk−1) = Pr{X(tk) = sk|X(tk−1) = sk−1} for sk−1, sk ∈ S. (2.33)

We may compute the posterior p(tK, ·|y(K)) recursively from p(tK−1, ·|y(K − 1)) as follows. Compute the motion updated (predicted) distribution

p⁻(tK, sK|y(K − 1)) = ∫_S qK(sK|sK−1) p(tK−1, sK−1|y(K − 1)) dsK−1 for sK ∈ S.

This becomes the prior on X(tK) before incorporating the measurement YK = yK. Using the Bayesian recursion in Sect. 2.1.4 we obtain
p(tK, sK|y(K)) = (1/CK) LK(yK|sK) p⁻(tK, sK|y(K − 1)) for sK ∈ S, (2.34)

where the normalizing constant is

CK = ∫_S LK(yK|sK) p⁻(tK, sK|y(K − 1)) dsK.
We now state the Bayes-Markov recursion for single target tracking. As one can see, this is a simple and elegant recursion. The integrals in this recursion can be evaluated explicitly only in special cases such as in Kalman filtering, which is discussed in Sect. 2.3.

Bayes-Markov Recursion

Set the initial distribution

p(t0, s|y(0)) = q0(s) for s ∈ S, (2.35)

where t0 = 0 and y(0) = no measurement.

For k ≥ 1 and s ∈ S:

Perform the motion (prediction) update

p⁻(tk, s|y(k − 1)) = ∫_S qk(s|sk−1) p(tk−1, sk−1|y(k − 1)) dsk−1. (2.36)

Compute the measurement likelihood for Yk = yk:

L_Yk(yk|s) = Pr{Yk = yk|X(tk) = s}. (2.37)

Perform the Bayesian update

p(tk, s|y(k)) = (1/Ck) L_Yk(yk|s) p⁻(tk, s|y(k − 1)), (2.38)

where

Ck = ∫_S L_Yk(yk|s) p⁻(tk, s|y(k − 1)) ds. (2.39)
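The recursion translates directly into code once the state space S is discretized into grid cells, so that the integrals in (2.36) and (2.39) become sums. The sketch below is our own illustration, not an algorithm from the book; the transition matrix and likelihood vector are supplied by the user:

```python
import numpy as np

def bayes_markov_step(p_prev, Q, likelihood):
    """One step of the Bayes-Markov recursion on a discretized state space.

    p_prev:     posterior over grid cells at time t_{k-1}
    Q[i, j]:    Markov transition probability Pr{cell i at t_k | cell j at t_{k-1}}, as in (2.33)
    likelihood: L_k(y_k | cell i) for the received measurement, as in (2.37)
    """
    p_pred = Q @ p_prev              # motion (prediction) update, eq. (2.36)
    p_post = likelihood * p_pred     # Bayesian update, numerator of (2.38)
    return p_post / p_post.sum()     # divide by the normalizing constant C_k, eq. (2.39)
```

Iterating this one-liner pair over the measurement times yields the posterior p(tk, ·|y(k)) for every k, which is exactly how the particle and grid filters of Chap. 3 operate.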
2.3 Kalman Filtering

Kalman filtering is the special case of Bayesian tracking that applies when the target motion model is a Gauss-Markov process and the measurements are linear functions of the target state with additive Gaussian measurement errors. Kalman filtering is a very powerful and computationally simple tracking method usable when these assumptions hold, or at least hold approximately. It has been one of the workhorse methods of tracking because of its modest computing requirements and because Kalman filtering and its extensions work well in a surprising range of situations. In this section, we derive only the basic Kalman filtering equations. More complete treatments of Kalman filtering can be found in Bar-Shalom et al. (2001). In Sect. 2.3.1, we use the Bayes-Markov recursion to derive the discrete time Kalman filtering equations. In Sect. 2.3.3, we show how to modify these equations to apply to continuous-discrete Kalman filtering where the target moves in continuous time but measurements arrive at discrete, possibly random, times.
2.3.1 Discrete Time Kalman Filtering Equations

This section describes motion and measurement model assumptions required by discrete time Kalman filtering.
2.3.1.1 Kalman Filter Model

The target state space S is a standard n-dimensional space, and N(μ, P) is a Gaussian distribution with mean μ and covariance P. (2.40)

Time is discrete with t0 = 0 and tk = kΔ for some fixed Δ > 0. For discrete-time Kalman filtering, we assume the following modeling assumptions hold.

Discrete-Time Kalman Filter Model

Initial Distribution

X(t0) ~ N(μ0, P0), where μ0 and P0 are n-dimensional. (2.41)
Motion Model

X(tk) = Φk X(tk−1) + ϕk + wk for k ≥ 1, (2.42)

where for k = 1, 2, ...,

Φk = n × n dimensional matrix
ϕk = an n-dimensional deterministic (column) vector
wk is an n-dimensional random (column) vector, where wk ~ N(0, Qk) and wk is independent of wk′ for k ≠ k′ (2.43)
Qk = n-dimensional covariance matrix.

Measurement Model

Yk = Mk X(tk) + εk, (2.44)

where for k = 1, 2, ...,

Mk = mk × n matrix with mk ≤ n
εk is an mk-dimensional (column) random vector, where εk ~ N(0, Rk) and εk is independent of εk′ for k ≠ k′ (2.45)
Rk is an mk-dimensional covariance matrix.
The term ϕk in (2.43) is a control or drift term that moves the mean of the target distribution in a deterministic direction. Note, the usual Kalman filter notation has Hk in place of Mk in (2.44) and (2.45). A (discrete time) sample path from this motion model is obtained as follows, and as sketched in code below. Draw X(t0) from the initial distribution in (2.41). Multiply X(t0) by Φ1 and add the deterministic vector ϕ1 plus the random vector w1 to obtain X(t1). The states X(tk) for k ≥ 2 are obtained in a similar fashion as shown in the recursion above. Note that each step of the motion model involves linear operations plus the addition of a Gaussian random variable, so that the distribution of X(tk) remains Gaussian for k ≥ 1.
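Here is one way such a sample path might be drawn with numpy; the helper and its arguments are our own notation for the quantities in (2.41)-(2.43), not code from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_path(mu0, P0, Phis, phis, Qs):
    """Draw one sample path X(t_0), ..., X(t_K) from the discrete-time
    Gaussian motion model (2.41)-(2.43)."""
    x = rng.multivariate_normal(mu0, P0)                  # X(t_0) ~ N(mu0, P0)
    path = [x]
    for Phi, phi, Q in zip(Phis, phis, Qs):
        w = rng.multivariate_normal(np.zeros(len(x)), Q)  # w_k ~ N(0, Q_k)
        x = Phi @ x + phi + w                             # eq. (2.42)
        path.append(x)
    return np.array(path)
```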
2.3.1.2 Likelihood Functions for the Measurements

Measurements Yk, k ≥ 1, are linear functions of the target state at time tk, i.e., Mk X(tk), plus a Gaussian-distributed measurement error εk. Note, the dimension mk of the measurement can be smaller than the dimension of the target state space. For example, it is common to obtain position measurements on a target in
a position-velocity state space. Under the assumptions of the Measurement Model, the likelihood function for a measurement Yk = yk is

Lk(yk|s) = Pr{Yk = yk|X(tk) = s} = Pr{εk = yk − Mk s}
= (2π)^(−mk/2) |Rk|^(−1/2) exp(−(1/2)(yk − Mk s)^T Rk^(−1)(yk − Mk s)) for s ∈ S, (2.46)

where |Rk| is the determinant of Rk. Note, to allow for the possibility of no measurement being received at a time tk, we can define Lk for this case to be the constant function that equals 1 everywhere. This will produce no change in the prior distribution when computing the posterior target state distribution using the standard Bayesian recursion. This is equivalent to simply skipping this step for time tk.
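A direct numpy sketch of (2.46) (our own helper; the argument names are ours):

```python
import numpy as np

def measurement_likelihood(y, s, M, R):
    """L_k(y_k | s) of (2.46): Gaussian density of the error e = y - M s."""
    e = y - M @ s
    m = len(e)
    quad = e @ np.linalg.inv(R) @ e
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** m * np.linalg.det(R))
```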
2.3.1.3 Kalman Filtering Recursion

Since all the distributions involved in a Kalman filter are Gaussian, we can determine them by calculating their means and covariances. The Kalman filter enables us to do this using a simple recursion. We derive the recursion assuming Mk = In where In is the n-dimensional identity matrix. In this case, measurements are estimates of the full target state with additive Gaussian errors. We then state the form of the recursion for general measurement matrices Mk. Let μk/k−1 and Pk/k−1 be the mean and covariance of the target state distribution resulting from the motion (prediction) update in (2.42), and let μk and Pk be the mean and covariance of the posterior distribution at time tk given the measurement (2.44). By (2.42) and (2.43), the motion updated mean and covariance become

μk/k−1 = Φk μk−1 + ϕk
Pk/k−1 = Φk Pk−1 Φk^T + Qk. (2.47)

To compute the posterior, we multiply the likelihood function in (2.46) by the density of the motion updated distribution determined by μk/k−1 and Pk/k−1 in (2.47) and renormalize (Bayes Theorem) to obtain the density of the posterior given the measurement at time tk. For the case where Mk = In, this is equivalent to combining two independent estimates of the target state so that by (2.27),

Pk = (Pk/k−1^(−1) + Rk^(−1))^(−1)
μk = Pk (Pk/k−1^(−1) μk/k−1 + Rk^(−1) yk). (2.48)
The Appendix shows that (2.48) is equivalent to the standard form of the Kalman filter recursion, which is given in (2.49) and which is more computationally efficient than (2.48).

Kalman Filter Recursion

μk/k−1 = Φk μk−1 + ϕk
Pk/k−1 = Φk Pk−1 Φk^T + Qk
Kk = Pk/k−1 Mk^T (Mk Pk/k−1 Mk^T + Rk)^(−1)
Pk = (In − Kk Mk) Pk/k−1
μk = μk/k−1 + Kk (yk − Mk μk/k−1) for k ≥ 1. (2.49)
The Kalman gain Kk determines how much weight is given to the measurement yk in the computation of μk. The more accurate the measurement becomes, the smaller Rk becomes, the nearer Kk Mk comes to In, and the more weight is given to the measurement yk. Correspondingly, as Kk Mk comes near to In, the covariance Pk of the posterior becomes very small, indicating that there is very little error in the estimate of target state at time tk.

Some Kalman Terminology. The expression νk = yk − Mk μk/k−1 is called the innovation in the measurement yk. Speaking intuitively, one expects to receive the measurement Mk μk/k−1 at time tk, so the innovation is a measure of the new information gained from the measurement. The covariance matrix Qk in the motion model is called the dispersion. The matrix Qk is sometimes called the plant or system noise, although this is not appropriate terminology for tracking.
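As a sketch (our own function, written in the notation of (2.47) and (2.49)), one predict/update cycle of the recursion is a few lines of linear algebra:

```python
import numpy as np

def kalman_step(mu, P, Phi, phi, Q, M, R, y):
    """One cycle of the Kalman filter recursion (2.49)."""
    # Motion (prediction) update, eq. (2.47).
    mu_pred = Phi @ mu + phi
    P_pred = Phi @ P @ Phi.T + Q
    # Measurement update: gain, innovation, posterior mean and covariance.
    K = P_pred @ M.T @ np.linalg.inv(M @ P_pred @ M.T + R)   # Kalman gain K_k
    nu = y - M @ mu_pred                                     # innovation nu_k
    mu_post = mu_pred + K @ nu
    P_post = (np.eye(len(mu)) - K @ M) @ P_pred
    return mu_post, P_post
```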
2.3.2 Examples of Discrete-Time Gaussian Motion Models

The following are some examples of discrete-time Gaussian motion models. This section is based on Sect. 3.2.1.1 of Stone et al. (2014b).
2.3.2.1
Model 1: Deterministic with Unknown Initial State
A particularly simple example occurs when the target is assumed to have a constant velocity and no dispersion. Let the target state space be position and velocity in the usual Cartesian coordinates. Let s = (x, v)T where x and v are the position and velocity components of the target state, and let
△k = tk − tk−1, ϕk = [[1, △k], [0, 1]], ϕ̄k = 0, and wk = (0, 0)^T for k ≥ 1.   (2.50)

Then

(xk, vk)^T = ϕk (xk−1, vk−1)^T = (xk−1 + △k vk−1, vk−1)^T.
In this case the target’s motion is deterministic once the Gaussian-distributed initial state (x0 , v0 )T is known.
2.3.2.2 Model 2: Independent Increments for the Velocity Process
As a second example, suppose that ϕk and ϕ̄k are given by (2.50) and that wk = (0, wk′)^T, where {wk′ : k ≥ 1} is a sequence of independent, identically distributed, zero-mean Gaussian random variables with values in the velocity component of the state space. Then the motion model becomes

(xk, vk)^T = ϕk (xk−1, vk−1)^T + (0, wk′)^T = (xk−1 + △k vk−1, vk−1 + wk′)^T.

In this case the position process is the sum of position increments resulting from velocities that are the sum of independent, mean-zero, Gaussian increments. This model allows the target to change velocity, but if no measurements are obtained over a long period of time, it may produce unreasonably large velocities. If the covariance of wk′ is "small" for all k, then this is often called the almost constant velocity (ACV) or nearly constant velocity (NCV) motion model.
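As a quick illustration of the weakness just noted, the following sketch (ours, not the authors'; NumPy assumed, scalar position and velocity, sigma_w an arbitrary standard deviation for the increments wk′) simulates Model 2:

import numpy as np

rng = np.random.default_rng(0)
dt, sigma_w = 1.0, 1.0        # time step and std of the increments w'_k
x, v = 0.0, 7.0
for k in range(100):
    x += dt * v                              # x_k = x_{k-1} + dt * v_{k-1}
    v += sigma_w * rng.standard_normal()     # v_k = v_{k-1} + w'_k
# After k steps Var[v_k] = k * sigma_w**2, which is why large
# measurement gaps produce unreasonably large velocities.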
2.3.2.3 Model 3: Discrete-Time Integrated Ornstein-Uhlenbeck Process
In this model, we specify a “drag coefficient” γ > 0, and let {rk : k ≥ 1} be a sequence of independent, identically distributed, zero-mean Gaussian random variables taking values in the velocity space of models 1 and 2 above. In addition, we let
ϕ̄k = 0, ϕk = [[1, γ^−1(1 − e^−γ△k)], [0, e^−γ△k]], wk = (0, wk′)^T, and wk′ = (1 − e^−γ△k) rk for k ≥ 1.

Then
(xk, vk)^T = ϕk (xk−1, vk−1)^T + (0, wk′)^T = (xk−1 + γ^−1(1 − e^−γ△k) vk−1, e^−γ△k vk−1 + wk′)^T for k ≥ 1.
This is a discrete-time version of the integrated Ornstein–Uhlenbeck (IOU) process discussed in Sect. 3.2.2.2 of Stone et al. (2014b).¹ In this process, the variance of the velocity distribution is bounded as t → ∞, a property that distinguishes it from Model 2.

¹ The definition of ϕ given here corrects the one given in Stone et al. (2014b).

Comments on Models 2 and 3. While motion models 2 and 3 are convenient for use in a Kalman filter, they are not very realistic. In Model 2, the variance of the velocity distribution increases linearly with time, producing unreasonable velocity (and position) distributions unless measurements are received without large time gaps between them. Model 3 solves that problem by using a velocity distribution whose variance is bounded even when no measurements are received. Even with this improvement, the Gaussian distribution on velocities allows for the possibility of unrealistically large velocities.
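A corresponding sketch for Model 3 (again ours; the standard deviation of rk is an arbitrary illustration) shows the bounded-velocity behavior: the e^−γ△k factor keeps pulling the velocity back toward zero.

import numpy as np

rng = np.random.default_rng(0)
gamma, dt, sigma_r = 0.5, 1.0, 1.0
x, v = 0.0, 7.0
for k in range(1000):
    a = np.exp(-gamma * dt)
    x += (1.0 - a) / gamma * v               # position entry of phi_k
    v = a * v + (1.0 - a) * sigma_r * rng.standard_normal()  # w'_k = (1 - e^{-gamma*dt}) r_k
# Var[v_k] now converges to a finite limit instead of growing linearly.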
2.3.3 Continuous-Discrete Kalman Filtering Equations

In continuous-time, discrete-measurement Kalman filtering, the motion model takes place in continuous time, but the measurements are received at a discrete sequence of possibly random times 0 ≤ t1 < t2 < · · ·, which is the case for the single target tracking problem described in Sect. 2.2.2. The only parts that differ from discrete-time Kalman filtering are the motion models and the motion updates using those models. The Kalman recursion is the same as the one given in (2.49) except that Pk/k−1 and μk/k−1 in (2.47) must be calculated from the continuous time motion model as they account for the target motion from time tk−1 to time tk. For those interested, there is a detailed discussion of continuous time Gaussian motion models with examples in Sect. 3.2.2 of Stone et al. (2014b). We give a brief discussion of these models below.
2.3.3.1 Wiener Process Model
In this model, the stochastic process {X(t), t ≥ 0} is used to represent the (usually 2-dimensional) position state of the target. This process is Markovian with independent increments. For the 2-dimensional case,

X(t2) − X(t1) ∼ N(0, (t2 − t1) σ² I2) for t2 ≥ t1.   (2.51)
If one knows the general direction of motion of the target, one can add a deterministic drift vector to the process to represent that general direction of motion. This is not a good motion model. For example, if no measurements are received, the area of the uncertainty region grows linearly with time and quickly produces unrealistic uncertainty areas. In addition, although the sample paths are continuous, they are nowhere differentiable. Again, this is a very unrealistic model for the motion of a target.
2.3.3.2 Ornstein-Uhlenbeck Process Model
This process is a Wiener process with a drag coefficient γ added which tends to pull the process back to the origin. The magnitude of the drag is proportional to the distance of the process from the origin. Unlike the Wiener process, this process has a stationary distribution. In the 2-dimensional case, there is a limiting covariance matrix, σ0² I2 / 2, where σ² is the variance of the underlying Wiener process. As t → ∞, the covariance of X(t) approaches σ0² I2 / 2. As a result the position uncertainty does not become unbounded. As with the Wiener process, the sample paths are nowhere differentiable.
2.3.3.3 Integrated Ornstein-Uhlenbeck Process Model
The Integrated Ornstein-Uhlenbeck (IOU) process uses an Ornstein-Uhlenbeck (OU) process for the velocity process of the target. The position of the target is obtained by integrating the velocity process, hence the name IOU. Of the standard continuous Gaussian Markov (diffusion) processes used for modeling target motion, this seems the most reasonable. One can choose the parameters of the process to control the velocity in the sense of specifying the rate at which the velocity changes and specifying the mean squared speed of the stationary (limiting) distribution of the velocity process. The sample paths are continuous with a well-defined velocity process. Of course, the major advantage of this process is that you can apply a Kalman filter to it. The following are the equations for calculating the distribution of this process as a function of time. A derivation of these equations is given in Sect. 3.2.2 of Stone et al. (2014b). The IOU model has two parameters that the user can specify, the autocorrelation coefficient γ > 0 and the variance σ0² in (2.51) which defines the underlying Wiener process that defines the Ornstein–Uhlenbeck velocity process. The velocity process {V(t), t ≥ 0} has an exponential autocorrelation function e^−γt for t ≥ 0, so that the correlation between V(t1) and V(t2) equals e^−γ|t2−t1|. Thus, γ controls the rate at which the correlation between two velocities decays. For target tracking, it is convenient to think about 1/γ as the mean time between velocity changes and specify the parameter on that basis. Then one can choose the value of σ0 that produces a
desired mean squared speed σ0²/2γ of the stationary velocity distribution. With these parameters, we can calculate the mean and covariance matrices for X(t) as follows. Let

Γ = [[0, I2], [0, −γ I2]] and X(0) ∼ N(μ0, Σ0).

Then

E[X(t)] = e^{Γt} μ0   (2.52)

Var[X(t)] = e^{Γt} Σ0 (e^{Γt})^T + σ0² [[b11(t) I2, b12(t) I2], [b21(t) I2, b22(t) I2]]   (2.53)

where

e^{Γt} = [[I2, γ^−1(1 − e^−γt) I2], [0, e^−γt I2]]
b11(t) = γ^−2 ( t − 2γ^−1(1 − e^−γt) + (1/2)γ^−1(1 − e^−2γt) )
b12(t) = b21(t) = γ^−2 ( (1 − e^−γt) − (1/2)(1 − e^−2γt) )
b22(t) = (1/2) γ^−1 (1 − e^−2γt)   (2.54)

and e^{Γt} indicates the matrix exponential of Γt. Section 2.3.4.2 presents an example of tracking a target using Kalman filtering and this motion model with simulated measurement data.

Doubly Integrated OU Process. One can use the OU process to model the target's acceleration and integrate it twice to get velocity and position. This process is often used to model aircraft which can change velocity rapidly at high speeds. See Sect. 8.2 of Bar-Shalom et al. (2001).
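Equations (2.52)–(2.54) are easy to evaluate numerically because e^{Γt} has the closed form given in (2.54). A sketch (ours) for the 4-dimensional position-velocity state:

import numpy as np

def iou_moments(mu0, Sigma0, t, gamma, sigma2):
    # Mean and covariance of the IOU state X(t) per (2.52)-(2.54);
    # state is 2D position stacked over 2D velocity.
    I2, Z2 = np.eye(2), np.zeros((2, 2))
    a = np.exp(-gamma * t)
    eGt = np.block([[I2, (1 - a) / gamma * I2], [Z2, a * I2]])  # e^{Gamma t}
    e1, e2 = 1 - a, 1 - a * a
    b11 = gamma**-2 * (t - 2 * e1 / gamma + 0.5 * e2 / gamma)
    b12 = gamma**-2 * (e1 - 0.5 * e2)
    b22 = 0.5 * e2 / gamma
    B = np.block([[b11 * I2, b12 * I2], [b12 * I2, b22 * I2]])
    return eGt @ mu0, eGt @ Sigma0 @ eGt.T + sigma2 * B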
2.3.4 Kalman Filtering Examples

In this section we present several examples of the use of Kalman filtering for tracking a single target.
2.3.4.1 Tracking a Target Using the Almost Constant Velocity Motion Model
In this example we apply the Kalman filter to track a target using an almost constant velocity motion model as described in Sect. 2.3.2.2. The target state is given by a position-velocity pair (x, v)^T. The state at time 0 is

(x0, v0)^T ∼ η(·, (x̄, v̄), Σ0).   (2.55)
Let △ be a fixed time increment. There are I time increments and T = I△. In the model, the target proceeds at velocity v0 until time t1 = △, at which time a new velocity v1 is obtained by adding a small, independent, mean-zero, Gaussian distributed variation to v0. The target continues at this velocity until the next time increment. We may express this mathematically as follows. Let (xi, vi) be the target state at time ti = i△. Then
(xi, vi)^T = (xi−1 + △ vi−1, vi−1 + wi)^T for i = 1, 2, . . .   (2.56)
where {wi : i = 1, . . . , I} are independent, identically distributed random variables with wi ∼ η(·, (0, 0), Q) and Q is a "small" covariance matrix. The parameters of the motion model are

△ = 1 hr, x̄ = (0, 0)^T, v̄ = 7 kn (cos(θ), sin(θ))^T where θ = π/6
Σ0 = [[σx² I2, 0], [0, σv² I2]] where σx = 10 nm, σv = 1 kn
Q = σw² I2, where σw = 1 kn.   (2.57)
We use the abbreviations nm for nautical miles and kn for knots. The target follows a gently curving path starting at (−0.4, −0.9)^T as shown by the black line in Fig. 2.11. The time duration is 16 h. There are 16 position measurements received at 1-h increments over the duration of the path. The 2-σ uncertainty ellipses for the measurements are shown in black. The measurements have circular Gaussian errors with a standard deviation of 4 nm. The green ellipses show the 2-σ uncertainty ellipse for the target distribution motion updated to the time of the measurement. The red ellipses show the 2-σ uncertainty ellipse for the posterior target distribution resulting from the measurements. Since the target does not make large velocity changes from one time interval to the next, the filter does a good job of following the target. Blair (2020) presents a discussion of how to design an almost constant velocity motion model to track a maneuvering target.
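The example can be reproduced in outline with a short simulation. The sketch below is ours, using the parameter values of (2.57) (distances in nm, speeds in kn, time in hours); the synthetic track is drawn from the same ACV model, whereas the book's actual track differs.

import numpy as np

rng = np.random.default_rng(1)
dt, sig_x, sig_v, sig_w, sig_m = 1.0, 10.0, 1.0, 1.0, 4.0
I2, Z2 = np.eye(2), np.zeros((2, 2))
phi = np.block([[I2, dt * I2], [Z2, I2]])
Q = np.block([[Z2, Z2], [Z2, sig_w**2 * I2]])     # noise enters through velocity only
M = np.hstack([I2, Z2])                           # position-only measurements
R = sig_m**2 * I2
v0 = 7.0 * np.array([np.cos(np.pi / 6), np.sin(np.pi / 6)])
mu = np.concatenate([[0.0, 0.0], v0])             # prior mean
P = np.diag([sig_x**2, sig_x**2, sig_v**2, sig_v**2])
s = np.concatenate([[-0.4, -0.9], v0])            # true initial state
for k in range(16):
    s = phi @ s + np.concatenate([np.zeros(2), sig_w * rng.standard_normal(2)])
    y = M @ s + sig_m * rng.standard_normal(2)
    mu, P = phi @ mu, phi @ P @ phi.T + Q                    # motion update
    K = P @ M.T @ np.linalg.inv(M @ P @ M.T + R)             # Kalman gain
    mu, P = mu + K @ (y - M @ mu), (np.eye(4) - K @ M) @ P   # measurement update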
Fig. 2.11 Kalman filter solutions using the almost constant velocity motion model. The 2-σ uncertainty ellipses for the measurements are shown in black. The green 2-σ ellipses show the uncertainty in the motion updated (predicted) distribution for target location, and the red ellipses are the 2-σ ellipses for the posterior distribution after the measurement update
2.3.4.2 Tracking a Maneuvering Target Using the IOU Motion Model
In this example we use an IOU process for the target motion model with γ = 0.075/h and σ0 chosen so that the mean squared speed of the limiting velocity distribution is σ0²/γ = (5.8)² kn². There is no particular significance to the parameter values chosen here. We are simply modeling a ship, such as a merchant ship, that does not change course very often and travels at a modest speed. The prior distribution on target state at time 0 is four-dimensional Gaussian. The prior on target position at time 0 is circular Gaussian with mean (0, 0) and standard deviation 6 nm in any direction. The green circle in the Time = 0 h plot in Fig. 2.12 shows the 2-σ uncertainty ellipse for this distribution. The prior marginal velocity distribution at time 0 is circular Gaussian with mean (4, 4) and standard deviation 1 kn in any direction. This prior distribution represents a situation where we have some knowledge of both the target's position and its velocity. If our knowledge is very uncertain, we can represent that by a uniform distribution over a large region in both position and velocity space, which can be approximated by Gaussians with large variances. At time 0, we received the measurement indicated by the black ellipse in Fig. 2.12, which produces the posterior position distribution shown by the red ellipse. The vector shows the mean of the posterior velocity distribution. Its tail is located at the mean of the posterior position distribution. The units on the axes in Fig. 2.12 and
Fig. 2.12 IOU Kalman filter output for the first four measurements. The black line shows the target track with a black dot at the target location. The green ellipses show the 2-σ ellipses for the prior or motion updated position distribution. The black and red ellipses are the 2-σ ellipses for the measurement error distribution and the position distribution after incorporating the measurement
subsequent figures are nautical miles (nm) except when we plot velocity vectors in which case the units on axes are to be interpreted as knots in the x1 and x2 directions. Figure 2.12 also shows the results of the next three measurements. The measurement at 4.5 h occurs after the target has changed course, but the tracker has not recognized this yet. However, in Fig. 2.13, we see that with the measurement at time 5.5 h, the filter has recognized the change in velocity and has made a reasonable estimate of it. Note that the IOU motion model allows the filter to react easily and naturally to velocity changes by the target, even large ones. This contrasts with the almost constant velocity model used in the example in Sect. 2.3.4.1 which accommodates only small velocity changes.
Fig. 2.13 IOU Kalman filter output at the next four measurement times. The black line shows the target track with a black dot at the target location. The green ellipses show the 2-σ ellipses for the motion updated position distribution. The black and red ellipses are the 2-σ ellipses for the measurement error distribution and the position distribution after incorporating the measurement
2.3.4.3 Line of Bearing Tracking Using an Extended Kalman Filter (EKF)
One of the most common extensions of Kalman filters results from replacing a measurement that is a nonlinear function of the target state with one that is a local linear approximation. This is an example of an Extended Kalman Filter (EKF) which is discussed in Sect. 2.3.5. A typical example of this is a line of bearing measurement such as the one shown in Fig. 2.14. The bearing of the target b(α, x) from the position of the sensor at
Fig. 2.14 Small angle approximation
α = (α1, α2)^T is a nonlinear function of the position component x = (x1, x2)^T of the target state. Specifically, measuring bearings counterclockwise from the positive x-axis, we have²

b(α, x) = atan2(x2 − α2, x1 − α1).   (2.58)
Given the target is located at x, we assume that the bearing measurement is given by

θ = b(α, x) + ε, where ε ∼ N(0, σd²).   (2.59)
That is, the measured bearing includes a Gaussian error term. This model is reasonable for small variances σd². In this case, an EKF approximates the nonlinear function b by a locally linear one in the neighborhood of a specified point x̄ = (x̄1, x̄2)^T. Suppose for simplicity that α = (0, 0)^T and we have received a bearing measurement of 0°. If we estimate (somehow) that the target is at approximately range r from the sensor, we can take x̄ = (r, 0)^T for the center of the neighborhood of the linear approximation. From Fig. 2.14, we see that for small angles △θ, we have r △θ ≈ △x2. Since the bearing error distribution is N(0, σd²), the distribution of the error in the x2 direction is approximately N(0, r² σd²) in this small neighborhood of (r, 0). This provides a reasonable approximation to the measurement error in the x2 direction provided σd is small and r is close to the actual range of the target from the sensor. To complete the specification of a bivariate normal measurement error distribution, which is necessary to apply the Kalman filter, we assume an independent normal error distribution in the x1 direction with mean 0 and some large standard deviation to represent the fact that bearings do not provide range information. For example, we could take σR = r/2, or if we have some idea of the range uncertainty, rU, we could take σR = rU. This produces the bivariate normal measurement error distribution N(0, Σb) where

Σb = [[σR², 0], [0, r² σd²]].   (2.60)

² Although the notation does not indicate this, we are using the two-argument arctan function that accounts for the quadrant of the bearing. The bearings in this example are measured counterclockwise from the horizontal axis. This differs from the usual maritime convention where bearings are measured clockwise from north (the vertical axis).
The 2-σ ellipse (with semi-major axis 2σR and semi-minor axis 2rσd) for this distribution is a skinny ellipse with the long axis parallel to the x1 axis and the short axis parallel to the x2 axis. Suppose the target state space S is 4-dimensional with s = (x, v)^T where x and v are 2-dimensional position and velocity vectors. Now we can apply the Kalman filter update in (2.49) for the "measurement" yk = (r, 0)^T with

Yk = Mk X(tk) + εk and εk ∼ N(0, Σb)
where Mk = [[1, 0, 0, 0], [0, 1, 0, 0]] and Σb = [[σR², 0], [0, r² σd²]].   (2.61)
Although this approximation often works well when there is a reasonable estimate of the target's range, it can sometimes lead to very bad solutions and produce a "range collapse" where the estimated position of the target moves closer and closer to the sensor, as we will see in the bearings-only example below. For bearings-only tracking problems, Ristic et al. (2004) have shown that a particle filter out-performs the Kalman filter as well as many variations and extensions of the Kalman filter. This happens because a linear approximation to a bearing measurement is often not a good one. If the bearing θ ≠ 0°, we must compute the measurement y(θ) and the error covariance matrix Σb(θ) in the coordinate system of the tracker. Let u = (1, 0) and v = (0, 1) be unit vectors in the coordinate system where the positive u axis is in the direction from the sensor to the detection. This coordinate system is rotated θ degrees counterclockwise from the tracker coordinate system. To convert from (u, v) coordinates to the tracker coordinates, we compute the coordinates of u and v in the tracker coordinate system, i.e.,

u(θ) = (cos(θ), sin(θ))
v(θ) = (−sin(θ), cos(θ)).

Then we compute the measurement y(θ) and the covariance matrix Σb(θ) in the tracker coordinates as follows:

y(θ) = r u(θ)
Σb(θ) = σR² u^T(θ) u(θ) + r² σd² v^T(θ) v(θ)
= [[σR² cos²(θ) + r² σd² sin²(θ), (σR² − r² σd²) sin(θ) cos(θ)], [(σR² − r² σd²) sin(θ) cos(θ), σR² sin²(θ) + r² σd² cos²(θ)]]   (2.62)
where superscript T denotes transpose. Now we apply the Kalman filter update in (2.49) using the measurement Y = y(θ) and error covariance matrix Σb(θ).

Example. We applied this EKF to a bearings-only tracking problem where the EKF experiences a form of range collapse in which the estimate of the target's position moves closer and closer to the sensor while the target's actual position does not. Figure 2.15 shows a series of EKF solutions from a stationary, bearings-only sensor. The target starts at the black dot on the left and follows the constant velocity path at
5.7 kn that is shown in black. At the initial time the prior position distribution (before any measurements) is circular normal with mean (10, 10) and standard deviation 5 nm. The green circle is the 2-σ ellipse for the prior. The prior on target velocity is circular Gaussian with mean (0, 0)^T and standard deviation 3 kn. The tracker uses the IOU motion model from Sect. 2.3.4.2. The bearing measurements are obtained every 6 min with a 1° standard deviation on the bearing error. We performed the linearization in (2.62) using the mean of the motion updated (predicted) distribution at the time of measurement to obtain an estimate of the range r from the sensor. The 2-σ ellipses for the motion updated (prediction) distributions are green. The 2-σ ellipses for the measurement error are black and those for the posterior distribution are red. Even though the initial range estimate is correct, the estimated position of the target (red ellipses) moves closer and closer to the sensor although the target does not. This is called range collapse. This example illustrates the tendency of an EKF to pull the range of the target closer and closer to a stationary, bearings-only sensor even when the initial range estimate is correct. If the sensor is mobile, one can often maneuver the sensor to prevent range collapse and obtain a better solution. Another possibility is to run a bank of EKFs, each initialized with a different range estimate, and combine the results by the method described in Sect. 6.4.1.2 of Ristic et al. (2004). The EKF does a good job in situations where one can make a good linear approximation to the measurements. However, a bearings-only measurement is an example where this is often not possible.
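The conversion in (2.58)–(2.62) of a bearing plus an assumed range into a Cartesian pseudo-measurement is compact in code. A sketch (ours; angles in radians, alpha is the sensor position, which generalizes the origin-based derivation above):

import numpy as np

def bearing_pseudo_measurement(theta, r, sigma_d, sigma_R, alpha):
    # Along-bearing and cross-bearing unit vectors per (2.62)
    u = np.array([np.cos(theta), np.sin(theta)])
    v = np.array([-np.sin(theta), np.cos(theta)])
    y = alpha + r * u                                  # pseudo-measurement y(theta)
    # Large variance along the bearing, r^2 * sigma_d^2 across it
    Sigma_b = sigma_R**2 * np.outer(u, u) + (r * sigma_d)**2 * np.outer(v, v)
    return y, Sigma_b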
Fig. 2.15 EKF solutions for a bearings-only tracking problem showing an example of range collapse
2.3.5 Nonlinear Extensions of Kalman Filtering

The Kalman filter and its nonlinear extensions have proved remarkably effective in solving many tracking problems. In this section, we briefly discuss three common nonlinear extensions, namely the Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF), and Ensemble Kalman Filter (EnKF). There are many other nonlinear extensions of the Kalman filter such as the Interacting Multiple Model (IMM) Kalman filter in which the tracker is composed of a linear mixture of Kalman filters with the weights adjusted as measurements are received. See for example Sect. 11.6 of Bar-Shalom et al. (2001).
2.3.5.1 Extended Kalman Filter (EKF)
In Sect. 2.3.4.3, we have seen an example of an EKF in which a non-linear measurement is approximated by a linear one. One can also apply an EKF when the motion model is nonlinear and produces motion-updated distributions that are not Gaussian. In this case, the EKF replaces the non-linear motion model with a linear Gaussian one in the plane tangent to the mean of the distribution. This produces a motion updated distribution that is a Gaussian approximation to the one produced by the nonlinear model. One then applies the Kalman filter recursion equations to this approximation. See Bar-Shalom et al. (2001) for further discussion and examples of EKFs.
2.3.5.2 Unscented Kalman Filter (UKF)
When the motion model is nonlinear, the Unscented Kalman Filter (UKF) developed by Julier and Uhlmann (1997) is designed to provide a better approximation to the mean and covariance of the motion-updated prior than the one produced by the EKF. In a UKF, one chooses a set of points called sigma points based on the mean and covariance of the Gaussian prior. If the dimension of the target state space is n, then the number of sigma points is 2n + 1. Details for computing these points are given in Wikipedia (2022). These points are propagated by the nonlinear motion model to their positions at the time of the next measurement and used to calculate the mean and covariance of the Gaussian approximation to motion updated distribution. If the measurement is a linear function of target state with an additive Gaussian error, the UKF distribution is used along with the measurement to produce a Gaussian posterior by applying the usual Kalman filter equations. This approximation becomes the prior for the next motion update and subsequent measurement update. Another advantage of the UKF over an EKF is that it does not require one to take derivatives which can get complicated especially for higher dimensional state spaces. If the measurement is a nonlinear function of target state, the UKF method can be used to compute a Gaussian distribution that approximates the measurement error distribution. One can
then apply the Kalman update equations to compute a Gaussian approximation to the posterior. The UKF process works well when the distributions to be approximated are unimodal but fails when they are not.
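For concreteness, here is one common way to generate the 2n + 1 sigma points and propagate them through a nonlinear motion model f. This is a sketch (ours) under a simple parameterization with a single spread parameter κ, often taken as 3 − n; it is not the only variant in the literature.

import numpy as np

def sigma_points(mu, P, kappa):
    n = len(mu)
    S = np.linalg.cholesky((n + kappa) * P)       # matrix square root of (n + kappa) P
    pts = [mu] + [mu + S[:, i] for i in range(n)] + [mu - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 0.5 / (n + kappa))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def unscented_predict(mu, P, f, Q, kappa=1.0):
    pts, w = sigma_points(mu, P, kappa)
    fp = np.array([f(p) for p in pts])            # propagate each sigma point
    m = w @ fp                                    # mean of the Gaussian approximation
    C = sum(wi * np.outer(p - m, p - m) for wi, p in zip(w, fp)) + Q
    return m, C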
2.3.5.3 Ensemble Kalman Filter (EnKF)
The Ensemble Kalman Filter (EnKF) is a bit different from the Kalman filter extensions discussed above, which are designed for single target tracking. The EnKF is used to compute estimates of a set of environmental quantities such as a field of ocean currents. In addition, it is designed to incorporate a set of measurements into these estimates. In the environmental community, the process of combining measurements with predictions to produce revised environmental estimates is called assimilation. Environmental predictions are often given as space-time fields, e.g., ocean currents over a region of space and time. At each update, there may be a large number of measurements distributed over time and space to incorporate into the revised predictions. The standard Bayesian tracking approach described in this chapter is applicable in principle but encounters many difficulties that make it unsuitable here. The difficulties are several:

• The state space (field of predictions) is very large and based on nonlinear relationships
• Measurements are numerous and often nonlinear in state
• Prescribing a prior and calculating likelihood functions for the measurements is difficult
• The computational load can be substantial

However, we can obtain a reasonable approximation to the posterior distribution on the environmental estimates when the following conditions are satisfied:

• We can simulate a large number of predictions of the field to use as a prior
• The measurements are (approximately) linear functions of the state (predictions) with additive Gaussian errors.

EnKF Assumptions

The following are the assumptions for the EnKF. The prior distribution on the system state is approximated by an ensemble (set) of j = 1, . . . , J predictions x_j(tk) ∈ R^n for k = 1, . . . , K, where R^n is standard n-space. Each ensemble member x_j is a prediction in space and time. The state space of this prediction is R^n × {1, . . . , K}, which can be quite large. The system state x is a vector with nK components. Each component is a prediction for one place and time. The state vector x can be written as a stacked vector. One could start with time t1 and put the predictions at the n spatial points in a linear order to get an n-vector. Then repeat
the process for tk, k = 2, . . . , K, stacking each n-vector below the previous one to obtain an nK-vector. In a typical example, say ocean currents, we have a large spatial grid of N_S points, and the ocean current is a velocity (2-vector). In this case n = 2N_S, so that nK can be a very large number and x a very long vector. Measurements y (an m-vector) are made at all or only some of the times and spatial points of the nK states. The measurements are linear functions of the system state x. Specifically, there is an m × nK matrix M such that y = Mx + ε, where ε ∼ N(0, R) and R is an m-dimensional covariance matrix. Typically, there are dozens to hundreds of measurements corresponding to the points and times at which measurements are made, so that m can be a large number too. Information update (assimilation) is accomplished by applying a Kalman filter equation to each member of the ensemble of predictions as though it were the mean of the distribution of the prediction. To do this, we apply the following recursion.

EnKF Recursion
For j = 1, . . . , J, let

x_j^f = jth ensemble member updated with the previous measurements and the latest forecast
y_j = y + ε_j where ε_j ∼ N(0, R) (the mysterious ε_j is explained later)
x_j^a = jth member of the ensemble of J space-time vector fields updated (assimilated) for y_j.

Compute

P_e^f = empirical nK × nK covariance of the ensemble x_j^f, j = 1, . . . , J
K_e = P_e^f M^T (M P_e^f M^T + R)^−1 (ensemble Kalman gain).

For j = 1, . . . , J, compute

x_j^a = x_j^f + K_e (y_j − M x_j^f).   (2.63)
The result x_j^a, j = 1, . . . , J is the posterior (assimilated) ensemble. Note that the measurement update moves the ensemble members but does not change their weights. The ensemble members somewhat resemble the particles discussed in Chap. 3. However, when those particles are updated for a measurement, both their weights and values are modified.
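A sketch of the analysis step (2.63) in Python (ours; X holds the forecast ensemble with one member per row, and the explicit inverse and dense covariance are for clarity only — impractical for large nK):

import numpy as np

def enkf_update(X, y, M, R, rng):
    # X: (J, nK) forecast ensemble x_j^f; y: m-vector of measurements
    J = X.shape[0]
    P = np.cov(X, rowvar=False)                        # empirical covariance P_e^f
    K = P @ M.T @ np.linalg.inv(M @ P @ M.T + R)       # ensemble Kalman gain K_e
    Y = y + rng.multivariate_normal(np.zeros(len(y)), R, size=J)  # perturbed y_j
    return X + (Y - X @ M.T) @ K.T                     # x_j^a = x_j^f + K_e (y_j - M x_j^f)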
Why is ε_j added to y(tk) in the EnKF Recursion? If we did not do this, then each ensemble member would be an estimate of the mean of the posterior distribution, and the ensemble covariance would equal the covariance of the estimate of the mean of the posterior rather than the covariance of the posterior distribution on state. See Burgers et al. (1998).

Comments. The EnKF can employ a nonlinear model to perform its predictions so long as the measurements are (approximately) linear in system state with additive Gaussian errors. The EnKF approximates the full Bayesian filtering result for the ensemble. It produces the correct 1st and 2nd moments when the ensemble distribution is Gaussian and measurements are linear functions of state with additive Gaussian errors. See Evensen (2003, Sect. 3). The EnKF was originally developed by Evensen (1994). For a good summary of EnKF methods see Evensen (2003).

Appendix: Standard Kalman Filter Equations

We can transform the equations in (2.48) into the standard Kalman filter equations by using the following matrix inversion identities, namely,
(A^−1 + B^−1)^−1 = A − A(A + B)^−1 A = (In − A(A + B)^−1) A = A(A + B)^−1 B.   (2.64)
Applying these identities to (2.48), we obtain

Pk = (Pk/k−1^−1 + Rk^−1)^−1 = (In − Pk/k−1(Pk/k−1 + Rk)^−1) Pk/k−1 = (In − Kk) Pk/k−1   (2.65)
where

Kk = Pk/k−1 (Pk/k−1 + Rk)^−1   (2.66)
is called the Kalman gain. From (2.66), we have

Kk (In + Pk/k−1 Rk^−1) = Pk/k−1 Rk^−1
Kk = (In − Kk) Pk/k−1 Rk^−1,   (2.67)

and from (2.48), (2.65), and (2.67), we obtain

μk = Pk (Pk/k−1^−1 μk/k−1 + Rk^−1 yk)
= (In − Kk) Pk/k−1 (Pk/k−1^−1 μk/k−1 + Rk^−1 yk)
= μk/k−1 − Kk μk/k−1 + (In − Kk) Pk/k−1 Rk^−1 yk
= μk/k−1 + Kk (yk − μk/k−1).   (2.68)
If Mk ≠ In, then (2.65), (2.66), and (2.68) become the Kalman filter equations given in (2.49).
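A quick numerical check (ours) confirms the equivalence of the information form (2.48) and the gain form (2.65)–(2.68) for Mk = In:

import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)   # P_{k/k-1}
B = rng.standard_normal((n, n)); R = B @ B.T + n * np.eye(n)   # R_k
mu_pred, y = rng.standard_normal(n), rng.standard_normal(n)
# Information form (2.48)
P1 = np.linalg.inv(np.linalg.inv(P) + np.linalg.inv(R))
mu1 = P1 @ (np.linalg.inv(P) @ mu_pred + np.linalg.inv(R) @ y)
# Gain form (2.65)-(2.68)
K = P @ np.linalg.inv(P + R)
P2 = (np.eye(n) - K) @ P
mu2 = mu_pred + K @ (y - mu_pred)
assert np.allclose(P1, P2) and np.allclose(mu1, mu2)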
References

Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with Applications to Tracking and Navigation. Wiley, New York (2001)
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis. Springer, New York (1985)
Blair, W.D.: Design of NCV filters for tracking maneuvering targets. In: IEEE Radar Conference (RadarConf20), Florence, Italy, 21–25 Sept 2020 (2020)
Burgers, G., van Leeuwen, P.J., Evensen, G.: Analysis scheme in the ensemble Kalman filter. Mon. Weather Rev. 126, 1719–1724 (1998)
de Finetti, B.: Theory of Probability, Vols. I and II. Wiley, New York (1974–75)
DeGroot, M.H.: Optimal Statistical Decisions. McGraw-Hill, New York, NY (1970)
DeGroot, M.H.: Optimal Statistical Decisions: Wiley Classics Library. Wiley, Hoboken, NJ (2004)
Evensen, G.: Sequential data assimilation with a non-linear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99(C5), 10143–10162 (1994)
Evensen, G.: The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dyn. 53, 343–367 (2003)
Julier, S.J., Uhlmann, J.K.: New extension of the Kalman filter to nonlinear systems. In: Kadar, I. (ed.) Signal Processing, Sensor Fusion, and Target Recognition VI, vol. 3068, pp. 182–193. Proceedings of SPIE (1997)
Lindley, D.: Introduction to Probability and Statistics from a Bayesian Point of View, Parts 1 and 2. Cambridge University Press, Cambridge, UK (1965)
Ristic, B., Arulampalam, S., Gordon, N.: Beyond the Kalman Filter. Artech House, Norwood, MA (2004)
Stone, L.D., Keller, C.M., Kratzke, T.M.: Search for the wreckage of Air France Flight AF 447. Stat. Sci. 29, 69–80 (2014a)
Stone, L.D., Streit, R.L., Corwin, T.L., Bell, K.L.: Bayesian Multiple Target Tracking, 2nd edn. Artech House, Norwood, MA (2014b)
Wikipedia: Kalman filter. https://en.wikipedia.org/w/index.php?title=Kalman_filter&oldid=1082910871 (2022). Accessed 26 Apr 2022, 19:24 UTC
Chapter 3
Bayesian Particle Filtering
3.1 Introduction

The Bayes-Markov recursion in Sect. 2.2.3 is elegant and general. The Kalman filter recursion in Sect. 2.3.1.3 is a special case of this recursion that holds when the motion model is Gaussian and the measurements are linear functions of target state with additive Gaussian errors. However, in many situations these assumptions do not hold, and calculating the integrals in the Bayes-Markov recursion in closed form is not possible. Implementing the recursion in these cases requires numerical methods such as particle filters. Bayesian particle filtering is a very general and conceptually simple method of estimating the system state of a stochastic process at time t given the measurements received by time t. The requirements on the stochastic process are minimal, namely that one can generate a large number of independent sample paths from the process. These sample paths may be obtained by Monte Carlo simulation. In addition, we require that the measurements be represented by likelihood functions that are conditionally independent of one another and depend on only the target state at the time of the measurement. Section 3.2 defines Bayesian particle filtering, particle paths, and particle distributions. It describes how particle filtering is used to solve single target tracking problems and illustrates the process with examples. Section 3.3 provides an example of applying particle filtering to a problem other than tracking, namely estimating a parameter whose time-varying value is modeled by a stochastic process. Section 3.4 considers the problem of finding the best estimate of the target's path over an interval of time [0, T] given the measurements that have arrived in that interval. This is fixed interval smoothing, and the resulting posterior distribution on the target's path is the smoothed solution. This section provides a simple and general method of finding smoothed solutions for particle filters. This smoothing method may also be applied to particle filters designed for problems other than tracking.
3.2 Particle Filter Tracking

Particle filtering is the process of performing Bayesian filtering on a discrete set of sample paths to estimate the posterior distribution on target state at the time of the last measurement.
3.2.1 Motion Model

As in earlier chapters, prior target motion is modeled by a stochastic process {X(t), t ∈ [0, T]} where X(t) ∈ S for t ∈ [0, T]. In order to perform particle filtering, we approximate {X(t), t ∈ [0, T]} by making independent draws to obtain a large number N of sample paths from this process. The target motion process need not be Gaussian or Markovian. We need only to simulate (independently draw) a large number of sample paths from the target motion process. To facilitate resampling, which is discussed below, we must also be able to generate the paths sequentially in time. Suppose we have made N independent draws from {X(t), t ∈ [0, T]} to obtain sample paths {xn, n = 1, . . . , N}. Each xn specifies a possible target path xn(t) ∈ S for t ∈ [0, T]. We call the xn particle paths and xn(t) a particle state at time t. We assign probability p(n) = 1/N to xn for n = 1, . . . , N and define

PN = {(xn, p(n)), n = 1, . . . , N}   (3.1)
to be a particle path distribution. This distribution is a discrete sample path approximation to {X(t), t ∈ [0, T]}. This is the prior particle path distribution on the process before we have received measurements. The distribution PN produces a particle state distribution for each t ∈ [0, T] by

PN^t = {(xn(t), p(n)), n = 1, . . . , N}.   (3.2)
A particle state distribution is a discrete point distribution on target state at time t. The concept of approximating the target motion process by a discrete set of paths that run over the time interval [0, T ] is a convenient conceptual one. It lets us define our version of particle filtering in a simple framework that allows the reader to understand the basic concepts of particle filtering before delving into the computational complexities of implementing a particle filter. These complexities are not brushed aside; they are addressed later in the chapter.
3.2.2 Bayesian Recursion

We follow the basic Bayesian recursion given in Sect. 2.1.4 to compute the posterior particle path distribution given a measurement. Suppose we have a particle path distribution PN = {(xn, p(n)), n = 1, . . . , N} for the prior and we receive the measurement Y1 = y1 at the time t1. We compute the posterior particle path distribution as follows. Let L1(y1|·) be the likelihood function for the measurement Y1 = y1 at time t1. The notation L1(y1|·) indicates that we are holding the value y1 fixed and considering L1(y1|·) as a function of the second variable, which is indicated by the dot. This notation will be used with other functions of two or more variables. Compute

p(n|y1) = L1(y1|xn(t1)) p(n) / Σ_{m=1}^N L1(y1|xm(t1)) p(m) for n = 1, . . . , N.   (3.3)
Then

PN(y1) = {(xn, p(n|y1)), n = 1, . . . , N}   (3.4)

is the posterior particle path distribution given Y1 = y1 and is a discrete sample path approximation to the posterior distribution on the target motion process {X(t), t ∈ [0, T]} given Y1 = y1. Similarly, the particle state distribution at time t,

PN^t(y1) = {(xn(t), p(n|y1)), n = 1, . . . , N} for t ∈ [0, T],   (3.5)
is the posterior particle state distribution at time t given Y1 = y1. If t = t1 this is a discrete point approximation for the distribution on target state at time t1 given the measurement Y1 = y1 at time t1, which is the primary distribution of interest to the particle filter. The goal of a particle filter is to compute the posterior distribution on target state at the time of the last measurement, t1 in the above case. However, (3.3) has also calculated the posterior distribution on target paths (particle paths) given the measurement at t1. If we receive another measurement Y2 = y2 at time t2 with likelihood function L2 that is conditionally independent of L1, we apply the basic Bayesian recursion of Sect. 2.1.4 to compute the posterior particle path distribution given Y1 = y1 and Y2 = y2 as follows. We treat PN(y1) as the prior particle path distribution and compute the posterior by

p(n|y1, y2) = L2(y2|xn(t2)) p(n|y1) / Σ_{m=1}^N L2(y2|xm(t2)) p(m|y1) for n = 1, . . . , N   (3.6)
so that PN(y1, y2) = {(xn, p(n|y1, y2)), n = 1, . . . , N} becomes the posterior particle path distribution and PN^t(y1, y2) = {(xn(t), p(n|y1, y2)), n = 1, . . . , N} becomes the posterior particle state distribution at t given Y1 = y1 and Y2 = y2. We usually think of measurements arriving in time order so that t2 > t1, but that is not necessary for the computation in (3.6). The reader may check the proof of the basic Bayesian recursion in Sect. 2.1.4 to confirm this assertion. As noted above, we can compute the posterior distribution on target state for any time t ∈ [0, T] from PN(y1, y2). We are not constrained to the time of the last measurement.
3.2.3 Bayesian Particle Filter Recursion

Particle filters generally assume that measurements are received in time order and are concerned with computing an accurate estimate of the posterior on target state at the time of the last measurement. A Bayes-Markov particle filter performs this process by using a particle filter version of the Bayes-Markov recursion in Sect. 2.2.3. The more general Bayesian particle filter removes the need for the Markov assumption on the target motion process. We present both versions of the Bayesian particle filter recursion.
3.2.3.1 Bayes-Markov Particle Filter Recursion
In most tracking problems, the motion model is assumed to be Markovian in the target state. This is convenient for a couple of reasons. For example, it is easy to generate the sample paths recursively one step at a time, and one needs to retain only the posterior on the present target state to compute the posterior distribution when the next measurement arrives. This simplifies the computer code needed to implement the particle filter and reduces the memory requirements. These were important considerations when computers were less capable than they are now, and they are still important in some situations. For example, tracking a fast-moving, maneuvering plane by radar is often accomplished by using a Gauss-Markov motion process and applying a Kalman filter. See for example Blair (2020). Suppose, as in Sect. 2.2, we have received measurements y1:k−1 = (y1, . . . , yk−1) at times 0 ≤ t1 < · · · < tk−1 and now receive a measurement Yk = yk at time tk > tk−1 with likelihood function Lk. In the Bayes-Markov particle filter, we use the transition function qk defined in Sect. 2.2.3 to motion update the nth particle state to its state at time tk by making an independent draw from the distribution qk(·|xn(tk−1))
to obtain xn(tk) for n = 1, . . . , N. This procedure allows us to generate the particle paths incrementally. One then performs the information update by computing

p(n|y1:k) = Lk(yk|xn(tk)) p(n|y1:k−1) / Σ_{m=1}^N Lk(yk|xm(tk)) p(m|y1:k−1) for n = 1, . . . , N   (3.7)
so that PN^tk(y1:k) = {(xn(tk), p(n|y1:k)), n = 1, . . . , N} becomes the posterior distribution on target state at time tk. This leads to the Bayes-Markov particle filter recursion shown below. This particle filter, along with the resampling discussed below, is a sequential importance resampling (SIR) filter or bootstrap filter.

Resampling. One difficulty with a particle filter is that after several measurements, the probability tends to be concentrated on a small number of particles. To deal with this problem, most particle filters resample the particles as described below. Some resample after every measurement update. Others compute the effective number of particles

Neff = 1 / Σ_{n=1}^N (p(n|y1:k))²   (3.8)
and resample when this number falls below a specified threshold. If p(n|y1:k) = 1/N for n = 1, . . . , N, then Neff = N as one would expect. As the distribution becomes more concentrated on a small number of particles, Neff becomes smaller. Finally, if one particle has probability 1 and the others 0, then Neff = 1.

Bayes-Markov Particle Filter Recursion

Draw the initial distribution. Set p(n) = 1/N for n = 1, . . . , N. Let t0 = 0 and q0 be the probability distribution for X(0). Make N independent draws

xn(0) ∼ q0 for n = 1, . . . , N   (3.9)
to obtain the particle state distribution {(xn(0), p(n)), n = 1, . . . , N} at time t0 = 0.

For k ≥ 1:

Perform the motion update. Make independent draws to obtain

xn(tk) ∼ qk(·|xn(tk−1)) for n = 1, . . . , N   (3.10)

Compute the measurement likelihood for Yk = yk:
Lk(yk|xn(tk)) = Pr{Yk = yk|X(tk) = xn(tk)} for n = 1, . . . , N   (3.11)
Perform the Bayesian update

p(n|y1:k) = (1/C) Lk(yk|xn(tk)) p(n|y1:k−1)   (3.12)
where

C = Σ_{m=1}^N Lk(yk|xm(tk)) p(m|y1:k−1)   (3.13)
The posterior particle state distribution at time tk is

PN^tk(y1:k) = {(xn(tk), p(n|y1:k)), n = 1, . . . , N}   (3.14)
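One cycle of this recursion, with the Neff test of (3.8), might be sketched as follows (ours; draw_qk and lik are user-supplied functions for the transition draw and the likelihood, and simple multinomial resampling is used here for brevity — the exactly-N scheme appears later in this section):

import numpy as np

def pf_step(paths, w, y, draw_qk, lik, rng, resample_frac=0.5):
    # paths: list of N particle paths (each a list of states); w: weight array
    N = len(w)
    for p in paths:
        p.append(draw_qk(p[-1], rng))                  # motion update draw, (3.10)
    states = [p[-1] for p in paths]
    w = w * np.array([lik(y, s) for s in states])      # Bayesian update, (3.12)
    w = w / w.sum()                                    # normalization constant C, (3.13)
    if 1.0 / np.sum(w**2) < resample_frac * N:         # effective particles, (3.8)
        idx = rng.choice(N, size=N, p=w)               # multinomial resampling
        paths = [list(paths[i]) for i in idx]          # children keep parent history
        w = np.full(N, 1.0 / N)
    return paths, w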
Note, target motion can take place in continuous or discrete time in the recursion above. The measurements are received at discrete (possibly random) times.

Splitting and Russian Roulette. This simple but effective way of resampling was developed by Ulam and von Neumann (see Kahn (1955)) for use in Monte Carlo computations made by members of the Manhattan Project, which produced the first atomic bomb. For each particle, compute M(n) = N p(n|y1:k) and write M(n) = jn + fn as an integer plus a fractional part. For each n, draw a random number rn from a uniform [0, 1] distribution. If rn < fn, set jn = jn + 1; otherwise, leave jn as is. Generate jn almost identical copies of xn(tk). The state of these child particles is the same as xn(tk) with a small perturbation added. Often the parent particle is not perturbed. The perturbation process is discussed below. We wish to retain the history of each particle, so each child particle retains the history of its parent particle. Note, if jn = 0 and fn is small, then the particle n will have a high probability of being "killed off"; thus the name, Russian roulette and splitting. The number of particles produced in this process has expected value equal to N but may differ from it a bit. The resulting particles are all given probability 1/N. If one wishes to have exactly N particles after resampling, one can use the following algorithm taken from Sect. 3.3 of Stone et al. (2014). Step 3 of the algorithm copies the entire sample path up to time tk. Step 4 perturbs the target state only at time tk. Step 4 is often modified so that the first copy of a particle remains unchanged.
Resampling Algorithm for Exactly N Particles

1. Set C0 = 0 and compute Cn = Σ_{m=1}^n p(m|y1:k) for n = 1, . . . , N
2. Draw u1 from a uniform distribution over [0, 1/N] and compute um = u1 + (m − 1)/N for m = 2, . . . , N
3. For n = 1, . . . , N, do the following: for m such that Cn−1 ≤ um < Cn, set x̃m = xn
4. For m = 1, . . . , N, perturb the state x̃m(tk) at time tk to obtain the particle x̂m, which differs from x̃m only at time tk
5. Set the probability of each particle equal to 1/N so that

{(x̂n, 1/N), n = 1, . . . , N}

becomes the resampled posterior particle distribution.
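In code, steps 1–3 and 5 reduce to a cumulative sum and a strided search; step 4's perturbation is left as a user-supplied hook. A sketch (ours):

import numpy as np

def resample_exactly_N(states, w, rng, perturb=None):
    # states: (N, d) array of particle states at t_k; w: normalized weights
    N = len(w)
    C = np.cumsum(w)                                   # step 1
    u = rng.uniform(0.0, 1.0 / N) + np.arange(N) / N   # step 2
    idx = np.searchsorted(C, u, side='right')          # step 3: C_{n-1} <= u_m < C_n
    out = states[idx].copy()
    if perturb is not None:
        out = perturb(out, rng)                        # step 4 (see (3.15)-(3.16))
    return out, np.full(N, 1.0 / N)                    # step 5: equal weights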
Perturbations. Choosing how to perturb a particle is an art rather than a science, but we give some suggestions when the target state space S is d-dimensional Euclidean space. If the state space has a d-dimensional Euclidean component plus other perhaps discrete components, one might consider perturbing only the Euclidean component of the particle. The process of perturbing the particles is called regularization, not a very descriptive term. Perhaps diversification would be a better term, for then we could refer to a particle filter as diversified rather than regularized. One way to think about the particle states at time t is that they are samples from a smooth underlying probability density. There is a substantial literature on estimating this density from the samples. See for example Silverman (1986). One suggestion for perturbing particles is given in Sect. 3.3.4 of Stone et al. (2014).

1. Compute the empirical mean and covariance of the particle distribution at time tk before resampling as follows:

μ = Σ_{n=1}^N p(n|y1:k) xn(tk), Σ = Σ_{n=1}^N p(n|y1:k) (xn(tk) − μ)(xn(tk) − μ)^T   (3.15)

2. For each particle xn(tk) in the resampled set, make an independent draw δn from the density
Kh(x) = h^−d η(h^−1 x, 0, Σ) where h = (1/2) (4 / (N(d + 2)))^{1/(d+4)} for x ∈ S   (3.16)
and add δn to xn(tk) to obtain the perturbed particle state. Note, the previous path history of xn remains the same as before the perturbation.

Curse of dimensionality. Particle filters do not escape the curse of dimensionality. As the dimension d of the state space increases, the number of particles required to obtain a good representation of the posterior distribution grows exponentially. Daum and Huang (2011) found that roughly 10^d particles are required for a d-dimensional state space. Thus a 4-dimensional state space requires on the order of 10,000 particles and a 6-dimensional one requires on the order of 1,000,000. This is in contrast to the number of draws that are required to estimate a multidimensional integral or expectation by Monte Carlo integration. If we wish to estimate a d-dimensional integral using N samples, then for some constant C, the sampling error in estimating this integral (a single number) decreases as C/√N as N → ∞. The constant C depends on d. See Wikipedia (2022). This seeming discrepancy can be partially understood by observing that we are estimating only one number for an integral, whereas the number of samples needed to estimate a multidimensional distribution can reasonably be expected to grow as the dimension of the state space grows. The surprise is that it grows exponentially.

State Space Considerations. We normally think of the target state space as kinematic, e.g., position and velocity. However, by expanding the state space to include non-kinematic information such as an index that indicates which motion model the particle is following or what its destination is, we can often retain the Markov nature of the motion model while at the same time providing a richer set of possible motions for the target. Suppose our information suggests that the target is heading for destination 1 with probability P△(1) and destination 2 with probability P△(2) = 1 − P△(1). For each destination, we can specify a motion model appropriate for that destination. We can think of these models as two stochastic processes, {X(1, t), t ≥ 0} and {X(2, t), t ≥ 0}, one for each destination. For the particle filter we can combine these into a single model

{X(i, t), i = 1, 2; t ≥ 0}.
(3.17)
When we draw sample paths from this process, we first draw the value of i with probability P△(i) for i = 1, 2. We then draw the remainder of the path from the motion model for destination i. We can also allow for the possibility that the target will switch from one destination to the other in the middle of its path. A motion model can have several possible sub-models. One sub-model can prescribe that the target follows a constant course and speed for long periods of time. Another one can specify course changes or maneuvers that occur rapidly over time. Another sub-model can specify that the target simply loiters in an area. To model
these possibilities, one specifies the probability PM(m) of the mth motion model for m = 1, 2, . . . and proceeds to simulate paths in the fashion used for multiple destinations. Again, we can allow for the possibility that the target switches from one motion model to another at a random time. This sort of motion model is called an Interacting Multiple Model (IMM) in Kalman filter terminology and is often used as a non-linear extension of Kalman filtering with the restriction that each of the motion sub-models must be Gaussian. Of course, the measurements must be linear functions of target state with Gaussian measurement errors. Implementing a Kalman filter IMM is rather complex. See Bar-Shalom et al. (2001). By contrast, it is simple to implement m motion models in a particle filter, and we don't have to abide by the Kalman filter restrictions.
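Indeed, in a particle filter simulation a destination- or model-indexed draw takes only a few lines. A sketch (ours; draw_path_for_model is a user-supplied path simulator for the chosen sub-model):

import numpy as np

def draw_particle_path(P_model, draw_path_for_model, rng):
    # Draw model index i with probability P_model[i], then a path from model i.
    i = rng.choice(len(P_model), p=P_model)
    return i, draw_path_for_model(i, rng)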
3.2.3.2 Bayesian Particle Filter Recursion
One can perform the Bayesian particle filter recursion even if the motion model is not Markovian. What we require is that we can make a draw from the target path distribution from time tk−1 to time tk given the particle history up to time tk−1. This yields independent draws of xn(tk) for n = 1, . . . , N from the distribution of X(tk) given X(t) for 0 ≤ t ≤ tk−1. We denote this distribution by qk(·|X(t) for 0 ≤ t ≤ tk−1) so that

qk(s|X(t) for 0 ≤ t ≤ tk−1) = Pr{X(tk) = s|X(t) for 0 ≤ t ≤ tk−1} for s ∈ S.   (3.18)
We observe that almost any motion model will satisfy this requirement, which allows us to generate particle paths sequentially.

Bayesian Particle Filter Recursion

Draw the initial distribution. Set p(n) = 1/N for n = 1, . . . , N. Let t0 = 0 and q0 be the probability distribution for X(0). Make N independent draws

xn(0) ∼ q0 for n = 1, . . . , N   (3.19)

to obtain the particle state distribution {(xn(0), p(n)), n = 1, . . . , N} at time t0 = 0.

For k ≥ 1:

Perform the motion update. Make independent draws to obtain
xn(tk) ∼ qk(·|X(t) for 0 ≤ t ≤ tk−1) for n = 1, . . . , N   (3.20)
Compute the measurement likelihood for Yk = yk:

L_Yk(yk|xn(tk)) = Pr{Yk = yk|X(tk) = xn(tk)} for n = 1, . . . , N   (3.21)
Perform the Bayesian update

p(n|y1:k) = (1/C) L_Yk(yk|xn(tk)) p(n|y1:k−1)   (3.22)
where

C = Σ_{m=1}^N L_Yk(yk|xm(tk)) p(m|y1:k−1)   (3.23)
The posterior particle state distribution at time tk is

PN^tk(y1:k) = {(xn(tk), p(n|y1:k)), n = 1, . . . , N}   (3.24)
3.2.4 Additional Considerations

The SIR particle filter described above is a basic particle filter and is satisfactory for many problems. Chapter 3 of Ristic et al. (2004) and Doucet and Johansen (2008) provide excellent tutorials on particle filters and discuss some advanced methods such as auxiliary particle filters and the use of Markov chain Monte Carlo (MCMC) methods to diversify the particles when resampling. Auxiliary particle filters are useful when a measurement is received that is on the edge of the particle cloud that represents the motion updated distribution at the time of the measurement. However, it is often difficult to implement auxiliary particle filters. Godsill and Clapp (2001) present an approach that uses a sequence of likelihood functions resulting from inflated measurement uncertainties that converge on the actual likelihood function. In some cases, progressive Bayesian estimation by Hanebeck and Pander (2016) provides an alternative approach. In many cases, the user will have to devise an application-specific approach to their problem. The reader can see from (3.22) that the likelihood function may be multiplied by a constant factor without changing the result. It usually pays to check the range of values of the likelihood functions. If they become extremely low or extremely high, this can cause numerical difficulties. A solution is to multiply the likelihood function
by a constant that brings the likelihood values into a reasonable numerical range for computer calculation.
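In practice this rescaling is most robustly done in log space; subtracting the maximum log likelihood before exponentiating is exactly such a constant multiple. A sketch (ours):

import numpy as np

def normalize_weights(log_w, log_lik):
    # Multiply weights by likelihoods, rescaling by a constant to avoid
    # underflow/overflow; the constant cancels in the normalization (3.22).
    lw = log_w + log_lik
    lw -= lw.max()
    w = np.exp(lw)
    return w / w.sum()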
3.2.5 Tracking Examples

We provide two tracking examples, one involving bearings-only tracking and the other involving a surveillance scenario where the target must avoid certain regions.
3.2.5.1 Bearings-Only Tracking
This example is adapted from Sect. 1.3 of Stone et al. (2014). It considers a submarine versus submarine tracking problem. The problem takes place from the point of view of the ownship submarine, which is trying to track a target submarine. Note, we use the abbreviations nm for nautical miles and kn for knots (nautical miles per hour). The ownship commander has received an indication that a submarine of unknown origin (the target) has entered ownship's vicinity and that it is somewhere within a specified 10 nm by 10 nm square region. The commander wishes to establish the location of the target and estimate its track using the submarine's passive, bearings-only sensor since that sensor will not give away ownship's position. This sensor provides limited range information. We know only that it cannot detect beyond a range of 6.5 nm.

Motion Model for Target. The commander uses general knowledge about submarine behavior to construct a probabilistic motion model for the target submarine. There is a minimum speed S_min at which a submarine can travel and still maintain its ability to steer and balance. In addition, submarines generally stay below a maximum speed S_max to avoid becoming so noisy that they are easily detected. Based on personal knowledge and experience, the commander estimates S_min = 2 kn and S_max = 20 kn for this target. Submarines generally stay on a fixed course and speed and maneuver to a new course and speed from time to time. When a submarine changes velocity, it generally chooses offsets from its present course and speed. The frequency and magnitude of these changes depend on whether the submarine is transiting to a specified destination or patrolling (loitering) in an area. The commander estimates that this submarine will change velocity every 1/2 h on average. With these considerations in mind, the commander specifies the following probabilistic motion model to represent both his understanding and uncertainty about the target's motion.

Motion Model. The prior on position at time 0 is uniform over the specified 10 nm by 10 nm square region. The target's prior velocity distribution is given by a uniform distribution over [0°, 360°] on target course and an independent uniform distribution on speed between 2 and 20 kn. The prior velocity distribution is contained in the annulus shown in Fig. 3.1, where S_min = 2 kn, S_max = 20 kn, and v = (v1, v2) is velocity. Note that this velocity distribution is not even close to a Gaussian one.
Fig. 3.1 Velocity distribution annulus
Since we will need to simulate possible target paths, it is convenient to describe the motion model in terms of simulating a target path. The target submarine chooses an initial position and velocity from the priors described above. The time τ between velocity changes is exponentially distributed with mean 1/λ, so that Pr{τ ≤ t} = 1 − e^−λt for t > 0 where λ > 0. An independent draw for τ is made for each velocity change. The commander specifies that λ = 2/h, which produces a 1/2 h mean time between velocity changes. If the target is in state (x(t0), v(t0)) at time t0, it continues at velocity v(t0) for a time τ so that

(x(t), v(t)) = (x(t0) + (t − t0) v(t0), v(t0)) for t0 ≤ t ≤ t0 + τ.

At time t0 + τ, a velocity change △v is obtained by making independent draws for the change △θ in course and the change △s in speed. This determines the velocity for the next leg, which begins at t0 + τ. The distribution on the change in course △θ is a mixture of two truncated Gaussian distributions, D1 and D2, each having weight 0.5. Specifically,

D1 = N(μh, σh²) truncated at zero (no negative values)
D2 = N(−μh, σh²) truncated at zero (no positive values)

where μh = 60° and σh = 30°. The course changes are further constrained to be between −180° and +180°. Figure 3.2 shows the density function for this course change distribution. The change in speed △s is drawn from a similar Gaussian mixture distribution with
Fig. 3.2 Course change distribution
The change in speed Δs is drawn from a similar Gaussian mixture distribution with

D1 = N(μs, σs²) truncated at zero (no negative values)
D2 = N(−μs, σs²) truncated at zero (no positive values)

where μs = 2 kn and σs = 1 kn. The speed changes are further truncated so that the resulting speed does not fall below Smin or rise above Smax. This produces a speed change density with a shape similar to the one in Fig. 3.2.

Comment. Clearly, submarines do not change velocity instantaneously, so the model is an approximation. Since we will be simulating possible target paths, we could round the edges and put a constraint on the turn radius. We could also add circular arcs. Our experience is that this is unnecessary for good performance in this problem.

The process of choosing the motion model for the target submarine in this example is instructive. Because we are using a particle filter, we can concentrate on modeling the operational and tactical constraints of the submarine as understood by the commander. How slow or fast will it go? How often will it change velocity? Is there a general direction of motion? We represent our uncertainty about the behavior of the submarine by probability distributions. This provides natural and realistic (to the appropriate level of detail) models for target motion. The resulting simulation of the target's motion provides the particle paths for the particle filter that will estimate the track of the target submarine.
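To make the path simulation concrete, the following sketch (Python with NumPy; the function and variable names are ours, not from the text) draws target paths from the motion model just described. The truncated Gaussian draws are implemented by rejection, which matches the truncated densities up to renormalization.

import numpy as np

rng = np.random.default_rng(1)

S_MIN, S_MAX = 2.0, 20.0     # speed limits (kn)
MU_H, SIGMA_H = 60.0, 30.0   # course-change mixture parameters (deg)
MU_S, SIGMA_S = 2.0, 1.0     # speed-change mixture parameters (kn)
LAMBDA = 2.0                 # velocity-change rate (1/h), mean time 1/2 h

def draw_mixture(mu, sigma):
    # Equal-weight mixture of N(mu, sigma^2) truncated at zero (no negative
    # values) and N(-mu, sigma^2) truncated at zero (no positive values):
    # draw the positive component by rejection, then flip the sign w.p. 1/2.
    while True:
        d = rng.normal(mu, sigma)
        if d >= 0.0:
            return d if rng.random() < 0.5 else -d

def draw_course_change():
    # Course changes are further constrained to [-180, 180] degrees.
    while True:
        d = draw_mixture(MU_H, SIGMA_H)
        if abs(d) <= 180.0:
            return d

def draw_speed_change(speed):
    # Speed changes are re-drawn until the resulting speed stays in
    # [S_MIN, S_MAX].
    while True:
        d = draw_mixture(MU_S, SIGMA_S)
        if S_MIN <= speed + d <= S_MAX:
            return d

def simulate_path(t_end):
    # One particle path: uniform priors, then constant-velocity legs with
    # exponentially distributed durations (mean 1/LAMBDA hours).
    x = rng.uniform(-5.0, 5.0, size=2)        # 10 nm x 10 nm square region
    course = rng.uniform(0.0, 360.0)          # deg, CCW from the x1 axis
    speed = rng.uniform(S_MIN, S_MAX)         # kn
    t, legs = 0.0, []
    while t < t_end:
        tau = min(rng.exponential(1.0 / LAMBDA), t_end - t)
        v = speed * np.array([np.cos(np.radians(course)),
                              np.sin(np.radians(course))])
        legs.append((t, x.copy(), v))         # (start time, start position, velocity)
        x, t = x + tau * v, t + tau
        course += draw_course_change()
        speed += draw_speed_change(speed)
    return legs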
Likelihood Function for Bearing Measurements. Submarines operate in three dimensions, but their range of depths below the ocean surface is small compared to the distances between the submarines in this example. As a result, we can treat this tracking problem as 2-dimensional. Moreover, the example takes place in a small region of the ocean, so we can avoid spherical geometry and treat the problem as taking place in the plane tangent to the ocean surface with point of tangency at the center of the region of interest. The x1 axis is horizontal (East (positive)–West (negative)) and the x2 axis vertical (North (positive)–South (negative)). For mathematical convenience, we measure bearings counterclockwise from the x1 axis.

Bearing measurements arrive every minute. The errors in the bearing measurements are Gaussian distributed with mean 0 and standard deviation 2°, and they are independent from measurement to measurement. If correlation in bearing errors is a problem due to time averaging in the signal processing of the sensor, a simple solution is to down-sample the bearing measurements to ensure they are reasonably independent. Suppose that (y1(tk), y2(tk)) and (x1(tk), x2(tk)) are the positions of ownship and the target submarine at the time tk of the kth bearing measurement θk. The measurement θk satisfies the equation¹
θk = arctan((x2(tk) − y2(tk)) / (x1(tk) − y1(tk))) + εk, where εk ∼ N(0, σk²).   (3.25)
That is, the observed bearing of the target from ownship has a Gaussian error with mean 0 and standard deviation σk, which may depend on the measurement. For this example, we have taken σk = 2° for all k. Nonlinear measurements with non-Gaussian errors in the standard (x1, x2) coordinate system are easily incorporated into a particle filter by using a likelihood function. The likelihood function Lb for the measurement θk is the probability (density) of obtaining θk as a function of target position (x1, x2). From (3.25) we calculate Lb as follows:
Lb(θk | (x1, x2)) = Pr{εk = θk − arctan((x2 − y2(tk)) / (x1 − y1(tk)))}
                 = η(θk − arctan((x2 − y2(tk)) / (x1 − y1(tk))); 0, σk²)   (3.26)

where η(·; μ, σ²) is the probability density function of a N(μ, σ²) distribution. Note that θk is data; it is fixed. As a result, Lb is a function of the position component (x1, x2) of the target state space. The function Lb(θk, ·) specifies the likelihood of the measurement as a function of target position, as depicted in Fig. 3.3. For this example, we assume that the detection capabilities of the submarine sensor limit detections to no more than 6.5 nm from ownship. We model this by setting the bearing likelihoods (such as the one shown in Fig. 3.3) to 0 beyond 6.5 nm from the position of ownship. If one has more detailed detection information, it can be incorporated into the likelihood function.
¹ Although the notation does not indicate this, we are using the two-argument arctan function that accounts for the quadrant of the bearing. The bearings in this example are measured counterclockwise from the horizontal axis. This differs from the usual maritime convention, where bearings are measured clockwise from north (the vertical axis).
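A sketch of the bearing likelihood (3.26) evaluated over a set of particle positions follows (Python with NumPy; the names are ours, not from the text). It uses the two-argument arctan discussed in the footnote and applies the 6.5 nm detection limit by zeroing the likelihood at longer ranges.

import numpy as np

SIGMA_B = np.radians(2.0)   # bearing error standard deviation (rad)
MAX_RANGE = 6.5             # no detections beyond 6.5 nm

def bearing_likelihood(theta_k, ownship, particles):
    # particles: (N, 2) array of candidate target positions (x1, x2);
    # ownship: (y1, y2) at the measurement time.
    dx = particles[:, 0] - ownship[0]
    dy = particles[:, 1] - ownship[1]
    predicted = np.arctan2(dy, dx)   # two-argument arctan (quadrant-aware)
    # Wrap the residual into (-pi, pi] before evaluating the Gaussian density.
    resid = (theta_k - predicted + np.pi) % (2.0 * np.pi) - np.pi
    lik = np.exp(-0.5 * (resid / SIGMA_B) ** 2) / (SIGMA_B * np.sqrt(2.0 * np.pi))
    lik[np.hypot(dx, dy) > MAX_RANGE] = 0.0   # range-limited sensor
    return lik

# Bayes update of particle weights for one measurement:
# w *= bearing_likelihood(theta_k, ownship, particles); w /= w.sum()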
Fig. 3.3 Line of bearing likelihood function for a 45° bearing measurement from (0,0) with additive Gaussian measurement error having mean 0 and standard deviation 2°
Particle Filter. We used a Bayes Markov particle filter as described in Sect. 3.2.3.1 with the above motion model and 25,000 particles to demonstrate the performance of a particle filter on this problem. We have found that roughly 25,000 particles are necessary to obtain a good representation of the probability distributions on the 4-dimensional, position-and-velocity state space. This number is in concert with the findings of Daum and Huang (2011). To simulate measurements for the particle filter, we specified the target's and ownship's paths as shown in Fig. 3.4. Ownship maneuvers early in its track to obtain a cross fix that will improve its range estimate. Later the target submarine maneuvers, and after that ownship maneuvers again. Measurements were generated at a sequence of times t1, t2, ... spaced one minute apart. (Measurements do not need to be equally spaced.) The measurements were simulated by adding a Gaussian error to the actual bearing of the target from ownship, and these noisy bearings were sent to the particle filter.

Tracker Output. Figure 3.5 shows the initial bearing measurement at time 0, which produces a probability distribution on target location with substantial range uncertainty. The only range information is that the target is no more than 6.5 nm from ownship at the time of the initial bearing. In this figure and subsequent ones, observed bearings are indicated by a green line, and we show a random sample of 500 of the 25,000 particles. A single position measurement gives us no information on the velocity distribution, so this distribution is the same as the prior distribution on velocity. The target's velocity at this time is (4, −4), which is indicated by the crosshairs on the velocity distribution display.

As more bearings are received, the range and bearing uncertainty are reduced somewhat, as seen in Fig. 3.6, which shows the position and velocity distributions after
Fig. 3.4 Paths of target and ownship
Fig. 3.5 Target position and velocity distributions after the initial bearing measurement
12 min and 12 measurements. Notice that the velocity distribution is substantially different from the prior. However, it is not until ownship maneuvers at 18 min and obtains the additional measurements through time 28 min that the range and velocity uncertainties are substantially reduced as shown in Fig. 3.7. When the target maneuvers at 36 min, the position and velocity uncertainties begin to increase. At 42 min, six minutes after the target maneuver, the position and velocity distributions shown in Fig. 3.8 reflect the increased uncertainty in the position and velocity. The velocity distribution is beginning to react to the maneuver with a small number of particles located in the vicinity of the target’s new velocity. Notice the
Fig. 3.6 Position and velocity distributions at 12 min
Fig. 3.7 Position and velocity distributions at 28 min
Fig. 3.8 Position and velocity distributions at 42 min
“hole” in the velocity distribution around (0, 0) resulting from the constraint that the target speed must not fall below 2 kn. At 42 min, ownship maneuvers again to obtain a better solution. Figure 3.9 shows the position and velocity distributions at 50 min. There is a large range uncertainty, but the velocity distribution now has substantial mass near the target's velocity. At the end of the example at 60 min, Fig. 3.10 shows good localization in both position and velocity.

Comments. This example demonstrates the power and flexibility of nonlinear tracking as implemented by a particle filter. A realistic motion model based on the
Fig. 3.9 Position and velocity distributions at 50 min
Fig. 3.10 Position and velocity distributions at 60 min
commander's knowledge of how submarines operate was easily implemented along with the constraints on the speed of the target. The motion model allowed for target maneuvers in a natural fashion. The measurement model readily handled observations that are nonlinear functions of target state and allowed for the incorporation of limited range information. At each time, the posterior distribution accurately represented the uncertainty in the tracker's target state estimate. The lack of range information is represented by the range uncertainty shown in the position distributions before ownship maneuvered. The filter easily handled the target maneuver because the possibility of a maneuver is built into the motion model. The reader is invited to compare the ease and flexibility of the particle filter approach to the numerous clever and complex approaches to performing bearings-only tracking with a Kalman filter presented in Chap. 6 of Ristic et al. (2004), and to note that in the test results presented there, the particle filter outperformed the Kalman filters.
3.2.5.2 Surveillance Example with Avoidance Areas
In this section, we consider a surveillance problem in which we are tracking a target when there may be considerable time gaps between measurements. In addition, there are avoidance areas that we know the target will not enter. In problems like this, it is important to employ a realistic motion model so that the estimates of the uncertainty in the target’s location remain realistic between measurements. Moreover, it is often desirable to make an estimate of the target’s path over a time interval [0, T ] given all the measurements received in that interval. This is called fixed interval smoothing and is a difficult problem to solve except in special cases. However, in Sect. 3.4 we
present a simple and general method for smoothing particle filters. We call it repeated filtering and apply it to this problem.

As in the example in Sect. 3.2.5.1, the state space (x, v) consists of a 2-dimensional spatial component x and a 2-dimensional velocity component v. The motion model is similar to that in Sect. 3.2.5.1 and is defined by first specifying a probability (density) function p0(x, v) on the position and velocity (x0, v0) of the target at time 0. As time progresses, the target changes velocity (instantaneously) at the event times of a Poisson process with rate λ = 0.25/h. Between velocity changes, the target follows a constant velocity path at the previously chosen velocity. When the target changes velocity, its new velocity vi is drawn from a probability (density) function p(·|vi−1), where vi−1 is the velocity just prior to the change. This type of motion model is called a generalized random tour (GRT) after the random tour model introduced by Washburn (1969). For this example, we set

p0(x0, v0) = η(x0; (0, 0), (15 nm)² I2) η(v0; (0, 0), (10 kn)² I2)

where I2 is the 2-dimensional identity matrix. When a velocity change occurs, the new velocity is chosen by making independent draws to determine the changes to the speed and course of the target. The course change distribution is the same mixture of truncated Gaussian distributions as in the bearings-only tracking example in Sect. 3.2.5.1; its probability density function is shown in Fig. 3.2. The speed change distribution is also symmetric about zero. On each side of zero, the distribution is proportional to that of a truncated Gaussian whose mean is 2 kn on the positive side and −2 kn on the negative side. The standard deviation is 1 kn on both sides.

As the particle paths are generated, we ensure that they stay clear of avoidance regions as follows (a code sketch of this rejection step appears at the end of this subsection). When a velocity change takes place, the time on that leg is drawn as well as the new velocity. If the resulting leg hits an avoidance region, a new velocity is drawn. This process is repeated up to 100 times. If it is not successful, the particle path is allowed to penetrate a small distance into the avoidance area, at which point up to 100 more tries are made to find a velocity that moves the path outside the avoidance area. As a result, there are minor intrusions into the avoidance area. These do not affect the overall quality of the solution in a surveillance problem, where the goal is to estimate the general location of a target, not a precise position and velocity.

The target follows the ladder path shown in blue in Fig. 3.11, with long legs of 24 h duration and short legs of 6 h duration. Its speed is fixed at 8 kn. The time to the first measurement is gamma distributed with mean 4 h and standard deviation √(8/3) h. The time intervals between subsequent measurements are independent with this same gamma distribution. The measurements are of position with a circular Gaussian error distribution having standard deviation 10 nm. In Fig. 3.11, measurements are indicated by red dots. For the particle filter, we used 10,000 particle paths. On the left, Fig. 3.11 shows 50 particle paths through time 60 h drawn at random from the 10,000 particles. Similarly, on the right, the figure shows 50 paths through
Fig. 3.11 On the left, this figure shows 50 particle paths through 60 h. On the right it shows 50 paths through 120 h. The target’s path is shown in blue, avoidance areas in black, and measurements in red. Measurements are connected by a dashed line to show their time sequence
time 120 h. Note that, because of resampling, the number of distinct paths decreases as time goes back toward 0 h. When we resample and split existing particles, the child particle inherits its past path from its parent, so many particles share the same parent. This produces the lack of resolution seen as we go back in time. So even though these paths are drawn from the posterior distribution on paths given the measurements through times 60 and 120 h, and therefore represent an estimate of the smoothed distribution over [0, 60] and [0, 120], they are not a good representation of that distribution. This problem of impoverishment as one goes back in time will be solved in Sect. 3.4. However, the filter is doing its job of providing a good estimate of the distribution on target location at the time of the last measurement. That is what it is designed to do. Figure 3.12 shows 50 sample paths through times 180 and 240 h. The impoverishment of paths near time 0 is even more noticeable than in Fig. 3.11. The repeated filtering method presented in Sect. 3.4 solves this problem. Note that the filter has done a good job of keeping the paths out of the avoidance areas and has also done a good job of following the target, even though the target is not following the motion model.
Fig. 3.12 On the left are 50 sample paths through time 180 h. On the right are 50 sample paths through time 240 h
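The avoidance-region rejection step described above can be sketched as follows (Python with NumPy; the callables draw_velocity and draw_duration and the disc representation are our assumptions, not from the text). Re-drawing the velocity up to 100 times, and tolerating a small intrusion when every try fails, mirrors the procedure described in the text.

import numpy as np

def leg_hits_disc(p0, p1, center, radius):
    # Minimum distance from the segment p0 -> p1 to the disc center.
    d = p1 - p0
    t = 0.0 if d @ d == 0.0 else np.clip(((center - p0) @ d) / (d @ d), 0.0, 1.0)
    return np.linalg.norm(p0 + t * d - center) < radius

def draw_leg(x, draw_velocity, draw_duration, discs, max_tries=100):
    # Draw the leg duration once, then re-draw the velocity until the leg
    # clears every avoidance disc; on failure, accept the leg (a minor
    # intrusion) and let subsequent draws move the path back out.
    tau = draw_duration()
    for _ in range(max_tries):
        v = draw_velocity()
        if not any(leg_hits_disc(x, x + tau * v, c, r) for c, r in discs):
            break
    return v, tau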
3.3 Bayesian Particle Filtering Applied to Other Nonlinear Estimation Problems

We can apply the methods of this chapter to nonlinear filtering problems that are not tracking problems. For these problems we wish to estimate the state of a system that evolves over time according to a stochastic process {X(t), t ∈ [0, T]}, where X(t) is the system state at time t. We assume we know the prior distribution of this process. As in Chap. 2, the values of the system state lie in a space S which may be continuous, discrete, or a combination of the two. We receive noisy measurements of the state at a discrete set of possibly random times 0 ≤ t1 < t2 < ···. Let Yk be the random variable representing the measurement at time tk and Yk = yk be the value of the measurement. We assume we can calculate the likelihood functions for these measurements,

Lk(yk | s) = Pr{Yk = yk | X(tk) = s} for s ∈ S and k = 1, 2, ...   (3.27)
and that they are conditionally independent of one another. If we can generate independent sample paths from the process sequentially in the fashion described in Sect. 3.2.3.2, we can apply the recursion in that section to estimate the state of the process. We show an example of this in Sect. 3.3.1. There are many applications of particle filters to nonlinear stochastic time series problems. Some examples are model selection in communications (Djuric et al. 2003); signal and image processing (Fitzgerald et al. 1999; Song et al. 2021); and econometrics (Shephard and Pitt 1997; Herbst and Schorfheide 2017).
3.3.1 Nonlinear Time Series Example

In this section we apply a particle filter to a nonlinear filtering problem that is not a tracking problem. Although the example is artificial, it is one of the standard ones used to test particle filtering algorithms. We use the version of this problem described in Godsill et al. (2004), who note that this model has been extensively used for testing numerical filtering techniques. The stochastic process X develops in discrete time, t = 1, 2, ..., 100, and in one dimension as follows: X(1) ∼ N(0, 10), and for t > 1,

X(t) = X(t − 1)/2 + 25 X(t − 1)/(1 + X²(t − 1)) + 8 cos(1.2t) + v(t)   (3.28)

where v(t) ∼ N(0, 10) and is independent of v(s) for s ≠ t.
Measurements Yt are received at each time step with

Yt = X²(t)/20 + w(t)   (3.29)
where w(t) ∼ N(0, 1) and is independent of w(s) for s ≠ t. To produce a particle filter example, we first simulated a sample path from the process X using (3.28). This is the path for the particle filter to estimate; it is shown in Fig. 3.13. We simulated measurements using (3.29), with the red dots in Fig. 3.13 providing the X(t) values. The resulting measurements are shown in black in Fig. 3.14, where they are compared to the values of the process (red). Because the measurement equation in (3.29) squares the process value X, the measurements are all positive or only slightly negative. This, of course, creates an ambiguity about whether a measurement is the result of a positive or negative value of X. The measurement equation in (3.29) produces the likelihood function

L(x | y) = Pr{Y = y | X = x} = (1/√(2π)) exp(−(1/2)(y − x²/20)²) for −∞ < x < ∞.   (3.30)
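A bootstrap particle filter for this model can be sketched as follows (Python with NumPy; the names are ours, and N(0, 10) is read as variance 10). Child particles inherit their parents' past paths, which is the path-retention behavior discussed below.

import numpy as np

rng = np.random.default_rng(0)
N, T = 1000, 100

def propagate(x, t):
    # State recursion (3.28); v(t) ~ N(0, 10), i.e., standard deviation sqrt(10).
    v = rng.normal(0.0, np.sqrt(10.0), size=x.shape)
    return x / 2.0 + 25.0 * x / (1.0 + x**2) + 8.0 * np.cos(1.2 * t) + v

def run_filter(y):
    # y[t-1] holds the measurement at time t, t = 1, ..., T.
    particles = rng.normal(0.0, np.sqrt(10.0), size=N)   # X(1) ~ N(0, 10)
    paths = np.empty((T, N))
    for t in range(1, T + 1):
        if t > 1:
            particles = propagate(particles, t)
        paths[t - 1] = particles
        # Likelihood (3.30); the constant 1/sqrt(2*pi) cancels on normalization.
        w = np.exp(-0.5 * (y[t - 1] - particles**2 / 20.0) ** 2)
        idx = rng.choice(N, size=N, p=w / w.sum())       # multinomial resampling
        particles = particles[idx]
        paths[:t] = paths[:t, idx]   # children inherit their parents' histories
    return paths                     # (T, N): column n is one posterior path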
We applied a particle filter using Eq. (3.28) to obtain 1000 sample paths from the prior process and used (3.30) to calculate the posterior distribution of the sample paths given the measurements. Figure 3.15 shows 50 sample paths drawn randomly from the posterior distribution after processing the measurements through time 50. Because of the positive–negative ambiguity in the measurement equation, some paths have poor estimates of the state
Fig. 3.13 Actual path of the process X. The red dots are the values. The lines simply connect the dots to provide clarity
Fig. 3.14 Process values are shown in red, measurement values in black
Fig. 3.15 Fifty sample paths (black) from the particle filter at time 50. Actual values of the process X are shown as red dots
at some times, particularly at time 50. As more measurements are received, this ambiguity is often resolved. Figure 3.16 shows 50 paths from the particle filter after processing the measurements through time 100. Note that many of the ambiguities that are apparent in Fig. 3.15 have been resolved, particularly the one at time 50. However, there is a large ambiguity at time 100. As we mentioned above, this problem is one of the standard ones used to test filtering algorithms. Because of the use of a likelihood function to incorporate measurement information and the retention of the history of a particle path when it is split from its parent, this filtering problem was not difficult to solve.
3.4 Smoothing Particle Filters

Until now, we have been concerned with filtering, i.e., estimating the posterior distribution on target state at the time of the last detection. However, in some cases we wish to find the best estimate of the target's path over an interval of time [0, T] given the measurements that arrived in that interval. The process of doing this is called fixed interval smoothing, and the resulting posterior distribution on the target's path is the smoothed solution. If a single smoothed path is desired, one can calculate the mean or maximum likelihood path of this distribution. If the Kalman filter assumptions are satisfied, then we can obtain the smoothed solution for this problem using a Kalman smoother. A number of Kalman smoothers are described in Sect. 3.2.3 of Stone et al.
Fig. 3.16 Fifty sample paths (black) from the filter after processing the measurements through time 100
(2014) and in Särkkä (2013). A standard smoother is the Rauch–Tung–Striebel (RTS) smoother on p. 135 of Särkkä (2013). Fixed interval smoothing, and smoothing in general, is difficult to perform with particle filters. If one performs the resampling of a Bayesian particle filter so that the path histories of the particles are retained, as is done in Sect. 3.2.3.1, then the resulting set of paths and their weights provides an estimate of the posterior (smoothed) distribution on the target paths. However, the resampling process usually leads to a set of paths that descend from a few initial paths at time 0, or sometimes only one. This estimate loses resolution as it goes back in time toward 0. There are numerous methods proposed to deal with this problem. They all require that the motion model of the process be Markovian with an explicit transition (density) function. See, for example, Särkkä (2013), Godsill et al. (2004), and Klaas et al. (2006). The method in Godsill et al. (2004) is notable in that it produces smoothed sample paths, whereas the other methods produce only smoothed marginal distributions at the measurement times. In this section, which is based on Anderson et al. (2023), we describe and illustrate a simple and general method of smoothing particle filters which we call repeated filtering. It produces a set of sample paths from the posterior distribution on smoothed sample paths. All that is required to implement this method is the ability to generate independent sample paths from the motion model. It does not require the special assumptions referenced above and produces sample paths rather than marginal distributions at discrete times. In fact, once one has produced a particle filter solution for a problem, the hard part is done. Repeated filtering simply repeats the filtering process.
3.4.1 Repeated Filtering

The increasing speed, memory capacity, and capability of present-day computers allow us to implement the repeated filtering method of smoothing a particle filter, which would not have been practical a few years ago. The method is a recursive procedure that yields M independent sample paths from the smoothed distribution on sample paths (a code sketch follows the recursion below).

Repeated Filtering Recursion
• Step 1. Make an initial run of the particle filter, processing the measurements received over the time interval [0, T] and resampling as necessary.
• Step 2. Choose a particle path at random according to the posterior particle path distribution.
• Step 3. Rerun the particle filter with the same measurements as in Step 1 but drawing particle paths that are independent of those drawn in Step 1. This ensures, among other things, that we choose new and independent samples of the target state at time 0.
• Step 4. Make a random draw to choose one of the particle paths, as in Step 2. Save this particle path.
• Step 5. Repeat Steps 3 and 4, using particle paths that are independent of those drawn previously, until one obtains M smoothed particle paths. Assign each particle path probability 1/M to create the particle path distribution

{(x_m, 1/M); m = 1, ..., M}.   (3.31)
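A minimal sketch of the recursion (Python with NumPy) follows; run_particle_filter is an assumed user-supplied routine that performs one full filter pass over [0, T] (Steps 1 and 3) and returns the particle paths with their posterior weights.

import numpy as np

def repeated_filtering(run_particle_filter, measurements, M, seed=0):
    # run_particle_filter(measurements, rng) -> (paths, weights), with all
    # random draws taken from rng so that successive runs are independent.
    rng = np.random.default_rng(seed)
    smoothed = []
    for _ in range(M):
        paths, weights = run_particle_filter(measurements, rng)   # Steps 1/3
        # Steps 2/4: draw one path from the posterior particle path distribution.
        k = rng.choice(len(weights), p=weights / np.sum(weights))
        smoothed.append(paths[k])
    return smoothed   # (3.31): each saved path carries probability 1/M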
The set of particle paths in (3.31) forms a discrete path approximation to the posterior distribution on the sample paths given the measurements received in [0, T].

Weighting the Paths. The solution in (3.31) gives each smoothed path an equal weight. We hypothesize that an alternate weighting scheme applied to the smoothed particle paths in (3.31) could produce a better solution. However, we have not found a method that consistently does so. This is an area for further investigation.

Result. The result of the repeated filtering recursion is a set of M independent particle paths drawn from the posterior distribution over [0, T] given the measurements received in that time interval. We use these paths to form the particle path distribution in (3.31), which approximates the posterior distribution of the motion process. Obviously, if one has the particle path distribution in (3.31), one can obtain a particle state distribution at time t ∈ [0, T] from

{(x_m(t), 1/M); m = 1, ..., M}.   (3.32)
The ability to estimate the distribution of the state of the smoothed process at times between measurements can be particularly important in situations where there
are large time gaps between measurements, as occurs in some surveillance problems. In addition, having a set of smoothed paths can be helpful in determining patterns of motion. Further, as noted in Godsill et al. (2004), having sample paths allows one to explore relationships between the state of the process at different times. This can be useful in tracking problems. Resampling within each run of the particle filter is necessary to preserve the resolution of the posterior particle path distribution near T. Making independent runs of the particle filter in Step 3 to obtain M posterior sample paths is necessary to preserve the resolution of the estimate of the posterior near time 0. Determining the number of particle filter runs required and the number of particles per run generally requires some experimentation. We expect that there is some limitation on the length of the interval [0, T] over which this process produces solutions with good resolution; more likely, as the time interval becomes longer, the number M of particle filter runs may need to become larger. We have not explored this question. Another possibility is to break the interval [0, T] into two or more subintervals and splice the solutions from the subintervals together in some fashion. We have not explored this possibility either.

Computation Time. The computation time to obtain a repeated filtering solution depends on the time to perform one filter run, which depends on the complexity of the problem. Generating M independent samples from the posterior will take M times as long as a single filter run. If time becomes a problem, one can easily apply coarse-grain parallel processing by allocating the repetitions of Steps 3 and 4 across a number of processors.
3.4.2 Smoothing Examples

In this section we find a smoothed estimate of the target's path in the avoidance area example in Sect. 3.2.5.2 and a smoothed estimate of the path of the nonlinear time series in Sect. 3.3.1. For the problems in these sections, we developed particle filters and produced particle filter solutions, i.e., estimates of the posterior distribution at the time of each measurement. In this section, we use those particle filters and apply repeated filtering to find smoothed estimates of the path of the target or the series given all the measurements.
3.4.2.1 Smoothing the Avoidance Areas Example
For repeated filtering, we ran the particle filter with N = 10,000 particles and at time T randomly chose one of the paths from the posterior particle path distribution. We repeated Steps 3 and 4 of the Repeated Filtering Recursion to obtain M = 400 sample paths from the posterior (smoothed) distribution on the target paths. The 2-sigma ellipses for the position estimates shown in Fig. 3.17 were calculated
Fig. 3.17 Slinky plot from the first run of the repeated filtering smoother. The heavy blue line shows the target’s path and the red dots the measurements. The black circles show discs of regions that the target must avoid. The grey ellipses are 2-sigma ellipses generated from the repeated filtering results. The dashed circle shows the 2-sigma ellipse for the initial position distribution at time 0
every 4 h. This sequence of ellipses is called a slinky plot. We find this to be a convenient representation of the smoothed distribution on target paths. The ellipses represent Gaussian approximations to the position distributions every 4 h. Thus, some ellipses may intersect an avoidance area even though the paths may not. As noted in Sect. 3.2.5.2, there may be very small incursions of the paths into the avoidance regions. To illustrate the stability of the repeated filtering method, we repeated this run a second time using the same measurements as in the first run but with random draws independent of those made for the first run. We overlaid the slinky plots for the two runs in Fig. 3.18 where we have rotated the plot by 90° to facilitate visual comparison. As the reader can see, there is little, if any, difference in the plots, which gives us confidence in the stability of the method. A sample of the smoothed paths from repeated filtering is shown in Fig. 3.19. As one would expect, there is more uncertainty in the distribution of the smoothed target path near time 0 than there is close to time T. Note the improved diversity of paths near 0 h compared to Fig. 3.12.
3.4.2.2 Smoothing the Nonlinear Time Series Example
To apply repeated filtering to the nonlinear time series example in Sect. 3.3.1, we made 1000 independent runs of the particle filter developed in Sect. 3.3.1. At the conclusion of each filter run at time 100, we selected one of the paths at random
Fig. 3.18 Comparison of the slinky plots from two runs of the repeated filtering method using the same inputs but independent random numbers. The blue line shows the target’s path Fig. 3.19 Smoothed sample paths selected by random draws from the set of smoothed paths. Each path has an equal probability of being chosen. The dashed line connects the measurements
from the posterior particle path distribution at time 100. This path was saved as a smoothed path. The resulting set of 1000 independently drawn paths constitutes an estimate of the posterior distribution on the smoothed paths of the time series given the measurements in the time interval [1, 100].
Fig. 3.20 Fifty smoothed paths shown in black; the time series values are in red
Figure 3.20 shows a sample of 50 smoothed paths in black and the actual values of the time series in red. Looking at the measurement Eq. (3.29), one can see that a value x of the series will produce the same measurement as −x. As more measurements are received, the smoother (usually) sorts out this ambiguity. This ambiguity has produced the bimodal results near time 100. Figure 3.21 shows the smoother results when restricted to the interval [0, 51]. The ambiguity in the smoothed solution near time 51 in this figure is resolved by time 100 in Fig. 3.20. Figure 3.22 shows a histogram of the smoothed distribution of the series values at each of the 100 times. One of the important advantages of finding smoothed paths rather than marginal distributions at each time is the ability to analyze joint densities of values at two different times. Estimating joint densities can be important in tracking problems too. Figures 3.23 and 3.24 show examples of these joint densities. The nonlinear time series in this example has a Markovian motion model with an explicit transition function, so it could be smoothed by the methods in Godsill et al. (2004). In Anderson et al. (2023), this example was modified by adding reflecting boundaries at x = ±15 and was easily smoothed using repeated filtering, since it was easy to simulate paths of the modified stochastic process. However, it would be difficult to smooth this modified process using the methods of Godsill et al. (2004). One can see from (3.28) that for each transition one has to allow for reflection off one or more boundaries to determine the distribution of X(t) given X(t − 1). In fact, since the term v(t) in (3.28) is drawn from a Gaussian distribution, the transition function has to account for an unbounded number of possible reflections.
Fig. 3.21 Smoother results for [0,51]. Smoothed paths are in black. Time series values are in red
Fig. 3.22 Histogram of smoother values. Dark grey indicates higher density areas. Red stars show actual values of the time series
Fig. 3.23 Joint density plot for the values of the smoothed time series at times 8 and 9. Note the multi-modal distribution
Fig. 3.24 Joint density plot for times 77 and 78. This density is unimodal but not Gaussian
3.5 Notes

For much of the twentieth century, due to the influence of R. A. Fisher, Bayesian statistics was considered unscientific. It was difficult to publish results using Bayesian methods. Sometimes researchers used Bayesian methods to obtain a result but hid
that fact in their papers (see McGrayne 2011). In the 1960s and 1970s, statisticians such as Lindley (1965), DeGroot (1970), and de Finetti (1974–75) put Bayesian statistics and the notion of subjective probability on a sound theoretical basis. However, the use of Bayesian methods was limited by the complexities of computing posterior distributions in many applications. With the development of personal computers starting in the 1980s, affordable and capable computers became widely available. This facilitated and energized the use of Bayesian statistics in the scientific community. Gordon et al. (2018) tell the story of how they became exposed to Bayesian statistics, Monte Carlo sampling, resampling, and proposal distribution methods in the early 1990s. Taking advantage of these methods, Gordon et al. (1993) developed and popularized particle filters for the tracking community. Particle filtering is now one of the most important and powerful tools for tracking and Bayesian state estimation. Godsill (2019) provides an overview of the first 25 years of particle filtering. There is a considerable literature on particle filters, with Chap. 4 of Ristic et al. (2004) and Doucet and Johansen (2008) providing good introductions. Doucet et al. (2001) is one of the many references discussing advanced methods and theoretical results for particle filters.

There was an early application of particle filters developed for the US Navy in 1972. It is described in Richardson et al. (2003) and was called Monte Carlo tracking. It was used to track Soviet ballistic missile submarines in the Atlantic and Pacific beginning in 1972. The 1972 version of the system ran on a large Navy mainframe computer in batch processing mode. It employed a multiple-model stochastic process for the target motion model, which was simulated by making 4000 Monte Carlo draws (particles) from its distribution. When a measurement was received (in the form of a bivariate Gaussian distribution on location), the distribution on the stochastic process was updated in a Bayesian fashion to produce the posterior distribution at the time of the detection and to forecast the target location distribution into the future for planning purposes. After each detection, the particles were resampled using the splitting and Russian roulette technique described in Sect. 3.2.3.1. Richardson et al. (2003) describe versions of this system that were developed and used by the US Navy into the late 1980s. However, the work remained classified and was not published in the open literature.

The 1972 Monte Carlo tracking system was based on the Computer Assisted Search Planning (CASP) system developed by Richardson and his colleagues for use by the US Coast Guard for planning searches for people and boats missing at sea; see Richardson and Discenza (1980). This system became operational in 1974, but it did not employ resampling as the subsequent Navy system did, since a detection of the search object ended the search. CASP's successor, the Search and Rescue Operational Planning System (SAROPS) (Kratzke et al. 2010), also employs particle filters. It was placed in operation in 2007 and continues to be used to plan search and rescue operations almost every day.
References

Anderson, S.L., Stone, L.D., Maskell, S.: Repeated filter for particle filters. To appear in J. Adv. Inf. Fusion (2023)
Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with Applications to Tracking and Navigation. Wiley, New York (2001)
Blair, W.D.: Design of NCV filters for tracking maneuvering targets. In: IEEE Radar Conference (RadarConf20), Florence, Italy, 21–25 Sept 2020 (2020)
Daum, F., Huang, J.: Particle degeneracy: root cause and solution. In: Proceedings of SPIE 8050 (2011)
de Finetti, B.: Theory of Probability, vols. I and II. Wiley, New York (1974–75)
DeGroot, M.: Optimal Statistical Decisions. Wiley Classics, New York (1970)
Djuric, P.M., Kotecha, J.H., Zhang, J., Huang, Y., Ghirmai, T.: Particle filtering: a review of the theory and how it can be used for solving problems in wireless communications. IEEE Signal Process. Mag. (2003)
Doucet, A., de Freitas, N., Gordon, N.J. (eds.): Sequential Monte Carlo Methods in Practice. Springer (2001)
Doucet, A., Johansen, A.M.: A tutorial on particle filtering: fifteen years later. https://www.stats.ox.ac.uk/~doucet/doucet_johansen_tutorialPF2011.pdf (2008). Accessed 13 Jan 2022
Fitzgerald, P.J., Godsill, S., Kokaram, A.: Bayesian methods in signal and image processing. Comput. Sci. (1999)
Godsill, S.J.: Particle filtering: the first 25 years and beyond. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May (2019)
Godsill, S.J., Clapp, T.: Improvement strategies for Monte Carlo particle filters. In: Doucet, A., de Freitas, N., Gordon, N. (eds.) Sequential Monte Carlo Methods in Practice, pp. 139–158. Springer, New York (2001)
Godsill, S.J., Doucet, A., West, M.: Monte Carlo smoothing for nonlinear time series. J. Am. Stat. Assoc. 99(465), 156–168 (2004)
Gordon, N.J., Salmond, D.J., Smith, A.F.M.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F 140(2), 107–113 (1993)
Gordon, N.J., Salmond, D.J., Smith, A.F.M.: 25 years of particle filters and other random points. Keynote presentation at the 21st International Conference on Information Fusion, Cambridge, UK, July 2018. https://fusion2018.eng.cam.ac.uk/system/files/documents/keynote.pdf (2018)
Hanebeck, U.D., Pander, M.: Progressive Bayesian estimation with deterministic particles. In: Proceedings of the 19th International Conference on Information Fusion, Heidelberg, Germany, 5–8 July (2016)
Herbst, E., Schorfheide, F.: Tempered particle filtering. Working Paper 23448, National Bureau of Economic Research. http://www.nber.org/papers/w23448 (2017)
Kahn, H.: Use of different Monte Carlo techniques. The RAND Corporation (P-766). http://www.rand.org/pubs/authors/k/kahn_herman.html (1955)
Kitagawa, G.: Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. (1996)
Klaas, M., Briers, M., de Freitas, N., Maskell, S., Lang, V.: Fast particle smoothing: if I had a million particles. In: Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh (2006)
Kratzke, T.M., Stone, L.D., Frost, J.R.: Search and rescue optimal planning system. In: Proceedings of the 13th International Conference on Information Fusion, Edinburgh, 26–29 July (2010)
Lindley, D.: Introduction to Probability and Statistics from a Bayesian Point of View, Parts 1 and 2. Cambridge University Press, Cambridge, UK (1965)
McGrayne, S.B.: The Theory That Would Not Die. Yale University Press, New Haven, CT (2011)
Richardson, H.R., Discenza, J.H.: The United States Coast Guard computer assisted search planning system. Naval Res. Logist. Q. 27, 141–157 (1980)
Richardson, H.R., Stone, L.D., Monach, W.R., Discenza, J.H.: Early maritime applications of particle filtering. In: Signal and Data Processing of Small Targets, Proceedings of SPIE, vol. 5204, pp. 165–174 (2003)
Ristic, B., Arulampalam, S., Gordon, N.: Beyond the Kalman Filter. Artech House, Boston (2004)
Särkkä, S.: Bayesian Filtering and Smoothing. Cambridge University Press, New York (2013)
Shephard, N., Pitt, M.: Likelihood analysis of non-Gaussian measurement time series. Biometrika 84, 653–667 (1997)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman & Hall, New York (1986)
Song, W., Wang, Z., Wang, J., Alsaadi, F.E., Shan, J.: Particle filtering for nonlinear/non-Gaussian systems with energy harvesting sensors subject to randomly occurring sensor saturations. IEEE Trans. Signal Process. 69, 15–27 (2021)
Stone, L.D., Streit, R.L., Corwin, T.L., Bell, K.L.: Bayesian Multiple Target Tracking, 2nd edn. Artech House, Boston (2014)
Washburn, A.R.: Probability density of a moving particle. Oper. Res. 17(5), 861–871 (1969)
Wikipedia: Monte Carlo integration. https://en.wikipedia.org/wiki/Monte_Carlo_integration (2022). Accessed 28 Dec 2022
Chapter 4
Simple Multiple Target Tracking
4.1 Introduction

In previous chapters we assumed there is one target present and that all measurements are generated by that target. In this chapter, we relax both assumptions and allow for multiple targets and for the possibility that a measurement is false, i.e., that the measurement is not generated by a target. In Sect. 4.2, we define association probability and show how to calculate it in a Bayesian fashion. In Sect. 4.3, we use association probabilities to define soft association. In Sect. 4.4, we use soft association to develop a simplified Joint Probabilistic Data Association (JPDA) algorithm and describe a particle filter implementation of this algorithm. We then apply the particle filter implementation to two examples. The first one follows two targets as their positions cross; in this example, the tracker uses only kinematic information. In the second one, we show how to use feature information to aid association and tracking. Section 4.5 briefly discusses algorithms for more complex multiple target tracking problems.
4.2 Association Probabilities

Suppose we have received the line-of-bearing measurement shown in Fig. 4.1 and that we are tracking two targets whose probability distributions are shown in this figure. It is unclear which target is the origin of this measurement. It could be either one. We cannot decide this question definitively, but we can calculate the probability that each target generated the measurement.

Suppose that there are Jt target tracks at the time t of a measurement and that p_j^-(t, ·) is the motion-updated (predicted) probability distribution on the state s ∈ S of the jth target at time t, for j = 1, ..., Jt. As in previous chapters, the p_j^-(t, ·) notation indicates that we are holding t fixed and considering p_j^-(t, ·) as a function of the second variable, which is the target state variable.
Fig. 4.1 Line of bearing measurement with ambiguous target origin
In addition to the Jt existing tracks, we add j = 0 as a track for a target that has not previously been detected. The prior probability distribution p_0(t, ·) for this target is often taken to be uniform over the area of interest. Suppose that we have received the single measurement Yt = yt at time t. Let

h(j) = Pr{target j generated Yt = yt} for j = 1, ..., Jt,
h(0) = Pr{a new target generated Yt = yt},

be the prior probability distribution on the identity of the target generating this measurement. We usually take h(j) = 1/(Jt + 1) for j = 0, 1, ..., Jt. Let Lt(yt | x) = Pr{Yt = yt | X(t) = x} for x ∈ S be the likelihood function for this measurement. We compute the target association likelihood l(j) for target j as follows:

l(j) = ∫_S Lt(yt | x) p_j^-(t, x) dx for j = 0, ..., Jt,   (4.1)

and then compute the association probability α(j) by Bayes rule:

α(j) = l(j)h(j) / Σ_{j'=0}^{Jt} l(j')h(j') for j = 0, ..., Jt.   (4.2)

Thus
α(j) = Pr{target j generated measurement Yt = yt} for j = 0, ..., Jt.

To allow for the arrival of new targets and to filter out false measurements, we set a threshold value τ0 for new targets. If α(0) > τ0, we set Jt = Jt + 1 and generate a new target distribution from the measurement. If α(0) is below this threshold, no new target is created, and we renormalize the remaining Jt association probabilities to sum to 1. This will tend to filter out false measurements.
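With particle filter tracks, the integral in (4.1) becomes a weighted sum over particles. A sketch follows (Python with NumPy; the names are ours, not from the text); each track j, including the new-target track j = 0, is represented by its motion-updated particles and weights.

import numpy as np

def association_probabilities(likelihood, tracks, h=None):
    # tracks[j] = (particles, weights): motion-updated representation of
    # p_j^-(t, .) for j = 0, ..., Jt; likelihood(x) evaluates L_t(y_t | x)
    # on an (N, d) array of particle states.
    J = len(tracks)
    h = np.full(J, 1.0 / J) if h is None else np.asarray(h)   # h(j) = 1/(Jt+1)
    # (4.1): l(j) approximated by the weighted particle sum.
    l = np.array([np.sum(w * likelihood(x)) for x, w in tracks])
    # (4.2): Bayes rule over the Jt + 1 association hypotheses.
    return l * h / np.sum(l * h)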
4.3 Soft Association

If we know for certain which target generated the measurement Yt = yt, we can assign, or associate, the measurement to that target with probability 1. This is called a hard association or assignment. A soft association associates the measurement Yt = yt to target j with probability α(j), where 0 < α(j) < 1. Using this association probability, we compute the posterior distribution p_j(t, ·) on target j given the soft association α(j) as follows. Let

p_j^+(t, s) = p_j(t, s | yt) for s ∈ S,   (4.3)

so that p_j^+(t, ·) is the posterior distribution on target j given Yt = yt, i.e., given that the measurement is associated to target j. Then we compute

p_j(t, ·) = α(j) p_j^+(t, ·) + (1 − α(j)) p_j^-(t, ·).   (4.4)
We see from (4.4) that the influence of a measurement is felt on a track in proportion to the probability that the measurement is associated with that track.
4.4 Simplified JPDA

A full version of nonlinear JPDA is given in Sect. 4.5 of Stone et al. (2014). In this section we present a simplified JPDA in which we assume that measurements arrive one at a time and that there are no out-of-sequence measurements, i.e., the kth measurement in time is always received before the (k + 1)st. As each measurement is received, the association probabilities in (4.2) and the posterior distributions in (4.4) are calculated for each target. Simplified JPDA makes the simplifying assumption that the posterior distributions in (4.4) are sufficient statistics for any future calculations of the multiple target distributions. Specifically, we can forget the history of contacts received in the past and use only the posteriors in (4.4) going forward in time. The full JPDA makes this same assumption about the posteriors it produces.
One can allow for deleting an existing target by setting a number of measurements K and a threshold α. If α(j) < α for K consecutive measurements, then we delete target j. Correspondingly, one may wish to wait for some number of confirming measurements that are associated to a new target with a high enough probability before declaring the target present. This will also filter out most false measurements. In the full nonlinear JPDA presented in Sect. 4.5 of Stone et al. (2014), measurements arrive in scans. In a scan, multiple measurements arrive at virtually the same time, each target generates at most one measurement, and each measurement is generated by at most one target. This substantially complicates the computation of association probabilities. The notion of a scan arises naturally in radar, where the antenna rotates at a fixed number of revolutions per minute. Each rotation produces a scan of measurements. Modern radars often perform scans electronically without physically rotating the antenna.
4.4.1 Particle Filter Implementation of Simplified Nonlinear JPDA

If we use a particle filter to represent the distribution of each target, then it is easy to implement the simplified nonlinear JPDA described above. Using the particle filter representation of the motion-updated distribution for target j, we compute the particle filter representation of the posterior distribution p_j^+(t, ·) for target j given that the measurement at time t is associated to that target. We then compute the posterior p_j(t, ·) in (4.4) as the mixture of the particle filter representation of p_j^+(t, ·) and that of the motion-updated distribution p_j^-(t, ·) as follows. We resample, if necessary, the particles representing the distributions p_j^+(t, ·) and p_j^-(t, ·) to obtain N particles with weight 1/N for each distribution. Denote the two sets of particles by
{(x_j^+(n), 1/N); n = 1, ..., N} and {(x_j^-(n), 1/N); n = 1, ..., N}.   (4.5)

Randomly choose α(j)N particles (without replacement) from {(x_j^+(n), 1/N); n = 1, ..., N} and (1 − α(j))N particles (without replacement) from {(x_j^-(n), 1/N); n = 1, ..., N}, rounding the numbers chosen as necessary to obtain a total of N particles. The resulting set of particles is a particle filter representation of p_j(t, ·) in (4.4). A code sketch of this mixture draw follows.
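A sketch of the mixture draw (Python with NumPy; the names are ours, not from the text), assuming both particle sets have already been resampled to N equal-weight particles:

import numpy as np

def soft_association_update(post, pred, alpha_j, rng):
    # post, pred: (N, d) arrays of equal-weight particles representing
    # p_j^+(t, .) and p_j^-(t, .) from (4.5).
    N = len(post)
    n_post = int(round(alpha_j * N))   # rounding as described in the text
    i_post = rng.choice(N, size=n_post, replace=False)
    i_pred = rng.choice(N, size=N - n_post, replace=False)
    # The union represents the mixture (4.4):
    # alpha(j) p_j^+(t, .) + (1 - alpha(j)) p_j^-(t, .).
    return np.concatenate([post[i_post], pred[i_pred]], axis=0)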
The special case of JPDA where we know there is one and only one target, but there may be false detections, is called Probabilistic Data Association (PDA). PDA and JPDA were developed by Bar-Shalom and Fortmann (1988) in the context of Kalman filtering. Since the Kalman filtering recursion produces Gaussian target state distributions, the distributions p_j^+(t, ·) and p_j^-(t, ·) in (4.4) are Gaussian. Since the Kalman filter requires a Gaussian prior to continue its recursion, the distribution p_j(t, ·) in (4.4), which is a mixture of Gaussian distributions, is approximated by a single Gaussian matched to the mean and covariance of the mixture. This works well when the target state distribution is well approximated by a Gaussian and the other linear Gaussian assumptions of the Kalman filter hold, at least approximately.
4.4.2 Crossing Targets Example

This section presents a particle filter example of simplified nonlinear JPDA tracking applied to a situation where we know there are exactly two targets and no false alarms. This example is adapted from Sect. 1.5 of Stone et al. (2014). There are two targets whose tracks over the period from 0 to 2 h are shown in Fig. 4.2. Target 1 is moving to the right at 5 knots (kn). Target 2 is moving on a 20-degree downward diagonal to the right at 5.3 kn. The target tracks cross at 1 h, at which time the target positions coincide. The observations (measurements) are received every 0.1 h over the two-hour period. They are position estimates with mean 0, bivariate-normal measurement errors.

Fig. 4.2 Target tracks over [0, 2] and prior position distributions at time 0. The circles show the 95% containment ellipses for these distributions
For each observation, we make random draws to determine the ellipse that characterizes the covariance matrix of the error distribution. Specifically, we draw values for the semi-major axis length σmaj (in nautical miles (nm)), the ratio r of the semi-minor axis σmin to the semi-major axis, and the orientation θ using the distributions

σmaj ∼ U[0.1, 0.2], r ∼ U[0.5, 1.0], and θ ∼ U[0, 2π]   (4.6)

where U[A] is a uniform distribution over the set A. The orientation of the major axis is measured counterclockwise from the positive x1 axis of the position component of the target space. We randomized the measurement error distributions in this example to show that the particle filter has no problem incorporating measurements with different error distributions, as often occurs in actual tracking situations. The choices in (4.6) produce the following covariance matrix in the coordinate system of the major and minor axes of the ellipse:

Σe = [ σmaj²  0 ; 0  r²σmaj² ].

To compute this covariance matrix in the coordinate system of the position component of the target state space, we apply the rotation matrix

R(θ) = [ cos θ  −sin θ ; sin θ  cos θ ]

to Σe to obtain

Σ(θ) = R(θ) Σe R^T(θ)
     = [ σmaj² cos²(θ) + r²σmaj² sin²(θ)   (σmaj² − r²σmaj²) sin(θ)cos(θ) ;
         (σmaj² − r²σmaj²) sin(θ)cos(θ)    σmaj² sin²(θ) + r²σmaj² cos²(θ) ]   (4.7)

in the manner described in Sect. 2.3.4.3. At each time increment, one measurement is generated and sent to the tracker. The measurements alternate between the two targets: first a measurement on target 1, then one on target 2. (The tracker does not know this.) The values of the measurements are obtained by adding an offset (drawn from the measurement error distribution) to the actual target position at the time of the measurement. For the filter, the distribution on each target's position is initiated with a circular normal distribution with mean equal to the target's position and σ = 0.2 nm; the resulting prior position distributions are shown in Fig. 4.2. The circles shown are the 2.45σ ellipses (circles) for these distributions. The 2.45σ ellipse has probability 0.95 of containing the target position, so we call this the 95% containment (or uncertainty) ellipse.
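A sketch of the measurement covariance simulation (Python with NumPy; the names are ours, not from the text); it performs the draws in (4.6) and the rotation in (4.7).

import numpy as np

def random_measurement_cov(rng):
    s_maj = rng.uniform(0.1, 0.2)           # semi-major axis sigma (nm), (4.6)
    r = rng.uniform(0.5, 1.0)               # minor-to-major axis ratio
    theta = rng.uniform(0.0, 2.0 * np.pi)   # orientation, CCW from the x1 axis
    sigma_e = np.diag([s_maj**2, (r * s_maj)**2])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ sigma_e @ R.T                # Sigma(theta) = R Sigma_e R^T, (4.7)

# A noisy position measurement of a target at true position x:
# rng = np.random.default_rng(0)
# y = x + rng.multivariate_normal(np.zeros(2), random_measurement_cov(rng))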
We use the same motion model as for the bearings-only tracking example in Sect. 3.2.5.1, with the exceptions that target speeds are constrained to be between 2 and 12 kn, the mean time between course changes is 1 h, μh = 20°, and σh = 10°. As in Sect. 3.2.5.1, the prior distribution on each target's velocity has the annular form shown in Fig. 3.1.
4.4.2.1 Tracker Output
Figures 4.2 through 4.6 display a random sample of 100 of the 10,000 particles for each track. Figure 4.3 shows the position and velocity distributions for the two targets after the measurement on target 1 at 0.1 h. The ellipse shown is the 95% uncertainty ellipse for the measurement error. Since the separation between the targets is much larger than the measurement error, there is no ambiguity in the association of this measurement to target 1. Since no measurements have been associated with target 2, its velocity distribution retains its annular shape. However, the posterior velocity distribution for target 1 has shifted to show a strong bias in the positive direction of the horizontal axis. Figure 4.4 shows the position and velocity distributions at 0.6 h. The velocity distributions have now separated, with target 1 heading to the right and target 2 heading diagonally to the right and down. Figure 4.5 shows the position and velocity distributions at time 1.0 h, when the target positions coincide. The position distributions show substantial overlap. However, the velocity distributions maintain their separation. It is this separation
Fig. 4.3 Position (left) and velocity (right) distributions for targets 1 (red) and 2 (black) at time 0.1 h after incorporating the detection on target 1. The ellipse is the 95% measurement uncertainty ellipse
Fig. 4.4 Position (left) and velocity (right) distributions for targets 1 (red) and 2 (black) at time 0.6 h after incorporating the detection on target 2. The ellipse is the 95% measurement uncertainty ellipse
Fig. 4.5 Position (left) and velocity (right) distributions for targets 1 (red) and 2 (black) at 1.0 h after incorporating the detection on target 1. The ellipse is the 95% measurement uncertainty ellipse
that allows the tracker to sort out the two tracks correctly after they cross as we see in Fig. 4.6, which shows the position distributions at 1.6 h. If the targets had switched paths as they crossed, the tracker would not have been able to determine this. This is also true of more complicated trackers that maintain multiple association hypotheses and compute the probabilities of each hypothesis. Unless there is additional identifying information, such as feature measurements, the tracker may confuse the identities of the targets. If there are features, such as the color of the target, that distinguish them, measurements of these features can be
Fig. 4.6 Position distributions for targets 1 (red) and 2 (black) at 1.6 h after incorporating the detection on target 2. The ellipse is the 95% measurement uncertainty ellipse
If there are features, such as the color of the target, that distinguish the targets, measurements of these features can be incorporated into the measurement likelihood function and used to estimate target identity and improve state estimates. This is discussed in Sect. 4.4.3. Figure 4.7 shows a graph of target 1 association probabilities for the measurements as a function of time. Association probabilities for measurements generated by target 1 are indicated by ×'s; those for measurements generated by target 2 are indicated by ◦'s. As the target paths approach one another from time 0.7 h to 1.5 h, association becomes ambiguous, and the association probabilities approach 0.5.
4.4.3 Feature-Aided Tracking

Heretofore, we have considered only kinematic attributes of targets and kinematic measurements. In some tracking situations, particularly those involving multiple targets, considering other target attributes can improve tracking performance.

Features. We define a feature to be an attribute of the target that is not one of the standard kinematic attributes, such as position, velocity, and acceleration, that define the target's path. In the example below, color is a feature. Features are useful for associating measurements to targets and in determining the identity of a target.
Fig. 4.7 Association probabilities for target 1. The ×'s denote association probabilities for target 1 for measurements generated by target 1, and the ◦'s denote association probabilities for target 1 when the measurements are generated by target 2. The association probability for target 2 is equal to 1 minus the association probability for target 1. Near time 1.0 h, when the targets cross, the association probabilities are near 0.5
To make use of features and feature measurements, we must expand the target state by adding one or more feature components to the kinematic components of target state. Of course, this requires specifying a prior distribution on the joint kinematic and feature state of the target. We may receive kinematic and feature measurements separately and asynchronously or together and simultaneously. We apply the same Bayesian reasoning to feature measurements that we do to kinematic ones: we convert them to likelihood functions, which we use to compute the posterior on target state. Feature measurements directly improve the estimate of features. When there is ambiguity in the origin of a measurement, feature measurements often improve the tracker's ability to associate measurements to targets. In this way they indirectly improve the estimate of the kinematic state of the target. If a feature is dependent on kinematic state, a feature measurement will influence the kinematic state estimate and vice versa. In the example below, we demonstrate how feature (color) measurements help the tracker make the correct association of measurements to targets when the targets are in close proximity.
Table 4.1 Confusion matrix for the color sensor. Each cell gives the probability of obtaining the measurement given the target color

                      Measurement
Target color      Black         Red
Black             γ             1 − γ
Red               1 − δ         δ

4.4.3.1 Feature-Aided Tracking Example
Suppose we wish to track the two targets in the example in Sect. 4.4.2, and we have an additional piece of information. One target is black, the other red, but we don't know which is which. In addition to the position measurements in the example in Sect. 4.4.2.1, we receive noisy estimates of the color of each target. The performance of the color sensor is characterized by a confusion matrix like the one in Table 4.1, where γ and δ are probabilities. This confusion matrix can be converted to a likelihood function L_c for the color measurements as follows:

L_c(Black | Black) = γ        L_c(Red | Black) = 1 − γ
L_c(Red | Red) = δ            L_c(Black | Red) = 1 − δ.   (4.8)
We extend the state space S to add a color component, red or black. Let R⁴ be the 4-dimensional space of the example in Sect. 4.4.2, and S = R⁴ × {Black, Red}. We assume the distribution on target color is independent of the position and velocity distribution. For j = 1, 2, let

p_j(x, v) for (x, v) ∈ R⁴ be the prior on the kinematic state of target j, and
q_j(c) for c ∈ {Black, Red} be the prior on color for target j.

Then the prior on target j is

p_j(x, v) q_j(c) for (x, v, c) ∈ S = R⁴ × {Black, Red}.   (4.9)
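In a particle implementation, the prior (4.9) amounts to pairing kinematic samples from p_j with independent color samples from q_j, and the confusion matrix in Table 4.1 becomes the likelihood (4.8). A minimal Python sketch of this (the names are ours, and γ = δ = 0.9 as in the example below):

import numpy as np

COLORS = ("Black", "Red")

def color_likelihood(meas_color, true_color, gamma=0.9, delta=0.9):
    # Likelihood L_c(measurement | target color) from Table 4.1, as in (4.8).
    if true_color == "Black":
        return gamma if meas_color == "Black" else 1.0 - gamma
    return delta if meas_color == "Red" else 1.0 - delta

# Prior (4.9): kinematic prior times an independent color prior q_j = (0.5, 0.5).
rng = np.random.default_rng(0)
n = 10_000
kinematic = rng.normal(size=(n, 4))                 # stand-in for samples from p_j
colors = rng.choice(COLORS, size=n, p=[0.5, 0.5])   # independent color samples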
We now extend the example in Sect. 4.4.2 to add color to the target state and add a color sensor. Since the color of a target does not change, we do not specify a dynamic model for the colors. Target 1 is red and target 2 is black, but the filter does not know this. We set the prior probabilities to be 0.5 for each color and each target. Another change is in the target tracks. Figure 4.8 shows the track of target 1 in red and target 2 in black. Notice that the targets change velocities at the time of the cross in such a way that target 1 takes target 2's original velocity after the cross and vice versa. The accumulated velocity information at the time of the cross makes it difficult for the tracker to follow the targets after the cross. This is where the feature information will help the tracker.
Fig. 4.8 Initial position and color distributions for targets 1 and 2
The measurement at time t is both a position and a color measurement, i.e., Y_t = (x_t, c_t), and the likelihood functions for the measurements are conditionally independent. Thus the likelihood function for the joint measurement is

L((x_t, c_t) | (x, v, c)) = η(x_t, x, Σ_t) L_c(c_t | c) for (x, v, c) ∈ S   (4.10)

where Σ_t is the covariance matrix of the measurement, obtained as described in Sect. 4.4.2, and η(·, μ, Σ) is the density function for a Gaussian distribution with mean μ and covariance matrix Σ. We applied the simplified JPDA particle filter to this problem and used the same motion model assumption as in Sect. 4.4.2. The particles have states in S, so each particle has a color assigned at time 0, with each particle having probability 0.5 of being black or red. The confusion matrix of the color sensor is specified by Table 4.1 with γ = δ = 0.9. When a measurement is received, the association probabilities for targets 1 and 2 are calculated by (4.1) and (4.2) using the likelihood function in (4.10). Particle weights are also updated using this likelihood function. Figure 4.8 shows the initial position and color distributions for targets 1 and 2. We use × for target 1 particles and ◦ for target 2 particles. The particle symbols are colored according to the color state of the particle. The particle symbols in Fig. 4.8 show a mixture of colors for each target, reflecting the prior on color. In this figure and subsequent ones, we show a random sample of 100 of the 10,000 particles used in the particle filter. Time is given in hours, and there is one observation every 0.1 h. Observations alternate between target 1 and target 2, but, as in Sect. 4.4.2, the tracker does not know this.
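One plausible implementation of the particle weight update using the joint likelihood (4.10) is sketched below; it is our sketch, not the authors' code, and it assumes particles carrying kinematic states, per-particle color states, and a known measurement covariance Σ_t.

import numpy as np
from scipy.stats import multivariate_normal

def update_weights(particles, colors, weights, z_pos, z_color, sigma_t,
                   gamma=0.9, delta=0.9):
    """Multiply particle weights by the joint likelihood (4.10) and renormalize.

    particles : (N, 4) kinematic states (x1, x2, v1, v2)
    colors    : length-N array of 'Black'/'Red' particle color states
    z_pos     : 2-vector position measurement
    z_color   : 'Black' or 'Red' color measurement
    sigma_t   : 2x2 position measurement error covariance
    """
    # Kinematic factor eta(z_pos, x, Sigma_t), evaluated at each particle position.
    kin = multivariate_normal(mean=z_pos, cov=sigma_t).pdf(particles[:, :2])
    # Color factor from the confusion matrix (4.8).
    lc_if_black = gamma if z_color == "Black" else 1.0 - gamma   # L_c(z_color | Black)
    lc_if_red = delta if z_color == "Red" else 1.0 - delta       # L_c(z_color | Red)
    col = np.where(colors == "Black", lc_if_black, lc_if_red)
    w = weights * kin * col
    return w / w.sum()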
The left-hand side of Fig. 4.9 shows the posterior distribution on the target position and color after one observation, which is on target 1. The observation has reduced the position uncertainty for target 1, but it has not done much to identify the colors of the targets. With the second observation at time 0.2 h, the right-hand side of Fig. 4.9 shows the tracker beginning to obtain good estimates of position and color. By time 0.4 h, the filter has received two more observations, one on each target, and the tracker has obtained good position and color estimates for both targets, as shown in the left-hand side of Fig. 4.10. The right-hand side of Fig. 4.10 shows the posterior position and color distributions shortly before the cross.
Fig. 4.9 Posterior on position and color at time 0.1 h (left) and 0.2 h (right)
Fig. 4.10 Posterior on target position and color at time 0.4 h (left) and at 0.7 h (right)
The left-hand side of Fig. 4.11 shows the posterior position and color distributions at the time of the cross. Even though the targets are virtually on top of one another, they retain their color identities. The right-hand side shows the posterior position and color distributions one measurement after the cross. The distribution on target 1 is beginning to follow the new velocity resulting from the bounce at 1.0 h. The left-hand side of Fig. 4.12 shows the posteriors on position and color at time 1.2 h, when the tracker has received one observation on each target after the cross. At this point, we can see that the tracker has successfully followed the targets through the bounce at 1.0 h. The right-hand side shows the posteriors at 2.0 h, when the targets are well separated. The color measurements enabled the tracker to correctly follow these targets through the bounce. If the tracker had to depend solely on the velocity distributions, as in the example in Sect. 4.4.2, it would have mistakenly concluded that the targets continued at the same velocities at which they entered the bounce. Even with a high probability of the sensor correctly identifying the color of a target, e.g., 0.9 in this example, there will be times when the tracker fails to follow the targets.

Effect of Measurement Noise on Feature-Aided Tracking. As one would expect, the noisier the measurement of the feature, the less it helps in associating measurements to targets. To examine this effect, we considered the example above but allowed the value of γ = δ to vary. For each value of γ = δ, we made 100 runs of the tracker using independent values of the random variables involved. For each run, we observed whether the tracker successfully followed the targets through the cross. The results are shown in Fig. 4.13. One can see that for γ = δ ≤ 0.65, the success rate is 0. As γ = δ increases from 0.65, the success rate climbs rapidly to 1 at 0.95. For γ = δ = 0.90, as in the example above, there is a 10% chance of failure.
Fig. 4.11 Posterior on position and color at 1.0 h, the time of the cross (left), and the posteriors one measurement after the cross at time 1.1 h (right)
Fig. 4.12 Posterior on position and color at time 1.2 h when the tracker has received observations on both targets after the cross (left) and the posteriors at time 2.0 h (right)
This is because the abrupt change in target velocities at the cross is difficult for the tracker to identify even with strong feature information.

Adding a color feature and sensor to the crossing targets example in Sect. 4.4.2. Adding the color feature and sensor in the above example to the crossing targets example produced improved association estimates and better localization and velocity estimates.
Fig. 4.13 Rate of successfully following the two targets through the cross as a function of γ = δ.
Figure 4.14 shows the position distributions at time 1.0 h in the crossing targets example with a color sensor (left) and without a color sensor (right). Figure 4.15 shows the velocity distributions at time 1.0 h in the crossing targets example with a color sensor (left) and without a color sensor (right). For target 1, one can see an improvement in the position distribution and a slight improvement in the velocity distribution when using a color sensor. Figure 4.16 compares the association probabilities for the crossing targets example with and without the color sensor. The reader can see that association performance improves substantially with the color sensor.
Fig. 4.14 Position distributions at time 1.0 h in the crossing targets example with a color sensor (left) and without a color sensor (right)
Fig. 4.15 Velocity distributions at time 1.0 h in the crossing targets example with a color sensor (left) and without a color sensor (right)
Fig. 4.16 The left-hand figure shows the association probabilities for target 1 with the color sensor. The figure on the right repeats Fig. 4.7, which shows the association probabilities without the color sensor. The ×'s denote association probabilities for target 1 for measurements generated by target 1, and the ◦'s denote association probabilities for target 1 for measurements generated by target 2
4.5 More Complex Multiple Target Tracking Problems

In classical multiple target tracking, the problem is divided into two steps: association and estimation. Step 1 associates measurements with targets. Step 2 uses the measurements associated with each target to produce an estimate of that target's state. Complications arise when there is more than one reasonable way to associate measurements with targets. Multiple Hypothesis Tracking (MHT) algorithms approach this problem by forming association hypotheses to explain the source of the measurements. Each hypothesis assigns the measurements to targets or false measurements. For each association hypothesis, MHT computes the probability that the hypothesis is correct and the conditional probability distribution on the joint target state given the hypothesis is correct. The Bayesian posterior is a mixture of the conditional joint target state distributions weighted by their association probabilities. This is the MHT decomposition of the multiple target tracking problem. Each element (hypothesis) of an MHT decomposition specifies which measurements are associated to which targets and which are associated to false measurements. Conditioned on one of these hypotheses, the multiple target tracking problem becomes much more tractable. Usually, it becomes a set of n independent single target tracking problems, where n is the number of targets specified by the hypothesis. The MHT decomposition transforms a difficult and daunting multiple target tracking problem into a set of problems that we know how to solve. This was the key insight of Reid (1979), who developed the first version of MHT. Since then, many versions of the algorithm have been developed; see Chong et al. (2019). A set of measurements at time t is a scan if each measurement is generated by at most one target and each target generates at most one measurement. Stone (2019) showed that if measurements arrive in scans, then MHT is the exact Bayesian solution to the multiple target tracking problem under very general conditions. However,
this solution quickly becomes computationally infeasible because the number of hypotheses grows exponentially with the number of measurements and targets. To deal with this problem, all practical MHT algorithms make approximations, such as limiting the number of hypotheses considered. In addition, most MHT trackers display only the target distributions resulting from the highest probability association hypothesis. JPDA makes the approximation that the set of posterior target distributions computed from the soft associations it employs is a sufficient statistic for computing multiple target distributions in the future; one does not have to remember what measurements were received in the past. In addition, JPDA assumes the number of targets is given. Probabilistic Multiple Hypothesis Tracking (PMHT), developed by Streit and Luginbuhl (1993, 1994, 1995), relaxes the assumption that a target generates at most one measurement in a scan and computes only the maximum a posteriori probability set of target tracks to approximate the solution to the multiple target tracking problem. PMHT also assumes the number of targets is given. Chapter 4 of Stone et al. (2014) provides more detailed descriptions of these methods for those who wish to delve further into this subject.
References

Bar-Shalom, Y., Fortmann, T.E.: Tracking and Data Association. Academic Press (1988)
Chong, C.Y., Mori, S., Reid, D.B.: Forty years of multiple hypothesis tracking—a review of key developments. J. Adv. Inf. Fusion 14(2), 131–153 (2019)
Reid, D.B.: An algorithm for tracking multiple targets. IEEE Trans. Autom. Control AC-24(6), 843–854 (1979)
Streit, R.L., Luginbuhl, T.E.: A probabilistic multi-hypothesis tracking algorithm without enumeration and pruning. In: Proceedings of the Sixth Joint Service Data Fusion Symposium, Laurel, Maryland, 14–18 June, pp. 1015–1024 (1993)
Streit, R.L., Luginbuhl, T.E.: Maximum likelihood method for probabilistic multi-hypothesis tracking. In: Proceedings: Signal and Data Processing of Small Targets, vol. 2235, pp. 394–405. SPIE International Symposium, Orlando, Florida, 5–7 Apr (1994)
Streit, R.L., Luginbuhl, T.E.: Probabilistic multi-hypothesis tracking. Technical Report 10428, Naval Undersea Warfare Center, Newport, RI (1995)
Stone, L.D.: Conditions for multiple hypothesis tracking to be an exact Bayesian solution to the multiple target tracking problem. J. Adv. Inf. Fusion 14(2), 167–175 (2019)
Stone, L.D., Streit, R.L., Corwin, T.L., Bell, K.L.: Bayesian Multiple Target Tracking, 2nd edn. Artech House, Boston (2014)
Chapter 5
Intensity Filters
5.1 Introduction

Chapter 4 introduces the reader to multiple target tracking in situations in which the observations are in the form of contacts that are received in scans. In this case the main obstacle to successful tracking is associating contacts with targets. Chapter 4 briefly discusses multiple hypothesis tracking (MHT), which is the theoretically correct Bayesian solution to the tracking problem in this situation. MHT attempts to estimate the multitarget state, i.e., the number and state of each target present. However, computational considerations require approximations. A typical approximation limits the number of association hypotheses that MHT carries. Joint probabilistic data association (JPDA) and probabilistic multiple hypothesis tracking (PMHT) represent other approximations or variations on MHT. Each of these deals explicitly with the problem of associating contacts to targets and producing a target state estimate for each target. By contrast, intensity filters seek to estimate the number of targets per unit state space. Estimating the number of targets that are present in a surveillance region is a difficult problem that involves more than merely counting the number of sensor measurements in a scan. Not only can sensors fail to detect targets that are present, but spurious clutter measurements (false alarms) can be superimposed with target-originated measurements. Moreover, targets can leave the surveillance region, and new ones can arrive at any time without notice. In contrast to the target motion models used in previous chapters, intensity filters assume a discrete time Markov process motion model for the targets and assume that all targets have the same motion model and detection characteristics. The latter assumption is more restrictive than the motion model assumptions for MHT, JPDA, and PMHT. Estimating the number of targets per unit state space is very closely related to estimating the distribution of multitarget state; however, they are phenomenologically different problems. Intensity filters do not explicitly compute associations (or association probabilities) of contacts to targets, nor do they estimate target tracks. Instead, they compute intensity functions that estimate the density (in target state space) of the number of targets.
Chapter 7 of Stone et al. (2014) shows how to combine multitarget intensity filtering with likelihood ratio tracking to estimate the number of targets present and produce track estimates for the identified targets. We use finite point processes to develop intensity filters and, in keeping with the approach of this book, take a Bayesian point of view. Throughout this chapter we use the finite point process terminology that is pervasive in both the mathematical and telecommunication literatures. This chapter is adapted from Chap. 5 of Stone et al. (2014).

Organization of the Chapter. Section 5.2 defines nonhomogeneous Poisson point processes (PPPs). PPPs are fully characterized by their intensity functions, a property that makes them useful as approximations of more general finite point processes. They are used in this chapter to approximate the intensity of the Bayes posterior process, which, as we will see, is a finite point process that is not Poisson. Transformations of PPPs are employed to model target motion, detections, measurements, and births and deaths of targets. Further discussion of PPPs and their applications can be found in Streit (2010). Section 5.3 derives the intensity filter (iFilter) using Bayesian analysis and basic properties of PPPs. The derivation is very much in the spirit of Streit and Stone (2008) and Chap. 4 of Stone et al. (2014); that is, the measurement-to-target assignments are treated explicitly in the derivation of the algorithm (but do not appear in the iFilter algorithm itself). The derivation is applied to the traditional single target state space S augmented with a null point φ that facilitates Markovian modeling of target birth and death. The Bayes posterior is seen to be a finite point process, but not a PPP. However, we can compute the intensity of the Bayes posterior process explicitly. To incorporate new sensor scan information in a recursive manner, we approximate the Bayes posterior point process on S⁺ = S ∪ φ by a PPP whose intensity function is equal to the intensity of the posterior process. This approximation closes the Bayesian recursion. Note that this approximation is applied at every scan, so the iFilter intensity at scan n is the intensity of the n-scan Bayesian posterior process only for the first scan. Section 5.4 presents an example of the use of an iFilter to detect and track multiple targets that appear and disappear at different times during the time frame of the example. Section 5.5 gives some notes on topics not included in this chapter, as well as some background material on methods and applications.
5.2 Point Process Model of Multitarget State

The simplest and perhaps most natural way to define multitarget state is to say that it comprises the number of targets and a state (e.g., position and velocity) for each of the individual targets. One way to proceed from this definition is to assume a maximum number N < ∞ of targets and then define the multitarget state to be the vector whose N components are the state vectors of the individual targets, with the state φ indicating target not present.
This is the approach taken in Chap. 4 of Stone et al. (2014). Under the conditions of the independence theorem in Sect. 4.3 of Stone et al. (2014), the conditional posterior distribution on multitarget state, conditioned on an association hypothesis, factors into a product of posteriors, one for each target. Generally, the unconditioned Bayes posterior does not factor, and it is necessary to evaluate it on the full state space. The methods of Chap. 4 of Stone et al. (2014) are designed for this problem. For intensity filters, the notion of multitarget state is weakened in the following way. The multitarget state contains a state for each target, but knowledge of which state belongs to which target is not retained. Thus, it is a less informative model than the one described above. The change means that the multitarget state is no longer a vector of target states but an ordered pair of the form

(n, {x₁, …, x_n})   (5.1)

where n is the number of targets and {x₁, …, x_n} ∈ E(S), the set of all finite lists of points, or target states, in S. (Lists can have repeated elements; see Sect. 5.2.1.) Permuting the order of the x's does not change the meaning of the multitarget state in (5.1), but it does change the meaning of the multitarget state vector. Statisticians would say that targets are statistically identifiable in the multitarget vector but not in (5.1). Using the state space E(S) changes the mathematical character of the random variables used to model the multitarget process. With a multitarget vector, the multitarget process has outcomes that are vectors in (S ∪ φ)^N. In contrast, with state defined by (5.1) the multitarget process has outcomes in E(S). A finite point process is a random variable whose outcomes are in the set of all finite lists of points in a given set. The advantage of viewing finite point processes as random variables is that we benefit immediately from the rich Bayesian theory that is available in the open literature. Various summary statistics for these processes have long been known and used in practice, and they can be adapted for use in multitarget tracking.

Multitarget Process. In this chapter we shall use PPPs as models for the multitarget process. Sections 5.2.1–5.2.3 define PPPs and provide some basic properties. In the spirit of Chaps. 2–4, we specify a target motion model and a sensor measurement model to complete the definition of the multitarget process. The motion and measurement processes are defined in Sects. 5.2.4 and 5.2.5. Thinning is considered in Sect. 5.2.6; thinning will be used to model target death. Section 5.2.7 discusses the extended, or augmented, state space used to derive the intensity filter.
5.2.1 Basic Properties of PPPs

Let Ξ be the random variable whose values are realizations of a PPP. Every realization Ξ = ξ of a PPP on a bounded set R is comprised of a number n ≥ 0 and n locations {x₁, …, x_n} of points in R. The realization is denoted by the ordered pair ξ = (n, {x₁, …, x_n}). The set notation signifies that the ordering of the points x_i is irrelevant. However, the points are not necessarily distinct. It is better to think of {x₁, …, x_n} as a list. From the definition below, it will be clear that for a continuous space S, the list has no repeated elements with probability one. However, when S has discrete elements, the list can contain multiple copies of the discrete elements. Including n in the realization ξ makes expectations easier to define and manipulate. If n = 0, then ξ = (0, ∅), where ∅ is the empty set. The event space of the PPP is

E(R) = {(0, ∅)} ∪ ⋃_{n=1}^{∞} { (n, {x₁, …, x_n}) : x_j ∈ R, j = 1, …, n }.   (5.2)
This is also the event space of the Bayes posterior point process that is derived later in the chapter. Every PPP on S is characterized by a function λ(s), s ∈ S, called the intensity. If λ(s) = c for all s, where c ≥ 0 is a constant, the PPP is said to be homogeneous; otherwise, it is nonhomogeneous. It is assumed that λ(s) ≥ 0 for all s and, for continuous spaces S,

0 ≤ ∫_R λ(s) ds < ∞   (5.3)
for all bounded subsets R of S, that is, subsets that are contained in some finite radius sphere. The intensity λ(s) need not be a continuous function (e.g., it can have step discontinuities). Bounded subsets are “windows” in which PPP realizations are observed. By stipulating a window, we avoid issues with infinite sets. For example, realizations of homogeneous PPPs on S = R^m have an infinite number of points but only a finite number in any window. The intensity function can be defined when a countable number of discrete elements are joined to S, and when the space comprises only a countable number of discrete elements (i.e., it has no continuous component). For further discussion of these cases, see Sect. 5.2.7.

Definition. A finite point process with intensity λ(s), s ∈ S, is a PPP if and only if its realizations ξ on bounded subsets R are generated (simulated) by the following procedure. If ∫_R λ(s) ds = 0, then ξ = (0, ∅). If ∫_R λ(s) ds > 0, the realization ξ is obtained as follows:

• Step 1. The number n of points is a sample from the discrete Poisson variate N with

p_N(n) ≡ Pr{N = n} = exp(−∫_R λ(s) ds) (∫_R λ(s) ds)ⁿ / n!   (5.4)
The mean number of points is E[N] = ∫_R λ(s) ds.

• Step 2. The n points x_j ∈ R, j = 1, …, n, are obtained as i.i.d. samples of a random variable X on R with probability distribution function (pdf)

p_X(s) = λ(s) / ∫_R λ(s) ds for s ∈ R.   (5.5)
The output of Step 2 is the ordered n-tuple ξ_o = (n, (x₁, …, x_n)).

• Step 3. The realization of the PPP is ξ = (n, {x₁, …, x_n}); that is, the order in which the points were generated is “forgotten.”

Note that the list {x₁, …, x_n} is a set of distinct elements with probability one if the intensity is an ordinary function and the space S is continuous (i.e., there are no point masses). If the space S is discrete, or like the augmented space S⁺ has discrete components, then the list {x₁, …, x_n} can have repeated elements.
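Steps 1–3 translate directly into a simulation. The sketch below (ours, not the book's code) draws one realization of a nonhomogeneous PPP on a rectangular window; Step 2 uses rejection sampling under an assumed bound lam_max on the intensity, and the window integral in Step 1 is estimated by Monte Carlo.

import numpy as np

rng = np.random.default_rng(1)

def simulate_ppp(intensity, lam_max, xmin, xmax, ymin, ymax):
    """Draw one PPP realization on the window [xmin, xmax] x [ymin, ymax]."""
    area = (xmax - xmin) * (ymax - ymin)
    # Step 1: n ~ Poisson(integral of the intensity over the window),
    # with the integral estimated by Monte Carlo.
    u = rng.uniform([xmin, ymin], [xmax, ymax], size=(100_000, 2))
    n = rng.poisson(area * intensity(u[:, 0], u[:, 1]).mean())
    # Step 2: n i.i.d. points from the normalized intensity, via rejection.
    pts = []
    while len(pts) < n:
        x, y = rng.uniform([xmin, ymin], [xmax, ymax])
        if rng.uniform() * lam_max < intensity(x, y):
            pts.append((x, y))
    # Step 3: return the points as an unordered list.
    return n, pts

# Example: intensity peaked at the origin, bounded above by 5.
n, pts = simulate_ppp(lambda x, y: 5.0 * np.exp(-(x**2 + y**2)), 5.0, -3, 3, -3, 3)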
5.2.2 Probability Distribution Function for a PPP

The generation procedure above defines the pdf for the random variable corresponding to the PPP. The pdf p_Ξ(ξ) of the unordered realization ξ is

p_Ξ(ξ) = p_N(n) Pr{{x₁, …, x_n} | N = n}
       = exp(−∫_R λ(s) ds) ((∫_R λ(s) ds)ⁿ / n!) n! ∏_{j=1}^{n} (λ(x_j) / ∫_R λ(s) ds)
       = exp(−∫_R λ(s) ds) ∏_{j=1}^{n} λ(x_j) for n ≥ 0.   (5.6)
The n! in the numerator of (5.6) arises from the fact that there are n! equally likely ordered i.i.d. trials that can generate the list {x₁, …, x_n}. For the case n = 0 in (5.6), we follow the usual convention that a product from j = 1 to 0 is equal to one. The pdf of Ξ is parameterized by the intensity function λ. The expression (5.6) is used in estimation problems involving data sets for which data order is irrelevant. The pdf of the ordered realization ξ_o is p_Ξ(ξ)/n!. Several useful and important properties of PPPs can be derived directly from the probability structure just defined. These will be presented below. Further discussions of the properties of PPPs are readily available (e.g., Streit (2010) and Kingman (1993)).
5.2.3 Superposition of Point Processes

PPPs are superimposed, or summed, if their realizations are combined into one event. Let Ξ and Υ denote two PPPs on S, and let their intensities be λ(s) and ν(s). If (n, {x₁, …, x_n}) and (m, {y₁, …, y_m}) are realizations of Ξ and Υ, respectively, then the combined event is (n + m, {x₁, …, x_n, y₁, …, y_m}). The sum Ξ + Υ is a point process on S, but it is not necessarily a PPP unless Ξ and Υ are independent. Knowledge of which points originate from which realization is assumed lost, and this fact is crucial to showing that the probability of the combined event is equal to that of a realization of this combined event for a PPP with intensity λ(s) + ν(s). More generally, the superposition of any number of independent PPPs is a PPP, and its intensity is the sum of the intensities of the constituent PPPs. See Sect. 2.5 of Streit (2010).
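A quick numerical check of this property (our sketch, assuming two independent homogeneous PPPs on a window): concatenating realizations and forgetting their origins produces counts whose mean matches the summed intensity.

import numpy as np

rng = np.random.default_rng(2)
area = 4.0                    # window [0, 2] x [0, 2]
lam1, lam2 = 3.0, 5.0         # constant intensities of the two PPPs

def homogeneous_ppp(lam):
    n = rng.poisson(lam * area)
    return rng.uniform(0.0, 2.0, size=(n, 2))

# Superpose by concatenating realizations; the origin labels are "forgotten".
counts = [len(np.vstack([homogeneous_ppp(lam1), homogeneous_ppp(lam2)]))
          for _ in range(10_000)]
print(np.mean(counts))        # close to (lam1 + lam2) * area = 32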
5.2.4 Target Motion Process

Targets move according to a Markov process. A PPP whose points undergo Markov motion is still a PPP. Let q be the transition function for the Markov motion process, so that the probability that the point x ∈ S transitions to x′ ∈ S is q(x′|x). If ξ = (n, {x₁, …, x_n}) is a realization of the PPP Ξ with intensity λ(x), the realization after the Markov transition is η ≡ (n, {x′₁, …, x′_n}) ∈ E(S), where x′_j is a realization of the pdf q(·|x_j), j = 1, …, n. The transformed process, denoted q(Ξ), is a PPP on S with intensity

ν(x′) = ∫_S q(x′|x) λ(x) dx.   (5.7)
A proof of this result is given in Sect. 5.2.8 of Stone et al. (2014).
5.2.5 Sensor Measurement Process

The sensor measurement process maps a point x ∈ S to a point (measurement) y ∈ T. The measurement space T can be different from S. Let Ξ be a target PPP on S. If l(·|x) denotes the measurement likelihood function for a measurement originating at x ∈ S, then the measurement process is a PPP and

υ(y) = ∫_S l(y|x) λ(x) dx, y ∈ T   (5.8)
is its intensity function. A proof of this result is given in Sect. 5.2.8 of Stone et al. (2014).
5.2.6 Thinning a Process

Thinning a point process by probabilistically reducing the number of points in the realizations produces another point process. Let Ξ be a point process on S. For every x ∈ S, let 1 − α(x), 0 ≤ α(x) ≤ 1, be the probability that the point x is removed from any realization that contains it. For the realization ξ = (n, {x₁, …, x_n}), the point x_j is retained with probability α(x_j) and dropped with probability 1 − α(x_j). Thinning of the points is independent. The thinned realization is ξ_α = (m, {x₁, …, x_m}), where m ≤ n is the number of points that survive the thinning. Knowledge of the number n of points in ξ is assumed lost in the thinned process. This is called Bernoulli thinning because the thinned process is the result of a sequence of Bernoulli trials on the points in the realization ξ. If Ξ is a PPP, then the thinned process, denoted by Ξ_α, is a PPP and

λ_α(x) = α(x) λ(x)   (5.9)
is its intensity. In addition, the points removed from the realizations of Ξ comprise a PPP, Ξ_{1−α}, whose intensity is λ_{1−α}(x) = (1 − α(x)) λ(x). A proof of this result is given in Sect. 5.2.8 of Stone et al. (2014). It is perhaps surprising that for PPPs the thinned processes Ξ_α and Ξ_{1−α} are independent. The coloring theorem (Chap. 6 of Kingman (1993)) generalizes this result to multiple thinned processes: if a point x in a realization of the PPP Ξ is colored (i.e., labeled) with one of r colors according to a discrete random variable with outcome probabilities p₁(x), …, p_r(x) that sum to one, and the coloring of x is independent of the coloring of x′ for x′ ≠ x, then the point process Ξ_j corresponding to the j-th color is a PPP with intensity λ_j(x) = p_j(x) λ(x); moreover, these PPPs are independent.
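Bernoulli thinning is a one-line operation on a realization. In this sketch (ours), the retention probability α may vary with the point, and the retained and removed lists realize the two independent PPPs Ξ_α and Ξ_{1−α}.

import numpy as np

rng = np.random.default_rng(3)

def bernoulli_thin(points, alpha):
    """Split one realization into retained and removed lists.

    alpha : function giving the retention probability alpha(x) for a point x.
    If the input realizes a PPP with intensity lam(x), the outputs realize
    independent PPPs with intensities alpha(x)*lam(x) and (1-alpha(x))*lam(x).
    """
    keep = rng.uniform(size=len(points)) < np.array([alpha(x) for x in points])
    return points[keep], points[~keep]

pts = rng.uniform(-1.0, 1.0, size=(1000, 2))        # a realization of some PPP
kept, dropped = bernoulli_thin(pts, lambda x: 0.7)  # constant retention probability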
5.2.7 Augmented Spaces

Straightforward modifications are needed for PPPs on the augmented space S⁺ = S ∪ φ. The intensity is defined for all s ∈ S⁺. The intensity λ(φ) is dimensionless, unlike the values of λ(s) for s ∈ S. The bounded sets of S⁺ are R and R⁺ ≡ R ∪ φ, where R is a bounded subset of S. Integrals of λ(s) over bounded subsets of S⁺ must be finite; thus, the requirement (5.3) holds and is supplemented with the integral over the discrete–continuous space R⁺:

0 ≤ ∫_{R⁺} λ(s) ds ≡ λ(φ) + ∫_R λ(s) ds < ∞.   (5.10)
The event space is E(R⁺). The event space E(R) is a proper subset of E(R⁺). The pdf of this PPP is unchanged, except that the integrals are over either R or R⁺, as the case may be. Superposition and thinning are unchanged. The intensity functions of the target motion and sensor measurement processes are also unchanged, except that the integrals are over S⁺. Other cases are treated similarly. When S is augmented with a countable number of discrete elements, say {φ_i}, with corresponding intensities {λ(φ_i)}, the term λ(φ) in (5.10) is replaced by the sum ∑_i λ(φ_i). When the space comprises only discrete elements, the integral in (5.10) is omitted.
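In a particle implementation, an intensity on S⁺ is conveniently stored as a scalar mass λ(φ) plus per-particle intensities representing the continuous part; the discrete–continuous integral in (5.10) then reduces to a sum. A small sketch (ours, with values chosen only for illustration):

import numpy as np

# Intensity on the augmented space S+ = S ∪ φ, stored as a scalar mass at φ
# plus per-particle intensities representing the continuous part on S.
lam_phi = 2.0                                        # λ(φ), dimensionless
particle_intensity = np.full(120_000, 1 / 120_000)   # discretized λ(s) on S

# The integral over R+ in (5.10): λ(φ) plus the integral over R, which the
# particle representation approximates by a sum.
total = lam_phi + particle_intensity.sum()           # expected number of points on S+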
5.3 iFilter

In this section we show that the multitarget intensity filter (iFilter) can be understood in elementary terms. We present a self-contained derivation from Bayesian principles, starting from the initial assumption that the multitarget process at the previous time step is a PPP with known intensity function. The Bayes information-updated multitarget process is another point process which is not, in general, a PPP. However, we show that all the single-target marginal distributions of the posterior are identical. This leads to a PPP approximation of the Bayes posterior process, and this approximation closes the Bayes recursion—we began the recursion with a PPP model of the multitarget process, and we end with a model of the same mathematical form. The derivation assumes familiarity with single target Bayesian filtering and with the properties of PPPs at the elementary level discussed in the previous section. The augmented state space S⁺ plays a key role in the combinatorics of the measurement-to-target assignments. On S the intensity filter reduces to the PHD intensity filter, as discussed in Sect. 5.3 of Stone et al. (2014). The methods used in this section reveal the Bayesian character of the filtering, as well as the combinatorial modeling of the assignment problem.
5.3.1 Augmented State Space Modeling

The auxiliary point φ needs interpretation in tracking applications. For single target tracking, it represents the “target absent” hypothesis. This interpretation works because of the assumed dichotomy—a single target is either present or not. The interpretation of φ in the multitarget setting is more complex because there are time-varying numbers of targets, each of which may or may not produce a measurement, and there are spurious clutter measurements. Clutter measurements can be produced by random fluctuations in the environment or in the sensor itself.
More often, clutter measurements arise from physical phenomena (such as objects on the ocean bottom in the case of active acoustic sensors) that produce responses that are difficult to distinguish from that of a real target. Over time, clutter measurements tend to be kinematically inconsistent with target motion. By contrast, (real) targets in S generate measurement sequences that are consistent with target motion over time. For multitarget intensity filters we expand the role of φ to include the generation of clutter measurements. We do this by specifying an expected number λ(φ) of clutter “targets.” Each clutter target has probability P^D(φ) of generating a clutter measurement. The clutter measurements are assumed to be a PPP on the measurement space with P^D(φ)λ(φ) being the expected number of clutter measurements. This is an endogenous model of clutter. It is more common in the literature to superimpose spurious clutter data from an unmodeled (exogenous) source upon the target-originated measurements. While a priori there is no obvious reason to prefer one clutter model over the other (they are both somewhat artificial), the model we have chosen has the virtue of allowing us to compute the distribution of the Bayes posterior point process in a simple and straightforward fashion.

The motion model on the augmented space S⁺ is defined by the transition function q(x|x′), x, x′ ∈ S⁺. For x ∈ S, the quantities q(x|φ) and q(φ|x) have natural interpretations as the probabilities of target birth and death, respectively. The transition function must be defined so that

∫_{S⁺} q(x|x′) dx = q(φ|x′) + ∫_S q(x|x′) dx = 1 for x′ ∈ S⁺   (5.11)

where the integral over the discrete–continuous space S⁺ is the one used in (5.10). The sensor measurement likelihood function must also be defined on S⁺. For x ∈ S, l(y|x) for y ∈ T is the usual measurement likelihood function. For x = φ, l(y|φ) is defined to be the probability that a clutter measurement has the value y.
5.3.2 Predicted Detected and Undetected Target Processes

Denote the multitarget point process at the previous time step by Ξ, the motion-updated multitarget process by Ξ⁻, and the Bayesian information-updated process by Ξ⁺. These processes are defined on S⁺. For the filters considered here, Ξ is assumed to be a PPP. Denote its intensity function by f(·). Target motion from the previous time step to the current one is assumed to be Markovian. For x ∈ S⁺, let d(x) denote the probability that a target at x does not survive the transition to the next time step. It is a thinning probability and does not need to integrate to one. Given the transition model q(x|x′) on the augmented space S⁺, the motion-updated process Ξ⁻ is a PPP on S⁺ and its intensity is

f⁻(x) = ∫_{S⁺} q(x|x′) (1 − d(x′)) f(x′) dx′ for x ∈ S⁺.   (5.12)
This expression is easily understood. Target survival is an independent thinning process with survival probability 1 − d(x). Using (5.9) we see that the surviving target process is a PPP with intensity (1 − d(x)) f(x). Target motion is a transformation of the form (5.7), so the motion-updated surviving multitarget process is a PPP with intensity (5.12).

The motion-updated target process is split by the detection process into two processes—detected and undetected. Let P^D(x) denote the probability of detecting a target located at x. The detected target process, denoted by Ξ⁻_D, is the motion-updated target process after thinning by P^D(x), so it is a PPP. Using (5.9) we see that its intensity is

f⁻_D(x) = P^D(x) f⁻(x) for x ∈ S⁺.   (5.13)

The undetected target process is denoted by Ξ⁻_U; its intensity is

f⁻_U(x) = (1 − P^D(x)) f⁻(x) for x ∈ S⁺.   (5.14)
Both are PPPs, so by the coloring theorem (with two colors), the detected and undetected target processes are independent.
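The prediction (5.12) and the split (5.13)–(5.14) have a simple particle realization: survival thinning scales the intensities, the Markov transition moves the states, and the detection probability splits the result. The following Python sketch is ours; the function names and the constant rates in the example are assumptions, not the book's code.

import numpy as np

rng = np.random.default_rng(4)

def predict(states, intensities, d, move, pd):
    """One prediction step for a particle-represented intensity.

    states      : (N, dim) particle states
    intensities : (N,) particle intensities representing f
    d           : function giving the death probability d(x) per particle
    move        : function drawing x' ~ q(.|x) for each particle
    pd          : function giving the detection probability P^D(x)
    """
    f_minus = (1.0 - d(states)) * intensities     # survival thinning, as in (5.12)
    new_states = move(states)                     # Markov transition q(.|x)
    f_det = pd(new_states) * f_minus              # detected intensity, (5.13)
    f_und = (1.0 - pd(new_states)) * f_minus      # undetected intensity, (5.14)
    return new_states, f_det, f_und

# Example with constant death/detection rates and random-walk motion.
s = rng.normal(size=(1000, 4))
w = np.full(1000, 1 / 1000)
s2, f_det, f_und = predict(
    s, w,
    d=lambda x: np.full(len(x), 0.02),
    move=lambda x: x + 0.1 * rng.normal(size=x.shape),
    pd=lambda x: np.full(len(x), 0.95))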
5.3.3 Measurement Process

The measurement process, denoted by Υ, is a PPP on T. The sensor likelihood function is l(y|x). Detected targets are assumed to generate exactly one measurement. The intensity function for the measurement process is

λ(y) = ∫_{S⁺} l(y|x) P^D(x) f⁻(x) dx = λ_φ(y) + ∫_S l(y|x) P^D(x) f⁻(x) dx   (5.15)

where the clutter measurement intensity is λ_φ(y) = l(y|φ) P^D(φ) f⁻(φ). Since

∫_T l(y|x) dy = 1 for x ∈ S⁺,   (5.16)
the expected number of measurements is equal to the expected number of detected targets; that is,

∫_T λ(y) dy = ∫_{S⁺} P^D(x) f⁻(x) dx = ∫_{S⁺} f⁻_D(x) dx.   (5.17)
This result follows from (5.13) and (5.15).
5.3.4 Bayes Posterior Point Process (Information Update)

Let υ_o = (m, (y₁, …, y_m)) denote an ordered realization of the measurement process Υ at the current scan, where m > 0. (The special case m = 0 is discussed in Sect. 5.3.7.) Let ξ_o = (n, (x₁, …, x_n)) be an ordered realization of the multitarget detected process Ξ⁻_D. Then, using Bayes' theorem, the posterior distribution on ξ_o is given by

Pr{ξ_o|υ_o} = Pr{υ_o|ξ_o} Pr{ξ_o} / Pr{υ_o}.   (5.18)

We now compute the three functions on the right-hand side of (5.18). The basic properties of the measurement-to-target assignment problem yield the conditional probability Pr{υ_o|ξ_o} as follows:

Pr{υ_o|ξ_o} = (1/m!) ∑_σ ∏_{j=1}^{m} l(y_{σ(j)}|x_j) for m = n, and Pr{υ_o|ξ_o} = 0 for m ≠ n,   (5.19)
where the sum is over σ ∈ Sym(m), the set of all permutations of the first m positive integers. The sum over permutations reflects the fact that we assume a priori that all assignments of measurements to targets are equally likely. The division by m! is needed because we are dealing with ordered realizations of ξ and υ. (It makes the integral of the left-hand side of (5.19) over all possible values of the y's equal to one.) The reason that the likelihood function is zero for m ≠ n is the nature of the augmented multitarget state space—all measurements are accounted for by detected targets in S or φ. Since the predicted multitarget and measurement processes are PPPs, we have from the pdf for ordered realizations that

Pr{ξ_o} = (1/n!) p_{Ξ⁻_D}(ξ) = (1/n!) exp(−∫_{S⁺} f⁻_D(s) ds) ∏_{j=1}^{n} f⁻_D(x_j)   (5.20)

Pr{υ_o} = (1/m!) p_Υ(υ) = (1/m!) exp(−∫_T λ(y) dy) ∏_{j=1}^{m} λ(y_j).   (5.21)
Substituting (5.19)–(5.21) into (5.18) and using the identity (5.17) gives

Pr{ξ_o|υ_o} = (1/m!) ∑_σ ∏_{j=1}^{m} [ l(y_{σ(j)}|x_j) f⁻_D(x_j) / λ(y_{σ(j)}) ] for m = n, and Pr{ξ_o|υ_o} = 0 for m ≠ n.   (5.22)
The expression (5.22) clearly has a different mathematical form from the pdf (5.20) of an ordered realization of a PPP. Consequently, the Bayes posterior point process is a finite point process but not a PPP. To close the Bayesian recursion, we approximate it with a PPP.
5.3.5 PPP Approximation

The Bayes posterior distribution (5.22) has the property that its single target marginal distributions are all equal. That is, integrating (5.22) over all the x's except x_ℓ (say) gives

Pr{x_ℓ|υ_o} ≡ ∫_{S⁺} ··· ∫_{S⁺} Pr{ξ_o|υ_o} dx₁ ··· dx_{ℓ−1} dx_{ℓ+1} ··· dx_m
           = (1/m!) ∑_{r=1}^{m} ∑_{σ ∈ Sym(m), σ(ℓ)=r} l(y_{σ(ℓ)}|x_ℓ) f⁻_D(x_ℓ) / λ(y_{σ(ℓ)})
           = (1/m) ∑_{r=1}^{m} l(y_r|x_ℓ) f⁻_D(x_ℓ) / λ(y_r).   (5.23)
This identity holds for all x_ℓ ∈ S⁺. The Bayes posterior density is approximated by the product of its single target marginal densities; that is,

Pr{x₁, …, x_m|υ_o} ≅ Pr{x₁|υ_o} ··· Pr{x_m|υ_o}.   (5.24)
The product approximation is called the mean field approximation in the machine learning community, where it is widely used to factor multivariate pdfs into a product of marginal (or latent) variables to greatly reduce storage and facilitate subsequent analysis (Jebara (2004)). Given the product approximation, it is natural to approximate the Bayes posterior by a PPP whose intensity function is proportional to Pr{x|υ_o}, x ∈ S⁺, where the constant of proportionality is the expected number of targets. Matching the expected number of targets to that of the Bayes posterior process gives the proportionality constant m, since the Bayes posterior always has exactly m targets due to the augmented space S⁺. Hence the intensity of the PPP approximation of the detected process Ξ⁻_D is

f⁺_D(x) = m Pr{x|υ_o} = ∑_{r=1}^{m} l(y_r|x) f⁻_D(x) / λ(y_r), x ∈ S⁺.   (5.25)

Equivalently,

f⁺_D(x) = ∑_{r=1}^{m} l(y_r|x) P^D(x) f⁻(x) / λ(y_r)   (5.26)
where we have substituted (5.13) into (5.25) to obtain (5.26).
5.3.6 Correlation Losses in the PPP Approximation

The locations of points in a PPP are uncorrelated. Knowing the location of one point does not affect the distribution of the other points in the PPP. This is a consequence of the locations being obtained as i.i.d. draws from a given distribution. By contrast, in the Bayes posterior point process, point (target) locations are correlated. Knowing the location of one target tells us something about the location of other targets. In Bozdogan and Efe (2013), it is shown that the presence of a target at x in the posterior point process reduces the likelihood of finding another target nearby. This topic is discussed further in Sect. 5.7.1 of Stone et al. (2014).
5.3.7 The iFilter Recursion

The undetected process Ξ⁻_U is by definition unaffected by scan data, so its Bayes update is the predicted PPP whose intensity is given by (5.14). Because the detected and undetected processes are independent, their Bayes updates are mutually independent as well. Thus, the intensity of the PPP approximation to the Bayes posterior process is the sum of the intensities of the Bayes updates of the detected and undetected processes; that is,

f⁺(x) = f⁻_U(x) + f⁺_D(x)
      = (1 − P^D(x)) f⁻(x) + ∑_{r=1}^{m} l(y_r|x) P^D(x) f⁻(x) / λ(y_r)
      = f⁻(x) [ 1 − P^D(x) + ∑_{r=1}^{m} l(y_r|x) P^D(x) / λ(y_r) ].   (5.27)
For m = 0, the Bayes posterior process comprises only the predicted undetected process. Thus, for m = 0, the summation in (5.27) is omitted. Substituting the expression for λ given in (5.15) yields

f⁺(x) = f⁻(x) [ 1 − P^D(x) + ∑_{r=1}^{m} l(y_r|x) P^D(x) / ( λ_φ(y_r) + ∫_S l(y_r|x′) P^D(x′) f⁻(x′) dx′ ) ].   (5.28)

This is the intensity filter on the augmented space S⁺. Note that although all possible measurement-to-target assignments are used in the derivation, the iFilter recursion involves an expectation over these assignments, so it requires neither the enumeration of these assignments nor the computation of their probabilities. In contrast, such calculations are necessary in MHT and JPDA methods.
5.4 Example

In this section, we present an example of the use of an iFilter to track multiple targets which appear and disappear at different times during the example. The example features large numbers of Poisson distributed false target detections. The target state vector is s = (x, v), where x is two dimensional in position and v is two dimensional in velocity. The position state space is a square R with −800 to 800 units along each side, giving it an area of (1600)² square units. The velocity space is a square with −10 to 10 units per scan on each side. The product space of these two squares is the target state space S. There are 150 scans of data, and there are four targets that enter and leave the search space at different scans:

• Target 1 enters at scan 1 and exits at scan 76.
• Target 2 enters at scan 21 and exits at scan 96.
• Target 3 enters at scan 41 and exits at scan 116.
• Target 4 enters at scan 61 and leaves at scan 136.
The targets follow what is called an almost constant velocity motion model in discrete time, which is a Markov model specified as follows. Let X(t) and V(t) be the random variables that are the position and velocity components of the target's state at scan t for t = 0, 1, …. Each target enters the state space with state distribution (X(t₀), V(t₀)), where t₀ is the time of entry. For any set A, let U(A) be the uniform probability distribution over A. Then

X(t₀) = (X₁(t₀), X₂(t₀)) and V(t₀) = (V₁(t₀), V₂(t₀))   (5.29)

where

X₁(t₀) ∼ U([−800, −400] ∪ [400, 800])
X₂(t₀) ∼ U([−800, −400] ∪ [400, 800])
V₁(t₀) ∼ U([−10, 0]) if X₁(t₀) > 0, U([0, 10]) otherwise
V₂(t₀) ∼ U([−10, 0]) if X₂(t₀) > 0, U([0, 10]) otherwise.   (5.30)
The distribution on V(t₀) is chosen so that the target tracks stay in the target state space for the duration of the example. Once a target enters R, it moves according to the almost constant velocity motion model given as follows, where the superscript T indicates transpose:

(X(t + 1), V(t + 1))^T = F (X(t), V(t))^T + w_t, where w_t ∼ N(0, Q) for t ≥ t₀   (5.31)

and where

F = ( 1  0  Δt  0
      0  1  0   Δt
      0  0  1   0
      0  0  0   1 ),

Q = σ_p² ( Δt³/3   0       Δt²/2   0
           0       Δt³/3   0       Δt²/2
           Δt²/2   0       Δt      0
           0       Δt²/2   0       Δt ),   (5.32)

with Δt = 1 and σ_p = 0.15. In addition, w_t is independent of w_s for t ≠ s.

At each time, any target present is detected with probability 0.95. The number of false targets per scan follows a Poisson distribution with a mean of 30, and each false target is detected with probability 0.99. The location of a false alarm is determined by an independent draw from the U(R) distribution. The number and location of false targets are independent from one scan to another. The iFilter does not know the mean of the false target distribution. (By contrast, the mean number of false targets per scan is an input to the PHD filter.) Instead, the iFilter estimates the intensity of false targets at each scan as follows:

λ_φ(y) = l(y|φ) P^D(φ) f⁻(φ)   (5.33)

where

l(y|φ) = 1/(1600)² and P^D(φ) = 0.99.   (5.34)
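A sketch (ours) of the almost constant velocity model (5.31)–(5.32) with Δt = 1 and σ_p = 0.15; the initial state below is just one plausible draw from (5.29)–(5.30), not a value from the text.

import numpy as np

rng = np.random.default_rng(5)
dt, sigma_p = 1.0, 0.15

# Transition matrix F and process noise covariance Q from (5.32),
# with state ordering (x1, x2, v1, v2).
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Q = sigma_p**2 * np.array([[dt**3 / 3, 0, dt**2 / 2, 0],
                           [0, dt**3 / 3, 0, dt**2 / 2],
                           [dt**2 / 2, 0, dt, 0],
                           [0, dt**2 / 2, 0, dt]])

def step(s):
    # Propagate one state per (5.31): linear motion plus Gaussian process noise.
    return F @ s + rng.multivariate_normal(np.zeros(4), Q)

track = []
s = np.array([500.0, -600.0, -5.0, 4.0])   # one plausible draw per (5.29)-(5.30)
for _ in range(150):
    s = step(s)
    track.append(s)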
Since we are using s = (x, v) for target state in this example, it is convenient to restate the iFilter recursion (5.28) with s in place of x for the target state. With this change, (5.28) becomes

f⁺(s) = f⁻(s) [ 1 − P^D(s) + ∑_{r=1}^{m} l(y_r|s) P^D(s) / ( λ_φ(y_r) + ∫_S l(y_r|s′) P^D(s′) f⁻(s′) ds′ ) ].   (5.35)

Measurements are estimates of position. Let H be the 2 × 4 matrix that selects the position components of the state, so that H s^T = x is the position component of the target state vector. Let Y be the random variable representing a measurement. Then Y = H s^T + ε, where ε ∼ N(0, 100 I₂), I₂ is the 2-dimensional identity matrix, and N(μ, Σ) is a Gaussian distribution with mean μ and covariance Σ. The likelihood function for the measurement Y = y, given that a target in state s = (x, v) is detected, is

l(y|s) = η(y, H s^T, 100 I₂) for s ∈ S   (5.36)
where η(·, μ, Σ) is the density function for N(μ, Σ). A summary of the iFilter results is shown in Figs. 5.1 and 5.2. In Fig. 5.1 we see the tracks of four targets which appear and disappear over the 150 scans. This graphic is the superposition of the iFilter results from the 150 scans. We can see there are scans on which targets are not detected. On the left-hand side in Fig. 5.2, we see a graph of the estimated number of targets. It starts with 1 target, rises to 4, and then declines to 0 by the end of the 150 scans. The right-hand side shows the estimated number of false targets in each scan. Recall that these tracks are produced without explicitly assigning detections to tracks or targets. On the other hand, this is a simple case where there is little to no ambiguity in assigning detections to targets. The almost constant velocity motion model is a commonly used and convenient model for Kalman filtering problems. It is an unrealistic motion model, but it is satisfactory when measurements are received rapidly with small measurement errors.
Fig. 5.1 This iFilter intensity map shows the tracks of the four targets which enter and leave on different scans. This map is the superposition of the 150 scan intensity maps from the iFilter. Dark blue indicates low intensity, yellow indicates high. The dashed lines show the targets’ actual tracks
Fig. 5.2 The red line in the left-hand figure shows the estimated number of targets as a function of scan. The black line is the actual number of targets. The red line in the right-hand figure shows the estimated number of false targets as a function of scan. The black line is the mean of the Poisson distribution for the number of false targets per scan
Computing this Example. The example above was computed using a particle filter approach. (This example was computed by R. Blair Angle of Metron.) The computation employed 120,000 particles to represent the iFilter intensity surface resulting from a scan. The initial positions of these particles were drawn from a uniform distribution over the square R for the position component and a uniform distribution over [−10, 10] × [−10, 10] for the velocity component. Each particle was given an initial intensity of 1/120,000. As a result, the particle filter was initialized with the expected number of targets equal to 1. If we wish to initialize the filter with an expected number of targets equal to n, we could give each particle the weight n/120,000. We have found the iFilter to be robust to this initialization. It quickly identifies the expected number of targets present regardless of the value of n used for the initialization.

We want the iFilter to track the number of targets and to detect new ones as they arrive in the target state space. We have chosen a particle filter representation of the iFilter intensity function because it is more economical and flexible than a gridded representation. However, after a few measurement updates, the particles tend to be concentrated near the locations of existing targets. This can result in very few if any particles being in regions where new targets may appear, which would make it very difficult for the iFilter to detect a new target in those sparse regions. To avoid this problem, we add new particles uniformly over the state space after each measurement update. At each scan, 60,000 particles are birthed (added) to the 120,000 already present. These added particles initially have intensity 0. Each particle, both added and original, is motion updated to its state in the next scan using the almost constant velocity motion equations given in (5.31) and (5.32). In addition, each particle receives a small amount of intensity from the q(s_n|φ) term in (5.11). Since there are 180,000 particles at this stage, we set N = 180,000 and

q(s_n|φ) = (2 × 10⁻⁴)/180,000, q(φ|s_n) = 10⁻⁶ for n = 1, …, N
q(φ|φ) = 1 − 2 × 10⁻⁴   (5.37)
where s_n is the state of the n-th particle. As a result, the motion update adds a small amount of intensity to each of the new (as well as old) particles so that all N = 180,000 particles have a positive intensity. We compute the particle version of the iFilter Eq. (5.35) as follows. For each measurement y_r, we compute the integral in (5.35) by
∑_{n=1}^{N} l(y_r|s_n) P^D(s_n) f⁻(s_n)
where P^D(s_n) = 0.95 and l(y_r|s_n) is given by (5.36). For n = 1, …, N, we compute (5.35) by
f⁺(s_n) = f⁻(s_n) [ 1 − P^D(s_n) + ∑_{r=1}^{m} l(y_r|s_n) P^D(s_n) / ( λ_φ(y_r) + ∑_{n′=1}^{N} l(y_r|s_{n′}) P^D(s_{n′}) f⁻(s_{n′}) ) ]

and

f⁺(φ) = f⁻(φ) [ 1 − P^D(φ) + ∑_{r=1}^{m} l(y_r|φ) P^D(φ) / ( λ_φ(y_r) + ∑_{n′=1}^{N} l(y_r|s_{n′}) P^D(s_{n′}) f⁻(s_{n′}) ) ]   (5.38)

where for a measurement Y = y,

λ_φ(y) = l(y|φ) P^D(φ) f⁻(φ), P^D(φ) = 0.99, and l(y|φ) = (1600)⁻².   (5.39)
The particles are resampled to obtain 120,000 particles of equal intensity. When a scan is displayed, the square R is gridded into cells. For each cell, the sum of the intensities of particles in that cell is computed to obtain the intensity in that cell. The cells are color coded according to their intensities with the lowest relative intensities shown in blue and the highest in yellow.
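For concreteness, here is one way to code the particle information update (5.38)–(5.39) together with the position likelihood (5.36); this is our sketch, not the implementation used for the figures, and the function names are ours.

import numpy as np

def position_likelihood(y, states):
    # l(y|s) from (5.36): Gaussian with covariance 100*I2 about the position Hs.
    d2 = np.sum((states[:, :2] - y)**2, axis=1)
    return np.exp(-d2 / 200.0) / (200.0 * np.pi)

def ifilter_update(states, f_minus, f_phi, meas, pd=0.95,
                   pd_phi=0.99, l_phi=1.0 / 1600**2):
    """Particle information update (5.38)-(5.39).

    states  : (N, 4) particle states after the motion update
    f_minus : (N,) predicted particle intensities f-(s_n)
    f_phi   : predicted intensity f-(phi) at the null point
    meas    : (m, 2) position measurements in the current scan
    """
    lam_phi = l_phi * pd_phi * f_phi                  # clutter intensity, (5.39)
    ratio_s = np.zeros(len(states))
    ratio_phi = 0.0
    for y in meas:
        l = position_likelihood(y, states)
        denom = lam_phi + np.sum(l * pd * f_minus)    # PPP measurement intensity
        ratio_s += l * pd / denom
        ratio_phi += l_phi * pd_phi / denom
    f_plus = f_minus * (1.0 - pd + ratio_s)           # first line of (5.38)
    f_plus_phi = f_phi * (1.0 - pd_phi + ratio_phi)   # second line of (5.38)
    return f_plus, f_plus_phi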
5.5 Notes

Much more can be done with the finite point process model for multiple targets. For example, under the exogenous model for false alarms and target termination, an explicit form of the probability distribution and intensity function of the Bayes posterior finite point process can be derived. The posterior is seen to be the superposition of m + 1 independent point processes, where m is the number of measurements. The remaining one is a Poisson point process for false alarms. The m + 1 processes are independent, so the intensity of their superposition is the sum of their intensities. Without going into details, the intensities of these processes are the terms in the iFilter update (5.28).

The basic notion afoot is to give up on identifiable targets and study targets as an ensemble (not unlike what is done in statistical mechanics and thermodynamics). This “relaxed” multitarget model introduces non-obvious symmetries into the underlying probability structure, and that can lead to new filters with novel applications. The JiFi (JPDA intensity Filter) is one; see Streit (2016) and Angle and Streit (2019). It has potential applications for tracking targets that are extended in the sense that at each scan they may generate time-varying numbers of measurements from the sensor. One example is in high resolution radar problems. Another is drone group tracking in which a drone swarm is treated as a single target because individual drones are difficult to distinguish from one another.

The symmetries used to derive the iFilter and the JiFi filter are very difficult to see, as we have said, but it also happens that the symmetries introduced by assuming targets are not identifiable do not necessarily lead to low complexity filters. Whether
or not the symmetries help depends on the specifics of the combinatorial complexity of the application. This is not good news for those trying to design and implement practical systems, but there is a silver lining. When symmetries do lead to low complexity filters, it turns out that the natural way to derive the filter is to use generating functions (GFs) and generating functionals (GFLs). These powerful tools are non-intuitive and outside the wheelhouse of many practicing engineers.

The information update step of the iFilter is a conceptual first cousin of two justly famous algorithms. One is the Richardson–Lucy algorithm for Bayesian image restoration (Richardson 1972; Lucy 1974). The other is the Shepp–Vardi algorithm for positron emission tomography medical imaging (Shepp and Vardi 1982), as was first pointed out by Streit (2009).

Recently, we received information about the Soviet literature on the use of finite point processes in multiple target tracking. Subsequent investigation shows the literature is extensive and extended over decades. A prominent example is Bakut and Ivanchuk (1976).
References

Angle, R.B., Streit, R.L.: Multisensor JiFi tracking of extended objects. In: 22nd ISIF International Conference on Information Fusion, Ottawa, Canada (2019)
Bakut, P.A., Ivanchuk, N.A.: Calculation of a-posteriori characteristics of flow of unresolved objects. Eng. Cybern. 14(6), 148–156 (1976) (English translation from Izvestiya AN SSSR, Tekhnicheskaya Kibernetika, 1976, No. 6; submitted Nov 18, 1974)
Bozdogan, Ö., Efe, M.: Reduced Palm intensity for track extraction. In: Proceedings of the 16th ISIF International Conference on Information Fusion, Istanbul, July (2013)
Jebara, T.: Machine Learning: Discriminative and Generative. Springer, New York (2004)
Kingman, J.F.C.: Poisson Processes. Oxford University Press, Oxford (1993)
Lucy, L.B.: An iterative method for the rectification of observed distributions. Astron. J. 79, 745–754 (1974)
Richardson, W.H.: Bayesian-based iterative method of image restoration. J. Opt. Soc. Am. 62, 55–59 (1972)
Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imaging MI-1(2), 113–122 (1982)
Stone, L.D., Streit, R.L., Corwin, T.L., Bell, K.L.: Bayesian Multiple Target Tracking, 2nd edn. Artech House, Boston (2014)
Streit, R.L.: Poisson Point Processes—Imaging, Tracking, and Sensing. Springer, New York (2010)
Streit, R.L., Stone, L.D.: Bayes derivation of multitarget intensity filters. In: 11th ISIF Conference on Information Fusion, Cologne, Germany (2008)
Streit, R.L.: PHD intensity filtering is one step of a MAP estimation algorithm for positron emission tomography. In: ISIF International Conference on Information Fusion, Seattle, Washington (2009)
Streit, R.L.: JPDA intensity filter for tracking multiple extended objects in clutter. In: 19th ISIF International Conference on Information Fusion, Heidelberg, Germany (2016)