Image geometry through multiscale statistics 0-591-23642-7

This study in the statistics of scale space begins with an analysis of noise propagation of multiscale differential oper

229 35 849KB

English Pages 0 [130] Year 1996

Report DMCA / Copyright

DOWNLOAD PDF FILE

Recommend Papers

Image geometry through multiscale statistics
 0-591-23642-7

  • Commentary
  • Doctoral Dissertation
  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

Image Geometry Through Multiscale Statistics

by Terry Seung-Won Yoo

A Dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science. Chapel Hill 1996

Approved by:

© 1996 Terry S. Yoo ALL RIGHTS RESERVED

Abstract This study in the statistics of scale space begins with an analysis of noise propagation of multiscale differential operators for image analysis. It also presents methods for computing multiscale central moments that characterize the probability distribution of local intensities. Directional operators for sampling oriented local central moments are also computed and principal statistical directions extracted, reflecting local image geometry. These multiscale statistical models are generalized for use with multivalued data. The absolute error in normalized multiscale differential invariants due to spatially uncorrelated noise is shown to vary non-monotonically across order of differentiation. Instead the absolute error decreases between zeroth and first order measurements and increases thereafter with increasing order of differentiation, remaining less than the initial error until the third or fourth order derivatives are taken. Statistical invariants given by isotropic and directional sampling operators of varying scale are used to generate local central moments of intensity that capture information about the local probability distribution of intensities at a pixel location under an assumption of piecewise ergodicity. Through canonical analysis of a matrix of second moments, directional sampling provides principal statistical directions that reflect local image geometry, and this allows the removal of biases introduced by image structure. Multiscale image statistics can thus be made invariant to spatial rotation and translation as well as linear functions of intensity. These new methods provide a principled means for processing multivalued images based on normalization by local covariances. They also provide a basis for choosing control parameters in variable conductance diffusion.

Acknowledgements I am indebted to many people for the successful completion of this document. I am grateful for the generous support of my advisor, Dr. Stephen M. Pizer, who has been with me throughout the years as mentor, colleague, editor, and friend. I also extend special thanks to Dr. James M. Coggins who has been an unending source of advice. I thank the other members of my committee, Dr. Jonathan A. Marshall, Dr. George D. Stetten, Dr. Benjamin Tsui, and especially Dr. J. S. Marron. This research has been funded in part through the National Institutes of Health, grant number P01 CA 47982 and the National Science Foundation, NSF ASC-89-20219. There are many other people without whom I would never have made it to the end of a successful graduate career. Thanks to Murray Anderegg for many cups of coffee and countless sanity-preserving hands of cribbage. Thanks to David Harrison who added breadth to my education in important ways. Thanks to Dr. Richard L. Holloway for the many needed distractions, his sharing of life’s little important things, but most of all for his valued friendship. Thanks to my life friends Dr. Don and Claire Stone, without whom I would never have reached the home stretch. Thanks to Matt Fitzgibbon, Dr. Mary McFarlane and Dr. Greg Turk for being there when I’ve needed them the most. During my graduate student tenure, I have had many gifted advisors and mentors. For their friendship and guidance I would like to thank Dr. Henry Fuchs, Dr. Frederick Brooks, Kathy Tesh, and Linda Houseman. For their contributions as colleagues, coauthors, conspirators, and confidants I thank David T. Chen, Marc Olano, Steven Aylward, Rob Katz, Dr. Bryan Morse, and Dr. Ross Whitaker. Finally, I owe my greatest debts to my family. I thank my parents for life and the strength and determination to live it. Special thanks to my son, Ross, who reminds me daily that miracles exist everywhere around us. Most of all, I thank my beloved wife, Penny, who shares my burdens and my joys. This dissertation is equally her achievement. For me, it is she who makes all things possible.

ii

iii

Contents Abstract............................................................................................................................... i Acknowledgements ........................................................................................................... ii Contents ............................................................................................................................ iv List of Tables ................................................................................................................... vii List of Figures................................................................................................................. viii List of Symbols ................................................................................................................. xi Chapter 1 Introduction.....................................................................................................1 1.1. A multiscale approach to computer vision.............................................................2 1.2. An integrated approach to early vision ..................................................................3 1.3. Driving issues.........................................................................................................4 1.4. Thesis .....................................................................................................................5 1.5. Overview................................................................................................................6 1.6. Contributions .........................................................................................................7 Chapter 2 Background ......................................................................................................9 2.1. Notation..................................................................................................................9 2.2. Images ....................................................................................................................9 2.2.1. Images as a 2D manifold in n-space .....................................................10 2.2.2. Digital Images .......................................................................................11 2.3. Invariance.............................................................................................................12 2.3.1. Gauge Coordinates................................................................................13 2.4. Scale Space ..........................................................................................................14 2.4.1. Differentiation.......................................................................................15 2.4.2. The Gaussian as a unique Regular Tempered Distribution...................16 2.4.3. Zoom Invariance ...................................................................................18 2.4.4. Gaussian Scale Space............................................................................18 2.5. Image Statistics ....................................................................................................19 2.5.1. The Normal Density vs. the Gaussian Filter Kernel .............................19 2.5.2. Noisy Images.........................................................................................20 2.5.3. Statistical Measures as Invariants: Mahalanobis Distances .................20 2.5.4. Calculating Central Moments ...............................................................21 2.5.5. 
Characteristic Functions........................................................................23 Univariate Characteristic Functions..............................................................23 A Simple Univariate Example (a Gaussian Normal Distribution) ...............23 Bivariate Characteristic Functions................................................................25 2.6. Moment Invariants of Image Functions ................................................................26 Chapter 3 Normalized Scale Space Derivatives: ..........................................................27 3.1. Introduction and Background...............................................................................27 3.1.1. Scale Space Differential Invariants.......................................................28 3.1.2. Reconstruction of Sampled Images via the Taylor Expansion ..............29 3.1.3. Exploring the Properties of Scale-space Derivatives............................30

iv

3.2. Noise and Scale....................................................................................................31 3.3. Variance of Multiscale Derivatives without Normalization ................................32 3.3.1. Covariances of 1D Multiscale Derivatives ...........................................33 3.3.2. Covariances of 2D Multiscale Derivatives ...........................................35 3.4. Variance of Normalized Scale Space Derivatives ...............................................36 3.5. Analysis of 1D Scale Space Derivatives..............................................................37 3.6. Analysis of 2D Scale Space Derivatives..............................................................39 3.7. Discussion ............................................................................................................42 3.8. Conclusion ...........................................................................................................43 3.A. Appendix : Covariance of Scale Space Derivatives.............................................44 Chapter 4 Multiscale Image Statistics............................................................................49 4.1. Background and Introduction...............................................................................49 4.2. Images and Stochastic Processes .........................................................................51 4.2.1. Stochastic Processes..............................................................................51 4.2.2. Images as Samples ...............................................................................52 4.2.3. Ergodicity..............................................................................................53 4.2.4. Ergodicity and Images...........................................................................54 4.3 Multiscale Statistics ..............................................................................................55 4.3.1. Multiscale Mean....................................................................................56 4.3.2. Multiscale Variance ..............................................................................57 4.3.3. Multiscale Skewness and Kurtosis........................................................59 4.3.4. Invariance with respect to linear functions of intensity ........................61 4.4. Other Multiscale Central Moments......................................................................62 4.5. Characteristics of Multiscale Image Statistics .....................................................62 4.5.1. Multiscale Statistics vs. Difference of Gaussian Operators..................62 4.5.2. Multiscale Moments of Intensity vs. Moment Invariants of Image Functions..........................................................................................................64 4.6. Measurement Aperture, Object Scale, and Noise ................................................65 4.6.1. Noise Propagation in Multiscale Statistics of an Ergodic Process........65 4.6.2. Noise Propagation in Multiscale Statistics of a Piecewise Ergodic Process.....................................................................................66 4.7. Multiscale Statistics of 2D Images.......................................................................68 4.7.1. Multiscale 2D Image Mean...................................................................68 4.7.2. Multiscale 2D Image Variance..............................................................68 4.7.3. 
Other Multiscale 2D Image Statistics ...................................................69 4.7.4. Some 2D Examples of Multiscale Image Statistics ..............................69 4.8. An Application: Statistical Nonlinear Diffusion.................................................70 4.9. Multiscale Statistics of Multivalued Images........................................................72 4.9.1. The Multiscale Multivalued Mean........................................................73 4.9.2. Multiscale Multivalued Joint Moments ................................................73 4.9.3. Multiscale Multivalued Variance..........................................................73 4.9.4. Multiparameter VCD, a foreshadow of future work.........................................74 4.10. Summary and Conclusions.................................................................................76 Chapter 5 Directional Multiscale Image Statistics........................................................79 5.1. Approaches to Directional Analysis.....................................................................80 5.1.1. Steerable Filters ....................................................................................80

v

5.1.2. Matrices and Differential Geometry .....................................................80 5.1.3. Intensity Invariance vs. Spatially Invariant Directional Analysis .........81 5.2. Directional Statistics ............................................................................................82 5.2.1. Multiscale Directional Means ...............................................................82 5.2.2. Multiscale Directional Covariances ......................................................83 5.2.3. The Cauchy-Schwarz Inequality for Multiscale Directional Covariances..........................................................................................85 5.3. Directional Multiscale Statistics of Sample 2D Images.......................................87 5.4. The Directional Multiscale Covariance Matrix ...................................................89 5.5. SVD Applied to Directional Multiscale Statistics of 2D Images.........................91 5.6. Multiscale Gauge Coordinates of Image Statistics ..............................................92 5.7. Invariants of Directional Multiscale Moments ....................................................93 5.8. Multiscale Directional Statistics of Multivalued Images.....................................95 5.8.1. Canonical Correlation Analysis of Multivalued Directional Statistics.96 5.8.2. Understanding Canonical Correlations of Multivalued Directional Statistics ............................................................................98 5.9. Covariance between Image Intensity and Space ..................................................99 5.9.1. Directional Analysis of 2D Scalar Images ............................................99 5.9.2. Canonical Correlation Analysis versus Differential Operators...........101 5.10. Summary ..........................................................................................................102 5.A. Appendix: Singular Value Decomposition of a 2x2 Symmetric Matrix...........103 Chapter 6 Conclusions and Future Directions............................................................105 6.1. Contributions and Conclusions ..........................................................................106 6.2. Future Directions in Multiscale Statistical Theory ............................................107 6.2.1. Local Differential Geometry and Local Image Statistics ....................108 6.2.2. Multiscale Distribution Analysis ........................................................108 6.2.3. Comparing Two Distributions ............................................................109 6.3. Applying Multiscale Image Statistics ................................................................119 6.3.1. Statistical Control of Nonlinear Diffusion..........................................110 6.3.2. Mixtures in Segmentation ...................................................................111 6.4. Multiscale Image Statistics in Medicine ............................................................112 6.5. Summary ............................................................................................................113 Bibliography ...................................................................................................................115

vi

List of Tables Table 3.1. Variances of unnormalized scale space derivatives (order 0-6) of noisy 1D images (variance of input noise = v0).........................................................................35 Table 3.2. Variances of unnormalized scale space derivatives of noisy 2D images (variance of input noise = v0) for partial spatial derivatives to the fourth order (Adapted from Blom 1992)........................................................................................35 Table 3.3. Variances of normalized scale space derivatives (order 0-6) of noisy 1D images (variance of input noise = v0)........................................................................37 Table 3.4. Variances of normalized scale space derivatives of noisy 2D images for partial spatial derivatives to the fourth order (variance of input noise = v0).........................37 Table 3.5. Variances of both unnormalized and normalized scale space derivatives (order 0-6) of noisy 2D images (variance of input noise = v0) ................................41

vii

List of Figures Figure 1.1. A segmentation example. (a) the original digital radiograph, (b) an image mask denoting segments, and (c) the classified segment mask, showing the hierarchical semantic organization of the skeletal system. ..........................................3 Figure 2.1. Three representations of an image. From left to right: (a) greyscale representation, (b) intensity surface, and (c) isophotes..............................................10 Figure 2.2. The image in Figure 2.1 represented as a digital image with a raster resolution of 64 × 64 pixels. .......................................................................................................11 Figure 2.3. 2-D Gaussian derivative filter kernels through the 4th order. ........................16 Figure 2.4. Top: Characteristic function for a zero mean Gaussian. Maclaurin approximating polynomials (a) n=2, (b) n=8, (c) n=10, and (d) n=16.......................25 Figure 3.1. Propagated error of unnormalized 1D scale space derivatives (order 0-6). Each curve represents the ratio of variance of output to input noise of the linear unnormalized derivative of Gaussian filter vs. scale σ. Plot is on a log-log scale....38 Figure 3.2 Plot of the propagated error of normalized 1D scale space derivatives (order 06). Each curve represents the ratio of variance of output to input noise of the linear normalized derivative of Gaussian filter vs. scale σ. Plot is on a log-log scale........38 Figure 3.3. Plot of the propagated error of normalized 1D scale-space derivatives (order 0-6). Curve represents the ratio of variance of output to input noise of the linear unnormalized derivative of Gaussian filter vs. order of differentiation. Plot is on a log scale. ....................................................................................................................39 Figure 3.4. Propagated error of unnormalized 2D scale space derivatives (order 0-6). Each curve represents the ratio of variance of output to input noise of the linear unnormalized derivative of Gaussian filter vs. scale σ. Plot is on a log-log scale....42 Figure 3.5 Plot of the propagated error of normalized 2D scale space derivatives (order 06). Each curve represents the ratio of variance of output to input noise of the linear normalized derivative of Gaussian filter vs. scale σ..................................................42 Figure 3.6. Plot of the propagated error of normalized 2D scale-space derivatives (order 0-6). Curve represents the ratio of variance of output to input noise of the linear unnormalized derivative of Gaussian filter vs. order of differentiation. Plot is on a log scale. ....................................................................................................................42 Figure 4.1a. Generic 1D square pulse function P(d, x). Used as the input for generating pulse transfer functions. .............................................................................................56 Figure 4.1b. 1D square pulse functions P (1, x), P (2, x), P (4, x), P (8, x). From left to right: d = 1, d = 2, d = 4, d = 8; lim P(d,x) = δ(x). .................................................57 d→ 0

Figure 4.2. 1D pulse transfer function for the multiscale mean operator µ P( d,x) (x|σ). From left to right: d = 1, d = 2, d = 4, d = 8. In all images, σ = 1. The dashed lines represent the input pulse function P (d,x). Note the difference in spatial and intensity ranges in each image. ..................................................................................57 ( 2) Figure 4.3. 1D pulse transfer function for the multiscale variance operator µ P( d,x) (x|σ). From left to right: d = 1, d = 2, d = 4, d = 8. In all images, σ = 1. Note the difference in spatial and intensity ranges in each image............................................58

viii

Figure 4.4. Comparison of the 1D pulse transfer function for the multiscale variance operator µ (2) P (d,x) (x | σ) to the square multiscale gradient magnitude operator. Top 2 ∂ row, µ (2) P (d,x) (x | σ) . Bottom row: ( ∂ x P(d,x | σ)) . From left to right: d = 1, d = 2, d = 4, d = 8. In all images, σ = 1. Note the difference in spatial and intensity ranges in each image..................................................................................................................59 Figure 4.5. 1D pulse transfer function of µ (3) P (d,x) (x | σ) . From left to right: d = 1, d = 2, d

= 4, d = 8. In all images, σ = 1. Note the difference in spatial and intensity ranges in each image..................................................................................................................59 . Bottom Figure 4.6. Comparison of µ (3) to ∂x∂ P(d,x | σ). Top row, µ (3) P (d,x) (x | σ ) P (d,x) (x | σ ) row: ∂x∂ P(d,x | σ). From left to right: d = 1, d = 2, d = 4, d = 8. In all images, σ = 1. Note the difference in spatial and intensity ranges in each image. .......................60 Figure 4.7. 1D pulse transfer function of µ (4) P (d,x) (x | σ) . From left to right: d = 1, d = 2, d = 4, d = 8. In all images, σ = 1. Note the difference in spatial and intensity ranges in each image..................................................................................................................61 2 . Figure 4.8. Comparison of µ (4) to ( ∂∂x 2 P(d, x | σ ))2 . Top row, µ (4) P (d,x) (x | σ ) P (d,x) (x | σ ) 2 2 Bottom row: ( ∂x∂ 2 P(d, x | σ )) . From left to right: d = 1, d = 2, d = 4, d = 8. In all images, σ = 1. Note the difference in spatial and intensity ranges in each image. ...61 Figure 4.9. Comparisons of µ (2) (x | σ) with dog(P (d,x); σa, σb). The input function is P (d,x) a pulse P (d,x). In all cases, d = 1. From left to right: a. µ (2) (x | σ) with σ = 1, P (d,x) b. Dog(P (d,x); σa, σb) with σa = σ b 2 , σb = 1, and c. Dog(P (d,x); σa, σb) with σa = 0, σb = 1......................................................................................................................64 Figure 4.10. Test function T(h,x)......................................................................................67 Figure 4.11. A 128 x 128 pixel Teardrop with Signal to Noise of 4:1 .............................69 Figure 4.12. Local statistical measure of the teardrop from Fig. 4.11. .............................69 Figure 4.13. A test object. The figure contains structures at different scale. The raster resolution of the object is 128 x 128 pixels. ..............................................................72 Figure 4.14. Results from the modified multiscale statistical approach to vcd (left: initial image, right: after 75 iterations of vcd)......................................................................72 Figure 4.15. Early work in statistically driven multivalued vcd. A synthetic multivalued image where the values are subject to significant gaussian white noise and with a strong negative correlation between intensity values. (a) - original two valued input image and its scatterplot histogram. (b) - image after processing with vcd and resulting histogram.....................................................................................................76 Figure 5.1. - A test image with SNR of 4:1 with a raster resolution of 256 × 256 pixels. 87 Figure 5.2. Directional variances of the objects from Figure 5.1. (From left to right: a: (2 ) (p|σ) , c: Vyy = µ (2˜I ,yy) ( p | σ ) ). In all images, a grey Vxx = µ ˜I ,xx (p | σ) , b: Vxy = µ (2) ˜I ,xy value is 0, and σ = 2 pixels. Bright grey to white indicates positive values, and dark grey to black indicates negative values. Each image uses a left handed coordinate system with the origin in the upper left corner, the x-axis oriented to the right, and the y-axis oriented from the toward the bottom of the page. .....................................88

ix

Figure 5.3. Eigenvalue images of the object from Figure 5.1, computed with a spatial aperture or scale σ of 2 pixels. (From left to right: a: λ1, b: λ2). In both of these images, black is zero and bright indicates positive values.........................................91 Figure 5.4. Eigenvector image of the object from Figure 5.1, computed with a spatial aperture or scale σ of 2 pixels. The image reflects only the eigenvector u in the direction of maximum variance at each pixel; the eigenvector v in the direction of minimum variance is perpendicular to the vectors shown. The lengths of the vector representations indicate relative magnitude...............................................................92 Figure 5.5. (a) Test figure exhibiting significant directional spatial correlation and (b) the local anisotropy statistic Qˆ where σ = 3. In both images, the raster resolution is 256 × 256. .........................................................................................................................95 Figure 6.1. A 2D dual-echo MR image of the head with its scatterplot histogram. .......112 Figure 6.2. MR image of a shoulder acquired using a surface receiving coil. This may represent the ultimate test for this research..............................................................114

x

List of Symbols Symbols are listed in order of their appearance in the text. R Rn

→ p I(p) ∈ N Nn

⊂ Lr V • ∇I θ ⊗ G(σ, p) ∫ ∂n ∂x n

Σ σ e π pT ∇•F I(p | σ) u˜ µ

real numbers n-space of real numbers maps onto bold designates vector or tensor quantities function I of p is an element of natural numbers, (i.e., 0, 1, 2, 3, ...) n-space of natural numbers is a subset of italics designate set notation a vector field V dot or inner product operator gradient of I Greek lower-case theta, used to designate an angular value the convolution operator Gaussian with spatial scale σ. integral n-th partial derivative with respect to x

N µ,µ (2 ) ( ˜u)

Greek upper-case sigma, without subscripts, denotes a covariance matrix Greek lower-case sigma, used as the scale parameter the transcendental value e, the natural logarithm Greek lower-case pi, the transcendental value, pi transpose of the tensor value p “del-dot,” the divergence of F multiscale measurement: function I(p) at scale σ tilde over a character denotes random variable Greek lower-case mu, used to designate moments the expectation operation Standard Normal distribution of u˜ , with mean µ and variance µ(2).

µ(n)

n-th central moment

n

∑ f(j)

summation of f(0) + f(1) + ... + f(n)

j =1

i M(u,v)

imaginary value −1 the u-v spatial moment of an image function m n multiscale partial derivative ∂∂x n ∂∂y m G( σ, p) ⊗ I(p)

L x ny m Lˆ n m

normalized multiscale partial derivative σ n + m ∂∂x n

Z

integers

x y

n

xi

∂ m ∂y m

G( σ, p) ⊗ I(p)

∀ V( u˜ ) M( u˜ ) Cov( u˜ )

for all variance of u˜ mean of u˜ covariance of u˜

n

∏ f(k)

product of f(0) + f(1) + ... + f(n)

F → f(x) a →∞ c lim P(d)

stochastic process F f(x) approaches c as a goes to infinity limit of P(d) as d approaches 0

erf(x)

standard error function: erf (x) =

|p| ⊥ max (S(x))

norm of p: p • p perpendicular to global maximum of S(x) over the interval -∞ < x < ∞

k =1

d →0

−∞ 3 are platykurtic or more table-like, and those with kurtosis < 3 are leptokurtic. 2.5.5. Characteristic Functions There exists a transform to convert probability density functions to a corresponding frequency domain. The frequency based representation is commonly named the characteristic function of a distribution, and it has several interesting properties. If the probability density function is known in advance, the moments of that function can be calculated from the characteristic function. If all the moments of the density function are known, it can be shown that the collective moments uniquely determine the characteristic function and thereby uniquely determine the density function. The inverse transform from moments to probability distribution is possible only if all moments are finite and if the Maclaurin series expansion used to reconstruct the probability distribution function converges absolutely near the first moment. Univariate Characteristic Functions Definition: If u˜ is a random variable and f( u˜ ) is its probability density function, the characteristic function of u˜ is defined to be Φ(ω), the Fourier transform (not normalized to 2π) of f( u˜ ) (see equation (2-25a)). Because it is a Fourier transform, an inverse transform of Φ(ω) yields the original probability density function (equation (2-25b)). Φ (ω) =





−∞

f(u˜ )e ω du˜

f( ˜u) =

i ˜u

1 2π



∞ −∞

Φ (ω )e

-iω˜u



(2-25a,b)

From the moment theorem it is possible to obtain f( u˜ ) from its moments. Given a series of central moments that describe f( u˜ ), it is possible to generate the probability distribution using (2-25b) and the following Taylor expansion equation. Φ (ω) =





1 n!

µ (n˜u ) (iω )

n

n=0

(2-26)

The value s is often substituted for the complex value iω, to create the moment generating function Φ (s). ∞ s˜u Φ( s ) = ∫ f(u˜ )e du˜ (2-27) −∞

This expression is called the moment generating function because the k-th derivative of this function with respect to s produces the k-th moment of the probability density function. A Simple Univariate Example (a Gaussian Normal Distribution) An approximation to a probability density function can be generated by using the first few central moments in the above zero centered Taylor expansion. Certain error terms will be generated, and the inverse characteristic function may require limitation to an integration domain. The inverse integral to calculate the approximate probability distribution

24

Image Geometry Through Multiscale Statistics

function may not exist. Therefore, certain constraints must be imposed on the expansion. That is, the expansion shown in equation (2-26) must converge absolutely at ω = 0. Consider the probability distribution function f( u˜ ) to be a Normal distribution with a zero mean and standard deviation of σ. The probability distribution function f( u˜ ) and its corresponding characteristic function are described by equations (2-28a) and (2-28b) respectively. f( ˜u) = N 0,v 0 (˜u) =

1 2 πv 0

1 u˜ 2 −2 v

e

- 1 v 0 (ω) 2 2

Φ (ω) = e

0

(2-28a,b)

Substituting s for iω produces the moment generating function Φ (s). Differentiating Φ (s) yields moments of the Gaussian probability distribution. The first six moments are Φ(s) = e ∂ ∂s ∂2 ∂s 2 3

∂ ∂s 3 4

∂ ∂s 4 5

∂ ∂s 5 ∂6 ∂s 6

1 v s2 0 2

Φ(s) = 3s v0 e 2

1 v s2 2 0

Φ (s) = 15sv 0 e

Φ(s) = 15v 0 3 e

Φ (s) = s v 0 e

Φ(s) = v 0 e

Φ(s) = 3v0 2 e 3

1 v 0 s2 2

1 v s2 2 0

1v s2 2 0

+ s v0 e 2

1 v s2 2 0

+ 45s2 v 0 4 e

1 v 0 s2 2

3

1 v s2 2 0

+ 10s v0 e 3

1 v s2 0 2

+ s v0 e 3

+ 6s 2 v0 3 e

1 v s2 2 0

2

4

1 v s2 2 0

(2-29)

+ s4v 0 4e

1 v s2 2 0

1 v s2 2 0

+ s v0 e

+ 15s 4 v0 5 e

5

5

1v0 s 2 2

1 v s2 2 0

+ s 6 v0 6 e

1 v 0 s2 2

The Taylor expansion is most easily computed about s = 0 (the Maclaurin series for the characteristic function). This corresponds to calculating the matching probability density function about its mean, using its central moments. In this example the value of Φ (s) and its derivatives where s = 0 are Φ (0) = 1 (2-30) ∂ ∂s Φ(0) 2

∂ ∂s 2 3

∂ ∂s 5 6 ∂ ∂s 6

8 ∂ ∂s 8

=0

Φ(0) = 3v 0 5

∂7 ∂s 7

Φ(0) = v 0

∂ Φ (0) ∂s 3

4 ∂ ∂s 4

=0

2

Φ(0) = 0

Φ(0) = 15v 0

3

Φ (0) = 105v0 4 ∂9 ∂s 9

10

∂ ∂ s10

12

∂ 12 ∂s

∂13 ∂s13

Φ(0) = 0 14 ∂ ∂s14

Φ (0) = 945v0 5

∂11 ∂s11

16

∂ ∂s16

Φ (0) = 0

Φ (0) = 10395v0

Φ (0) = 135135v0 7 ∂15 ∂s15

Φ(0) = 0

6

Φ (0) = 0 Φ(0) = 0

Φ (0) = 2027025v0 8

Background

25

Expanding the Maclaurin series up to the 16th degree generates the following: v 0 2 3v0 2 4 15v0 3 6 105v0 4 8 945v 0 5 10 ω − ω + ω − ω Φ(ω ) ≈ 1 − ω + 4! 6! 8! 10! 2! 10395v0 6 12 135135v 0 7 14 2027025v0 8 16 + ω − ω + ω 12! 14! 16!

(2-31)

Figure 2.4 shows the characteristic function, and some of the approximating polynomials. Φ(ω)

ω

Φ(ω)

Φ(ω)

ω

a.

ω

b. Φ(ω)

Φ(ω)

ω

ω

c. d. Figure 2.4. Top: Characteristic function for a zero mean Gaussian. Maclaurin approximating polynomials (a) n=2, (b) n=8, (c) n=10, and (d) n=16. Bivariate Characteristic Functions In the bivariate case the relationship between the two random variables is described by the joint characteristic function. Given two random variables u˜ 1 and u˜ 2 such that u = ( u˜ 1 , u˜ 2 ) and their joint distribution function f( u˜ 1 , u˜ 2 ), the joint characteristic function Φ(ω1, ω2) and its inverse are the integrals ∞



Φ (ω1 ,ω 2 ) = ∫−∞ ∫−∞ f(u˜ 1 , u˜ 2 )ei f( ˜u1 , u˜ 2 ) =

∞ ∞ 1 4 π 2 −∞ −∞

∫ ∫

( ω1 ˜u 1 +ω 2 ˜u 2 )

ω +ω Φ (ω 1,ω 2 )e -i 1 1 2 (

˜u

du˜ 1 d˜u2

˜u 2 )

dω1 dω 2

(2-32a) (2-32b)

26

Image Geometry Through Multiscale Statistics

From the moment theorem it is possible to obtain f( u˜ 1 , u˜ 2 ) from its joint moments. Given an infinite series of central moments that describe f( u˜ 1 , u˜ 2 ), it is possible to the generate the probability distribution using the following Maclaurin series expansion. ∞



Φ (ω1 , ω 2 ) = ∑ 1n! ∑ n =0

k= 0

()µ n k

(k, n− k) ˜u

(i ω1)k (iω 2 )n− k

(2-33)

2.6. Moment Invariants of Image Functions Hu introduced the family of moment invariants, taking advantage of the moment theorem that provides a bijection from derivatives in image space to moments in frequency [Hu 1962]. Reiss added refinements to the fundamental theorem of moment invariants almost thirty years after Hu [Reiss 1991]. A typical calculation for computing the regular moment mpq of the continuously differentiable function f(x,y) is: m pq = ∫



−∞





p

−∞

q

x y f(x,y)dx dy

(2-34)

As with moments of the probability density function, it is possible with a continuously differentiable function f(x,y) to generate moments m of any order. The infinite set of moments uniquely determines f(x,y). As with moments of probability distributions, for u ∈ R and v ∈ R there exists a moment generating function M(u, v) for these invariants. pq

M(u, v) = ∫



−∞



∞ −∞

e ux+ vy f(x,y)dx dy

(2-35)

Likewise, the complete set of moments can be expanded in a power series such that ∞



M(u, v) = ∑ ∑ m pq p =0 q = 0

p

q

u v p! q!

(2-36)

Central moments m are defined as (pq)





m(pq ) = ∫ ∫ (x − m10 m00 )p (y − m01 m 00 )q f(x,y)dx dy −∞ −∞

(2-37)

It is important to distinguish these invariants from statistical moments of image intensity. The central moments of image function are equivalent to moments where the spatial origin has been shifted to the image centroid (m10 m00 , m01 m00 ). By contrast, moments of image intensity are centered about the mean intensity value. To summarize, the primary difference is that moments of an image function capture image geometry while moments of image intensity describe the probability distribution of the image intensity values. Image function moments are sensitive to noise while intensity moments attempt to characterize noise. Later chapters will attempt to reimplement intensity moments to incorporate scale (Chapter 4) and image geometry (Chapter 5).

Background

27

For since the fabric of the Universe is most perfect and the works of a most wise creator, nothing at all takes place in the Universe in which some rule of maximum or minimum does not appear - Leonhard Euler

Chapter 3

Normalized Scale-Space Derivatives: A Statistical Analysis This chapter presents a statistical analysis of multiscale derivative measurements. Noisy images and multiscale derivative measurements made of noisy images are analyzed; the means and variances of the measured noisy derivatives are calculated in terms of the parameters of the probability distribution function of the initial noise function and the scale or sampling aperture. Normalized and unnormalized forms of differential scale space are analyzed, and the statistical results are compared. A discussion of the results and their ramifications for multiscale analysis is included. 3.1. Introduction and Background There has been substantial research in the area of differential invariants of scale space. Notably, researchers such as Koenderink, ter Haar Romeny, Florack, Lindeberg, Blom, and Eberly have contributed many papers on scale space and the invariances of scalespace derivatives [Koenderink 1984, ter Haar Romeny 1991ab, Florack 1993, Lindeberg 1992, Blom 1993, and Eberly 1994]. This sub-field of computer vision has yielded many significant insights. This chapter is an exploration of some of the statistical aspects of scale space when the source images are subject to spatially uncorrelated noise. This section amplifies scale space and multiscale derivative concepts presented in Chapter 2. It also presents a notation for multiscale derivatives borrowed from other related research. Given some scale or measurement aperture σ, scale-space derivatives of a digital image are measured by convolving the image with a derivative-of-Gaussian kernel. Given a continuous 2D image function I(p) and a Gaussian kernel G(σ, p) where p = (x, y), multiscale derivatives of arbitrary order are described by the following equation: L x n y m = L I, x n y m (p | σ) =

n

m

∂ ∂ n m ∂x ∂y

I(p | σ) =

n

m

∂ ∂ n m ∂x ∂y

G(σ,p) ⊗ I(p)

(3-1)

The term shown above is described as the n-th derivative in the x-direction and the m-th derivative in the y-direction given scale σ. Ter Haar Romeny et al. have adapted Einstein

28

Image Geometry Through Multiscale Statistics

Summation Notation (ESN) to provide a compact description of scale-space derivatives. In their notation, shown in the leftmost expression in equation (3-1), the image function I(p), the scale parameter σ, and the location parameter p are often assumed, resulting in the abbreviated notation L x n y m to represent scale-space differentiation. I use the abbreviated notation, L x ny m , as well as the term L I,x n y m (p | σ) interchangeably in this chapter. When it is necessary to specify a particular image function, spatial location, or scale, I will use the notation L I,x n y m (p | σ). The geometry captured in derivative measurements can be used to classify points within an image. Derivatives embody properties such as Gaussian or mean intensity curvature, gradient magnitude (often used as a measure of boundariness), isophote curvature, and a host of other important features of the image. Derivatives can be measured not only along the directions of the Cartesian coordinate directions, but in any arbitrary direction about a point. The work of ter Haar Romeny and Florack describes a coordinate frame or gauge at each spatial location based on the structure of first order derivatives. When measured in gauge coordinates, higher order derivatives exhibit invariance with respect to spatial rotation. Expressing derivatives in gauge coordinates simplifies the notation and computation of scale-space derivatives [ter Haar Romeny 1991a]. Tracing image intensities and derivative values through changing scale is also often useful. Derivatives with respect to scale are easily computed. The Gaussian has the property that the second derivative with respect to spatial coordinates is directly proportional to the first derivative with respect to scale. This property simplifies many scale-space calculations. Multiscale derivatives of the continuous image function I(p) given a Gaussian scale operator G(σ, p) taken with respect to scale σ are L σ k = L I,σ k (p | σ) =

k

∂ k ∂σ

I(p | σ) =

k

∂ k ∂σ

G(σ, p) ⊗ I(p)

(3-2)

3.1.1. Scale-Space Differential Invariants As described in Chapter 2, the Gaussian is a natural scale-space aperture function by the a priori constraints of linearity, shift invariance, and rotation invariance. Zoom invariance is achieved by imposing an appropriate metric on scale-space. Florack and ter Haar Romeny specified a distance metric that preserves the Euclidean nature of scale space [ter Haar Romeny 1991ab]. In more recent work, Eberly suggests a dimensionless 1-form to be used in scale-space measurements, specifying a hyperbolic construction for scale space [Eberly 1994ab]. ∂p σ

ρ

∂σ σ

(3-3)

ρ is a constant relating rate of change in the scale dimension to spatial rate of change. In most uses of this 1-form, ρ = 1.

Normalized Scale-Space Derivatives: A Statistical Analysis

29

Use of this 1-form suggests that to exhibit zoom-invariance, scale-space derivatives must be normalized by the scale of the differential operators. For example, in Fritsch’s study of the multiscale medial axis (now called core), he applied an operator with its kernel equal to the Laplacian of Gaussian multiplied by σ2 as a filter for detecting medialness. He showed that this operator exhibited zoom invariance [Fritsch 1993]. Eberly’s research on scale-space derivatives generalizes the normalization process for derivatives of arbitrary order. The form of a normalized scale-space spatial derivative Lˆ x n y m is n m Lˆ x n y m = Lˆ I, x n y m (p | σ) = σ +

(

∂n ∂m ∂ xn ∂ ym

)

G(σ, p) ⊗ I(p)

(3-4)

The multiscale derivatives with respect to σ are also easily normalized [Eberly 1994ab]. k Lˆ σ k = Lˆ I,σ k (p | σ) = σ

k

∂ ∂σ k

G(σ,p) ⊗ I(p)

(3-5)

3.1.2. Reconstruction of Sampled Images via the Taylor Expansion Combinations of derivative values can be used to recreate or approximate smooth functions from sparse samples. Digital images are discrete samplings of continuous functions. It is necessary to reconstruct a continuous function from the image samples in order to perform many image analysis operations. Given a sparse spatial sampling of the infinite set of derivatives of an image function (providing that the derivatives exist), it is possible to reconstruct a continuous function using a Taylor series expansion. For example, given a discretely sampled, continuously differentiable, 1D image function I(x), n x ∈ Z, a point x0 ∈ Z, and the set of all derivatives D = ∂x∂ n I(x) n = 0,1, 2,3,K , the

{

}

Taylor series can be used to reconstruct I(x), x ∈ R, from D. The familiar series is shown in the following equation. ∀ h ∈ R, and x = x0 + h, I(x) = I(x 0 + h) = I(x 0 ) + h (∂∂x I(x 0 ))+ ∞

=∑ k =0

hk k!

(

∂k ∂x k

I(x 0 )

)

2

h 2!

(

∂ 2 ∂x 2

)

I(x 0 ) + L +

k

h k!

(

∂ k ∂x k

)

I(x 0 ) + L (3-6)

The instantaneous derivative values of an image function are seldom known; usually, only the zeroth order samples are available. Furthermore, the discrete sampling process produces an unrecoverable loss of information, governed by Shannon’s sampling theorem and reflected by the Nyquist frequency. However, scaled derivatives of a discrete image can be measured, and a continuous representation of the image at some scale can be constructed. If I(x), x ∈ Z is the discrete 1D image function from the example above, let L I,x n (x | σ) be the n-th order scale-space derivative of I(x) at scale σ, (i.e., the 1D analog

30

Image Geometry Through Multiscale Statistics

of the derivative values in equation (3-1)), and let the set of corresponding scale-space derivatives of I(x) be L = L I,x n (x | σ) n = 0,1,2,3,K . The multiscale form of equation

{

}

(3-6) is ∞

L I (x | σ) = ∑

∀ h ∈ R and x = x 0 + h,

k =0

(L

k

h k!

)

(x 0 | σ)

I,x k

(3-7)

Similarly, 2D scale-space images (i.e., LI(p | σ) where p ∈ R2), can be reconstructed from multiscale derivatives. ∀ h ∈R , h = (hx hy ), and p = p0 + h, L I (p | σ) = 2





∑∑

h xnh ym (n + m)!

n= 0 m= 0

(L

I, x n y m

)

(p0 | σ)

(3-8)

Eberly’s scale-space 1-form requires that all spatial differences and scale differences be made relative to the scale at which the difference is measured. Therefore, the Taylor polynomials should be expressed in terms of scale σ, and they should use the normalized dimensionless derivative values Lˆ x p y q . For a 1D image function, let hˆ =h/σ (Thus, hˆ σ = h ). Transforming the previous Taylor polynomial in equation (3-7) to the dimensionless offset value of hˆ yields ∞

L I (x | σ ) = ∑

k hˆ k!

k=0

(Lˆ

I,x k

)

(x 0 | σ )

(3-9)

h h The 2D offset h = (hx, hy) can be normalized to hˆ = ( hˆ x , hˆ y ) = ( σx , σy ) . The result is a transformation of equation (3-8) to a corresponding dimensionless Taylor expression.

L I (p | σ) =





∑∑

n= 0 m= 0

ˆh n hˆ m y x n!m!

(Lˆ

I,x n y m

)

(x0 | σ)

(3-10)

3.1.3. Exploring the Properties of Scale-Space Derivatives In essence, this chapter is about understanding scale-space derivatives, their uses, and their properties. The preceding section has presented scale-space derivatives in both unnormalized and normalized forms, and supplied some insight into their uses. Scalespace derivatives provide a vehicle for reconstructing smooth image functions at some scale. The next step in the exploration of scale-space derivatives is the understanding their noise properties. What are the relations between one derivative and another? How do these relationships change as scale increases? More precisely, derivatives are compared according to the way that they propagate noise from the original image through changing scale. How sensitive is a multiscale derivative to spatially uncorrelated white noise in the original digital image signal? How does this sensitivity compare with the response of other derivatives of different order?

Normalized Scale-Space Derivatives: A Statistical Analysis

31

This chapter explores both scale-space differential forms L x p y q and Lˆ x p y q and their interactions with noisy input across scale. 3.2. Noise and Scale Consider a 1D image with added white noise; that is, let ˜I( x) = I(x) + u˜ such that x ∈ R, and u˜ is a zero-mean, spatially uncorrelated normally distributed random variable with variance v0 (i.e., the probability distribution function of u˜ = N 0,v 0 (u˜ ) ). As a linear function of a random variable u˜ , ˜I( x) can be expressed as function I(x, u˜ ) whose mean M ( I(x, ˜u)) = M ( ˜I (x)) (or the first moment of ˜I( x) , µ ˜I (x)) and variance (2) V(I(x, u˜ )) = V(˜I(x)) (or the second central moment µ ˜I (x) of ˜I( x) ,) are calculated as shown in Chapter 2, equations (2-12) and (2-13). The following two equations revisit those earlier calculations. M ( I(x, ˜u)) = M ( ˜I (x)) = µ ˜I (x) = I(x, u˜ )

(

)

= ∫ (I(x) + ˜u) N0, v 0 ( u˜ ) du˜

(

)

= I(x)∫ N 0, v 0 (˜u) d˜u + u˜

(3-11)

= I(x)

V( I(x, ˜u)) = V(˜I (x)) = µ(2) (x) ˜I =

(I(x, u˜ ) − I(x))2

(

(3-12)

)

= ∫ ˜u N0, v 0 (˜u) d˜u = v0 Applying a multiscale evaluation of the image ˜I( x) extends the forms of µ ˜I (x) and (x) to include a scale parameter. Consider the convolution of I(x, u˜ ) with an µ (2) ˜I arbitrary filter kernel h(x). µ (2˜I ⊗h) (x) = V

(∫

∞ −∞



I(x − τ, u˜ )h(τ)dτ

)

= ∫ (h(τ)) V(I(x − τ, ˜u))dτ −∞ 2



= v 0 ∫ (h(τ)) dτ −∞ 2

(3-13)

The variance of I(x, u˜ ), convolved with a filter kernel h(x) is dependent on the structure of the kernel, and not on the underlying function I(x). This relation is true for all functions I(x, u˜ ) with zero-mean Gaussian additive spatially uncorrelated white noise and all filter kernels h(x) for any n-dimensional space.

32

Image Geometry Through Multiscale Statistics

Combining the zeroth order scale-space derivative L ˜I (x | σ) and equation (3-13) results in the following relations for 1D images.

(

)

V L ˜I (x | σ) = µ(2) ˜I (x | σ) = V ∞

(∫

∞ −∞

˜I(x − τ)G(σ, x)dτ

)

= v 0 ∫ (G(σ,x)) dτ −∞ 2

 = v0 ∫  σ −∞  ∞

1 2π



e

x

2

2



2

  dτ  (3-14)

2

= v 0 2σ

1

=

(

π





−∞ ( σ /

1 2) 2 π



e

x 2(σ / 2) 2



v0 2σ π

)

M L ˜I (x | σ) = µ ˜I (x | σ) = I(x, u˜ ) ⊗ G(σ,x)

∫ ∫ (I(x − τ) + u˜ )G(σ,x)dτ(N = ( I(x) ⊗ G(σ,x))∫ (N (˜u))d˜u +

=



0, v 0

−∞

0,v 0

)

(u˜ ) du˜ u˜

= I(x)⊗ G(σ,x)

(3-15)

Thus the variance of the scale-space zeroth order intensity values is inversely proportional to scale σ and directly proportional to the variance of the noise in the input image. Keep in mind that the variance so described is distributed about the mean value of the scalespace intensity measurement and is thus a measure of the error relative to the scale at which it is measured and centered about the expected intensity value. The relation described in (3-14) generalizes to higher dimensions. If the initial image is 2D and the Gaussian scale-space sampling kernel is also 2D, the relation in (3-14) can be rewritten as

(

)

V L ˜I (p | σ) = µ ˜I (p | σ) = (2 )

v0 , p ∈ R2 4πσ 2

(3-16)

3.3. Variance of Multiscale Derivatives without Normalization In his dissertation research, Blom performed a statistical analysis on scale-space derivatives of arbitrary order. He measured the response of the multiscale derivative measurement to noise as a function of scale [Blom 1992]. Blom summarizes his results using 2D images as the basis of his analysis. For the purposes of this discussion, Blom’s analysis is recreated here for images of one dimension by extending the zero-th order statistical relations in equations (3-14) and (3-15) to unnormalized 1D scale-space derivatives.

Normalized Scale-Space Derivatives: A Statistical Analysis

33

3.3.1. Covariances of 1D Multiscale Derivatives This is a construction of multiscale covariances for Gaussian scale space of 1D images with Gaussian additive white noise. These results are not original observations. Similar findings are reported by Metz [Metz 1969] and by Blom [Blom 1992] as well as elsewhere in the literature. Given the previously defined 1D image function with additive Gaussian white noise ˜I(x) , the covariance of two scale-space derivatives of ˜I(x) can be measured for any location x at scale σ. The covariance between two such derivatives is defined as

(

)

Cov L ˜I ,x i (x | σ), L ˜I, x j (x | σ) =

(L

˜I ,x i

)(

)

(x | σ) − M(L ˜I ,x i (x | σ)) L ˜I, x j (x | σ) − M(L ˜I ,x j (x | σ))

(3-17)

where angle brackets indicated the expected value operation and M(L ˜I ,x i (x | σ)) is the mean or expected value of the i-th scale-space derivative of ˜I(x) . Observing that M(L ˜I ,x i (x | σ )) = = =

∂i i ∂x ∂i ∂xi

∂i ∂ xi

G(σ,x) ⊗ (I(x) + u˜ )

G( σ,x) ⊗ I(x) +

∂i i ∂x

G(σ,x) ⊗ u˜

G(σ, x) ⊗ I(x)

= L I, x i (x | σ )

(3-18)

and recalling that convolution distributes over addition yields the following simplification:

(

)

Cov L ˜I ,x i (x | σ) , L ˜I ,x j ˜I(x | σ) =

(L

=

(

˜I ,x i

i

∂i ∂ xi

)

i

∂j ∂xj

(

)

G(σ,x) ⊗ (I(x) + u˜ ) − ∂∂x i G(σ, x) ⊗ I(x)

∂ ∂ xi

( =

)(

(x | σ) − L I,x i (x | σ) L ˜I ,x j ˜I (x | σ) − L I,x j ˜I(x | σ)

G(σ, x) ⊗ (I(x) + ˜u) −

G(σ,x) ⊗ ˜u

)(

∂j ∂x j

∂j ∂x j

G(σ,x) ⊗ I(x)

)

(3-19)

)

G(σ,x) ⊗ ˜u)

Replacing the convolution operator with its corresponding integral and simplifying yields

34

Image Geometry Through Multiscale Statistics

(

)

Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ)

∫ = ∫∫ = ∫∫ =



−∞

∂i ∂x i



−∞



−∞





−∞

−∞

G(σ,τ)˜u(x − τ)dτ ∂i i ∂x

∂i ∂x i

G(σ,τ)

G(σ, τ)



j





−∞

∂j ∂x j

G(σ,ν)˜u(x − ν)dν

G(σ,ν)˜u(x − τ)˜u(x − ν)dν dτ

∂x

j

j

G(σ,ν) ˜u(x − τ)u˜ (x − ν) dν dτ



∂x

j

(3-20)

The term u˜ (x − τ )u˜ (x − ν) is by definition the spatial correlation of the additive noise function u˜ relative to the location x. Since u˜ is assumed to be white, its distribution is Gaussian about a zero mean, and it is not correlated in space. That is  u˜ (x − τ))2 = v 0 ˜u (x − τ)u˜ (x − ν) =  ( 0 

, if ν = τ , otherwise

(3-21)

Applying the results from (3-21), the covariance relation reduces to

(

)∫ = ∫

Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ) =



∂i ∂x i

2 G(σ,τ) ∂x∂ j G(σ, τ) (u˜ (x − τ)) dτ

∂i ∂x i

G(σ,τ) ∂x∂ j G(σ, τ)v 0 dτ



∂i ∂x i

−∞ ∞

−∞

= v0



−∞

j

j

(3-22)

G(σ,τ) ∂x∂ j G(σ,τ)dτ j

Simplifying the integral in equation (3-22) requires the use of several identities involving Hermite polynomials. A complete derivation is provided in the appendix of this chapter. The resulting simplified relation is

(

)

Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ)

i+ j  v 0  1  i + j  (i+ j) / 2 i+ (−1) ( 2 ) (2r − 1) ∏   2σ π  σ 2   r= 0

=  

,∀ even i + j (3-23) , ∀ odd i + j

0

Variance is a special case of covariance. Using equation (3-23), the general form for the variance of any k-th order (k > 0) unnormalized scale-space derivative of a 1D image is shown to be k

(

)

V L ˜I ,k (x | σ ) =

v0 2σ π

∏ (2i − 1) i =1

2k σ 2 k

(3-24)

Normalized Scale-Space Derivatives: A Statistical Analysis

35

Values for the variances of scale-space derivatives (order 0 - 6) are shown in Table 3.1. Derivative

Variance

L ˜I (x | σ)

2σ π

L ˜I ,x (x | σ)

v0 1 2σ 2 2σ π

L ˜I ,x 2 (x | σ)

v0 3 4σ 4 2σ π

L ˜I ,x3 (x | σ)

v0 15 8σ6 2σ π

L ˜I ,x4 (x | σ)

v0 105 16σ 8 2σ π

L ˜I ,x5 (x | σ)

v0 945 10 32 σ 2σ π

L ˜I ,x6 (x | σ)

10395 v 0 12 64 σ 2σ π

v0

Table 3.1. Variances of unnormalized scale-space derivatives (order 0-6) of noisy 1D images (variance of input noise = v0)

3.3.2. Covariances of 2D Multiscale Derivatives Blom and ter Haar Romeny [Blom 1992][ter Haar Romeny 1993] present similar results for unnormalized scale-space derivatives of noisy 2D images (Gaussian distributed, zero mean additive noise). Their results are summarized in Table 3.2. Derivative

Variance

L ˜I (p | σ )

4σ 2π

L ˜I ,x (p | σ) , L ˜I ,y (p | σ)

v0 1 4σ 2 4σ 2 π

L ˜I ,x 2 (p | σ) , L ˜I ,y 2 (p | σ)

v0 3 16σ 4 4 σ2 π

L ˜I ,xy (p | σ)

v0 1 16σ 4 4 σ2 π

L ˜I ,x 3 (p | σ) , L ˜I ,y 3 (p | σ)

v0 15 64 σ 6 4σ 2 π

L ˜I ,x 2 y (p | σ) , L ˜I ,xy 2 (p | σ)

v0 3 64 σ 6 4σ 2 π

L ˜I ,x 4 (p | σ) , L ˜I ,y 4 (p | σ)

v0 105 256σ 8 4σ 2 π

L ˜I ,x3 y (p | σ ) , L ˜I ,xy3 (p | σ )

v0 15 256σ 8 4σ 2 π

L ˜I ,x 2 y 2 (p | σ )

v0 9 256σ 8 4σ 2 π

v0

Table 3.2. Variances of unnormalized scale-space derivatives of noisy 2D images (variance of input noise = v0) for partial spatial derivatives to the fourth order (Adapted from Blom 1992)

36

Image Geometry Through Multiscale Statistics

The general form for the variance of unnormalized scale-space derivatives of a twodimensional image, where j represents the order of differentiation in the x direction and k is the order of differentiation in the y direction (j > 0; k > 0) is given by j

i

(

∏ (2n − 1) ∏ (2m − 1)

)

v0 n =1 m =1 V L ˜I ,x jy k (p | σ) = 2 (j +k ) 2 (j +k ) 2 4σ π σ

(3-25)

3.4. Variance of Normalized Scale-Space Derivatives The values shown in Tables 3.1 and 3.2 are a reflection of the absolute propagated error present in the resulting scale-space derivative images at a single scale σ. However, if a measurement is to be made across different scales (i.e., comparing derivative results at two different sampling/filter apertures) the derivative values must be normalized to ensure measurements that are invariant with respect to changing scale. Subsequently, the statistics of these measurements, when made of noisy images, must also reflect the normalization. This section describes new observations of normalized scale-space differential invariants. Using the relations found in (3-3) and (3-24), it is straightforward to determine the ˆ (x | σ)) . variance of normalized 1D scale-space derivatives V( L ˜I ,k k 2k V( Lˆ ˜I , k (x | σ )) = V(σ L ˜I ,k (x | σ)) = σ V(L ˜I, k (x | σ ))

(3-26)

Substituting from (3-24) into equation (3-26) yields the following general form for the variance of normalized derivatives of 1D images: k

(2i − 1) v0 ∏ i =1 V Lˆ ˜I ,k (x | σ) = 2k 2σ π

(

)

(3-27)

Using this relation, the results of Table 3.1 are recalculated for normalized spatial derivatives and shown in Table 3.3. Similarly, the general form for the variance of normalized scale-space derivatives of a two-dimensional image is given by j

i

(

)

V Lˆ ˜I ,x jy k (p | σ) =

∏ (2n − 1) ∏ (2m − 1)

v0 n=1 4σ 2 π

2

(3-28)

m=1 ( j +k )

where p =(x y), j represents the order of differentiation in the x direction, and k is the order of differentiation in the y direction (j > 0; k > 0). The resulting variance statistics are generated for normalized scale-space derivatives of 2D images up to the fourth order are presented in Table 3.4.

Normalized Scale-Space Derivatives: A Statistical Analysis Derivative

Variance

Lˆ ˜I (x | σ)

v0 2σ π

Lˆ ˜I ,x (x | σ)

1 v0 2 2σ π

Lˆ ˜I ,x2 (x | σ)

3 v0 4 2σ π

Lˆ ˜I ,x3 (x | σ)

15 v 0 8 2σ π

Lˆ ˜I ,x4 (x | σ)

105 v 0 16 2σ π

Lˆ ˜I ,x5 (x | σ)

945 v 0 32 2σ π

Lˆ ˜I ,x 6 (x | σ )

10395 v 0 64 2σ π

37

Table 3.3. Variances of normalized scale-space derivatives (order 0-6) of noisy 1D images (variance of input noise = v0)

Derivative

Variance

Lˆ ˜I (p | σ)

2 4σ π

Lˆ ˜I ,x ( p | σ ) , Lˆ ˜I ,y (p | σ)

1 v0 4 4σ2 π

Lˆ ˜I ,x2 (p | σ ) , Lˆ ˜I ,y2 (p | σ)

3 v0 16 4σ2 π

Lˆ ˜I ,xy (p | σ)

1 v0 16 4σ2 π

Lˆ ˜I ,x3 ( p | σ ) , Lˆ ˜I ,y3 (p | σ)

15 v 0 64 4 σ 2 π

Lˆ ˜I ,x2 y (p | σ) , Lˆ ˜I ,xy 2 (p | σ)

3 v0 64 4 σ 2 π

Lˆ ˜I ,x 4 (p | σ ) , Lˆ ˜I ,y 4 (p | σ)

105 v 0 256 4 σ 2 π

Lˆ ˜I ,x 3 y (p | σ) , Lˆ ˜I ,xy 3 (p | σ)

15 v 0 256 4 σ2 π

Lˆ ˜I ,x 2 y 2 (p | σ)

v0

v0 9 256 4σ2 π

Table 3.4. Variances of normalized scale-space derivatives of noisy 2D images for partial spatial derivatives to the fourth order (variance of input noise = v0)

3.5. Analysis of 1D Scale-Space Derivatives While the structure of equations (3-24) and (3-27) are very similar (indeed, the coefficients are identical), the variance of normalized scale-space derivatives are inversely proportional to σ, while the unnormalized derivatives are inversely proportional to σ2k+1 (k is the order of differentiation). This difference can be shown graphically by

38

Image Geometry Through Multiscale Statistics

plotting the propagated error of each of the derivative measurements vs. the scale parameter σ for different orders of differentiation. Figure 3.1 shows the propagated error of scale-space derivatives without normalization. Figure 3.2 shows the propagated error with the normalizing factor included. In both cases, the plots are on a Log-Log scale. The order of differentiation k shown by the different curves ranges from k = 0 to k = 6 and is labeled on the far left of each figure. Comparing the two plots, the most important difference is immediately clear. In both plots, the propagated error in either normalized or unnormalized scale-space derivatives is consistently decreasing with increasing scale σ. However, in the unnormalized measurements there is a significant crossing where for relatively large scale (σ > 1), the variance (and subsequently the significance) of the propagated error is smaller for higher order derivatives. order 6 5 4 3 2 1 0

  Ln  V(L ˜ k (x|σ )) I ,x  

order 6 5 4 3 0 2 1 Ln(σ)

Figure 3.1. Propagated error of unnormalized 1D scale-space derivatives (order 0-6). Each curve represents the ratio of variance of output to input noise of the linear unnormalized derivative of Gaussian filter vs. scale σ. Plot is on a log-log scale.

  Ln  V( Lˆ ˜ k (x|σ ))  I ,x  

Ln(σ)

Figure 3.2 Plot of the propagated error of normalized 1D scale-space derivatives (order 0-6). Each curve represents the ratio of variance of output to input noise of the linear normalized derivative of Gaussian filter vs. scale σ. Plot is on a log-log scale.

The crossover exhibited in the unnormalized derivatives has been used to justify the application of very high order derivative filters in the analysis and reconstruction of images. When comparing across scale, a normalized representation is required. If the derivatives are normalized, the improved stability of derivative measurements as scale increases remains, but the assertion that the relative stability and accuracy improves with increasing order of differentiation at large scale does not hold. An alternate finding for normalized scale-space derivatives becomes apparent. For 1D normalized scale-space derivatives, the low order differential forms (k=1 and k=2) propagate less noise than either the luminance (zeroth order form or k=0) or any of the derivatives for k > 2. Figure 3.3 shows the noise propagation of normalized scale-space derivatives versus order of differentiation relative to the zeroth order or scale-space luminance noise. The ‘J’ shape of the curve bears consideration (see Section 3.7, “Discussion”).

Normalized Scale-Space Derivatives: A Statistical Analysis

39

 ˆ   V( L˜I ,x k ( x|σ ))  Ln  V ( Lˆ 0 ( x|σ ))  ˜I ,x  

order k

Figure 3.3. Plot of the propagated error of normalized 1D scale-space derivatives (order 0-6). Curve represents the ratio of variance of output to input noise of the linear unnormalized derivative of Gaussian filter vs. order of differentiation. Plot is on a log scale.

3.6. Analysis of 2D Scale-Space Derivatives To create a comparable analysis for 2D images, the appropriate variances must combine the contributions of all the partial derivatives of the same order. Consider the 2D Taylor expansion for a scale-space function L ˜I (p | σ) . Let p0 ∈ R2, h ∈ R2 such that the interval h = (hx, hy), and p ∈ R2 where p = p0 + h. Rewriting equation (3-8) as an expression of the order of differentiation k generates the following representation. L ˜I (p | σ) = L ˜I (p0 | σ) + h x L I, x (p0 | σ) + hy L I, y (p0 | σ) + L

(

k

)

+ k!1 ∑ h x i h y k −i L I,x i y k− i (p0 | σ) + L i =0

(3-29)

Inspection the k-th order term shows that each partial derivative contributes to the k-th value weighted by the interpolation interval. The 2D variance analog to the 1D treatment shown before requires the measurement of the variance of the k-th order term. These calculations require the combination of covariances of partial derivatives of the same order. Blom derived an expression for the covariance of two partial derivatives of 2D images [Blom 1992]. Formally the expression of the variance of the k-th order term of the 2D scale-space Taylor expansion is shown below in equation (3-30).  k  V(T(k)) = V ∑ hx n h y k-n L ˜I ,x n y k− n (p0 | σ)  n =0  =

k

k

∑∑ h

k +i -j

hy

1 v0 k 2k 2 4σ π 2 σ

k

x

i= 0 j = 0

=

(

)

Cov L ˜I ,x i y k −i (p0 | σ), L ˜I ,x k − j y j (p0 | σ)

k+ j-i

k

∑∑ i = 0 j= 0

h x k+ i- jh y k+ j-i

(k +i − j) / 2

(k − i + j) / 2

n =1

m =1

∏ (2n − 1)

∀ even k + i - j and even k - i + j

∏ (2m − 1) (3-30)

40

Image Geometry Through Multiscale Statistics

To set the offset vector h to a common value, consider all values equidistant from the point p0. That is, consider for some radius r the set of all points {h = (hx, hy) | r2=hx2+hy2}. Therefore, let hx = r cos(θ) and hy = r sin(θ). Substitute these values into equation (3-30).  k  V ∑ (r cosθ)n (rsinθ )k-n L ˜I ,x n y k− n (p0 | σ)  n =0  =

v0 r 2k 4σ 2 π 2 k σ 2k

k

k

∑ ∑ cos(θ)k +i -j sin(θ) k+ j-i i = 0 j= 0

(k +i − j) / 2

(k −i + j) / 2

n=1

m =1

∏(2n − 1)

∏(2m − 1)

∀ even k + i - j and even k - i + j

(3-31)

Evaluating the variance expression in terms of θ shows that such a representation of variance is cyclic for all values of r and all values of σ. The maximum value of the variance expression is found when θ = 0, θ = 12 π , θ = π, θ = 32 π , or θ = 2π. This implies that without a loss of generality, the error propagated from uncorrelated noise through unnormalized 2D scaled derivatives can be bounded by 2k k   k r v0 k-n n   Max θ V ∑ (rcosθ ) (r sinθ) L ˜I , x n y k− n (p 0 | σ) = (2n − 1)  4σ 2 π 2 k σ 2k ∏   n =0 n =1

(3-32)

Given hx = r cos(θ) and hy = r sin(θ), consider the variance of the combined k-th order terms of the 2D Taylor expansion relative to the magnitude of h. That is, consider   k 1  n k-n Vmax ( T(k)) =  2 k  Maxθ  V ∑(r cosθ) (r sinθ ) L ˜I ,x n y k− n (p | σ)  r     n= 0 k v0 1 = ∏ (2n − 1) 4σ 2 π 2 k σ 2k n =1

(3-33)

where T(k) represents the contribution of the aggregate k-th order derivative terms of the Taylor expansion, divided by the weighting of the magnitude of the interpolation offset h. The value in equation (3-33) represents the 2D analog of the variance of 1D scaled derivatives shown in equation (3-24). hy h If the 2D offset vector h is normalized to hˆ = ( hˆ x , hˆ y ) = ( σx , σ ) , then it follows that given rˆ = r σ , hˆ x = ˆr cos(θ ) and hˆ y = rˆ sin(θ) . Repeating the analysis shown above for

the dimensionless Taylor expansion shown in equation (3-10) and using the scaled quantities for hˆ and ˆr generates an expression for the upper bound on propagated error from uncorrelated noise through normalized scaled derivatives:   k v0 ˆr 2k  n k-n Max θ V ∑ (rcosθ ) (r sinθ) Lˆ ˜I , x n y k− n (p | σ)  =   4σ 2 π 2 k   n =0

k

∏(2n − 1) n =1

(3-34)

Normalized Scale-Space Derivatives: A Statistical Analysis

41

If Tˆ (k) is the aggregate contribution of the k-th order normalized derivatives to the dimensionless Taylor expansion to be weighted by the normalized magnitude ˆr = r σ , then the normalized 2D variances of the k-th order, unweighted by ˆr is   k  1 n k -n ˆ n k − n (p | σ)  Vmax Tˆ (k) = 2k  Max V ∑ (ˆr cosθ) (ˆr sin θ ) L ˜I , x y  ˆr    n=0

( )

v0 1 = 4σ 2 π 2 k

k

∏ (2n − 1)

(3-35)

n =1

The results for both the normalized and unnormalized expressions are summarized in Table 3.5. Order ˆ (k) Vmax ( T(k)) Vmax T

( )

(k) 0 1 2 3 4 5 6

v0

v0

4 πσ2

4 πσ2

v0

v0

8 πσ 4

8 πσ 2

3v 0 16πσ

3v 0 6

16πσ

8

32 πσ

15 v 0 32 πσ

2

15 v 0 2

105v 0

105v 0

64 πσ10

64 πσ 2

945v 0

945v 0

128πσ12

128πσ 2

10395v 0

10395v 0

256πσ 14

256πσ 2

Table 3.5. Variances of both unnormalized and normalized scale-space derivatives (order 06) of noisy 2D images (variance of input noise = v0)

The equations of Table 3.5 are easily plotted and their results portrayed graphically in Figures 3.4 and 3.5. These plots resemble their 1D counterparts, and the conclusions drawn from them are the same as in the 1D case. Recapitulating the 1D results, in all cases the variances are monotonically decreasing with scale; however, if normalized values are considered in order to make cross-scale comparisons, the crossing of the variance values of the derivative values through increasing scale does not persist. As in the 1D case, under normalization the values of the variances of 2D derivatives are not strictly increasing with rising order of differentiation k. The first and second order normalized 2D derivatives propagate less noise than the scaled zeroth order intensities, regardless of scale. Figure 3.6 shows variances of 2D derivative terms Tˆ (k) plotted as a function of the order of differentiation and relative to the variance of the zeroth order scaled intensity value Tˆ (0) .

42

Image Geometry Through Multiscale Statistics

order

(

Ln Vmax (T(k ))

)

(

order 6 5 4 3 0 2 1

6 5 4 3 2 1 0

Ln Vmax ( Tˆ (k ))

)

Ln(σ) Ln(σ)

Figure 3.4. Propagated error of unnormalized 2D scale-space derivatives (order 0-6). Each curve represents the ratio of variance of output to input noise of the linear unnormalized derivative of Gaussian filter vs. scale σ. Plot is on a log-log scale.

Figure 3.5 Plot of the propagated error of normalized 2D scale-space derivatives (order 0-6). Each curve represents the ratio of variance of output to input noise of the linear normalized derivative of Gaussian filter vs. scale σ. Plot is on a log-log scale.

3.7. Discussion Human vision is used as a benchmark for evaluating image analysis systems. It has long been known that humans are better able to distinguish different levels of contrast than absolute levels of intensity. Human vision is capable of distinguishing objects relative to their surroundings regardless of variations in lighting. Scale space is often presented as a reasonable model for the overlapping receptive fields of the visual system. The results shown in this research demonstrate possible agreement between scale-space visual responses and the sensitivity of the human visual system to contrast rather than absolute intensity. I have analyzed the propagation of noise through scale-space intensity as well as scale-space derivative measurements. The results show that scale-space representations of first order, second order, and for 2D images, normalized scaled derivatives of up to the fifth order propagate less noise than the scalespace measures of absolute intensity.

  V max ( ˆT( k))  Ln   ˆ  V max ( T( 0)) 

order k

Figure 3.6. Plot of the propagated error of normalized 2D scale-space derivatives (order 0-6). Curve represents the ratio of variance of output to input noise of the linear unnormalized derivative of Gaussian filter vs. order of differentiation. Plot is on a log scale.

Normalized Scale-Space Derivatives: A Statistical Analysis

43

The implication is that a visual system based on scale-space gradients and scale-space curvature using the normalization suggested by Eberly should be more robust than a system based on absolute intensity. This implied agreement between scale-space analysis and the human visual system supplies circumstantial support for scale space as a visual model. This finding is dependent on the form of normalization used to achieve dimensionless scale-space measurements. Eberly suggests multiplying the scale-space derivative measurement by σk, where σ is the aperture of the scale operation, and k is the order of k differentiation. If (σ b ) is substituted for the normalizing term (where b is some constant), dimensionless scale-space measurements are still achieved. However, this change in normalization changes the shape of the plots shown in figure 3.3. and figure 3.6. The global minimum represented in each of these plots can be altered to lie between any two values of k. This sensitivity to the propagation of noise in multiscale analysis to the selection of a normalization term requires further study. 3.8. Conclusion I have derived analytic forms for the propagation of noise in an input signal through scale to calculated values for normalized scale-space derivatives. These expressions are based on the scale of the derivative-measuring kernel, the variance of the noise of the input signal, and the order of differentiation. A comparison of the propagated error in both unnormalized and normalized derivatives is presented. The improved stability of derivative measurements as scale increases is clear. The assertion that the relative stability and accuracy of higher order derivatives of image intensity at large scale is shown to be incorrect when spatial measurements are made relative to the scale aperture. An alternate finding is that for normalized scale-space derivatives, low order derivatives propagate less noise than either the zeroth order intensity measurement or derivatives of the third or higher orders. The first (gradient) and second (intensity curvature) spatial derivatives propagate less noise than the scale-space intensity value or derivatives of order four or higher. This finding is sensitive to the form of the normalization used to achieve dimensionless scale-space measurements. The normalization used in this study was suggested by Eberly. Dimensionless scale-space derivative measurements can be computed emphasizing the propagation of input noise to a lesser or greater degree. This sensitivity is of noise propagation to the normalization factor is still being explored.

44

Image Geometry Through Multiscale Statistics

Appendix : Covariance of Scale-Space Derivatives The following derivation is a reconstruction of second central moments of multiscale 1D images with Gaussian additive white (spatially uncorrelated) noise. This work is adapted from Blom [Blom 1992]. For a full derivation of the general multidimensional case, see the appendix of Chapter 4 of Blom’s dissertation. To calculate the covariance of 1D scale-space derivatives, consider the relation in equation (3-22).

(

) ∫

Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ) = v 0



−∞

∂i ∂x i

G(σ,τ) ∂x∂ j G(σ, τ)dτ j

(3A-1)

Integrating (3A-1) by parts yields

(

)

Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ) = v 0  − v0

(

∂ i ∂x





−∞

i

)(

G(σ,x)

∂ i +1 i+1 ∂x

)

∂ −1 j ∂x −1

G(σ,τ)

j

G(σ,x)

∂ j −1 j −1 ∂x

∞ −∞

 

(3A-2)

G(σ, τ)dτ

Note that the first term of (3A-2) vanishes for large x. That is,  v0 

(

∂ i ∂x i

G(σ, x)

)(

∂ j−1 ∂x j−1

)

G( σ, x)

∞ −∞

 =0 

(3A-3)

Repeating the integration by parts j times generates the following relationship

(

)

Cov L ˜I ,xi (x | σ) , L ˜I ,x j (x | σ) = (−1) v0 j

∫( ∞

−∞

∂+ i ∂x +j i j

G(σ,x))G(σ,x)dx

(3A-4)

x2

− 2 dx 1 e 2 σ , (3A-4) becomes If w = anddw = , then since G(σ, x) = σ 2π σ 2 σ 2

x

(

)

1  1  Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ) = (−1) v 0 2 2πσ  σ 2 

i+ j

j

∫( ∞

−∞

+

∂i j + ∂w i j

e−

w2

)e

−w

2

dw

(3A-5)

Invoking Rodrigues’ formula for Hermite polynomials: ∂k ∂z k

e −z = (−1)k Hk (z) e −z 2

2

(3A-6)

transforms equation (3A-5) into

(

)

1  1  Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ) = (−1) v 0 2 2πσ  σ 2  i

i+j





−∞

Hi + j (w) e −

2 w2

dw (3A-7)

Normalized Scale-Space Derivatives: A Statistical Analysis Consider only the integral element of (3A-7). Hermite polynomials:

45

Applying the recurrence relation of

H k (z) = 2z Hk −1 (z) − 2(k − 1) H k− 2 (z) ,

k

2

(3A-8)

transforms the integral element of (3A-7) to the following recurrence relation.





−∞



Hi + j (w) e −2w dw = 2

(2wH



−∞

(w) − 2(i + j − 1)H i + j−2 (w)) e −2w dw (3A-9) i+ j−1 2

simplifying to





−∞

Hi + j (w) e − =





dw

2wH i+ j −1 (w)e −

−∞



2w 2





−∞

2w2

dw

(3A-10)

2(i + j − 1)H i+ j− 2 (w) e −2 w dw 2

Integrating the first term of (3A-10) by parts yields





−∞

Hi + j (w) e −2w dw 2

 −2w 2 = − 12  H i+ j−1 (w)e

∫ − 2(i + j − 1)H ∫ +



−∞

1 ∂ 2 ∂w

∞ −∞

  (3A-11)

Hi + j−1 (w)e −2w dw 2



−∞

(w) e −2 w dw 2

i+ j− 2

Apply (3A-6), Rodrigues’ formula for Hermite polynomials, to the first term of (3A-11). 2 1 − Hi + j−1 (w)e −2w 2

∞ −∞

= Hi + j−1 (w)e − w e − w 2

= (−1) i + j−1 e − w

2

2

∂ i +j −1 ∂wi + j −1



(3A-12)

−∞

e −w

2

∞ −∞

=0 demonstrates that the first term vanishes for large values of w. Furthermore, combining (3A-6) and the recurrence relation shown in (3A-8) generates the following identity for Hermite polynomials:

46

Image Geometry Through Multiscale Statistics ∂ ∂z

((−1) e = 2z ((−1) e

H k (z) =

∂ ∂z

k

k

z

2

z

∂ k ∂z

e −z

∂ k ∂z

e −z

k

2

k

2

2

) )− ((−1)

k +1

ez

2

∂ k+1 ∂z k+1

e− z

2

)

= 2z Hk (z) − H k+1 (z)

(3A-13)

= 2z Hk (z) − (2z Hk (z) + 2k H k−1 (z)) = 2k Hk −1 (z) Using the result of (3A-12) and applying equation (3A-13), equation (3A-11) becomes





−∞

Hi + j (w) e −2w dw 2

=





−∞



(i + j − 1)H i+ j− 2 (w) e −2 w dw 2





−∞

2(i + j − 1)H i+ j− 2 (w) e −2 w dw 2



= −(i + j − 1)



(3A-14)

H i+ j− 2 (w) e −2w dw 2

−∞

Summarizing the results shown between (3A-7) and (3A-14) reveals an important recurrence relation regarding covariances among 1D scale-space derivatives.





−∞

2  Hi + j (w) e −2w dw = − (i + j − 1) 



2 H i+ j− 2 (w) e −2w dw 



−∞

(3A-15)

Applying the recurrence relation in (3A-15) to (3A-7) yields

(

)

Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ) = (−1) i v 0 = (−1)

i+1

1  1  2πσ 2  σ 2 

i+ j

1  1  v0 2πσ 2  σ 2 





−∞

i+ j

H i+ j (w) e −2y dw 2

 (i + j − 1) 





−∞

(3A-16)

2 H i + j−2 (w) e −2y dw 

Manipulating Rodrigues’ formula of (3A-6), it is clear that H0(z) = 1 and H1(z) = 2z. If i+j is odd, then (i+j-1)/2 is an integer value. Repeated application of (3A-15) to (3A-7) through (i+j-1)/2 iterations yields

Normalized Scale-Space Derivatives: A Statistical Analysis

(

47

)

Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ)

i + j −1  1  i+ ( 2 ) v 0 (−1) = 2πσ 2  σ 2 

= (−1) = (−1)

i+

i+j

i+j + 2  ( ) v 0  1   (i j−1) /(2r) ∏ i + j −1 2

2πσ  σ 2 



2

i+

 (i + j−1) / 2   (2r)  ∏  r= 0

r= 0











H 1 (w) e −2w dw  2

−∞

2w e −2w dw  2

−∞

2   1 −2w ( ) v 0  1   (i + j−1) /(2r) − e i+j

i + j −1 2

2πσ  σ 2 



2

∏ r= 0

,∀ odd i + j

(3A-17)



 −∞ 

2

 2

=0 Thus, for all odd i+j the corresponding covariance of the 1D scale-space derivatives is 0. This implies that odd derivatives are uncorrelated with even derivatives. If i+j is even, then (i+j)/2 is an integer value. Repeated application of (3A-7) to (3A-15) through (i+j)/2 iterations yields

(

)

Cov L ˜I ,x i (x | σ) , L ˜I ,x j (x | σ) i+ j v0  1  i+ = (−1) ( 2 ) 2πσ 2  σ 2 

= (−1)

i+

i+ j

 (i+ j) / 2  (2r − 1)  ∏  r=0 

i+ j ( ) v 0  1   (i+ j) / 2(2r − 1)  ∏ i+ j 2

2πσ  σ 2  2



r=0









−∞

2 e −2w dw    −∞

 v 0  1  i + j  (i+ j) / 2  1 ( ) (2r 1) = (−1) − ∏ i+

= (−1)

i+

i+ j 2

2σ π  σ 2 



r= 0

2 H 0 (w) e −2w dw 

 σ π





e

−∞



x2 σ2

 dx 

,∀ even i + j (3A-18)

i+j ( ) v 0  1   (i+ j) / 2(2r − 1) ∏ i+ j 2

2σ π  σ 2 



r= 0



Finally, since variances are a special case (i = j) of the covariance calculation, generating an expression for the variance of a 1D scale-space derivative is straightforward.

(

)

(

)

V L ˜I ,k (x | σ) = Cov L ˜I ,k (x | σ) , L ˜I ,k (x | σ) =

  1   k (2r − 1) ∏ 2k k  2σ π  σ 2   r =1 v0

(3A-19)

There is no conflict between causality and randomness or between determinism and probability if we agree, as we must, that scientific theories are not discoveries of the laws of nature but rather inventions of the human mind. Their consequences are presented in deterministic form if we examine the results of a single trial; they are presented as probabilistic statements if we are interested in averages of many trials. In both cases, all statements are qualified. In the first case, the uncertainties are of the form “with certain errors and in certain ranges of the relevant parameters”; in the second, “with a high degree of certainty if the number of trials is large enough.” - Athanasios Papoulis

Chapter 4

Multiscale Image Statistics When digital images are considered as arrays of observations made of an underlying scene, the vocabulary and calculus of statistics may be applied to their analysis. If an image is subject to noise in pixel measurement, it should be presented within the context of either known or computed properties of the pixel values. These properties include the sample size or raster resolution and statistics such as the variance of the additive noise. This is an introduction to the concept of multiscale image statistics. In particular, the next sections describe the generation of central moments of the local probability density of intensity values. A particular model of images as composed of piecewise regions having similar statistical properties (having similar probability distributions of intensity) is assumed for the construction of multiscale statistics. This model of images as samples of piecewise ergodic stochastic processes is presented after a brief introduction to provide a foundation for the rest of the chapter. Later sections present the construction of multiscale central moments of intensity. An earlier section in Chapter 2 describes the use of central moments to reconstruct the probability density function uniquely. The approach presented here outlines the generation of the central moments of the local intensity histogram of any arbitrary order. Later sections provide examples of these local central moments up to the fourth order. Properties of these moments are explained, and their behavior is compared with other common image processing operators. The multiscale central moments are generalized to images of two dimensions as well as multivalued images containing two values per pixel. Applications of multiscale central moments are included. In particular, the use of these measurements in the selection of control parameters in nonlinear diffusion systems for image processing are shown. 4.1. Background and Introduction Statistical pattern recognition is a discipline with a long and well established history. The literature is mature, and several texts have been written describing image analysis through the statistical methods. Filtering methods based on local neighborhood statistics such as median filtering can be found throughout the literature. Image contrast enhancement techniques based on histogram equalization have also been explored and are in use in medical as well as other production environments [Pizer 1987]. Numerous

50

Image Geometry Through Multiscale Statistics

methods for performing segmentation and classification of images based on statistical pattern recognition are well documented in various texts [e.g., Duda 1974]. Statistically based relaxation filters founded on the theory of Markov processes (Markov Random Fields) [Geman 1984, Chellappa 1993, Jain 1985] as well as relaxation strategies based on expectation-maximization methods [Dempster 1980] also have a long history. Geiger and Yuille provide a framework for comparing these and other segmentation strategies, including nonlinear diffusion discussed later in this chapter, in their survey of the common threads shared by different algorithms [Geiger 1991]. Typically, statistical methods in image processing employ the histogram of the image or some other means of representing the probability density function of the intensity values. This representation is most often computed at the maximum outer scale of the image. That is, the histograms, mixture models, or probability distribution approximations are computed across the whole image, including all pixel values equally. Image-wide probability density functions are commonly approximated as a Gaussian or linear combinations of multiple Gaussians. A maximum likelihood algorithm is usually then applied to classify individual pixel observations. Such methods seldom include local spatial trends or the geometry of the image as part of the statistical classifier. Maximum likelihood classifiers often employ image geometry in a post-process connectivity filter or, in the case of expectation-maximization methods, the classifier iterates between the maximum likelihood calculation and connectivity filtering. Exceptions to the generalization that statistics are computed at the outer scale of the image include the contrast enhancement method of adaptive histogram equalization. Adaptive histogram equalization (or AHE) and its derivatives (Contrast limited AHE or CLAHE, and Sharpened AHE or SHAHE) construct local histograms of image intensity and compute new image values that generate an equalized local probability distribution. [Cromartie 1995]. Early algorithms for AHE included calculating histograms over nonoverlapping rectangular neighborhoods and interpolation between equalized values. [Pizer 1987]. The choice of neighborhood operator was originally made on the basis of computational efficiency. Other exceptions include Markov random fields and sigma filters. Markov random fields (MRFs) filters apply maximum likelihood estimators over a local neighborhood [Geman 1984]. Techniques using sigma filtering also compute nonlinear smoothing functions based on a local sampling window [Lee 1983]. Local statistics within a well defined neighborhood are computed and the central pixel value is adjusted according to some function of those statistics. Questions often arise over the priors used in sigma filters and smoothing based on Markov random fields. Other questions arise over the selection of the neighborhood function. This chapter addresses the construction of robust statistics over a principled neighborhood function. The values that are proposed are local means and local central moments of intensity.

Multiscale Image Statistics

51

4.2. Images and Stochastic Processes This work assumes a particular model for images. As with most statistical pattern recognition systems, this research is based on the assumption that the input signal follows a Gibbs distribution. Stated loosely, a Gibbs assumption states that the value for the intensity at a particular location has compact local support. This research restates these assumptions using the language of stochastic processes (defined below). Restating and further illuminating this common assumption requires the following background material. 4.2.1. Stochastic Processes Chapter 2 defines an image to be a representation of some scene. The recording of the information within the scene is always subject to error of some kind (e.g., approximation error, measurement error, noise, discretization error, etc.) If the measurement of the scene is repeated, an identical image is not always acquired. However, there is usually a strong likelihood that the corresponding pixel values within two images of the same scene will have similar intensities. The study of stochastic processes enables the quantification and analysis of the predictability of image measurement, the likelihood of obtaining similar images upon repeated acquisition. A more complete version of the following discussion can be found in Papoulis’ introduction to random variables and stochastic processes [Papoulis 1991]. What is presented here is his organization modified from a time-based structure to one based on spatial location, transferring it to the framework of image processing. A stochastic process F is a mapping of locations in space to random variables. Representations for F include the function notation of F(p,ξ) which represents the ξth sample of the random variable located at position p. This notation is often abbreviated as F(p) when individual observations are not of interest, but rather the random variable itself. Formally, Definition: A stochastic process F (alternatively F(p,ξ) or F(p)) is a continuous mapping F : Rn → Ξ, where Ξ is a random variable. The domain of F is the set of all points p of an n-dimensional space: p ∈ Rn. The range of F is a random variable whose probability distribution function is F(x,p) = P{F(p)

x}

F is a function of the spatial variable p, and it gives the probability of the event {F(p) x} consisting of all outcomes ξ such that, at the specific location p, the samples F(p,ξ) of F do not exceed the value of x. The corresponding probability density function is f(x,p) such that f(x,p) =

∂F(x,p) ∂x

The definition given above describes F(p,ξ) as a continuous-space process since the domain of F is continuous over Rn. If F is a mapping from the space of integers (i.e.,

52

Image Geometry Through Multiscale Statistics

p ∈ Zn), then F is a discrete-space process. If the values of F(p,ξ) are countable, then F is a discrete-state process; otherwise it is described as a continuous-state process. 4.2.2. Images as Samples Paraphrasing Papoulis, F(p,ξ) has four interpretations: 1. F(p,ξ) is an ensemble of functions with p and ξ as variables. 2. It is a single function F(p0,ξ), where p0 is a constant, and ξ is allowed to vary. In this case F(p0,ξ) is called the state of the process at p0. 3. F(p,ξ0) is a single function (or a sample of the given process) where ξ0 is fixed, and p is allowed to vary. 4. If ξ0 and p0 are constant, F(p0,ξ0) is a scalar value. Using the interpretation 3 above, the process of capturing the intensity values of a scene to form a digital image I(p) can be considered to be a sample from a discrete-space discrete-state stochastic process F(p). This interpretation assigns I(p) = F(p,ξ0), for some ξ0, as a family or ensemble of samples one from each pixel location p. This view of images is a natural one. Consider the acquisition of still images of a stationary scene using a video camera. If there is noise in the input signal, two images acquired at slightly different times I0(p) = F(p, ξ0), and I1(p) = F(p, ξ1), while not identical, would be subject to the same noise processes. Optical distortions, manifesting themselves as spatial functions, would exhibit themselves identically in each image. Color shifts, variable sensitivity of the detector grid, radio frequency noise and amplifier noise would not generate the same values on repeated sampling, but would follow the same behavior for each location p. Given an ensemble of images of the same scene {I0(p), I1(p), ... In(p)}, (equivalently a large set of samples {F(p, ξ0), F(p,ξ1), ... , F(p, ξn)} of process F) where n is a large number, the expected, average or mean intensity value M(I(p)) of pixel p can be estimated using the following calculation. 1 n 1 n µ I (p) = µ F (p) = F(p) ≈ ∑ Ii (p) = ∑ F(p,ξ i ) n i=1 n i=1

(4-1)

Note the natural association between the expected value or mean of the stochastic process and the mean of the sample set of images. The variance of the sample set of images V(I(p)) can be calculated in a similar fashion, with a corresponding relationship to the variance of F. µ(I2) (p)

(2) = µ F (p)

1 n 1 n 2 ≈ ∑ (Ii (p) − µ I (p)) = ∑ F(p,ξ i ) − µ F (p) n i=1 n i=1

(

)

2

(4-2)

Multiscale Image Statistics

53

The notation µ(F2) (p) refers to the second central moment of F at location p. The order of the moment is indicated by the superscript value. The parenthesized superscript denotes that this is a central moment; that is, that this moment is calculated about the mean of F at p. A general form for central moments of F at p given an ensemble of samples is µ(Fk) (p) =

(

)

F(p) − µF (p)

k

=

1 n ∑ F(p,ξ i ) − µ F (p) n i=1

(

)

k

(4-3)

4.2.3. Ergodicity Stochastic processes are not functions, but mappings to random variables. When using real data it is not always convenient or possible to acquire sufficient samples of a single process F(p) to generate accurate information regarding the probability density function of the random variable for each location p. In many real examples in image processing, only one image is provided, not several. If F may be assumed to be stationary, that is the probability densities of the random elements of F are identical independent of p, it is possible to use these assumptions or properties to perform spatial averaging in place of averaging across many samples. The concept of ergodicity describes these conditions when a practitioner may trade spatial averaging for sample averaging. Definition: Consider the stochastic process F(x) where x ∈ R1. A stochastic process F(x) is said to be mean-ergodic if for some fixed sample ξ0 as d → ∞ the following condition holds

∫ w(τ;d)F(x - τ,ξ )dτ →µ (x) ∞

d→ ∞

0

−∞

 0 if x ≤ − d2 w(x;d) =  1d if − d2 < x ≤  0 if d < x  2

F

d 2

(4-4a)

(4-4b)

where the definition of the mean value µ F (x) is described in equation (4-1). Notice that (4-4) is equivalent to a convolution of the function F(x, ξ0 ) with a zerocentered square pulse function of height 1 d and width d centered at x. The definition of mean-ergodicity as shown above can easily be generalized to processes of higher spatial dimensions. The concept of ergodicity can also be generalized from equation (4-4) to higher order central moments. For example, Definition: A stochastic process F(x) (where x ∈ R1), is said to be varianceergodic if for some fixed sample ξ0 as a → ∞ the following condition holds





−∞

(

)

2

w(τ;d) F(x - τ,ξ0 ) - µ F (x) dτ  → µF(2) (x) d→∞

(4-5a)

54

Image Geometry Through Multiscale Statistics  0 if x ≤ − d2 w(x;d) =  1d if − d2 < x ≤  0 if d < x  2

d 2

(4-5b)

where the definition of the second central moment µ F(2) (x) is shown in equation (4-2). Within the integral, the mean value term is relative to the position x, rather than to the index of integration τ. That is, the right value in the squared term of the integral is µF(x) and not µF(τ). Given these definitions of mean-ergodicity and variance-ergodicity, it can be shown that if a process F is variance-ergodic, it must also be mean-ergodic. The converse of this statement, however, is not true [Papoulis 1991]. Ergodicity may be generalized to even higher moments. If a process is ergodic in the strict sense, increasing the spatial measurement window about a pixel of a single sample of the process uniquely specifies the probability density function for the stochastic process. Further exploration of these ideas is beyond the scope of this dissertation. If a process has a constant value for some observable moment of its distribution across space, it can be considered to be ergodic in a weak sense. That is, if the mean value of a process varies across space but the variance remains constant about that mean, then the process can be considered to be variance ergodic in the weak sense. 4.2.4. Ergodicity and Images If an image, which is a representation of a wider scene, is considered to be a sample of a completely ergodic process, where “ergodic” is defined in the strict sense, then the scene itself is of little interest since its expected brightness is constant, essentially a grey field. The image portrays significant information about the noise in the acquisition process, but little other information. What about images of scenes that have varying brightness and contrast? This section introduces piecewise or limited definitions of ergodicity. This distinction and its ramifications make the following definitions more applicable to image processing tasks. Definition: A process F(x) (where x ∈ R1), is piecewise mean-ergodic if it can be partitioned into intervals such that for each interval [a,b] where a x b: 1 ( b−a )

∫ F(τ,ξ)dτ = µ (x) + ε b

F

a

where ε → 0 as (b - a) → ∞

(4-6)

Definition: A process F(x) (where x ∈ R1), is piecewise variance-ergodic if it can be partitioned into intervals such that for each interval [a,b] where a x b: 1 ( b−a )



b

a

(F(τ , ξ) - µ

(2) (x)) dτ = µ F (x) + ε where ε → 0 as (b - a) → ∞ 2

F

(4-7)

Multiscale Image Statistics

55

As in the previous definitions, definitions of higher order central moments and for higher dimensions may be inferred from these cases. Given a single sample I(x) = F(x,ξ0) of a piecewise ergodic process F, it is not possible to recover either µ F (x) or µ F(2) (x) completely since the partitioned intervals limit the averaging process. However, some reduction of the variance of the estimates of µ F (x) and µ (2) (x) may still be achieved through spatial averaging. If the boundaries of F the partitions are known a priori, an optimal estimate of both µ F (x) and µ F(2) (x) can be calculated from a given sample I(x). If the boundaries of the partitions of F are not known, the problem of optimally estimating µ F (x) and µ (2) (x) from an image sample I(x) is underspecified. Without the F size (or scale) of the intervals, µ F (x) may be estimated using equation (4-1) by varying the interval width |b-a| for each location x and selecting an interval size based on some criterion. A regularizing sampling kernel is required to handle these uncertain boundary positions and the randomness of F. This regularization requirement is the basis for the research presented in this chapter, the development of multiscale techniques for estimating and evaluating the local probability densities in an image. 4.3 Multiscale Statistics Without a priori knowledge of the boundaries and the object widths within an image, locally adaptive multiscale statistical measurements are required to analyze the probability distribution across an arbitrary region of an image. This section presents multiscale image statistics, a technique developed through this research for estimating central moments of the probability distribution of intensities at arbitrary locations within an image across a continuously varying range of scales. The piecewise ergodic nature of the image is an underlying assumption of these developments. The definitions of mean and variance ergodicity in equations (4-4) and (4-5) imply the measurement of central moments in a local neighborhood of varying size about a point. Consider a set of observed values, ˜I(x) ⊂ R1 , where for purposes of discussion the location x ∈ R1, but can easily be generalized to Rn. The values of ˜I(x) may be sampled over a local neighborhood about a particular location x using a weighting function, ω(x), and the convolution operation, ˜I(x) ⊗ ω(x), where ˜I(x) ⊗ ω(x) =





ω(τ)˜I (x − τ)dτ =

−∞





ω(x − τ)˜I (τ)dτ

−∞

(4-8)

A regularizing sampling kernel is desired. To avoid a preference in orientation or location, the sampling function should be invariant with respect to spatial translation and spatial rotation. As with all probability weighting functions it is essential that ∞ ∫−∞ ω (τ )dτ = 1. One function that meets the above criterion is a normalized Gaussian function. Therefore, let

56

Image Geometry Through Multiscale Statistics

ω(x) = G(σ, x) =

1 σ 2π



e

x2 2σ 2

(4-9)

where the parameter σ represents the width of the sampling aperture. 4.3.1. Multiscale Mean Let the scale space measurement comprised of a sum of the original image intensities weighted by a Gaussian sampling kernel be the average or expected value of ˜I(x) over the neighborhood defined by the aperture of size σ. This local mean is µ ˜I (x | σ) = ˜I (x);σ =

neighborhood ( σ )

∑ ω(τ) ˜I(x − τ) = ∫ G(σ,x − τ)˜I (τ)dτ ∞

(4-10)

−∞

τ= x

where ˜I (x); σ is read as the expected value of ˜I(x) measured with aperture σ. This definition follows from the assumption that the observed values ˜I(x) represent a single sample from a mean-ergodic (or piecewise mean-ergodic) stochastic process. The effect of a multiscale statistical operator can be viewed through its response to the input of a square pulse function. The resulting pulse transfer function is the output of a multiscale statistical operator acting upon a simple piecewise ergodic input signal. A point transfer function, the result of applying the multiscale statistical operator to a Dirac delta function input is not defined; statistics cannot be generalized from a single sample. For the purposes of this discussion, the assumed input signal is P(d, x), a square pulse function centered at the origin with a spatial width of d and a height of 1/d (See Figure 4.1.). Note that lim P(d,x) = δ(x). d→ 0

1/d

d Figure 4.1a. 1D square pulse function P(d, x). Used as the input for generating pulse transfer functions.

Multiscale Image Statistics

57

Figure 4.1b. 1D square pulse functions P(1, x), P(2, x), P(4, x), P(8, x). From left to right: d = 1, d = 2, d = 4, d = 8; lim P(d,x) = δ(x) . d→ 0

The relationship between object width and the aperture of the multiscale statistical operator can be seen by applying the statistical mean operator at a variety of scales. Alternately, a statistical operator may be applied to square pulse inputs of various widths. Throughout this chapter the relationship between object and operator scale of the multiscale mean and higher order multiscale central moment operators will be presented by applying the operator to square pulse inputs of varying widths. An analysis of the relationship between object scale and operator aperture is found in Section 4.5. The 1D pulse transfer function for the multiscale mean operation is described in the following equation and shown in Figure 4.2 for varying values of d. µ P( d,x) (x|σ) = P(d,x) ⊗ G(σ,x) = erf(x) is the standard error function, erf (x) =



 x+ d2  1 erf 2d σ 2  x

−∞

x− d 1 erf  2  − 2d σ 2

(4-11)

G(1, τ)dτ . As the scale of the operator

decreases relative to the size of the object or pulse, it provides a better approximation to the original input signal.

Figure 4.2. 1D Pulse transfer function for the multiscale mean operator

µ P( d,x) (x|σ) for

σ = 1. From left to right: d = 1, d = 2, d = 4, d = 8. The dashed lines represent the input pulse function P(d,x). Note the difference in spatial and intensity ranges in each plot.

4.3.2. Multiscale Variance It is straightforward to calculate a value for the local variance over the neighborhood specified by the scale parameter σ. Equation (4-12) describes the local variance of intensity about a point x at scale σ.

58

Image Geometry Through Multiscale Statistics

(˜I(x) − µ (x | σ)) ; σ 2

(x | σ) = µ (2) ˜I

˜I

( ) ∫ = G(σ,x − τ)(˜I (τ)) dτ − (µ (x | σ)) ∫ =



−∞

2 G(σ,x − τ) ˜I (τ) − µ ˜I (x | σ) dτ



(4-12)

2

2

˜I

−∞

( ) ( 2

= G(σ,x) ⊗ ˜I(x) − µ ˜I (x | σ)

)

2

The point transfer function of the local variance operator is not defined for the Dirac delta (2) function δ(x) (i.e., µ δ(x) (x | σ) does not exist). However, the multiscale variance operation can be visually portrayed through its pulse transfer function µ (2) (x | σ) . The P (d,x) multiscale variance of a pulse transfer function is µ (2) P (d,x ) (x | σ) = G(σ,x) ⊗ (P(d,x)) − (G(σ,x)⊗ P(d,x)) 2

=

( ( ) 1 2 d2

erf

x + d2

σ 2

− 2 1d2 erf

2

( )) ( ( ) x− d2

σ 2



1 2d

erf

x +d2

σ 2

− 21d erf

( )) x − d2

2

(4-13)

σ 2

Figure 4.3 shows the multiscale variance operator applied to a square pulse P(d, x) for varying values of d.

Figure 4.3. 1D Pulse transfer function for the multiscale variance operator

µ(P(2)d,x) (x|σ) for

σ = 1. From left to right: d = 1, d = 2, d = 4, d = 8. Note the difference in spatial and intensity ranges in each plot.

The function shown in equation (4-13) is interesting in its resemblance to the square of the scale space gradient magnitude function (e.g., in the 1D case, 2 2 2 ∇P(d,x | σ) = ( ∂∂x P(d, x | σ)) = ( ∂∂x G(σ, x) ⊗ P(d, x)) ). Both are invariant with respect to rotation and translation, and both have similar responses to a given input stimulus. For example, Figure 4.4 portrays the variance calculation and the square of the scale-space 1D gradient magnitude of P(d,x).

Multiscale Image Statistics

59

Figure 4.4. Comparison of the 1D Pulse transfer function for the multiscale variance operator (x | σ) with σ = 1 to the square multiscale gradient magnitude operator. Top row, µ (2) P (d,x)

(x | σ) . Bottom row: ( ∂∂x P(d,x | σ)) . From left to right: d = 1, d = 2, d = 4, d = µ (2) P (d,x) 2

8. Note the difference in spatial and intensity ranges in each plot.

4.3.3. Multiscale Skewness and Kurtosis The third and fourth local central moments are easily calculated in a similar fashion. The multiscale third central moment is µ (˜I3 ) (x | σ) = =





−∞

(

)

3

G(σ, x − τ) ˜I (τ) − µ ˜I (x | σ) dτ

( ) ∫ G(σ,x − τ)(˜I(τ)) dτ ∫ + 3(µ (x | σ)) G(σ, x − τ)I(τ)dτ − (µ (x | σ)) ∫ ∞

−∞

3

G(σ, x − τ) ˜I (τ) dτ − 3µ ˜I (x | σ) 2

˜I



2

−∞



3

˜I

−∞

(4-14)

( ) − 3µ (x | σ) G(σ, x) ⊗ (˜I (τ))  + 2(µ (x | σ)) = G(σ, x) ⊗ (˜I(τ)) − 3µ (x | σ)µ (x | σ) − (µ (x | σ))

= G(σ, x) ⊗ ˜I(τ)

3

2

3

˜I

3

˜I

(2 )

˜I

˜I

3

˜I

The third central moment is demonstrated visually through its pulse transfer function across a range of pulse widths in Figure 4.5.

Figure 4.5. 1D Pulse transfer function of

(x | σ) with σ = 1. From left to right: µ (3) P (d,x)

d = 1, d = 2, d = 4, d = 8. Note the difference in spatial and intensity ranges in each plot.

The response of the multiscale third central moment of a square pulse µ (3) (x | σ) is P (d,x) similar to the multiscale first derivative of a pulse stimulus. Although the magnitude of the responses of the two operations are often an order of magnitude apart, the

60

Image Geometry Through Multiscale Statistics

correspondence between the shapes of the two curves is remarkable. The two functions are compared in Figure 4.6.

µ P (d,x) (x | σ) with σ = 1 to ∂x∂ P(d,x | σ ) with σ = 1. Top row, (3) µ P (d,x) (x | σ) . Bottom row: ∂x∂ P(d,x | σ ) . From left to right: d = 1, d = 2, d = 4, d = 8.

Figure 4.6. Comparison of

(3)

Note the difference in spatial and intensity ranges in each plot.

The multiscale fourth central moment is shown and simplified in equation (4-15). µ ˜I (x | σ) = (4)

=





−∞

(

)

4

G(σ, x − τ) ˜I (τ) − µ ˜I (x | σ) dτ

( ) ∫ G(σ,x − τ)(˜I (τ)) dτ ∫ +6(µ (x | σ)) G(σ, x − τ)(˜I(τ)) dτ − 4(µ (x | σ)) G(σ, x − τ)˜I (τ)dτ ∫ ∫ ∞

−∞

4

G(σ, x − τ) ˜I (τ) dτ − 4µ ˜I (x | σ) ∞

2

˜I



3

−∞

˜I

−∞

(

+ µ ˜I (x | σ)



3

2

−∞

) ∫ G(σ, x − τ)dτ ∞

4

−∞

( )

= G(σ,x) ⊗ ˜I(x)

4

( ) +16(µ (x | σ)) − 6(µ (x | σ))  G(σ,x) ⊗ (˜I(x))  + 5(µ (x | σ)) = G(σ,x) ⊗ (˜I(y)) − 4µ (x | σ)µ (x | σ) −6(µ (x | σ))  G(σ, x) ⊗ (˜I(x))  + 6(µ (x | σ)) − (µ (x | σ)) = G(σ,x) ⊗ (˜I(x)) − 4µ (x | σ)µ (x | σ) − 6(µ (x | σ)) µ ( x | σ) − (µ (x | σ)) ( )

(

)

2 2 3  −4µ ˜I (x | σ) G(σ, x) ⊗ ˜I(x)  + 12 µ ˜I (x | σ)  G(σ, x) ⊗ ˜I (x)  4

2

2

4

˜I

˜I

˜I

4

˜I

2

(3) ˜I

4

2

˜I

˜I

(4-15)

4

˜I

4

˜I

(3) ˜I

2

˜I

(2 ) ˜I

4

˜I

The fourth central moment is demonstrated visually through its pulse transfer function across a range of pulse widths in Figure 4.7.

Multiscale Image Statistics

Figure 4.7. 1D Pulse transfer function of

61

(x | σ) with σ = 1. From left to right: d = 1, µ (4) P (d,x)

d = 2, d = 4, d = 8. Note the difference in spatial and intensity ranges in each plot. (4) The function µ P(d,x) (x | σ) has a response similar to the square of the scale-space curvature or second derivative measure of a pulse stimulus (e.g., in 1D the square of the 2 2 multiscale curvature of a pulse P(d,x) is ( ∂x∂ 2 P(d, x | σ))2 = ( ∂x∂ 2 G(σ, x) ⊗ P(d,x)) 2 ). At relatively large apertures the two curves take on similar properties. The two functions are compared in Figure 4.8.

2 (x | σ) with σ = 1 to ( ∂x∂ 2 P(d, x | σ)) with σ = 1. Top µ (4) P (d,x) 2 2 row, µ (4) (x | σ) . Bottom row: ( ∂x∂ 2 P(d, x | σ)) . From left to right: d = 1, d = 2, d = P (d,x) 2

Figure 4.8. Comparison of

4, d = 8. Note the difference in spatial and intensity ranges in each plot.

4.3.4. Invariance with respect to linear functions of intensity As specified before, the selection of the Gaussian distribution as the sampling kernel was motivated by a desire for the sampling filter to be invariant with respect to particular transformations of x. It may be desirable to analyze the sampled measurements of the array of ˜I(x) values in dimensionless units (i.e., invariant with respect to certain transformations of ˜I ). The dimensions of thethe third and fourth central moments shown above are subject to exponentiation by the order of the moment calculation. Dimensionless measurements may be obtained by normalizing the central moments with powers of the square root of v0, the variance of the input noise (if known). The resulting measures are described as skewness and kurtosis. Their local manifestations, given a sampling aperture σ, are defined as (3 )

Local Skewness: γ ˜I (x | σ) =

µ ˜I(3 ) (x | σ)

(

v0 )

3

(4-16)

62

Image Geometry Through Multiscale Statistics

(4 )

Local Kurtosis: γ ˜I (x | σ) =

µ ˜(I4 ) (x | σ)

(v )

4

(4-17)

0

In the normalizations shown above, v0 is used rather than the calculated second central moment µ (2˜I ) (x | σ). In the case where the neighborhood about a pixel is contiguous and ergodic, µ (2˜I ) (x | σ) can be used. However, under the piecewise ergodic assumption, discontinuities introduce bias into the value µ (2˜I ) (x | σ), making it a poor estimate of v0 where boundary conditions exist. This suggests a different form of multiscale statistical analysis to overcome this bias. Directional analysis methods that deemphasize the bias in multiscale central moment calculations introduced by local image geometry is the topic of Chapter 5. 4.4. Other Multiscale Central Moments The general form for the multiscale central moment of order k of ˜I(x) is given by

(˜I(x) − µ (x | σ )) ;σ = G( σ,x) ⊗ (˜I (x) − µ (x | σ)) = G(σ, x − τ )(˜I( τ) − µ (x | σ )) dτ ∫

µ (k) ˜I (x | σ) =

k

˜I

k

˜I



−∞

(4-18)

k

˜I

Although higher moments than the fourth central moment may be also of interest, the remainder of this discussion will address the nature of scale, noise and extensions of this concept of moments to multiple dimensions as well as to images containing multiple values per pixel. 4.5. Characteristics of Multiscale Image Statistics It is important to recognize multiscale image statistics as central moments of the local probability distribution of intensity values taken from the neighborhood about a pixel location. Given the ensemble of all orders of these central moments, it is possible to reproduce the statistical behavior of the input signal and its noise properties at a particular location x in the image ˜I(x) . These moments also capture some information of the local image geometry. Multiscale image statistics may be illuminated by contrasting them with other image processing concepts. Such comparisons can lead to deeper insights into the nature of multiscale central moments of intensity. 4.5.1. Multiscale Statistics vs. Difference of Gaussian Operators A cursory glance at the mathematical form for the k-th order multiscale central moment of intensity in equation (4-18) might falsely suggest that these moments are simply a form of contrast measurement by a difference of two Gaussian operators (DoG) raised to the k-

Multiscale Image Statistics

63

th power. There are some crucial differences between an exponentiated difference of Gaussian operation and the multiscale central moments described above. In difference of Gaussian processing, an image is convolved with two Gaussian operators of differing aperture. The two filtered images are then subtracted to produce a resultant image that emphasizes boundary information within the original image. The process of filtering an image via a difference of Gausian operator and raising the result to the k-th power is formally described as follows:

(DoG(˜I (x);σ , σ )) = (G(σ ,x) ⊗ (˜I(x))− G(σ ,x) ⊗ (˜I (x))) k

a

= 





−∞

k

b

a

b

( ) ∫

G(σ b ,x − τ) ˜I (τ) dτ −



−∞

( )

G(σ a , x − ν) ˜I(ν) dν 

(4-19)

k

To simplify the comparison, equation (4-18) can be further simplified to the following expression.

(

)

(k) µ ˜I (x | σ) = G(σ,x) ⊗ ˜I(x) − µ ˜I (x | σ)

=



k

∫ G(σ,x − τ) ˜I (τ) − ∫ G(σ,x − ν)˜I(ν)dν ∞

−∞



(4-20) k



−∞

Contrasting equation (4-19) and equation (4-20), their differences are immediately apparent. The DoG operation has two separate aperture parameters that govern its behavior where multiscale statistics use a single aperture. A more important distinction is the association of the exponential term. In a difference of Gaussian image raised to the kth power, the difference of two filtered signals is exponentiated. In multiscale statistics the difference between the original input image and a filtered image is taken before being exponentiated and then filtered. Since convolution is a weighted summation process, exponentiated unsharp masking and multiscale statistics may be distinguished as follows: the exponentiated difference of Gaussian process is a power of the difference of two weighted sums. Multiscale central moments of intensity are weighted sums of an exponentiated difference. More simply, this is another example where the square of the sums does not equal the sum of the squares. To illuminate the difference between these two forms of image measurement, a comparison between variance and the square of the DoG response to a pulse input is shown in Figuire 4.9. Two different aperture selections are shown for the DoG filter. These results demonstrate that the response of the DoG filter is sensitive to the selection of the aperture size parameters.

64

Image Geometry Through Multiscale Statistics

a.

b.

c.

µ P (d,x) (x | σ) with DoG(P(d,x); σa, σb). The input function is a (2) pulse P(d,x) with d = 1. a. µ P (d,x) (x | σ) with σ = 1, b. DoG(P(d,x); σa, σb) with σa = σ b 2 , Figure 4.9. Comparisons of

(2)

σb = 1, and c. DoG(P(d,x); σa, σb) with σa = 0, σb = 1

4.5.2. Multiscale Moments of Intensity vs. Moment Invariants of Image Functions

The unmodified term "central moment" is ambiguous when taken in the context of image processing. There is a family of methods for image analysis describing image geometry that includes the concepts of moments and central moments. These measurements are distinct from the concept of statistics of local image intensities. Hu introduced the family of moment invariants, taking advantage of the moment theorem that provides a bijection from derivatives in image space to moments in frequency space [Hu 1962]. In 1D, the calculation of the k-th regular moment m_k of a continuously differentiable image function Ĩ(x) is shown in equation (4-21).

$$m_k = \int_{-\infty}^{\infty} \tau^{k}\,\tilde I(\tau)\,d\tau \tag{4-21}$$

To compute central moments, the spatial index of integration τ is offset by the image centroid, calculated in 1D as m_1/m_0. Central moments m^(k) of the input image Ĩ(x) are defined as

$$m^{(k)} = \int_{-\infty}^{\infty} \bigl(\tau - m_1/m_0\bigr)^{k}\,\tilde I(\tau)\,d\tau \tag{4-22}$$

It is possible to postulate the existence of m^(k)(x | σ), a multiscale, locally adaptive version of these moment invariants. Using a Gaussian as the neighborhood function and using a normalization consistent with the moment theorem, the formalization of multiscale locally adaptive moment invariants becomes

$$m^{(k)}(x\,|\,\sigma) = \int_{-\infty}^{\infty} e^{-\frac{(\tau-x)^2}{2\sigma^2}}\,\bigl(\tau - m_1/m_0\bigr)^{k}\,\tilde I(\tau)\,d\tau \tag{4-23}$$

From these basic equations it is clear that moment invariants and multiscale image statistics are very different. Moment invariants are applied in the spatial domain while image statistics are applied in the intensity domain. Moment invariants capture information about image geometry; the Taylor reconstruction of the infinite set of central moments of the image function yields the original image ˜I(x). Multiscale image statistics capture information about the histogram of pixel values within an image; the Taylor reconstruction of the infinite set of central moments of intensity generates the probability distribution function of ˜I(x).
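The contrast is clear in a few lines of code. The following sketch (an illustration, not from the dissertation; the test signal is arbitrary) computes a spatial central moment per equations (4-21) and (4-22) next to a central moment of the intensity histogram.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 1001)
I = np.exp(-0.5 * (x - 4.0)**2)      # an arbitrary nonnegative test image
dx = x[1] - x[0]

# Regular spatial moments m_k (equation 4-21) and a spatial central moment (4-22)
m0 = np.sum(I) * dx
m1 = np.sum(x * I) * dx
centroid = m1 / m0
m2_spatial = np.sum((x - centroid)**2 * I) * dx

# Central moment of intensity: a statistic of the histogram of pixel values
m2_intensity = np.mean((I - I.mean())**2)

print(m2_spatial, m2_intensity)      # two unrelated quantities, as the text notes
```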


4.6. Measurement Aperture, Object Scale, and Noise

How does the error associated with additive noise propagate through multiscale image statistics? In particular, how does noise affect the calculation of the local variance or second central moment? What is the relationship between noise and image geometry?

Assume an image function with additive, zero-mean, Gaussian distributed, spatially uncorrelated, "white" noise Ĩ(x) = (I(x) + ũ(x)), where ũ(x) is a random variable with zero mean, variance v0, and no spatial correlation. That is, ũ(x) ~ N(0, v0). Also, ũ(x0) and ũ(x1) are Gaussian distributed, zero-mean, independent, identically distributed random variables for all spatial coordinates x0 ≠ x1. Let the scale-space representation of Ĩ(x), where σ is the scale or measurement aperture, be Ĩ(x | σ) = G(σ,x) ⊗ (I(x) + ũ(x)).

Consider M(µ^(2)_Ĩ(x | σ)), the mean of the local variance µ^(2)_Ĩ(x | σ). Applying the calculus of expected values to µ^(2)_Ĩ(x | σ) generates the following expression.

$$\begin{aligned}
M\bigl(\mu^{(2)}_{\tilde I}(x\,|\,\sigma)\bigr) &= M\Bigl(G(\sigma,x)\otimes(\tilde I(x))^2 - \bigl(G(\sigma,x)\otimes\tilde I(x)\bigr)^2\Bigr)\\
&= G(\sigma,x)\otimes(I(x))^2 + 2\,G(\sigma,x)\otimes\bigl(I(x)\,M(\tilde u(x))\bigr) + G(\sigma,x)\otimes M\bigl((\tilde u(x))^2\bigr)\\
&\quad - \bigl(G(\sigma,x)\otimes I(x)\bigr)^2 - 2\bigl(G(\sigma,x)\otimes I(x)\bigr)\,M\bigl(G(\sigma,x)\otimes\tilde u(x)\bigr) - M\Bigl(\bigl(G(\sigma,x)\otimes\tilde u(x)\bigr)^2\Bigr)\\
&= G(\sigma,x)\otimes(I(x))^2 + 0 + M\bigl((\tilde u(x))^2\bigr) - \bigl(G(\sigma,x)\otimes I(x)\bigr)^2 - 0 - \tfrac{1}{2\sigma\sqrt{\pi}}\,M\bigl((\tilde u(x))^2\bigr)
\end{aligned} \tag{4-24}$$

Since the variance of ũ(x) is defined to be v0, equation (4-24) simplifies to the following expression.

$$M\bigl(\mu^{(2)}_{\tilde I}(x\,|\,\sigma)\bigr) = G(\sigma,x)\otimes(I(x))^2 - \bigl(G(\sigma,x)\otimes I(x)\bigr)^2 + \Bigl(1 - \tfrac{1}{2\sigma\sqrt{\pi}}\Bigr)v_0 \tag{4-25}$$

4.6.1. Noise Propagation in Multiscale Statistics of an Ergodic Process

Increasing the aperture of the multiscale statistical measurement operator improves the measurement by decreasing the variance of the reported value through spatial averaging. This trend holds as long as discontinuities in the image are not encountered. In the absence of discontinuities, that is, with an image that is a sample of a complete ergodic process, the relationship between scale and variance can be studied.

Let I(x) be a constant function (i.e., let I(x) = c). Then Ĩ(x) is ergodic. With a constant expected value across the image, multiscale statistics reflect the ergodic properties of the image as scale increases. In other words, there is a satisfying correspondence between the scale of the multiscale central moment of intensity operator and the measurement interval appearing in the definitions of ergodicity in equations (4-4) and (4-5). Specifically, it can be shown that multiscale statistics can be used to demonstrate mean-ergodicity using the following two relations.

$$\mu_{\tilde I}(x\,|\,\sigma) = G(\sigma,x)\otimes\tilde I(x)\;\longrightarrow\;\mu_{I}(x) = c \quad\text{as }\sigma\to\infty \tag{4-26}$$

Consider $V\bigl(\mu_{\tilde I}(x\,|\,\sigma)\bigr) = M\Bigl(\bigl(\mu_{\tilde I}(x\,|\,\sigma) - (G(\sigma,x)\otimes I(x))\bigr)^2\Bigr)$, the variance of µ_Ĩ(x | σ).

$$V\bigl(\mu_{\tilde I}(x\,|\,\sigma)\bigr) = V\bigl(G(\sigma,x)\otimes\tilde I(x)\bigr) = \frac{1}{2\sigma\sqrt{\pi}}\,v_0\;\longrightarrow\;0 \quad\text{as }\sigma\to\infty \tag{4-27}$$

Equation (4-27) shows how µ_Ĩ(x | σ) converges to I(x) as a function of the initial variance v0 and scale σ. The relationship in equation (4-27) is derived in Chapter 3.

Using multiscale statistics, it is also possible to show that Ĩ(x) is variance-ergodic. Moreover, a closed form for the convergence of the local variance measure µ^(2)_Ĩ(x | σ) can be derived. Consider the expected value of the local variance given a constant function I(x).

$$\begin{aligned}
M\bigl(\mu^{(2)}_{\tilde I}(x\,|\,\sigma)\bigr) &= G(\sigma,x)\otimes(I(x))^2 - \bigl(G(\sigma,x)\otimes I(x)\bigr)^2 + \Bigl(1-\tfrac{1}{2\sigma\sqrt{\pi}}\Bigr)v_0\\
&= G(\sigma,x)\otimes c^2 - \bigl(G(\sigma,x)\otimes c\bigr)^2 + \Bigl(1-\tfrac{1}{2\sigma\sqrt{\pi}}\Bigr)v_0\\
&= \int_{-\infty}^{\infty} G(\sigma,\tau)\,c^2\,d\tau - \left(\int_{-\infty}^{\infty} G(\sigma,\tau)\,c\,d\tau\right)^{\!2} + \Bigl(1-\tfrac{1}{2\sigma\sqrt{\pi}}\Bigr)v_0\\
&= c^2\int_{-\infty}^{\infty} G(\sigma,\tau)\,d\tau - c^2\left(\int_{-\infty}^{\infty} G(\sigma,\tau)\,d\tau\right)^{\!2} + \Bigl(1-\tfrac{1}{2\sigma\sqrt{\pi}}\Bigr)v_0\\
&= c^2 - c^2 + \Bigl(1-\tfrac{1}{2\sigma\sqrt{\pi}}\Bigr)v_0\\
&= \Bigl(1-\tfrac{1}{2\sigma\sqrt{\pi}}\Bigr)v_0
\end{aligned} \tag{4-28}$$

Equation (4-28) implies that as scale increases, the expected value of the multiscale variance approaches a constant value.

$$M\bigl(\mu^{(2)}_{\tilde I}(x\,|\,\sigma)\bigr) = \Bigl(1-\tfrac{1}{2\sigma\sqrt{\pi}}\Bigr)v_0\;\longrightarrow\;v_0 \quad\text{as }\sigma\to\infty \tag{4-29}$$

As scale decreases, µ^(2)_Ĩ(x | σ) becomes unstable. If σ < 1/(2√π), the expected value of µ^(2)_Ĩ(x | σ) is negative, an undesirable attribute for a measure of the variance of a random variable. However, in the context of discrete statistics, this is consistent with an attempt to compute central moments from small numbers of discrete samples. It is impossible to generalize the statistics of a population from a single sample. Estimating statistics from a fraction of a sample can yield nonsensical negative values for the variance and for all even order moments.
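The closed form of equation (4-28) is easy to confirm by simulation. The following sketch (illustrative only; v0, σ, and c are arbitrary, and periodic boundary handling stands in for an infinite domain) compares the sample mean of the multiscale variance of a constant-plus-noise signal against the prediction.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
v0, sigma, n = 4.0, 8.0, 200_000
noisy = 5.0 + rng.normal(0.0, np.sqrt(v0), n)   # constant image c = 5 plus white noise

# Multiscale local variance: mu2 = G*(I^2) - (G*I)^2
smoothed = gaussian_filter1d(noisy, sigma, mode='wrap')
mu2 = gaussian_filter1d(noisy**2, sigma, mode='wrap') - smoothed**2

predicted = (1.0 - 1.0 / (2.0 * sigma * np.sqrt(np.pi))) * v0   # equation (4-28)
print(mu2.mean(), predicted)   # the sample mean should sit near the prediction
```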


4.6.2. Noise Propagation in Multiscale Statistics of a Piecewise Ergodic Process

Most images are not ergodic in the strict sense; they contain discontinuities or boundaries denoting separate regions and objects within the image. If an image is a sample of a piecewise ergodic process, it is not possible to increase the aperture of a measurement operator to infinity without introducing bias from object boundaries. This section extends the previous discussion on the interaction between scale and noise to include boundary information.

Consider the simplest piecewise ergodic 1D image, a step function. Unlike the earlier pulse transfer function examples, which were chosen to reflect the symmetry of the multiscale statistical operators, this example uses a single step. The mathematics are more easily presented and the effects of the discontinuity remain clear with this type of input function. Let I(x) be a step function T(h, x), such that Ĩ(x) = (T(h,x) + ũ(x)) where

$$T(h,x) = \begin{cases} 0 & \text{if } x < 0\\ h & \text{if } x \ge 0 \end{cases}$$

If VxxVyy > Vxy², then the eigenvalues are both positive and real, so the covariance matrix is positive definite. If VxxVyy = Vxy², the rank of the covariance matrix is not full, and there is a single eigenvalue representing isotropically distributed variance. The eigenvalues λ1 and λ2 of the local covariance matrix are principal values revealing information regarding the shape and structure of local image geometry, in much the same way as the eigenvalues of the Hessian describe the intensity surface. λ1 and λ2 are invariant with respect to image rotation, translation, and zoom (the simultaneous multiplication of image resolution and measurement aperture). The eigenvectors u and v corresponding to λ1 and λ2, respectively, are

$$\mathbf{u} = \left(\frac{\lambda_1 - V_{yy}}{\sqrt{V_{xy}^2 + (\lambda_1 - V_{yy})^2}},\; \frac{V_{xy}}{\sqrt{V_{xy}^2 + (\lambda_1 - V_{yy})^2}}\right) \quad\text{and}\quad \mathbf{v} = \left(\frac{\lambda_2 - V_{yy}}{\sqrt{V_{xy}^2 + (\lambda_2 - V_{yy})^2}},\; \frac{V_{xy}}{\sqrt{V_{xy}^2 + (\lambda_2 - V_{yy})^2}}\right) \tag{5-21}$$

These vectors are unit vectors signifying the principal variance directions. The eigenvalues reflect the magnitudes of the expression of the directional variance in local image space. The vector u is the direction of maximum local variance at p, while v is the minimum variance direction. These vectors are orthogonal. The eigenvectors comprise the diagonalizing matrix K = (u, v). This matrix is a linear transformation, a rotation in this case, that aligns the sampling directions along the u and v directions. K diagonalizes the covariance matrix CI(p) in the following fashion.

$$K\,C_I(\mathbf{p})\,K^{T} = \begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2 \end{pmatrix} \tag{5-22}$$

The v direction is called the ergodic direction, indicating the direction in which the probability distribution of intensities shows the greatest ergodicity. λ2 is the variance sampled in the ergodic direction v, and represents the local directional variance measurement with the least influence of image geometry. This implies that λ2 is a reasonable measure of the variance of the local noise process since the contribution of local image structure has been minimized. λ2 can therefore be used to normalize directional statistics, enabling measurements that are invariant with respect to linear functions of intensity. In those locations where the distribution is isotropic with respect to space (i.e., there is no preferred direction), λ1 approximately equals λ2, the v direction becomes ambiguous, and isotropic sampling methods discussed in Chapter 4 are applicable.
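This eigenanalysis is simple enough to carry out in closed form at every pixel. The following sketch (not from the dissertation; it assumes the directional second-moment images Vxx, Vxy, and Vyy have already been measured at a common aperture σ) implements equation (5-21) together with the eigenvalue formulas of Appendix 5.A.

```python
import numpy as np

def principal_variance_directions(Vxx, Vxy, Vyy):
    """Per-pixel eigenanalysis of the 2x2 directional covariance matrix.

    Returns (lam1, lam2, u) where lam1 >= lam2 are the principal variances
    and u is the unit eigenvector of maximum variance (equation 5-21)."""
    trace = Vxx + Vyy
    disc = np.sqrt((Vxx - Vyy)**2 + 4.0 * Vxy**2)   # discriminant, always >= 0
    lam1 = 0.5 * (trace + disc)
    lam2 = 0.5 * (trace - disc)
    ux, uy = lam1 - Vyy, Vxy                         # unnormalized eigenvector for lam1
    norm = np.hypot(ux, uy)
    norm = np.where(norm > 0.0, norm, 1.0)           # isotropic pixels: direction ambiguous
    return lam1, lam2, np.stack([ux / norm, uy / norm], axis=-1)
```

The minimum variance (ergodic) direction v is simply the 90-degree rotation of the returned u.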


5.5. SVD Applied to Directional Multiscale Statistics of 2D Images

When applied to local covariance measures of 2D images, singular value decomposition should generate, as eigenvalues of the directional covariance matrix, two variance values: one showing the maximum influence of boundaries and other geometry within the image, and one local variance value that minimizes the influence of nearby boundaries. Essentially, these variance values are the result of linear transformations of the covariance matrix that indicate the directions of least and greatest impact of image geometry on directional variances.

The analysis described in Section 5.4 is easily applied to 2D image data. Consider the source image of Figure 5.1. The image values for the directional variances and covariance shown in Figure 5.2 can be simplified through singular value decomposition to generate eigenvalue images. Figure 5.3 shows the major and minor eigenvalues for the image in Figure 5.1, measured with an aperture of 2 pixels.

Figure 5.3. Eigenvalue images of the object from Figure 5.1, computed with a spatial aperture σ of 2 pixels. (a: λ1, b: λ2). In both of these images, black is zero and bright indicates positive values.

The eigenvalues are directional variances that have been subjected to linear transformations that maximize and minimize the influence of image geometry. The λ1 values in Figure 5.3a have a relatively large, constant response around the object boundary. The fluctuations of the directional variances in the Cartesian x and y directions from Figure 5.2 are no longer apparent. The λ2 image in Figure 5.3b shows a relatively constant value across the image. When queried, the values of the λ2 image are centered about the variance of the additive noise of the image. Numerical artifacts and effects of isophote curvature prevent a completely constant response in the λ2 image, but the predominant effect is a constant evaluation of image variance that minimizes the influence of first order elements of image structure.

Figure 5.4 shows the vector field associated with the eigenvector u corresponding to the larger eigenvalue λ1. These vectors reflect the direction in which there is maximum variance at each pixel. The lengths of the arrows show the relative magnitudes of the λ1 eigenvalues. Near the boundary of the teardrop, the directions of these vectors, as expected, are approximately orthogonal to the boundary. The set of v eigenvectors, corresponding to the smaller of the eigenvalues λ2, are perpendicular to the eigenvectors shown. Near the boundary of the teardrop the v eigenvectors are approximately parallel to the boundary.

Figure 5.4. Eigenvector image of the object from Figure 5.1, computed with a spatial aperture σ of 2 pixels. The image reflects only the eigenvector u in the direction of maximum variance at each pixel; the eigenvector v in the direction of minimum variance is perpendicular to the vectors shown. The lengths of the vector representations indicate relative magnitude, reflecting the maximum variance or eigenvalue.

5.6. Multiscale Gauge Coordinates of Image Statistics

In their descriptions of differential scale space representations, ter Haar Romeny, Florack, and Lindeberg each make use of gauge coordinates, a local coordinate frame formed by the isophote tangent and the gradient direction, describing a natural orthogonal basis at each location within the image (see Chapter 2, [ter Haar Romeny 1991ab], [Florack 1993], and [Lindeberg 1994b]). When scale space differential invariants are recast in gauge coordinates, the natural coordinate system creates a framework for easier interpretation. This simplification is reflected in a reduction in the complexity of the notation.

Eigenanalysis of directional multiscale covariances generates a similar coordinate frame, hereafter called the covariance gauge. As previously mentioned, the eigenvectors represent principal variance directions, are orthogonal, and make a natural coordinate frame for analyzing local variations of intensity. Notation in the covariance gauge coordinate system is greatly simplified over the pixel grid directions. The eigenvalues are defined to be the variance in the u and v directions, respectively.

$$V_{uu} = \frac{(V_{xx}+V_{yy}) + \sqrt{(V_{xx}+V_{yy})^2 - 4(V_{xx}V_{yy} - V_{xy}^2)}}{2} \tag{5-23a}$$

$$V_{vv} = \frac{(V_{xx}+V_{yy}) - \sqrt{(V_{xx}+V_{yy})^2 - 4(V_{xx}V_{yy} - V_{xy}^2)}}{2} \tag{5-23b}$$

$$V_{uv} = 0 \tag{5-23c}$$

The magnitudes of the gauge, measured through the eigenvalues of the singular value decomposition, provide a metric by which other geometric invariants may be normalized. For example, scale-space gradients may be normalized by the square root of the minimum variance at a pixel location, making them invariant with respect to linear functions of intensity. The eigenvalues of the covariance matrix and the gradient magnitude increase and decrease proportionally under linear functions of intensity; their quotient is therefore an invariant with respect to linear shifts in intensity.

The covariance gauge exists only if the covariance gauge conditions are met. The derivative-based gauge coordinates of ter Haar Romeny and Florack exist only if the image intensity gradient exists and is unique [ter Haar Romeny 1991ab]. The covariance gauge exists if there are distinct principal variance directions. That is, it exists if unique eigenvectors can be found for the covariance matrix. This condition, the property that the covariance matrix is diagonalizable, is called the covariance gauge condition and is a generic property of images.

5.7. Invariants of Directional Multiscale Moments

A stated goal of the development of local directional central moments was that their measurement be made invariant with respect to spatial rotation, translation, and zoom (the combined magnification or minification of measurement aperture and image or pixel scale), as well as invariant with respect to linear functions of intensity. Up to this point, this chapter has been a progression of steps demonstrating local directional statistics of image intensity, with each step incrementally showing additional invariances of the second-order local directional central moments of intensity. The local directional covariance matrix is invariant with respect to spatial translation, and after rotation to the covariance gauge, it is invariant to rotation. However, it is not invariant with respect to linear functions of intensity. This section describes invariant measures of local image statistics that have all the desired invariances with respect to changes in space as well as to linear changes in intensity.

Local image measurements that are invariant with respect to linear functions of intensity have been calculated before using quantities from differential geometry, such as the windowed second moment matrix. Lindeberg uses the windowed second moment matrix to measure and approximate adjustments to his scale parameters, allowing him to steer his sampling aperture. The result is that the size and orientation of his anisotropic Gaussian sampling aperture becomes a function of image intensity, based on anisotropy measurements made from the windowed second moment matrix [Lindeberg 1994a]. Using these techniques, he achieves some startling results in shape from texture algorithms.

While the covariance matrix CI(p) derived in this chapter is distinct from the windowed second moment matrix prominent in Lindeberg's shape from texture analysis and Weickert's texture based nonlinear diffusion [Weickert 1995], it shares many of the same properties. In particular, some of the same invariants presented by Lindeberg regarding the windowed second moment matrix apply to the directional multiscale covariance matrix. Unlike Lindeberg's local second moment matrix, no adjustments for anisotropic Gaussian sampling kernels are necessary. The difficulty that arises due to the lack of independence between the zeroth order sampling aperture and the first derivative that exists in the windowed second moment matrix does not apply to the local directional covariance matrix. A single scale parameter can be used in the local covariance matrix CI(p), simplifying the construction of statistical invariants. Adapting from Lindeberg [Lindeberg 1994a], consider the following image descriptors:

$$P = V_{xx} + V_{yy}, \qquad C = V_{xx} - V_{yy}, \qquad S = 2V_{xy} \tag{5-24}$$

P = Trace(CI(p)) can be interpreted as a measure of the strength of the variance response. The other two descriptors, C and S, contain directional information, which can be summarized into Q̂, a normalized measure of local anisotropy.

$$Q = \sqrt{C^2 + S^2} \qquad\text{and}\qquad \hat Q = \frac{Q}{P} \tag{5-25}$$

Note that $Q = \sqrt{(V_{xx}+V_{yy})^2 - 4(V_{xx}V_{yy} - V_{xy}^2)}$, the square root of the discriminant of the eigenvalue equation. As indicated in Section 5.4, if Q = 0, there is only one eigenvalue and no preferred eigendirection, implying an isotropic distribution of image intensities. Q is therefore interpreted as a measure of local image anisotropy. Q is normalized by Trace(CI(p)) to generate a dimensionless statistic. Clearly, for the directional covariance based measure of normalized anisotropy, Q̂ ∈ [0,1] (if the Cauchy-Schwarz assumption holds), and

1. Q̂ = 0 if and only if Vxx = Vyy and Vxy = 0.
2. Q̂ = 1 if and only if VxxVyy = Vxy².

Note that the Q̂ statistic is invariant with respect to spatial translation and with respect to linear functions of intensity. Figure 5.5a is a test image where the figures in the foreground and the background pixels have the same mean intensity (0) and the same variance (6.25). The difference between the two region types is that the foreground has a directional component, not necessarily aligned with one of the Cartesian directions. Figure 5.5b is the test image processed for anisotropy. The normalized value of Q̂ is portrayed, with values ranging from 0.003 in the background to 0.89 in the foreground.

Values of Q̂ close to 1 indicate strong local anisotropy. Values near 0 indicate an isotropic distribution of noise and geometry. Both of these effects are seen in the example above. The measurements reflected in Figure 5.5b show that within the object boundaries, strong anisotropy is detected, with Q̂ near 0.9. The background measures in Figure 5.5b are near 0, reflecting no directional preference, an isotropic distribution of intensities.
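The Q̂ statistic of equations (5-24) and (5-25) reduces to a few array operations. The following sketch (illustrative, assuming the directional variance images are already available) computes it per pixel; the eps guard is an implementation detail, not part of the definition.

```python
import numpy as np

def anisotropy(Vxx, Vxy, Vyy, eps=1e-12):
    """Normalized local anisotropy Q-hat of equations (5-24) and (5-25)."""
    P = Vxx + Vyy                   # strength of the variance response
    C = Vxx - Vyy
    S = 2.0 * Vxy
    Q = np.sqrt(C**2 + S**2)        # square root of the eigenvalue discriminant
    return Q / np.maximum(P, eps)   # dimensionless; in [0, 1] for valid covariances
```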

Figure 5.5. (a) Test figure exhibiting significant directional spatial correlation and (b) the local anisotropy statistic Q̂ where σ = 3. In both images, the raster resolution is 256 × 256.

5.8. Multiscale Directional Statistics of Multivalued Images

A continuing feature of this research has been the generalization of scalar techniques for use in multivalued image applications. Directional statistics are expected to have their greatest impact when applied to multivalued images, because principled tools for the directional analysis of multiple incommensurable values per pixel via differential geometric operators are not available. The premise of this work is that images with incommensurable values within each pixel can be studied by exploring the local intensity distributions through multiscale central moments (Section 4.6) and then making the various intensity values at a pixel commensurable via their observed covariances.

To compensate for spatial biases in ergodicity introduced by local image geometry, the eigenvalues of the multivalued directional local covariance matrix, or the equivalent calculated correlations, are required. As described in the following section, these values correlate the magnitudes of the directional covariances of the local intensity values. The geometry of these correlations, their trends and changes over space, can be used to analyze image structure even when the original values of the image are incommensurable.

Recall that in the scalar-valued case, the λ2 statistic characterizes the probability distribution of the local noise. The directional covariance matrix of an image function I(p) is calculated and simplified using singular value decomposition (SVD). The covariance matrix CI(p) is rotated into a coordinate frame designated by the direction vectors u and v in the linear operator K.

$$K\,C_I(\mathbf{p})\,K^{T} = \begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2 \end{pmatrix} \tag{5-26}$$

K is oriented to maximize λ1 = µ^(2)_{I,uu}(p | σ), the variance sampled in the u direction. The v direction is orthogonal to u and indicates the sampling direction of minimum variance, which I call the ergodic direction. Near object boundaries, v is the direction in which the probability distribution of intensities shows the greatest ergodicity. λ2 = µ^(2)_{I,vv}(p | σ) is the variance sampled in the ergodic direction v. λ2 represents the local directional variance measurement with the least influence of image geometry. This implies that λ2 is a reasonable measure of the variance of the local noise process.

The methods of singular value decomposition are insufficient when the components of the covariance matrix are not scalar quantities. The calculation of correlations among multivalued directional statistics requires the use of canonical correlation analysis. While SVD will reduce a matrix of scalar values, canonical correlation analysis can be applied to partitioned or block matrices (a matrix of matrices or other tensors). Canonical correlation analysis can therefore be used to analyze random variables containing multiple values each, while singular value decomposition is used to calculate covariances among random variables with scalar components. The two methods are related and can be shown to generate identical results when analyzing covariances of scalar-valued random variables.

5.8.1. Canonical Correlation Analysis of Multivalued Directional Statistics

Canonical correlation analysis is a statistical approach for simplifying a symmetric matrix to its principal components. This section applies canonical correlation analysis to multivalued directional statistics. This use of canonical correlation analysis is a simplified adaptation of Arnold's description of these techniques [Arnold 1981].

Consider the case of a 2D two-valued image. Let I(p) be defined over R² such that I(p) = (I1(p), I2(p)), with p = (px, py) ∈ R². Let the I1(p) values sampled in the x-direction and the I1(p) values sampled in the y-direction be considered as separate elements of a two-valued random variable, (I1,x(p), I1,y(p)). Similarly, let the I2(p) values sampled in the x-direction and the I2(p) values sampled in the y-direction be considered as separate elements of a two-valued random variable, (I2,x(p), I2,y(p)).

The analysis of each of these multivalued random variable pairs requires the calculation of a linear transformation or rotation GI (where I = (I1, I2)) that decorrelates Ix(p) and Iy(p). Two matrices, G1 and G2, comprise GI. G1 and G2 are closely related to their counterpart K in the scalar calculations of singular value decomposition shown above. However, G1 and G2 not only decorrelate the directional variations of the within-variable elements but, when applied to the cross covariance matrix, also decorrelate the two-valued directional variances across the multiple elements.

Let Σ be the covariance matrix of (I1,x(p), I1,y(p)) and (I2,x(p), I2,y(p)). That is,

$$\Sigma = \begin{pmatrix} \Sigma_{I_1 I_1} & \Sigma_{I_1 I_2}\\ \Sigma_{I_2 I_1} & \Sigma_{I_2 I_2} \end{pmatrix} \tag{5-27}$$

Σ_{I1,I1}, Σ_{I1,I2}, Σ_{I2,I1}, and Σ_{I2,I2} are all 2 × 2 matrices, with Σ_{I1,I2} = Σ_{I2,I1}^T. Σ_{I1,I1} and Σ_{I2,I2} are the familiar directional covariance matrices applied to the separate intensity values.

$$\Sigma_{I_1,I_1} = C_{I_1}(\mathbf{p}) = \begin{pmatrix} V_{I_1,xx} & V_{I_1,xy}\\ V_{I_1,xy} & V_{I_1,yy} \end{pmatrix} = \begin{pmatrix} \mu^{(2)}_{I_1,xx}(\mathbf{p}\,|\,\sigma) & \mu^{(2)}_{I_1,xy}(\mathbf{p}\,|\,\sigma)\\ \mu^{(2)}_{I_1,xy}(\mathbf{p}\,|\,\sigma) & \mu^{(2)}_{I_1,yy}(\mathbf{p}\,|\,\sigma) \end{pmatrix} \tag{5-28a}$$

$$\Sigma_{I_2,I_2} = C_{I_2}(\mathbf{p}) = \begin{pmatrix} V_{I_2,xx} & V_{I_2,xy}\\ V_{I_2,xy} & V_{I_2,yy} \end{pmatrix} = \begin{pmatrix} \mu^{(2)}_{I_2,xx}(\mathbf{p}\,|\,\sigma) & \mu^{(2)}_{I_2,xy}(\mathbf{p}\,|\,\sigma)\\ \mu^{(2)}_{I_2,xy}(\mathbf{p}\,|\,\sigma) & \mu^{(2)}_{I_2,yy}(\mathbf{p}\,|\,\sigma) \end{pmatrix} \tag{5-28b}$$

The covariance matrix between the two image intensity values and their directional components, Σ_{I1,I2}, is 2 × 2 and corresponds to

$$\Sigma_{I_1,I_2} = \begin{pmatrix} \mu^{(2)}_{I_1,x;I_2,x}(\mathbf{p}\,|\,\sigma) & \mu^{(2)}_{I_1,x;I_2,y}(\mathbf{p}\,|\,\sigma)\\ \mu^{(2)}_{I_1,y;I_2,x}(\mathbf{p}\,|\,\sigma) & \mu^{(2)}_{I_1,y;I_2,y}(\mathbf{p}\,|\,\sigma) \end{pmatrix} \tag{5-29}$$

Assuming that Σ_{I1,I1} and Σ_{I2,I2} are reducible via singular value decomposition, there exist K1 and K2, invertible 2 × 2 diagonalizing matrices, such that

$$K_1\,\Sigma_{I_1,I_1}\,K_1^{T} = \begin{pmatrix} \lambda_{1,I_1} & 0\\ 0 & \lambda_{2,I_1} \end{pmatrix} = \begin{pmatrix} \mu^{(2)}_{I_1,u_1u_1}(\mathbf{p}\,|\,\sigma) & 0\\ 0 & \mu^{(2)}_{I_1,v_1v_1}(\mathbf{p}\,|\,\sigma) \end{pmatrix} \quad\text{and}\quad K_2\,\Sigma_{I_2,I_2}\,K_2^{T} = \begin{pmatrix} \lambda_{1,I_2} & 0\\ 0 & \lambda_{2,I_2} \end{pmatrix} = \begin{pmatrix} \mu^{(2)}_{I_2,u_2u_2}(\mathbf{p}\,|\,\sigma) & 0\\ 0 & \mu^{(2)}_{I_2,v_2v_2}(\mathbf{p}\,|\,\sigma) \end{pmatrix} \tag{5-30}$$

As in the case of scalar-valued images, K1 rotates Σ_{I1,I1} onto the u1-v1 coordinate frame. K1 is oriented to maximize the eigenvalue λ_{1,I1} = µ^(2)_{I1,u1u1}(p | σ), the variance sampled in the u1 direction at I1(p). v1 is orthogonal to u1, and the eigenvalue λ_{2,I1} = µ^(2)_{I1,v1v1}(p | σ), the variance sampled along the v1 direction, is the minimum variance with respect to orientation. Likewise, K2 rotates Σ_{I2,I2} onto the u2-v2 coordinate frame, with eigenvalues λ_{1,I2} and λ_{2,I2} reflecting the maximum and minimum variance measurements with respect to changes in orientation.

Choose G1 and G2 such that

$$G_1 = \begin{pmatrix} \lambda_{1,I_1}^{-1/2} & 0\\ 0 & \lambda_{2,I_1}^{-1/2} \end{pmatrix} K_1, \qquad G_1^{T} = K_1^{T} \begin{pmatrix} \lambda_{1,I_1}^{-1/2} & 0\\ 0 & \lambda_{2,I_1}^{-1/2} \end{pmatrix}$$

and

$$G_2 = \begin{pmatrix} \lambda_{1,I_2}^{-1/2} & 0\\ 0 & \lambda_{2,I_2}^{-1/2} \end{pmatrix} K_2, \qquad G_2^{T} = K_2^{T} \begin{pmatrix} \lambda_{1,I_2}^{-1/2} & 0\\ 0 & \lambda_{2,I_2}^{-1/2} \end{pmatrix} \tag{5-31}$$

G1 and G2 diagonalize and normalize the individual directional covariance matrices.

$$G_1\,\Sigma_{I_1,I_1}\,G_1^{T} = I \qquad\qquad G_2\,\Sigma_{I_2,I_2}\,G_2^{T} = I \tag{5-32}$$

G1 and G2 diagonalize the directional covariance matrices of the individual intensity values. They can also be shown to diagonalize the cross-intensity covariance matrix. The resulting values along the diagonal, δ1 and δ2, are the correlation coefficients between the maximum and minimum directional variances of the corresponding intensity values I1(p) and I2(p).

$$G_1\,\Sigma_{I_1,I_2}\,G_2^{T} = D = \begin{pmatrix} \delta_1 & 0\\ 0 & \delta_2 \end{pmatrix} \tag{5-33}$$

Taken together, equations (5-31), (5-32), and (5-33) specify the canonical correlations δ1 and δ2 and the linear transformations G1 and G2 that incorporate the local image geometry. G1 and G2 can be chosen so that δ1 and δ2 are both non-negative. Aside from the possibility of multiple roots (i.e., δ1 = δ2), δ1 and δ2 are unique. Under the ergodic assumption, G1 and G2 specify the directions for the multivalued covariance gauge.

5.8.2. Understanding Canonical Correlations of Multivalued Directional Statistics

Singular value decomposition of the directional covariance matrices of the individual intensity values determines the variance value λ2 and the sampling direction v of minimum influence of image geometry on local image statistics. In order to complete the picture of the local probability distributions about a pixel in a 2-valued image, it is essential to understand how the two values covary in the minimum variance direction. The canonical correlation δ2 provides that measure. Taken together, the minimum eigenvalues of the individual directional covariance matrices and δ2 (the covariance between these two terms) can be used to describe a two-variable probability distribution. δ2 is easily rewritten as the correlation coefficient between I1 sampled in the v1 direction and I2 sampled in the v2 direction.

$$\delta_2 = \frac{\mu^{(2)}_{I_1,v_1;\,I_2,v_2}}{\sqrt{\lambda_{2,I_1}\,\lambda_{2,I_2}}} \tag{5-34}$$

In those locations where there is no directional bias, K1 and K2 become ambiguous, and the isotropic sampling methods discussed in Chapter 4 are applicable. Near boundaries, the anisotropic spatial distribution of image intensities will align K1 and K2. If strong spatial anisotropy exists, K1 ≈ K2, implying u1 ≈ u2 and v1 ≈ v2. For 2-valued 2D images, the directional covariance matrix that reflects the minimum influence of image geometry on the local joint-intensity probability distribution is

$$\Lambda_2 = \begin{pmatrix} \lambda_{2,I_1} & \delta_2\sqrt{\lambda_{2,I_1}\lambda_{2,I_2}}\\ \delta_2\sqrt{\lambda_{2,I_1}\lambda_{2,I_2}} & \lambda_{2,I_2} \end{pmatrix} \tag{5-35}$$

This statistic, like the eigenvalue λ2 in the context of scalar-valued images, can be used to normalize multivalued measurements. These normalized measurements are invariant with respect to linear functions of intensity applied to the separate intensity values that comprise the image. This normalization provides a common metric for comparing the multiple parameters within pixels, enabling comparisons among incommensurable values.

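The construction of equations (5-31) through (5-33) can be sketched numerically. The following is a minimal illustration (not from the dissertation), assuming the 2 × 2 covariance blocks of equation (5-27) have already been measured at a pixel; it uses the standard whitening-plus-SVD route to canonical correlation analysis, whose singular values are the canonical correlations δ1 and δ2.

```python
import numpy as np

def canonical_correlations(S11, S12, S22):
    """Canonical correlations of two 2-valued random variables from their
    covariance blocks (cf. equations 5-31 through 5-33).

    S11, S22: within-variable 2x2 covariance matrices (symmetric, positive definite)
    S12:      2x2 cross-covariance matrix
    Returns (deltas, G1, G2) with G1 @ S11 @ G1.T = I, G2 @ S22 @ G2.T = I,
    and G1 @ S12 @ G2.T diagonal with the correlations on its diagonal."""
    def inv_sqrt(S):                      # S^(-1/2) via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    W1, W2 = inv_sqrt(S11), inv_sqrt(S22)
    U, deltas, Vt = np.linalg.svd(W1 @ S12 @ W2)
    return deltas, U.T @ W1, Vt @ W2      # whiten, then rotate to diagonalize
```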

5.9. Covariance between Image Intensity and Space

As suggested in the introduction to this chapter, an interesting development arises when canonical correlation analysis is applied to the way in which image space and image intensities covary. If the intensity values at location p are considered to be random variables and space is also considered to be a random variable, a canonical correlation analysis can be performed on the resulting space of variables to orthogonalize their relationship. That is, canonical correlation analysis will yield a linear transformation that diagonalizes the covariance matrix describing the space of intensities and their spatial elements. It will also show the correlation between space, geometry, and noise for a given spatial location.

5.9.1. Directional Analysis of 2D Scalar Images

This section applies canonical correlation analysis directly to scalar images of two dimensions. Consider a two dimensional image I(p) with one intensity value per pixel. That is, I(p) is defined over R¹ where p = (px, py) ∈ R². Following the formula for canonical correlation analysis described above, and assigning the expectation operation to be a weighted spatial average, a relationship can be established between the isotropic variance and the multiscale image gradient.

For each location p0 and its surrounding Gaussian-weighted neighborhood, analyze the correlations between pixels, viewing each pixel as a sample of a multivalued random variable. Consider the pixel location p = (x, y) to be a property of a pixel in the manner of a random variable. Also let the intensity value I(p) be treated as a random variable attached to the pixel. Given the neighborhood locus about p0, denote the mean of I(p) relative to p0 to be µI(p0, σ), a Gaussian-windowed average (with aperture σ) of intensities centered at p0. Treating space as a random variable, denote the mean of local space to be the central point p0. Let Σ be the spatially weighted joint covariance matrix of I(p) and p about pixel p0. That is,

$$\Sigma = \begin{pmatrix} \Sigma_{I,I} & \Sigma_{I,\mathbf{p}}\\ \Sigma_{\mathbf{p},I} & \Sigma_{\mathbf{p},\mathbf{p}} \end{pmatrix} \tag{5-36}$$

where Σ_{I,I} is 1 × 1 and corresponds to the local isotropic variance function. That is,

$$\Sigma_{I,I} = \mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma) = G(\sigma,\mathbf{p})\otimes(\tilde I(\mathbf{p}))^2 - \bigl(G(\sigma,\mathbf{p})\otimes\tilde I(\mathbf{p})\bigr)^2 \tag{5-37}$$

This function is described in greater detail in Chapter 4. Σ_{p,p} is 2 × 2 and corresponds to


$$\begin{aligned}
\Sigma_{\mathbf{p},\mathbf{p}} &= \begin{pmatrix} \overline{(x-p_x)^2} & \overline{(x-p_x)(y-p_y)}\\ \overline{(x-p_x)(y-p_y)} & \overline{(y-p_y)^2} \end{pmatrix}\\[4pt]
&= \begin{pmatrix} \dfrac{1}{2\pi\sigma^2}\displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \tau^2\,e^{-\frac{1}{2}\frac{\tau^2+\nu^2}{\sigma^2}}\,d\nu\,d\tau & \dfrac{1}{2\pi\sigma^2}\displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \tau\nu\,e^{-\frac{1}{2}\frac{\tau^2+\nu^2}{\sigma^2}}\,d\nu\,d\tau\\[4pt] \dfrac{1}{2\pi\sigma^2}\displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \tau\nu\,e^{-\frac{1}{2}\frac{\tau^2+\nu^2}{\sigma^2}}\,d\nu\,d\tau & \dfrac{1}{2\pi\sigma^2}\displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \nu^2\,e^{-\frac{1}{2}\frac{\tau^2+\nu^2}{\sigma^2}}\,d\nu\,d\tau \end{pmatrix}\\[4pt]
&= \begin{pmatrix} \sigma^2 & 0\\ 0 & \sigma^2 \end{pmatrix}
\end{aligned} \tag{5-38}$$

Σ_{I,p} is 1 × 2, and Σ_{I,p} = (Σ_{p,I})^T.

$$\begin{aligned}
\Sigma_{\mathbf{p},I} &= \begin{pmatrix} \overline{(x-p_x)\bigl(\tilde I(x,y)-\mu_{\tilde I}(\mathbf{p})\bigr)}\\[2pt] \overline{(y-p_y)\bigl(\tilde I(x,y)-\mu_{\tilde I}(\mathbf{p})\bigr)} \end{pmatrix}\\[4pt]
&= \begin{pmatrix} \dfrac{1}{2\pi\sigma^2}\displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} -\tau\,e^{-\frac{1}{2}\frac{\tau^2+\nu^2}{\sigma^2}}\,\bigl(\tilde I(p_x-\tau,\,p_y-\nu)-\mu_{\tilde I}(p_x,p_y)\bigr)\,d\nu\,d\tau\\[4pt] \dfrac{1}{2\pi\sigma^2}\displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} -\nu\,e^{-\frac{1}{2}\frac{\tau^2+\nu^2}{\sigma^2}}\,\bigl(\tilde I(p_x-\tau,\,p_y-\nu)-\mu_{\tilde I}(p_x,p_y)\bigr)\,d\nu\,d\tau \end{pmatrix}\\[4pt]
&= \begin{pmatrix} \sigma^2\,\dfrac{\partial}{\partial x}\bigl(G(\sigma,\mathbf{p})\otimes\tilde I(\mathbf{p})\bigr)\\[4pt] \sigma^2\,\dfrac{\partial}{\partial y}\bigl(G(\sigma,\mathbf{p})\otimes\tilde I(\mathbf{p})\bigr) \end{pmatrix} = \sigma^2\,\nabla\bigl(\tilde I(\mathbf{p})\,|\,\sigma\bigr)
\end{aligned} \tag{5-39}$$

The next step is to find δ, the root of

$$\mathrm{Det}\begin{pmatrix} -\delta\,\Sigma_{I,I} & \Sigma_{I,\mathbf{p}}\\ \Sigma_{\mathbf{p},I} & -\delta\,\Sigma_{\mathbf{p},\mathbf{p}} \end{pmatrix} = 0 \tag{5-40}$$

δ is the correlation between I(p) and p in the Gaussian neighborhood about pixel p0. Computing (5-40) generates

$$\begin{aligned}
\mathrm{Det}\begin{pmatrix} -\delta\,\Sigma_{I,I} & \Sigma_{I,\mathbf{p}}\\ \Sigma_{\mathbf{p},I} & -\delta\,\Sigma_{\mathbf{p},\mathbf{p}} \end{pmatrix} &= \mathrm{Det}\begin{pmatrix} -\delta\sigma^2 & 0\\ 0 & -\delta\sigma^2 \end{pmatrix}\left(-\delta\,\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma) - \sigma^2\bigl(\nabla(\tilde I(\mathbf{p})\,|\,\sigma)\bigr)^{T}\begin{pmatrix} -\frac{1}{\delta\sigma^2} & 0\\ 0 & -\frac{1}{\delta\sigma^2} \end{pmatrix}\sigma^2\,\nabla\bigl(\tilde I(\mathbf{p})\,|\,\sigma\bigr)\right)\\
&= \delta^2\sigma^4\left(-\delta\,\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma) + \frac{\sigma^2}{\delta}\,\bigl\|\nabla(\tilde I(\mathbf{p})\,|\,\sigma)\bigr\|^2\right)\\
&= \delta\sigma^6\left(\Bigl(\bigl(\tfrac{\partial}{\partial x}G(\sigma,\mathbf{p})\bigr)\otimes\tilde I(\mathbf{p})\Bigr)^{2} + \Bigl(\bigl(\tfrac{\partial}{\partial y}G(\sigma,\mathbf{p})\bigr)\otimes\tilde I(\mathbf{p})\Bigr)^{2}\right) - \delta^3\sigma^4\,\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma)
\end{aligned} \tag{5-41}$$

Combining equations (5-40) and (5-41) yields

$$\begin{aligned}
\delta\sigma^6\,\bigl\|\nabla(\tilde I(\mathbf{p})\,|\,\sigma)\bigr\|^2 - \delta^3\sigma^4\,\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma) &= 0\\
\delta\sigma^6\,\bigl\|\nabla(\tilde I(\mathbf{p})\,|\,\sigma)\bigr\|^2 &= \delta^3\sigma^4\,\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma)\\
\sigma^2\,\bigl\|\nabla(\tilde I(\mathbf{p})\,|\,\sigma)\bigr\|^2 &= \delta^2\,\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma)\\
\frac{\sigma^2\,\bigl\|\nabla(\tilde I(\mathbf{p})\,|\,\sigma)\bigr\|^2}{\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma)} &= \delta^2\\
\frac{\sigma\,\bigl\|\nabla(\tilde I(\mathbf{p})\,|\,\sigma)\bigr\|}{\sqrt{\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma)}} &= \delta
\end{aligned} \tag{5-42}$$

Let G and H be invertible 1 × 1 and 2 × 2 matrices, respectively, such that

$$G\,\Sigma_{I,I}\,G^{T} = I \qquad H\,\Sigma_{\mathbf{p},\mathbf{p}}\,H^{T} = I \qquad G\,\Sigma_{I,\mathbf{p}}\,H^{T} = \begin{pmatrix} D & 0 \end{pmatrix} \qquad D = \delta \tag{5-43}$$
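As a numerical aside (a sketch, not part of the dissertation's derivation), the closed form of equation (5-42) can be evaluated at every pixel of a discrete image, with the scale-space gradient approximated by finite differences of the Gaussian-smoothed image; the explicit G and H used in the text follow below.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def delta_map(image, sigma):
    """Correlation between intensity and space per equation (5-42):
    delta = sigma * |grad(I | sigma)| / sqrt(mu2(I | sigma))."""
    smoothed = gaussian_filter(image, sigma)
    # Isotropic local variance, equation (5-37): G*(I^2) - (G*I)^2
    mu2 = gaussian_filter(image**2, sigma) - smoothed**2
    gy, gx = np.gradient(smoothed)          # finite-difference scale-space gradient
    return sigma * np.hypot(gx, gy) / np.sqrt(np.maximum(mu2, 1e-12))
```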

In the current example, it is easily shown that

$$G = \frac{1}{\sqrt{\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma)}} \qquad\text{and}\qquad H = \begin{pmatrix} \frac{\cos\theta}{\sigma} & \frac{\sin\theta}{\sigma}\\ -\frac{\sin\theta}{\sigma} & \frac{\cos\theta}{\sigma} \end{pmatrix}$$

so that

$$G\,\Sigma_{I,\mathbf{p}}\,H^{T} = \frac{1}{\sqrt{\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma)}}\;\sigma^2\bigl(\nabla(\tilde I(\mathbf{p})\,|\,\sigma)\bigr)^{T} H^{T} = \begin{pmatrix} \delta & 0 \end{pmatrix} \tag{5-44}$$

where θ is the angle between the x axis and the gradient direction.

5.9.2. Canonical Correlation Analysis versus Differential Operators

This analysis shows the relationship between the scale space gradient direction, the scale space gradient magnitude, and the isotropic multiscale central moment. Specifically, for every position p = (px, py) in an image, there exist canonical variables throughout the neighborhood about p. That is, given image I(p), for any image position (x, y), there is a corresponding set of canonical variables A and B such that

$$A = \frac{I(x,y) - \mu_{\tilde I}(\mathbf{p}\,|\,\sigma)}{\sqrt{\mu^{(2)}_{\tilde I}(\mathbf{p}\,|\,\sigma)}} \qquad\qquad B = \begin{pmatrix} \frac{\cos\theta}{\sigma} & \frac{\sin\theta}{\sigma}\\ -\frac{\sin\theta}{\sigma} & \frac{\cos\theta}{\sigma} \end{pmatrix} \begin{pmatrix} x - p_x\\ y - p_y \end{pmatrix} \tag{5-45}$$

Thus, any measurement made at position p uses a normalized multilocal coordinate frame, or normalized gauge coordinate system. It is also evident that these measurements are made relative to the mean and variance of the intensity of the local Gaussian neighborhood. The correlation between any A and B value is given by δ derived in equation (5-13). These observations mirror similar methods to normalize and linearize scale space measurements. In particular, the use of gauge coordinates based on the image gradient has been proposed by Koenderink and refined for image analysis by ter Haar Romeny [ter Haar Romeny 1991ab]. Scale space normalization has been explored by many researchers, including Eberly [Eberly 1994] and, to a lesser extent, Florack and ter Haar Romeny [Florack 1993][ter Haar Romeny 1993]. That these assertions can be derived through statistical methods lends credence to the relationship between the roots of this work in differential geometry and multiscale statistics.

These observations are included here for their relevance in relating the works of Chapter 3 and Chapter 4 in the setting of local directional image statistics. Further exploration into these relationships is necessary, especially in the realm of multivalued images. However, this dissertation is directed toward local isotropic and directional central moments of intensity, and a study of the canonical correlations among multivalued intensities and space is beyond the scope of this research.

5.10. Summary

This chapter has explored the derivation of directional multiscale image statistics. The derived statistical operators have been developed to a power comparable to that of operators based on differential geometry, with the added capability of being able to analyze multivalued images. Joint second order central moments have been produced, and their properties discussed. A covariance matrix of these central moments was constructed and statistical tools adapted to analyze its properties. These statistics demonstrate invariance under

(1) rotation,
(2) translation,
(3) scale, and
(4) linear functions of intensity.

Some example images were presented, demonstrating the application of these principles in a discrete image based setting. These multiscale directional central moments were generalized to images of multiple values and an analysis proposed using canonical correlation analysis. Finally, some observations on applying canonical correlation analysis directly to scalar images have been presented, indicating directions for future work.


5.A. Appendix: Singular Value Decomposition of a 2 × 2 Symmetric Matrix

These are some fundamentals of linear algebra used in this chapter. The analyses shown in Section 5.4 presume familiarity with singular value decomposition (SVD), the diagonalization of symmetric matrices, and the related extraction of their eigenvectors and eigenvalues. Eigenvalues of covariance matrices are required in earlier presentations. Given a 2 × 2 symmetric matrix A of rank 2,

$$A = \begin{pmatrix} u & w\\ w & v \end{pmatrix}$$

1) A = A^T.
2) A is diagonalizable.
3) The eigenvalues of A exist and are real.
4) The corresponding eigenvectors are (or can be chosen to be) orthogonal.
5) The characteristic equation of A is

$$\lambda^2 - \mathrm{Trace}(A)\,\lambda + \mathrm{Determinant}(A) = 0 \;\Leftrightarrow\; \lambda^2 - (u+v)\lambda + (uv - w^2) = 0 \tag{5A-1}$$

6) Solving for the eigenvalues using the quadratic formula,

$$ax^2 + bx + c = 0 \;\Rightarrow\; x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{5A-2}$$

$$\lambda_1 = \frac{(u+v) + \sqrt{(u+v)^2 - 4(uv - w^2)}}{2}, \qquad \lambda_2 = \frac{(u+v) - \sqrt{(u+v)^2 - 4(uv - w^2)}}{2} \tag{5A-3}$$

7) The eigenvectors are the solutions to the following equations:

$$[\lambda_1 I - A]\begin{pmatrix} x_1\\ y_1 \end{pmatrix} = \begin{pmatrix} 0\\ 0 \end{pmatrix} \qquad\qquad [\lambda_2 I - A]\begin{pmatrix} x_2\\ y_2 \end{pmatrix} = \begin{pmatrix} 0\\ 0 \end{pmatrix}$$

$$\begin{aligned} x_1(\lambda_1 - u) - y_1 w &= 0 & x_2(\lambda_2 - u) - y_2 w &= 0\\ y_1(\lambda_1 - v) - x_1 w &= 0 & y_2(\lambda_2 - v) - x_2 w &= 0 \end{aligned} \tag{5A-4}$$

Solving the above equations yields the following values for the orthonormal bases of the corresponding eigenspaces:

$$x_1 = \frac{\lambda_1 - v}{\sqrt{w^2 + (\lambda_1 - v)^2}} \qquad y_1 = \frac{w}{\sqrt{w^2 + (\lambda_1 - v)^2}} \qquad x_2 = \frac{\lambda_2 - v}{\sqrt{w^2 + (\lambda_2 - v)^2}} \qquad y_2 = \frac{w}{\sqrt{w^2 + (\lambda_2 - v)^2}} \tag{5A-5}$$
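The closed forms of equations (5A-3) and (5A-5) translate directly into code. The following small sketch (illustrative; it assumes w ≠ 0 so the eigenvector normalization is well defined) checks the formulas against numpy's eigensolver.

```python
import numpy as np

def eig2x2_symmetric(u, w, v):
    """Closed-form eigensystem of [[u, w], [w, v]] per (5A-3) and (5A-5)."""
    disc = np.sqrt((u + v)**2 - 4.0 * (u * v - w**2))   # = sqrt((u-v)^2 + 4w^2) >= 0
    lam1 = 0.5 * ((u + v) + disc)
    lam2 = 0.5 * ((u + v) - disc)
    n1 = np.hypot(w, lam1 - v)
    n2 = np.hypot(w, lam2 - v)
    e1 = np.array([(lam1 - v) / n1, w / n1])
    e2 = np.array([(lam2 - v) / n2, w / n2])
    return lam1, lam2, e1, e2

lam1, lam2, e1, e2 = eig2x2_symmetric(3.0, 1.0, 2.0)
w_np, V_np = np.linalg.eigh(np.array([[3.0, 1.0], [1.0, 2.0]]))
print(np.allclose(sorted([lam1, lam2]), w_np))          # matches the library result
```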

I have had my solutions for a long time, but I do not yet know how I am to arrive at them. -C. F. Gauss

Chapter 6

Conclusions and Future Directions

This dissertation has been a study of the statistics of scale space. It not only includes a statistical analysis of current methods in multiscale differential geometry for image analysis, but it also presents new multiscale statistical methods for use with multivalued data. The first sections of this chapter summarize the contributions and results of this dissertation. The remaining sections foreshadow future work in the theory of multiscale image statistics and applications for this research.

In an effort to automatically control nonlinear diffusion for image segmentation, I began this research with an analysis of the effects of noise within the image on multiscale partial derivative measurements. The result was a theoretical study relating noise, scale, and multiscale differential invariants. The contributions of this study to the field of image analysis are listed in the next section. By themselves, however, the relations between geometry, scale, and noise derived through this analysis were not sufficient to automatically select control parameters for nonlinear diffusion systems, particularly those involving multiple intensities in each pixel.

To address the problems of incommensurable image values in multiparameter data, I have introduced a new form of scale-space analysis for image analysis, suggestive of a new approach to multivalued image segmentation. This scale-space analysis approach is based on a form of statistical moments that provide multiscale statistical invariance. Like the family of differential invariants, these moments can be made invariant to spatial transformations such as rotation, translation, and zoom (the simultaneous scaling of image magnification and measurement aperture). These statistical measurements are also easily made invariant with respect to linear functions of intensity. This dissertation has explored some of the statistical properties of scale-space differential invariants, and it has described new statistical operators that are comparable in function to differential invariants. These contributions are also summarized in Section 6.1.

To bring this dissertation past providing a theoretical structure, work must be done to explore the power, the properties, and the limitations of multiscale image statistics. Applications proving the capabilities enabled by statistically based methods are essential next steps. Examples have been provided in the text to demonstrate the effectiveness of these methods and their potential. However, while some scale-space applications have been shown, the behavior of multiscale statistics across scale-space remains a substantially unexplored area. Much work remains not only to validate the use of statistical moments, but also to compare their use to other methods, in particular the set of scale-space differential operators. Finally, in support of this thesis, multiscale statistics should be used to solve real problems. Future directions for this research are outlined later in this chapter. Possible applications are suggested that will emphasize the role of multiscale statistics.

6.1. Contributions and Conclusions

This dissertation has included three substantive developments as original work. The first development was an analysis of error propagation from spatially uncorrelated noise in an image to multiscale differential invariant measures of image structure. The second original development has been multiscale image analysis based on local statistics rather than on local partial derivatives as geometric measurements. The third contribution has been the development of directional multiscale image statistics.

The first of these developments, presented in Chapter 3, involved a new study on the effects of spatially uncorrelated noise on normalized linear scale space measurements. The presentation examined the current model of normalized scale-space differential invariants and generated mathematical expressions for the propagation of noise. To allow differential invariants to be used in scale space, ordinary derivatives at scale must be normalized by the size of their measurement aperture. The resulting dimensionless quantities allow comparisons between different multiscale partial derivative operators across varying orders of differentiation. Surprisingly, results showed that for all given levels of initial intensity of noise, the absolute error in the multiscale derivative decreases between zeroth and first order measurements. Moreover, though the level of propagated noise increases thereafter with increasing order of differentiation, it remains less than the initial error until the third or fourth order derivatives are taken. This finding brings into some question the common wisdom that low order differentiation badly propagates noise.

New image analysis methods based on multiscale statistical invariants and their implicit scale spaces were developed in Chapter 4. Local, isotropic, Gaussian neighborhood sampling operators of varying scale were used to generate local central moments of intensity that capture information about the local probability distribution of intensities at a pixel location under the assumption of piecewise ergodicity. Image analysis based on multiscale image statistics can easily be made invariant to linear functions of intensity as well as spatial rotation and translation. This trait makes the identification of objects independent of the absolute brightness of the object as well as independent of its orientation and position within the image. Multiscale central moments of intensity of different order demonstrated properties similar to multiscale differential geometric operators. Specifically, multiscale variances reflect boundariness in a fashion similar to the multiscale gradient magnitude, and multiscale skewness shows a response similar to the multiscale Laplacian operator. A scale-space algorithm for selecting locally adapted normalizing variance measures for variable conductance diffusion was developed based on these moments. Also, multiscale statistics have been generalized to provide a means of analyzing multivalued data where the data channels within the image are incommensurable (i.e., they have no common metric for measurement).


The spatially isotropic operators from Chapter 4 did not adequately capture image structure. The underlying geometry of the image introduces biases in the measurement of the distribution of noise within the system using isotropic operators. Orientable operators that were introduced in Chapter 5 allowed central moments to be measured in preferred directions. These central moments reduced bias in noise characterization that arises from the gradient of the image function. I showed how singular value decomposition (SVD) of directional covariances produced principal axes indicating maximum and minimum spread of the directional probability distributions. The derivation suggested that these orthogonal directions and their corresponding eigenvalues can be used to normalize measurements made of local intensities. Multivalued image analysis has also been shown to be possible through directional covariances. Using the generalized form of SVD called canonical analysis, normalized multilocal coordinate systems based on covariances enable relating the behavior of the various intensities where the lack of common metrics makes analysis via intensity vectors inappropriate.

Before this research, processing of multivalued images has been based on ad hoc geometric principles, unlike scalar image processing. If the relationships between the separate channels of a multivalued image are known a priori, a mapping of the space spanned by the separate image values can provide a reparameterization of the values allowing relations to be drawn among them. Since information about the intervalue relations is normally absent, the assumption is implicitly made that the multiple values have a common basis for comparison and can be treated as vectors. Methods based on multiscale differential geometric measurements are applicable to scalar-valued images, but are not appropriate for multivalued images such as multi-echo sequences from a medical magnetic resonance scan. This dissertation has leveled that discrepancy by providing tools that, by measuring correlations among the multiple values of image data, provide common metrics for comparison among them. The methods presented here, based on multiscale image statistics, provide the foundation from which to start building new theories and approaches to understanding multivalued data and analyzing and processing multivalued images.

6.2. Future Directions in Multiscale Statistical Theory

While the introduction and development of new multiscale statistical invariants has been a substantial task, there remain many unanswered questions. Fortunately, there are also many related fields from which to borrow analytical tools as well as motivations and ideas. Linear scale space and its differential invariants have been much of the model for this work. Pursuing parallels between scale-space geometry and scale-space statistics is likely to yield important and interesting results. The field of statistics also presents a wide body of knowledge upon which future explorations into multiscale moments of image intensity can be based. This work entails the introduction and presentation of multiscale central moments of images. The uses of these moments are within the purview of applied statistics and computer science. In particular, the analysis of the relationships among these moments is a problem in probability. Skilled probabilists may be interested in exploring these spatially windowed central moments, their properties, and their limitations.


I address some of the salient points of these issues in the following paragraphs. This discussion is not intended to be a complete survey but rather an introduction to a few research areas with some insights into promising intellectual directions.

6.2.1. Local Differential Geometry and Local Image Statistics

There are many parallels between the study of scale-space differential invariants and multiscale image statistics. As the understanding of images through differential geometry advances, the understanding of scale spaces of multiscale image statistics should also grow. Questions that are raised about differential geometric operators are often also illuminating when directed toward multiscale statistics. I have included some work on the propagation of noise in multiscale image statistics. A full characterization of the effects of noise in this area is needed. The results presented in Chapter 3 on linear scale-space differential operators indicate that there are some limitations on how many orders of differentiation are expected to be useful in normalized scale-space image analysis. It remains to be seen whether parallel findings hold for multiscale central moments of image intensity.

A perhaps more important question concerns the propagation of noise through nonlinear scale. Since the Gaussian represents a solution to the heat equation, and given the extensive justification of the Gaussian as a preferred linear operator in image analysis, diffusion is an ideal process for studying linear scale space. There has been considerable work on nonlinear scale spaces, using nonlinear diffusion equations to process and analyze images [e.g., Florack 1993, Gerig 1992, Whitaker 1993ab]. Empirical evidence indicates that the variance of white noise in the input signal is reduced through many of these nonlinear diffusion systems. An exhaustive analysis, reapplying some of the approaches of Chapter 3, may prove fruitful and may induce advances in computer vision.

6.2.2. Multiscale Distribution Analysis

Beyond applying the ideas of differential geometry to multiscale statistical invariants, there are motivations and research directions suggested by extending statistical methods to multiscale central moments. At the turn of the century, Pearson developed a taxonomy for classifying a wide range of univariate probability distributions [see Johnson 1970]. Pearson's system of distributions is based on the qualitative shape of the probability density function as governed by four input parameters. These parameters are strongly related to the mean, the variance, the skewness, and the kurtosis of the distribution. The fourth parameter seems to have small influence on the classification. It was considered impractical to include a fifth or sixth parameter.

It will be interesting to explore local distributions and their properties in these terms. Using the results of Chapter 4, local measures of the mean and central moments of up to the fourth order have been presented in this dissertation. Pearson's classification system could easily be applied to these moments, and a taxonomy of local distributions constructed. Such a multiscale classification system of local intensity distributions may illuminate issues of boundariness and medialness in image analysis.


6.2.3. Comparing Two Distributions

Qualities such as texture, anisotropy, boundariness, and medialness can be used to distinguish image segments. These qualities can be measured through an analysis of the local probability distributions of multiscale image statistics. The resulting information could allow the comparison of two pixels based on the probability distributions exhibited by their local neighborhoods. A useful statistical means of comparing two distributions is the Kolmogorov-Smirnov statistic.

There are many statistical means of comparing two distributions to evaluate their similarity. For data that are discretized or "binned," the chi-square test is a powerful evaluation tool. For unbinned data that are functions of a single independent variable (e.g., two valued functions of time or space), the cumulative distribution function (or an unbiased estimator of it) S_N(x) of the data can be used in meaningful comparisons. The cumulative distributions of two separate random variables can then be compared. This comparison is captured by the Kolmogorov-Smirnov (K-S) statistic,

$$D = \max_{-\infty < x < \infty}\,\bigl|S_{N_1}(x) - S_{N_2}(x)\bigr| \tag{6-1}$$
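As a concrete illustration (a minimal sketch, not from the dissertation), D can be computed for two samples by comparing their empirical cumulative distribution functions; scipy.stats.ks_2samp provides the same statistic together with its significance.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D of equation (6-1)."""
    data = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), data, side='right') / len(a)
    cdf_b = np.searchsorted(np.sort(b), data, side='right') / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(2)
x, y = rng.normal(0.0, 1.0, 500), rng.normal(0.3, 1.0, 500)
print(ks_statistic(x, y))
```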

A complete description of the K-S statistic as well as source code implementing its computation have been published by Press, et al. [Press 1992]. There are many properties of the K-S statistic that make it useful in the context of multiscale image statistics. Finding the significance of the statistic is a tractable computation. Its invariance under certain transformations of intensity is also beneficial. The following is a quote from Press, et al.:

"What makes the K-S statistic useful is that its distribution in the case of the null hypothesis (data sets drawn from the same distribution) can be calculated, at least to useful approximation, thus giving the significance of any observed nonzero value of D. A central feature of the K-S test is that it is invariant under reparameterization of x; … For example, you will get the same significance using x as log x." [Press 1992]

The K-S test can be recast using a Gaussian spatial windowing function to attain a spatially weighted distribution. Questions comparing such qualities as texture, anisotropy, boundariness, and medialness may be resolved through a local K-S test of multiscale image statistics.

6.3. Applying Multiscale Image Statistics

Promising directions for this work reside in application areas. Much of this research has been based on a particular model of images: images are assumed to have piecewise ergodic intensity distributions across space. Multiscale image statistics will enable new segmentation methods for these images, particularly those images containing multiple values per pixel.


The results from this dissertation enable multivalued nonlinear diffusion on images with incommensurable data values. Multimodal medical images, multivalued information from cartographic sources, and even color images provide sources for multiparameter data with possibly incommensurable values. Nonlinear diffusion controlled by multiscale directional image statistics is a likely means for performing smoothing of the intensity values while preserving the geometry of the image represented by object boundaries. Some of the ideas generated by this research are presented in this section indicating possible research directions. Other important issues in segmentation include the difficulties of representing uncertainty in the composition of a pixel value. Such situations arise when a pixel represents a composite or mixture of semantic elements. This often occurs at object boundaries where a pixel may reside only partially within an object. The resulting state is not appropriate for a binary segmentation. However, it is easily represented in the form of probabilities. Multiscale image statistics provide a background and foundation to assign probability values to pixels as well as directing segmentation algorithms to likely orientations to assign connectivity relations among object segments. 6.3.1. Statistical Control of Nonlinear Diffusion I have shown how multiscale image statistics can be used to set control parameters for boundary preserving noise smoothing systems based on nonlinear, geometry driven, or variable conductance diffusion (VCD). Section 4.8 describes methods to incorporate these measures directly into the diffusion equations, generating a new form of this type of image analysis. Practical studies demonstrating these methods on real data, both scalarvalued and multivalued, are required to measure their effectiveness in solving real problems. This particular approach also needs to be refined in light of current developments. As shown by the work of Eberly [Eberly 1994] and supported by the canonical analysis arguments of Chapter 5, the gradient values of the conductance functions should be normalized not only by the local variance but also by the scale at which they are measured. Chapter 5 also introduces the acquisition of second moment values along the minor axis. These results are orthogonal to the variance measured in the direction of greatest change and are likely to be a less biased measure of the local noise. Isophote curvature will still introduce some bias, but an investigation of these properties is warranted given these dissertation results. Gerig and Whitaker both have generalized some forms of VCD to higher dimensions. Gerig has demonstrated vector-valued nonlinear diffusion on medical images with some success [Gerig 1992]. Whitaker has shown that invariants other than zeroth order intensities can be diffused; his resulting geometry limited diffusion has been able to generate ridge structures that describe the general form of objects within images [Whitaker 1993ac]. While the resulting user-supervised statistically constrained VCD filtering mechanism provides a principled means of measuring dissimilarity or gradients of possibly incommensurable within-pixel data parameters, there remains the problems associated


Beyond controlling segmentation systems, multiscale image statistics may also play a significant role in understanding theoretical aspects of scale spaces. The use of multiscale directional statistics may be instrumental in the study of nonlinear scale spaces. Measuring the decreasing values of the minimum local directional variance at a pixel location may be an effective means of tracking object behavior across nonlinear scale. Nonlinear scale spaces will require new distance metrics; understanding the rules for these metrics may come from statistical analyses.

6.3.2. Mixtures in Segmentation

A single pixel often cannot be represented by a single segment. Rather, it is a composite of different object types. This often happens when an object boundary intersects the area sampled by a pixel. It also occurs when the image data represent fine structure that approaches the dimensions of a pixel; such data include roadways on geographic maps or fine blood vessels in medical images. In these cases it is inappropriate to assign the pixel to a single segment, designating it as all one type of value. Cast in statistical language, however, the problem becomes approachable. Probabilities may be assigned to a pixel, reflecting the likelihood of multiple segments within that pixel’s area. These probabilities can be used as a measure of the fraction of different image segments within a single pixel.

There are two approaches to assigning probabilities to handle the composite pixel problem. One approach is to use some outside means of generating an ensemble of images over which probability distributions can be estimated and probabilities assigned. Other information about the physics of the data acquisition method can be used to assist the segmentation. For example, in his dissertation Laidlaw calculates probability distributions at each pixel of a medical magnetic resonance image from which he generates a segmentation [Laidlaw 1995].

The other approach toward statistical segmentation involves modeling local image structure as ergodic, the approach presented in this dissertation. The ergodic assumption allows spatial sampling, rather than ensemble sampling, to generate information about the probability distribution of intensities at a pixel location, provided that boundary properties and issues of object scale are handled gracefully. These developments in multiscale central moments enable probability distributions to be calculated and multiple probabilities assigned to individual pixels. How these probabilities and composite pixels are combined with contiguous segments is a topic for future research. Also, combining ensemble statistics within a pixel with multiscale image statistics that capture probability distributions over image space is likely to be a productive research direction.
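
As an illustration of how such probabilities might be assigned, the sketch below inverts a two-class linear mixture model at each pixel from the local mean intensity. The pure-class means are assumed to be known from some outside source (for instance, a histogram analysis); the function names and parameters are inventions of the example, not a method developed in this dissertation.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def mixture_fraction(image, mu_a, mu_b, sigma=1.0):
        """Estimated fraction of class 'a' in each pixel under the two-class
        linear mixture model I = alpha*mu_a + (1 - alpha)*mu_b + noise.
        Smoothing over a small Gaussian window regularizes the inversion."""
        u = np.asarray(image, dtype=float)
        local_mean = gaussian_filter(u, sigma)
        alpha = (local_mean - mu_b) / (mu_a - mu_b)
        return np.clip(alpha, 0.0, 1.0)

A pixel whose neighborhood averages halfway between the two class means receives a fraction near 0.5, flagging it as a probable boundary composite rather than forcing a binary assignment.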


6.4. Multiscale Image Statistics in Medicine

The origins of this research are founded deep within applications of image processing methods in medicine. It is my desire to see this research transferred to the clinic, where principled analytic methods can be used to provide answers and aids to diagnosis and treatment. It is essential that computer aids used in health care be robust, repeatable, verifiable, and understandable. Credibility and reliability are required attributes of any computer program in a health care setting.

The medical field in particular is a rich source of multiparameter image data where the separate within-pixel values are not commensurable. There is an oncoming flood of multimodality images based on the registration and fusion of intensity values of multiple images of the same anatomy, acquired through independent means. Combinations of nuclear medicine, computed tomography, magnetic resonance imaging, and/or electroencephalogram information are beginning to inundate the field of medical image processing. These data contain multiple values that are inherently incommensurable. The multivalued statistical methods described in this dissertation should provide a principled foundation for the processing of these new forms of multimodality images.

This work is directed toward applications of magnetic resonance imaging, improving image quality and attempting to find approachable means of providing segmentation and classification in clinical images. The images generated by MRI often exhibit structured variations of intensity of low frequency, allowing correction along smooth, gentle gradients. I chose to explore multiscale image statistics as a means of seeking a correction mechanism for these non-homogeneous, nonlinear responses of tissue in MR images.

MRI has particularly nice attributes when viewed through the metaphor of multiscale image statistics. It is not only possible but often the case that multiple values of the image are acquired. Figure 6.1 shows a dual-echo axial 2D MR image of the head along with its scatterplot histogram. These values cannot be considered vector quantities, since they are not tied to a common measurement metric. However, they do represent registered image intensity values, dependent on spatial location. There are strong correlations of image intensity within the pixel values, depending on the tissue they portray. Multivalued statistics can be applied to these images, and meaningful results extracted.

Figure 6.1. A 2D dual-echo MR image of the head with its scatterplot histogram.
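
A hedged sketch of how those correlations could be exploited follows: a per-pixel covariance matrix between the two echoes, estimated over a Gaussian spatial window, is used to normalize a combined gradient measure so that the incommensurable channels contribute on an equal statistical footing. The variable names and the regularizing constant eps are inventions of the example.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def local_covariance(e1, e2, sigma):
        """Entries of the per-pixel 2x2 covariance matrix of two registered
        echoes, each estimated over a Gaussian window of scale sigma."""
        e1 = np.asarray(e1, dtype=float)
        e2 = np.asarray(e2, dtype=float)
        g = lambda f: gaussian_filter(f, sigma)
        m1, m2 = g(e1), g(e2)
        c11 = g(e1 * e1) - m1 * m1
        c22 = g(e2 * e2) - m2 * m2
        c12 = g(e1 * e2) - m1 * m2
        return c11, c12, c22

    def normalized_gradient_sq(e1, e2, sigma, eps=1e-12):
        """Squared gradient of the echo pair normalized by the local
        covariance: v^T C^{-1} v, summed over the x and y components of
        v = (grad e1, grad e2)."""
        c11, c12, c22 = local_covariance(e1, e2, sigma)
        det = c11 * c22 - c12 * c12 + eps    # guard near-singular windows
        g1y, g1x = np.gradient(np.asarray(e1, dtype=float))
        g2y, g2x = np.gradient(np.asarray(e2, dtype=float))
        quad = lambda a, b: (c22 * a * a - 2.0 * c12 * a * b
                             + c11 * b * b) / det
        return quad(g1x, g2x) + quad(g1y, g2y)

Because the normalization is by a covariance rather than by separate channel variances, fluctuations that the local statistics explain are discounted, in the spirit of the local multivalued statistics of Chapters 4 and 5.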

Another useful feature of MR imaging is that issues of small sample size can be addressed through careful acquisition technique. The typical MR acquisition method attempts to improve the signal-to-noise ratio in diagnostic images by taking several samples or
images of any single slice plane and averaging them together. Noise and registration artifacts are thus reduced by the averaging of registered images. This is a simple method of using the statistics of an ensemble of images to improve the measurement of the mean intensity value. The approach is easily generalized to include statistics other than the mean, including the infinite set of central moments. With a sufficient number of samples in an ensemble of images, the probability distribution of intensities at every pixel location could be completely characterized.

The practical aspects of VCD in medicine are being explored at many institutions, where nonlinear diffusion filters often serve a preprocessing role before traditional statistical pattern recognition classifiers are applied. In particular, filtering mechanisms provided by Guido Gerig [Gerig 1991, 1992] are in use at the Harvard Brigham and Women’s Surgical Planning Laboratory as part of a classification procedure for the processing of MR and X-ray CT data. Applying the ideas in this dissertation might provide improvements in VCD image preprocessing.

Finally, the advances in automating VCD suggested earlier, based on multiscale statistics, can be applied in processing the non-homogeneous images that are common in medical MR imaging. Surface coils are used in MRI when greater intensity resolution is required to illuminate a particular portion of the anatomy. The resulting image has a characteristic non-homogeneous response in intensity. While the human visual system is capable of perceiving a wealth of information in such images, most computer vision algorithms are not designed to handle them. Efforts to correct the gain, or intensity amplification, across such images have met with some success [Wells 1994]; however, they include preprocessing nonlinear diffusion steps that still require manual parameter selection. Figure 6.2 shows an MR image of a shoulder acquired using a surface coil. A great deal of detail is present, but the intensity falloff represents a significant challenge for computer vision.

Figure 6.2. MR image of a shoulder acquired using a surface receiving coil. This may represent the ultimate test for this research.

This dissertation provides a foundation for new work in preprocessing MR images acquired from surface coils or exhibiting other non-homogeneous responses. The growth of functional MRI and its prevalent use of surface coils, as well as the advent of new imaging systems employing open magnets with non-homogeneous induced magnetic fields, will continue to challenge researchers in computer vision.

6.5. Summary

Through this research, I have introduced and explored a new family of statistical operators that demonstrate invariance under rotation, translation, zoom, and linear functions of intensity. These operators are the local expressions of central moments of image intensity, reflecting the distribution of intensities of the neighboring pixel values. Central moments with a directional component have also been constructed using the same principles. Taken as a whole, the combination of local central moments, both isotropic and directional, provides a description of the behavior of local image intensities, capturing changes in both noise and image geometry.

This work is based on the interplay between image intensity and image space. An image is assumed to have piecewise constant behavior with respect to some statistical properties of its intensity. For example, these properties include, but are not limited to, piecewise constant mean, piecewise constant variance, and piecewise mean linearity. These traits are studied by modeling an image as a stochastic process over image space and evaluating its ergodic properties in local areas of the image. Local-area image analysis is based on the piecewise ergodic assumption and trades some regularization or smoothing of local space for insight into image structure.

Although this research has been demonstrated using computer-generated images, it has been directed and focused with the goal of application to real data. Examples of images with varying noise properties and mean intensity are found throughout computer vision. New medical scanning technologies continue to provide computer scientists with challenges in segmentation, classification, object recognition, and image understanding. Market pressures in video communications make issues such as data compression based on object recognition an important research area.

Multiscale image statistics is a new domain, and one that has barely been explored. As a basis upon which to decompose images and objects within images, it is somewhat difficult to use, since it describes not just the geometry with respect to the image function but encompasses the uncertainty as well. If data are to be faithfully filtered and represented, some measure of the uncertainty of the locations of the boundaries, the expected geometry, and the noise in that geometry is surely relevant to anyone’s analysis. If we seek answers in object recognition and image understanding, let us understand our assumptions and accurately describe the expected errors in our solutions. This dissertation has been an exploration of new domains in statistics in computer vision, to provide fulcrums for our levers, to make approachable the realm of evaluating error in object measurement.

Bibliography

Arnold, Steven F. 1981. The Theory of Linear Models and Multivariate Analysis. New York: John Wiley & Sons.

Babaud, J., A. Witkin, and R.O. Duda. 1986. Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-8: 2-14.

Blom, J. 1992. Topological and geometrical aspects of image structure. Ph.D. diss., Utrecht Univ., the Netherlands.

Burt, Peter, and Edward Adelson. 1983. The Laplacian pyramid as a compact image code. IEEE Trans. Communications. 31(4): 532-540.

Chellappa, Rama, and Anil Jain. 1993. Markov Random Fields. San Diego, CA: Academic Press.

Cromartie, Robert. 1995. Structure-sensitive contrast enhancement: development and evaluation. Ph.D. diss., Univ. of North Carolina at Chapel Hill, Dept. of Computer Science.

Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society, Series B. 39(1): 1-38.

Duda, Richard O., and Peter E. Hart. 1973. Pattern Classification and Scene Analysis. New York: John Wiley & Sons.

Eberly, David. 1993. Personal communication.

Eberly, David. 1994a. Geometric analysis of ridges in N-dimensional images. Ph.D. diss., Univ. of North Carolina at Chapel Hill, Dept. of Computer Science.

Eberly, David. 1994b. A differential geometric approach to anisotropic diffusion. in Geometry-Driven Diffusion in Computer Vision, ed. B.M. ter Haar Romeny, 371-392. Dordrecht, the Netherlands: Kluwer.

Florack, L.M.J. 1993. The syntactic structure of scalar images. Ph.D. diss., Utrecht Univ., the Netherlands.

Florack, L.M.J., B.M. ter Haar Romeny, J.J. Koenderink, and M.A. Viergever. 1994a. General intensity transformations and differential invariants. J. of Math. Imaging and Vis. 4: 171-187.

Florack, L.M.J., B.M. ter Haar Romeny, J.J. Koenderink, and M.A. Viergever. 1994b. Images: regular tempered distributions. in Shape in Picture, 651-659. Berlin: Springer-Verlag.


Fritsch, D. 1993. Registration of radiotherapy images using multiscale medial descriptions of image structure. Ph.D. diss., Univ. of North Carolina at Chapel Hill, Dept. of Computer Science.

Geiger, Davi, and Alan Yuille. 1991. A common framework for image segmentation. Int. J. of Comp. Vis. 6(3): 227-243.

Geman, S., and D. Geman. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-6: 721-741.

Gerig, Guido, John Martin, Olaf Kubler, Ron Kikinis, Martha Shenton, and Ferenc A. Jolesz. 1991. Automating segmentation of dual-echo MR head data. in Proc. Int. Conf. Information Processing in Medical Imaging, IPMI91, Wye, Kent, UK, July 1991; Lecture Notes in Computer Science, 511, ed. A.C.F. Colchester, D.J. Hawkes, 175-187. Berlin: Springer-Verlag.

Gerig, Guido, Olaf Kubler, Ron Kikinis, and Ferenc A. Jolesz. 1992. Nonlinear anisotropic filtering of MRI data. IEEE Trans. Med. Imaging. 11(2): 221-232.

Griffin, L.D., A.C.F. Colchester, and G.P. Robinson. 1991. Scale and segmentation of grey-level images using maximum gradient paths. in Proc. Int. Conf. Information Processing in Medical Imaging, IPMI91, Wye, Kent, UK, July 1991; Lecture Notes in Computer Science, 511, ed. A.C.F. Colchester, D.J. Hawkes, 256-272. Berlin: Springer-Verlag.

Gueziec, Andre, and Nicholas Ayache. 1992. Smoothing and matching of 3-D space curves. in Visualization in Biomedical Computing 1992, ed. Richard A. Robb. Proc. SPIE-1808: 259-273.

Hu, Ming-Kuei. 1962. Visual pattern recognition by moment invariants. IRE Trans. Information Theory. IT-8(February): 179-187.

Jain, Anil K. 1989. Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice Hall.

Johnson, Norman L., and Samuel Kotz. 1970. Continuous Univariate Distributions - 1. New York: John Wiley & Sons.

Koenderink, J.J. 1984. The structure of images. Biol. Cybern. 50: 363-370.

Koenderink, J.J. 1990. Solid Shape. Cambridge, MA: MIT Press.

Koenderink, J.J., and A.J. van Doorn. 1987. Representation of local geometry in the visual system. Biol. Cybern. 55: 367-375.

Laidlaw, David H. 1995. Geometric model extraction from magnetic resonance volume data. Ph.D. diss., California Institute of Technology, Pasadena, CA. May 1995.

Lee, J.S. 1983. Digital image smoothing and the sigma filter. Comp. Vision, Graphics, and Image Processing. CVGIP-24: 255-269.

Lindeberg, T. 1990. Scale-space for discrete signals. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-12: 234-254.


Lindeberg, T. 1991. Discrete scale space theory and the scale space primal sketch. Ph.D. diss., Royal Institute of Technology, S-100 44 Stockholm, Sweden. May 1991.

Lindeberg, T. 1994a. Scale Space Theory in Computer Vision. Dordrecht, the Netherlands: Kluwer.

Lindeberg, T., and B.M. ter Haar Romeny. 1994b. Linear scale-space II: early visual operations. in Geometry-Driven Diffusion in Computer Vision, ed. B.M. ter Haar Romeny, 39-71. Dordrecht, the Netherlands: Kluwer.

Mallat, Stephane. 1989. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-11: 674-693.

Metz, C. 1969. A mathematical investigation of radioisotope scan image processing. Ph.D. diss., Univ. of Pennsylvania.

Morse, Bryan S. 1994. Computation of object cores from grey-level images. Ph.D. diss., Univ. of North Carolina at Chapel Hill, Dept. of Computer Science.

Morse, B.S., S.M. Pizer, and A. Liu. 1993. Multiscale medial analysis of medical images. in Proc. Int. Conf. Information Processing in Medical Imaging, IPMI93, Flagstaff, AZ, USA, June 1993; Lecture Notes in Computer Science, 687, ed. H.H. Barrett, A.F. Gmitro, 112-131. Berlin: Springer-Verlag.

Olver, P.J. 1993. Applications of Lie Groups to Differential Equations, 2nd edition (1st ed. 1986). Berlin: Springer-Verlag.

Olver, Peter, Guillermo Sapiro, and Allen Tannenbaum. 1994. Differential invariant signatures and flows: a symmetry group approach. in Geometry-Driven Diffusion in Computer Vision, ed. B.M. ter Haar Romeny, 225-306. Dordrecht, the Netherlands: Kluwer.

Papoulis, A. 1991. Probability, Random Variables, and Stochastic Processes, 3rd edition. New York: McGraw-Hill.

Perona, P., and J. Malik. 1990. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-12: 629-639.

Perona, P. 1992. Steerable-scalable kernels for edge detection and junction analysis. in Proc. 2nd European Conf. on Computer Vision, Santa Margherita Ligure, Italy, May 1992, 3-18.

Pizer, S.M., T.J. Cullip, and R.E. Fredericksen. 1990. Toward interactive object definition in 3D scalar images. in 3D Imaging in Medicine; NATO ASI Series, F60, ed. Karl Heinz Hohne, Henry Fuchs, and Stephen M. Pizer, 83-105. Berlin: Springer-Verlag.

Pizer, S.M., E.P. Amburn, J.D. Austin, R. Cromartie, A. Geselowitz, B.M. ter Haar Romeny, and J.B. Zimmerman. 1987. Adaptive histogram equalization and its variations. Comp. Vision, Graphics, and Image Processing. CVGIP-35: 355-368.

Press, W.H., B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling. 1992. Numerical Recipes in C: The Art of Scientific Computing, 2nd edition. Cambridge, UK: Cambridge University Press.


Reiss, T.H. 1991. The revised fundamental theorem of moment invariants. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-13: 830-834.

Shah, J. 1991. Segmentation by nonlinear diffusion. in Proc. Conf. on Computer Vision and Pattern Recognition, CVPR91, 202-207.

ter Haar Romeny, B.M., and L.M.J. Florack. 1991a. A multiscale geometric approach to human vision. in Perception of Visual Information, ed. W.R. Hendee and P.N.T. Wells, 73-114. Berlin: Springer-Verlag.

ter Haar Romeny, B.M., L.M.J. Florack, J.J. Koenderink, and M.A. Viergever. 1991b. Scale space: its natural operators and differential invariants. in Proc. Int. Conf. Information Processing in Medical Imaging, IPMI91, Wye, Kent, UK, July 1991; Lecture Notes in Computer Science, 511, ed. A.C.F. Colchester, D.J. Hawkes, 239-255. Berlin: Springer-Verlag.

ter Haar Romeny, B.M., L.M.J. Florack, A.H. Salden, and M.A. Viergever. 1993. Higher order differential structure in images. in Proc. Int. Conf. Information Processing in Medical Imaging, IPMI93, Flagstaff, AZ, USA, June 1993; Lecture Notes in Computer Science, 687, ed. H.H. Barrett, A.F. Gmitro, 77-93. Berlin: Springer-Verlag.

Torre, V., and T.A. Poggio. 1986. On edge detection. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-8: 147-163.

Weickert, Joachim. 1995. Multiscale texture enhancement. in Proc. 6th Int. Conf. on Computer Analysis of Images and Patterns, CAIP ’95, Prague, Sep 1995; Lecture Notes in Computer Science. Berlin: Springer-Verlag.

Wells, W.M., W.E.L. Grimson, R. Kikinis, and F.A. Jolesz. 1994. In-vivo intensity correction and segmentation of magnetic resonance image data. in Proc. of Computer Vision in Medicine (AAAI Spring Symposium Series, Stanford University, Palo Alto, CA). AAAI Technical Report SS-94-05. 186-190.

Whitaker, Ross T. 1993a. Geometry-limited diffusion. Ph.D. diss., Univ. of North Carolina at Chapel Hill, Dept. of Computer Science.

Whitaker, Ross T., and Stephen M. Pizer. 1993b. A multi-scale approach to nonuniform diffusion. CVGIP: Image Understanding. 57(1): 99-110.

Whitaker, Ross T. 1993c. Characterizing first and second order patches using geometry-limited diffusion. in Proc. Int. Conf. Information Processing in Medical Imaging, IPMI93, Flagstaff, AZ, USA, June 1993; Lecture Notes in Computer Science, 687, ed. H.H. Barrett, A.F. Gmitro, 149-167. Berlin: Springer-Verlag.

Witkin, A. 1983. Scale-space filtering. in Proc. Int. Joint Conf. on Artif. Intell., Karlsruhe, Germany, Aug 1983, 1019-1022.

Witkin, A. 1984. Scale-space filtering: a new approach to multi-scale description. in Image Understanding 1984, ed. S. Ullman, W. Richards, 79-95. Norwood, New Jersey: Ablex.


Yoo, Terry S., and James M. Coggins. 1993. Using statistical pattern recognition to control variable conductance diffusion. in Proc. Int. Conf. Information Processing in Medical Imaging, IPMI93, Flagstaff, AZ, USA, June 1993; Lecture Notes in Computer Science, 687, ed. H.H. Barrett, A.F. Gmitro, 459-471. Berlin: Springer-Verlag.

Yoo, Terry S. 1994. Statistics and scale in variable conductance diffusion. in Proc. of Computer Vision in Medicine (AAAI Spring Symposium Series, Stanford University, Palo Alto, CA). AAAI Technical Report SS-94-05. 186-190.

Yuille, A., and T.A. Poggio. 1986. Scaling theorems for zero crossings. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-8: 15-25.