Advanced Probability and Statistics: Remarks and Problems 9781032405155, 9781032405162, 9781003353447


224 66 8MB

English Pages [214] Year 2023

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
Cover
Half Title
Title Page
Copyright Page
Preface
Table of Contents
1. Remarks and Problems on Transmission Lines and Waveguides
2. Remarks and Problems on Statistical Signal Processing
3. Some Study Projects on Applied Signal Processing with Remarks About Related Contributions of Scientists
Recommend Papers

Advanced Probability and Statistics: Remarks and Problems
 9781032405155, 9781032405162, 9781003353447

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

Advanced Probability and Statistics:

Remarks and Problems

Advanced probability and statistics Part II:Remarks and Problems Harish Parthasarathy ECE Division, NSUT May 30, 2020

Advanced Probability and Statistics:

Remarks and Problems

Harish Parthasarathy Professor

Electronics & Communication Engineering

Netaji Subhas Institute of Technology (NSIT)

New Delhi, Delhi-110078

First published 2023 by CRC Press 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN and by CRC Press 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742 © 2023 Manakin Press CRC Press is an imprint of Informa UK Limited The right of Harish Parthasarathy to be identified as author of this work has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. For permission to photocopy or use material electronically from this work, access www. copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected] Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Print edition not for sale in South Asia (India, Sri Lanka, Nepal, Bangladesh, Pakistan or Bhutan). British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging-in-Publication Data A catalog record has been requested

ISBN: 9781032405155 (hbk)

ISBN: 9781032405162 (pbk)

ISBN: 9781003353447 (ebk)

DOI: 10.4324/9781003353447

Typeset in Arial, Calibri, Cambria Math, Century Schoolbook, MT-Extra, Symbol MT,

Tahoma, Verdana, Wingdings, Palatino, Monotype Corsiva, Euclid Extra, KozGoPr6N,

Minion Pro, Symbol and Times New Roman

by Manakin Press, Delhi

Preface

The chapters of this book deals with the basic formulation of waveguide cavity resonator equations especially when the cross sections of the guides and resonators have arbitrary shapes. The focus is on expressing the total field energy within such a cavity resonator as a quadratic form in the complex coefficients that determine the modal expansions of the electromagnetic field. Such an expression can then be immediately quantized by replacing the coefficients with creation and annihilation operators. The reviews of basic statistical signal processing covering linear models, fast algorithms for estimating the parameters in such linear models, applications of group representation theory to image processing problems especially the representations of the permutation groups and induced representation theory applied to image processing problems involving the three dimensional Euclidean motion group. Some attention has been devoted to quantum aspects of stochastic filtering theory. The UKF as an improvement of the EKF in nonlinear filtering theory has been explained. The Hartree-Fock equations for approximately solving the two electron atomic problem taking spin-orbit magnetic field interactions into account has been discussed. In the limit as the lattice tends to a continuum, the convergence of the stochastic differential equations governing interacting particles on the lattice to a hydrodynamic scaling limit has also been discussed. Statistical performance analysis of the MUSIC and ESPRIT algorithms used for estimating the directions of arrival of multiple plane wave emitting signal sources using an array of sensors has been outlined here. It is based on our understanding of how the singular value decomposition of a matrix gets perturbed when the given matrix is subject to a small random perturbation. Finally, some aspects of supersymmetry and supergravity have been discussed in the light of the fact that supersymmetry is now a mathematically well-defined field of research that has opened up a new avenue to our understanding of how gravity can be unified with the other fundamental forces of nature. This book is based on the lectures delivered by the author to undergraduate and postgraduate students. These courses were on transmission lines and waveguides and statistical signal processing. Author

v

Table of Contents

1. Remarks and Problems on Transmission Lines and Waveguides

1–18

2. Remarks and Problems on Statistical Signal Processing

19–132

3. Some Study Projects on Applied Signal Processing with Remarks About Related Contributions of Scientists

133–205

vii

Chapter 1

Chapter 1

Remarks Problemson on Remarks andand Problems Transmission Transmission lines Lines and and Waveguides Waveguides [1] Study about the historical development of the Maxwell equations for elec­ tromagnetism starting with the experimental findings and theoretical formula­ tions of Coulomb, Ampere, Oorsted, Faraday, Gauss and finally culminating in Maxwell’s introduction of the displacement current to satisfy charge conserva­ tion in time varying situations. Study about how Maxwell converted all these findings into laws expressible in the form of partial differential equations based on the basic operations of vector calculus and how by manipulating these equa­ tions, he proved that electric and magnetic fields propagate in vacuum as plane waves travelling at the speed of light and thereby how he unified light with elec­ tricity and magnetism. Study about how Heinrich Hertz confirmed Maxwell’s theory hundred years later using Leyden jar experiments. Max Planck struggled for over twenty years to finally arrive at his law for the spectrum of black body radiation. The earlier law for this spectrum that was being used was Wien’s displacement law according to which the spectral density of black body radiation was proportional to S(ν) = Cν 3 .exp(−βν) With β = A/T with A a constant. At very low frequencies this law states that the spectral density is proportional to ν 3 . The same is true at very high temperatures. At very low temperatures, this law predicts that the spectrum will vanish, ie, there will not be any radiation at all. The high temperature and low frequency limit of Wien’s displacement law was in sharp contradiction with experiment. Planck used a little of statistical mechanics but more of curve 9

1

2

Advanced Probability and Statistics: Remarks and Problems 10CHAPTER 1. REMARKS AND PROBLEMS ON TRANSMISSION LINES AND WAVE fitting to modify Wien’s displacement law to S(ν) =

Cν 3 exp(Aν/T ) − 1

This law has the same low temperature and high frequency behaviour as Wien’s displacement law but at high temperatures or at low frequencies, while Wien’s law predicts a Cν 3 dependence of the spectrum of black body radiation, Planck’s law gives a behavour CT ν 2 which is in agreement with the experiments con­ ducted by Rubens and Kurlbaum. Planck by fitting this curve to the experi­ mental curve of Rubens and Kurlbaum arrived at the formula A = h/k where k is Avogadro’s number and h is called Planck’s constant. Planck later on gave the following derivation of his radiation law: He assumed that radiation energy comes in quanta of hν, ie, in integer multiples of hν via harmonic oscillators. When a harmonic oscillator of frequency ν is excited to the nth energy level, it acquires an energy of nhν and by Boltzmann’s relation between energy and probability, the probability of such an oscillator getting excited to the nth level is proportional exp(−nhν/kT ). Hence, Planck concluded that the average energy of an oscillator of frequency ν is given by : n≥0 nhν.exp(−nhν/kT ) U (ν) = : n≥0 exp(−nhν/kT ) =

hν exp(hν/kT ) − 1

Next, Planck used familiar method of Rayleigh to derive a formula for the total number of oscillators having frequency in the range [ν, ν + dν] and belonging to the volume spatial V. This number is given by � d3 qd3 p/h3 q∈V,pc∈[hν,h(ν+dν]

Where he used Einstein’s energy­momentum relation E = pc for photons which have zero mass. Taking into account that the photon has two independent modes of polarization, ie, perpendicular to its direction of propagation, this number evaluates to � d d dν V 8πp2 dp/h3 = dν. V 8πν 3 /3c3 = dν.V.8πν 2 /c3 dν dν p≤hν/c Multiplying this number with the average energy of an oscillator gives us the average energy of black body radiation in the frequency range [ν, ν + dν]: S(ν)dν =

8πhν 3 /c3 dν exp(hν/kT ) − 1

This is the famous Planck’s law of black body radiation and its advent was the starting point of the whole of modern quantum mechanics and quantum field

Advanced Probability and Statistics: Remarks and Problems

11 3

theory. First Newton came who unified gravitation which causes the apple to fall with Kepler’s laws of planetary motion by proposing his celebrated inverse square law, he invented calculus along with Leibniz in order to establish Kepler’s laws of motion from his inverse square law of gravitation. More precisely, Robert Hooke who was the curator of the Royal society posed to Newton the inverse problem: What radial force law of attraction should exist between the sun and a planet in order that the planet move around the sun in ellipses satisfying Kepler’s laws ? Newton solved this inverse problem by inventing calculus and formulating his second law of motion in terms of differential calculus which he called fluxions. He applied this law to the sun planet system and proved that when the force of attraction between the two is the inverse square law then the planet is guaranteed to revolve around the sun in an ellipse. It is a little unfortunate that Robert Hooke’s name does not appear prominently in Newton’s magnamopus “Philosophae Naturalis Principia Mathematica’ which is the Latin translation of “Mathematical Principles of Natural Philosophy”. Today we believe that some portion of the credit for the discovery of the inverse square law of gravitation should go to Robert Hooke. After Newton, the next major unification in physics came with Maxwell when he created the four laws of electromagnetism based on the findings of Coulomb, Gauss, Ampere, Oorsted and Faraday and using these laws predicted that electricity, magnetism and light are one and the same phenomena which appear distinct phenomena to us primarily because of the frequencies at which these propagate. [2] The rectangular waveguide:Expressing the transverse component of the electromagnetic field in terms of the longitudinal components. [a] A rectangular waveguide has dimensions a, b along the x and y axes respectively. Assume that the length of the guide is d. When the fields have the sinusoidal dependence exp(−jωt) and dependence upon z as exp(−γz), then ∂/∂t and ∂/∂z get replaced respectively by multiplication with −jω and −γ. Hence, the Maxwell curl equations in the ω − x − y − z domain are curlE = −jωµH, curlH = jω�E which in component form become Ez,y + γEy = −jωµHx , − − − − −(1) −γEx − Ez,x = −jωµHy , − − − − −(2) Ey,x − Ex,y = −jωµHz − − − − − (3) and likewise by duality with E → H, H → −E, � → µ, µ → �. Write down the dual equations: Hz,y + γHy = jω�Ex , − − − − −(4) −γHx − Hz,x = jω�Ey , − − − − −(5) Ey,x − Ex,y = jω�Ez − − − − − (6)

4

Advanced Probability and Statistics: Remarks and Problems 12CHAPTER 1. REMARKS AND PROBLEMS ON TRANSMISSION LINES AND WAVE Solve (1),(2),(4),(5) for {Ex , Ey , Hx , Hy } in terms of {Ez,x , Ez,y , Hz,x , Hz,y } and show that this solution can be expressed as E⊥ = (−γ/h2 )V⊥ Ez − (jωµ/h2 )V⊥ Hz × zˆ − − − (7) H⊥ = (−γ/h2 )V⊥ Hz + (jωe/h2 )V⊥ Ez × zˆ − − − (8) where V⊥ = x ˆ∂/∂x + yˆ∂/∂y, E⊥ = Ex x ˆ + Ey y, ˆ H⊥ = Hx x ˆ + Hy y, ˆ h2 = γ 2 + ω 2 eµ

[3] [a] Show that all the procedures and expressions in Step 1 are valid even when e, µ are functions of (ω, x, y) but not of z. Assuming e, µ to be constants, show that by substituting (7) and (8) into (3) and (6) gives us the two dimensional Helmholtz equation for Ez , Hz : 2 + h2 )Hz = 0 − − − (9) (V2⊥ + h2 )Ez = 0, (V⊥

[4] [a] Show using (7) and (8) that the boundary conditions that the tangential components of E and the normal components of H vanish on its boundary walls are equivalent to the conditions Ez = 0, x = 0, a, y = 0, b, ∂Hz /∂x = 0, x = 0, a, ∂Hz /∂y = 0, y = 0, b Hence by applying the separation of variables method to (9) deduce that the general solutions are given in the frequency domain by L Ez (ω, x, y, ) = c(m, n, ω)um,n (x, y)exp(−γmn (ω)z), m,n≥1

Hz (ω, x, y, ) =

L

d(m, n, ω)vm,n (x, y)exp(−γmn (ω)z),

m,n≥1

where

where

√ √ um,n (x, y) = (2 2/ ab).sin(mπx/a).sin(nπy/b), √ √ vm,n (x, y) = (2 2/ ab).cos(mπx/a).cos(nπy/b), J γm,n (ω) = h2mn − ω 2 µe, h2mn = ((mπ/a)2 + (nπ/b)2 )1/2

Advanced Probability and Statistics: Remarks and Problems

13

where m, n are positive integers. Show that J aJ b um,n (x, y)up,q (x, y)dxdy = δm,p δn,q 0

0

and deduce that if Re(γmn ) = αmn , then J aJ bJ d � |Ez |2 dxdydz = |c(m, n, ω)|2 (1 − exp(−2αmn (ω)d)/2αmn (ω) 0

0

0

Likewise evaluate J

m,n

|Hz |2 dxdydz,

J

|E⊥ |2 dxdydz,

J

|H⊥ |2 dxdydz

and hence evaluate the time averaged energy density in the electromagnetic field at frequency ω: J aJ bJ d U = (1/4) (E|E(ω, x, y, z)|2 + µ|H(ω, x, y, z)|2 )dxdydz 0

0

0

Question: Why does the 1/4 factor come rather than the 1/2 factor ? [5] [a] Calculate the power dissipated in the waveguide walls assuming that the region outside has a finite conductivity σ? Solution: The surface current density on the wall x = 0 is ˆ × H(0, y, z) = Hy (0, y, z)ˆ z − Hz (0, y, z)ˆ y Js (0, y, z) = x This is the current per unit length on the wall. It can be attributed to a current density in the infinite region beyond this wall into the boundary having a value of J(x, y, z), x < 0 provided that we take J 0 J(x, y, z)dx = Js (0, y, z) −∞

However from basic electromagnetic wave propagation theory in conducting me­ dia, we know that � J(x, y, z) = J(0, y, z)exp(γ0 x), γ0 = jµω(σ + jωE) Thus,

J(0, y, z)/γ0 = Js (0, y, z) and hence the average power dissipated inside the region x < 0, 0 < y < b, 0 < z < d is given using Ohm’s law by J 0 J bJ d P = (|J(x, y, z)|2 /2σ)dxdydz −∞

0

0

5

6

Advanced Probability and Statistics: Remarks and Problems 14CHAPTER 1. REMARKS AND PROBLEMS ON TRANSMISSION LINES AND WAVE J

b

=(

0

J

d 2

0

=(

2

|Js (0, y, z)| /2σ|γ0 | dydz).( J

b 0

J

J

0

exp(2α0 x)dx

−∞

d 0

(|Js (0, y, z)|2 /(4α0 σ|γ0 |2 ))dydz)

where α0 = Re(γ0 ) Repeat this calculation for the other walls. [6] [a] Show that if the waveguide has an arbitrary cross section, in an arbitrary orthogonal coordinate system (q1 , q2 ) for the x − y plane, equations (7) and (8) can be expressed as E1 = (−γ/h2 G1 )∂Ez /∂q1 − (jωµ/h2 G2 )∂Hz /∂q2 E2 = (−γ/h2 G2 )∂Ez /∂q2 + (jωµ/h2 G1 )∂Hz /∂q2 H1 = (−γ/h2 G1 )∂Hz /∂q1 + (jωE/h2 G2 )∂Ez /∂q2 H2 = (−γ/h2 G2 )∂Hz /∂q2 − (jωE/h2 G1 )∂Ez /∂q2 where G1 , G2 are the Lame’s coefficients for orthogonal curvilinear coordinate system (q1 , q2 ), ie, y G1 = (∂x/∂q1 )2 + (∂y/∂q1 )2 , y G2 = (∂x/∂q2 )2 + (∂y/∂q2 )2 ,

and

q1 + E2 (ω, q1 , q2 , z)ˆ q2 , E⊥ (ω, q1 , q2 , z) = E1 (ω, q1 , q2 , z)ˆ H⊥ (ω, q1 , q2 , z) = H1 (ω, q1 , q2 , z)ˆ q1 + H2 (ω , q1 , q2 , z)ˆ q2 , define the curvilinear components of E⊥ and H⊥ respectively. Show that com­ bining these equations with the zˆ component of the Maxwell curl equations results in the two dimensional Helmholtz equation in the curvilinear system: ∂ G2 ∂Ez ∂ G1 ∂Ez 1 ( + ) + h2 Ez = 0 ∂q2 G2 ∂q2 G1 G2 ∂q1 G1 ∂q1 and the same equation for Hz . [b] Show that the boundary conditions on the conducting walls in the curvi­ linear case, assuming that the boundary curve of the waveguide cross section is given by q1 = c = constt assume the forms Ez = 0, q1 = c,

∂Hz = 0, q1 = c ∂q1

Advanced Probability and Statistics: Remarks and Problems

15

Deduce that in general, the modal eigenvalues h2 are different in the T E (Hz = 0) and the T M (Ez = 0) cases. They are the same only in the rectangular waveguide case. [c] Deduce an expression for the total time averaged power at frequency ω dissipated in the waveguide’s conducting walls assuming that the region exterior to the guide has a constant conductivity of σ. hint: The surface current density is Js (ω, q2 , z) = qˆ1 × H⊥ (ω, c, q2 , z) and hence by the same reasoning as in step 3, we have that the volume current density in the conducting exterior satisfies \2 J(ω, q1 , q2 , z) − γ0 (ω)2 J(ω, q1 , q2 , z) = 0 J γ0 (ω) = jωµ(σ + jωE), 1 ∞ J(ω, q1 , q2 , z)G1 (q1 , q2 )dq1 = Js (ω, q2 , z) c

An approximate solution to this corresponding to the situation when the field propagates only along the q1 direction in the conducting region is given by 1 q1 G1 (q1 , q2 )dq1 ) J(ω, q1 , q2 , z) = γ0 (ω)Js (ω, q2 , z)exp(−γ0 (ω) c

Note that this situation corresponds to the fact that the fields propagate from the surface of guide into the depth of the conducting walls normally, ie, along the direction q1 into it and the factf that in propagating from q1 = c to q1 q normally, the distance covered is l = c 1 G1 (q1 , q2 )dq1 . Then the average power dissipated per in a length d of the guide at frequency ω is given by 1 ∞1 A1 d Pdiss = (1/2σ) |J(ω, q1 , q2 , z)|2 G1 (q1 , q2 )G2 (q1 , q2 )dq1 dq2 dz c

0

0

where when q2 varies over [0, A] one full curve on the cross section is covered. Note that q2 is tangential to the waveguide boundary curve for any cross section. Specialization to cylindrial guides: J ˙ that [a] In Step 4, choose q1 = ρ = x2 + y 2 , q2 = φ = tan−1 (y/x)Show (ρ, φ) form an orthogonal curvilinear system of coordinates in the xy plane and that G1 = Gρ = 1, G2 = Gφ = ρ so that equations (7) and (8) assume the forms Eρ = (−γ/h2 )

∂Ez ∂Hz − (jωµ/h2 ρ) ∂ρ ∂φ

7

8

Advanced Probability and Statistics: Remarks and Problems 16CHAPTER 1. REMARKS AND PROBLEMS ON TRANSMISSION LINES AND WAVE Eφ = (−γ/h2 ρ)

∂Ez ∂Hz + (jωµ/h2 ) ∂φ ∂ρ

Hρ = (−γ/h2 )

∂Hz ∂Ez + (jωe/h2 ρ) ∂ρ ∂φ

Hφ = (−γ/h2 )

∂Hz ∂Ez − (jωe/h2 ρ) ∂ρ ∂φ

[b] Substituting the above into the z component of the Maxwell curl equa­ tions then gives us the two dimensional Helmholtz equation for Ez , Hz in the plane polar coordinate system: 1 ∂ ∂Ez 1 ∂ 2 Ez ρ + 2 + h 2 Ez = 0 ρ ∂ρ ∂ρ ρ ∂φ2 and the same equation for Hz . The boundary conditions are given by Ez = 0, ρ = R,

∂Hz = 0, ρ = R ∂ρ

and these solving them by the method of separation of variables with the appli­ cation of the appropriate boundary conditions then gives us the general solutions 2 [Jm (αm (n)ρ/R)(c1 (ω, m, n)cos(mφ)+ Ez (ω, ρ, φ, z) = E (ω)z)] c2 (ω, m, n)sin(mφ))exp(−γmn

m,n

Hz (ω, ρ, φ, z) =

2 m,n

[Jm (βm (n)ρ/R)(d1 (ω, m, n)cos(mφ)+ H d2 (ω, m, n)sin(mφ))exp(−γmn (ω)z)]

where αm (n), n = 1, 2, ... are the roots of Jm (x) = 0 while βm (n), n = 1, 2, ... : (x) = 0 and further are the roots of Jm h = hmn = αm (n)/R in the T M case (Ez = 0, Hz = 0) while h = hmn = βm (n)/R in the T E case (Ez = 0, Hz =  0). Further � E (ω) = αm (n)2 /R2 − ω 2 µe, γmn H (ω) = γmn



βm (n)2 /R2 − ω 2 µe

are the propagation constants for T Mmn and T Emn modes respectively. Exercises:

Advanced Probability and Statistics: Remarks and Problems

17

[1] Show using the separation of variables applied to the two dimensional Helmholtz equation that Jm (x) actually satisfies the Bessel equation x2 Jm (x) + xJm (x) + (x2 − m2 )Jm (x) = 0

[2] Show that if fmn (ρ) = Jm (αm (n)ρ/R), Jm (αm (n)) = 0 then ρ2 fmn (ρ) + ρfmn (ρ) + (αm (n)2 ρ2 /R2 − m2 )fmn (ρ) = 0 and hence prove the orthogonality relations R

Jm (αm (n)ρ/R)Jm (αm (k)ρ/R)ρdρ = 0, n = k 0

Hint: Multiply the above differential equation for fmn (ρ) by fmk (ρ)/ρ, inter­ change n and k, subtract the second equation from the first and integrate from ρ = 0 to ρ = R. Use integration by parts to deduce the identity R

(αm (n)2 − αm (k)2 )

ρ.fmn (ρ).fmk (ρ)dρ = 0 0

[3] Repeat Exercise [2] with αm (n) replaced by βm (n) where now Jm (βm (n)) = 0. [4] Prove the orthogonality relations 2π



cos(mφ)cos(nφ)dφ = 0

sin(mφ)sin(nφ)dφ = 0, m = n 0

and

2π 0

cos(mφ)sin(nφ)dφ = 0, ∀m, n

[5] Using Exercises [2], [3], [4], deduce that the functions Jm (αm (n)ρ/R)cos(mφ), Jm (αm (n)ρ/R)sin(mφ), m, n = 1, 2, ... are all mutually orthogonal on the disc of radius R, w.r.t to the area measure ρ.dρ.dφ and likewise for the functions Jm (βm (n)ρ/R)cos(mφ), Jm (βm (n)ρ/R)sin(mφ), m, n = 1, 2, ...

9

18CHAPTER 1. REMARKS AND PROBLEMS ON TRANSMISSION LINES AND WAVEG

10

Advanced Probability and Statistics: Remarks and Problems

[6] Use the result of Exercise [5] to show that when L E [Jm (αm (n)ρ/R)(c1 (ω, m, n)cos(mφ)+c2 (ω, m, n)sin(mφ))exp(−γmn (ω)z)] Ez = m,n

then L m,n

where

1

R 0

1



|Ez |2 ρ.dρ.dφ =

0

E (ω)z) λ(m, n)(|c1 (ω, m, n)|2 + |c2 (ω, m, n)|2 )exp(−2αmn

E E αmn (ω) = Re(γmn (ω))

and λ(m, n) = =

1

R

0

1

R 0

1

1



Jm (αm (n)ρ/R)2 cos2 (mφ)ρ.dρ.dφ 0



Jm (αm (n)ρ/R)2 sin2 (mφ)ρ.dρ.dφ 0



1

R

Jm (αm (n)ρ/R)2 ρ.dρ

0

Likewise, show that for L H Hz = [Jm (βm (n)ρ/R)(d1 (ω, m, n)cos(mφ)+d2 (ω, m, n)sin(mφ))exp(−γmn (ω)z)] m,n

we have 1 R 1 2π 0

0

|Hz |2 ρ.dρ.dφ =

L

H µ(m, n)(|d1 (ω, m, n)|2 +|d2 (ω, m, n)|2 )exp(−2αmn (ω)z)

m,n

where µ(m, n) = π

1

R

Jm (βm (n)ρ/R)2 ρ.dρ

0

[7] Prove that 1

R 0

: : Jm (αm (n)ρ/R)Jm (αm (k)ρ/R)ρ.dρ = 0, n =  k

hint: Integrate by parts in two different ways and substitute for the second derivatives using Bessel’s equation. [8] Repeat [7] with αm (n) replaced by βm (n).

Advanced Probability and Statistics: Remarks and Problems

19

[9] Using the results of the previous Exercises and the expressions for E⊥ and H⊥ in terms of Ez , Hz to express the time averaged energy in the electric field as / UE = (E/4) |E|2 ρ.dρ.dφ.dz = (E/4)

/

[0,R]×[0,2π)×[0,d]

(|Ez |2 + |E⊥ |2 )ρ.dρ.dφ.dz

2 [pE (ω, m, n)(|c1 (ω, m, n)|2 + |c2 (ω, m, n)|2 ) m,n

+qE (ω, m, n)(|d1 (ω, m, n)|2 + |d2 (ω, m, n)|2 )]

and that in the magnetic field as / UH = (µ/4)

[0,R]×[0,2π)×[0,d]

= (µ/4) 2 m,n

/

|H|2 ρ.dρ.dφ.dz

(|Hz |2 + |H⊥ |2 )ρ.dρ.dφ.dz

[pH (ω, m, n)(|d1 (ω, m, n)|2 + |d2 (ω, m, n)|2 )

+qH (ω, m, n)(|c1 (ω, m, n)|2 + |c2 (ω, m, n)|2 )]

where pE (ω, m, n), qE (ω, m, n), qE (ω, m, n), qH (ω, m, n) depend only on E, µ, R, d as parameters. Quality factor [a] The quality factor of a guide is defined as the ratio of the average energy stored per unit length of the guide to the energy dissipated per unit length per cycle. Exercise Compute the quality factors for rectangular and cylindrical guides for spec­ ified modes T Emn and T Mmn . [b] For a rectangular guide, the wavelength of propagation along the z axis for the T Emn or the T Mmn modes when the frequency is more than the cutoff frequency is given by J λ = 2π/βmn , βmn = jγmn = ω 2 µE − h2mn with

h2mn = (mπ/a)2 + (nπ/b)2 The phase velocity is given by vph = νλ = ωλ/2π = ω/βmn = ω/

J

ω 2 µE − h2mn

This is greater than the speed of light! The phase velocity is therefore not a meaningful measure for the velocity of energy transfer. A more meaningful

11

12 20CHAPTER 1. REMARKS Advanced andON Statistics: Remarks andLINES Problems ANDProbability PROBLEMS TRANSMISSION AND WAVE measure is the group velocity which is based on the following observation. Let a wave field travelling along the z axis be a sum of two harmonic components with a small frequency difference and a small wavelength difference. Thus, it can be expressed as f (t, z) = cos(ωt − kz) + cos((ω + Δω)t − (k + Δk)z) This can be in turn expressed using a standard trigonometric identity as f (t, z) = 2cos((ω + Δω/2)t − (k + Δk/2)z).cos(Δωt − Δkz) The second cosine term represents the slowly varying (in space and time) enve­ lope of the wave while the first cosine term represents the sharp variations of the signal within the envelope. The velocity of energy transfer is measured by that of the envelope and its is given by vg = Δω/Δk which in the limit becomes vg = dω/dk This called the group velocity. In our case, we find the group velocity is vg (m, n) = dω/dβmn = (dβmn /dω)−1 = � (d ω 2 µ� − h2mn /dω)−1 = (µ�ω/βmn )−1 = βmn /µ�ω � √ = 1/µ� − h2mn /(µ�ω)2 < 1/ µ�

Thus the group velocity is smaller than the velocity of light and is therefore a more meaningful measure of the velocity of energy transfer for the (m, n)th mode. Energy density in a guide of arbitrary cross section. [a] Let un (q1 , q2 ) and −h2n be the eigenfunctions and eigenvalues of the Dirichlet problem (�2⊥ + h2n )un (q1 , q2 ) = 0, un (a, q2 ) = 0 and let vn (q1 , q2 ) and and −kn2 be the eigenfunctions and eigenvalues of the Neumann problem (�2⊥ + kn2 )vn (q1 , q2 ) = 0,

∂vn (c, q2 ) =0 ∂q1

Recall that in the orthogonal coordinate system (q1 , q2 ), we have �2⊥ =

∂ G1 ∂ 1 ∂ G2 ∂ ( + ) G1 G2 ∂q1 G1 ∂q1 ∂q2 G2 ∂q2

21

Advanced Probability and Statistics: Remarks and Problems

13

1

Exercise: Assuming that the hn2 s are all distinct, show that the u�n s are all orthogonal: 1 un (q1 , q2 )um (q1 , q2 )G1 (q1 , q2 )G2 (q1 , q2 )dq1 dq2 = 0, n = m D

1

and likewise assuming that the kn2 s are all distinct, show that 1 vn (q1 , q2 )vm (q1 , q2 )G1 (q1 , q2 )G2 (q1 , q2 )dq1 dq2 = 0, n = m D

where D is the cross­section of the guide parallel to the xy or equivalently q1 q2 plane. hint: Use Green’s theorem in the form 1 (un \2⊥ um − um \2⊥ un )G1 G2 dq1 dq2 = D

1

(un Γ

∂um ∂un − um )G2 dq2 ∂q1 ∂q1

where Γ is the curve bounding the cross section D and is defined by the condition q1 = c, z = 0. Show that the general solution for the longitudinal component of the electric field and the magnetic fields can be expressed as � Ez (ω, q1 , q2 , z) = c(ω, n)un (q1 , q2 )exp(−γnE (ω)z), n

Hz (ω, q1 , q2 , z) =

� n

where

d(ω, n)vn (q1 , q2 )exp(−γnH (ω)z)

γnE (ω) =

J

h2n − ω 2 µw, J γnH (ω) = kn2 − ω 2 µw

Hence by using formulae of step 4, calculate the transverse curvilinear compo­ nents of the electromagnetic field. Now prove that the functions \⊥ un (q1 , q2 ), n ≥ 1 are also orthogonal, ie, 1 (\⊥ un , \⊥ um )G1 G2 dq1 dq2 = 0, n = m and likewise for \⊥ vn . For this simply apply Green’s formula in the form 1 2 (un \⊥ um + (\⊥ un , \⊥ um ))G1 G2 dq1 dq2 D

=

1

(un ∂um /∂q1 )G2 dq2 = 0 Γ

14 22CHAPTER 1. REMARKS Advanced andON Statistics: Remarks andLINES Problems ANDProbability PROBLEMS TRANSMISSION AND WAVE since un = 0 on Γ. In the case of vn , use the same formula but with the boundary condition ∂vm /∂q1 = 0 on Γ. Also show that l l (\⊥ un , \⊥ un )G1 G2 dq1 dq2 = h2n un2 G1 G2 dq1 dq2 = hn2 D

l

D

(\⊥ vn , \⊥ vn )G1 G2 dq1 dq2 = kn2

l

D

vn2 G1 G2 dq1 dq2 = kn2

assuming that the u�n s and vn� s are normalized. Finally, prove the orthogonality of \⊥ un × z, ˆ n ≥ 1 and of \⊥ vn × z, ˆ n ≥ 1. Also prove the mutual orthogonality of \un and \⊥ vm × zˆ and of \⊥ un × zˆ and \vn . For this, you can use the identities ((\⊥ un × zˆ), (\⊥ um × zˆ)) = (\⊥ un , \⊥ um ) and likewise for vn and further with dS = G1 G2 dq1 dq2 observing that \⊥ f = that

l = = =

l

l

D

D

D

l

1 1 f,1 qˆ1 + f,2 qˆ2 G1 G2

(\⊥ un , \⊥ vm × zˆ)dS

D

ˆ ⊥ un × \⊥ vm )dS z.(\

(un,1 vm,2 − un,2 vm,1 )dq1 dq2

((un vm,2 ),1 − (un vm,1 ),2 )dq1 dq2 = 0

again by applying the two dimensional version of the Gauss divergence theorem to the function (un vm,2 , −un vm,1 ) with the boundary condition that un vanishes on Γ. In this way using the formulae of step 4, show that the total average energy of the electromagnetic field in the guide can be expressed as l U = (�/4) (|Ez |2 + |E⊥ |2 )dSdz D×[0,d]

+(µ/4) =

l

D×[0,d]

(|Hz |2 + |H⊥ |2 )dSdz

2 (λ(ω, n)|c(ω, n)|2 + µ(ω, n)|d(ω, n)|2 ) n

where λ(ω, n), µ(ω, n) are determined completely by h2n , kn2 , d where d is the length of the guide.

Advanced Probability and Statistics: Remarks and Problems

2315

Exercise: Calculate the explicit formulae for λ(ω, n), µ(ω, n). [7]

Cavity resonators.

[a] Take a rectangular wave guide of dimensions a, b along the x and y axes respectively and d along the z axis. Cover the bottom z = 0 and the top z = d with perfectly conducting plates. We then get a rectangular cavity resonator which is a cuboid with a perfectly conducting boundaries. By applying the waveguide equations of step 1, we get that the exp(−γz) dependence may be replace with any linear combination of exp(±γz) and we must choose this linear combination so that Hz vanishes when z = 0, d and Ex , Ey also vanish when z = 0, d. It should be noted that the multiplication by −γ that replaces ∂/∂z cannot be done here since we could multiply either by ±γ. This means that the cavity resonator case, equations (7) and (8) must be replaced by E⊥ = −(1/h2 )

∂ �⊥ Ez − (jωµ/h2 )�⊥ Hz × zˆ − − − (7� ) ∂z

∂ �⊥ Hz + (jω�/h2 )�⊥ Ez × zˆ − − − (8� ) ∂z It should be noted � that even in the waveguide case, there are two solutions for γ namely ± h2mn − ω 2 µ� and we choose that linear combination of the corresponding exponentials so that with E⊥ defined by (7� ), we get that E⊥ along with Hz vanishes when z = 0, d. This conditions are equivalent to Hz and ∂Ez /∂z vanishing when z = 0, d. Note that Ez vanishing when z = 0, d is ∂ z �⊥ Ez = �⊥ ∂E equivalent to ∂z ∂z vanishing when z = 0, d. Thus we must choose γ = jπp/d for some integer p and the combination sin(πpz/d) = (exp(γz) − exp(−γz))/2j for Hz and cos(πpz/d) = (exp(γz) + exp(−γz))/2 for Ez . A little speculation will show that this is valid for cavity resonators of arbitrary cross section in the xy plane. H⊥ = (1/h2 )

[8] Exercises [1] This problem tells us how to analyze the waveguide fields in the presence of a gravitational field which is independent of (t, z) and described in general relativity in terms of an appropriate metric tensor. Assume that the metric of space­time is diagonal with the coefficients inde­ pendent of t, z. Thus, the metric has the form dτ 2 = g00 (x, y)dt2 + g11 (x, y)dx2 + g22 (x, y)dy 2 + g33 (x, y)dz 2 Write down explicitly the components of the Maxwell equations Fµν,σ + Fνσ,µ + Fσµ,ν = 0, − − −(1) √ (F µν −g),ν = 0 − − − (2)

1624CHAPTER 1. REMARKS Advanced Probability and Statistics: Remarks and Problems AND PROBLEMS ON TRANSMISSION LINES AND WAVEG in this background metric assuming the dependence on (t, z) to be of the form Fµν (t, x, y, z) = Hµν (x, y)exp(−jωt − γ(ω)z) Identifying equation (1) with the homogeneous Maxwell equations curlE + jωB = 0, divB = 0 identify the vectors E, B in terms of the components of Hµν . Now write down the other Maxwell equations in free space (2) in terms of the Hµν and hence in terms of E, B and solve for Ex , Ey , Bx , By in terms of Ez , Bz and derive the generalized two dimensional Helmholtz equations satisfied by Ez , Bz . In case inhomogeneous permittivity E(ω, x, y) and permeability µ(ω, x, y) are also to be taken into account in equation (2), first identify (2) with the Maxwell equations divD = 0, curlH + jωD = 0 and hence determine D, H in terms of the components of F µν . Thus, state how the vacuum medium relation F µν = g µα g νβ Fαβ gets modified in the presence of the inhomogeneous medium. Derive therefrom the relationship between D, H and E, B in the inhomogeneous medium in the presence of this non­flat diagonal metric. Obtain thus the modified generalized Helmholtz equations for Ez , Hz in this inhomogeneous medium in the presence of the above gravitational field. Generalized this theory to the case of orthogonal curvilinear coordinates q = (q1 , q2 ) in the x­y plane, ie, by writing down the metric as dτ 2 = g00 (q)dt2 + g11 (q)dq12 + g22 (q)dq22 + g12 (q)dq1 dq2 + g33 (q)dz 2 [2] Show that the most general solution for the electromagnetic field within a cavity resonator of arbitrary cross section in the xy plane with length d along the z axis is given by L un (q1 , q2 ).(2/d)1/2 .cos(πpz/d).Re(c(n, p)exp(−jω(n, p)t)) Ez (t, q1 , q2 , z) = n,p≥1

Hz (t, q1 , q2 , z) =

L

vn (q1 , q2 ).(2/d)1/2 .sin(πpz/d).Re(d(n, p)exp(−jω(n, p)t))

n,p≥1

E⊥ (t, q1 , q2 , z) =



L n,p

kn−2 .(

L

n,p

h−2 n .(−πp/d).

⊥ un (q1 , q2 ).(2/d)

ˆ).(2/d) ⊥ vn )(q1 , q2 )×z

1/2

1/2

.sin(pπz/d).Re(c(n, p).exp(−jω E (n, p)t))

.sin(pπz/d).Re(jµω H (n, p)d(n, p).exp(−jω H (n, p)t))

Advanced Probability and Statistics: Remarks and Problems

17 25

H⊥ (t, q1 , q2 , z) � −2 = kn

.(πp/d).�⊥ vn (q1 , q2 ).(2/d)1/2 .cos(pπz/d).Re(d(n, p).exp(−jω H (n, p)t)) n,p

+



n,p

h−2 n .(�⊥ un )(q1 , q2 )

×zˆ).(2/d)1/2 .cos(pπz/d).Re(j�ω E (n, p)c(n, p).exp(−jω E (n, p)t))

where the notation of step 6 has been used. Note that the characteristic E­field and H­field frequencies of oscillation are respectively given bu � ω E (n, p) = (µ�)−1/2 h2n + (pπ/d)2 , � ω H (n, p) = (µ�)−1/2 kn2 + (pπ/d)2 ,

To see how these expressions arise, simply use the waveguide formula ω 2 µ� + γnE (ω)2 = h2n for the transverse magnetic field situations and ω 2 µ� + γnH (ω) = kn2

for transverse electric field situations, combined with the resonator formula (ob­ tained by applying the boundary conditions on the z = 0, d surfaces γnE (ω)2 = −(πp/d)2 = γnH (ω)2 You must apply the formula E⊥ = (1/h2 ) H⊥ = (�/k 2 )

∂ � ⊥ Ez ∂z

∂ �⊥ Ez × zˆ ∂t

for transverse magnetic fields, and E⊥ = (µ/k 2 )

∂ �⊥ Hz × zˆ ∂t

∂ �Hz ∂z for transverse electric fields and then apply the superposition principle, namely that the total electric field is the superposition over transverse magnetic modes, ie, with Hz = 0 and the transverse electric modes, ie, with Ez = 0. These are derived from the formulas of step 1 by replacing −γ by ∂/∂z and −jω by ∂/∂t. This is required since the general form of the cavity fields consists not of one mode and one frequency but is rather a superposition over all the modes and cavity frequencies. In other words, for cavity fields we must rewrite the waveguide formulas in the space­time domain (x, y, z, t), rather than in the 2­D space and frequency domain (x, y, ω). H⊥ = (1/k 2 )

18

Advanced Probability and Statistics: Remarks and Problems 26CHAPTER 1. REMARKS AND PROBLEMS ON TRANSMISSION LINES AND WAVE [3] Determine an expression for the total energy per cycle dissipated in a cavity resonator of arbitrary cross section for a T Mn,p mode and for a T En,p mode. By a T Mn,p mode, we mean the electric and magnetic fields derived from Hz = 0, Ez = un (q1 , q2 )(2/d)1/2 cos(πpz/d).Re(c(n, p)exp(−jω E (n, p)t)

and by a T En,p mode, we mean the electric and magnetic fields derived from

Ez = 0, Hz = vn (q1 , q2 )(2/d)1/2 .sin(πpz/d).Re(d(n, p).exp(−jω H (n, p)))

Chapter 2

Chapter 2

Remarks and Problems on

Remarks and Problems on Statistical Signal Processing

Statistical Signal Processing

[1] Construct the Lattice filter order recursion for an RM ­valued vector sta­ tionary stochastic process X(t), t ∈ Z by minimizing E[I X(t) +

p p

k=1

A(k)X(t − k) I2 ]

with respect to the M × M prediction coefficient matrices A(k), k = 1, 2, ..., p. hints: Setting the variational derivatives of the above energy w.r.t the A(m)' s to zero gives us the optimal normal equations in the form of block structured matrix equations R(m) +

p p

k=1

where

Ap (k)R(m − k) = 0, m = 1, 2, ..., p

R(m) = E(X(t)X(t − m)T ) ∈ RM ×M

Note that R(−m) = R(m)T Note also that the optimal prediction error covariance is given by Re (p) = BbbE(ep (t)ep (t)T ) where ep (t) = X(t) +

p p

k=1

27

19

A(k)X(t − k)

20 28CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN and by virtue of the orthogonality equations or equivalently, the optimal normal equations, p � Re (p) = R(0) + Ap (k)R(−k) k=1

Note that the minimum prediction error energy or pth order is E(p) = T r(Re (t))

Now write down the optimal equations in block matrix structured form and apply the block time reversal operator Jp consisiting of a reverse diagonal having blocks IM and with all the other blocks being zero matrices. Note that J p Rp Jp = Sp where Rp = ((R(k − m)))1≤k,m≤p and Sp = ((R(m − k)))1≤k,m≤p To get at an order recursion, consider a dual normal equation with A(k) replaced ˜ by A(k) and R(k) by S(k). [2] Consider the RLS lattice algorithm for the multivariate prediction in both order and time. How would you proceed ? hint: Define the data vector at time N as ⎞ ⎛ X(N )T ⎜ X(N − 1)T ⎟ (N +1)×M ⎟ XN = ⎜ ⎝ ⎠∈R .. X(0)T and define the data matrix at time N of order p by

XN,p = [z −1 XN , z −2 Xn , ..., z −p XN ] ∈ R(N +1)×M p The optimal matrix predictor at time N of order p is then given by minimizing � XN + XN,p AN,p �2 where � . � denotes the Frobenius norm and ⎞ ⎛ Ap (1)T ⎜ Ap (2)T ⎟ ⎟ ∈ RM p×M AN,p = ⎜ ⎝ ⎠ .. T Ap (p)

Question: Identify the appropriate Hilbert space for which this problem can be formulated as an orthogonal projection problem.

Advanced Probability and Statistics: Remarks and Problems

2921

[3] Calculate the autocorrelation function of the electromagnetic field inside a waveguide of arbitrary cross section assuming that at the feedpoint, namely at the mouth of the guide at the z = 0 plane, the correlation of the Ez , Hz fields are known. [4] How would you develop an EKF for estimating the electromagnetic field in space time over a bounded region from noisy measurements of the same at a finite discrete set of spatial pixels when the driving current density field is a white Gaussian noise field in space­time ? hint: Write down the wave equation for A in the form (\2 − (1/c2 )∂t2 )A(t, r) = −µJ(t, r) Transform it into two first order in time pde’s and by spatial pixel discretization, cast it in state variable form. Now, use the fact that the electric and magnetic fields can be expressed in a source free region as f t E(t, r) = −c2 \ × (\ × A)dt 0

B(t, r) = \ × A to arrive at a measurement model for ∂E/∂t, B at a discrete set of spatial pixels. Apply the EKF to this. [5] Consider the problem of estimating the moments of a vector parameter that modulates a set of potentials for a quantum system. The Hamiltonian is thus of the form p p θ(k)Vk (t) H(t, θ) = H0 + k=1

The objective is to estimate the moments of the parameters µp (k1 , ..., kp ) =< θ(k1 )...θ(kp ) > Schrodinger’s equation for the wave function is iψ ' (t) = H(t, θ)ψ(t) and it has a Dyson series solution ψ(t) = U0 (t)ψ(0)+ ∞ f p

n=1

0 (t) =< X >0 (t) +

n≥,1≤k1 ,...,kn ≤p

µn (k1 , ..., kn )F (t, k1 , ..., kn ) − − − −(1)

where < X >0 (t) =< U0 (t)ψ(0)|X|U0 (t)ψ(0) >=< ψ(0)|U0 (t)∗ XU0 (t)|ψ(0) > Exercise: Derive an explicit formula for F (t, k1 , ..., kn ) in terms of |ψ(0) > , Vk (t), k = 1, ..., p and U0 (t). Answer: F (t, k1 , ..., kn ) is the coefficient of θ(k1 )...θ(kn ) in the sum of terms of the form � ( < ψ(0)|U0 (t−t1 )V (t1 , θ)0 U (t1 −t2 )V (t2 , θ)...U0 (tm−1 −tm ) 0 Now suppose that � � � � Hg,f (z) = < g|E1 (z + x)E0 (x)|f > dx = ∂2 ∂1 Fg,f (z + x, x)dx R

R

24

Advanced Probability and Statistics: Remarks and Problems 32CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSI exists and is is square integrable in < g|Ω+ |f >= 0 Suppose now that H1 = H0 + �.V where V is a random potential and � is a small perturbation parameter. Show that exp(itH1 ) = exp(itH0 )+�.exp(itH0 )((1−exp(−itad(H0 ))/itad(H0 ))(V )+O(�2 ) = exp(itH0 ) + �.exp(itH0 ).g(it.ad(H0 ))(V ) + O(�2 ) where g(z) = (1 − exp(−z))/z Hence, deduce that

Ω(t) = exp(itH1 ).exp(−itH0 ) =

I + �.exp(itad(H0 ))g(it.ad(H0 ))(V ) + O(�2 ) Hence assuming a given covariance function for the potential V , ie, RV V = E(V ⊗ V ) compute E((Ω(t) − I) ⊗ (Ω(s) − I))

2

upto O(� ) in terms of RV V . Now go one step further in perturbation theory as follows: exp(tH1 ) = exp(t(H0 + �V )) = exp(tH0 ).W (t, �) say. Then, ∂t W (t, �) = �.exp(−tH0 )V.exp(tH0 )W (t, �). = �.exp(−t.ad(H0 ))(V ).W (t, �) Thus, W (1, �) = I+� +�2





1

exp(−t.ad(H0 ))(V )dt 0

exp(−t.ad(H0 ))(V ).exp(−s.ad(H0 ))(V )dtds+O(�3 ) 0= o(G)−1



u(g)¯ v (g)

g∈G

for any two functions u, v on G. Now, by the Schur orthogonality relations, we have as elements of A(G), and with c(p, α, i, j >= d(α) < p, [Dα ]ij > � (pα ∗ pβ )(g).g pα .pβ = g∈G

and

(pα ∗ pβ )(g) = with



c(p, α, ij)c(p, β, km)([Dα ]ij ∗ [Dβ ]km )(g)

([Dα ]ij ∗ [Dβ ]km )(g) = � [Dα (h)]ij .[Dβ ]km (h−1 g)

h∈G

=



[Dα (h)]ij [Dβ (h−1 )]kl [Dβ (g)]lm

h∈G

=



[Dα (h)]ij []barDβ (h)]lk [Dβ (g)]lm

h∈G

and this is zero if α = � β (Schur’s orthogonality relation which states that matrix elements of inequivalent irreducible unitary representations are orthogonal) and if α = β, then this is contained in the vector space of functions Vα = span{[Dα (g)]ij : 1 ≤ i, j ≤ d(α)}

47 55

Advanced Probability and Statistics: Remarks and Problems Thus we get in A(G),  β, p2α ∈ Vα pα .pβ = 0, α = Hence,



pα p = p2 =

α

which implies (since that



pα pβ =

α,β

2 pα , pα



2 pα

α

are in Vα and the Vα� s are mutually all orthogonal 2 p α = pα

or equivalently in terms of functions, ˆ (pα ∗ pβ )(g) = δ(α, β)pα , α, β ∈ G A projection p is said to be minimal if it cannot be decomposed as p = p1 + p2 with both p1 and p2 being projections. Thus, by the above decomposition, ˆ We have thus proved that the set p is minimal iff p = pα for some α ∈ G. of all minimal projections in the group algebra of a finite group is in one­one correspondence with the set of all irreducible representations of G. Now suppose ˆ . Then, we can write p is a minimal projection associated to α ∈ G d(α)

p(g) =



c(i, j)[Dα (g)]ij



c(ij)[Dα ]ij

i,j=1

or equivalently, p=

ij

in the group algebra A(G). The condition p2 = p implies that � � c(im)[Dα (g)]im .g = p = c(ij)c(km)[Dα ]ij .[Dα ]km im,g

ijkm

=



c(ij)c(km)[Dα ]ij ∗ [Dα ]km (g)g

ijkm,g

=



c(ij)c(km)(

ijkm,l

=





¯ α (h)]lk ) [Dα (h)]ij .[D

h∈G

[Dα (g)]lm (g).g

g∈G

c(ij)c(km)o(G).d(α)−1 δ(i, l)δ(j, k).



[Dα (g)]lm .g

g∈G

ijkml

= and therefore,





c(ij)c(jm).o(G).d(α)−1 [Dα (g)]im g

c(im) =

� j

c(ij)c(jm)o(G).d(α)−1

48 56CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN or equivalently, in matrix notation, (d(α)/o(G))C = C2 If we further impose the restriction that p is a central projection, ie, p com­ mutes with A(G) apart from being minimal, then the only solution to the above equation is C = (d(α)/o(G))Id(α) and hence we find that in this case d(α)

p(g) = (d(α)/o(G))

(

[Dα (g)]ii = (d(α)/o(G))χα (g)

i=1

where χα (g) = T r(Dα (g)) is the character of the representation Dα . Prove that in A(G),

d(α)

p=

(

c(ij)[Dα ]ij

i,j=1

ˆ β = α. (In fact, the Schur commutes with all [Dβ ]ij , i, j = 1, 2, ..., d(β), β ∈ G, orthogonality relations imply that p.[Dβ ]ij = [Dβ ]ij p = 0, β = α. Prove further that p also commutes with [Dα ]ij , i, j = 1, 2, ..., d(α) iff C = ((c(ij))) is a scalar multiple of Id(α) . Remark: We have shown that the problem of determining all the mini­ mal central projections in A(G) is equivalent to determining all the irreducible characters of G which is in turn equivalent to determining all the inequivalent irreducible representations of G. This fact will play a fundamental role in our determination of all the irreducible representations of the permutation groups. [3] Sm is the group of permutation of m elements. Any σ ∈ Sm can be represented as σ = (i1 , ..., il1 ).(il1 +1 , ..., il1 +l2 )...(il1 +...+lk−1 + 1, ..., il1 +...+lk ) where (i1 , ..., il1 +...+lk ) is a permutation of (1, 2, ..., m) and if a1 , ..., ar are dis­ tinct integers in {1, 2, .., m}, then (a1 , ..., ar ) denotes the cyclic permutation that sends ai → ai+1 , i = 1, 2, ..., r − 1, ar → a1 and leaves the other integers fixed. In short, (a1 , ..., ar ) is a cyclic permutation in Sm with cycle length r. We thus state this result as: Every permutation is a product of disjoint cycles. Further, in the above notation, let ρ denote the permutation {1, 2, ..., m} → {i1 , ..., im } where of course m = l1 + ... + lk . We also define the permutation g = (1, 2, ..., l1 ).(l1 + 1, ..., l1 + l2 )...(l1 + ... + lk−1 + 1, ..., l1 + ... + lk ) expressed as a product of cycles. Then, it is clear that σ.ρ = ρ.g

Advanced Probability and Statistics: Remarks and Problems ie,

5749

σ = ρ.g.ρ−1

It is clear from this formula, that each conjugacy class in Sn consists precisely of those elements having the same cycle structure. More precisely we say that σ, there are kj a permutation σ ∈ 1k1 2k2 ..mkm iff in the cycle representation of� m cycles of length j for each j = 1, 2, ..., m. Of course we must have j=1 j.kj = m. The conjugacy classes in Sm are therefore labeled by the integers (k1 , ..., km ). 1k1 ..mkm is a conjugacy class. The number of elements in this conjugacy class is easily seen to be µ(k1 , ..., km ) =

m! k1 !...km !1k1 ...mkm

in fact, first simply write down all the cycles in this class serially as above in non­decreasing order of their lengths. Then we can permute all the m elements in this serial representation in m! ways. However, a given cycle of length j can be represented in j possible ways by simply by applying a cyclic permutation to the elements in this cycle and there are j possible cyclic permutations. So the number of permutations within each cycle which do not alter the cyclic representation is Πj j kj because there are kj cycles of length j and each such cycle can be represented in j possible ways by applying cyclic permutations. Further, the cyclic representation of a permutation is not altered if we simply permute the cycles of the same length amongst themselves. The total number of such permutations is simply k1 !...km !. Thus we obtain the above formula. [4] Young frames and Young Tableaux: Given positive integers m1 ≥ m2 ≥ ... ≥ mk > 0, such that m1 + ... + mk = m, we draw a tableaux consisting of rows of boxes one below the other starting at the same left end line such that the first row has m1 boxes, the second row has m2 boxes,... the k th row has mk boxes. We denote such a frame by F (m1 , ..., mk ) and call it a Young frame. It is easily seen that the total number of Young frames for fixed m is simply the total number of conjugacy classes of Sm . In fact, we can directly construct a bijection from the set of all conjugacy classes onto the set of all Young frames as follows. Given a conjugacy class 1k1 ..mkm , some of the j kj will be missing, l ie, kj = 0 for such j. Thus, we can express this conjugacy class as r1l1 ...rpp where l1 , ..., lp > 0 and 1 ≤ r1 < r2 < ... < rp ≤ m, ie, in each element of this conjugacy class, there are lj cycles of length j for each j = 1, 2, ..., p. Then we define m1 = ... = mlp = rp , mlp +1 = .. = mlp +lp−1 = rp−1 etc, ie, in other words, as an ordered k­tuple, (m1 , m2 , ..., mk ) = (rp , .., rp , rp−1 , ..., rp−1 , ..., r1 , ..., r1 ) where rj occurs lj times for each j = p, p − 1, ..., 1. Note that m=

m � j=1

jkj =

p � j=1

lj r j =

k � j=1

mj

50 58CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN Since the number of distinct classes of G = Sn equals the total number of inequivalent irreducible representations of G, it follows that this number is also the same as the number of distinct Young frames. A Young tableau F (T ) corresponding to a Young frame F = F (m1 , ..., mk ) �k is simply an arrangement of the m integers 1, 2, ..., m in these m = j=1 boxes, ie, in each box, we put an integer from 1, 2, ..., m and no two boxes have the same integer. Associated with the Young tableaux F (T ), we define R(T ) to be the group of all permutations of each row of F (T ) and C(T ) to be the group of all permutations of each column of F (T ). Note that the number of elements in R(T ) equals m1 !...mk ! and the number of elements in C(T ) equals n1 !...nr ! where n1 , ..., nr are the column lengths of F (T ). Note that R(T ) and C(T ) are subgroups of Sm . Define � E(T ) = pq(−1)q = P (T )Q(T ) p∈R(T ),q∈R(T )

where P (T ) =



p∈R(T )

p, Q(T ) =



(−1)q q

q∈C(T )

These elements are understood to be interpreted as elements of the group algebra A(Sm ) of Sn . Our aim is to prove that corresponding to each Young frame F , there is exactly one irreducible character χF of Sm , or equivalently, exactly one inequivalent irreducible representation of Sm and as F runs over all the Young frames, χF will run over all the distinct irreducible characters of Sm . The second part is obvious if we can prove that for two distinct Young frames F, F � , the irreducible characters χF , χF � associated with them are distinct since as remarked above, the number of distinct Young frames equals the number of classes of Sm and the number of classes of finite group equals exactly the number of inequivalent irreducible representations of the group. Now let T, T � be two Young tableaux. We say that T RT � , ie, T is related to � T if there exists a pair of indices (i, j) belonging to the column of T and to the same row of T � , otherwise, we say that T N RT � , ie, T is not related to T � . now suppose T RT � . Then we have for some (i, j) that (i, j) ∈ C(T ) ∩ R(T � ) and therefore, with (i, j) denoting the transposition of (i, j), we have (i, j)P (T � ) = P (T � ), C(T )(i, j) = −C(T ) Now let T, T � be two Tableaux with corresponding frames F = F (T ), F � = F (T � ). Note that F = F (T ) is completely specified by integers m1 ≥ m2 ≥ ... ≥ mk > 0 and F � = F (T � ) is completely specified by integers m�1 ≥ m�2 ≥ ... ≥ m�l > 0. We say that F = F � if l = k and mj = m�j , j = 1, 2, ..., k. Otherwise, we write F = F � . We write F > F � if for the first index j = 1, 2, ... for which mj = m�j , we have mj > m�j . Obviously F = F � iff either F > F � or else F � > F . Now suppose F = F � and T N RT � . First observe that by

Advanced Probability and Statistics: Remarks and Problems

5951

definition of N R, all the entries in the first column of T occupy different rows of T � . Then consider the first two entries in the first column of T . By definition of N R, these two entries must fall in different rows of T � . Hence, by applying a column permutation q1 to T and a row permutation p�1 to T � , we can ensure that these two entries occupy the same positions in q1 T and in p1� T � . Then we also observe that q1 T N Rp�1 T � and hence by applying the same argument to these new Tableaux, we can ensure the existence of a column permutation q2 of q1 T and a row permutation p�2 of p�1 T � such that that the next pair of entries in the first column q2 q1 T occupy the same positions in p2� p1� T � without altering the positions of the previous pair in the two tableaux. In this way, we finally end up with a column permutation q of T and a row permutation p� of T � such � that qT = p� T � . Then, T � = p −1 qT and we get T = q −1 p� T � and q −1 p� q ∈ q −1 R(T � )q = q −1 R(p� T � )q = R(q −1 p� T � ) = R(T ) Thus defining p = q −1 p� q we get that p ∈ R(T ), q ∈ C(T ), q −1 p� = pq −1 , T = pq −1 T � , T � = qp−1 T We have thus proved the following theorem: Theorem: Let T, T � have the same shape. Then, if T N RT � , there exists a p ∈ R(T ), q ∈ C(T ) such that T � = qpT . / R(T ).C(T ). Now, let T be any tableaux and let g ∈ Sm be such that g ∈ Then, g −1 ∈ / C(T )R(T ) and we prove the existence of p0 ∈ R(T ), q0 ∈ C(T ) such that (−1)q0 = −1, p0 gq0 = g. Indeed, define T � = g −1 T . Then T � cannot be written as abT where a ∈ C(T ), b ∈ R(T ). In fact the equation T � = hT uniquely determines h as g −1 . Thus, by the previous theorem, T RT � and hence there exists a pair (i, j) falling in the same column of T and the same row of T � . Define q0 = (i, j), p0 = gq0−1 g −1 = gq0 g −1 Then, we get (−1)q0 = −1, p0 gq0 = g, and p0 = gq0 g −1 ∈ gR(T � )g −1 = R(gT � ) = R(T ) and the proof of the claim is complete. We state this as a theorem. Theorem: If g ∈ / R(T )C(T ), then there exists a p0 ∈ R(T ) and a q0 ∈ C(T ) such that (−1)q0 = −1andp0 gq0 = g.

52 60CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN Now we are in a position to prove one of the main theorems in the Frobenius­ Young theory. Theorem: Let s ∈ (Sn ) be such that psq = (−1)q s∀p ∈ P (T ), q ∈ Q(T ) Then s = s(e)E(T ) Proof: Write

s=



s(g)g, G = Sn

g∈G

Then the stated hypothesis implies that s(pgq) = (−1)q s(g), ∀g ∈ G, p ∈ R(T ), q ∈ C(T ) (Note that R(T ), C(T ) are subgroups of G so that p ∈ R(T ) iff p−1 ∈ P (T ) and likewise q ∈ C(T ) iff q −1 ∈ C(T )). Taking g = e gives us s(pq) = (−1)q s(e), p ∈ P (T ), q ∈ Q(T ) Now suppose g ∈ / P (T )Q(T ). Then by the previous theorem, there exist p0 ∈ P (T ), q0 ∈ Q(T ) such that (−1)q0 = −1, p0 gq0 = g Then, s(g) = s(p0 gq0 ) = (−1)q0 s(g) = −s(g) and hence s(g) = 0. Therefore, we have proved that � � s= s(pq)pq = s(e) (−1)q pq = s(e)E(T ) p∈P (T ),q∈Q(T )

p∈P (T ),q∈Q(T )

and the proof of the theorem is complete. Corollary: E(T )2 = k(T )E(T ) for some k(T ) ∈ R. In fact, we have E(T ) = P (T )Q(T ), E(T )2 = P (T )Q(T )P (T )Q(T ) and hence, pE(T )2 q = pP (T )Q(T )P (T )Q(T )q = (−1)q E(T )2 , p ∈ P (T ), q ∈ Q(T ) since pP (T ) = P (T ), Q(T )q = (−1)q Q(T ), p ∈ P (T ), q ∈ Q(T ) Thus, by the theorem, E(T )2 = k(T )E(T )

Advanced Probability and Statistics: Remarks and Problems

53 61

for some k(T ) ∈ R. To evaluate the value of k(T ), we evaluate T r(RE(T ) ) and 2 2 T r(RE(T ) ) = T r(RE(T ) ), where for any s ∈ A(G), we define the linear operator on the vector space A(G) by Rs f = f s Then, for any g ∈ G, it is clear by choosing the standard basis {h : h ∈ G} for A(G) that T r(Rg ) = 0, g ∈ / e, T r(Re ) = m! (Recall that G = Sm ). Hence, since � s(g)Rg , s ∈ A(G) Rs = g∈G

we get that RE(T ) =



(−1)q Rpq

p∈P (T ),q∈Q(T )

so that T r(RE(T ) ) = m! Now let d = rank(RE(T ) ) = dimR(RE(T ) ) = dim[A(G)E(T )] Then, let {f1 , ..., fd } be a basis for (RE(T ) ). We define the linear operator A = k(T )−1 RE(T ) acting on the vector space A(G). Then, A2 = A ie, A is a projection. Thus, T r(A2 ) = T r(A) = d Combining these two formulae gives us d = T r(k(T )−1 RE(T ) ) = k(T )−1 m! so that k(T ) = m!/d and we conclude that E(T )2 = (m!/d)E(T ) Now define e(T ) = (d/m!)E(T ) Then, e(T )2 = (d/m!)2 E(T )2 = (d/m!)E(T ) = e(T )

54 62CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN in other words, e(T ) is an idempotent element of the group algebra A(G). We also note that if T, T � are different Tableaux having the same shape F = F (T ) = F (T � ) = F � , then e(T )e(T � ) = 0. [j] The Frobenius character formula for induced representations of a finite group. [1] Alternate definition of the induced representation. Let G be a finite group and H a subgroup of G. Let L be a unitary repre­ sentation of H in the Hilbert space Y . We define U = IndG HL ie, U is the representation of G induced by the representation L of H. There are many equivalent ways to define U . All these definitions give isomorphic representations of G. One way is define the representation space X of U as the set of all f ∈ C(G, Y ) for which f (gh) = L(h)−1 f (g), h ∈ H, g ∈ G and then U (g)f (x) = f (g −1 x), g, x ∈ G, f ∈ X. It should be noted from this definition that f ∈ X is completely determined if its value is known on any set of representatives of the cosets G/H. Hence, we may equivalently view any f ∈ X as a map from G/H → Y . In fact, let γ be a cross section map for this coset space, ie, γ : G/H → G is such that γ(x)H = x, x ∈ G/H. Then consider the element of G h(g, x) = γ)(x)−1 gγ(g −1 x), g ∈ G, x ∈ G/H Clearly since h(g, x)H = γ(x)−1 gγ(g −1 x)H = γ(x)−1 gg −1 xH = H it follows that h(g, x) ∈ H, g ∈ G, h ∈ H Then for a mapping ψ : G/H → Y , consider (V (g)ψ)(x) = L(h(g, x))ψ(g −1 x) We observe that (V (g2 )(V (g1 )ψ))(x) = L(h(g2 , x))(V (g1 )ψ)(g2−1 x) = L(h(g2 , x))L(h(g1 , g2−1 x))ψ(g1−1 g2−1 x) = L(h(g2 , x).h(g1 , g2−1 x))ψ((g2 g1 )−1 x) and

h(g2 , x))h(g1 , g2−1 x) = (γ(x)−1 g2 γ(g2−1 x)).(γ(g2−1 x)−1 g1 γ(g1−1 g2−1 x) = γ(x)−1 g2 g1 γ((g2 g1 )−1 x) = h(g2 g1 , x)

Advanced Probability and Statistics: Remarks and Problems

6355

and therefore, V (g2 )V (g1 )ψ(x) = L(g2 g1 , x)ψ((g2 g1 )−1 x) = V (g2 g1 )ψ(x), ie, as operators in C(G/H, Y ), the set {V (g) : g ∈ G}, satisfies V (g2 )V (g1 ) = V (g2 g1 ), g1 , g2 ∈ G Thus, V (.) is a representation of G in C(G/H, Y ). We wish to prove that V is equivalent to the induced representation U = IndG H L. Indeed, consider the map T : C(G/H, Y ) → X defined by (T ψ)(g) = L(h(g −1 , H))ψ(gH), g ∈ G To see that this map is well defined, we must first show that the rhs is indeed an element of X, ie, it satisfies (T ψ)(gh) = L(h)−1 (T ψ)(g), g ∈ G, h ∈ H Indeed, this follows from (T ψ)(gh) = L(h((gh)−1 , H))ψ(gH) and h((gh)−1 , H) = γ(H)−1 (gh)−1 γ(ghH) = (gh)−1 γ(gH) on assuming without loss of generality, γ(H) = e Thus, h((gh)−1 , H) = h−1 h(g −1 , H) and therefore, (T ψ)(gh) = L(h)−1 L(h(g −1 , H)ψ(gH) = L(h)−1 (T ψ)(g) proving thereby that Tψ ∈ X and hence the map T is well defined. We next show that T is a bijection. Indeed, T ψ = 0 implies L(h(g −1 , H))ψ(gH) = 0, g ∈ G which implies that ψ(gH) = 0, g ∈ G ie, ψ=0

56 64CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN Thus, T is injective. Next, suppose φ ∈ X. Then, define ψ ∈ C(G/H, Y ) by ψ(gH) = L(h(g −1 , H))−1 φ(g), g ∈ G To show that ψ is a well defined element of C(G/H, Y ), we must show that L(h((gh)−1 , H))−1 φ(gh) = L(g −1 , H))−1 φ(g), g ∈ G, h ∈ H But, for all h ∈ H, g ∈ G h((gh)−1 , H) = γ(H)−1 h−1 g −1 γ(ghH) = h−1 g −1 γ(gH) = h−1 .h(g −1 , H) Then, L(h((gh)−1 , H))−1 φ(gh) = L(h−1 h(g −1 , H))−1 .L(h)−1 φ(g) = L(h(g −1 , H))−1 φ(g) proving that ψ ∈ C(G/H, Y ) and by construction, T ψ = φ. Thus, T is also surjective. Hence, T is bijective. Finally, we prove that T intertwines the rep­ resentations U and V and this will establish the equivalence of these two repre­ sentations and hence provide an alternate equivalent definition of the induced representation. We have for ψ ∈ C(G/H, Y ), (U (g)T ψ)(g1 ) = (T ψ)(g −1 g1 ) = L(h(g1−1 g, H))ψ(g −1 g1 H) on the one hand, while on the other (T V (g)ψ)(g1 ) = L(h(g1−1 , H))(V (g)ψ)(g1 H) = L(h(g1−1 , H)).L(h(g, g1 H))ψ(g −1 g1 H) = L(h(g1−1 , H).h(g, g1 H))ψ(g −1 g1 H) and since −1 h(g1−1 , H).h(g, g1 H) = γ(H)−1 g1

γ(g1 H).γ(g1 H)−1 gγ(g −1 g1 H)

= g1−1 gγ(g −1 g1 H) = h(g1−1 g, H) it follows that U (g)T = T V (g), g ∈ G ie, T intertwines the represenations U of G in X and V of G in C(G/H, Y ) and since T is a bijection, we can write this intertwining relation as V (g) = T −1 U (g)T, g ∈ G proving the equivalence of U and V .

57 65

Advanced Probability and Statistics: Remarks and Problems

[2] Frobenius character formula for the induced representation. Let G be a finite group and H a subgroup of G. Let L be a unitary rep­ resentation of H and let G = IndG H L be the induced representation. Let L act in theJHilbert space Y . Then Let {phi1 , . . . , φn } be an onb for Y . De­ fine c = o(G)/o(H) and fk (g) = c.L(g)−1 φk , if g ∈ H and fk (g) = 0 if g ∈ G − H. We claim that {f1 , . . . , fn } is an on set in the representation space X of U . Note that this space X is defined as the set of all elements f ∈ C(G, Y ) for which f (gh) = L(h)−1 f (g), g ∈ G, h ∈ H. First we observe that if g ∈ G − H, h ∈ H, then gh ∈ G − H in which case fk (gh) = 0 = fk (g) by definition. Thus in this case, fk (gh) = L(h)−1 fk (g) = −0 holds. Next we observe that if g, h ∈ H, then gh ∈ H in which case, we get by definition, fk (gh) = c.L(gh)−1 φk = L(h)−1 c.L(g)−1 φk = L(h)−1 fk (g). This completes the proof of the claim that fk ∈ X, k = 1, 2, . . . , n. Next we calculate < fk , fm >=

1 L < fk (g), fm (g) > o(G) g∈G

=

c2 L < L(h)−1 φk , L(h)−1 φm > o(G) h∈H

2

= Since

c o(G)

L

< φk , φm >=

h∈H

c=

c2 o(H) δkm = δkm o(G)

J

o(G)/o(H)

Thus, {f1 , . . . , fn } is an onb for X. Let {k1 = e, k2 , , . . . , kr }, be a complete set of representatives of G/H, ie, kj H, j = 1, 2, .., r are all disjoint elements in G/H r and j=1 kj H = G. Then we claim that B = {U (kj )fk , j = 1, 2, . . . , r, k = 1, 2, . . . , n} is an onb for X. First observe that dimX = (o(G)/o(H)).dimY = rn So if we are able to prove that the elements of B defined above are orthonormal, then our claim will be proved. But, L < U (kj )fk , U (kl )fm >= o(G)−1 . < fk (kj−1 g), fm (kl−1 g) >= g∈G

= o(G)−1 .

L

< fk (g), fm (kl−1 kj g >= o(G)−1 .

g∈G

L

< fk (h), fm (kl−1 kj h) >

h∈H

∈ / H, ie, the coset kl−1 kj H is This is clearly zero if l = j since then disjoint from H. On the other hand, when l = j, the above evaluates to L L < fk (h), fm (h) >= o(H)−1 < L(h)−1 φk , L(h)−1 φm > o(G)−1 h∈H

kl −1 kj

h∈H

=< φk , φm >= δk,m

58 Advanced Probability and Statistics: Remarks and Problems 66CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSIN This completes the proof that B is indeed an onb for X. Now let C be a class in G. Let χU denote the character of U and χL the character of L. We define χU (C) to be the character χU of U evaluated at any element of C. Note that these are all equal. Then, we can write C



H=

M

Cl

l=1

Where the Cl� s are disjoint classes in H. In the language of group algebras, we define � C= g g∈C

Then, gC.g −1 = C∀g ∈ G and U (C) =



U (g)

g∈C

And hence, T r(U (C)) = o(C).χU (C) on the one hand, while on the other, � T r(U (C)) = < U (kj )fk , U (C)U (kj )fk > j,k

=



< fk , U (kj−1 )U (C)U (kj )fk >

j,k

=



< fk , U (C)fk >

j,k

= (r/o(G))



< fk (h), fk (g −1 h) >

k,g∈C,h∈H

= (rc2 /o(G))



< L(h)−1 φk , L(h−1 g)φk >

k,g∈C∩H,h∈H

= r.



k,g∈Cl ,l

< φk , L(g)φk >= r.



g∈Cl ,l

χL (g) = r.

M �

o(Cl )χL (Cl )

l=1

Thus, χU (C) = (r/o(C))

M �

o(Cl )χL (cl ).

l=1

Note that r = o(G)/o(H) = c2 and M is the number of H − classes in C ∩ H while Cl , l = 1, 2, . . . , M is the enumeration of these classes.

Advanced Probability and Statistics: Remarks and Problems

67

59

[3] The Frobenius reciprocity theorem. Let χ1 be a character of G and χ ˜2 a ˜2 . Then we character of H. Let χ2 denote the the character of G induced by χ have by the Frobenius reciprocity theorem, � < χ1 , χ2 >= o(G)−1 χ ¯1 (g)χ2 (g) g∈G



= o(G)−1

˜2 (Cl ) χ ¯1 (g)(o(G)/o(H)o(C))o(Cl )χ

g∈C∈C(G),Cl ⊂C

Where Cl ⊂ C means that Cl ranges over all the H­classes in C denotes the set of all the classes in G. We clearly have � χ ¯1 (g) = o(C)χ1 (C)



H and C(G)

g∈C

Thus, the above becomes < χ1 , χ2 >= � o(Cl )χ ¯1 (C)χ ˜2 (Cl )

o(H)−1

Cl ⊂C∈C(G)



= o(H)−1

o(Cl )χ ¯1 (Cl ).χ ˜2 (Cl )

Cl ⊂C∈C(G)

= o(H)−1



¯1 (Cl ).]χ ˜2 (Cl ) o(Cl )χ

Cl ∈C(H)

= o(H)−1 .



χ ¯1 (h)χ ˜2 (h) =< ResG ˜2 > H χ1 , χ

h∈H

ResG H χ1

Where denotes the restriction of the function χ1 on G to H. In par­ ˜2 are respectively irreducible representations of G and ticular, suppose χ1 and χ H. Then, the above formula can be expressed as ˜ 2 ) = mH ( χ ˜2 , ResG mG (χ1 , IndG H χ1 ) Hχ where mG (χ1 , χ) is the multiplicity of the irreducible character χ1 in the expan­ ˜2 , χ ˜) is the multiplicity of the irreducible sion of a character χ of G and mH (χ ˜ of H. This the famous Frobenius character χ ˜2 in the expansion of a character χ reciprocity theorem. [12] With reference to the previous part, in orthogonal curvilinear coordi­ nates, q1 , q2 for the plane, the electromagnetic field within a cavity resonator can be expressed as � Ez (t, q1 , q2 , z) = un (q1 , q2 ).(2/d)1/2 .cos(πpz/d).Re(c(n, p)exp(−jω E (n, p)t)) n,p≥1

Hz (t, q1 , q2 , z) =



n,p≥1

vn (q1 , q2 ).(2/d)1/2 .sin(πpz/d).Re(d(n, p)exp(−jω H (n, p)t))

6068CHAPTER 2. REMARKS Advanced Probability andON Statistics: Remarks SIGNAL and Problems AND PROBLEMS STATISTICAL PROCESSIN E⊥ (t, q1 , q2 , z) = −



n,p



n,p

n,p

1/2 h−2 .sin(pπz/d).Re(c(n, p).exp(−jω E (n, p)t)) n .(−πp/d).�⊥ un (q1 , q2 ).(2/d)

−2 kn

.(�⊥ vn )(q1 , q2 )×zˆ).(2/d)1/2 .sin(pπz/d).Re(jµω H (n, p)d(n, p).exp(−jω H (n, p)t))

H⊥ (t, q1 , q2 , z) = +





n,p

−2 kn

.(πp/d).�⊥ vn (q1 , q2 ).(2/d)1/2 .cos(pπz/d).Re(d(n, p).exp(−jω H (n, p)t))

h−2 ˆ).(2/d)1/2 .cos(pπz/d).Re(j�ω E (n, p)c(n, p).exp(−jω E (n, p)t)) n .(�⊥ un )(q1 , q2 )×z

(see Exercise 2, in step 7). Show that the energy density in this confined elec­ tromagnetic field, assuming that un , vn are normalized, can be expressed in the form � � U = (�/2) |E|2 dS(q1 , q2 )dz + (µ/2) |H|2 dS(q1 , q2 )dz = (1/2)



[(�+�(πp/dhn )2 )c(n, p, t)2

n,p

+(µ/h2n )(˜ c(n, p, t)2 )+(µ+µ(πp/dkn )2 )d(n, p, t)2 +(�/kn2 )d˜(n, p, t)2 ] where c(n, p, t) = Re(c(n, p)exp(−jω E (n, p)t)), c˜(n, p, t) = Re(j�ω E (n, p)c(n, p)exp(−jω E (n, p)t) d(n, p, t) = Re(d(n, p)exp(−jω H (n, p)t)), d˜(n, p, t) = Re(jµω H (n, p)d(n, p)exp(−jω H (n, p)t) Show that U is time independent, ie, the total field energy in the cavity is conserved. hint: Write c(n, p) = cR (n, p) + jcI (n, p) where cR , cI are real. Then, c(n, p, t)2 (� + �(πp/dhn ))c(n, p, t)2 + (µ/h2n )˜ (� + �(πp/dhn )2 )(cR (n, p)cos(ω E (n, p)t) + cI (n, p)sin(ω E (n, p)t))2 +(µ/hn2 )(�ω E (n, p))2 (cR (n, p)sin(ω E (n, p)t) − cI (n, p)cos(ω E (n, p)t))2 Now observe that � + �(πp/dhn )2 = (hn2 + π 2 p2 /d2 )�/hn2 = ω E (n, p)2 �2 µ/h2n and further, (µ/hn2 )(�ω E (n, p))2 = ω E (n, p)2 �2 µ/h2n and hence deduce that c(n, p, t)2 = (� + �(πp/dhn )2 )c(n, p, t)2 + (µ/h2n )˜

Advanced Probability and Statistics: Remarks and Problems

61 69

(ω E (n, p)2 �2 µ/h2n )(cR (n, p)2 + cI (n, p)2 ) Likewise, show that (µ + µ(πp/dkn )2 )d(n, p, t)2 + (�/kn2 )d˜(n, p, t)2 = (ω H (n, p)2 µ2 �/kn2 )(dR (n, p)2 + dI (n, p)2 ) Deduce that the energy of the field is given by � (ω E (n, p)aE (n, p)∗ aE (n, p) + ω H (n, p)aH (n, p)∗ aH (n, p)) U = (1/2) n,p

where

� √ aE (n, p) = ( ω E (n, p)� µ/hn )d(n, p) � √ aH (n, p) = ( ω H (n, p)µ �/kn )c(n, p)

(Recall that c(n, p) = cR (n, p) + jcI (n, p), d(n, p) = dR (n, p) + jdI (n, p)). Explain how you would quantize this electromagnetic field based on the introduction of Bosonic creation and annihilation operators by interpreting the above field energy as the Hamiltonian of an infinite sequence of independent harmonic oscillators. [13] Speech → MRI conversion using artificial neural networks. This problem outlines a procedure for predicting the MRI image data of a patient’s brain from his slurred speech data using a combination of a feed­forward neural network and the extended Kalman filter. We assume that there is a definite relationship between the speech signal of a patient and his dynamically varying MRI image field. It should be noted that the speech data is a low dimensional signal, say of 100 time samples while the MRI is a much higher dimensional signal, again of say 100 time samples but each sample is a vector of a very large size. So we are predicting a very high dimensional data from a lower dimensional data and this enables us to avoid the use of expensive equipment for the purpose. Let s(t) denote the speech signal of the patient and f (t) the MRI image data of the same patient transformed from a matrix image field to a vector via the V ec operation. It is assumed that f (t) = [f1 (t), ..., fL (t)]T can be expressed as some function of s(t) = [s(t), s(t − 1), ..., s(t − L)]T . The neural network is assumed to have K layers with each layer having N nodes. Let W (k, l, m), 1 ≤ k ≤ K, 1 ≤ l, m ≤ N denote the weights connecting the (k − 1)th layer to the k th layer. The input speech signal s(t) is applied at the zeroth layer. Thus, writing the weight matrices as W(k) = ((W (k, l, m)))1≤l,m≤N ∈ RN ×N it follows that the signal vector at the k th layer is given by xk (t) = σ(W(k)xk−1 (t)), k = 1, 2, ..., L

6270CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN where x0 (t) = s(t) and the output vector of the network y(t) = xL (t) is matched to the MRI process f (t) at each time. Here σ is the sigmoidal function that acts on each components of its argument, ie, if we write W(k)xk−1 (t) = z(t) = [z1 (t), ..., zL (t)]T , then σ(W(k)xk−1 (t)) = [σ(z1 (t)), ..., σ(zL (t))]T The measurement data at time t is the neural network output y(t) plus noise/error which we assume to be equal to the true MRI process f (t). To estimate the weights of the neural network, we assume a weight dynamics V ec(W(k)(n + 1)) = V ec(W(k))(n) + �W (k)(n) where �W is a noise/error process. We write the measurement model as f (t) = y(t) + v(t) = = σ(W(K)(t)...σ(W(2)(t)σ(W(1)(t)s(t)))) + v(t) = h(W(t), s(t)) + v(t) where W(t) = [V ec(W(1))(t)T , ..., V ec(W(K)(t)T ]T is the vector of all the weights at time t. The EKF is driven by the output MRI signal f (t) and noting that the weight dynamics can be expressed in the form W(t + 1) = W(t) + �W (t) it follows that the EKF can be cast in the form ˆ + 1|t) = W(t|t),

ˆ W(t ˆ + 1|t + 1) = W(t ˆ + 1|t) + K(t + 1)(f (t + 1) − h(W(t ˆ + 1|t), s(t + 1)) W(t K(t + 1) = P(t + 1|t)H(t + 1)T (H(t + 1)T P(t + 1|t)H(t + 1) + Pv )−1 ˆ + 1|t), s(t + 1)) ∂h(W(t ∂W P(t + 1|t) = P(t|t) + P� ,

H(t + 1) =

P(t+1|t+1) = I−K(t+1)H(t+1))P(t+1|t)(I−K(t+1)H(t+1))T +K(t+1)P� K(t+1)T Derive these equations from first principles and implement this on MATLAB. For MATLAB implementation, you can use a two layered neural network. The

Advanced Probability and Statistics: Remarks and Problems

7163

problem of computing the Jacobian H of h(W, s) involves using back­propagation which is an elementary application of the chain rule of differential calculus, ie, writing xk = σ(W(k)xk−1 ), k = 1, 2, ..., L we have ∂h(W, s)/∂V ec(W(k)) = ∂y/∂V ec(W(k)) = ∂y ∂xL−1 ∂xk+1 ∂xk . ... . ∂xk ∂V ecW(k) ∂xL−1 ∂xL−2 Write down all the terms involved in this expression explicitly for this model. Approximation of multivariate polynomials using a neural network: In or­ der to characterize the performance of a neural network in approximating a given plant function, we require to calculate the mean square approximation error involved in approximating polynomial functions using the network. The sigmoidal functions used in the network must be approximated again by poly­ nomials based on truncating their Taylor series upto a given degree N and then the minimum mean square estimation error evaluated based on such an approx­ imation. Consider first a two layer neural network with each layer having two nodes. The output vector is then y = σ(W2 σ(W1 x)) with x = [x1 , x2 ]T as the input and Wk , k = 1, 2 are 2 × 2 matrices. Therefore in component form, y1 = [σ(W2 (11)z1 + W2 (12)z2 ), σ(W2 (21)z1 + W2 (22)z2 )]T , where z1 = σ(W1 (11)x1 + W1 (12)x2 ), z2 = σ(W1 (21)x1 + W1 (22)x2 ) The aim is to compute the minimum mean square approximation error � minW1 ,W2 ((p1 (x1 , x2 ) − y1 )2 + (p2 (x1 , x2 ) − y2 )2 )dx1 dx2 D

over a domain D ⊂ R2 for a given bivariate polynomials p1 , p2 . For example, we can take D = [a, b] × [c, d]. This problem is the same as minimizing the mean square error between the random variables p(x) and y = y(x) when x is a uniformly distributed random vector on D. More generally, we can talk about minimizing this mean square error when x has any given probability distribution F (x) in R2 . Writing N L σ(z) ≈ c(k)z k k=0

64 72CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN we have that σ(a1 u1 + a2 u2 ) =

N �

c(k)(a1 u1 + a2 u2 )k

k=0



=

0≤m≤k≤N

Thus,

� � k m k−m m k−m a a c(k) u1 u 2 m 1 2

� � k k−m W1 (11)m W1 (12)k−m xm 1 x2 m k,m � � � k z2 = c(k) W1 (21)m W1 (22)k−m x1k xk2 −m m k,m � � � k y1 = c(k) W2 (11)m W2 (12)k−m z1m z2k−m m k,m � � � � � � � k1 km k m k −m c(k1 )...c(km ) ... = c(k) W2 (11) W2 (12) nm m n1 z1 =



c(k)

×W1 (11)n1 +...+nm W1 (12)k1 +...+kn −n1 −...−nn x1n1 +...+nm x2k1 +...+kn −m1 −...−mn For fixed x1 , x2 , this is a polynomial in (W1 (11), W1 (12)) of degree N n. Like­ wise, the formula for y2 is given by � � � k y2 = c(k) W2 (21)m W2 (22)k−m z1m z2k−m m k,m

� � � � � � � k k1 km W2 (21)m W2 (22)k−m c(k1 )...c(km ) ... = c(k) m nm n1

×W1 (11)n1 +...+nm W1 (12)k1 +...+kn −n1 −...−nn xn1 1 +...+nm xk21 +...+kn −m1 −...−mn Choosing D = [0, 1] × [0, 1] without loss of generality (since we can in the case when D = [a, b] × [a, b] replace the sigmoidal function σ by a scaled and trans­ lated version of it thereby reducing the problem to [0, 1]×[0, 1]. To calculate the mean squared approximation error between (y1 (x1 , x2 ), y2 (x1 , x2 )) and a bivari­ ate polynomial pair (p1 (x1 , x2 ), p2 (x1 , x2 )), we need to evaluate the following integrals: � � � � 1

We have



1

0



0

1

0

1

y12 dx1 dx2 ,

1

0

y1 x1r xs2 dx1 dx2 , �

1 0



0



y22 dx1 dx2 ,

0

1

0

1



1

0

y2 x1r xs2 dx1 dx2

1 0

y1 x1r xs2 dx1 dx2 =

Advanced Probability and Statistics: Remarks and Problems

65 73

Acknowledgement for problem [13]: I’ve borrowed this problem from my in­ formal discussions with Prof.Vijayant Agarwal, Vijay Upreti and Gugloth Sagar.

[14] This problem outlines the steps involved in deriving the EKF and UKF in discrete time. step 1: Let the state vector x(t) ∈ RN satisfy the stochastic difference equation x(t + 1) = f (t, x(t)) + g(t, x(t))w(t + 1), t = 0, 1, 2, ... − − − (1) where

f : R+ × RN → RN , g : R+ × RN → RN ×p

and w(t), t = 1, 2, ... white noise with zero mean. Its autocorrelation is given by E(w(t)w(s)T ) = Q(t)δ[t − s] The measurement model is z(t) = h(t, x(t)) + v(t), t = 0, 1, 2, ... − − − (2) where v(.) is also zero mean white and independent of w(.): E(v(t)v(s)T ) = R(t)δ[t − s], E(w(t)v(s)T ) = 0 (1) is called the state/process model and (2) the measurement model. w(.) is called process noise and v(.) called measurement noise. Note that when we say white noise, we mean that its samples are statistically independent, not just uncorrelated. If the noise is Gaussian, then uncorrelatedness implies independence but for non­Gaussian noises, independence is a stronger condition than uncorrelatedness. Let Zt = {z(s) : 0 ≤ s ≤ t} the measurement data collected upto time t. We shall be assuming that x(0) is independent of {w(t), v(t) : t ≥ 1} ˆ (t|t) = E(x(t)|Zt ) and Remark: At time t, we have available with us x ˆ (t|t). The itera­ P(t|t) = Cov(e(t|t)|Zt ) = Cov(x(t)|Zt ) where e(t|t) = x(t) − x ˆ (t+1|t+1) and P(t+1|t+1). This computation tion process involves computing x ˆ (t + 1|t) = E(x(t + 1)|Zt ), e(t + 1|t) = progresses in two stages, first compute x ˆ (t + 1|t) and P(t + 1|t) = Cov(e(t + 1|t)|Zt ) = Cov(x(t + 1)|Zt ) x(t + 1) − x ˆ (t + 1|t + 1) and based on only the state dynamics, and then update these to x P(t + 1|t + 1) based on the measurement z(t + 1). ˆ (t + 1|t). step (2a):Computation of x

6674CHAPTER 2. REMARKS Advanced Probability andON Statistics: Remarks SIGNAL and Problems AND PROBLEMS STATISTICAL PROCESSIN Then from the state model, we have that x(t) is a function of x(0), w(1), ..., w(t− 1) and z(t) is hence a function of x(0), w(1), ..., w(t − 1), v(t). Thus, Zt is a function of x(0), w(1), ..., w(t−1), v(0), ...v(t). It then follows from the assump­ tions made that w(t + 1) is independent of (Zt , x(t)). Hence, taking conditional expectations on both sides of of (1), and using the independence of w(t + 1) and (Zt , x(t)), we get ˆ (t + 1|t) = E(x(t + 1)|Zt ) = x E(f (t, x(t))|Zt ) + E(g(t, x(t))w(t + 1)|Zt ) where E[(g(t, x(t))w(t + 1)|Zt ] = E[E[(g(t, x(t))w(t + 1)|Zt , x(t)]|Zt ] = E[g(t, x(t))E[w(t + 1)|Zt , x(t)]|Zt ] = 0 since E[w(t + 1)|Zt , x(t)] = Ew(t + 1) = 0

Remark: if x, y, z are random vectors, then E(E(x|y, z)|z) = E(x|z) and if x, y are independent, then E(x|y) = E(x) Prove these statements by assuming joint probability densities. Thus, we get ˆ (t + 1|t) = E(f (t, x(t))|Zt ) x Now if f (t, x) is affine linear in x), ie, of the form u(t) + A(t)x, then it is immediate that E((f (t, x(t))|Zt ) = f (t, E(x(t)|Zt )) ˆ (t|t)) = f (t, x In the general case, however, we cannot make this assumption. If f (t, x) is analytic in x, then we can Taylor expand it around x ˆ(t|t) � f (t, x) = u(t) + fx (t, x ˆ(t|t)) + An (t)(x) − x ˆ(t|t))⊗n /n! n≥1

and then get E[f (t, x(t))|Zt ] = u(t) +



An (t)µn (t|t)/n!

n≥2

where u(t) = f (t, x ˆ(t|t))

Advanced Probability and Statistics: Remarks and Problems

75

and µn (t|t) is the conditional nth order estimation error moment of x(t) given Zt , ie, ˆ(t|t))⊗n |Zt ) µn (t) = E((x(t) − x However if we were to implement a filter based on such an approach, it will become an infinite dimensional filter, ie, at each time step, we have to update the conditional moments of all orders. The EKF is a finite dimensional approxi­ mation to such an infinite dimensional filter in which we neglect the conditional moments of all orders greater than two. Thus, we get an approximation ˆ (t|t)) + (1/2)f,xx (t, x ˆ (t|t)V ec(P (t|t)) E[f (t, x(t))|Zt ] ≈ f (t, x We note that µ2 (t|t) = V ec(P (t|t)) Most authors also neglect the second term here and simply make the approxi­ mation ˆ (t|t)) E[f (t, x(t))|Zt ] ≈ f (t, x The UKF on the other hand, gives a better approximation even than that ob­ tained by truncating in the above way upto a given order of moments. It is based on evaluating the conditional expectation E(f (t, x(t))|Zt ) using the law of large numbers. Specifically writing ˆ (t|t) e(t|t) = x(t) − x and defining P (t|t) = Cov(e(t|t)|Zt ) = Cov(x(t)|Zt ) we choose a sequence ξ(m), m = 1, 2, ..., K of iid N (0, IN ) random vectors in­ dependent of Zt and note that by the law of large numbers, (1/K)

K �

ˆ (t|t) + f (t, x

m=1



P(t|t)ξ(m))

conditioned on Zt converges to E(f (t, x(t))|Zt ) as K → ∞ provided that con­ ditioned on Zt , e(t|t) has a normal distribution. Thus, the result of the UKF is K � � ˆ (t + 1|t) = (1/K) ˆ (t|t) + P(t|t)ξ(m)) x f (t, x m=1

Note that this is also an approximation. However, if we assume that e(t|t) con­ ditioned on Zt has a probability distribution Fe,t|t (e) and we choose ζ(m), m = 1, 2, ..., K conditioned on Zt to be iid with this distribution Fe,t|t (e), then by the �K ˆ (t) + ζ(m)) will converge as K → ∞ law of large numbers, (1/K) m=1 f (t, x conditioned on Zt to E(f (t, x(t))|Zt ). This defines the first stage of the EKF and the UKF.

67

68

Advanced Probability and Statistics: Remarks and Problems 76CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSIN step (2b): We now have to compute ˆ (t + 1|t)|Zt ) = Cov(e(t + 1|t)|Zt ) P(t + 1|t) = cov(x(t + 1)|Zt ) = Cov(x(t + 1) − x The EKF computes this using the following: ˆ (t|t)) = e(t + 1|t) = x(t + 1) − f (t, x ˆ (t|t)) f (t, x(t)) + g(t, x(t))w(t + 1) − f (t, x ˆ (t|t))e(t|t) + g(t, x ˆ (t|t)).w(t + 1) ≈ f,x (t, x

and hence ˆ (t|t))P(t|t)f,x (t, x ˆ (t|t))T + g(t, x ˆ (t|t))Q(t).g(t, x ˆ (t|t))T P(t + 1|t) = f,x (t, x ˆ (t|t)) for In the UKF, we are not allowed to make the approximation f (t, x ˆ (t|t) + E(f (t, x(t))|Zt ). Instead we must use the independent realizations f (t, x � ˆ (t|t) + e(t|t)) conditioned on P(t|t)ξ(m)), m = 1, 2, ..., K of f (t, x(t)) = f (t, x Zt . Thus, in the UKF, based on the large numbers, we compute P(t + 1|t) = (1/K)

K �

ˆ (t|t)+ (f (t, x

m=1



� ˆ (t+1|t)).(f (t, x ˆ (t|t)+ P(t|t)ξ(m))−x ˆ (t+1|t))T P(t|t)ξ(m))−x

ˆ (t + 1|t + 1) and P(t + 1|t + 1). step 3: The EKF computation of x ˆ (t + 1|t + 1) = E(x(t + 1)|Zt+1 ) = E(x(t + 1)|Zt , z(t + 1)) x ˆ (t + 1|t) by an In the EKF, we assume that the extra measurement modifies x additive term proportional to the output error at time t + 1, ie, the difference ˆ(t + 1|t) = between the true output measurement z(t + 1) and its estimate z E(h(t + 1, x(t + 1))|Zt ) based on Zt . Thus the EKF gives ˆ (t + 1|t + 1) = x ˆ (t + 1|t) + K(t + 1)(z(t + 1) − h(t, x ˆ (t + 1|t))) x where E(h(t+1, x(t+1))|Zt ) has been approximated by h(t+1, Ex(t+1)|Zt )) = ˆ (t + 1|t)) Again the conditional expectation has been pushed inside h(t + 1, x the nonlinearity. This algorithm for updating the conditional expectation based on the newly arrived measurement is based on the fact that if at time t, the output estimation error is ”positive”, then we increase proportionally the state estimate while if it is negative, we decrease proportionally the state estimate. The ”Kalman gain” K(t + 1) is computed so that ˆ (t + 1) − x ˆ (t + 1|t + 1) �2 |Zt ) = E[� e(t + 1|t + 1) �2 |Zt ] E(� x is a minimum. We note that ˆ (t + 1|t + 1) = e(t + 1|t + 1) = x(t + 1) − x

Advanced Probability and Statistics: Remarks and Problems

69 77

ˆ (t + 1|t) − K(t + 1)(z(t + 1) − h(t, x ˆ (t + 1|t)))

x(t + 1) − x ˆ (t + 1|t)) + v(t + 1))

= e(t + 1|t) − K(t + 1)(h(t + 1, x(t + 1)) − h(t, x ≈ (I − K(t + 1)H(t + 1))e(t + 1|t) + K(t + 1)v(t + 1)

where H(t + 1) =

ˆ (t + 1|t)) ∂h(t + 1, x ∂x

and hence E[� e(t + 1|t + 1) �2 |Zt ] =

= T r[(I − K(t + 1)H(t + 1))P(t + 1|t)(I − K(t + 1)H(t + 1))T +K(t + 1)R(t + 1)K(t + 1)T ]

Minimizing this w.r.t K(t + 1) using the variational calculus for functions of matrices gives us the optimum Kalman gain as K(t + 1) = P(t + 1|t)H(t + 1)T (H(t + 1)P(t + 1|t)H(t + 1)T + R(t + 1))−1 The optimum value of P(t + 1|t + 1) is then obtained by subsituting this value of the optimal Kalman gain into the expression E[e(t + 1|t + 1)e(t + 1|t + 1)T |Zt ] = = [(I − K(t + 1)H(t + 1))P(t + 1|t)(I − K(t + 1)H(t + 1))T +K(t + 1)R(t + 1)K(t + 1)T ]

and assuming that P(t + 1|t + 1) is a function of only Zt and not of Zt+1 = (Zt , z(t + 1)). The above expression then yields P(t + 1|t + 1) = [(I − K(t + 1)H(t + 1))P(t + 1|t)(I − K(t + 1)H(t + 1))T +K(t + 1)R(t + 1)K(t + 1)T ] = (I − K(t + 1)H(t + 1))P(t + 1|t)

= P(t+1|t)−P(t+1|t)H(t+1)T (H(t+1)P(t+1|t)H(t+1)T +R(t+1))−1 H(t+1)P(t+1|t)

Exercise: Rewrite the expression for P(t + 1|t + 1) using the matrix inversion lemma. Remark: Note that the above expression for P(t + 1|t + 1) shows that P(t + 1|t + 1) ≤ P(t + 1|t)

which in particular implies that T r(P(t + 1|t + 1)) ≤ T r(P(t + 1|t))

This means that if we base our state estimate on an extra data point, the estimation error variance reduces. This is natural to expect.

7078CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSIN ˆ (t + 1|t + 1) and P(t + 1|t + 1). step 4: The UKF calculation of x Here, we start with the expression ˆ (t + 1|t + 1) = E[x(t + 1)|Zt , z(t + 1)] x and assume that given Zt , (x(t+1), z(t+1)) is jointly Gaussian. This is justified ˆ (t + 1|t) + e(t + 1|t) and z(t + 1) = z ˆ(t + 1|t) + ez (t + 1|t) which since x(t + 1) = x means that the assumption amounts to saying that given Zt , (e(t + 1|t), ez (t + 1|t)) are jointly Gaussian errors. Next, we use the fact that if U, V are jointly Gaussian vectors, then E(X|Y) = µX + ΣXY Σ−1 Y Y (Y − µY ), T Cov(X|Y) = ΣXX − ΣXY Σ−1 Y Y ΣXY

where ΣXX = Cov(X), ΣY Y = Cov(Y), ΣXY = Cov(X, Y), µX = E(X), µY = E(Y) Based on these assumptions and formulae, we have the approximate formulae ˆ (t + 1|t + 1) = E(x(t + 1)|Zt , z(t + 1)) = x ˆ (t + 1|t) − Cov(x(t + 1), z(t + 1)|Zt ).Cov(z(t + 1)|Zt )−1 (z(t + 1) − z ˆ(t + 1|t)) x where Cov(x(t + 1), z(t + 1)|Zt ) = Cov(x(t + 1), h(t + 1, x(t + 1))|Zt ) ˆ (t + 1|t) + e(t + 1|t))|Zt ) = Cov(e(t + 1|t), h(t + 1, x = (1/K)

K �



m=1

where

ˆ (t+1|t)+ P(t + 1|t)η(m)(h(t+1, x

ˆ + 1|t) = (1/K) z(t

K �



P(t + 1|t)η(m))−ˆ z(t+1|t))T

ˆ (t + 1|t) + h(t + 1, x

m=1

� P(t + 1|t)η(m))

where η(m), m = 1, 2, ..., K are again iid standard normal random vectors. Moreover, Cov(z(t + 1)|Zt ) = Cov(h(t + 1, x(t + 1))|Zt ) + R(t + 1) ˆ (t + 1|t) + e(t + 1|t))|Zt ) + R(t + 1) = Cov(h(t + 1, x = (1/K)

K �

ˆ (t+1|t)+ (h(t+1, x

m=1

ˆ(t+1|t)).(h(t+1, x ˆ (t+1|t)+ −z

� P(t + 1|t)η(m))

� ˆ(t+1|t))T P(t + 1|t)η(m))−z +R(t + 1)

Advanced Probability and Statistics: Remarks and Problems

79 71

and P(t + 1|t + 1) = Cov(x(t + 1)|Zt , z(t + 1)) = Cov(x(t+1)|Zt )−Cov(x(t+1), z(t+1)|Zt ).Cov(z(t+1)|Zt )−1 .Cov(x(t+1), z(t+1)|Zt )T where all the terms except the first on the rhs have been computed above. The first term is Cov(x(t + 1)|Zt ) = P(t + 1|t) This completes the description of the EKF and the UKF. step 5: Performance analysis of the UKF based on the large deviation prin­ ciple. [15] The Belavkin filter and how it improves upon the classical Kushner­ Kallianpur filter. The unitary evolution in system ⊗ bath space is given by the Hudson­Parthasarathy noisy Schrodinger equation dU (t) = [−(iH + P )dt − LdA + L∗ dA∗ ]U (t), P = LL∗ /2

For any system space observable X, we define the system state at time t to be

X(t) = jt (X) = U (t)∗ XU (t)

and by quantum Ito’s formula, obtain djt (X) = jt (θ0 (X))dt + jt (θ1 (X))dA(t) + jt (θ2 (X))dA(t)∗ where θ0 (X) = i[H, X]−XP −P X +LXL∗ = i[H, X]−(1/2)(LL∗ X +XLL∗ −2LXL∗ ) θ1 (X) = [L, X], θ2 (X) = [X, L∗ ] = θ1 (X)∗ The measurement model is Y (t) = U (t)∗ Yi (t)U (t), Yi (t) = c¯A(t) + cA(t)∗ , c ∈ C Quantum Ito’s formula is

dA.dA∗ = dt

We have by Quantum Ito’s formula, dY (t) = dYi (t) + jt (cL + c¯L∗ )dt, jt (Z) = U (t)∗ ZU (t) for any Z defined in h ⊗ Γs (L2 (R+ )), where h is the system Hilbert space and Γs (L2 (R+ )) is the bath Boson Fock space. Note that U (t) is a unitary operator in h ⊗ Γs (L2 (R+ )). In classical filtering theory, the state X(t) evolves according the a classical sde: dX(t) = f (X(t))dt + g(X(t))dB(t)

72

Advanced Probability and Statistics: Remarks and Problems 80CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSIN where B(.) is classical Brownian motion and the measurement model is dY (t) = h(X(t))dt + σV .dV (t) where V (.) Brownian motion independent of B(.). In classical probability there­ fore, the homomorphism jt acts on the commutative algebra of real valued func­ tions φ on Rn where X(t) ∈ Rn and is defined by jt (φ) = φ(X(t)) and it satisfies the sde djt (φ) = φ� (X(t))dX(t) + (1/2)T r(gg T (X(t))phi�� (X(t)))dt = jt (Kφ)dt +

d �

Bm (t)jt (gkm (X(t))Dk φ)

k,m

where

K = f T D + (1/2)T r(gg T .DDT ) is the generator of the Markov process X(t) and D = ∂/∂X is the gradient operator. In the scalar process case, there is just one state variable and one Brownian motion B(.) and then the above simplifies to djt (φ) = jt (Kφ)dt + jt (gDφ)dB(t) where K = f D + (1/2)g 2 D2 , D = d/dX The measurement model in the classical case can be expressed as dY (t) = jt (h)dt + σV dV (t) In the quantum case, the Hermitian operator cL + c¯L∗ plays the role of h and cA ¯ (t) + cA(t)∗ plays the role of the classical measurement noise σV V (t). Note that c¯A(t) + cA(t)∗ by itself is a classical Brownian motion process in any state of the system and bath with a variance of |c|2 in place of σV2 . The quantum analogue of the classical generator K is the quantum Lindblad generator θ0 . Note that the classical observable φ is replaced by the quantum observable X and φ(X(t)) = jt (φ) by jt (X) = U (t)∗ XU (t). θ1 (X), θ2 (X) = θ1 (X)∗ are reduced in the classical theory to gDφ if we take B(t) = A(t) + A(t)∗ . Note that in the quantum scenario, the process noise A(t) + A(t)∗ generally is correlated with the measurement noise cA ¯ (t) + cA(t)∗ . It is uncorrelated only when we choose c to be purely imaginary. Remark: In order to see the classical­quantum analogy better, we must relate the classical theory to classical mechanics, Hamiltonians etc. Thus, we consider the Langevin equation for a classical particle moving in a potential U (q). The stochastic differential equation of motion of such a particle is given by dq(t) = p(t)dt, dp(t) = −(γp(t) + U � (q(t)))dt + g(q(t), p(t))dB(t)

Advanced Probability and Statistics: Remarks and Problems

73 81

The state of the system at time t is now X(t) = [q(t), p(t)]T and the homomorphism jt is jt (φ) = φ(X(t)) Then, by Ito’s formula, djt (φ) = jt (Kφ) + jt (Sφ)dB(t) where

Kφ = p∂φ/∂q − (γp + U � (q))∂φ/∂p + (1/2)g 2 ∂ 2 φ/∂p2

Noting that the Hamiltonian of the particle is H = p2 /2 + U (q) we can write Kφ = {φ, H}P − γp∂φ/∂p + (1/2)g 2 ∂ 2 φ/∂p2 where {., .}P denotes the Poisson bracket. To see what the quantum general­ ization of this is, we choose L = ψ(q, p) where now q, p are Hermitian operators satisfying the commutation relations [q, p] = ih/2π Then,

θ0 (X) = i[H, X] − (1/2)(LL∗ X + XLL∗ − 2LXL∗ ) = i[H, X] − (1/2)(L[L∗ , X] + [X, L]L∗ )

Taking X = φ(q, p) we have i[H, X] = i[p2 /2 + U (q), φ(q, p)] = (i/2)([p, φ]p + p[p, φ]) + i[U (q), φ] = (1/2){∂q φ, p} + i[U, φ] where {., .} denotes the anticommutator. In the classical case where Poisson brackets are replaced by Lie brackets, this expression becomes {φ, H}P = p∂q φ − (∂q U ).(∂p φ) but we see that in the quantum case, there are additional factors in view of the non­commutativity of q and p that are expressible as a power series in Planck’s constant. For example, suppose φ(q, p) = φ0 (q) + {φ1 (q), p}

74 Advanced Probability and RemarksSIGNAL and Problems 82CHAPTER 2. REMARKS AND PROBLEMS ONStatistics: STATISTICAL PROCESSING then we get i[H, X] = (1/2){∂q φ0 (q), p} + (1/2){{∂q φ1 (q), p}, p} In the classical case, these equations simplify as φ(q, p) = φ0 (q) + 2pφ1 (q), and i[H, X] gets replaced by p∂q φ0 (q) + 2p2 ∂q φ1 (q) A further special case of this is when X = φ(q, p) = φ0 (q) Then, i[H, X] = (1/2){∂q φ0 (q), p} while in the classical case, this reduces to p∂q φ0 (q) As for the Lindblad terms, in the quantum case with X = φ(q, p) and L = ψ(q, p), we find that the quantum version of −γp∂φ/∂p + (1/2)g 2 ∂ 2 φ/∂p2 should be simply −(1/2)(LL∗ X + XLL∗ − 2LXL∗ ) = (−1/2)(L[L∗ , X] + [X, L]L∗ ) where X = φ(q, p) and L = ψ(q, p) chosen appropriately in terms of the classical function g(q, p). For example, choosing L = aq + bp, a, b ∈ C, we get

aq + ¯bp, φ(q, p)] = i¯ a∂p φ − i¯b∂q φ [L∗ , X] = [¯ [X, L] = −ia∂p φ + ib∂q φ

a∂p φ − i¯b∂q φ) L[L∗ , X] + [X, L]L∗ = (aq + bp)(i¯ +(−ia∂p φ + ib∂q φ)(¯ aq + ¯bp)

¯(p.∂p φ + ∂q φ.q)

= i|a|2 [q, ∂p φ] + iba −ia¯b(q∂q φ + ∂p φ.p) − i|b|2 [p, ∂q φ]

= −|a|2 ∂p2 φ − |b|2 ∂q2 φ

Advanced Probability and Statistics: Remarks and Problems

75 83

iba ¯(p.∂p φ + ∂q φ.q) − ia¯b(q∂q φ + ∂p φ.p)

(Remark: To get some sort of agreement with the classical case, the term in­ volving ∂q2 should not appear. Thus, we set b = 0, ie, L = aq in which case,

L[L∗ , X] + [X, L]L∗ = −|a|2 ∂p2 φ

but then we are not able to get the damping term γ.p∂p φ). So we see that if we constrain the Lindblad operator L to be linear in q and p, we are able to obtain some sort of a quantum analogue of the classical Langevin equation but with some additional terms. Moreover, by restricting L to be linear in q, p, we cannot in the quantum case account for a general diffusion coefficient g 2 (q, p) dependent on q, p present in the classical case. Upto this, we have dealt with drawing analogies between the classical Fokker­ Planck equation for stochastic differential equations in classical mechanics and probability on the one hand and quantum stochastic differential equations in quantum mehcanics and quantum probability on the other. Now, we try to draw analogies between the classical filtering and quantum filtering equations. First we note that in the quantum case, the measurement noise process Yi (t) = c¯A(t) + cA(t)∗ is also a Brownian motion with a variance parameter |c|2 and the process noise in djt (X) appears as jt ([L, X])dA(t) + jt ([X, L∗ ])dA(t)∗ . This measurement noise is generally correlated with the process noise unless it happens that L∗ = L and c is real, in which case we get that the process noise appears as jt ([L, X])(dA(t) − dA(t)∗ ) with [L, X]∗ = [X, L] = −[L, X] while the measurement noise differential is c(dA(t) + dA(t)∗ ). Another case in which this happens is when c is pure imaginary and L∗ = −L in which case, it happens that the process noise appears as jt ([L, X])(dA(t) + dA(t)∗ ) (since [L, X]∗ = [X, L∗ ] = −[X, L] = [L, X]) while the measurement noise differential is c¯(dA(t)−dA(t)∗ ). These two cases are the only cases in which the process noise and measurement noise are independent Brownian motions as in the classical model for nonlinear filtering. The Belavkin filter is obtained by denoting the non­demolition measurements upto time t by Zt = σ(Y (s) : s ≤ t) and writing

E(jt (X)|Zt ) = πt (X) and noting that πt (X), t ≥ 0 form an Abelian family of operators along with Zt and hence we can assume that they satisfy an equation dπt (X) = Ft (X)dt + Gt (X)dY (t) where Ft (X), Gt (X) are Zt measurable. Then assuming that dC(t) = f (t)C(t)dY (t), C(0) = 1

7684CHAPTER 2. REMARKS Advanced Probability andON Statistics: Remarks SIGNAL and Problems AND PROBLEMS STATISTICAL PROCESSIN we get that C(t) is Zt ­measurable and therefore by the basic orthogonality principle in signal estimation theory, E[(jt (X) − πt (X))C(t)] = 0

[16] Simultaneous application of the representation theory of the permuta­ tion groups and the Euclidean motion group in three dimensions to 3­D image processing problems. Assume that we are given n 3 − D objects whose centres are located at the positions rj , j = 1, 2, ..., n. The k th object whose centre is located at rk will emit a signal fk (r − rk ). Thus, the total signal emitted by all the n objects is given by n L fk (r − rk ) X(r|r1 , ..., rn ) =) = k=1

Now suppose we permute the objects by applying a permutation σ ∈ Sn and also rotate the entire array of objects by the rotation R ∈ SO(3). Then the resulting signal field becomes Y (r|r1 , ..., rk ) =

n L

k=1

fk (R−1 r − rσ−1 k )

= X(R−1 r|rσ−1 1 , ..., rσ−1 n ) From measurements of the signal fields X and Y, I wish to estimate the rotation R and the permutation σ. Let π be a unitary representation of SO(3) and η a unitary representation of Sn . We compute L Y (r|rρ1 , ..., rρn )η(ρ) ρ∈Sn

=

L

X(R−1 r|rσ−1 ρ1 , ..., rσ−1 ρn )η(ρ)∗

ρ∈Sn

=[

L

X(R−1 r|rρ1 , ..., rρn )η(ρ)∗ ]η(σ)∗

J

Y (Sr|r1 , ..., rn )π(S)∗ dS

ρ∈Sn

Also,

=

J

=

J

SO(3)

X(R−1 Sr|rσ−1 1 , ..., rσ−1 n )π(S)∗ dS X(Sr|rσ−1 1 , ..., rσ−1 n )π(RS)∗ dS

Advanced Probability and Statistics: Remarks and Problems �

=[

85

X(Sr|rσ−1 1 , ..., rσ−1 n )π(S)∗ dS]π(R)∗

More generally, we then have � � Y (Sr|rρ1 , ..., rρn )π (S)∗ ⊗ η(ρ)∗ dS ρ∈Sn

=[

� �

ρ∈Sn

X(Sr|rρ1 , ..., rρn )π(S)∗ ⊗ η(ρ)∗ dS](π(R)∗ ⊗ η(σ)∗ )

This equation gives us a clue to how linear estimation theory can be applied to estimate both R ∈ SO(3) and σ ∈ Sn . [17] (Part of a B.E. project) In quantum mechanics, probabilities of events are computed w.r.t a given state of the system. If the system is in a pure state |ψ >, then the probability of an event descrribed by the projection P is given by < ψ|P |ψ >. Many times, calculating the pure state wave function |ψ > becomes very complicated because we have to solve a Schrodinger equation for that. A typical example is that of a two electron atom like Helium where the wave function is a function of two position variables, ie, a function of six real variables. However, there do exist approximate ways of calculating the required wave function. One such is the Hartree­Fock method in which for example in a two electron atom, we know that the wave function must be antisymmetric w.r.t interchange of the two position and spin variables owing to the Pauli exclusion principle which states that two electrons cannot occupy the same state, ie, have the same positions and spins. The Hartree­Fock approximation involves assuming that the wave function is a product of position wave functions and spin wave functions with one of them being symmetric and the other one antisymmetric w.r.t interachange of the two electrons. Further, we assume that the position part of the wave function if antisymmetric can be represented as the antisymmetrizer of the product of single particle position wave functions and if symmetric, can be represented as the symmetrizer of the product of single particle position wave functions with the same argument being valid for the spin wave functions. I outline here most of the steps involved in doing the Hartree­Fock simulation for a two electron atom with interacting spins and angular momenta. The Hamiltonian without taking spin or orbital momentum interactions is given by H01 + H02 + V12 Where H01 = p12 /2m − 2e2 /r1 , H02 = p22 /2m − 2e, 2/r2 V12 = e2 /|r2 − r1 | Where p1 = −i�1 , p2 = −i�2

77

7886CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems AND Probability PROBLEMS STATISTICAL SIGNAL PROCESSIN The magnetic field produced by the two nuclear protons at the site of the first electron in view of their relative motions is given by Bp1 (r1 ) = ev1 × r1 /r13 = c1 L1 /r13 , L1 = r1 × p1 The interaction energy between this magnetic field and the total spin orbital magnetic moment M1 = (−e/2m)(L1 + gσ1 ) Of the first electron is given by Vp1 = −(M1 , Bp1 (r1 )) = c1 (L1 + gσ1 , L1 )/r13 Likewise the magnetic interactions between the proton and the total magnetic moment of the second electron is given by Vp2 = c2 (L2 + gσ2 , L2 )/r23 Now the first electron moving with a velocity of v1 produces at the site of the second electron moving with a relative velocity of v2 − v1 w.r.t the first electron, a magnetic field at the site of the second electron given by Bm21 = −e(v1 −v2 )x(r2 −r1 )/|r2 −r1 |3 = (e/m)(−L1 −L2 −p1 xr2 −p2 xr1 )/|r2 −r1 |3 Since V1 = p1 /m, v2 = p2 /m, L1 = r1 xp1 , L2 = r2 xp2 and further the spin magnetic moment of the first electron produces a magnetic field Bs21 = curl2 ((−geσ1 /2m)x(r2 − r1 )/|r2 − r1 |3 ) Remark: A magnetic moment m at the origin produces a magnetic vector po­ tential A = m × r/r3 And a magnetic field B = curlA = curl(m × r/r3 ) = −curl(m × �(1/r))

= −m�2 (1/r) + (m, �)]�(1/r)

= (m, �)((−r/r3 ) = −mx (∂x (r/r3 )) − my ∂y (r/r3 ) − mz ∂z (r/r3 )

= −mx (ˆ x/r3 − 3rx/r5 ) − my ((hatx/r3 − 3ry/r5 ) − mz (ˆ x/r3 − 3rz/r5 )

= −m/r3 + 3(m, r)r/r5

Thus the interaction energy between the total magnetic moment of the second

electron with the magnetic field generated by the first electron due to its relative motion and spin is given by (Bm21 + Bs21 , (e/2m)(L2 + gσ2 ))

Advanced Probability and Statistics: Remarks and Problems

87 79

= (e/2m)(L2 + gσ2 , (−L1 − L2 − p1 xr2 − p2 xr1 ))/|r2 − r1 |3

+(ge/2m)(L2 + gσ2 , σ1 /|r2 − r1 |3 − 3(σ1 , r2 − r1 )(r2 − r1 )/|r2 − r1 |5 ) Likewise, the interaction energy between the total magnetic moment of the first electron with the magnetic moment of the first electron and the magnetic field generated by the second electron due to its relative motion and spin is given by (Bm12 + Bs12 , (e/2m)(L1 + gσ1 )) (e/2m)(L1 + gσ1 , (−L1 − L2 − p1 xr2 − p2 xr1 ))/|r2 − r1 |3

+(ge/2m)(L1 + gσ1 , σ2 /|r2 − r1 |3 + 3(σ2 , r2 − r1 )(r2 − r1 )/|r2 − r1 |5 ) Remark: There are other ways to calculate the interaction energy between the two electrons involving magnetic fields produced by them and their magnetic moments. They give different results. For example, the magnetic field produced by the first electron based on its motion and spin magnetic moment is given by B1 (r) = −ev1 × (r − r1 )/|r − r1 |3 + curl(m1 × (r − r1 ))/|r − r1 |3 ) = −ev1 × (r − r1 )/|r − r1 |3 − m1 /|r − r1 |3 + 3(m1 , r − r1 )r − r1 /|r − r1 |5 Where m1 = −geσ1 /2m Is the spin magnetic moment of the first electron. Likewise, the magnetic field produced by the second electron is B2 (r) = −ev2 × (r − r2 )/|r − r2 |3 + curl(m2 × (r − r2 ))/|r − r2 |3 ) = −ev2 × (r − r2 )/|r − r2 |3 − m2 /|r − r2 |3 + 3(m2 , r − r2 )r − r2 /|r − r2 |5 The total energy in the magnetic field produced by the two electrons is � EB = (µ/2) |B1 (r) + B2 (r)|2 d3 r

And the interaction part of this energy is clearly � (µ/2) (B1 (r), B2 (r))d3 r

From the above consideration, it is clear that the magnetic interaction energy must be a scalar operator built out of the vector operators p1 , p2 , r2 − r1 , σ1 , σ2 ) . It is then easy to see that this interaction energy must have the form f1 (|r2 −r1 |)(p1 , p2 )+f2 (|r2 −r1 |)(p1 , σ2 )+f2 (|r2 −r1 |)(p2 , σ1 )+f3 (|r2 −r1 |)((r2 −r1 )

×p1 , σ2 )+f3 (|r2 −r1 |)((r1 −r2 )×p2 , σ1 )+f4 (|r2 −r1 |)(σ1 , σ2 )+f5 (|r2 −r1 |)(σ1 ×σ2 , r2 −r1 )

In order to formulate the Hartree­Fock equations taking spin into account, we must therefore First discretize the spatial region into N 3 pixels, N being the number of pixels along each of the xyz coordinate axes. Then p1x , p1y , p1z , x1 , y1 , z1 are each represented by N 3 × N 3 matrices, ie, these act in the Hilbert space

80

Advanced Probability and Statistics: Remarks and Problems 88CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSIN 3

CN ×1 . Likewise For p2x , p2y , p2z , x2 , y2 , z2 . These act in another independent 3 Hilbert space CN Each of the spin matrices σ1x , σ1y , σ1z is represented by a 2×2 Hermitian matrix, ie, these act in the Hilbert space C2 . Likewise, σ2x , σ2y , σ2z act in another independent Hilbert space C2 . In this The total of all the above operators therefore acts in the tensor product Hilbert space 3

3

H = CN ⊗ CN ⊗ C2 ⊗ C2 = C4N

6

The Hamiltonian of the system thus has the following decomposition: H = H1 ⊗ I3N 3 ⊗ I2 ⊗ I2 + I3N 3 ⊗ H2 ⊗ I2 ⊗ I2 +V Where V acts in the joint tensor product space and can be decomposed as p � (V1k ⊗ V2k ⊗ V3k ⊗ V4k ) V = k=1

Where V2k is a function of only the position and momentum variables of the first 3 electron, and thus acts in C3N , namely the first tensor product component in H. V2k is a function of only the position and momentum variables of the second 3 electron and thus acts in the second tensor product component C3N of H. V3k is a function of only the spin matrices of the first electron and therefore acts in the third tensor product component C2 and finally, V4k is a function of only the spin matrices of the second electron and therefore acts in the fourth tensor product component C2 . Accordingly, in the Hartree Fock approximation, we can assume that the state ψ of the two electrons has one of the following forms: [a] C(ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 )| + + > [b] C(ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 )| − − > [c]

√ C(ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 )(| + − > +| − + >)/ 2

[d]

√ C(ψ1 ⊗ ψ2 + ψ2 ⊗ ψ1 )(| + − > −| − + >)/ 2

Where ψ1 is a vector in the first tensor product component, ψ2 in the second, and objects such as |++ >, |−− >, |+− >, |−+ > have have their first component in the third tensor product components and their second component in the fourth tensor product component. Substituting for example the first expression [a] for ψ gives us on noting that H1 and H2 are identical matrices so are V1k and V2k and so also are V3k and V4k for each k the following expression for the average of H in the state |ψ >: [a] < ψ|H|ψ >= 2C 2 [< ψ1 |H1 |ψ1 > + < ψ2 |H1 |ψ2 > − < ψ1 |H1 |ψ2 >< ψ2 |ψ1 > − < ψ2 |H1 |ψ1 >< ψ1 |ψ2 >]

Advanced Probability and Statistics: Remarks and Problems

+C 2

p �

k=1

89

81

< ψ1 ⊗ψ2 −ψ2 ⊗ψ1 |V1k ⊗V2k |ψ1 ⊗ψ2 −ψ2 ⊗ψ1 >< ++|V3k ⊗V4k |++ >

= 2C 2 [< ψ1 |H1 |ψ1 > + < ψ2 |H1 |ψ2 > − < ψ1 |H1 |ψ2 >< ψ2 |ψ1 > − < ψ2 |H1 |ψ1 >< ψ1 |ψ2 >] +2C 2

p �

k=1

[(< ψ1 |V1k |ψ1 >< ψ2 |V1k |ψ2 > − < ψ1 |V1k |ψ2 >< ψ2 |V1k |ψ1 >)| < +|V3k |+ > |2 ]

We are putting the normalization constraints: < ψ1 |ψ1 >= 1 =< ψ2 |ψ2 >, 2C 2 (1 − | < ψ1 |ψ2 > |2 ) = 1 We must now apply the variational principle to extremize < ψ|H|ψ > w.r.t |ψ1 > and |ψ2 > subject to the above constraints. Likewise, for the second case [b] < ψ|H|ψ >= 2C 2 [< ψ1 |H1 |ψ1 > + < ψ2 |H1 |ψ2 > − < ψ1 |H1 |ψ2 >< ψ2 |ψ1 > − < ψ2 |H1 |ψ1 >< ψ1 |ψ2 >] +2C 2

p �

k=1

[(< ψ1 |V1k |ψ1 >< ψ2 |V1k |ψ2 > − < ψ1 |V1k |ψ2 >< ψ2 |V1k |ψ1 >)| < −|V3k |− > |2 ]

For the third case, [c] < ψ|H|ψ >= 2C 2 [< ψ1 |H1 |ψ1 > + < ψ2 |H1 |ψ2 > − < ψ1 |H1 |ψ2 >< ψ2 |ψ1 > − < ψ2 |H1 |ψ1 >< ψ1 |ψ2 >] +2C

2

p �

[(< ψ1 |V1k |ψ1 >< ψ2 |V1k |ψ2 > − < ψ1 |V1k |ψ2 >< ψ2 |V1k |ψ1 >

k=1

×(1/2)(< + − |+ < − + |)(V3k ⊗ V4k )(| + − > +| − + >)

Now observe that (< + − |+ < − + |)(V3k ⊗ V4k )(| + − > +| − + >) = < + − |V3k ⊗ V4k | + − > + < − + |V3k ⊗ V4k | − + > + < + − |V3k ⊗ V4k | − + > + < − + |V3k ⊗ V4k | + − > and finally for the fourth case, [d] < ψ|H|ψ >= 2C 2 [< ψ1 |H1 |ψ1 > + < ψ2 |H1 |ψ2 > + < ψ1 |H1 |ψ2 >< ψ2 |ψ1 > + < ψ2 |H1 |ψ1 >< ψ1 |ψ2 >]+

82

Advanced Probability and Statistics: Remarks and Problems 90CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSIN 2C 2

� k

[< ψ1 |V1k |ψ1 >< ψ2 |V1k |ψ2 > + < ψ1 |V1k |ψ2 >< ψ2 |V1k |ψ1 >]

×(1/2)(< + − |− < − + |)(V3k ⊗ V4k )(| + − > −| − + >)

We work out the details for case [a]. Assume that the overlap constant C 2 is fixed and then after taking into account normalization constraints using Lagrange multipliers, the functional to be extremized is S =< ψ|H|ψ > −E1 (< ψ1 |ψ1 > −1)−E2 (< ψ2 |ψ2 > −1) −E12 (1−1/2C 2 −| < ψ1 |ψ2 > |2 )

Setting the variational derivative δS/δψ1∗ = 0 here, gives us

+2C 2



2C 2 [H1 |ψ1 > − < ψ2 |ψ1 > H1 |ψ2 > − < ψ2 |H1 |ψ1 > |ψ2 >]

k

[| < +|V3k |+ > |2 (< ψ2 |V1k |ψ2 > V1k |ψ1 > − < ψ2 |V1k |ψ1 > V1k |ψ2 >) −E1 |ψ1 > +E12 < ψ2 |ψ1 > |ψ2 >= 0

Setting the variational derivative δS/δψ2∗ = 0 gives us

+2C 2



2C 2 [H1 |ψ2 > − < ψ1 |H1 |ψ2 > |ψ2 > − < ψ1 |ψ2 > H1 |ψ1 >]

k

[| < +|V1k |+ > |2 (< ψ1 |V1k |ψ1 > V1k |ψ2 > − < ψ1 |V1k |ψ2 > V1k |ψ1 >) −E2 |ψ2 > +E12 < ψ1 |ψ2 > |ψ1 >= 0

If we put the restriction of no overlap, ie < ψ2 |ψ1 >= 0

then we get C 2 = 1/2 and the above equations simplify to [H1 |ψ1 > − < ψ2 |H1 |ψ1 > |ψ2 >] � + [| < +|V3k |+ > |2 (< ψ2 |V1k |ψ2 > V1k |ψ1 > k

− < ψ2 |V1k |ψ1 > V1k |ψ2 >)−E1 |ψ1 >= 0−−−(a)

[H1 |ψ2 > − < ψ1 |H1 |ψ2 > |ψ2 >] � + [| < +|V1k |+ > |2 (< ψ1 |V1k |ψ1 > V1k |ψ2 > k

− < ψ1 |V1k |ψ2 > V1k |ψ1 >)−E2 |ψ2 >= 0−−−(b)

Advanced Probability and Statistics: Remarks and Problems

83 91

Note that (a) and (b) are consistent with the no overlap requirement < ψ2 |ψ1 >= 0 as follows by premultiplying (a) by < ψ2 | and (b) by < ψ1 | and obtaining an identity. It is clear that in the time dependent case, we have to replace the energy values E1 and E2 by i∂/∂t. This is a consequence of the fact that under the no overlap condition, < ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 |∂t |ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 >= < ψ1 |∂t |ψ1 > + < ψ2 |∂t |ψ2 >

since we are assuming that |ψk >, k = 1, 2 are normalized. Remarks: |ψ > is the overall wave function of the positions and spins of the two electrons. The forms of |ψ > assumed in [a], [b], [c], [d] ensure that it is antisymmetric w.r.t the interchange of the positions and spins of the two electrons. For example, in [a], [b] and [c], the wave function is antisymmetric w.r.t. the interchange of the positions of the two electrons and symmetric w.r.t the interchange of the spin of two two electrons. Hence, since the product of a minus and a plus is a minus, the overall wave function is antisymmetric w.r.t the interchange of both the positions and spins of the two electrons. In [d], the wave functions is symmetric w.r.t the interchange of the positions of the two electrons while it is antisymmetric w.r.t the interchange of the two electron spins so once again the overall wave functions is antisymmetric. We could represent such wave functions alternately as ψ(r1 , s1 , r2 , s2 ) where rk , and sk are respectively the position and z­spin component of the k th electron k = 1, 2. Thus in the case [a], ψ(r1 , 1/2, r2 , 1/2) = C(ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 )(r1 , r2 ) = C(ψ1 (r1 )ψ2 (r2 ) − ψ1 (r2 )ψ2 (r1 ))

and ψ(r1 , s1 , r2 , s2 ) is zero for the choices (1/2, −1/2, ), (−1/2, 1/2), (−1/2, −1/2) for (s1 , s2 ). We can treat this wave function as a four component wave func­ tion of the position variables. Likewise, for the wave function of [d], it equals √ C(ψ1 (r1 )ψ2 (r2 ) + ψ1 (r2 )ψ2 (r1 ))/ 2 for the choice (s1 , s2 ) = (1/2, −1/2), the negative of the same for (s1 , s2 ) = (−1/2, 1/2) and zero for the remaining two choices of (s1 , s2 ). The antisymmetry of these wave functions are therefore equivalently expressed as ψ(r1 , s1 , r2 , s2 ) = −ψ(r2 , s2 , r1 , s1 ), r1 , r2 ∈ R3 , s1 , s2 = ±1/2 Now we come � to the point of how to express the spin­orbit interaction terms in the form k V1k ⊗ V2k ⊗ V3k ⊗ V4k . The general form of this interaction terms, as discussed above is of the form (f1 (r1 , r2 , p1 , p2 , σ1 ) +(f1 (r1 , r2 , p1 , p2 , σ2 ) where (f1 (r1 , r2 , p1 , p2 , σ1 )

84 Advanced Probability and Statistics: Remarks and Problems 92CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSING = f1x (r1 , r2 , p1 , p2 )σ1x + f1y (r1 , r2 , p1 , p2 )σ1y + f1z (r1 , r2 , p1 , p2 )σ1z and likewise for the second term. The x, y, z components of the vector operators r1 , p1 act in the Hilbert space H1 = L2 (R3 ) while the components of r2 , p2 act in another independent and identical Hilbert space H2 = L2 (R3 ). The x,y,z components of σ1 act in another independent Hilbert space C2 and finally, the x, y, z components of σ2 act in yet another independent Hilbert space H4 = C2 . All the components of the Hamiltonian then act in the tensor product Hilbert space H = H1 ⊗ H 2 ⊗ H 3 ⊗ H 4 After discretization, V1k , V2k become N 3 × N 3 matrices while V3k , V4k are 2 × 2 6 3 3 matrices. H becomes the finite dimensional Hilbert space C4N = CN ⊗ CN ⊗ C2 ⊗ C2 . For example, consider an interaction term of the form

f1 (x1 , y1 , z1 )f2 (x2 , y2 , z2 )p1x σ1x = −if1 (x1 , y1 , z1 )f2 (x2 , y2 , z2 )σ1x ∂/∂x1 � We require to represent it in the form k V1k ⊗ V2k ⊗ V3k ⊗ V4k . Let e(k) denote the N × 1 column vector with a one in the k th position and zeros at all the other positions where k = 1, 2, ..., N . Likewise, let f (k) denote the 2 × 1 vector with a one in the k th position and zero at the other position k = 1, 2. Then −i∂/∂x1 is represented by an N 3 × N 3 matrix that takes the vector N � 3 φ= φ(n1x , n1y , n1z )e(n1x ) ⊗ e(n1y ) ⊗ e(n1z ) ∈ H1 = CN n1x ,n1y ,n1z =1

to the vector � D p1 x φ =

Δ−1 (φ(n1x +1, n1y , n1z )−φ(n1x , n1y , n1z ))e(n1x )⊗e(n1y )⊗e(n1z )

n1x ,n1y ,n1z

=



n1x ,n1y ,n1z

[e(n1x ) ⊗ e(n1y ) ⊗ e(n1z )].[(e(n1x + 1) − e(n1x ) ⊗ e(n1y ) ⊗ e(n1z )]T φ

because, we obviously have [e(n1x ) ⊗ e(n1y ) ⊗ e(n1z )]T φ = φ(n1x , n1y , n1z ) In other words, D1x is the N 3 × N 3 matrix given by Dp1 x = Δ−1



[e(n1x )⊗e(n1y )⊗e(n1z )].[(e(n1x +1)−e(n1x ))⊗e(n1y )⊗e(n1z )]T

n1x ,n1y ,n1z

Multiplication by f1 (x1 , y1 , z1 ) is represented by the N 3 × N 3 diagonal matrix D f1 =

N �

n1x ,n1y ,n1z =1

f1 (n1x Δ, n1y Δ, n1z Δ)[e(n1x )⊗e(n1y )⊗e(n1z )].[e(n1x )⊗e(n1y )⊗e(n1z )]T

Advanced Probability and Statistics: Remarks and Problems

9385

Likewise f2 (x2 , y2 , z2 ) is represented by another N 3 × N 3 diagonal matrix Df2 . Hence, f1 (x1 , y1 , z1 )f2 (x2 , y2 , z2 )p1x σ1x = f1 (x1 , y1 , z1 )p1x f2 (x2 , y2 , z2 )σ1x will then be represented by the 4N 3 × 4N 3 matrix (Df1 Dp1 ) ⊗ Df2 ⊗ σ1x ⊗ I2 which is of the desired form V1k ⊗ V2k ⊗ V3k ⊗ V4k . It should be noted that there are terms in the above interaction that cannot be expressed in such a completely factorized form, for example a term involving 1/|r1 − r2 |. However such terms generally have the form f (x1 , y1 , z1 , x2 , y2 , z2 )p1x σ1x − − − (α) or f (x1 , y1 , z1 , x2 , y2 , z2 )p1y σ1x or f (x1 , y1 , z1 , x2 , y2 , z2 )p1x σ1y etc. and these can be expressed in a restricted factorized form V12k ⊗ V3k ⊗ V4k where V12k is N 6 × N 6 and acts in H1 ⊗ H2 while V3k , V4k are both 2 × 2 and each act in C2 . In the above case (α), V3k =

N �

n1x ,n1y ,n1z ,n2x ,n2y ,n2z =1

f (n1x Δ, ..., n2z Δ)[e(n1x )⊗e(n1y )⊗e(n1z )⊗e(n2x ⊗e(n2y )⊗e(n2z )][�� ]T

where [�� ] denotes [e(n1x ) ⊗ e(n1y ) ⊗ e(n1z ) ⊗ e(n2x ⊗ e(n2y ) ⊗ e(n2z )]. When such a term is present in the Hamiltonian, it contributes an amount < ψ|V12 ⊗ V3 ⊗ V4 |ψ > to < ψ|H|ψ > and for example in case (a), this evaluates to C 2 < (ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 )|V12 |ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 >< +|V3 |+ >< +|V4 |+ > Now, < (ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 )|V12 |ψ1 ⊗ ψ2 − ψ2 ⊗ ψ1 >

=< ψ1 ⊗ψ2 |V12 |ψ1 ⊗ψ2 > + < ψ2 ⊗ψ1 |V12 |ψ2 ⊗ψ1 > − < ψ1 ⊗ψ2 |V12 |ψ2 ⊗ψ1 >

− < ψ2 ⊗ ψ1 |V12 |ψ1 ⊗ ψ2 >

The variational derivative of this w.r.t ψ1∗ evaluates to (IN 3 ⊗ ψ2∗ )V12 |ψ1 ⊗ ψ2 > +(ψ2∗ ⊗ IN 3 )V12 |ψ2 ⊗ ψ1 >

86

Advanced Probability and Statistics: Remarks and Problems 94CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSIN −(IN 3 ⊗ ψ2∗ )V12 |ψ2 ⊗ ψ1 > −(ψ2∗ ⊗ IN 3 )V12 |ψ1 ⊗ ψ2 >

multiplied by the factor < +|V3 |+ >< +|V4 |+ >. Note that ψ2∗ means conjugate transpose of ψ2 which is therefore a 1×N 3 row vector. IN 3 ⊗ψ2∗ and ψ2∗ ⊗IN 3 are thus N 3 × N 6 matrices and since V12 is a N 6 × N 6 matrix and |ψ1 ⊗ ψ2 > is an N 6 ×1 vector, the quantities (IN 3 ⊗ψ2∗ )V12 |ψ1 ⊗ψ2 > and (ψ2∗ ⊗IN 3 )V12 |ψ2 ⊗ψ1 > are N 3 × 1 column vectors. Now suppose we have an equation of the form idψ1 /dt = F1 (ψ1 , ψ2 ), idψ2 /dt = F2 (ψ1 , ψ2 ) as in the above derived Hartree­Fock equations. Here Fk (ψ1 , ψ2 ), k = 1, 2 are 3 3 N 3 × 1 vector valued functions of ψ1 , ψ2 ∈ CN ×1 and ψ1∗ , ψ2∗ ∈ C1×N . These are solved in MATLAB using a for loop implementation: Represent ψ1 , ψ2 as N 3 × K matrices where ψ1 (:, t), ψ2 (:, t) are the vectors ψ1 , ψ2 at time t. K is the total number of time samples. Then the for loop iteration will read for t = 1 : K

ψ1 (:, t + 1) = ψ1 (:, t) − i ∗ δ ∗ F1 (ψ1 (:, t), ψ2 (:, t));

ψ2 (:, t + 1) = ψ2 (:, t) − i ∗ δ ∗ F2 (ψ1 (:, t), ψ2 (:, t));

ψ1 (: t + 1) = ψ1 (:, t + 1)/norm(ψ1 (:, t + 1));

ψ2 (:, t + 1) = ψ2 (:, t + 1)/norm(ψ2 (:, t + 1));

[18] Computing the perturbation in the singular values and vectors under a small perturbation of a matrix with application to the MUSIC and ESPRIT algorithms. A ∈ CM ×N Has an SVD

A = U DV ∗

Let rank(A) = r. Then since A∗ A = V DDT V ∗ With DDT diagonal square matrix having exactly r nonzero (positive) elements, say λk = σk2 , k = 1, 2, . . . , r, we can write A∗ Avk = λk vk , k = 1, 2, . . . , r, Avk = 0, k = r + 1, . . . , N Also, AV = U D Gives for the columns u1 , . . . , uM of U , uk = Avk /σk , k = 1, 2, . . . , r Suppose now that A gets perturbed to A + δA. Then, B = A∗ A will get perturbed to B + δB, where δB = A∗ .δA + δA∗ .A + δA∗ .δA

Advanced Probability and Statistics: Remarks and Problems

87 95

= δ1 B + δ 2 B Where

δ1 B = A∗ δA + δA∗ A

is the first order perturbation in B and δ2 B = δA∗ .δA Is the second order perturbation in B. Now using second order perturbation theory, we write for the perturbation in vk , Given by δ1 vk + δ2 vk where δ1 vk and δ2 vk are the first and second order perturbation terms, and also for the corresponding perturbations δ1 λk + δ2 λk in the singular values, (B + δ1 B + δ2 B)(vk + δ1 vk + δ2 vk ) = (λk + δ1 λk + δ2 λk )(vk + δ1 vk + δ2 vk ) Equating terms of the same order on both sides gives Bvk = λk vk , Bδ1 vk + δ1 Bvk = λk δ1 vk + δ1 λk .vk , Bδ2 vk + δ1 λk δ1 vk + δ2 λk vk = λk δ2 vk

Perturbation theoretic analysis of the SVD based ESPRIT algo­ rithm Step 1: The noiseless array signal model is X0 = AS, X1 = AΦS and the noisy array signal model is Y0 = X0 + W0 , Y1 = X1 + W1 Here, A is an n × p matrix of full column rank p while S is a p × m matrix of full row rank p. Φ is a p × p diagonal unitary matrix. We also assume that n ≤ m and then Y0 , Y1 are n × m matrices of full row rank n (This assumption means that the number of time samples is much larger than the number of sensors). We write the SVD’s of X0 , Y0 , X1 , Y1 as X0 = U0 D0 V0∗ , X1 = U1 D1 V1∗ , ˜0 D ˜1 D ˜ 0 V˜ ∗ , Y1 = U ˜ 1 V˜ ∗ Y0 = U 1 0 where U0 , U1 are n×p matrices each with orthonormal columns, V0 , V1 are m×p matrices each with orthonormal columns, D0 , D1 are p × p diagonal matrices ˜1 are n×n unitary matrices, V˜0 , V˜1 are m×n ˜0 , U with positive diagonal entries, U ˜ 1 are n×n diagonal matrices ˜ 0, D matrices each with orthonormal columns and D ˜ 0 and D ˜ 1 are with positive diagonal entries. The first p diagonal entries of D

8896CHAPTER 2. REMARKS Advanced andON Statistics: Remarks and Problems AND Probability PROBLEMS STATISTICAL SIGNAL PROCESSIN small perturbations of the diagonal entries of D0 and D1 respectively. The last ˜ 1 are small positive perturbations of zero. ˜ 0 and D n − p diagonal entries of D We can thus write ˜ 0 = diag[D0 + δD0 , δD01 ] D ˜ 1 = diag[D1 + δD1 , δD11 ] D ˜0 = [U0 + δU0 |U01 + δU01 ], U V˜0 = [V0 + δV0 |V01 ]

V˜1 = [V1 + δV1 |V11 ],

˜1 = [U1 + δU1 |U11 + δU11 ] U where δD0 , δD01 , δD1 , δD11 , δV0 , V01 , δV1 , V11 are computed by standard per­ turbation theory for Hermitian matrices with the Hermitian matrices Y0∗ Y0 and Y1∗ Y1 being regarded as small perturbations of the Hermitian matrices X0∗ X0 and X1∗ X1 respectively. Note that Xk∗ Xk , k = 0, 1 are m × m positive semidef­ inite matrices of rank p while Yk∗ Yk , k = 0, 1 are m × m positive semidefinite matrices of rank n. Note that δV0 and δD0 and likewise, δV1 and δD1 are com­ puted using standard non­degenerate perturbation theory for Hermitian matri­ ces while V01 , δD01 , V11 and δD11 are computed using degenerate perturbation theory for Hermitian matrices based on the secular matrix theory as described in Dirac’s book, ”The principles of quantum mechanics”. Then, U0 D0 = X0 V0 , U0 = X0 V0 D0−1 , U1 D1 = X1 V1 , U1 = X1 V1 D1−1 Thus, since δX0 = Y0 − X0 = W0 , δX1 = Y1 − X1 = W1 , we get δU0 = W0 V0 D0−1 + X0 δV0 D0−1 − X0 V0 D0−2 δD0 δU1 = W1 V1 D1−1 + X1 δV1 .D1−1 − X1 V1 D1−2 .δD1 using first order perturbation theory. The diagonal entries of Φ are derived as the rank reducing numbers (rrn) of the matrix pencil P1 (γ) = X1 − γ.X0 = A(Φ − γ.Ip )S Now,

X1 − γX0 = U1 D1 V1∗ − γ.U0 D0 V0∗ = U1 (D1 − γQ1 D0 Q∗2 )V1∗

and since U1 and V1 have full column rank p, the rrn’s of the matrix pencil P1 (γ) can alternatively be obtained as the rrn’s of the pencil D1 − γQ1 D0 Q∗2 where

Q1 = U1∗ U0 , Q2 = V1∗ V0

Advanced Probability and Statistics: Remarks and Problems

89 97

are p × p matrices. These rrn’s are therefore equivalently the solutions of the pth ­degree polynomial equation det(D1 − γQ1 D0 Q∗2 ) = 0 Now let γk be one of these rrn’s, ie, it is one of the diagonal entries of Φ. Then there exist exactly two vectors ξk and ηk upto a constant of proportionality such that ξk∗ (D1 − γk Q1 D0 Q∗2 ) = 0, (D1 − γk Q1 D0 Q∗2 )ηk = 0 These equations can be expressed as ξk∗ U1∗ (U1 D1 V1∗ − γk U0 D0 V0∗ )V1 = 0, U1∗ (U1 D1 V1∗ − γk U0 D0 V0∗ )V1 ηk = 0 since

U1∗ U1 = V1∗ V1 = Ip

or equivalently, x∗k A(Φ − γk .Ip )SV1 = 0, U1∗ A(Φ − γk Ip )Syk = 0 − − − (1) where xk = U1 ξk , yk = V1 ηk Now since

R(A) = R(U0 ) = R(U1 ), R(S ∗ ) = R(V0 ) = R(V1 )

and the second implies while the first implies

N (S) = R(V1 )⊥ R(U1 )⊥ = N (A∗ ),

it follows that (1) implies and is in fact equivalent to x∗k A(Φ − γk Ip )S = 0 − − − (2a) and A(Φ − γk Ip )Syk = 0 − − − (2b) or equivalently, since S has full row rank and A has full column rank, x∗k A(Φ − γk Ip ) = 0, (Φ − γk Ip )Syk = 0 Moreover, xk ∈ R(U1 ) = R(A) = N (A∗ )⊥ , yk ∈ R(V1 ) = R(S ∗ ) = N (S)⊥ it follows that

x∗k A = 0, Syk = 0

90 Advanced Probability and Statistics: Remarks and Problems 98CHAPTER 2. REMARKS AND PROBLEMS ON STATISTICAL SIGNAL PROCESSIN and hence, (2) when combined with the fact that only the k th diagonal entry of Φ − γk Ip vanishes that x∗k ak �= 0, x∗k aj = 0, j �= k, sTk yk �= 0, sjT yk = 0, j �= k where sTk , k = 1, 2, ..., p are the rows of S. Thus, xk∗ X0 = ξk∗ AS = xk∗ ak skT , X0 yk = ASyk = ak sTk yk Note that ak , k = 1, 2, ..., p are the columns of A while sTk , k = 1, 2, ..., p are the rows of S. This gives us the identity that skT is proportional to xk∗ X0 and ak is proportional to X0 yk . Note further that the above identities imply (x∗k ak )(skT yk ) = xk∗ X0 yk Now when noise perturbs the data matrices, we have to consider the matrix pencil P˜ (γ) = Y1 − γ.Y0 in place of its unperturbed version P (γ). This is a matrix of size n × m and will generally have rank n but when γ assumes one of its rrn values, the rank of this matrix pencil drops to n − 1. This is in contrast to the noiseless case where where the rank of P (γ) = X1 − γ.X0 (which is again a matrix of size n × m) drops from p to p − 1 when γ assumes one of the rrn values. Now, ˜0 D ˜1 D ˜ 1 V˜ ∗ − γ.U ˜ 0 V˜ ∗ P˜ (γ) = U 1 0 ˜1 (D ˜ 1 − γ.Q ˜1D ˜ ∗ )V˜ ∗ ˜ 0 .Q =U 2 1 where

˜ ∗U ˜ ˜ ˜∗ ˜ ˜1 = U Q 1 0 , Q 2 = V 1 V0

˜ 0 is an n × n diagonal matrix such that its first p diagonal entries are large, D these being the corresponding perturbations of those of D0 while its last n − p diagonal entries are small, these being perturbations of the zero singular values ˜1 and V˜1 have full column ranks n, it follows that of X0 . We note that since U ˜ the rrn’s of P (γ) are same as those of the n × n matrix ˜ 1 .D ˜ 1 − γ.Q ˜ 0 .Q ˜∗ D 2 Note that

˜1 = Y1 V˜1 .D ˜ −1 ˜0 = Y0 V˜0 .D ˜ −1 , U U 0 1

˜1, Q ˜ 2 are n × n non­singular matrices. These rrn’s are obtained Note also that Q by solving ˜ 1 − γ.Q ˜ 1 .D ˜ 0 .Q ˜ ∗ ) = 0 − − − (3) det(D 2

Advanced Probability and Statistics: Remarks and Problems

9991

p of these solutions will generally be close to the diagonal elements of Φ (ie, the rrn’s for the unperturbed case) while the remaining n − p solutions will be close to zero. Now, we arrange our SVD so that the first p largest diagonal values ˜ 0 appear before the remaining n − p diagonal values. Then, we have as of D discussed above, ˜ 0 = diag[D0 + δD0 |δD01 ] D and we are interested in evaluating not the solutions of (3), but rather the solutions of the determinantal equation obtained by considering the top p × p left hand corner block of ˜ 1 .D ˜ 0 .Q ˜∗ ˜ 1 − γ.Q D 2

˜ 1 is D1 + δD1 while the top left hand corner The top left hand corner block of D ˜ 0 .Q ˜ ∗ is ˜ 1 .D block of Q 2 ˜ 0 .Q ˜ ∗ )11 = (Q ˜ 1 .D ˜ 1 )11 (D0 + δD0 )(Q ˜ 2 )11 )∗ + (Q ˜ 1 )12 δD01 ((Q ˜ 2 )12 )∗ (Q 2 where we have used the partition � (Qk )11 ˜k = Q (Qk )21

(Qk )12 (Qk )22



, k = 1, 2

Next observe that using first order perturbation theory, ˜0 = ˜1 = U ˜1∗ U Q [U1 + δU1 |U11 + δU11 ]∗ [U0 + δU0 |U01 + δU01 ] implies that ˜ 1 )11 =

(Q (U1 + δU1 )∗ (U0 + δU0 ) = U1∗ U0 + U1∗ δU0 + δU1∗ U0

= Q1 + U1∗ δU0 + δU1∗ U0

˜ 1 )12 = (U1 +δU1 )∗ (U01 +δU01 ) = U ∗ U01 +U ∗ δU01 +δU ∗ U01 = U ∗ δU01 +δU ∗ U01

(Q 1 1 1 1 1 ˜ 2 = V˜ ∗ V˜0 = Q 1 [V1 + δV1 |V11 ]∗ [V0 + δV0 |V01 ] implies that ˜ 2 )11 = V1∗ V0 + V1∗ δV0 + δV1∗ V0 = (Q Q2 + V1∗ δV0 + δV1∗ V0 ˜ 2 )12 = V ∗ V01 + δV ∗ V01 = δV ∗ V01 (Q 1 1 1 and, thus, ˜ 0 .Q ˜ ∗ )11 = ˜ 1 .D (Q 2 Q1 D0 Q2 + Q1 δD0 .Q∗2

92 100CHAPTER 2. REMARKS Advanced Probability and Statistics: Remarks and Problems AND PROBLEMS ON STATISTICAL SIGNAL PROCESSI upto first order perturbation terms. Thus upto first order perturbation theory, the estimated values γˆk , k = 1, 2, ..., p of the true rrn’s γk , k = 1, 2, ..., p are the solutions of det(D1 + δD1 − γˆ .Q1 (D0 + δD0 )Q∗2 ) = 0 and associated left and right eigenvectors corresponding to the estimated rrn’s γˆk are obtained by solving ξˆkT (D1 + δD1 − γˆk .Q1 (D0 + δD0 )Q∗2 ) = 0, (D1 + δD1 − γˆ .Q1 (D0 + δD0 )Q∗2 )ˆ ηk = 0 Remarks: [a] [V0 |V01 ]

is a matrix of size m × n having orthonormal columns. V01 is determined by the eigenvectors of the ”secular perturbation matrix” corresponding to the zero (unperturbed) eigenvalue of the m × m matrix X0∗ X0 and its perturbation δ(X0∗ X0 ) = X0∗ W0 + W0∗ X0 The secular matrix corresponding to this is the m × m matrix of this perturbing operator w.r.t an onb for the zero eigenvalue subspace of X0∗ X0 . This secular matrix is therefore of size m − p × m − p. The n − p columns of V01 are thus orthonormal and form a subspace of the space spanned by the m−p eigenvectors of X0∗ X0 corresponding to its zero eigenvalue. Likewise, the n − p columns of V11 are also orthonormal and form a subspace of the space spanned by the m−p eigenvectors of X0∗ X0 corresponding to its zero eigenvalue. This latter m − p dimensional space is precisely N (X0∗ X0 ) = N (X0 ) = N (S) = N (X1∗ X1 ) = N (X1 ). Equivalently, R(V0 ) = R(V1 ) = R(S ∗ ) and N (S) = R(S ∗ )⊥ contains R(V01 ) as well as R(V11 ). R([V0 + δV0 |V01 ]) = R(Y0∗ ) = R(Y0∗ Y0 ) andR([V1 + δV1 |V11 ]) = R(Y1∗ ) = R(Y1∗ Y1 ) and these two subspaces clearly have dimension n. R(V0 ) = R(V1 ) = R(S ∗ ) = N (S)⊥ is in particular orthogonal to R(V01 ) as well to R(V11 ).In particular, V1∗ V01 = 0 [b] Similar remarks apply to U in place of V . The corresponding relations are obtained using X0∗ = S ∗ A∗ , X1∗ = S ∗ Φ∗ A∗ , Y0∗ = X0∗ + W0∗ , Y1∗ = X1∗ + W1∗ in place of X0 , X1 , Y0 , Y1 . In particular, U1∗ U01 = 0. This could also be seen directly using [U0 + δU0 |U01 ]diag[D0 + δD0 , δD01 ] = (X0 + W0 )[V0 + δV0 |V01 ] which implies using first order perturbation theory that U 0 D 0 = X0 V 0 , U0 δD0 + δU0 D0 = X0 δV0 + W0 V0 ,

Advanced Probability and Statistics: Remarks and Problems

93 101

X0 V01 = 0 U01 δD01 = W0 V01 Thus in particular, U01 = W0 V01 (δD01 )−1 These equations do not appear to imply U1∗ U01 = 0. However, let us see the contribution of this term to ˜ 2 )12 )∗ ˜ 1 )12 δD01 ((Q (Q It is given by ˜ 2 )12 )∗ = U1∗ U01 δD01 ((Q ˜ 2 )12 )∗ U1∗ W0 V01 ((Q = U1∗ W0 V01 δV1∗ V01 which is of the second order of smallness in perturbation theory and hence can be neglected. In fact, this could directly be inferred from the fact that ˜ 1 )12 δD01 ((Q ˜ 2 )12 )∗ is of the second order of smallness since ((Q ˜ 2 )12 )∗ = (Q δV1∗ V01 is of the second order of smallness. [19] Problem on video­conferencing (Suggested to me by Prof.Vijyant Agar­ wal). There are N speakers numbered 1, 2, ..., N conversing over a common line. Let xk denote the speech vector signal spoken by the k th speaker. Assume that the listener receives a superposition of compressed versions of the different speakers. For example, if xk (t), t = 1, 2, ..., M are the speech samples of the k th speaker, then his dominant wavelet coefficients are ck (n, m) =

N � t=1

xk (t)ψnm (t), (n, m) ∈ Dk

where Dk is a small index pair set compared with the original number N of time samples of the signal. This equation can be expressed in the form ck = Ak xk where Ak = ((ψnm (t)))(n,m)∈Dk ,t∈{1,2,...,N } Let µ(Dk ) denote the number of elements in Dk . Then Ak ∈ Rµ(Dk )×N and since we are assuming that µ(Dk ) 4 while � θ 1 θ 2 θ3 θ4 d4 θ = 1 it follows that δ 4 (θ − θ� ) = (θ1 − θ1� )(θ2 − θ2� )(θ3 − θ3� )(θ4 − θ4� ) This may be explicitly checked by writing out f (θ) as f (θ) = c0 + c1 (k)θk + c2 (k, m)θk θm + c3 (k, m, n)θk θm θn + c4 θ1 θ2 θ3 θ4 and applying the above Berezin rules taking into account the anitcommutativity of the θ and the θ� to show that � f (θ)δ 4 (θ − θ� )d4 θ = f (θ� ) In the absence of a superpotential f (Φ), the field equations are

2 2 DR S = 0

DL

This equation should be regarded as the super­version of the classical massless Klein­Gordon equation or equivalently the wave equation. The corresponding super­propagator G should satisfy the super pde 2 2 DR G = P δ 4 (x − x� )δ 4 (θ − θ� ) DL

where P is the projection onto the space of superfields fields that belong to 2 2 DR , or equivalently that be­ the orthogonal complement of the nullspace of DL 2 2 long to the range space of DL DR . This is precisely in analogy with quantum electrodynamics. It is easy to see that 2 2 DR P = K�−1 DL

for some real constant K. In fact, we have that P annihilates any vector in the 2 and further, we have range of DR and hence in the range of DR 2 2 2 2 2 2 2 2 DR DL D R = DL [DR , DL ]DR P 2 = K 2 �−2 DL

with 2 T DLa = DR �DR DLa DR

{DRa , DLb } = {(γ µ θL ∂µ − γ 5 �∂θR )a , (γ ν θR ∂ν − γ 5 �∂θL )b }

130 Advanced Probability andON Statistics: Remarks SIGNAL and Problems 138CHAPTER 2. REMARKS AND PROBLEMS STATISTICAL PROCESSIN = −(γ µ )ac {θLc , ∂θLd }(γ 5 �)bd ∂µ ν −(γ 5 �)ac {∂θRc , θRd }γbd ∂ν

= −[γ µ ((1 + γ 5 )/2)(γ 5 �)T − γ 5 �((1 − γ 5 )/2)γ µT ]ab ∂µ

= [γ µ (1 + γ 5 )� + (1 − γ 5 )γ µ �]ab ∂µ

= [γ µ �(1 + γ 5 )]ab ∂µ = Xab

say. Interchanging a and b gives {DLa , DRb } = −((1 + γ 5 )�γ µT )ab ∂µ = −(γ µ �(1 − γ 5 ))ab ∂µ = Xba say. In matrix notation, these identities are expressible as T {DR , DL } = γ µ �(1 + γ 5 )∂µ , T {DL , DR } = −γ µ �(1 − γ 5 )∂µ

Adding these two equations and noting that DL anticommutes with itself and DR also anticommutes with itself, we get {D, DT } = 2γ µ γ 5 �∂µ Then, 2 DR DLa = �bc DRb DRc DLa =

�bc DRb (Xca − DLa DRc )

= �bc Xca DRb − �bc DRb DLa DRc

= �bc Xca DRb − �bc (Xba − DLa DRb )DRc

= �bc Xca DRb − �bc Xba DRc

2

+DLa DR

Equivalently, 2 [DR , DLa ] = �bc Xca DRb − �bc Xba DRc

Then, 2 DR DLa DLp = �bc Xca (Xbp − DLp DRb )

2 2 −�bc Xba (Xcp − DLp DRc ) + DLa ([DR , DLp ] + DLp DR )

so that 2 2 2 2 2 DR DL DR = �ap DR DLa DLp DR = 2 = (epsilonbc Xca �ap Xbp − �bc Xba �ap Xcp )DR

� since the product of any three DR s is zero. We can express this relationship as 2 2 2 (DR DL ) =

Advanced Probability and Statistics: Remarks and Problems

131 139

2 2T r(�.X.�.X T )DR

and hence, 2 2 2 2 2 DR ) = 2T r(�.X.�.X T )DL DR (DL

Now, X = γ µ �(1 + γ 5 )∂µ and hence, T r(�.X.�.X T ) = −T r(�.γ µ �(1 + γ 5 )2 �2 γ νT )∂µ ∂ν But, −T r(�.γ µ �(1 + γ 5 )2 �2 γ νT ) 2T r(�.γ µ .�(1 + γ 5 )γ νT )

= −2T r(�γ µ γ ν (1 − γ 5 )�)

= 2.T r(γ µ γ ν (1 − γ 5 )) = c.η µν where c is a real constant. Thus, 2 2 2 2 2 DR DR ) = c.�.DL (DL 2 2 from which it follows that P = c−1 DL DR /� is idempotent, ie, a projection.

Supersymmetric gauge theories: The gauge Design of quantum unitary gates using supersymmetric field theories: Given a Lagrangian for a set of Chiral superfields and gauge superfields, we can construct the action as an integral of the Lagrangian over space­time. We can include forcing terms in this Lagrangian for example by adding c­number control gauge potentials to the quantum gauge field VµA (x) or c­number control current terms to the terms involving the Dirac current which couples to the gauge field. After adding these c­number control terms, the resulting action will no longer be supersymmetric. However, we can still construct the Feynman path integral for the resulting action between two states of the field ie, by specifying the fields at the two endpoints of a time interval [0, T ] and then we obtain a transition matrix between these two states of the field. For example, the initial state can be a coherent state in which the annihilation component of the electromagnetic vector potential has definite values and the Dirac field of electrons and positrons is in a Fermionic coherent state where the annihilation component of the wave function has definite values. Likewise with the final state. Or else, we may specify the initial state to be a state in which there are definite numbers of photons, electrons and positrons having definite four momenta and spins and so also with the final state. In the case of a supersymmetric theory, we’ll have to also specify the states of the other fields like the gaugino field, the gravitino field and the auxiliary fields or else we may break the supersymmetry by expressing the auxiliary fields in terms of the other superfield components using the variational equations of motion and then calculate the the Feynman

132 Advanced and ON Statistics: Remarks and Problems 140CHAPTER 2. REMARKS ANDProbability PROBLEMS STATISTICAL SIGNAL PROCESSI path integral corresponding to an initial and a final state and then make this transition matrix as close as possible to a desired transition matrix by optimizing over the c­number control fields. [21] One of the main achievements in the work of C.R.Rao was the proof of the lower bound on the error covariance matrix of a statistical estimator of a vector valued parameter based on vector valued observations using techniques of matrix theory. C.R.Rao in his work has also considered the case when the Fisher information matrix is singular and in this case, he has been able to use the methods of generalized inverses to obtain new formulas for the lower bound. The lower bound on the variance of an estimator should be compared to the Heisenberg uncertainty principle in quantum mechanics for two non­ commuting observables. In fact, it can be shown that the Heisenberg uncertainty inequality for position and momentum can be derived using the CRLB. The CRLB roughly tells us that no matter how much we may try, we can never achieve complete accuracy in our estimation process, ie, there is inherently some amount of uncertainty about the system that generates a random observation.

Chapter 3 Chapter 3

Some Study Projects on Some StudySignal Projects on Applied Processing Applied SignalAbout Processing with Remarks Related with remarks about related Contributions of Scientists contributions of scientists [1] Linear models: Time series models like AR, M A, ARM A, casting these mod­ els in the form X(n) = H(n)θ + V (n) where X(n), H(n) are data vectors and data matrices. V (n) is noise. H(n) ∈ Rn×p , X(n), V (n) ∈ Rn . If Rv = Cov(V (n)) and V (n) are iid zero mean Gaus­ sian, then the MLE of θ based on data collected upto time n is given by ˆ θ(n) =(

n �

H(k)T Rv H(k))−1 (

k=1

n �

H(n)T Rv X(k))

k=1

Since X(n), H(n) are data matrices, they are also random and we wish to de­ termine the mean and covariance of θˆ(n) in terms of the statistics of these data matrices. [2] Innovations process and its application to the construction of the Wiener filter. Stochastic processes as curves in Hilbert space. Let x(n) be a stationary process. Its innovations process e(n) can be expressed as ⊥ e(n) = Pspan(x(k):k≤n−1) x(n) = x(n) − Pspan(x(k):k≤n) x(n)

=

∞ �

k=0

l(k)x(n − k), l(0) = 1 141

133

134 Advanced Probability and Statistics: Remarks and Problems 142CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING WIT This means that e(n) ∈ span(x(k) : k ≤ n) and by inversion, x(n) ∈ span(e(k) : k ≤ n) so we can write � � � h(k)z −k , x(n) = h(k)e(n−k), h(0) = 1 L(z) = l(k)z −k , H(z) = L(z)−1 = k≥0

k≥0

k≥0

e(n) is clearly a white process. Let E(e(n)2 ) = σe2 Then, the power spectral density of x(n) is given by Sxx (z) = σe2 H(z)H(z −1 ) To get a causal, stable H(z) with an inverse that is also causal and stable, we assume that Sxx (z) is rational and then select H(z) so that it is causal stable and minimum phase, ie, is poles and zeroes all fall within the unit circle. Then L(z) = H(z)−1 is also obviously causal and minimum phase. The prediction error energy is σe2 = h(0)2 and if Γ denotes the unit circle, � (2π)−1 ln(Sxx (z))z −1 dz = σe2 Γ

This can be easily verified using the residue theorem and the fact that H(z) has all its poles and zeroes inside the unit circle. [3] Some Study Project problems [1] Compute the capacitance between two parallel cylinders having different radii. [2] State and prove Doob’s maximal inequality for submartingales. [3] State and prove the Martingale down­crossing inequality and its applica­ tion to proving the martingale convergence theorem. [4] State and prove Doob’s Lp ­inequality for martingales. [5] Power spectrum estimation:Compute the mean and covariance of the periodogram of a stationary Gaussian process. [6] Apply of Doob’s L2 ­Martingale inequality to prove the almost sure ex­ istence and uniqueness of the solutions to Ito’s stochastic differential equation when the drift and diffusion coefficients satisfy the Lipshitz conditions. [7] When the Choi­Kraus­Stinespring operators of a quantum noisy channel have classical randomness, then how does one determine the mean square state estimation error of the output of the recovery channel when the recovery op­ erators have been designed in accordance with the Knill­Laflamme theorem for

Advanced Probability and Statistics: Remarks and Problems

135 143

the mean value of the noisy channel operators in the Choi­Kraus­Stinespring representation ? [8] Derive the Nonlinear filtering equations for a Markov state when the measurement noise is a mixture of a white Gaussian component and a compound Poisson component. [9] In quantum scattering theory, when the free particle Hamiltonian is H0 and the scattering potential is V where V is a Gaussian random Hermitian operator, then how does one compute the statistical moments of the scattering operator using the well known formula for the moments of a Gaussian random vector. [10] Application of Cramer’s theorem to computing the optimal rate at which the probability of missing the target tends to zero given that the probability of false alarm tends to zero with regard to the binary hypothesis testing problem for a sequence of iid random variables, ie, under H1 , the data (X1 , ..., Xn ) has a pdf of p1 (x1 )...p1 (xn ) and under H0 , it has a pdf of p0 (x1 )...p0 (xn ). The Neyman­Pearson test for n iid data samples is given as follows: Select 1 )...p1 (xn ) H1 if pp10 (x (x1 )...p0 (xn ) > λn and select H0 otherwise, where λn is chosen so that P r(H1 |H0 ) = PF (n) → 0 as n → ∞ and we prove that under this constraint on PF (n), the minimum possible value of limn n−1 log(PM (n)) is −D(p1 |p0 ) where PM (n) = P r(H0 |H1 ) and � D(p1 |p0 ) = p1 (x).ln(p1 (x)/p0 (x))dx The proof of this result is based on defining sequence ξ(n) = log(p1 (Xn )/p0 (Xn )) and S(n) = (ξ(1) + ... + ξ(n))/n Under any given hypothesis, the ξ(n)� s are iid r.v’s. The Neyman­Pearson test is S(n)/n > η(n) implies select H1 and select H0 otherwise where η(n) = n−1 .log(λn ). The false alarm probability is given by PF (n) = P r(S(n)/n > η(n)|H0 ) and the miss probability is PM (n) = P r(S(n)/n < η(n)|H1 ) These two probabilities are approximately evaluated using Cramer’s theorem. We have n−1 log(E(exp(s.Sn )|H0 ) = logE(exp(s.X1 )|H0 ) � � s = log (p1 (x)/p0 (x)) p0 (x)dx = log p1 (x)s p0 (x)1−s dx

and

n

−1

log(E(exp(s.Sn )|H1 ) = log



p1 (x)1+s p0 (x)s dx

136144CHAPTER 3. SOMEAdvanced Probability and Remarks andPROCESSING Problems STUDY PROJECTS ONStatistics: APPLIED SIGNAL WI Thus, for large n, n

−1

.log(PF (n)) ≈ −infx>η sups (s.x − log = −sups≥0 (s.η − log





p1 (x)s p0 (x)1−s dx)

p1 (x)s p0 (x)1−s dx)

The minimum that we require is that PF (n) → 0 and this is guaranteed once the above quantity is negative, in the worst case, we may take it to be a negative number arbitrarily close to zero. Equivalently, in this worst case situation, the supremum above, namely zero is attained when s = 0 and the optimal choice of the threshold η is obtained by setting the derivative above w.r.t s to be zero at s = 0, ie, � η = p1 (x)log(p1 (x)/p0 (x))dx = D(p1 |p0 )

For this value of the optimal threshold, we compute the optimal rate at which PM (n) converges to zero again by applying Cramer’s theorem. [11] Basics of queueing theory: Let X1 , X2 , ... denote the successive interarrival times of packets in a single server queue and let T1 , T2 , ... denote the service times for packet 1, packet 2,...etc. Let Wn denote the total waiting time for the nth packet including the service time, ie, Wn is the time taken for the nth packet starting from his arrival time upto the time when his service is completed and he leaves. We then have the obvious recursive relationship Wn+1 = max(Sn + Wn − Sn+1 , 0) + Tn+1 where Sn = X1 + ... + Xn Xn� s

Suppose the are iid with distribution FX and the Tn� s are iid with dis­ tribution FT . Then, the probability is to determine the law of the waiting time process {Wn , n = 1, 2, ...}. We also wish to determine the distribution of N (t), the number of packets in the queue at time t. We see that the total number of departures that have taken place in the duration [0, t] is given by D(t) = max(n ≥ 1 : W1 + ... + Wn ≤ t} and the total number of arrivals that have taken place in the duration [0, t] is given by A(t) = max{n ≥ 1 : Sn ≤ t}. Then, the size of the queue at time t is given by N (t) = A(t) − D(t)

[12] Group representation theory and its application to statistical image processing on a curved manifold. [1] Definition of the image field model on a manifold M on which a Lie group of transformations G acts.

Advanced Probability and Statistics: Remarks and Problems

137 145

[2] Estimation of the group transformation element from the measured image field with knowledge of the original noiseless untransformed image field using the irreducible representations of the group G. [3] by assuming that the estimate of the G­transformation element g is a small perturbation of the true transformation, ie, g = exp(δ.X)g0 , X ∈ g calculate the value of X upto O(δ m ) and hence determine the probability that the error X is larger than a threshold � for a given g0 . Some details:The image field model f (x) = f0 (g0−1 x) + w(x), x ∈ M Let Ynl (x), l = 1, 2, ..., dn define an onb for an irreducible unitary representation πn of G appearing in the decomposition of the representation U in L2 (M, µ) where µ is a G­invariant measure on M and U (g)f (x) = f (g −1 .x). Then U (g)Ynl (x) = Ynl (g −1 x) =

dn �

[πn (g)]ml Ynm (x)

m=1

with the additional condition, � Y¯nl (x)Ykm (x)dµ(x) = δ(n, k).δ(l, m) M

and L2 (M, µ) = Cl(span{Ynl : 1 ≤ l ≤ dn , n ≥ 1}) The image field model is then equivalent to f (n) = πn (g)f0 (n) + w(n), n ≥ 1 where n , f (n, l) = f (n) = ((f (n, l)))dl=1



f (x)Y¯nl (x)dµ(x) M

ˆ )g0 its estimate. Then, we have Let g0 be the true value of g and exp(δ.X � ˆ = argminX∈g X σ(n)−2 � f (n) − πn (exp(δ.X)g0 )f0 (n) �2 n

where we are assuming that the noise is Gaussian with zero mean and G­ invariant correlation. [13] Statistical theory of fluid turbulence:Partial differential equations for the velocity field moments. From homogeneity and isotropy, Rij (r1 , r2 ) =< vi (r1 )vj (r2 ) >= A(r)ni nj + B(r)δij

138146CHAPTER 3. SOMEAdvanced Probability and Remarks andPROCESSING Problems STUDY PROJECTS ONStatistics: APPLIED SIGNAL WI where r = |r1 − r2 | and n ˆ = (r2 − r1 )/|r2 − r1 |. Also Cijk (r1 , r2 , r3 ) =< vi (r1 )vj (r2 )vk (r3 ) >= Cijk (r2 − r1 , r3 − r1 ) This third rank tensor must be constructed using scalar functions of |r2 − r1 |, |r3 − r1 | and the unit vectors along the directions r2 − r1 and r3 − r1 and it must be symmetric w.r.t the interchange of (r2 − r1 , j) and (r3 − r1 , k). Thus the general form of this tensor is given by Cijk (r1 , r2 , r3 ) = A1 (|r2 − r1 |, |r3 − r1 |)ni nj mk + A1 (|r3 − r1 |, |r2 − r1 |)mi mj nk

A3 (|r2 − r1 |, |r3 − r1 |)δij mk + A3 (|r3 − r1 |, |r2 − r1 |)δik nj

+A4 (|r2 − r1 |, |r3 − r1 |)ni nj nk + A4 (|r3 − r1 |, |r2 − r1 |)mi mj mk

where n is the unit vector along r2 − r1 and m is the unit vector along r3 − r1 .

[4] Estimating the parameters in an ARMA model XN = HN a + GN b where XN = [x(N ), x(N − 1), ..., x(0)]T , WN [w(N ), w(N − 1), ..., w(0)]T HN = [z −1 XN , ..., z −p XN ], GN = [WN , z −1 WN , ..., z −q WN ] a = [a(1), ..., a(p)]T , b = [b(0), ..., b(q)]T This defines the ARMA model. In order to estimate a, b from this model, we require to compute the pdf of XN given a, b and then maximize this pdf over a, b. [5] Statistical properties of parameter estimates in the AR model using matrix perturbation theory XN = HN θ + WN where XN = [x(N ), x(N − 1), ..., x(0)]T ,

HN = [z −1 XN , z −2 XN , ..., z −p XN ] WN = [w(N ), w(N − 1), ..., w(0)]T

w(n)� s are assumed to be iid N (0, σ 2 ).

T T θˆ(N ) = (HN HN )−1 HN XN

Now,

T T −j ˆ N −1 HN HN = ((N −1 z −i XN z XN ))1≤i,j≤p = R T T N −1 HN XN = ((N −1 z −i XN XN ))pi=1 = rˆ

139 147

Advanced Probability and Statistics: Remarks and Problems where r = ((r(i))), r(i) = R(i) = E(x(n − i)x(n)), R = ((R(i − j)))1≤i,j≤p rˆ(i) = N −1

N �

n=1

We write

ˆ j) = N −1 x(n − i)x(n), R(i,

N �

n=1

x(n − i)x(n − j)

ˆ = R + δR, rˆ = r + δr R Clearly, θ = R−1 r

ˆ )=R ˆ −1 rˆ = (R + δR)−1 (r + δr)

θ(N ≈ (R−1 − R−1 δR.R−1 )(r + δr)

= θ + R−1 δr − R−1 δR.θ

so the estimation error is given by ˆ ) − θ = R−1 δr − R−1 δR.θ eN = θ(N Large deviation evaluation of the rate at which eN converges to zero as N → ∞. Note that by ergodicity, δr, δR → 0, N → ∞ Evaluation of the rate at which�these converges to zero amounts to evaluating N the rate at which z[n] = N −1 n=1 y(n) converges to zero for any stationary process y(n). We make use of the Gartner­Ellis theorem to evaluate the LDP rate: ΛN (λ) = N −1 .log(Eexp(N λ.z[N ])) =N

−1

.log(Eexp(λ.

N �

y(n)))

n=1

If y(n) is approximately Gaussian with zero mean with autocorrelation R(n), then the above equals � N −1 .log(exp((λ2 /2) (N − 1 − |n|)R(n))) |n|≤N −1

= (N −1 λ2 /2)



|n|≤N −1

(N − 1 − |n|)R(n)

which converges as N → ∞ to (λ2 /2)

∞ �

n=−∞

R(n) = λ2 S(0)/2

140148CHAPTER 3. SOMEAdvanced Probability and Remarks andPROCESSING Problems STUDY PROJECTS ONStatistics: APPLIED SIGNAL WI where S(ω) =



R(n)exp(−jωn)

n

is the power spectral density of y(n). This determines the large deviation rate. It is more complicated to � derive a formula for the rate at which the empirical N distribution LN (.) = N −1 n=1 δy(n) converges to the one dimensional marginal distribution of y(n). For this, we require to evaluate the limiting Gartner­Ellis logarithmic moment generating function Λf = limN →∞ N −1 .log(E(exp(

N �

f (y(n))))

n=1

[6] Proof of the L2 ­mean ergodic theorem for wide sense stationary processes under the condition C(k) → 0, |k| → ∞. C(k) = R(k) − µ2 , µ = E(x(n)), R(k) = E(x(n)x(n + k)) Let SN = E(SN /N − µ)2 = N −1



N �

x(n)

n=1

(1 − (1 + |k|)/N )C(k) → 0, N → 0

|k|≤N −1

provided that C(N ) → 0

�n This follows by the Cesaro theorem: If an → 0, n → 0, then n−1 k=1 ak → 0. This proves the mean ergodic theorem for wide sense stationary processes. Now let x(n) be a stationary Gaussian process. Then fix k ∈ Z and put y(n) = (x(n) − µx )(x(n + k) − µx ). Then y(n) is also a stationary process with E(y(n)) = Cx (k), Cy (m) = E(y(n + m)y(n)) − Cx (k)2 = = E[(y(n + m) − Cx (k))(y(n) − Cx (k))]

= Cx (m)2 + Cx (m + k)Cx (m − k) → 0, |m| → ∞

provided that Cx (m) → 0, |m| → ∞. Thus, by the mean ergodic theorem

applied to y(n), N � N −1 y(n) → Cx (k) n=1

which is the same as saying that N −1 .

N �

n=1

x(n)x(n + k) → Rx (k)

Advanced Probability and Statistics: Remarks and Problems since N −1

N �

n=1

141 149

x(n) → µx

by the mean ergodic theorem applied to x(n). [7] Quantum filtering of cavity resonator fields in interaction with a bath. Consider first the T M modes: � Re(c(mnp)exp(jω(mnp)t))umnp (x, y, z) Hz = 0, Ez (t, x, y, z) = mnp



E⊥ (t, x, y, z) =

mnp

or equivalently, Ex (t, x, y, z) =



h−2 mn Re(c(mnp)exp(jω(mnp)t))∂z �⊥ umnp (x, y, z)

h−2 mn (mπ/a)(pπ/d)Re(c(mnp)exp(jω(mnp)t))vmnp (x, y, z)

mnp

Ey (t, x, y, z) =



h−2 mn (nπ/b)(pπ/d)Re(c(mnp)exp(jω(mnp)t))wmnp (x, y, z)

mnp

where

√ √ umnp (x, y, z) = ((2. 2)/ abd)sin(mπx/a)sin(nπy/b)cos(pπz/d) √ √ vmnp (x, y, z) = −((2. 2)/ abd)cos(mπx/a)sin(nπy/b)sin(pπz/d) √ √ wmnp (x, y, z) = −((2. 2)/ abd)sin(mπx/a)cos(nπy/b)sin(pπz/d) Let < . > denote time average. Then � � |c(mnp)|2 , < Ez 2 > dxdydz = (1/2) box



box

mnp

< Ex2 + Ey2 > dxdydz = (1/2)



mnp

= (1/2)



mnp

where

2 2 |c(mnp)|2 h−4 mn (πp/d) (hmn )

|c(mnp)|2 ((πp/d)2 /h2mn )

h2mn = π 2 (m2 /a2 + n2 /b2 ) The total energy in the cavity due to the electric field is then � 2 (�/2) )dxdydz (Ez2 + |E⊥ box

=



mnp

λ(mnp)|c(mnp)|2

142 Probability and Remarks andPROCESSING Problems 150CHAPTER 3. SOME Advanced STUDY PROJECTS ONStatistics: APPLIED SIGNAL WIT which can be abbreviated to Hs =



ω(n)c(n)∗ c(n)

n

where c(n) are annihilation operators of the cavity field and c(n)∗ are the cre­ ation operators. They satisfy the Boson CCR: [c(n), c(m)∗ ] = δ[n − m]

Bath field is EB (t, r) =



[A�k (t)ψk (r) + A�k (t)∗ ψ¯k (r) + Λ�k (t)ηk (r)]

k

where Ak (.), Ak (.)∗ , Λk (.) are the fundamental noise processes in the quantum stochastic calculus of Hudson and Parthasarathy. They satisfy the quantum Ito formula dAk dA∗m = δk,m dt, dΛk .dΛm = δkm dΛk , dAk dΛm = δkm dAk , dΛm dA∗k = δmk dA∗k Denote the above system electric field E by Es (t, r). Then the total field energy of the system plus bath fields within the cavity resonator is given by � (�/2) |Es (t, r) + EB (t, r)|2 d3 r box

Ignoring the bath energy, the total field energy of the system (ie, cavity res­ onator) plus its interaction energy with the bath is given by H(t) = Hs + HI (t) where Hs = (�/2) HI (t) = � =







box

|Es (t, r)|2 d3 r,

(Es (t, r), EB (t, r))d3 r box

(Lk (t)dAk + Mk (t)dA∗k + Nk (t)dΛk )

k

with the Lk (t), Mk (t), Nk (t) being system operators defined by � � 3 Lk (t) = ψk (r)Es (t, r)d r, Mk (t) = ψk (r)∗ Es (t, r)d3 r, box

box

Nk (t) =



ηk (r)Es (t, r)d3 r box

Advanced Probability and Statistics: Remarks and Problems Writing Es (t, r) =

L

143 151

Re(c(n).exp(jω(n)t))Fn (r)

n

we get Lk (t) =

L

(l1kn (t)c(n) + l2kn (t)c(n)∗ )

n

where

ψk (r)Fn (r)d3 r

l1kn (t) = (1/2)exp(jω(n)t) box

ψk (r)Fn (r)d3 r

l2kn (t) = (1/2)exp(−jω(n)t) box

Mk (t) =

L

(m1kn (t)c(n) + m2kn (t)c(n)∗ )

n

where

ψ¯k (r)Fn (r)d3 r

m1kn (t) = (1/2)exp(jω(n)t) box

ψ¯k (r)Fn (r)d3 r

m2kn (t) = (1/2)exp(−jω(n)t) box

and Nk (t) =

L

(n1kn (t)c(n) + n2kn (t)c(n)∗ )

n

where

ηk (r)Fn (r)d3 r

n1kn (t) = (1/2)exp(jω(n)t) box

ηk (r)Fn (r)d3 r = n ¯ 1kn (t)

n2kn (t) = (1/2)exp(−jω(n)t) box

Remark: In computing the system Hamiltonian, we must in addition con­ sider the contribution to the system field energy coming from the magnetic field. For the T M case under consideration, this energy is HsM = (µ/2) box

< |H⊥ (t, r)|2 > d3 r

where H⊥ = (jωE/h2 )\⊥ Ez × zˆ for a fixed frequency and mode or more precisely, with regard to our modal expansion, L H⊥ (t, r) = E. h−2 mn Re(jω(mnp)c(mnp)exp(jω(mnp)t)).\⊥ umnp (r) mnp

144 Probability and Remarks andPROCESSING Problems 152CHAPTER 3. SOME Advanced STUDY PROJECTS ONStatistics: APPLIED SIGNAL WIT We then find that



box

< |H⊥ (t, r)|2 > d3 r =

(�2 /2)



mnp

and hence

2 h−2 mn |c(mnp)|

HsM = (µ�2 /4)



mnp

2 h−2 mn |c(mnp)|

(a) Fermionic fields as system fields interacting with a photonic bath (b) Fermionic bath. The creation and annihilation operators satisfying CAR’s are � t

(−1)Λk (s) dAk (s),

Jk (t) =

0

Jk (t)∗ =



t

(−1)Λk (s) dAk (s)∗

0

[8] Quantum filtering of Yang­Mills gauge fields in interaction with a bath The Lagrangian density of the Yang­Mills gauge field is a F µνa L = (−1/4)T r(Fµν F µν ) = (−1/4)Fµν

where a a − Aaµ,ν + eC(abc)Aµb Aνc = Aν,µ Fµν

We have a = Aar,0 − Aa0,r + eC(abc)Abr Ac0 F0r a a a a − (1/4)Frs Frs L = (−1/2)F0r F0r

so the canonical momentum corresponding to the position field Aar is a a = −F0r Pra = ∂L/∂Ar,0

Thus, a a = −Pra + A0,r − eC(abc)Abr Ac0 Ar,0

The Hamiltonian density is then a a a a a a a − L = −F0r Ar,0 + (1/2)F0r F0r + (1/4)Frs Frs H = Pra Ar,0

The Yang­Mills field equations are (Dµ F µν )a = 0 or more precisely in component form, F,νµνa + C(abc)Aνb F µνc = 0

Advanced Probability and Statistics: Remarks and Problems

153145

These field equations can also be expressed in Lie algebra notation as [�ν , F µν ] = 0 where �µ = ∂µ − ieAµ = ∂µ + ieAaµ τa since [∂ν , F µν ] = F,νµν , [Aν , F µν ] = [Aνb τb , F µνc τc ] =

Abν F µνc [τb , τc ] = Abν F µνc iC(abc)τ a

[9] Quantum field theoretic cavity resonator physics using photons, electrons, positrons, non­Abelian gauge Yang mills matter and parti­ cle fields and gravitons [10] Quantum control via feedback Quantum filtering and control algorithms were first introduced by V.P.Belavkin and perfected by John Gough, Kostler and Lec Bouten. dU (t) = (−(iH + P )dt + L1 dA − L∗2 dA∗ + SdΛ(t))U (t) We take an observable X and note that its Heisenberg evolution is given by jt (X) = U (t)∗ XU (t) Then djt (X) = jt (θ0 (X))dt + jt (θ1 (X))dA(t) + jt (θ2 (X))dA(t)∗ + jt (θ3 (X))dΛ(t) where θk , k = 0, 1, 2 are linear operators in the linear space of system observ­ ables. We take non­demolition measurements in the sense of Belavkin of the form Yo (t) = U (t)∗ Yi (t)U (t), Yi (t) = cA(t) + c¯A(t)∗ + k.Λ(t) The Belavkin filter for this measurement has the following form: πt (X) = E[jt (X)|ηo (t)], ηo (t) = σ(Yo (s) : s ≤ t) dπt (X) = Ft (X)dt +



Gkt (X)(dYo (t))k

k≥1

where Ft (X), Gkt (X) ∈ ηo (t). It is a commutative filter and is therefore called the stochastic Heisenberg equation. Its dual is the stochastic Schrodinger equ­ tation: � G∗kt (ρt )(dYo (t))k dρt = Ft∗ (ρt )dt + k≥1

146154CHAPTER 3. SOMEAdvanced Probability and Remarks andPROCESSING Problems STUDY PROJECTS ONStatistics: APPLIED SIGNAL WI Let Xd (t) be the desired Heisenberg trajectory. Then, the tracking error at time t is Xd (t) − jt (X). However, we cannot feed this error back into the HP noisy Schrodinger equation because we cannot measure jt (X) directly without perturbing the system. So we use in place of jt (X) its real time estimate πt (X) based on the non­demolition measurements ηo (t) upto time t and feedback in­ stead the error Xd (t)−πt (X). The system dynamics after feedback is then given by dU (t) = [(−iH + u(t) − P (t))dt + L1 dA(t) − L2 dA(t)∗ + SdΛ(t))U (t) where u(t) = K(Xd (t) − πt (X)) or more generally, u(t) = Lf (Xd (t) − πt (X)) We note that dYo (t) = dYi (t) + dU (t)∗ dYi (t)U (t) + U (t)∗ dYi (t)dU (t) = = dYi (t) − j(cL2 + c¯L∗2 )dt + kjt (S + S ∗ )dΛ + kjt (L1 − L∗2 )dA + kjt (L∗1 − L2 )dA∗

So measuring dYo amounts to measuring −jt (cL2 + c¯L∗2 )dt plus noise. In the context of cavity resonator physics, we have that −L2 is the coefficient of dA∗ in the HP� equation and as we saw, this coefficient is proportional for the k th HP mode to n (l1kn (t)c(n) + l2kn (t)c(n)∗ ). This means that our non­demolition measurement corresponds to measuring some projection of the cavity electric field plus noise. Actually, we can construct a whole class of non­demolition measurements that correspond to measuring several projections of the cavity electric field plus noise. [11] How to apply machine learning methods to problems in elec­ tromagnetics, gravitation and quantum mechanics Given an incident em field (Ei (ω, r), Hi (ω, r)) incident upon a diseased tis­ sue characterized by an inhomogeneous permittivity tensor �ab (ω, r) and an inhomogeneous permeability tensor µab (ω, r), we determine the scattered em fields (Es (ω, r), Hs (ω, r)) after it gets scattered by the tissue. The aim is to estimate the permittivity and permeability and derive characteristic features of these using a neural network and match these characteristic features with prototype features to determine the nature of the disease. We train the neural network to take as input a set of incident­scattered field pairs and output the permittivity­permeability parameters. Then when the neural network is pre­ sented with another incident­scattered field pair, it will use its trained weights to generate the permittivity and permeability parameters which can be com­ pared with the prototype. In quantum mechanics, machine learning can be applied as follows: Let H(θ) be the system Hamiltonian dependent upon an unknown parameter vector to be estimated from repeated measurements of the state taking into account the

Advanced Probability and Statistics: Remarks and Problems

155147

collapse postulate. The ρk denote the state after the k th measurement taken at time tk and let {Ma } denote the POVM. Then, the state after the measurement has been taken at time tk+1 is given by �� � ρk+1 = Ma U (tk+1 − tk , θ)ρk U (tk+1 − tk , θ)∗ Ma a

if we make the measurement without noting the outcome and if we note the outcome as a, then the state at time tk+1 just after the measurement has been made is given by � � ρk+1 (a) = [ Ma U (tk+1 − tk , θ)ρk U (tk+1 − tk , θ)∗ Ma ]/T r(numerator) Here

U (t, θ) = exp(−itH(θ)) The joint probability of getting measurement outcomes a1 , ..., ak respectively at times t1 < t2 < ... < tk is given by P r(a1 , ..., ak ; t1 , ..., tk |θ) = � T r( Mak )

[12] Lattice filters and the RLS lattice algorithm:

X(n) = [x(n), x(n − 1), ..., x(0)]T ,

Xn,p = [z −1 X(n), ..., z −p X(n)]

⊥ ef (n|p) = X(n) − Pn,p X(n) = Pn,p X(n)

where Pn,p is the orthogonal projection of Rn+1 onto Range(Xn,p ). ⊥ −p−1 z X(n) eb (n − 1|p) = Pn,p

Let y(n) be another signal. Write Y (n) = [y(n), y(n − 1), ..., y(0)]T Let

Yˆ (n|p + 1) = Pn+1,p+1 Y (n)

Note that Pn+1,p+1 is the orthogonal projection onto span {X(n), z −1 X(n), ..., z −p X(n)}. We can write Yˆ (n|p + 1) = Xn+1,p+1 hn,p+1 We can also write ef (n|p) = X(n) + Xn,p an,p , eb (n − 1|p) = z −p−1 X(n) + Xn,p bn−1,p

148

Advanced Probability and Statistics: Remarks and Problems 156CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING W Update formulas: Pn,p+1 = Pn,p + PPn,p ⊥ z −p−1 X(n) Thus ⊥ ⊥ = Pn,p − PPn,p Pn,p+1 ⊥ z −p−1 X(n) ⊥ = Pn,p − Peb (n−1|p)

Hence, ef (n|p + 1) = ef (n|p) − eb (n − 1|p) < eb (n − 1|p), X(n > / � eb (n − 1|p) �2 = ef (n|p) − eb (n − 1|p) < eb (n − 1|p), ef (n|p) > / � eb (n − 1|p) �2 = ef (n|p) − K(n|p + 1)eb (n − 1|p) from which, it follows that � ef (n|p + 1) �2 =� ef (n|p) �2 −K(n|p + 1)2 � eb (n − 1|p) �2 eb (n|p + 1) = eb (n − 1|p) − K(n|p + 1)ef (n|p) and hence � eb (n|p + 1) �2 =� eb (n − 1|p) �2 −K(n|p + 1)2 � ef (n|p) �2 We also easily see using the Gram­Schmidt orthonormalization process that Y (n|p) = Xn+1,p+1 hn,p+1 = X(n)hn (0) + z −1 X(n)hn (1) + ... + z −p hn (p) = eb (n|0)gn (0) + eb (n|1)gn (1) + ... + eb (n|p)gn (p) where gn (k) =< Y (n), eb (n|k) >, k = 0, 1, ..., p and hence Y (n|p + 1) = Y (n|p)+ < Y (n), eb (n|p + 1) > e(b(n|p + 1)/ � eb (n|p + 1) �2 or in other words, gn (p + 1) =< Y (n), eb (n|p + 1) > / � eb (n|p + 1) �2 We now look at time updates: Xn+1,p =



T ξn,p Xn,p



where T ξn,p = [x(n), x(n − 1), ..., x(n − p)]

Advanced Probability and Statistics: Remarks and Problems Then,

149 157

T T T −1 an,p = −(Xn,p Xn,p )−1 Xn,p X(n) = −Rn,p Xn,p X(n)

and T T Xn+1,p ) = Rn,p + ξn,p ξn,p Rn+1,p = (Xn+1,p

Application of the matrix inversion lemma then gives RLS lattice algorithm continued x[n], n ≥ 0 is a process. Let

xn = [x[n], x[n − 1], ..., x[0]]T ∈ Rn+1 , and let z −k xn = [x[n − k], x[n − k − 1], ..., x[0], 0, 0, ..., 0]T ∈ Rn+1 ie we interpret z −k x[m] = x[m − k] = 0, if k > m. Forward predictor of order p at time n: ⊥ xn , ef (n|p) = Pn,p where Pn,p is the orthogonal projection onto R(Xn,p ) where Xn,p = [z −1 xn , z −2 xn , ..., z −p xn ] ∈ Rn+1×p Backward predictor of order p at time n − 1: ⊥ −p−1 z xn eb (n − 1|p) = Pn,p

From the basic theory of projection operators in Hilbert space, it follows that ef (n|p + 1) = ef (n|p) − Kf (n|p)eb (n − 1|p) � � � � eb (n − 1|p) ef (n|p) − Kb (n|p) eb (n|p + 1) = 0 0 where Kf (n|p) =< ef (n|p)k, eb (n − 1|p) > /Eb (n − 1|p),

Kb (n|p) =< ef (n|p), eb (n − 1|p) > /Ef (n|p),

Ef (n|p) =� ef (n|p) �2 , Eb (n − 1|p) =� eb (n − 1|p)

Then, Ef (n|p + 1) = Ef (n|p) − Kf (n|p)2 Eb (n − 1|p) Eb (n|p + 1) = Eb (n − 1|p) − Kb (n|p)2 Ef (n|p) ef (n|p) = xn + Xn,p an,p eb (n − 1|p) = z −p−1 xn + Xn,p bn,p

−1 T −1 −p−1 an,p = −Rn,p Xn,p xn , bn,p = −Rn,p z xn

150 Advanced Probability and Statistics: Remarks and Problems 158CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING W where T Xn,p Rn,p = Xn,p T T T Xn+1,p = [ξn,p , Xn,p ], ξn,p = [x[n], x[n − 1], ..., x[n + 1 − p]]

Also, Xn,p+1 = [Xn,p , z −p−1 xn ] Then,

Rn,p+1

T Rn+1,p = Rn,p + ξn,p ξn,p � � T Rn,p Xn,p z −p−1 xn = z −p−1 xTn Xn,p � z −p−1 xn �2

Also, Xn,p+1 = [z −1 xn , z −1 Xn,p ] and hence Xn+1,p+1 = [z −1 xn+1 , z −1 Xn+1,p ] where z −1 xn+1 = [x[n], x[n − 1], ..., x[0], 0]T = [xTn , 0]T and T , 0]T z −1 Xn+1,p = [Xn,p

and hence T (z −1 Xn+1,p ))T (z −1 Xn+1,p ) = Xn,p Xn,p = Rn,p

Then, Rn+1,p+1 =



� x n �2 T xn Xn,p

xTn Xn,p Rn,p



⊥ −p−1 Note: We have that eb (n − 1|p) = Pn,p z xn and hence eb (n|p + 1) = and further,

⊥ Pn+1,p+1 z −p−2 xn+1

Xn+1,p+1 = [z −1 xn+1 , ..., z −p−1 xn+1 ] = [[xn , z −1 xn , ...z −p xn ]T , 0]T = [[xn , Xn,p ]T , 0]T

and since z −p−2 xn+1 = [z −p−1 xnT , 0]T , we easily see that eb (n|p + 1) = Pn⊥+1,p+1 z −p−2 xn+1 = [(PR⊥([xn ,Xn,p ]) z −p−1 xn )T , 0]T and PR([xn ,Xn,p ]) = PR([ef (n|p),Xn,p ]) from which it follows that ⊥ −p−1 z xn − < z −p−1 xn , ef (n|p) > ef (n|p)/Ef (n|p))T , 0]T eb (n|p + 1) = [(Pn,p

= [(eb (n − 1|p) − Kb (n|p)ef (n|p))T , 0]T

Advanced Probability and Statistics: Remarks and Problems We write An,p (z) = 1 +

p �

151 159

an,p (k)z −k ,

k=1

Bn,p (z) = z −p +

p �

bn,p (k)z −k

k=1

Then, we can write

ef (n|p) = An,p (z)xn , eb (n − 1|p) = z −1 Bn,p (z)xn = Bn,p (z)z −1 xn Thus,

An,p+1 (z) = An,p (z) − Kf (n|p)z −1 Bn,p (z),

Bn+1,p+1 (z)xn = z −1 Bn,p (z) − Kb (n|p)An,p (z) Remark: eb (n|p + 1) = z −1 Bn+1,p+1 (z)xn+1 = Bn+1,p+1 (z)[xnT , 0]T = [Bn+1,p+1 (z)xTn , 0]T −1 T xn+1 = an+1,p+1 = −Rn+1,p+1 Xn+1,p+1 � �− 1 � xn �2 xTn Xn,p T (ξn,p+1 x[n + 1] + Xn,p+1 − xn ) T xn Rn,p Xn,p

where T ξn,p = [x[n], x[n − 1], ..., x[n + 1 − p]]

Note that T T , Xn,p+1 ]T Xn+1,p+1 = [ξn,p+1

Also, T T T T T Xn+1,p+1 xn+1 = [ξn,p+1 , Xn,p +1 ][x[n + 1], xn ] = x[n + 1]ξn,p+1 + Xn,p+1 xn

Remark: We now derive a useful formula for the inverse of a block structured symmetric matrix: Let � � a b X= bT R where RT = R. Writing X

−1

=



q p

pT Q



we find that qa + pT b = 1, qb + pT R = 0, pbT + QR = I

152160CHAPTER 3. SOMEAdvanced Probability and Remarks andPROCESSING Problems STUDY PROJECTS ONStatistics: APPLIED SIGNAL WI and hence,

q = 1/(a − bT R−1 b))

and

p = −qR−1 b = −(R−1 b)/(a − bT R−1 b)

Q = (I − pbT )R−1 = R−1 + R−1 bbT R−1 /(a − bT R−1 b) Taking T a =� xn �2 , b = Xn,p xn , R = Rn,p

gives us q = (� xn �2 +xTn Xn,p an,p )−1 = 1/xTn (xn + Xn,p an,p ) = 1/Ef (n|p) and p = an,p /Ef (n|p) Q=

−1 Rn,p

−1 −1 /Ef (n|p) + Rn,p Xn,p xn xnT Xn,p Rn,p

T −1 + an,p an,p /Ef (n|p) = Rn,p

Remark: ef (n|p) = xn +Xn,p an,p , Ef (n|p) =� ef (n|p) �2 = xTn (xn +Xn,p an,p ) = xnT ef (n|p) Then, an+1,p+1 = [qx[n + 1]xTn T Xn+1,p = Rn+1,p = Xn+1,p T + Rn,p ξn,p ξn,p

Thus,

−1 T an+1,p = −Rn+1,p Xn+1,p xn+1 = T −1 = −(Rn,p − µn,p µTn,p /(1 + ηn,p ))(ξn,p x[n + 1] + Xn,p xn ) T an,p ) = an,p − kn,p (x[n + 1] + ξn,p

where

−1 T µn,p = Rn,p ξn,p , ξn,p = [x[n], x[n − 1], ..., x[n + 1 − p]], T −1 T Rn,p ξn,p = ξn,p µn,p , ηn,p = ξn,p

kn,p = µn,p /(1 + ηn,p ) −1 µn+1,p+1 = Rn+1,p+1 ξn+1,p+1 = �� � � aTn,p /Ef (n|p) 1/Ef (n|p) x[n + 1] −1 ξn,p + an,p aTn,p /Ef (n|p) an,p /Ef (n|p) Rn,p � � x[n + 1]/Ef (n|p) + aTn,p ξn,p /Ef (n|p) = an,p x[n + 1]/Ef (n|p) + µn,p

Advanced Probability and Statistics: Remarks and Problems

153 161

RLS lattice algorithm continued: eb (n|p + 1)T , 0]T ef (n|p + 1) = ef (n|p) − Kf (n|p)eb (n − 1|p), eb (n|p + 1) = [˜ where eb (n|p + 1) = eb (n − 1|p) − Kb (n|p)ef (n|p) Then, ef (n|p) = xn + Xn,p an,p , eb (n − 1|p) = z −p−1 xn + Xn,p bn−1,p −1 T −1 T an,p = −Rn,p Xn,p xn , bn−1,p = −Rn,p Xn,p z −p−1 xn , T Rn,p = Xn,p Xn,p T Rn+1,p = ξn,p ξn,p + Rn,p −1 −1 Rn+1,p = Rn,p − µn,p µTn,p /(1 + ηn,p )

Note that Xn+1,p =



T ξn,p Xn,p



−1 µn,p = Rn,p ξn,p −1 T xn+1 = an+1,p = −Rn+1,p Xn+1,p T −1 T xn ) − µn,p µn,p /(1 + ηn,p ))(xn,p x[n + 1] + Xn,p −(Rn,p T = an,p − (µn,p /(1 + ηn,p ))(x[n + 1] + ξn,p an,p )

= an,p − kn,p ef (n + 1|n, p)

−1 ξn+1,p+1 µn+1,p+1 = Rn+1,p+1 � � � xn �2 xnT Xn,p Rn+1,p+1 = T xn Rn,p Xn,p � � aTn,p /Ef (n|p) 1/Ef (n|p) −1 Rn+1,p+1 = T −1 an,p /Ef (n|p) (Rn,p + an,p an,p )/Ef (n|p)

RLS lattice continued: an+1,p = an,p − kn,p ef (n + 1|n, p) where T an,p ef (n + 1|n, p) = x[n + 1] + ξn,p

Note that T an,p ef (n|n, p) = x[n] + ξn−1,p

154 Advanced Probability and Statistics: Remarks and Problems 162CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING WIT Is the first component of ef (n|p). ef (n + 1|n, p) is the first component of ef (n + 1|p) but computed using the filter coefficients an,p which are estimate on data collected only upto the previous time n. We have seen that

−1 T −1 kn,p = µn,p /(1 + ηn,p ), µn,p = Rn,p ξn,p /(1 + ξn,p Rn,p ξn,p )

We have also derived a time­order update formula relating µn+1,p+1 with µn,p . This formula is given by µn+1,p+1 = [ef (n+1|n, p)/Ef (n|p), −1 T (µn,p +Ef (n|p)−1 (an,p aTn,p ξn,p −x[n+1]Rn,p Xn,p xn ))T ]T T T T = [ef (n + 1|n, p)/Ef (n|p), µn,p + Ef (n|p)−1 ef (n + 1|n, p)an,p ]

The whole logic of the RLS lattice algorithm is to keep computing order and time updates for each of the variables that occur in the process so that finally the recursion closes, ie, gets completed. If at any stage, the recursion does not close, we introduce new variables and then compute order and time updates for the newly introduced variables. This process goes on until finally, the algorithm closes upon itself. We find that Ef (n + 1|p) =� ef (n + 1|p) �2 =� xn+1 + Xn+1,p an+1,p �2 = ef (n + 1|p) = xn+1 + Xn+1,p (an,p − kn,p ef (n + 1|n, p))

T kn,p ef (n + 1|n, p), (ef (n|p) − Xn,p kn,p ef (n + 1|n, p))T ]T = [ef (n + 1|n, p) − ξn,p

= [(1 + ηn,p )−1 ef (n + 1|n, p), (ef (n|p) − Xn,p kn,p ef (n + 1|n, p))T ]T

Thus, taking the norm square on both the sides and using the fact that ef (n|p) is orthogonal to R(Xn,p ) gives us Ef (n+1|p) = ef (n+1|n, p)2 (1+ηn,p )−2 +Ef (n|p)+ef (n+1|n, p)2 ηn,p /(1+ηn,p )2 = ef (n + 1|n, p)2 /(1 + ηn,p ) + Ef (n|p) We now repeat this analysis for the backward prediction errors and filter coef­ ficients. First observe that Kf (n + 1|p) =< ef (n + 1|p), eb (n|p) > /Ef (n + 1|p) ⊥ z −p−1 xn+1 eb (n|p) = PX n+1,p

ef (n|p + 1) = xn + Xn,p+1 an,p+1 = xn + [Xn,p , z −p−1 xn ]an,p+1 = ef (n|p) − Kf (n|p)eb (n − 1|p)

= (xn + Xn,p an,p ) − Kf (n|p)(z −p−1 xn + Xn,p bn−1,p )

= xn + [Xn,p , z −p−1 xn ](aTn,p − Kf (n|p)bTn−1,p )T , −Kf (n|p)]T and hence, an,p+1 =



an,p − Kf (n|p)bn−1,p −Kf (n|p)



155 163

Advanced Probability and Statistics: Remarks and Problems Likewise,

=

eb (n|p + 1) = z −p−2 xn+1 + Xn+1,p+1 bn,p+1 = � −p−1 � z xn + [xn , Xn,p ]bn,p+1 0 � � eb (n − 1|p) − Kb (n|p)ef (n|p) = 0



z −p−1 xn + Xn,p bn−1,p − Kb (n|p)(xn + Xn,p an,p ) 0

and hence bn,p+1 =



−Kb (n|p) bn−1,p − Kb (n|p)an,p





These represent respectively the order update formulas for the forward and backward prediction filter coefficients. We have computed in addition the time update relation for the forward prediction filter coefficients. We now do this for the backward prediction filter coefficients. −1 T Xn,p z −p−1 xn bn−1,p = −Rn,p −1 T bn,p = −Rn+1,p Xn+1,p z −p−1 xn+1 = T −1 T = −(Rn,p + µn,p µn,p /(1 + ηn,p ))(ξn,p x[n − p] + Xn,p z −p−1 xn ) T bn−1,p ) = bn−1,p − kn,p (x[n − p] + ξn,p

= bn−1,p − kn,p eb (n|n − 1, p) We now need to compute Eb (n|p) in terms of Eb (n−1|p) and also N (n+1|p) =< ef (n + 1|p), eb (n|p) > in terms of N (n|p). We recall that ef (n + 1|p) = [(1 + ηn,p )−1 ef (n + 1|n, p), (ef (n|p) − Xn,p kn,p ef (n + 1|n, p))T ]T and eb (n|p) = z −p−1 xn+1 + Xn+1,p bn,p = �

T (b n−1,p − kn,p eb (n|n − 1, p)) x[n − p] + ξn,p −p−1

z xn + Xn,p (bn−1,p − kn,p eb (n|n − 1, p)) � � (1 + ηn,p )−1 eb (n|n − 1, p) = eb (n − 1|p) − Xn,p kn,p eb (n|n − 1, p)



It follows then by forming the inner product of these two relations that N (n+1|p) = (1+ηn,p )−2 ef (n+1|n, p)eb (n|n−1, p)+ < ef (n|p) −Xn,p kn,p ef (n+1|n, p), eb (n−1|p)−Xn,p kn,p eb (n|n−1, p) > = (1+ηn,p )

−2

ef (n+1|n, p)eb (n|n−1, p)+N (n|p)+(ηn,p /(1+ηn,p )2 )ef (n+1|n, p)eb (n|n−1, p)

ef (n + 1|n, p)eb (n|n − 1, p)/(1 + ηn,p ) + N (n|p)

156 Probability and Remarks andPROCESSING Problems 164CHAPTER 3. SOME Advanced STUDY PROJECTS ONStatistics: APPLIED SIGNAL WIT Note that we have used the orthogonality relations T T ef (n|p) = Xn,p eb (n − 1|p) = 0 Xn,p

[13] On a gadget constructed by my colleague Professor Dhananjay Gadre In the standard undergraduate course for engineering students called ”Sig­ nals and Systems”, as well as in the associated laboratory courses, the students are taught about how to generate various kinds of periodic signals like the sinewave, the square wave, the ramp wave etc., how to construct the Fourier series of such periodic signals as linear superspositions of higher harmonics of a fundamental frequency and how to design and analyze lowpass, highpass and bandpass filters that would filter such periodic signals so as to retain only a certain small subset of the signal harmonic components. This problem is im­ portant, for example, in a situation wherein a circuit designed on a bread­board which receives its input from a power supply gets the current/voltage across its components corrupted by the fundamental as well as higher harmonics of the basic 220­V AC voltage source. This happens because the power supply runs on the AC source and hence some part of the latter signals enter into the circuitry of the power supply. As a result, when we measure the voltage across any of the circuit’s elements using a CRO, we observe stray components coming from the AC source and its higher harmonics on the CRO screen. These harmonics are not desirable and hence may be regarded as constituting a component of the noise in addition to the thermal noise being produced in the resistors of the circuit. One way to get rid of these harmonics is to place a filter between the circuit and the CRO to eliminate these stray components. The trouble with such a method is that the filter will consume some of the current and will there­ fore act as an undesirable load. It would thus be better to have a CRO which automatically does the filtering and hence gives a faithful representation of the circuit’s behaviour in the absence of the AC source fundamental and higher harmonic disturbance. Another example where such a filtering is required dur­ ing the signal recording process is a telephone line in which speakers A and B communicate across a line and due to defects in the line, A hears not only B’s speech transmitted over the line but also his own echo. An echo canceller at A’s end will use a filter H(z) which may even made adaptive and which takes A’s original speech as input and predicts the signal received by A over the line (namely B’s speech plus A’s own echo). Since A’s speech is correlated with only A’s own echo and not with B’s speech, the filter H(z) will predict only A’s echo component in the total signal received by A. Thus, when the filter’s output is subtracted from the signal received by A, the result is that A gets the signal spoken by B with a major part of his own echo cancelled out. If we have a CRO that passes through only B’s frequency components and rejects A’s frequency components (this is possible only when A’s and B’s speech sig­ nals occupy non­overlapping frequency bands. If not, we can shift the band of

Advanced Probability and Statistics: Remarks and Problems

157 165

frequencies occupied by B’s speech via appropriate modulation at B’s end and use a line which supports both the band of frequencies spoken by A and B’s band shifted in frequency via modulation). Then, by just recording the signal received by A at his end using such a CRO, A can get to know B’s speech with his own echo removed. Another application of such a frequency sensitive CRO is in generating sine waves from a square wave. Square waves are easily generated using a switch in a series circuit either turned on and off after fixed durations of time manually or by a rotating motor. A square wave contains all the harmonics of a fundamental and hence a CRO which passes the first N harmonics with variable N can be used to demonstrate Gibb’s phenomenon on the behaviour of the sequence of partial sums of a Fourier series near a discontinuity of the signal. Specifically, we can demonstrate that at the discontinuity point of a square wave, the Fourier series converges to around 1.8 times its amplitude. A CRO with variable bandwidth can in fact be used to demonstrate all kinds of behaviour of the partial sums of Fourier series including the fact that at a discontinuity point of the signal, the partial sums converge to the average of the signal amplitude at the immediate left and at the immediate right of the signal. A CRO at the quantum scale level can also be used to determine the energy levels of an atomic system. Specifically, if |n >, n = 0, 1, 2, ... are the energy eigenstates of an atom with corresponding energy levels En , n = 0, 1, 2, ... re­ spectively, then it is known from Schrodinger’s � equation that if the initial state of the atom is the superposition |ψ(0) >= n c(n)|n >, then after time t, its � state will be |ψ(t) >= n c(n)exp(−iω(n)t)|n > where ω(n) = 2πEn /h, h be­ ing Planck’s constant. Thus, if X is an observable of the atomic system, after time t, its average value in this state will be given by < X > (t) =< ψ(t)|X|ψ(t) >=



n,m

c¯(n)c(m) < n|X|m > exp(i(ω(n) − ω(m))t)

when this signal is input into the quantum/nano CRO of the kind described above, it retains only a small finite subset of the frequencies ω(n) − ω(m) and by adjusting the bandwidth of the CRO, we can thus determine from spectral analysis of the signal appearing on the CRO, what exactly are the energy level’s of the atom or more precisely, what frequencies of radiation can be absorbed or emitted by the atom during transitions caused by perturbing the atom with an external radiation field. Another situation in quantum mechanics that finds application here is related to the notion of repeated measurement and state collapse. Suppose a quantum system has initial state ρ(0). It evolves for time t1 under the Hamiltonian H to the state ρ(t1 −) = U (t1 )ρ(0)U (t1 )∗ where U (t) = exp(−itH). Then a measurement is made using the POVM M = {Ma : a = 1, 2, ..., N }. If a(1) is the noted outcome, after taking this measurement, the state collapses to ρ(t1 +) =



Ma(1) ρ(t1 −)



Ma(1) )/T r(ρ(t1 −)Ma(1) )

158 Probability and Remarks andPROCESSING Problems 166CHAPTER 3. SOMEAdvanced STUDY PROJECTS ONStatistics: APPLIED SIGNAL WIT Again the system evolves for a duration t2 − t1 to the state ρ(t2 −) = U (t2 − t1 )ρ(t1 +)U (t2 − t1 )∗

and again the measurement M is made and if the noted outcome is a(2), the state collapses to � � ρ(t2 +) = Ma(2) ρ(t2 −) Ma(2) /T r(ρ(t2 −)Ma(2) )

It is then clear that after N such operations, namely, free evolution under the Hamiltonian H followed by applying the measurement M and noting the out­ comes at each state, the final state of the atom ρ(tN +) is expressible as a ratio of two terms. The denominator is a real number and is of the form � C(n1 , ..., nN ; m1 , ..., mN )exp(−i(ω(n1 )−ω(m1 ))t1 +(ω(n2 )

n1 ,...,nN

−ω(m2 ))(t2 −t1 )+...(ω(nN )−ω(mN ))(tN −tN −1 )))

where C(n1 , ..., nN ) are complex numbers while the numerator is of the same form but with the C(n1 , ..., nN ) being operators. In other words, the numerator and denominator are multidimensional sinusoids in the variables (t1 , ..., tN ) and if we have a generalized spectrum analyzer for multidimensional signals, then we could determine by measuring after time N , the average of an observable in the system state, the frequencies ω(n), ie, the energy levels of the atomic system Hamiltonian H as well as the initial state ρ(0) and the structure of the measurement operators Ma . In fact, more can be said about this problem. The joint probability of getting a(1), ..., a(N ) during the above measurement process at times t1 , ..., tN is given simply by P (a(1), ..., a(N )|t1 , ..., tN ) = T r(E(a(N ))U (tN −tN −1 )...E(a(2))U (t2 −t1 )E(a(1))U (t1 )ρ(0)U (t1 )∗ E(a(1)) where

U (t2 −t1 )∗ E(a(2)...U (tN −tN −1 )E(a(n))

E(a) =

J Ma

This joint probability is clearly a superposition of multidimensional sinusoids with frequeny­tuples (ω(n1 ), ω(n1 ) − ω(n2 ), ..., ω(nN ) − ω(nN −1 )) with complex amplitudes and a harmonic analyzer of multidimensional sinusoids will be able to determine these frequencies and hence estimate the atomic energy levels. [14] Summary of the research carried out at the NSUT on design of DSP filters using transmission line elements, design of water antennas, design of an­ tennas based on microstrip cavities of arbitrary cross sectional shape, design of antennas using microstrip cavities filled with material having inhomogeneous permittivity and permeability and design of fractional order filters using trans­ mission line elements. Acknowledgements: Informal discussions with Dr.Mridul Gupta.

Advanced Probability and Statistics: Remarks and Problems

159 167

Introduction: A lossless transmission line has the natural property of being able to generate transfer functions with arbitrary fractional delay. The reason for this is that if the input forward and backward voltage amplitudes to the line are V1+ and V1− and the corresponding output forward and backward √ ampli­ tudes are V2+ , V2− , then if d is the line length, and β = ω/u, u = LC, then then elementary transmission line analysis shows that V2− = V1+ exp(−jβd), V2+ = V1− exp(jβd) or equivalently, if R0 is the characteristic line impedance, then the voltage and current at the input and output terminals are related by using V1 = (V1+ + V1− ), I1 = (V1+ − V1− )/R0 , V2 = (V2+ + V2− ), I2 = (V2+ − V2− )/R0 Working in the wave domain, ie, in terms of forward and backward voltage wave components, we have that � � � �� � 0 exp(jβd) V1+ V2+ = V2− exp(−jβd) 0 V1− since β = ω/u, the factor exp(−jβd) corresponds to a delay by d/u seconds and hence if time is discretized into steps of Δ seconds, the factor exp(−jβd) produces a delay of d/uΔ samples which need not be an integer if d chosen appropriately. Thus it becomes evident that by connecting in tandem several such Tx line units in conjunction with stub loads, lines having any transfer function with numerator and denominators being superpositions of arbitary fractional powers of the unit delay Z −1 can be synthesized. Specifically, consider connecting in parallel to this line of length d, an impedance Z1 at the input end. Then, the voltage at the input end remains unchanged for fixed voltage and current at the output end while the input current gets modified to I1� = I1 + V1 /Z1 . We then find that for fixed V2+ , V2− , V1+ , V1− get modified respectively to � = (V1 + I1� R0 )/2 = (V1 + (I1 + V1 /Z1 )R0 )/2 V1+ and V1− = (V1 − I1� R0 )/2 = (V1 − (I1 + V1 /Z1 )R0 )/2 or equivalently � = (V1+ + V1− )(1 + R0 /Z1 )/2 + (V1+ − V1− )/2 V1+

= (1 + R0 /2Z1 )V1+ + (R0 /2Z1 )V1− and V1�− = (V1+ + V1− )(1 − R0 /Z1 )/2 − (V1+ − V1− )/2 = −R0 V1+ /2Z1 + V1− (1 − R0 /2Z1 )

160 Advanced Probability and Statistics: Remarks and Problems 168CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING WIT This means that connecting a load in parallel at the input end amounts to mutiplying the S­matrix given above by the matrix �

1 + R0 /2Z1 −R0 /2Z1

R0 /2Z1 1 − R0 /2Z1

�−1

to its right. Since Z1 can be any function of jω, this suggests that arbitrary transfer functions with fractional delay elements can be realized. When we have a sequence of 2 × 2 matrix transfer functions say Tk (z), k = 1, 2, ..., N connected in tandem, the overall transfer matrix is given by SN (z) = T1 (z)T2 (z)...TN (z) or equivalently, in recursive form, SN +1 (z)(1, 1) = SN (z)(1, 1)TN +1 (z)(1, 1) + SN (z)(1, 2)TN +1 (z)(2, 1), SN +1 (z)(1, 2) = SN (z)(1, 1)TN +1 (z)(1, 2) + SN (z)(1, 2)TN +1 (z)(2, 2), SN +1 (z)(2, 1) = SN (z)(2, 1)TN +1 (z)(1, 1) + SN (z)(2, 2)TN +1 (z)(2, 1), SN +1 (z)(2, 2) = SN (z)(2, 1)TN +1 (z)(1, 2) + SN (z)(2, 2)TN +1 (z)(2, 2), Sometimes, it may be easy to solve these difference equations. Usually, one defines the scattering parameter s21 (z) corresponding to a given transfer matrix as V2− /V1+ |V1− =0 . This this tells us how much amplitude is transferred from the source to the load without reflection. The scattering matrix S is defined via � � � � V1− V1+ =S V2− V2+ The elements of S are related to those of the transfer matrix T as follows: V1− = S(1, 1)V1+ + S(1, 2)V2+ , V2− = S(2, 1)V1+ + S(2, 2)V2+ V2+ = T (1, 1)V1+ + T (1, 2)V1− , V2− = T (2, 1)V1+ + T (2, 2)V1−

[15] An example in quantum mechanics involving two­port parameters: Let Hk , k = 1, 2 be two Hilbert spaces and let ρk , k = 1, 2 be two density operators defined in the Hilbert spaces Hk , k = 1, 2 respectively. Let H1 , H2 be two Hamiltonian operators defined in the Hilbert spaces H1 , H2 respectively. Let V1 , V2 be two self­adjoint operators defined in H1 , H2 respectively. Let V12 be a self­adjoint operator defined in H1 ⊗ H2 . Let H = H1 ⊗ H2 be the Hilbert space of a quantum system having time varying Hamiltonian H(t) = H1 ⊗ I2 + I1 ⊗ H2 + V12 + f1 (t)V1 + f2 (t)V2 = H0 + f1 (t)V1 + f2 (t)V2

161 169

Advanced Probability and Statistics: Remarks and Problems where H0 = H1 ⊗ I2 + I1 ⊗ H2 + V12

is the unperturbed Hamiltonian. Note that the perturbing signals f1 (t), f2 (t) are respectively applied to the first and the second component Hilbert spaces respectively. Then, let the initial state of this quantum system be ρ(0) = ρ1 ⊗ ρ2 After time t, the state of the system is ρ(t) = U (t)ρ(0)U (t)∗ where U (t) = T {exp(−i





H(t)dt)}

−∞

and if we take two observables X1 , X2 respectively in the component Hilbert spaces H1 , H2 , then the averages of these observables at time t are respectively given by < X1 > (t) = T r2 (ρ(t)(X1 ⊗ I2 )) and < X2 > (t) = T r1 (ρ(t)(I1 ⊗ X2 )) Problem: Compute these averages upto linear orders in f1 (t), f2 (t) and explain how this defines a two port system. Specifically upto linear orders in f1 , f2 , show that we can write � � < X1 > (t) = a1 (t) + h11 (t, τ )f1 (τ )dτ + h12 (t, τ )f2 (τ )dτ, < X2 > (t) = a2 (t) +



h21 (t, τ )f1 (τ )dτ +



h22 (t, τ )f2 (τ )dτ

where the signals a1 (t), a2 (t) are independent of f1 (.) and f2 (.). [16] Detecting whether supersymmetry is broken or not using a deep neural network For an Abelian gauge­superfield, a gauge invariant action has the form Lg = c1 fµν f µν + c2 λT γ5 �γ µ ∂µ /λ + c3 D2 This gauge action (which is always Lorentz and gauge invariant) is supersymme­ try invariant only for vectors [c1 , c2 , c3 ] belonging to a one dimensional subspace of R3 . The problem of determining whether supersymetry is broken on not thus amounts to estimating these constants from measurements. Suppose we allow the gauge particles and their superpartners to interact with a Dirac electron. Then we would have to add the matter action with this gauge action and hence determine the effect of this gauge perturbation upon the matter action and write down the dynamics of the Dirac electron taking into account these interactions.

162 Advanced Probability and Statistics: Remarks and Problems 170CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING WIT From the transition probabilities of the Dirac electron under these gauge pertur­ bations, we can estimate the constants c1 , c2 , c3 and hence determine whether the gauge action respects supersymmetry or not. In the non­Abelian gauge situation, the complete supersymmetric Lagrangian density for the matter and gauge fields is given by (Reference:S.Weinberg, ”The quantum theory of fields vol.III) L= −(Dµ φ)∗n (Dµ φ)n

− (1/2)ψ¯n γ µ (Dµ ψ)n + Fn∗ Fn

T −Re((f �� (φ))nm ψnL �ψnL )+2Re((f � (φ))n Fn )+c1 Im(ψ¯nL λA (tA )nm φm −c1 Im(ψ¯nR λA (tA )nm φ∗m

¯ A γ µ (Dµ λ)A +c2 �(µνρσ)f µν f ρσ −φn∗ (tA )nm φm DA −ξA DA +(1/2)DA DA −(1/4)fAµν fAµν −(1/2)λ A A

The field equations are obtained by setting the variational derivatives of L w.r.t. ∗ ∗ , ψnR , φn , φn∗ , Fn , Fn∗ , VAµ , λA , DA to zero. In partic­ all the fields ψnL , ψnR , ψnL ular, we obtain a Dirac equation for ψn with the operator ∂µ replaced by the gauge covariant derivative Dµ with the gauge field being VAµ that define the gauge field tensor FAµν and with masses dependent upon the scalar field and sources determined by the coupling between the gaugino field λA and the scalar field φm . The dynamical equations for the gaugino field λA are again of the VAµ ­gauged Dirac form but with zero masses and with sources determined by coupling terms between the scalar field and the Dirac field. Note that the gauge field VAµ and the scalar matter field φn are Bosonic while the Dirac field ψn and the gaugino field λA are Fermionic. The super­Dirac and gaugino equations have the form (iγ µ Dµ − m(φ))ψn = χ1n (φ, λA ), iγ µ (Dµ λ)A = χ2 (φ, ψ)

The super scalar field equation is now of the VAµ ­gauged Klein­Gordon form but with a source term determined by a coupling between the Dirac field and the gaugino field. Specifically, it is of the form Dµ Dµ φn = χ3 (ψ, λA ) Deriving these field equations from the above supersymmetric Lagrangian in­ volves first eliminating the auxiliary fields Fn , DA by noting that their varia­ tional equations yield ordinary linear algebraic equations for them in terms of the scalar field. Finally, the field equations for VAµ or equivalently, form fAµν = VAν,µ − VAµ,ν + C(ABC)VBµ VCν are given by the usual Yang­mills equations but with source current terms now receiving their contribution from all the three fields:the scalar field, the Dirac field and the gaugino field. These three current terms add up without interfer­ ence and contribute to the dynamics of the Yang­mills gauge field. [17] Review of a paper given by Mridul

Advanced Probability and Statistics: Remarks and Problems

163 171

This paper studies a certain kind of periodically forced nonlinear oscillator having a delay term in its dynamics as well as a noisy forcing term bilinearly coupled to its position. The delay terms as well as a white Gaussian noise term are parametrized by small parameters. The periodic forcing term is also bi­ linearly coupled to the oscillator’s position process. The complete dynamics is described by a system of two first order differential equations in phase space. The analysis starts first with linearizing the dynamical system and obtaining the characteristic equation for the roots of the linearized system. This characteris­ tic equation is a quadratic plus an exponential function, the exponential term arising due to the delay term. In short, the characteristic equation is a tran­ scendental equation, not soluble in closed form. The authors apply the implicit function theorem to this characteristic equation obtaining thereby a formula for the sensitivity of the roots (the roots are also called the Lyapunov exponents as they give us the rate at which a small perturbation in the initial conditions diverges exponentially) w.r.t the perturbation parameter. After this brief analy­ sis, the authors assume that the Lyapunov exponent is purely imaginary, ie, the linearized differential equation exhibits purely oscillatory behaviour, and derive closed form exact formulas for the oscillation frequency in terms of the pertur­ bation parameter β. After this, they assume that the process can be expressed as a slowly time varying amplitude modulating a sinusoid (just as in amplitude modulation theory) (they have not stated this but it is implicit in equn.(18)) and assume that the delay affects only the sinusoidal carrier part and not the modulation amplitude. I think this portion of the author’s analysis requires a more detailed explanation. With this approximation, the authors are able to get rid of the delay term and thereby obtain a standard Ito­stochastic differen­ tial equation without delay for the phase, ie, the position and velocity variables (equn.(22) and (26)). In (31), the authors define a stochastic Lyapunov expo­ nent for the Ito sde solution (3). I think that this portion must also be explained more clearly in terms of the ergodic theorem for Brownian motion. Specifically, if A(t) is the stochastic state transition matrix for a linear sde, then for the purpose of defining stochastic Lyapunov exponents, we must ensure conditions under which the limit of t−1 log(λ(t)) exists as t → ∞ where λ(t) is an eigenvalue of A(t). The authors then set up the usual Fokker­Planck equation for the pdf of the solution to Ito’s sde obtained by removing the delay term using the am­ plitude modulation technique mentioned in my report above. They derive from this, the stationary/equilibrium solution (36) for the Fokker­Planck equation. I think at this point that some material on bifurcation theory applied to the transcendental characteristic equation can be added to explain how when the perturbation parameter that governs the strength of the delay term is increased gradually from zero, the Lyapunov exponents will change. This analysis can be carried out using standard perturbation theory for the roots of nonlinear equa­ tions. The trouble, however, in using any sort of perturbation theory is to prove the convergence of the perturbation series. In (40), the authors write down a ”Melnikov function”. 
In order to make the text at this point self­contained, the authors may explain here, how the Melnikov theory can be used to determine the driving frequency in a system of coupled sde’s. Specifically, if there are periodic

164 AdvancedPROJECTS Probability and Remarks andPROCESSING Problems 172CHAPTER 3. SOME STUDY ONStatistics: APPLIED SIGNAL WIT forcing terms in an sde, then the frequency of the forcing term can be extracted approximately in terms of an integral of some function of the stochastic process. I believe that this is the essence of the Melnikov theory. It would be nice to compute the mean and variance of the frequency estimate using this integral. Further, one can in principle derive maximum likelihood estimators of parame­ ters of sde’s using stochastic integral representations for the likelihood function. How does Melnikov’s method compare with the optimum MLE method ? Some light can be shed on this problem. Since although the noise is Gaussian, the system is nonlinear, it is follows that the system output will be non­Gaussian. Hence, if we move into higher orders of perturbation theory and not just the linearized theory, apart from the power spectrum, ie, the second order moments, higher order moments and spectra would become important. The authors may shed some light on this without getting into too much computation. In conclusion, I feel that the paper contains many new and interesting results and I recommend its publication after some light on the issues mentioned above in my report are moderately clarified. The main novelty of the paper appears to be to provide a first step towards describing chaos in a dynamical system in the presence of stochastic noise. [18] Application of the RLS lattice algorithm to the problem of es­ timating the metric from the world lines of particles in a gravtiational field Let gµν (x|θ) denote the metric of space time which we intend to estimate. Here, θ is the unknown parameter vector which is to be estimated. We assume that p � θ[k]ψµνk (x) gµν (x|θ) = hµν (x) + k=1

where ψµνk (x) are known test functions, hµν (x) is the unperturbed metric and the unknown parameters θ are small. The geodesic equations are r (x)(dxk /dτ )(dt/dτ )+Γr00 (x)(dt/dτ ) = 0 d2 xr /dτ 2 +Γrkm (x)dxk /dτ )(dxm /dτ )+2Γk0

and d2 t/dτ 2 + Γ0km (dxk /dτ )(dxm /d tau) + 2Γ0k0 (dxk /dτ )(dt/dτ ) + Γ000 (dt/dτ )2 = 0 Now, we write down the geodesic equations using perturbation theory upto first order in the parameters θ. It comes out to be of the form 2 r

2

r

k

k

d x /dt = F (dx /dt, x ) +

p �

Grk (dxk /dt, xk )θ[k]

k=1

or equivalently in vector­matrix notation as d2 r/dt2 = f (dr/dt, r, t) + G(dr/dt, r, t)θ

Advanced Probability and Statistics: Remarks and Problems

165 173

We can further discretize this equation and express it as nonlinear difference equation: r[n + 2] = a(r[n + 1], r[n], n) + B(r[n + 1], r[n], n)θ This is a second order vector nonlinear time series model and the parameter vector θ can be estimated via the RLS lattice algorithm based on measurements r[n], n = 0, 1, ...,. Note that the order of this time series model is p which is also the number of test function matrices ψµνk (x), k = 1, 2, ..., p that we use. We define the following vector and matrix valued functions of time ξ[N ] = ((r[n + 2] − a(r[n + 1], r[n])))N n=1 , X[N, p] = [b1 [N ], b2 [N ], ..., bp [N ]] where bk [N ] = (Bk (r[n + 1], r[n], n)))N n=1 Then the above LIP (linear in parameters) model can be expressed in the form ξ[N ] = X[N, p]θ and we can derive a recursive least squares with time and order updates of the parameter estimates. [19] A remark about the supersymmetric proof of the Atiyah­Singer­Patodi theorem. In general relativity, the covariant derivative of a vector field is calculated using the Christoffel symbols as connection components while on the other hand, in non­Abelian quantum field theory, the covariant derivative is calculated in a Lie algebraic manner, ie, as the commutator between the connection covariant derivative and the vector field expressed as as an element of the Lie algebra. This applies also to quantum gravity wherein the non­Abelian connection covariant derivative is �µ = ∂µ + Γµ where Γµ , the spinor connection of the gravitational field is expressed in terms of the Clifford algebra generated by the Dirac gamma matrices and a vector field Bµ is also expressed as a spinor field Bµ γ µ = Bµ eµa γ a . The commutator of these two objects then defines the covariant derivative of the vector field. In a normal coordinate system, we wish to prove that near the origin, the latter definition of the covariant derivative as a commutator is the natural one to choose from when one computes the Laplacian as the square of the Dirac operator and this leads immediately to an expression for this Laplacian in terms of the Riemann curvature tensor from which the Atiyah­Singer index theorem can be obtained. The Dirac operator is

D = γ µ (x)�µ

Where γ µ (x) = γ(eµ (x)) = γ a eµa (x) Where {γ a , γ b } = 2η ab

166174CHAPTER 3. SOMEAdvanced and Statistics: Remarks and PROCESSING Problems STUDY Probability PROJECTS ON APPLIED SIGNAL WI Thus, {γ µ (x), γ ν (x)} = 2g µν (x) Since g µν (x) = η ab eaµ (x)ebν (x) The spinor­gravitational connection is Γµ (x) = [γ a , γ b ]eνa ebν:µ The Dirac­spinor gravitational covariant derivative is �µ = ∂µ + Γµ We get for a covariant vector Bα , a [�µ , Bα γ α ] = [�µ , Bα eα aγ ]

= [�µ , Ba γ a ] = [∂µ + Γµ , Ba γ a ] = Ba,µ γ a + [Γµ , γ a ]Ba [Γµ , γ a ] = [γ c γ d ωµcd , γ a ] Where ωµcd = eνc edν:µ = −eνc:µ edν = −ecν:µ eνd = −ωµdc [γ c γ d , γ a ] = 2η ad γ c − 2η ac γ d Thus, [Γµ , γ a ] = ωµcd η ad γ c F romwhichwededucethat [�µ , B] = [�µ , Ba γ a ] = Ba,µ γ a + ωµcd η ad γ c Ba = γ a (Ba,µ + ωµab η bc Bc ) Now, 2ωµab = 2eνa ebν:µ = eνa ebν:µ − ebν eaν:µ =

ν α eνa (ebν,µ − Γα νµ ebα ) − eb (eaν,µ − Γνµ eaα )

Upto O(x), it is clear that eνa ebν,µ − ebν eaν,µ = 0 Since eaν,µ is O(x) and further, eνa is δaν upto O(x). Thus, upto O(x), we have that ν ν 2ωµab = Γα νµ (eaα eb − ebα ea ) = Γabµ − Γbaµ

Advanced Probability and Statistics: Remarks and Problems

167 175

Since eaα = ηaα + O(x). Note that Γabµ is O(x) and hence Γabµ (x) − Γbaµ (x) = (Γabµ,m (0) − Γbaµ,m (0))xm + O(x2 ) Now, Γabµ,m − Γbaµ,m = (1/2)(gab,µm + gaµ,bm − gbµ,am − gba,µm − gbµ,am + gµa,bm ) = (1/2)(gaµ,bm − gbµ,am + gµa,bm − gbµ,am ) Thus, 2ωµab (x) = (1/2)(gaµ,bm − gbµ,am + gµa,bm − gbµ,am )(0)xm = (gaµ,bm (0) − gbµ,am (0))xm This is antisymmetric in (a, b). We wish to show that (gaµ,bm −gbµ,am +gµa,bm − gbµ,am )(0) for normal coordinates, is also antisymmetric in (µ, m). In fact, for normal coordinates, we have gµν xν = xµ and hence on differentiating, gµν,ρ xν + gµρ = δµρ Another differentiation gives gµα,ρ + gµν,ρα xν + gµρ,α = 0 Another differentiation but now at the point x = 0 gives gµα,ρβ (0) + gµβ,ρα (0) + gµρ,αβ (0) = 0 Thus gaµ,bm (0) = −(gµm,ab (0) + gµb,am (0) and to prove that gaµ,bm (0) − gbµ,am (0) is also antisymmetric in (µ, m), we must show that gaµ,bm (0) − gbµ,am (0) = −gam,bµ + gbm,aµ (0) or equivalently that gaµ,bm + gam,bµ − gbµ,am − gbm,aµ = 0 In view of the above cyclic identity, this amounts to showing that −(gab,mµ + gam,µb ) + gam,bµ − gbµ,am − gbm,aµ = 0

168 Probability and Remarks andPROCESSING Problems 176CHAPTER 3. SOME Advanced STUDY PROJECTS ONStatistics: APPLIED SIGNAL WIT or equivalently, that −(gab,mµ + gbµ,am + gbm,aµ ) = 0 which is true again by virtue of the cyclic identity. Thus, in normal coordinates, we have that ωµab (x) = Rµνab (0)xν + O(x2 ) This is the fundamental relation that we are looking for. [20] Problems in optimal control of quantum fields Historical development of quantum filtering and control: Quantum filering and control algorithms in the continuous time case were first introduced by V.P.Belavkin and perfected by John Gough and Lec Bouten. The notion of non­demolition measurements which do not interfere with future state values and which form an Abelian algebra of operators so that one can define in any state, the conditional expection of the state at time t given measurements upto time t were the creation of Belavkin. He based all his computations on the Hudson­Parthasarathy model for the noisy Schrodinger equation and today this approach is the standard recognized way to describe filtering in the quantum context. The main obstacle in quantum filtering was that non­commutativity of observables prevented one from constructing conditional expectations and that was completely and satisfactorily resolved by Belavkin by the introduction of a special class of measurements, namely non­demolition measurements which actually can be shown to be the correct quatnum analogue of the measurement process in the classical filtering theory of Kallianpur and Striebel, namely the measurement diffrential at time t equals the sum of a function of the current sys­ tem state plus white measurement noise. Lec Bouten in his PhD thesis showed how one can introduce a control term into the Belavkin quantum filter so as to minimize the effect of Lindblad noise in some channels. By Lindblad noise, we mean the noise terms appearing in the unitary evolution of system plus bath in the Hudson­Parthasarathy noisy Schrodinger equation. Later on, other re­ searchers including Belavkin, John Gough and Lec Bouten introduced quantum control in the form of changing the Hamiltonian in the Hudson­Parthasarathy­ Schrodinger evolution equation in accordance with a desired output and the actual non­demolition measurement at time t. For example, in the case of a quantum robot, we can alter the Hamiltonian by a torque times angular dis­ placement term where the torque is proportional to the difference between a desired angular momentum and the actual noisy measured angular momentum. Always feedback controllers must be in the form of forces/torques proportional to the difference between a desired output and the actual output or some differ­ ential or integral of such a difference or more generally to a linear combination of such terms (like the p.i.d controller in classical control theory). It should be noted that in the quantum context, the parameter in the Hamiltonian being controlled becomes a function of the non­demolition measurement operator. [1] Let ρs (t), the state of a system evolve according to the GKSL equation ρ�s (t) = −i[H(u(t)), ρs (t)] − (1/2)θ(ρs (t), u(t))

Advanced Probability and Statistics: Remarks and Problems

169 177

where u(t) is the coherent state parameter and θ is the Lindblad term with the Lindblad operators Lk dependent upon the coherent state parameter u(t). Specifically, we are assuming that the coherent state |φ(u) > of the bath slowly varies with time and we write down the Hudson­Parthasarathy qsde for the evolution and then by tracing out over this non­vacuum bath coherent state, obtain the dynamics of the system state: dU (t) = (−(iH + LL∗ /2)dt + LdA(t) − L∗ dA(t)∗ )U (t) ρ(t) = U (t)(ρs (0) ⊗ |φ(u) >< φ(u)|)U (t)∗ , ρs (t) = T r2 (ρ(t)) We have that dρ(t) = dU (t)ρ(0)U (t)∗ +U (t)ρ(0)dU (t)∗ +dU (t)ρ(0)dU (t)∗ , ρ(0) = ρs (0)⊗|φ(u) >< φ(u)| Then, T r2 (dU (t)ρ(0)U (t)∗ ) = T r2 (Qdt + LdA(t) − L∗ dA(t)∗ )U (t)ρ(0)U (t)∗ ) ¯(t)L∗ .ρs (t)] = dt.[Q.ρs (t) + u(t)L.ρs (t) − u where Q = −(iH + LL∗ /2) Likewise, T r2 (U (t)ρ(0)dU (t)∗ ) = T r2 (U (t)ρ(0)U (t)∗ (Q∗ dt + L∗ dA(t)∗ − LdA(t))) = [ρs (t)Q∗ + u ¯(t)ρs (t)L∗ − u(t)ρs (t)L]dt and finally, T r2 (dU (t)ρ(0)dU (t)∗ ) = T r2 [L∗ dA(t)∗ U (t)ρ(0)U (t)∗ LdA(t)] = L∗ ρs (t)Ldt We thus get ρ�s (t) = [Q.ρs (t) + u(t)L.ρs (t) − u ¯(t)L∗ .ρs (t)] ¯(t)ρs (t)L∗ − u(t)ρs (t)L] + L∗ ρs (t)L +[ρs (t)Q∗ + u which is precisely the GKSL equation in a non­vacuum coherent state |φ(u) >. We can represent it as ρ�s (t) = T (ρs (t), u(t)) where T (., x) is a linear operator on the Banach space of bounded operators in the system Hilbert space parameterized by the complex parameter x. For state tracking, we require to control u(t) so that ρs (t) tracks a given state ρd (t). This

170 Advanced Probability and Statistics: Remarks and Problems 178CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING WI may be termed as coherent quantum control. We can also incorporate conserva­ tion processes in to the GKSL dynamics: The relevant Hudson­Parthasarathy noisy Schrodinger equation is dU (t) = (Qdt + L1 dA(t) − L2 dA(t)∗ + L3 dΛ(t))U (t) where Q = −(iH + P ). We then find that T r2 (dU (t)ρ(0)U (t)∗ ) = [Qρs (t)dt + u(t)L1 ρs (t) − u ¯(t)L2 ρs (t) + |u(t)2 L3 ρs (t)]dt

T r2 (ρ(0)dU (t)∗ ) =

¯(t)ρs (t)L∗1 − u(t)ρs (t)L∗2 + |u(t)|2 ρs (t)L∗3 ]dt

[ρs (t)Q + u [21] pde based denoising of quantum image fields [1] Let the Lindblad operators Lk be functions of q, p where q, p are canonical position and operator valued n­dimensional vectors. Thus, q is multiplication by the vector x in Rn and p is the gradient operator −i�x in Rn . The density operator in position space representation at time t is defined by the Kernel ρt (x, y), x, y ∈ Rn . Thus, for f ∈ L2 (Rn ), we have that � ρt f (x) = ρt (x, y)f (y)dy and ρt qf (x) =



ρt (x, y)yf (y)dy



qρt f (x) = xρt (x, y)f (y)dy, � � ρt pf (x) = −i ρt (x, y)�y f (y)dy = i (�y ρt (x, y))f (y)dy � pρt f (x) = −i (�x ρt (x, y))f (y)dy

and more generally, a b

c d

q p ρt q p f (x) = (−i)

b+d

d a

(−1) x



(�y )d (ρt (x, y)y c )f (y)dy

which is equivalent to saying that the kernel of the operator q a pb ρt q c pd is given by Kt (x, y) = (−i)b+d (−1)d xa (�y )d (ρt (x, y)y c ) In our notation, xa is an abbreviation for xa1 1 ...xann and pb is an abbreviation for p1b1 ...pbnn or equivalently for (−i)b1 +...+bn

∂ b1 +...+bn ∂xb11 ...∂xbnn

Advanced Probability and Statistics: Remarks and Problems

171 179

More generally, if an operator L = L(q, p) is expressed as a function of q, p with all the q � s coming to the left and all the p� s to the right, then L(q, p)ρt is an opearator having kernel L(x, −i�x )ρt (x, y) while ρt L(q, p) has the kernel M (i�y , y)ρt (x, y) where M (u, v) is obtained from L(u, v) by replacing each term of the form ur v s in the expansion of L(u, v) by v s ur . More generally, the kernel of the operator L1 (q, p)ρt L2 (q, p) is given by L1 (x, −i�x )M2 (i�y , y)ρt (x, y) where if L2 (u, v) =



a(r, s)ur v s



a(r, s)v s ur

r,s

then M2 (u, v) =

r,s

An example of how to derive a Hamiltonian that reproduces the same effect as that of a partial differential operator acting on a quantum field. Let ψk (t, r) = exp(−i(ω(k)t − k.r)) and consider the quantum field � X(t, r) = (a(k)ψk (t, r) + a(k)∗ ψ¯k (t, r)) k

where

[a(k), a(j)] = 0, [a(k), a(j)∗ ] = δkj

Then i∂t ψk (t, r) = ω(k)ψk (t, r), −i�r ψk (t, r) = kψk (t, r) Hence, ω(−i�r )ψk (t, r) = ω(k)ψk (t, r) so that ψk satisfies the pde i∂t ψk (t, r) = ω(−i�r )ψk (t, r) Since we assume ω(k) to be real valued function, we also get −i∂t ψ¯k (t, r) = ω(k)ψ¯k (t, r),

172180CHAPTER 3. SOMEAdvanced Probability and Remarks andPROCESSING Problems STUDY PROJECTS ONStatistics: APPLIED SIGNAL WI ω(i�r )ψ¯k (t, r) = ω(k)ψ¯k (t, r) and hence −∂t2 X(t, r) = ω(−i�r )2 X(t, r) provided that we assume that ω(−k) = ω(k) so that defining the operator 2­vector field Z(t, r) = [X(t, r), i∂t X(t, r)]T we get that i∂t Z(t, r) = [i∂t X(t, r), ω(−i�r )2 X(t, r)]T � � 0 1 = Z(t, r) ω(−i�r )2 0 This is an example of a quantum field satisfying a vector partial differential equation in space­time with the time derivative being only of the first order. Let X(t, r) be a quantum image field built out of creation and annihilation operators. We wish To denoise this quantum image field. Let X0 (t, r) denote the corresponding denoised quantum image field. We pass the noisy quantum image field X(t,r) through a spatio­temporal linear filter Having an impulse response H(t, r). The output of this filter is given by � ˆ X0 (t, r) = H(t − t� , r − r� )X(t� , r� )dt� d3 r�

ˆ 0 (t, r) is a close approximation to We wish to select the filter H(t, r) so that X X0 (t, r). in a given quantum state ρ. This means that we select the function H(t, r) so that � ˆ 0 (t, r))2 )dtd3 r T r(ρ(X0 (t, r) − X

Is minimal. Setting the variational derivative of this error energy function w.r.t. H to zero then gives us the optimal normal equations � ˆ 0 (t, r))X(t − t� , t − r� )) = 0 T r(ρ dtd3 r(X0 (t, r) − X or equivalently, � T r(ρ{X0 (t, r), X(t−t� , r−r� ))}dtd3 r � � 3 = dsd uH(s, u) T r(ρ{X(t−s, r−u), X(t−t� , t� −r� )}dtd3 r Thus, to calculate the filter, we must evaluate the symmetrized quantum corre­ lations T r(ρ{X0 (t, r), X(t1 , r1 )}), T r(ρ{X(t, r), X(t1 , r1 )})

173 181

Advanced Probability and Statistics: Remarks and Problems

Assuming that ρ is a quantum Gaussian state so that it is expressible as an exponential of a linear­quadratic form in the creation and annihilation operators a(k), a(k)∗ , k = 1, 2, . . . , we express the quantum fields X0 , X as polynomial functionals in the a(k), a(k)∗ and computing the quantum correlations then amounts to calculating the multiple moments of the creation and annihilation operators in a Gaussian state, ie evaluating moments of the form T r(ρ.Πk (a(k)mk )Πk (a(k)∗ nk )) Where ρ = C.exp(−

� k

α(k)a(k)+¯ α(k)a(k)∗ −



β1 (k, m)a(k)a(m)+β2 (k, m)a(k)∗ a(m)∗

k,m

−β3 (k, m)a(k)∗ a(m))

The easiest way to evaluate these moments is to use the Glauber­Sudarshan resolution of the identity in terms of coherent states. [22] Definition of the universal enveloping algebra of a Lie algebra: Let g be a Lie algebra and let (C, π) be a pair such that (a) C is an associative algebra, (b) π : g → C is a linear mapping satisfying π([X, Y ]) = π(X)π(Y ) − π(Y )π(X)∀X, Y ∈ g , (c) π(g) generates C And (d) if U is any associative algebra and ξ : g → U is a linear map satisfying ξ([X, Y ]) = ξ(X)ξ(Y ) − ξ(Y )ξ(X)∀X, Y ∈ g, then there exists an algebra homomorphism ξ � : C → U such that ξ � (π(X)) = ξ(X)∀X ∈ g. Then, (C, π) is called a universal enveloping algebra of g. Theorem: If (πk , Ck ), k = 1, 2 are two universal enveloping algebras of a Lie algebra g, then they are isomorphic in the sense that there exists an algebra isomorphism ξ : C1 → C2 such that ξ(π1 (X)) = π2 (X)∀X ∈ g. [23] Questions on statistical signal processing

Attempt any five questions.

[1] Let X(n), n ∈ Z be any stationary process with E(X(n)) = µ, Cov(X(n), X(m)) = C(n − m) Prove that if C(n) → 0, |n| → ∞

then

limN →∞ N −1 . in the mean square sense.

N �

X(n) = µ

n=1

[2] If X(n) is a stationary Gaussian process with mean µ and covariance C(n − m), then show that limn→∞ N −1 .

N �

n=1

X(n)X(n + k)

174182CHAPTER 3. SOMEAdvanced Probability and Remarks andPROCESSING Problems STUDY PROJECTS ONStatistics: APPLIED SIGNAL WI converges in the mean square sense to C(k) + µ2 as N → ∞ provided that C(n) → 0 as |n| → ∞. [3] Prove the Cesaro theorem: If a(n) is a sequence of complex numbers such �N that a(n) → c as n → ∞, then N −1 . n=1 a(n) → c as N → ∞.

[4] Define a filtered probability space and a Martingale and a submartin­ gale in discrete time w.r.t this filtered probability space. Show that if X(n) is submartingale, then the maximal inequality holds: P r(max0≤n≤N |X(n)| > �) ≤ E(|X(N )|)/� Use this result to prove the following version of the strong law of large numbers: If X(n) is a sequence of iid random variables with mean µ and variance σ 2 , then almost surely, N � X(n) = µ limN →∞ N −1 . n=1

[5] Write short notes on the following: (a) The Borel­Cantelli Lemmas. (b) Application of the Borel­Cantelli lemmas to proving almost sure conver­ gence of a sequence of random variables. (c) Doob’s optional sampling theorem for Martingales and bounded stop­ times. (d) Asymptotic mean and variance of the periodogram of a stationary zero mean Gaussian random process [6] Define the Ito stochastic integral for an adapted process w.r.t. Brown­ ian motion and prove the existence and uniqueness of solutions to a stochastic differential equation dX(t) = f (X(t))dt + g(X(t))dB(t) interpreted as a stochastic integral equation � t � t f (X(s))ds + g(X(s))dB(s) X(t) − X(0) = 0

0

when f, g satisfy the Lipshitz condition |f (X) − f (Y )| + |g(X) − g(Y )| ≤ K|X − Y |∀X, Y ∈ R

[7] Prove using the Ito calculus that the process Mλ (t) = exp(λB(t) − λ2 B(t)/2)

Advanced Probability and Statistics: Remarks and Problems

175 183

is a Brownian martingale. Now apply Doob’s optional sampling theorem to this martingale to determine the probability density of the first hitting time of Brownian motion B(t) at a level a > 0 given B(0) = 0. [8] Let x[n] be any process. Define the vectors xn = [x[n], x[n − 1], ..., x[0]]T ∈ Rn+1 and for r = 1, 2, ..., z −r xn = [x[n − r], x[n − r − 1], ..., x[0], 0, ..., 0]T ∈ Rn+1 Let an,p = [an,p [1], ..., an,p [p]]T denote the optimum forward prediction filter of order p at time n, ie, ef [n|p] = xn +

p �

an,p [k]z −k xn = xn + Xn,p an,p

k=1

where

Xn,p = [z −1 xn , ..., z −p xn ] ∈ Rn+1×p

is such that Ef (n|p) =� ef [n|p] �2 is a minimum. Likewise, let bn−1,p = [bn−1,p [1], ..., bn−1,p [p]]T be the optimum backward prediction filter of order p at time n − 1, ie, if eb [n − 1|p] = z −p−1 xn +

p �

bn,p [k]z −k xn = z −p−1 xn + Xn,p bn,p

k=1

is such that Eb (n − 1|p) =� eb [n − 1|p] �2 is a minimum. Derive time and order recursive formulas for the following: ef (n|p), eb (n − 1|p), an,p , bn,p Also derive time and order projection operator update formulas for the orth­ gogonal projection Pn,p = Xn,p (XTn,p Xn,p )−1 Xn,p onto R(Xn,p ). Now, if y[n] is another process, consider the joint process pre­ dictor p � ˆ n,p = hn,p [k]z −k xn y k=1

176184CHAPTER 3. SOMEAdvanced Probability and Remarks andPROCESSING Problems STUDY PROJECTS ONStatistics: APPLIED SIGNAL WI so that ˆ n,p �2 � yn − y is a minimum. Show that ˆ n,p = Pn,p yn y Show that we can write yn,p =

p−1 �

k=0

gn,p [k]eb [n − 1|k]

where gn,p [k] =< yn , eb [n − 1|k] >, 0 ≤ k ≤ p − 1 Derive time an order update formulas for the coefficients gn,p [k]. In the process of doing this, you must also derive time and order update formulas for the forward prediction error filter transfer function An,p (z) = 1 +

p �

an,p [k]z −k ,

k=1

the backward prediction error filter transfer function Bn,p (z) = z −p−1 +

p �

bn,p [k]z −k

k=1

and the joint process predictor transfer function Hn,p (z) =

p �

hn,p [k]z −k

k=1

[9] Give all the steps for the construction of the L2 ­Ito stochastic integral for an adapted process f (t) w.r.t Brownian motion B(t) over a finite time interval [0, T ]. Also derive the fundamental properties of this integral, namely E



T 0

f (t)dB(t) = 0, E(



T

f (t)dB(t))2 = 0



0

T

E(f (t)2 )dt

[10] Derive the Levinson­Durbin algorithm for the calculating the forward and backward predictors of a stationary process x[n] with autocorrelation R[n] upto order p. Show that per order iteration, you require only O(p) multiplica­ tions as compared to O(p2 ) required if you were to calculate the order p predictor directly by solving the relevant optimum normal equations.

Advanced Probability and Statistics: Remarks and Problems

177 185

[11] Let X[n] be a stationary Gaussian process. Show that the entropy rate ¯ H(X) of this process defined by ¯ H(X) = limN →∞ N −1 H(X[N ], X[N − 1], ..., X [1]) exists and in fact equals H(X(0)|X(−1), X(−2), ...) where if U, V are random vectors, then � � H(U ) = − fU (u)ln(fU (u))du, H(U |V ) = − fU V (u, v)ln(fU |V (u|v)dudv Hence, prove that ¯ H(X) = (1/2π)





SX (ω)dω

0

where SX (ω) is the power spectral density of the process. [12] Apply Cramer’s large deviation principle to compute the optimal asymp­ totic false alarm error probability rate as the number of iid samples tends to ∞ as the Kullback­Leibler/information theoretic distance between the two proba­ bility distributions. [24] Questions on transmission lines and waveguides

Attempt any five questions.

[1] Calculate the capacitance per unit length between two parallel transmis­ sion lines of cylindrical shape having radii a, b with a separation of d between their axes. Use the theory of functions of a complex variable to make this computation. Super­directivity of a quantum antenna: Suppose that the electron­positron field with wave operator field ψ(t, r). The four current density is then given by J µ (x) = −eψ(x)∗ αµ ψ(x), αµ = γ 0 γ µ And the four vector potential generated by this four current density is then given by � Aµ (x) = J µ (x� )G(x − x� )d4 x�

Where

G(x) = (µ/4π)δ(x2 ) = (µ/4πr)δ(t − |r|/c) Is the causal Green’s function for the wave operator. Now using the above formula for the vector potential, the far field four potential has the form � µ A (t, r) = (µ/4πr) J µ (t − r/c + rˆ.r� /c, r� )d3 r�

178 Advanced Probability and Statistics: Remarks and Problems 186CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING WIT It follows that as a function of frequency, the far field four potential has the angular amplitude pattern � r.r� )d3 r� B µ (ω, r) = J µ (ω, r� )exp(jkˆ To evaluate the directional properties of the corresponding power pattern, we must first choose a state |η > for the electron­positron system and compute �

S µν (ω, r) =< η|B µ (ω, r).B ν (ω, r)∗ |η >= < η|J µ (ω, r1 )J ν (ω, r2 )∗ |η > exp(jkr. ˆ (r1 − r2 ))d3 r1 d3 r2

In order to obtain superdirectional properties of the radiated field, we must prepare the state |η > so that the above quantum average is large when µ = ν. First observe that in terms of the creation and annihilation operators of the electron­positron field, the Dirac wave operator field is given by � ψ(t, r) = [u(P, σ)a(P, σ)exp(−i(E(P )t−P.r))+ v(P, σ)b(P, σ)∗ exp(i(E(P )t−P.r))]d3 P

√ Where E(P ) = m2 + P 2 . We then find that the temporal Fourier transform of The four current density J µ (t, r) = −eψ(t, r)∗ αµ ψ(t, r) is given by the con­ volution � µ J (ω, r) = (−e/2π) ψ(ω � − ω, r)αµ ψ(ω � , r)dω � R

Where ψ(ω, r), the temporal Fourier transform of ψ(t, r) is given by � ψ(ω, r) = (2π)−1 [u(P, σ)a(P, σ)exp(iP.r)δ(ω−E(P ))+

v(P, σ)b(P, σ)∗ exp(−iP.r)δ(ω+E(P ))]d3 P

In our CDRA case, we have to modify this formula slightly. The possible fre­ quencies of the Dirac field are not a continuum E(P ), P ∈ R3 but rather a discrete set ω(mnp) = E(P (mnp)) and at a given oscillation frequency ω(mnp), the Dirac field contributes an amount ψmnp (ω, r) = χ1 (mnp, r)δ(ω − ω(mnp))a(mnp) If we consider the corresponding negative frequency terms also (ie, radiation from both electrons and positrons), then the result is ψmnp (ω, r) = χ1 (mnp, r)δ(ω−ω(mnp))a(mnp)+χ2 (mnp, r)δ(ω+ω(mnp))b(mnp)∗ The result of performing the above convolution is then J µ (ω, r) = χ1 (mnp, r)

Advanced Probability and Statistics: Remarks and Problems

179 187

Periodogram is an inconsistent estimator of the power spectrum N −1 � X(n)exp(−jωn)|2 SˆN (ω) = N −1 | n=0

X(n) is a stationary zero mean Gaussian process. Then, if R(n) is absolutely summable, E[SˆN (ω)] → S(ω), N → ∞ E(SˆN (ω1 )SˆN (ω2 )) = N −2

N −1 �

E(X(n)X(m)X(k)X(l))exp(−jω1 (n−m)−jω2 (k−l))

nmkl=0

Now,

E(X(n)X(m)X(k)X(l)) = R(n−m)R(k−l)+R(n−k)R(m−l)+R(n−l)R(m−k)

The first term based on this decomposition converges to S(ω1 )S(ω2 ). The second

term is

N −2

N −1 �

nmkl=0

R(n − k)R(m − l)exp(−jω1 (n − m) − jω2 (k − l))

� R(a)R(b)exp(−jω1 (k + a − l − b) − jω2 (k − l)) = N −2 � R(a)R(b)exp(−j(ω1 + ω2 )(k − l)).exp(−jω1 (a − b)) = N −2

with the summation range of the indices being

or equivalently,

0 ≤ k, l, a + k, b + l ≤ N − 1,

|a|, |b| ≤ N −1, max(0, −a) ≤ k ≤ min(N −1, N −1−a), max(0, −b) ≤ l ≤ min(N −1, N −1−b)

It is easy to see that this term can be expressed as [

N −1 �

a=−(N −1)

R(a)exp(−j(ω1 +ω2 )(N −a)/2)exp(−jω1 a).sin(ω(N −|a|−1)/2)/N.sin(ω/2)] ×[a < −−− > b, ω1 < −−− > ω2 ]

where [a < − − −b >, ω1 < − − − > ω2 ] denotes the same as the previous but with the indicated interchanges. This term evidently converges to zero as N → ∞ for positive ω1 , omega2 . The third and last term evaluates to N −1 �

n,l=0

R(n − l)exp(−j(ω1 n − ω2 l)).

N −1 �

m,k=0

R(m − k).exp(j(ω1 m − ω2 k))

180 Advanced Probability and Statistics: Remarks and Problems 188CHAPTER 3. SOME STUDY PROJECTS ON APPLIED SIGNAL PROCESSING WI [25] Further topics in statistical signal processing [1] Generalize the Levinson­Durbin algorithm for vector valued AR wide sense stationary processes defined by X(t) = −

p �

k=1

Ap (k)X(t − k) + W (t)

where X(t) ∈ RM and A(1), ..., A(p) ∈ RM ×M . Show that the optimal normal equations are assuming that W (t) is iid N (0, I) are given by R[k] +

p �

m=1

Ap [m]R[k − m] = E[p]δ[k], k = 0, 1, ..., p

where R[k] = E(X(t)X(t − k)T ) ∈ RM ×M Use the block Toeplitz and block centro­symmetry properties of the autocorre­ lation matrix ((R[k − m]))1≤k,m≤p ∈ RM p×M p to obtain order recursive solutions for the matrix coefficients Ap [k], k = 1, 2, ..., p. Let Bp [k] = Ap [p + 1 − k], k = 1, 2, ..., p Then, let

Let



0 Jp = ⎝ 0 ...IM

0 0 .. 0 0 ... 0 0 ...

0 IM 0

⎞ IM 0 ⎠ ∈ RM p×M p 0

Sp = ((R[m − k]))1≤k,m≤p ] ∈ RM p×M p Then where

Jp Sp Jp = S˜p , Jp2 = IM p S˜p = ((R[k − m]))1≤k,m≤p

Then, let Ap = [IM , Ap [1], Ap [2], ..., Ap [p]] ∈ RM ×M (p+1) Then, the optimal normal equations can be expressed as Ap Sp+1 = [E[p], 0, ..., 0] Then, Ap Jp+1 Jp+1 Sp+1 Jp+1 = [E[p], 0, ..., 0]Jp+1 or equivalently,

Bp S˜p+1 = [0, 0, .., E[p]]

Advanced Probability and Statistics: Remarks and Problems

181 189

where Bp = [Ap [p], Ap [p − 1], ..., Ap [1], IM ] Thus,

Bp [1 : p]S˜p + R[p : 1] = 0

Note that Bp [1 : p] = Ap [1 : p]Jp On the other hand, we have that Ap+1 Sp+2 = [E[p + 1], 0, ..., 0] or equivalently, defining Ap+1 [1 : p + 1] = [Ap+1 [1], ..., Ap+1 [p + 1]], we deduce that Ap+1 [1 : p]Sp+1 + Ap+1 [p + 1]R[−p − 1 : −1] = [E[p + 1], 0, ..., 0] where R[−p − 1 : −1] = [R[−p − 1], R[−p], ..., R[−1]] Also, R[1 : p + 1] + Ap+1 [1 : p + 1]Sp+1 = 0 Note that Ap+1 = [I, Ap+1 [1], ..., Ap+1 [p + 1]] = [I, Ap+1 [1 : p + 1]] and hence Ap+1 [1 : p]Sp + Ap+1 [p + 1]R[−p : −1] + R[1 : p] = 0 On the other hand, for an arbitrary M × M matrix Kp+1 , we have from the above pth order equations, R[1 : p] + Ap [1 : p]Sp = 0 ˜p [1 : p] to be the solution to We define B ˜p [1 : p]Sp = 0, R[−1 : −p]Jp + B or equivalently,

˜p [1 : p]Sp = 0 R[−p : −1] + B

Likewise, we define A˜p [1 : p] to be the solution to

R[−1 : −p] + A˜p [1 : p]S˜p = 0 Note that

˜p [1 : p] = A˜p [1 : p]Jp B

182 190CHAPTER 3. SOME Advanced andON Statistics: Remarks and Problems STUDYProbability PROJECTS APPLIED SIGNAL PROCESSING W Then, ˜p [1 : p])Sp = −R[1 : p] + Kp+1 R[−1 : −p]Jp (Ap [1 : p] − Kp+1 B It follows by comparing the above equations that if we take Kp+1 so that Kp+1 R[−1 : −p]Jp = −Ap+1 [p + 1]R[−p : −1] then we are assured that ˜p [1 : p] Ap+1 [1 : p] = Ap [1 : p] − Kp+1 B However, noting that R[−1 : −p]Jp = R[−p : −1] the condition reduces to simply Kp+1 = −Ap+1 [p + 1] Further, we have defined Bp = Ap Jp+1 = [Ap [p], ..., Ap [1], IM ] = [Bp [1 : p], IM ] so that

Bp S˜p+1 = [0, 0, .., E[p]]

and this gives us

Bp [1 : p]S˜p + R[p : 1] = 0

and hence ˜ p+1 A˜p [1 : p])S˜p = −(R[p : 1] − K ˜ p+1 R[−1 : −p]) (Bp [1 : p] − K On the other hand, we have that Bp+1 S˜p+2 = [0, ..., 0, E[p + 1]] which implies that Bp+1 [1 : p + 1]S˜p+1 + R[p + 1 : 1] = 0 or equivalently, Bp+1 [2 : p + 1]S˜p + Bp+1 [1]R[−1 : −p] = −R[p : 1] Thus, if we take ˜ p+1 = −Bp+1 [1] = −Ap+1 [p + 1] = Kp+1 K then we would get Bp+1 [2 : p + 1] = Bp [1 : p] − Kp+1 A˜p [1 : p]

Advanced Probability and Statistics: Remarks and Problems We also have

183 191

R[−1 : −p] + A˜p [1 : p]S˜p = 0

so that

R[−1 : −p − 1] + A˜p+1 [1 : p + 1]S˜p+1 = 0

which gives A˜p+1 [1 : p]S˜p + A˜p+1 [p + 1]R[p : 1] = −R[−1 : −p] and further, (A˜p [1 : p] − Lp+1 Bp [1 : p])S˜p = −R[−1 : −p] + Lp+1 R[p : 1] so that if we define Lp+1 = −Ap+1 [p + 1] = Kp+1 then we would get A˜p+1 [1 : p] = A˜p [1 : p] − Kp+1 Bp [1 : p]

[26] Reduce Dirac’s relativistic wave equation in a radial potential to the the problem of solving two coupled ordinary second order linear differential equations in a single radial variable r. Solve these by the power series method. Now add quantum noise in the form of superposition of the derivatives of the creation and annihilation processes for the vector potential and superposition of the creation and annihilation processes for the scalar potential (start with the noisy vector potential expressed in terms of creation and annihilation process derivatives and apply the Lorentz gauge condition to derive the expression for the noisy scalar potential). Then incorporate quantum Ito’s correction terms in the Hudson­Parthasarathy noisy Dirac equation and assuming the bath to be in a coherent state, obtain formulas upto O(e2 ) for the transition probability of the bound Dirac electron subject to quantum noise from one stationary state to another. [27] Notion of group algebra for a finite group and a representation of the group algebra in terms of a representation of the finite group. [28] Write down the Lie algebra representations of the standard generators of SO(3) acting on differentiable functions defined on R3 and show that these generartors are precisely the angular momentum operators in quantum mechan­ ics, ie, the x, y, z components of the operators −ir × �. [29] The standard generators of the Lie algebra sl(2, C) are denoted by X, Y, H. They satisfy the commutation relations [H, X] = 2X, [H, Y ] = −2Y, [X, Y ] = H

184 Probability and Remarks andPROCESSING Problems 192CHAPTER 3. SOME Advanced STUDY PROJECTS ONStatistics: APPLIED SIGNAL WIT From these commutation relations, derive all the finite dimensional irreducible representations of sl(2, C) and hence of SL(2, C) and SU (2, C). Show that SL(2, C) is the complexification of SU (2, C) and also of SL(2, R). These are upto isomorphism, the only two non­conjugate real forms of SL(2, C). The former is a compact real form while the latter is a non­compact real form. [30] Some problems in group representation theory. [1] Define the Weyl group for a semisimple Lie algebra. First define and prove the existence of a Cartan subalgebra, ie, a maximal Abelian algebra and show that all its elements in the adjoint representation are semi­simple. This is true only for semisimple Lie algebras. The Weyl group is defined as the normailizer of the Cartan subalgebra. Show that the Weyl group is generated by the set of reflections corresponding to simple roots w.r.t. any positive system of roots. Show that the Weyl group acts dually on the set of roots by permuting them. Give examples of complex simple Lie algebras, their Cartan subalgebras and the associated Weyl groups. Consider sl(n, C), so(n, C), sp(n, C). Use a convenient basis for these simple Lie algebras for constructing the root vectors and the Cartan subalgebras. Show that every complex semisimple Lie algebra has upto to conjugacy equivalence just one Cartan subalgebra. Show by taking the example of sl(2, R), that real semisimple Lie algebras can have more than one non­conjugate Cartan subalgebra. [2] Show that given a representation π of a semisimple Lie algebra g, and a vector v in the vector space V in which the representation π acts, suppose v is a cyclic vector and that π(H)v = λ(H)v∀H ∈ h, ie, v ∈ Vλ for some λ ∈ h∗ And π(Xi )v = 0, i = 1, 2, . . . , l, then if N− is the subalgebra of the universal enveloping algebra G of g. then V = π(N− )v. Hence deduce that π is a representation with weights, ie, we can write � V = Vµ µ∈h

Where Vµ = {w ∈ V : π(H)w = µ(H)w∀H ∈ h} Show that dimVλ = 1 ie V = C.v Show further that if π is an irreducible representation, and if w ∈ V is any vector such that π(Xk )w = 0, 1 ≤ k ≤ l, then w ∈ Vλ . [3] Let π be a representation of a Lie algebra g in V and let v be a non­zero vector in V . Let G denote the universal enveloping algebra of g and define M = {a ∈ G : π(a)v = 0}

Advanced Probability and Statistics: Remarks and Problems

185 193

Show that M is an ideal in G. Define a map T : G/mathf rakM → V By T (a + M) = π(a)v, a ∈ G Show that T is a well defined linear map between vector spaces and that T is one­one. Let M1 be an ideal in G containing M. Then, show that W = T (M1 /M) consists of all elements π(a)v, a ∈ M1 in V and is therefore a π­ invariant subspace of V . Hence deduce that if π is an irreducible representation, then T is a bijection and M1 is either M or G, ie, in other words, M is a maximal ideal in G. Conversely, show that If π is an arbitrary representation and M as defined above is a maximal ideal of G with v as a π­cyclic vector then π is an irreducible representation. In fact, show that if v is π­cyclic for any representation π, then T is a bijection and the mapping M1 /M → T (M1 ) Is a bijection between the set of all maximal ideals M1 of G containing M and the set of all π­invariant subspaces of V . Give examples of infinite dimensional Lie algebras having finite dimensional non­trivial representations. Show that if G is a simply connected Lie group and H is a discrete normal subgroup, then the Group π1 (G/H) is isomorphic to H. Hint: First show that if γ : [0, 1] → G/H is any continuous curve and if p : G → G/H is the canonical projection, then there exists a unique continuous curve γ˜ : [0, 1] → G such that poγ˜ = γ. For this, you must make use of the discreteness of H. Now let γ : [0, 1] → G/H be a closed curve starting at H. Then, define M (γ) = γ˜ (1). Prove that if C is the equivalence class of all Closed curves in G/H starting at H and with composition defined in the usual way that one defines in homotopy, ie γ1 oγ2 (t) = γ1 (2t), 0 ≤ t ≤ 1/2 and = γ2 (2t − 1) for 1/2 ≤ t ≤ 1, then C becomes a group and that M is an isomorphism from C to H. To show that, you must make use of the simple connectedness of G. Indeed, let γ ∈ C and let γ˜ : [0, 1] → G be the uniquely defined continuous curve as above. Then, if γ˜ (1) = e, it follows from the simple connectedness of G that γ˜ is homotopic to the constant curve γ˜0 (t) = e, 0 ≤ t ≤ 1 from which it follows that γ = poγ˜ is homotopic to γ0 = poγ˜0 . This proves the injectivity of M and the surjectivity of M is obvious. Note the way in which we make use of the normality of H in G: Only because H is normal, it follows that G/H is a group owing to which, if we define the composition of two curves γ˜1 and γ˜2 in G both starting at e by the rule γ˜ = γ˜1 oγ˜2 , where γ˜ (t) = γ˜1 (t) For 0 ≤ t ≤ 1/2 and γ˜ (t) = γ˜ (1)oγ˜2 (2t − 1), 1/2 ≤ t ≤ 1, then it follows that p(˜ γ1 oγ˜2 ) = p(˜ γ1 )op(˜ γ2 )

186 Advanced Probability ON and APPLIED Statistics: Remarks Problems 194CHAPTER 3. SOME STUDY PROJECTS SIGNALand PROCESSING WIT [31] New topics in plasmonic antennas [1] Quantum Boltzmann equation, Version 1: We have a creation annihilation operator field in three momentum space a(K), a(K)∗ , K ∈ R3 . These satisfy the canonical commutation relations (CCR): [a(K), a(K ' )∗ ] = δ 3 (K − K ' ) Let ρ denote the density matrix of the system (at time t = 0). Under the Heisenberg matrix mechanics, ρ remains a constant while a(K), a(K)∗ evolve with time to a(K, t), a(K, t)∗ . Let H denote the Hamiltonian of the system. Then, a(K, t) = exp(itH)a(K).exp(−itH) a(K, t)∗ = exp(itH)a(K)∗ .exp(−itH) Alternately, if we adopt the Schrodinger wave mechanics picture, the operators a(K), a(K)∗ remain constant with time while the density ρ evolves with time to ρ(t) = exp(−itH)ρ.exp(itH) We assume that ρ = exp(−βH)/Z(β), Z(β) = T r(exp(−βH)), ie, ρ is the canonical Gibbs density. Then, ρ remains constant with time under the Schrodinger picture. Define the Wigner­distribution function f (r, K, t) = T r(ρ(t).a(K + r/τ )a(K − r/τ )∗ ) We assume that J J H = g(K)a(K)∗ a(K)d3 K+ h(K1 , K2 )a(K1 )∗ a(K1 )a(K2 )∗ a(K2 )d3 K1 d3 K2 and we take ρ(0) as the Gaussian state J ρ(0) = exp(−β g(K)a(K)∗ a(K)d3 K)/Z(β) We calculate ∂f (r, K, t)/∂t = T r(−i[H, ρ(t)]a(K + r/τ )a(K − r/τ )∗ ) = iT r(ρ(t)[H, a(K + r/τ )a(K − r/τ )∗ ]) Now, [H, a(K+r/τ )a(K−r/τ )∗ ] = [H, a(K+r/τ )]a(K−r/τ )∗ +a(K+r/τ )[H, a(K−r/τ )∗ ] Now, [H, a(K + r/τ )] = +

J

J

g(K ' )[a(K ' )∗ a(K ' ), a(K + r/τ )]d3 K '

h(K1 , K2 )[a(K1 )∗ a(K1 )a(K2 )∗ a(K2 ), a(K + r/τ )]d3 K1 d3 K2

Advanced Probability and Statistics: Remarks and Problems

187 195

Now, [a(K � )∗ a(K � ), a(K+r/τ )] = [a(K � )∗ , a(K+r/τ )]a(K � ) = −δ 3 (K−K � +r/τ )a(K � ), [a(K1 )∗ a(K1 )a(K2 )∗ a(K2 ), a(K + r/τ )] = [a(K1 )∗ , a(K+r/τ )]a(K1 )a(K2 )∗ a(K2 )+a(K1 )∗ a(K1 )[a(K2 )∗ , a(K+r/τ )]a(K2 ) = −δ 3 (K − K1 + r/τ )a(K1 )a(K2 )∗ a(K2 ) − δ 3 (K − K2 + r/τ )a(K1 )∗ a(K1 )a(K2 ) Thus,



[H, a(K + r/τ )] = −





g(K � )δ 3 (K − K � + r/τ )a(K � )d3 K �

h(K1 , K2 )(δ 3 (K−K1 +r/τ )a(K1 )a(K2 )∗ a(K2 )

−δ 3 (K−K2 +r/τ )a(K1 )∗ a(K1 )a(K2 ))d3 K1 d3 K2 � = −g(K + r/τ )a(K + r/τ ) − a(K + r/τ ) h(K + r/τ, K2 ))a(K2 )∗ a(K2 )d3 K2 −(



h(K1 , K + r/τ )a(K1 )∗ a(K1 )d3 K1 )a(K + r/τ )

−g(K + r/τ )a(K + r/τ ) − {a(K + r/τ ),



h(K + r/τ, K � )a(K � )∗ a(K � )d3 K � }

Note that without any loss of generality, we are assuming that h(K1 , K2 ) = h(K2 , K1 ) [32] Hartree­Fock equations for an N ­electron atom

[32] Hartree-Fock equations for an N-electron atom

The Hamiltonian is given by

H = Σ_{k=1}^{N} H0k + Σ_{1≤k<j≤N} Vkj

where H0k is the one-electron Hamiltonian of the k-th electron and Vkj is the interaction between the k-th and j-th electrons.
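As one concrete instance (our illustration; the text itself works with the general N-electron form), for N = 2 and a nucleus of charge Ze the standard non-relativistic Hamiltonian reads, in Gaussian units and with spin-orbit corrections omitted,

H = −(ℏ²/2m)(∇1² + ∇2²) − Ze²/r1 − Ze²/r2 + e²/|r1 − r2|

so that H0k = −(ℏ²/2m)∇k² − Ze²/rk, k = 1, 2, and V12 = e²/|r1 − r2|.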

where φ1 and φ2 are normalized position space wave functions and C1, C2, C3 are normalization constants given by

1/C1² = 2(1 − ⟨φ1, φ2⟩) = 1/C2², 1/C3² = 4(1 + ⟨φ1, φ2⟩)

assuming the wave functions to be real.

[33] Quantum Boltzmann equation for indistinguishable particles in a system in the presence of an external quantum electromagnetic field

ρ(t) is the state of the system and bath. The total Hamiltonian of the system and bath has the form

H = Hs + HB + VsB

where Hs is the system Hamiltonian, HB the bath field Hamiltonian and VsB is the interaction Hamiltonian between the system and bath. The system Hilbert space is

HS = ⊗_{n=1}^{N} Hn

where Hn is the Hilbert space of the n-th particle in the system and these Hilbert spaces are identical copies. The bath Hilbert space is HB. Hs acts in the system Hilbert space and is therefore to be identified with Hs ⊗ IB. HB acts in the bath Hilbert space and is therefore to be identified with Is ⊗ HB, while VsB acts in HS ⊗ HB and can therefore be expressed as

VsB = Σ_k Vsk ⊗ VBk

where Vsk acts in HS while VBk acts in HB. The system Hamiltonian has the form

Hs = Σ_{n=1}^{N} Hsn + Σ_{1≤n<k≤N} Vnk
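These identifications can be made concrete in finite dimensions. The following numpy sketch (our own illustration; the dimensions and the random Hermitian blocks are arbitrary stand-ins) assembles H = Hs ⊗ IB + Is ⊗ HB + Σ_k Vsk ⊗ VBk and checks that the result is Hermitian:

import numpy as np

rng = np.random.default_rng(0)
ds, dB = 4, 3  # toy dimensions for the system and bath spaces (assumed)

def rand_herm(d):
    # random Hermitian matrix, standing in for a Hamiltonian block
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (X + X.conj().T) / 2

Hs, HB = rand_herm(ds), rand_herm(dB)
Vs = [rand_herm(ds) for _ in range(2)]      # system factors V_sk
VB = [rand_herm(dB) for _ in range(2)]      # bath factors V_Bk
Is, IB = np.eye(ds), np.eye(dB)

# Hs acts as Hs (x) IB, HB as Is (x) HB, and VsB = sum_k V_sk (x) V_Bk
H = np.kron(Hs, IB) + np.kron(Is, HB) \
    + sum(np.kron(a, b) for a, b in zip(Vs, VB))

assert np.allclose(H, H.conj().T)           # total Hamiltonian is Hermitian
print(H.shape)                              # (ds*dB, ds*dB) = (12, 12)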

[34] Hydrodynamic scaling limits

Here g is a continuous test function on [0, 1] and η = (η(x) : x ∈ Z_N) is an occupation configuration. For all ε > 0 and fixed k, we have that for all N > N0(ε, k),

sup_y |g(y/N) − (2k + 1)^{−1} Σ_{x:|x−y|≤k} g(x/N)| < ε

since a continuous function on a compact interval is also uniformly continuous. Thus, with η̄_{k,x} = (2k + 1)^{−1} Σ_{y:|y−x|≤k} η(y) denoting the local average of the occupation variables, we get

lim_{N→∞} [N^{−1} Σ_x g(x/N)η(x) − N^{−1} Σ_x g(x/N)η̄_{k,x}] = 0

More generally, in view of the regularity of g on [0, 1], we can replace

(1/N) Σ_x g(x/N)F(τ_x η)

by

(1/N) Σ_x g(x/N)(2k + 1)^{−1} Σ_{y:|y−x|≤k} F(τ_y η)

and we can then replace (2k + 1)^{−1} Σ_{y:|y−x|≤k} F(τ_y η) by F̂((2k + 1)^{−1} Σ_{y:|y−x|≤k} τ_y η) if we assume that the η(x)'s are independent Bernoulli random variables with corresponding means ρ(x/N) and we define F̂(ρ) = E_ρ F(η), where, if η = (η(x) : x ∈ Z_N), then E_ρ F(η) is the mean of F(η(x) : x ∈ Z_N) with the η(x)'s being independent Bernoulli with means ρ(x). This is an elementary consequence of the law of large numbers. It is valid provided that we take k sufficiently large. In proving hydrodynamic scaling limits, the kind of F(η) that we encounter is typically F(η) = η(0)(1 − η(1)), which results in F(τ_x η) = η(x)(1 − η(x + 1)), and then

(1/N) Σ_x g(x/N)η(x)(1 − η(x + 1))

can be replaced by

(1/N) Σ_x g(x/N)ρ(x/N)(1 − ρ((x + 1)/N))

from which the hydrodynamical scaling limit for nearest neighbour interactions can be derived. More generally, if the jump probabilities p(z) have finite range, then

(1/N) Σ_{x,z} g(x/N)p(z)η(x)(1 − η(x + z)) −−−−− (a)

can be represented by

(1/N) Σ_x g(x/N)F(τ_x η)

where

F(η) = Σ_z p(z)η(0)(1 − η(z))

and then we get that (a) can be replaced by

(1/N) Σ_{x,z} g(x/N)p(z)ρ(x/N)(1 − ρ((x + z)/N))

which immediately leads to the hydrodynamical scaling limit, which is clearly linear in the symmetric case, ie, when p(z) = p(−z).
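The law-of-large-numbers replacement invoked above is easy to test by simulation. A minimal sketch (our own; the profile ρ and the test function g are arbitrary smooth choices) draws independent Bernoulli occupations η(x) with means ρ(x/N) and compares the empirical functional with its deterministic replacement:

import numpy as np

rng = np.random.default_rng(1)
N = 100_000
u = np.arange(N) / N

rho = lambda s: 0.3 + 0.4 * s          # smooth density profile on [0, 1] (assumed)
g = lambda s: np.sin(np.pi * s)        # smooth test function (assumed)

# independent Bernoulli occupation variables eta(x) with means rho(x/N)
eta = (rng.random(N) < rho(u)).astype(float)

# (1/N) sum_x g(x/N) eta(x)(1 - eta(x+1)), with periodic indexing in x
empirical = np.mean(g(u) * eta * (1.0 - np.roll(eta, -1)))
# (1/N) sum_x g(x/N) rho(x/N)(1 - rho((x+1)/N))
replaced = np.mean(g(u) * rho(u) * (1.0 - rho((np.arange(N) + 1) % N / N)))

print(empirical, replaced)             # agree up to O(N^{-1/2}) fluctuations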

[35] Some historical remarks: The RLS lattice algorithm involving computation of the prediction filter coefficients recursively both in order and time was first accomplished by the Greek engineers Carayannis, Manolakis and Kalouptsidis in a pioneering paper published in the IEEE Transactions on Signal Processing sometime during the eighties. It was polished further, and its stability analysis carried out, by the team of Thomas Kailath, an Indo-American engineer. Thomas Kailath and his team members also formulated the RLS lattice algorithm for multidimensional time series models. The final algorithm for the RLS lattice algorithm for multivariate time series, with application to cyclostationary process prediction, was carried out by Mrityunjoy Chakraborty in his Ph.D thesis. A. Paulraj invented the ESPRIT algorithm for estimating the frequencies in a noisy harmonic process based on rotational invariance techniques. This work appeared in the IEEE Transactions on Signal Processing sometime in the mid eighties along with Richard Roy and Thomas Kailath. The idea was a computationally efficient high resolution eigensubspace based algorithm for estimating the sinusoidal frequencies or, equivalently, the directions of arrival of multiple plane wave signals. It revolutionized the entire defence and astronomy industry. Later on, higher dimensional versions of the MUSIC and ESPRIT algorithms were developed by the author in his PhD thesis along with his supervisors. These contributions are present in the papers

[1] Harish Parthasarathy, S. Prasad and S. D. Joshi, "A MUSIC-like method for bispectrum estimation", Signal Processing, Elsevier, North Holland, 1994.

[2] Harish Parthasarathy, S. Prasad and S. D. Joshi, "An ESPRIT-like algorithm for quadratic phase coupling estimation", IEEE Transactions on Signal Processing, 1995.

The scientific contributions of Subrahmanyan Chandrasekhar: Chandrasekhar, as a college student in the early 1930s, discovered a remarkable consequence of combining relativity, gravitation and quantum mechanics, by applying ideas from these newly developed fields to study the equilibrium of a star that has exhausted its fuel and is contracting under the influence of gravitation against a counter repulsive force caused by the pressure exerted by the degenerate electron gas within the star owing to the Pauli exclusion principle. In this way, Chandrasekhar arrived at a fundamental limiting mass for a star that has exhausted all its fuel. Such a star is called a white dwarf, and Chandrasekhar's calculations showed that this limiting mass is around 1.4 times the mass of the sun. The idea behind Chandrasekhar's calculation can be understood as follows. All the electrons in the star after its fuel has been exhausted have energies below the Fermi level, and hence if n(p) denotes the number of electrons per unit volume in phase space and m_e the rest mass of an electron, then the pressure of this degenerate electron gas is, from the basic formula for the energy-momentum tensor

T^{μν} = Σ_a (P_a^μ P_a^ν / E_a)δ³(x − x_a)

given by (note that ∫ T^{μν} d⁴x is Lorentz invariant)

P = ∫ n(p)(p²/E(p))d³p

p²/2m_e
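Evaluated for an isotropic momentum distribution, the above T^{μν} gives the kinetic pressure P = (1/3) ∫ n(p) p v(p) d³p with v(p) = p/E(p); the factor 1/3, from averaging over directions, is our reading of the truncated passage. The sketch below (ours, not from the text) computes this for a completely degenerate gas, n(p) = 2/h³ for p ≤ p_F, in the non-relativistic limit v(p) ≈ p/m_e suggested by the trailing p²/2m_e, and compares with the closed form (8π/15) p_F⁵/(m_e h³):

import numpy as np

h = 6.62607015e-34     # Planck constant, SI
me = 9.1093837015e-31  # electron rest mass, SI

def degenerate_pressure(pF, n=200_000):
    # P = (1/3) * int_0^{pF} (2/h^3) * p * (p/me) * 4*pi*p^2 dp  (non-relativistic)
    p = np.linspace(0.0, pF, n)
    integrand = (2.0 / h**3) * 4.0 * np.pi * p**4 / me
    dp = p[1] - p[0]
    return integrand.sum() * dp / 3.0  # simple Riemann sum

pF = 1.0e-22  # illustrative Fermi momentum in kg*m/s (not a physical fit)
print(degenerate_pressure(pF))
print(8.0 * np.pi * pF**5 / (15.0 * me * h**3))  # closed form, for comparison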