
Adaptive Wireless Communications

Adopting a balanced mix of theory, algorithms, and practical design issues, this comprehensive volume explores cutting-edge applications in adaptive wireless communications, and the implications these techniques have for future wireless network performance. Presenting practical concerns in the context of different strands from information theory, parameter estimation theory, array processing, and wireless communications, the authors present a complete picture of the field. Topics covered include advanced multiple-antenna adaptive processing, ad hoc networking, MIMO, MAC protocols, space-time coding, cellular networks, and cognitive radio, with the significance and effects of both internal and external interference a recurrent theme throughout. A broad, self-contained technical introduction to all the necessary mathematics, statistics, estimation theory, and information theory is included, and topics are accompanied by a range of engaging end-of-chapter problems. With solutions available online, this is the perfect self-study resource for students of advanced wireless systems and wireless industry professionals.

Daniel W. Bliss is an Associate Professor in the School of Electrical, Computer and Energy Engineering at Arizona State University.

Siddhartan Govindasamy is an Assistant Professor of Electrical and Computer Engineering at Franklin W. Olin College of Engineering, Massachusetts.

"An excellent and well-written book. This book is a must for any wireless PHY system engineer."
Vahid Tarokh, Harvard University

"Great book! Fills a gap in the wireless communication textbook arena with its comprehensive signal-processing focus. It does a nice job of handling the breadth-vs-depth trade-off in a topic-oriented textbook, and is perfect for beginning graduate students or practicing engineers who want the best of both worlds: broad coverage of both old and new topics, combined with mathematical fundamentals and detailed derivations. It provides a great single-reference launching point for readers who want to dive into wireless communications research and development, particularly those involving multi-antenna applications. It will become a standard prerequisite for all my graduate students."
A. Lee Swindlehurst, University of California, Irvine

Adaptive Wireless Communications
MIMO Channels and Networks

DANIEL W. BLISS
Arizona State University

SIDDHARTAN GOVINDASAMY
Franklin W. Olin College of Engineering, Massachusetts

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107033207

© Dan Bliss and Siddhartan Govindasamy 2013
Dan Bliss's contributions are a work of the United States Government and are not protected by copyright in the United States.

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2013
Printed and bound in the United Kingdom by the MPG Books Group

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data
Bliss, Daniel W., 1966–
Adaptive wireless communications : MIMO channels and networks / Daniel W. Bliss, Siddhartan Govindasamy.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-03320-7 (hardback)
1. MIMO systems. 2. Wireless communication systems. 3. Adaptive signal processing. I. Govindasamy, Siddhartan. II. Title.
TK5103.4836.B54 2013
621.384 – dc23    2012049257

ISBN 978-1-107-03320-7 Hardback

Additional resources for this publication at www.cambridge.org/bliss

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

The views expressed are those of the author (D. W. B.) and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

Contents

Preface  page xvii
Acknowledgments  xviii

1 History  1
  1.1 Development of electromagnetics  1
  1.2 Early wireless communications  2
  1.3 Developing communication theory  5
  1.4 Television broadcast  6
  1.5 Modern communications advances  7
    1.5.1 Early packet-radio networks  9
    1.5.2 Wireless local-area networks  10

2 Notational and mathematical preliminaries  12
  2.1 Notation  12
    2.1.1 Table of symbols  12
    2.1.2 Scalars  12
    2.1.3 Vectors and matrices  14
    2.1.4 Vector products  16
    2.1.5 Matrix products  17
  2.2 Norms, traces, and determinants  19
    2.2.1 Norm  19
    2.2.2 Trace  19
    2.2.3 Determinants  19
  2.3 Matrix decompositions  21
    2.3.1 Eigen analysis  21
    2.3.2 Eigenvalues of 2 × 2 Hermitian matrix  22
    2.3.3 Singular-value decomposition  22
    2.3.4 QR decomposition  23
    2.3.5 Matrix subspaces  24
  2.4 Special matrix forms  26
    2.4.1 Element shifted symmetries  26
    2.4.2 Eigenvalues of low-rank matrices  26
  2.5 Matrix inversion  27
    2.5.1 Inversion of matrix sum  28
  2.6 Useful matrix approximations  28
    2.6.1 Log determinant of identity plus small-valued matrix  28
    2.6.2 Hermitian matrix raised to large power  29
  2.7 Real derivatives of multivariate expressions  29
    2.7.1 Derivative with respect to real vectors  30
  2.8 Complex derivatives  33
    2.8.1 Cauchy–Riemann equations  34
    2.8.2 Wirtinger calculus for complex variables  35
    2.8.3 Multivariate Wirtinger calculus  38
    2.8.4 Complex gradient  38
  2.9 Integration over complex variables  39
    2.9.1 Path and contour integrals  40
    2.9.2 Volume integrals  42
  2.10 Fourier transform  44
    2.10.1 Useful Fourier relationships  45
    2.10.2 Discrete Fourier transform  46
  2.11 Laplace transform  48
  2.12 Constrained optimization  48
    2.12.1 Equality constraints  48
    2.12.2 Inequality constraints  51
    2.12.3 Calculus of variations  53
  2.13 Order of growth notation  57
  2.14 Special functions  58
    2.14.1 Gamma function  58
    2.14.2 Hypergeometric series  59
    2.14.3 Beta function  61
    2.14.4 Lambert W function  61
    2.14.5 Bessel functions  62
    2.14.6 Error function  63
    2.14.7 Gaussian Q-function  63
    2.14.8 Marcum Q-function  63
  Problems  63

3 Probability and statistics  66
  3.1 Probability  66
    3.1.1 Bayes' theorem  66
    3.1.2 Change of variables  67
    3.1.3 Central moments of a distribution  68
    3.1.4 Noncentral moments of a distribution  69
    3.1.5 Characteristic function  70
    3.1.6 Cumulants of distributions  70
    3.1.7 Multivariate probability distributions  70
    3.1.8 Gaussian distribution  71
    3.1.9 Rayleigh distribution  72
    3.1.10 Exponential distribution  73
    3.1.11 Central χ² distribution  73
    3.1.12 Noncentral χ² distribution  75
    3.1.13 F distribution  76
    3.1.14 Rician distribution  77
    3.1.15 Nakagami distribution  78
    3.1.16 Poisson distribution  78
    3.1.17 Beta distribution  79
    3.1.18 Logarithmically normal distribution  79
    3.1.19 Sum of random variables  80
    3.1.20 Product of Gaussians  81
  3.2 Convergence of random variables  81
    3.2.1 Convergence modes of random variables  82
    3.2.2 Relationship between modes of convergence  83
  3.3 Random processes  86
    3.3.1 Wide-sense stationary random processes  88
    3.3.2 Action of linear-time-invariant systems on wide-sense stationary random processes  88
    3.3.3 White-noise processes  89
  3.4 Poisson processes  91
  3.5 Eigenvalue distributions of finite Wishart matrices  92
  3.6 Asymptotic eigenvalue distributions of Wishart matrices  92
    3.6.1 Marcenko–Pastur theorem  94
  3.7 Estimation and detection in additive Gaussian noise  95
    3.7.1 Estimation in additive Gaussian noise  95
    3.7.2 Detection in additive Gaussian noise  96
    3.7.3 Receiver operating characteristics  98
  3.8 Cramer–Rao parameter estimation bound  99
    3.8.1 Real parameter formulation  99
    3.8.2 Real multivariate Cramer–Rao bound  102
    3.8.3 Cramer–Rao bound for complex parameters  105
  Problems  116

4 Wireless communications fundamentals  118
  4.1 Communication stack  118
  4.2 Reference digital radio link  119
    4.2.1 Wireless channel  122
    4.2.2 Thermal noise  123
  4.3 Cellular networks  125
    4.3.1 Frequency reuse  127
    4.3.2 Multiple access in cells  128
  4.4 Ad hoc wireless networks  132
    4.4.1 Achievable data rates in ad hoc wireless networks  134
  4.5 Sampled signals  137
  Problems  138

5 Simple channels  141
  5.1 Antennas  141
  5.2 Line-of-sight attenuation  143
    5.2.1 Gain versus effective area  144
    5.2.2 Beamwidth  147
  5.3 Channel capacity  149
    5.3.1 Geometric interpretation  149
    5.3.2 Mutual information  156
    5.3.3 Additive Gaussian noise channel  159
    5.3.4 Additive Gaussian noise channel with state  162
  5.4 Energy per bit  165
  Problems  168

6 Antenna arrays  170
  6.1 Wavefront  170
    6.1.1 Geometric interpretation  172
    6.1.2 Steering vector  173
  6.2 Array beam pattern  174
    6.2.1 Beam pattern in a plane  176
  6.3 Linear arrays  179
    6.3.1 Beam pattern symmetry for linear arrays  182
    6.3.2 Fourier transform interpretation  182
    6.3.3 Continuous Fourier transform approximation  184
  6.4 Sparse arrays  186
    6.4.1 Sparse arrays on a regular lattice  186
    6.4.2 Irregular random sparse arrays  188
  6.5 Polarization-diverse arrays  196
    6.5.1 Polarization formulation  196
  Problems  198

7 Angle-of-arrival estimation  201
  7.1 Maximum-likelihood angle estimation with known reference  203
  7.2 Maximum-likelihood angle estimation with unknown signal  205
  7.3 Beamscan  205
  7.4 Minimum-variance distortionless response  207
  7.5 MuSiC  208
  7.6 Example comparison of spatial energy estimators  210
  7.7 Local angle-estimation performance bounds  211
    7.7.1 Cramer–Rao bound of angle estimation  211
    7.7.2 Cramer–Rao bound: signal in the mean  212
    7.7.3 Cramer–Rao bound: random signal  214
  7.8 Threshold estimation  218
    7.8.1 Types of transmitted signals  219
    7.8.2 Known reference signal test statistic  219
    7.8.3 Independent Rician random variables  221
    7.8.4 Correlated Rician random variables  226
    7.8.5 Unknown complex Gaussian signal  231
  7.9 Vector sensor  235
  Problems  237

8 MIMO channel  239
  8.1 Flat-fading channel  239
  8.2 Interference  241
    8.2.1 Maximizing entropy  242
  8.3 Flat-fading MIMO capacity  243
    8.3.1 Channel-state information at the transmitter  245
    8.3.2 Informed-transmitter (IT) capacity  247
    8.3.3 Uninformed-transmitter (UT) capacity  252
    8.3.4 Capacity ratio, c_IT/c_UT  256
  8.4 Frequency-selective channels  258
  8.5 2 × 2 Line-of-sight channel  259
  8.6 Stochastic channel models  264
    8.6.1 Spatially uncorrelated Gaussian channel model  265
    8.6.2 Spatially correlated Gaussian channel model  266
  8.7 Large channel matrix capacity  270
    8.7.1 Large-dimension Gaussian probability density  270
    8.7.2 Uninformed transmitter spectral efficiency bound  271
    8.7.3 Informed transmitter capacity  272
  8.8 Outage capacity  275
  8.9 SNR distributions  275
    8.9.1 Total power  277
    8.9.2 Fractional loss  279
  8.10 Channel estimation  281
    8.10.1 Cramer–Rao bound  283
  8.11 Estimated versus average SNR  286
    8.11.1 Average SNR  287
    8.11.2 Estimated SNR  287
    8.11.3 MIMO capacity for estimated SNR in block fading  289
    8.11.4 Interpretation of various capacities  290
  8.12 Channel-state information at transmitter  291
    8.12.1 Reciprocity  291
    8.12.2 Channel estimation feedback  292
  Problems  293

9 Spatially adaptive receivers  295
  9.1 Adaptive spectral filtering  298
    9.1.1 Discrete Wiener filter  298
  9.2 Adaptive spatial processing  300
    9.2.1 Spatial matched filter  301
    9.2.2 Minimum-interference spatial beamforming  303
    9.2.3 MMSE spatial processing  311
    9.2.4 Maximum SINR  313
  9.3 SNR loss performance comparison  315
    9.3.1 Minimum-interference beamformer  317
    9.3.2 MMSE beamformer  318
  9.4 MIMO performance bounds of suboptimal adaptive receivers  322
    9.4.1 Receiver beamformer channel  323
  9.5 Iterative receivers  328
    9.5.1 Recursive least squares (RLS)  328
    9.5.2 Least mean squares (LMS)  331
  9.6 Multiple-antenna multiuser detector  333
    9.6.1 Maximum-likelihood demodulation  333
  9.7 Covariance matrix conditioning  337
  Problems  339

10 Dispersive and doubly dispersive channels  341
  10.1 Discretely sampled channel issues  342
  10.2 Noncommutative delay and Doppler operations  344
  10.3 Effect of frequency-selective fading  345
  10.4 Static frequency-selective channel model  348
  10.5 Frequency-selective channel compensation  348
    10.5.1 Eigenvalue distribution of space-time covariance matrix  349
    10.5.2 Space-time adaptive processing  353
    10.5.3 Orthogonal-frequency-division multiplexing  354
  10.6 Doubly dispersive channel model  356
    10.6.1 Doppler-domain representation  357
    10.6.2 Eigenvalue distribution of space-time-frequency covariance matrix  358
  10.7 Space-time-frequency adaptive processing  361
    10.7.1 Sparse space-time-frequency processing  362
  Problems  362

11 Space-time coding  365
  11.1 Rate diversity trade-off  365
    11.1.1 Probability of error formulation  366
    11.1.2 Outage probability formulation  367
  11.2 Block codes  369
    11.2.1 Alamouti's code  371
    11.2.2 Orthogonal space-time block codes  373
  11.3 Performance criteria for space-time codes  374
  11.4 Space-time trellis codes  376
    11.4.1 Trellis-coded modulation  376
    11.4.2 Space-time trellis coding  376
  11.5 Bit-interleaved coded modulation  381
    11.5.1 Single-antenna bit-interleaved coded modulation  381
    11.5.2 Multiantenna bit-interleaved coded modulation  382
    11.5.3 Space-time turbo codes  384
  11.6 Direct modulation  385
  11.7 Universal codes  386
  11.8 Performance comparisons of space-time codes  388
  11.9 Computations versus performance  388
  Problems  390

12 2 × 2 Network  392
  12.1 Introduction  392
  12.2 Achievable rates of the 2 × 2 MIMO network  393
    12.2.1 Single-antenna Gaussian interference channel  393
    12.2.2 Achievable rates of the MIMO interference channel  397
  12.3 Outer bounds of the capacity region of the Gaussian MIMO interference channel  399
    12.3.1 Outer bounds to the capacity region of the single-antenna Gaussian interference channel  399
    12.3.2 Outer bounds to the capacity region of the Gaussian interference channel with multiple antennas  405
  12.4 The 2 × 2 cognitive MIMO network  408
    12.4.1 Non-cooperative primary link  409
    12.4.2 Cooperative primary link  412
  Problems  412

13 Cellular networks  414
  13.1 Point-to-point links and networks  414
  13.2 Multiple access and broadcast channels  414
  13.3 Linear receivers in cellular networks with Rayleigh fading and constant transmit powers  422
    13.3.1 Link lengths in cellular networks  423
    13.3.2 General network model  425
    13.3.3 Antenna-selection receiver  425
    13.3.4 Matched filter  427
    13.3.5 Linear minimum-mean-square-error receiver  429
    13.3.6 Laplacian of the interference  432
  13.4 Linear receivers in cellular networks with power control  436
    13.4.1 System model  437
    13.4.2 Optimality of parallelized transmissions with link CSI  438
    13.4.3 Asymptotic spectral efficiency of parallelized system  442
    13.4.4 Application to power-controlled systems without out-of-cell interference  445
    13.4.5 Monte Carlo simulations  446
  13.5 Matched-filter receiver in power-controlled cellular networks  448
    13.5.1 Application to power-controlled systems with out-of-cell interference  449
  13.6 Summary  467
  Problems  467

14 Ad hoc networks  470
  14.1 Introduction  470
    14.1.1 Capacity scaling laws of ad hoc wireless networks  470
  14.2 Multiantenna links in ad hoc wireless networks  475
    14.2.1 Asymptotic spectral efficiency of ad hoc wireless networks with limited transmit channel-state information and minimum-mean-square-error (MMSE) receivers  476
    14.2.2 Spatially distributed network model  478
    14.2.3 Asymptotic spectral efficiency without transmit channel-state information  480
    14.2.4 Maximum-signal-to-leakage-plus-noise ratio receiver  482
  14.3 Linear receiver structures in spatially distributed networks  484
    14.3.1 Linear MMSE receivers in Poisson networks  484
    14.3.2 Laplacian of the interference in Poisson networks and matched-filter and antenna-selection receivers  485
  14.4 Interference alignment  487
  Problems  491

15 Medium-access-control protocols  495
  15.1 The need for medium-access control  495
  15.2 The ALOHA protocol  496
  15.3 Carrier-sense multiple access (CSMA)  498
    15.3.1 CSMA with collision avoidance (CSMA/CA)  499
  15.4 Non-space-division multiple-access protocols  504
  15.5 Space-division multiple-access (SDMA) protocols  504
    15.5.1 Introduction  504
    15.5.2 A simple SDMA protocol  506
    15.5.3 SPACE-MAC  507
    15.5.4 The reciprocity assumption  509
    15.5.5 Ward protocol  509
    15.5.6 Summary of some existing SDMA protocols  513
  Problems  518

16 Cognitive radios  520
  16.1 Cognitive radio channel  521
    16.1.1 Cooperative cognitive links  522
  16.2 Cognitive spectral scavenging  522
    16.2.1 Orthogonal-frequency-division multiple access  523
    16.2.2 Game-theoretical analysis  523
  16.3 Legacy signal detection  524
    16.3.1 Known training sequence  524
    16.3.2 Single-antenna signal energy detection  524
    16.3.3 Multiple-antenna legacy signal detection  534
  16.4 Optimizing spectral efficiency to minimize network interference  538
    16.4.1 Optimal SISO spectral efficiency  540
    16.4.2 Optimal MIMO spectral efficiency  542
  Problems  545

17 Multiple-antenna acquisition and synchronization  547
  17.1 Flat-fading MIMO model  548
  17.2 Flat-fading MIMO delay-estimation bound  548
  17.3 Synchronization as hypothesis testing  550
    17.3.1 Motivations for test statistic approaches  551
  17.4 Test statistics for flat-fading channels  552
    17.4.1 Correlation  552
    17.4.2 MMSE beamformer  553
    17.4.3 Generalized-likelihood ratio test  554
    17.4.4 Spatial invariance  556
    17.4.5 Comparison of performance  557
  Problems  557

18 Practical issues  559
  18.1 Antennas  559
    18.1.1 Electrically small antennas  560
    18.1.2 Crossed polarimetric array  560
  18.2 Signal and noise model errors  560
  18.3 Noise figure  561
  18.4 Local oscillators  561
    18.4.1 Accuracy  562
    18.4.2 Phase noise  562
  18.5 Dynamic range  563
    18.5.1 Quantization  564
    18.5.2 Finite precision  565
    18.5.3 Analog nonlinearities  567
    18.5.4 Adaptive gain control  568
    18.5.5 Spurs  568
  18.6 Power consumption  568

References  569
Index  589

Preface

In writing this text, we hope to achieve multiple goals. Firstly, we hope to develop a textbook that is useful for graduate classes, or as a supplement to advanced undergraduate classes, investigating advanced wireless communications. These topics include adaptive antenna processing, multiple-input multiple-output (MIMO) communications, and wireless networks. Throughout the text, there is a recurring theme of understanding and mitigating both internal and external interference. In addressing these areas of investigation, we explore concepts in information theory, estimation theory, signal processing, and implementation issues as are applicable. We attempt to provide a development covering these topics in a reasonably organized fashion. While not always possible, we attempt to be consistent in notation across the text. In addition, we provide problem sets that allow students to investigate these topics more deeply.

Secondly, we attempt to organize the topics addressed so that this text will be useful as a reference. To the extent possible, each chapter is reasonably self-contained, although some familiarity with the topic area is assumed. To aid the reader, reviews of many of the mathematical tools needed within the text are collected in Chapters 2 and 3. In addition, an overview of the basics of communications theory is provided in Chapters 4 and 5.

Finally, in discussing these topics, we attempt to address a wide range of perspectives appropriate for the serious student of the area. Topics range from information-theoretic bounds, to signal-processing approaches, to practical implementation constraints. There are many wonderful texts (and here we only list a subset) that address many of the topics of wireless communications [355, 280, 287, 314, 115, 324, 251, 255, 203], networks [100, 62], signal processing [275, 297, 238, 220, 204], array processing [294, 223, 205, 312, 248, 189], MIMO communications [247, 331, 160, 45, 22, 183, 84], information theory [68, 202, 212], and estimation theory [312, 172, 297], and the serious researcher may wish to collect many of these texts. Nonetheless, we hope that this particular collection and presentation of topics is uniquely useful to the researcher in advanced communications.

Acknowledgments

I would like to thank and remember Professor David Staelin of Massachusetts Institute of Technology, whose interests and insights encouraged the authors to work together. I would like to thank my coauthor, who worked tirelessly with me to write this text. I would like to particularly thank Keith Forsythe of MIT Lincoln Laboratory, from whom I learned an immense amount over the years. A number of the concepts discussed in this text were developed by him or in collaboration with him. I will always be in debt for all that I learned from him. I would also like to thank (or blame) Jim Ward of MIT Lincoln Laboratory, who encouraged me to write this text. Actually, I would like to thank everyone in the Advanced Sensor Techniques Group at MIT Lincoln Laboratory. I have learned something from every one of you.

We thank the many individuals who have contributed comments and suggestions for the book: Pat Bidigare, Nick Chang, Glenn Fawcett, Jason Franz, Alan Fenn, Anatoly Goldin, Tim Hancock, Gary Hatke, Yichuan Hu, Scott Johnson, Josh Kantor, Nick Kaminski, Paul Kolodzy, Shawn Kraut, Raymond Louie, Adam Margetts, Matt McKay, Cory Myers, Peter Parker, Thomas Stahlbuhk, Vahid Tarokh, Gary Whipple, and Derek Young. In particular, we thank Bruce McGuffin and Ameya Agaskar, who provided a significant number of comments. We would like to thank Dorothy Ryan for all her many helpful comments.

To the folks at the Atomic Bean Cafe off of Harvard Square, thanks for all the espressos and for letting me spend many, many, many hours writing there. Finally, I would like to thank my family for their support. To my wife Nadya and daughter Coco: you may see more of me now. You can decide if that is good or bad.

Dan Bliss
Cambridge, MA


I would like to thank and remember Professor David H. Staelin, formerly of the Massachusetts Institute of Technology, for his inspiration, guidance, and mentorship, and in particular for introducing me to my coauthor. I would like to thank my coauthor for his insight, mentorship, and for being the driving force behind this book. I would also like to thank my former colleague at MIT, Danielle Hinton, in particular for insightful discussions on multiantenna protocols. I am grateful to my colleagues at Olin College, including Brad Minch, Mark Somerville, and Vin Manno, for their encouragement and general discussions, both technical and non-technical. I would also like to thank my students and former students at Olin College, in particular Yifan Sun, Annie Martin, Rachel Nancollas, Katarina Miller, Jacob Miller, Jeff Hwang, Sean Shi, Elena Koukina, Yifei Feng, Rui Wang, Raghu Rangan, Tom Lamar, Avinash Uttamchandani, Ashley Lloyd, Junjie Zhu, and Chloe Egthebas, for their direct and indirect contributions to this work, and in particular for helping me refine my presentation of some of the material that has made its way into the book. Finally, I would like to thank Alo, Antariksh, my parents, parents-in-law, siblings, and the rest of my family for their patience and tireless support.

Siddhartan Govindasamy
Natick, MA

1

History

For better or worse, wireless communications have become integrated into many aspects of our daily lives. When communication systems work well, they almost magically enable us to access information from distant, even remote, sources. If one were to take a modern “smart” phone a couple of hundred years into the past, one would notice a couple of things very quickly. First, most of the capability of the phone would be lost because a significant portion of the phone’s capabilities are based upon access to a communications network. Second, being burned at the stake as a witch can make for a very bad day. There are many texts that present the history of wireless communications in great detail, for example in References [186, 48, 146, 304, 61]. Many of the papers of historical interest are reprinted in Reference [348]. Because of the rich history of wireless communications, a comprehensive discussion would require multiple texts on each area. Here we will present an abridged introduction to the history of wireless communications, focusing on those topics more closely aligned with the technical topics addressed later in the text, and we will admittedly miss numerous important contributors and events. The early history of wireless communications covers development in basic physics, device physics and component engineering, information theory, and system development. Each of these aspects is important, and modern communication systems depend upon all of them. Modern research continues to develop and refine components and information theory. Economics and politics are an important part of the history of communications, but they are largely ignored here.

1.1 Development of electromagnetics

While he was probably not the first to make the observation that there is a relationship between magnetism and electric current, the Danish physicist Hans Christian Ørsted observed this relationship in 1820 [239] and ignited investigation across Europe. Most famously, he demonstrated that current flowing in a wire would cause a compass to change directions. Partly motivated by Ørsted's results, the English physicist and chemist Michael Faraday made significant advancements in the experimental understanding of electromagnetics [304] in the early 1800s. Importantly for our purposes, he showed that changing current in one coil could induce current in another remote coil.


While this inductive coupling is not the same as the electromagnetic waves used in most modern wireless communications, it is the first step down that path.

The Scottish physicist James Clerk Maxwell made amazing and rich contributions to a number of areas of physics. Because of his contributions in the area of electromagnetics [211], the fundamental description of electromagnetics bears his name. While Maxwell might not immediately recognize them in this form, Maxwell's equations in the international system of units (SI) [290, 178] are the fundamental representation of electromagnetics and are given by

∇ · d = ρ
∇ · b = 0
∇ × e = −∂b/∂t
∇ × h = j + ∂d/∂t ,  (1.1)

where ∇ indicates a vector of spatial derivatives, · is the inner product, × is the cross product, t is time, ρ is the charge density, j is the current density vector, d is the electric displacement vector, e is the electric field vector, b is the magnetic flux density vector, and h is the magnetic field vector. The electric displacement and electric field are related by

d = ǫ e
b = μ h ,  (1.2)

where ǫ is the permittivity and μ is the permeability of the medium. These are the underpinnings of all electromagnetic waves and thus modern communications. In 1888, the German physicist Heinrich Rudolf Hertz convincingly demonstrated the existence of the electromagnetic waves predicted by Maxwell [144, 178]. To demonstrate the electromagnetic waves, he employed a spark-gap transmitter. At the receiver, the electromagnetic waves coupled into a loop with a very small gap across which a spark would appear. The spark-gap transmitter with various modifications was a standard tool for wireless communications research and systems for a number of following decades.

1.2 Early wireless communications

In the late 1800s, significant and rapid advances were made. Given the proliferation of wireless technologies and the penetration of these technologies into every area of our lives, it is remarkable that before the late 1800s little was known about even the physics of electromagnetics. Over the years, there have been various debates over the primacy of the invention of wireless communications. Who invented wireless communications often comes down to a question of semantics. How many of the components do you need before you call it a radio?


As is often true in science and engineering, it is clear that a large number of individuals performed research in the area of wireless communications or, as it was often called, wireless telegraphy. The following is an incomplete list of important contributors.

In 1872, before Hertz's demonstration, a patent was issued to the American¹ inventor and dentist Mahlon Loomis for wireless telegraphy [193]. While his system reportedly worked, with some apparent reliability issues, his contributions were not widely accepted during his life. This lack of acceptance was likely partly due to his inability to place his results in the scientific context of the time.

¹ Throughout this chapter we employ the common, if imprecise, usage of "American" to indicate a citizen of the United States of America.

In 1886, American physicist Amos Emerson Dolbear received a patent for a wireless communication system [82]. This patent later became a barrier to the enforcement of Guglielmo Marconi's patents on wireless communications in the United States, until the Marconi Company purchased Dolbear's patent. It is worth noting that this was also before Hertz's demonstration.

In 1890, the French physicist Edouard Eugene Desire Branly developed an important device used to detect electromagnetic waves. The so-called "coherer" employed a tube containing metal filings filling a gap between two electrodes and exploited a peculiar phenomenon of these filings [279]. When exposed to radio-frequency signals, the filings would fuse or cling together, thus reducing the resistance across the electrodes. British physicist Sir Oliver Joseph Lodge refined the coherer by adding a "trembler" or "decoherer" that mechanically disrupted the fused connections. Many of the early experiments in wireless communications employed variants of the coherer.

The Serbian-born American engineer Nikola Tesla was one of those larger-than-life characters. He made significant contributions to a number of areas of engineering, but with regard to our interests, he received a patent for wireless transmission of power in 1890 [309] and demonstrated electromagnetic transfer of energy in 1893 [310]. Tesla is rightfully considered one of the significant contributors to the invention of wireless communications.

The Bengal-born Indian scientist Jagdish Chandra Bose contributed significantly to a number of areas of science and engineering. He was one of the early researchers in wireless communication and developed an improved coherer. In 1895, he demonstrated radio communication with a rather dramatic flair [107]: by using a wireless communication link, he remotely set off a small explosive that rang a bell. His improved coherer was a significant contribution to wireless communications. His version of the coherer replaced the metal filings with a metal electrode in contact with a thin layer of oil that was floating on a small pool of mercury. When exposed to radio-frequency signals, the conductivity across the oil film would change. Marconi used a similar coherer for his radio system.

The German physicist Karl Ferdinand Braun developed a number of important technologies that contributed to the usefulness of wireless communication. He developed tuned circuits for radio systems, the cat's whisker detector (really an early diode), and directional antenna arrays.


In 1909, he shared the Nobel Prize in physics with Guglielmo Marconi for his contributions.

The Russian physicist Alexander Stepanovich Popov presented results on his version of a coherer to the Russian Physical and Chemical Society on May 7th, 1895 [304]. He demonstrated links that transmitted radio waves between buildings. As an indication of the importance of this technology to society, May 7th is celebrated as Radio Day in the Russian Federation.

The Italian engineer Guglielmo Marconi began research in wireless communications in 1895 [304] and pursued a sustained, intense, and eventually well-funded research and development program for many years to follow. He received the Nobel Prize in physics (with Karl Ferdinand Braun) in 1909 for his contributions to the development of wireless radios [304]. While he is not the inventor of radio, as is sometimes suggested, his position as principal developer cannot be dismissed. His research, development, and resulting company provided the impetus to the commercialization of wireless communications. In 1896 Marconi moved to England, and during that and the following year he provided a number of demonstrations of the technology. In 1901, he demonstrated a transatlantic wireless link, and in 1907 he established a regular transatlantic radio service.

A somewhat amusing (or annoying, if you were Marconi) public demonstration of the effects of potential interference in wireless communications was provided in 1903 by British magician and inventor Nevil Maskelyne [146]. Maskelyne was annoyed with Marconi's broad patents and his claims of security in his wireless system. During a public demonstration of Marconi's system for the Royal Institution, Maskelyne repeatedly transmitted the Morse code signal "rats" and other insulting comments, which were received at the demonstration of the system that was supposedly immune to such interference. Previously, in 1902, Maskelyne had developed a signal interception system that was used to receive signals from Marconi's ship-to-shore wireless system. Marconi had claimed his system was immune to such interception because of the precise frequency tuning required for reception.

In the first few decades of the twentieth century, wireless communication quickly evolved from a technical curiosity to a useful technology. An important technology that enabled widespread use of wireless communication was amplification. The triode vacuum-tube amplifier was developed by American engineer Lee de Forest. He filed a patent for the triode (originally called the de Forest valve) in 1907 [95]. The triode enabled increased power at transmitters and increased sensitivity at receivers. It was the fundamental technology until the development of the transistor decades later.

In the late 1910s, a number of experimental radio broadcast stations were constructed [304]. In the early 1920s, the number of radio broadcast stations exploded, and wireless communications began its integration into everyday life.

During the Second World War, the concept of tactical communications underwent dramatic development. The radios became small enough and sufficiently robust that a single soldier could carry them. It became common for various military organizations to make wireless communications available to relatively small groups of soldiers, allowing the soldiers to operate with greater effectiveness and with access to external support. By the end of the Second World War, American soldiers had access to handheld "handie-talkies" [265], such as the Motorola SCR-536 or BC-611, that are recognizable as the technical forebears of modern handheld communications devices.

1.3 Developing communication theory

In 1900, Canadian-born American engineer Reginald Aubrey Fessenden [304] employed a high-frequency spark-gap transmitter to transmit an audio signal by using amplitude modulation (AM). Fessenden also developed the concept of the heterodyne receiver at about the same time, although the device technology available at that time did not support its use. The heterodyne receiver would multiply the signal at the carrier frequency by a tone from a local oscillator, so that the beat frequency was within the audible frequency range. In 1918, American engineer Edwin Howard Armstrong extended the heterodyne concept, denoted the superheterodyne receiver, by having the mixed signal beat to a fixed intermediate frequency. A second stage then demodulates the intermediate-frequency signal down to the audible frequency range. This approach and similar variants have become the standard for most of modern communications. In 1933, Armstrong also patented another important communications concept, frequency modulation (FM).

If one had to pick the greatest single contribution to communications, most researchers would probably identify the formulation of information theory [284], published in 1948 by American mathematician and engineer Claude Elwood Shannon. In his work, Shannon developed the limits on the capacity of a communication channel in the presence of noise. It is worth noting the contributions of American engineer Ralph Vinton Lyon Hartley, who developed bounds for the number of levels per sample with which a communication system can communicate at a given voltage resolution [138]. Hartley's results were a precursor to Shannon's results.

The observation by Shannon that effectively error-free communication is theoretically possible even in non-zero noise increased the motivation for the development of error-correction codes. Examples of early block codes to compensate for noise were developed by Swiss-born American mathematician and physicist Marcel J. E. Golay [113] and American mathematician Richard Wesley Hamming [134]. Over time, a large number of error-correcting codes and decoding algorithms were developed. The best of these codes closely approach the Shannon limit.

Developed during the Second World War and published in 1949, the text of American mathematician, zoologist, and philosopher Norbert Wiener presented statistical signal processing [346]. In his text, he developed the statistical signal processing techniques that dominate signal processing to this day. Addressing a similar set of technical issues, the prolific Russian mathematician Andrey Nikolaevich Kolmogorov published his results in 1941 [176].
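To make the heterodyne principle described earlier in this section concrete, the following is a minimal numeric sketch of our own (not from the original text); the sample rate, carrier, local-oscillator frequency, and envelope are arbitrary illustrative choices.

    # Minimal heterodyne sketch: mixing a modulated carrier with a local
    # oscillator shifts the message down to the beat frequency.
    # All frequencies below are illustrative assumptions.
    import numpy as np

    fs = 48_000.0                        # sample rate (Hz)
    t = np.arange(0, 0.1, 1.0 / fs)      # 100 ms of samples
    f_carrier, f_lo = 10_000.0, 9_000.0  # carrier and local oscillator (Hz)

    message = 1.0 + 0.5 * np.cos(2 * np.pi * 50.0 * t)   # slow AM envelope
    received = message * np.cos(2 * np.pi * f_carrier * t)

    # Heterodyning: the product contains components near f_carrier - f_lo
    # = 1 kHz (the audible beat) and f_carrier + f_lo = 19 kHz. A real
    # receiver low-pass filters; here we just inspect the low spectrum.
    mixed = received * np.cos(2 * np.pi * f_lo * t)

    spectrum = np.abs(np.fft.rfft(mixed))
    freqs = np.fft.rfftfreq(mixed.size, 1.0 / fs)
    low = freqs < 5_000.0
    peak = freqs[low][np.argmax(spectrum[low])]
    print(f"strongest low-frequency component: {peak:.0f} Hz")  # ~1000 Hz

The superheterodyne refinement simply chooses the local oscillator so that this beat lands at a fixed intermediate frequency, where one well-tuned second stage can do all subsequent processing.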

A frequency-hopping modulation enables a narrowband system to operate over a wider bandwidth by changing the carrier frequency as a function of time. A variety of versions of frequency hopping were suggested over time, and the identity of the original developer of frequency hopping is probably lost because of the secrecy surrounding this modulation approach. However, in what must be considered a relatively unexpected source of contribution to communication modulation technology, a frequency-hopping patent was given to Austrian-born American actress Hedy Lamarr (filed as Hedy Kiesler Markey) and American composer George Antheil [208]. The technology exploited a piano roll as a key to select carrier frequencies of a frequency-hopping system.

As opposed to frequency hopping, direct-sequence spread spectrum (DSSS) modulates a relatively narrowband signal with a wideband sequence. The receiver, knowing this sequence, is able to recover the original narrowband signal. This technology is exploited by code-division multiple-access (CDMA) approaches to enable the receiver to disentangle the signals sent from multiple users at the same time and frequency. The origins of direct-sequence spread spectrum are partly a question of semantics. An early German patent was given to German engineers Paul Kotowski and Kurt Dannehl for a communications approach that modulated voice with a rotating generator [278]. This approach has a loose similarity to the digital spreading techniques used by modern communication systems.

In the early 1950s, the noise modulation and correlation (NOMAC) system for direct-sequence spread-spectrum communications was developed and demonstrated at Massachusetts Institute of Technology Lincoln Laboratory [338]. In 1952, the first tests of the communication system were performed. The system drew heavily from the doctoral dissertation of American engineer Paul Eliot Green, Jr. [338], who was one of the significant contributors to the NOMAC system at Lincoln Laboratory.

Because direct-sequence spread-spectrum systems are spread over a relatively wide bandwidth, they can temporally resolve multipath more easily. Consequently, the received signal can suffer from intersymbol interference. To compensate for this effect, the rake receiver, developed in 1958 by American engineers Robert Price and Paul Eliot Green, Jr. [254, 338], implemented a form of channel equalization.

During the late 1950s, the ARC-50 radio was designed and tested [278]. Magnavox's ARC-50 was an operational radio that is recognizable as a modern direct-sequence spread-spectrum system.
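As a toy illustration of the direct-sequence idea described above (our own sketch under assumed parameters, not the NOMAC or ARC-50 design; the spreading factor, noise level, and BPSK alphabet are chosen for clarity):

    # Toy direct-sequence spread-spectrum link: each data symbol is
    # multiplied by a known +/-1 chip sequence; the receiver despreads by
    # correlating each block against the same sequence.
    import numpy as np

    rng = np.random.default_rng(0)
    n_chips = 31                                   # chips per symbol
    chips = rng.choice([-1.0, 1.0], size=n_chips)  # shared chip sequence

    symbols = np.array([1.0, -1.0, -1.0, 1.0])     # BPSK data
    tx = np.concatenate([s * chips for s in symbols])  # wideband signal

    rx = tx + 0.8 * rng.standard_normal(tx.size)   # additive channel noise

    # Despread: correlate each symbol-length block with the chip sequence.
    blocks = rx.reshape(len(symbols), n_chips)
    decisions = np.sign(blocks @ chips)            # hard decisions
    print(decisions)                               # typically [ 1 -1 -1  1]

The correlation gain grows with the number of chips per symbol, which is why the despread decisions survive noise that would overwhelm any single chip.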

1.4

Television broadcast While television technology is not a focus of this text, its importance in the development of wireless technology cannot be ignored. Given the initial success

1.5 Modern communications advances

7

of wireless data and then voice radio communications, it didn’t take long for researchers to investigate the transmission of images. Because of the significant increase in the amount of information in a video image compared to voice, it took decades for a viable system to be developed. Early systems often involved mechanically scanning devices. German engineers Max Dieckmann and Rudolf Hell patented [81] an electrically scanning tube receiver that is recognizable in concept to televisions used for the following seventy years. Apparently, they had difficulty developing their concept to the point of demonstration. Largely self-taught, American engineer Philo Taylor Farnsworth, developed concepts for the first electronically scanning television receiver (“image dissector”) that he conceived as a teenager and for which he filed a patent several years later in 1927 [91]. In 1927, he also demonstrated the effectiveness of his approach. During a similar period of time, while working for Westinghouse Laboratories, Russian-born American engineer Vladimir K. Zworykin filed a patent in 1923 for his version of a tube-based receiver [365]. However, the U.S. Patent Office awarded primacy of the technology to Farnsworth. In 1939, RCA, the company for which Zworykin worked, demonstrated a television at the New York World’s Fair. Regular broadcasts soon began; these are often cited as the beginning of the modern television broadcast era.

1.5 Modern communications advances

In the modern age of wireless communications, with a few notable exceptions, it is more difficult to associate particular individuals with significant advances. During this era, communications systems have become so complicated that large numbers of individuals contribute to any given radio. It is sometimes easier to identify individuals who made significant theoretical contributions. However, so many significant contributions have been made that only a small subset, particularly salient to the discussions found in this text, are identified here.

While satellite communications are clearly wireless communications, this type of communication is not emphasized in this text. Nonetheless, the importance of satellite communications should not be underestimated. The first communication satellite, launched in 1958, was named signal communications orbit relay equipment (SCORE) [74]. It was developed under an Advanced Research Projects Agency (ARPA, later renamed Defense ARPA or DARPA) program and demonstrated the viability of these satellites. It used both prerecorded and receive-and-forward messages that were broadcast on a shortwave signal.

Italian-born American engineer Andrew James Viterbi made numerous contributions to wireless communications. However, his most famous contribution is the development in 1967 of what is now called the Viterbi algorithm [327].


This algorithm specified the decoding of convolutional codes via a dynamic-programming approach that tracks the most likely sequences of states. In some ways, this development marked the beginning of the modern era of communications. Both in terms of the improvement in receiver performance and the computational requirements for the receiver, this is a modern algorithm.

From the time of Golay and Hamming, coding theory steadily advanced. By the end of the 1980s, coding had reached a point of diminishing returns. Over time, advances slowed and the focus of research was placed on implementations. However, coding research was reinvigorated in 1993 by the development of turbo codes by French engineers Claude Berrou and Alain Glavieux and Thai engineer Punya Thitimajshima [19]. The principal contribution of these codes is that they enabled an iterative receiver that significantly improved performance.

One of the defining moments of the modern era was in 1973, when American engineer Martin Cooper placed the first mobile phone call. His team at Motorola was the first to develop and integrate their wireless cellular phone system into the wired phone network [63]. It is somewhat amusing that Cooper's first phone call was made to a competing group of cellular engineers at Bell Laboratories.

While numerous researchers contributed significantly to this area of investigation, the Spanish-born American engineer Sergio Verdu is typically identified as the principal developer of multiuser detection (MUD) [322]. In systems in which multiple users are transmitting signals at the same time and frequency, under certain conditions a receiver, even with a single receive antenna, can disentangle the multiple transmitted signals by exploiting the structural differences in the waveforms of the various signals. Because of the computational complexity and potential system advances of multiuser detection, this is a quintessentially modern communications concept.

Numerous researchers suggested multiple-antenna communications systems in a variety of contexts. These suggestions are both in the context of multiple antennas at either receiver or transmitter, and in terms of multiuser systems. For example, multiple-input multiple-output (MIMO) systems were suggested by American engineers Jack H. Winters, Jack Salz, and Richard D. Gitlin [350, 351], and by Indian-born American engineers Arogyaswami J. Paulraj and Thomas Kailath [245]. Because he developed an entire system concept, the initial development of MIMO communications concepts is typically attributed to the American engineer Gerard Joseph Foschini who, in his 1996 paper [99], described a multiple-transmit and multiple-receive antenna communication system. In this system, the data were encoded across the transmit antennas, and the receiver disentangled these signals from the multiple transmit antennas.

In order to exploit MIMO communications links, some sort of mapping from the information bits to the baseband signal must be used. These mappings are typically called space-time codes. The trivial approach employs a standard single-antenna modulation and then demultiplexes these signals among the multiple transmit antennas. However, this approach suffers from poor performance because the required signal-to-noise ratio (SNR) is set by the SNR from the transmitter with the weakest propagation.

The most basic concept for an effective space-time code is the Alamouti block code. This code, patented by Iranian-born American engineers Siavash M. Alamouti and Vahid Tarokh [7], is described in Reference [8]. Tarokh and his colleagues extended these concepts to include larger space-time block codes [305] and space-time trellis codes [307].
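To show what such a mapping looks like in practice, here is a minimal sketch of our own (an illustration of the scheme described in Reference [8]; the channel gains, noise level, and QPSK symbols are arbitrary assumptions) of Alamouti's two-antenna block code with linear combining at a single receive antenna.

    # Alamouti block code over a flat-fading channel: two symbols are sent
    # over two symbol times from two antennas; one receive antenna recovers
    # both with simple linear combining.
    import numpy as np

    rng = np.random.default_rng(1)
    s1, s2 = (1 + 1j) / np.sqrt(2), (-1 + 1j) / np.sqrt(2)  # QPSK symbols
    h1, h2 = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    n1, n2 = 0.05 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))

    # Time 1: antennas send (s1, s2); time 2: (-conj(s2), conj(s1)).
    r1 = h1 * s1 + h2 * s2 + n1
    r2 = -h1 * np.conj(s2) + h2 * np.conj(s1) + n2

    # Linear combining decouples the symbols (up to the channel power).
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (np.conj(h1) * r1 + h2 * np.conj(r2)) / gain
    s2_hat = (np.conj(h2) * r1 - h1 * np.conj(r2)) / gain
    print(np.round(s1_hat, 2), np.round(s2_hat, 2))  # close to s1, s2

Each estimate sees the combined power of both channel paths, which is exactly the diversity benefit that the trivial demultiplexing approach described above fails to obtain.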

1.5.1 Early packet-radio networks

The ALOHA system (also known as ALOHAnet), which was developed by Norman Abramson and colleagues at the University of Hawaii beginning in 1968 [3], was one of the first modern wireless networks. The system involved packet radio transmissions using transceivers distributed on several islands in Hawaii. The underlying communication protocol used in ALOHAnet is now commonly known as the ALOHA protocol. The ALOHA protocol uses a simple and elegant random-access technique well suited to packet communications systems. This protocol is described in more detail in Section 15.2. ALOHAnet was operated in a star network configuration, where a central station routed packets from source to destination terminals.

Ad hoc wireless networks, which are networks with no centralized control, received attention from the United States Department of Defense (DoD) starting in the early 1970s. The DoD was interested in such networks for their battlefield survivability and the reduced infrastructure requirements in battlefields, among other factors [103]. Through ARPA, the DoD developed several packet radio communications systems such as the packet radio network (PRNet), whose development began in 1972 [103]. This network was followed by a packet radio communication system for computer communications networks in San Francisco in 1975. RADIONET, as it was called [169], differed from ALOHAnet in that it had distributed control of the network management functions and used repeaters for added reliability and increased coverage. Another notable feature of RADIONET is its use of spread-spectrum signaling to improve robustness against jamming.

RADIONET was followed by several different efforts by DARPA through the 1970s and early 1980s to develop ad hoc wireless networks for military use. Notable among these is the low-cost packet radio (LPR) system, which was an outcome of DARPA's Survivable Radio Networks (SURAN) program. LPR used digitally controlled spread-spectrum radios with packet switching operations implemented on an Intel 8086 microprocessor [103].

Another important development in the history of wireless networks is the development of the wired Ethernet protocol by Robert Metcalfe and colleagues at the Xerox Palo Alto Research Center (PARC) in the early to mid 1970s [214]. Ethernet used carrier-sense multiple-access (CSMA) technology (described in more detail in Section 15.3) and by 1981 offered packet data communication rates of 10 Mbps in commercially available systems at relatively low cost. In contrast, wireless packet networks offered data rates of only a few thousand bits per second at reasonable costs and equipment size. The enormous data rates offered by Ethernet at low cost perhaps reduced the interest in developing wireless networking technologies for commercial use.
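Returning to the ALOHA protocol mentioned above: as a brief numeric aside (our own sketch; the protocol itself is treated in Section 15.2, and the offered load G = 0.5 here is an arbitrary assumption), a short Monte Carlo run reproduces the classical pure-ALOHA throughput G·e^(−2G).

    # Monte Carlo check of pure-ALOHA throughput: a packet succeeds only if
    # no other packet starts within one packet time on either side of it.
    import numpy as np

    rng = np.random.default_rng(2)
    G = 0.5                        # offered load (packets per packet time)
    T = 200_000.0                  # simulated time, in packet durations
    starts = np.sort(rng.uniform(0.0, T, size=rng.poisson(G * T)))

    gaps_prev = np.diff(starts, prepend=-np.inf)  # time since previous start
    gaps_next = np.diff(starts, append=np.inf)    # time until next start
    success = (gaps_prev > 1.0) & (gaps_next > 1.0)

    print(f"simulated throughput  {success.sum() / T:.3f}")
    print(f"theory G*exp(-2G)     {G * np.exp(-2 * G):.3f}")   # ~0.184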

Interest in wireless networks for commercial use increased after the U.S. Federal Communications Commission (FCC) established the industrial, scientific, and medical (ISM) frequency bands for unlicensed use in the United States in 1985. The ISM bands are defined in Section 15.247 of the Federal Communications Commission rules. The freeing of a portion of the electromagnetic spectrum for unlicensed use sparked a renewed interest in developing wireless networking protocols [103]. Other major developments in the late 1980s and 1990s that increased interest in wireless networks were the increased use of portable computers, the internet, and significant reductions in hardware costs. Since portable computer users wanted to access the internet and remain portable, wireless networking became essential.

1.5.2 Wireless local-area networks

In 1997, what may be considered the grand experiment in wireless communications was initiated: the standard IEEE 802.11 [150], or WiFi, was finalized for use in the industrial, scientific, and medical frequency band. While wireless communications were available previously, WiFi established a standard that enabled moderately high data rates that could be integrated into interoperable devices. This personal local-area wireless networking standard allowed individuals to set up their own networks with relative ease at a moderate price. Over the years, a number of extensions to the original standard have been developed. Of particular interest is the development of IEEE 802.11n, which was finalized in 2009 [151] (although many systems were developed using earlier drafts). This provided a standard for WiFi multiple-input multiple-output (MIMO) wireless communications.

The IEEE 802.11 family of standards marked a turning point in the development of wireless networks, as they were instrumental in making wireless local-area networks (W-LAN) ubiquitous throughout the world. Wireless LANs running some version of the IEEE 802.11 protocol have become so common that the term "WiFi," commonly used to signify compatibility with the 802.11 standard, made its debut in Webster's New College Dictionary in 2005 [213].

Almost in parallel with the development of the IEEE 802.11 protocols, the European Telecommunications Standards Institute (ETSI) developed its own protocol for wireless networking called HiperLAN (High Performance Radio LAN). HiperLAN/1 offered data transfer rates in excess of 20 Mb/s and thus had significantly higher data transmission rates than the existing IEEE 802.11 standard at the time [66]. The IEEE 802.11 standard incorporated a number of technical extensions that were both a good match to the computational capabilities of the time and provided paths to higher data rates in IEEE 802.11g. Over time, the HiperLAN standard lost market share to the IEEE 802.11 standards.


At some point between the years 2000 and 2010, a rather significant change occurred in the use of wireless communications. The dominant use of wireless communication links transitioned from broadcast systems, such as radio or television, to two-way, personal-use links, such as mobile phones or WiFi. At that point, it came to be considered strange not to be in continuous wireless contact with the web. Not only did this change our relationship with information, possibly fundamentally altering the nature of the human condition, but it also changed forever the nature of trivia arguments held in bars and pubs around the world.

2

Notational and mathematical preliminaries

This chapter contains a number of useful definitions and relationships used throughout the text. In the remainder of the text, it is assumed that the reader has familiarity with these topics. In general, the relationships are stated without proof, and the reader is directed to dedicated mathematical texts for further detail [40, 180, 54, 117, 18, 217].

2.1 Notation

2.1.1 Table of symbols

a ∈ S    a is an element of the set S
∃x       there exists an x
a*       complex conjugate of a
A†       Hermitian conjugate of A (in some of the engineering literature this operator is indicated by ·^H)
∀x       for all x
(2.1)

2.1.2 Scalars

A scalar is indicated by a non-bold letter such as a or A. Scalars can be integer Z, real R, or complex numbers C:

a ∈ Z, a ∈ R, or a ∈ C,  (2.2)

respectively. The square root of −1 is indicated by i,

√−1 = i.  (2.3)

The Euler formula for an exponential for some real angle α ∈ R in terms of radians is given by

e^{iα} = cos(α) + i sin(α).  (2.4)


An arbitrary complex number a ∈ C can be expressed in terms of polar coordinates with a radius ρ ∈ R and an angle α ∈ R,

a = ρ e^{iα} = ρ cos(α) + i ρ sin(α),  (2.5)

where the real and imaginary parts of a are indicated by

ℜ{a} = ρ cos(α) and ℑ{a} = ρ sin(α),  (2.6)

respectively. The complex conjugate of a variable is indicated by

a* = (ρ e^{iα})* = ρ e^{−iα}.  (2.7)

The value of i can also be expressed in an exponential form,

i = e^{iπ/2 + i2πm}  ∀ m ∈ Z.  (2.8)

Consequently, exponents of i can be evaluated. For example, the inverse of i is given by

i^{−1} = 1/i = e^{−iπ/2} = cos(−π/2) + i sin(−π/2) = 0 − i.  (2.9)
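These scalar identities are easy to verify numerically. The following short check (our addition, using sample values α = 0.7 and ρ = 2) confirms Equations (2.4), (2.7), and (2.9) with Python's standard complex arithmetic.

    # Numeric check of the Euler formula, polar conjugation, and 1/i.
    import cmath, math

    alpha, rho = 0.7, 2.0
    a = rho * cmath.exp(1j * alpha)

    # Euler formula (2.4): e^{i alpha} = cos(alpha) + i sin(alpha)
    assert cmath.isclose(cmath.exp(1j * alpha),
                         complex(math.cos(alpha), math.sin(alpha)))

    # Conjugation in polar form (2.7): (rho e^{i alpha})* = rho e^{-i alpha}
    assert cmath.isclose(a.conjugate(), rho * cmath.exp(-1j * alpha))

    # Inverse of i (2.9): i^{-1} = e^{-i pi/2} = 0 - i
    assert cmath.isclose(1 / 1j, cmath.exp(-1j * math.pi / 2))
    print(1 / 1j)   # (-0-1j), i.e., 0 - i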

The logarithm of variable x ∈ R, assuming base a ∈ R, is indicated

log_a(x),  (2.10)

such that

log_a(a^y) = y,  (2.11)

under the assumption that a and y are real. When the base is not explicitly indicated, it is assumed that a natural logarithm (base e) is intended (in some texts, log(x) indicates the logarithm base 10 or base 2 rather than the natural logarithm assumed here), such that

log(x) = log_e(x).  (2.12)

The translation between bases a and b of variable x is given by

log_b(x) = log_a(x) / log_a(b) = log_b(a) log_a(x).  (2.13)
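A quick numeric check of the base-translation identity (2.13), added here for concreteness (the values of x and the bases are arbitrary):

    # Check log_b(x) = log_a(x)/log_a(b) = log_b(a) * log_a(x).
    import math

    x, a, b = 42.0, 10.0, 2.0
    lhs = math.log(x, b)
    assert math.isclose(lhs, math.log(x, a) / math.log(b, a))
    assert math.isclose(lhs, math.log(a, b) * math.log(x, a))
    print(f"log_2(42) = {lhs:.6f}")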


The logarithm can be expanded about 1,

log(1 + x) = Σ_{m=1}^{∞} (−1)^{m+1} x^m / m ,  for |x| < 1,
           ≈ x ,  for small x.  (2.14)

Consequently, it can be shown for finite values of x that

e^x = lim_{n→∞} (1 + x/n)^n.  (2.15)

For real base a, the logarithm of a complex number z ∈ C, such that z can be represented in the polar notation of Equation (2.5),

z = ρ e^{iα},  (2.16)

is given by

log_a(z) = log_a(ρ) + i(α + 2πm) / log(a) ,  m ∈ Z.  (2.17)

For complex numbers, the logarithm is multivalued because any addition of a multiple of 2πi to the argument of the exponential provides an equal value for z. If the imaginary component produced by the natural (base e) logarithm is greater than −π and less than or equal to π, then it is considered the principal value. A Dirac delta function (which technically is not a function) [40] is generally used within the context of an integral, for example, with a real parameter a, the integral over the real variable x and well-behaved function f (x)  ∞ dx f (x) δ(x − a) = f (a) . (2.18) −∞

The floor and ceiling operators are indicated by

  ⌊·⌋ and ⌈·⌉,   (2.19)

which round down and up to the nearest integer, respectively. For example, the floor and ceiling of 3.7 are given by ⌊3.7⌋ = 3 and ⌈3.7⌉ = 4, respectively.

For many problems, the notions of convex and concave functions are useful. Over some range, a function is considered convex if every line segment connecting two points on the function lies on or above the function. Similarly, over some range, a function is considered concave if every line segment connecting two points on the function lies on or below the function.

2.1.3 Vectors and matrices

An important concept employed throughout the text is the notion of a vector space, which is discussed in the study of linear algebra. Without significant discussion, we will assume that vector spaces employed within the text satisfy the typical requirements of a Hilbert space, having inner products and norms. A vector is indicated by a bold lowercase letter. For example, a column n-vector of complex values is indicated by

  a ∈ C^{n×1}.   (2.20)

A row n-vector is indicated by a bold lowercase letter with an underscore,

  a̲ ∈ C^{1×n}.   (2.21)

The mth element in a is denoted

  (a)_m, or {a}_m.   (2.22)

A matrix with m rows and n columns is indicated by a bold uppercase letter, for example,

  M ∈ C^{m×n}.   (2.23)

The element at the pth row and qth column of M is denoted

  (M)_{p,q} or {M}_{p,q}.   (2.24)

The complex conjugate of vectors and matrices is indicated by

  a∗ and M∗,   (2.25)

where conjugation operates on each element independently. The transpose of vectors and matrices is indicated by

  a^T and M^T,   (2.26)

respectively. The Hermitian conjugate of vectors and matrices is indicated by

  a† = (a^T)∗ and M† = (M^T)∗,   (2.27)

respectively. A diagonal matrix is indicated by

  diag{a_1, a_2, a_3, . . . , a_n} =
      [ a_1  0    0    · · ·  0
        0    a_2  0           0
        0    0    a_3         0
        ⋮                ⋱
        0    0    0    · · ·  a_n ] .   (2.28)

The Kronecker delta is indicated by

  δ_{m,n} = { 1 ;  m = n
            { 0 ;  otherwise.   (2.29)

A Hermitian matrix is a square matrix that satisfies

  M† = M.   (2.30)


A useful related class of Hermitian matrices is the positive-semidefinite matrices. A positive-semidefinite matrix M ∈ C^{m×m} has the property that for any nonzero vector x ∈ C^{m×1} the following quadratic form is greater than or equal to zero,

  x† M x ≥ 0.   (2.31)

A related matrix is the positive-definite matrix that satisfies

  x† M x > 0.   (2.32)

A unitary matrix is a square matrix that satisfies U† = U⁻¹, so that

  U† U = U U† = I,   (2.33)

where the m × m identity matrix is indicated by I_m, or I if the size is clear from context, such that

  {I_m}_{p,q} = δ_{p,q} ;  p ∈ {1, 2, · · ·, m}, q ∈ {1, 2, · · ·, m}.   (2.34)

The vector operation is a clumsy concept that maps matrices to vectors. It could be avoided by employing tensor operations. However, it is sometimes convenient for the sake of implementation to consider explicit conversions between matrices and vectors. The vector operation extracts elements along each column before moving to the next column. The vector operation of matrix M ∈ C^{M×N} is denoted vec(M) and is defined by

  {vec(M)}_{(n−1)M+m} = {M}_{m,n}.   (2.35)

2.1.4 Vector products

The inner product between real vectors is indicated by the dot product,

  a · b = a^T b.   (2.36)

In this text, the inner product for complex vectors (or Hermitian inner product) is denoted by

  a† b.   (2.37)

The inner product can also be denoted by

  ⟨a, b⟩ = Σ_m {a}_m {b}∗_m.   (2.38)

Note that the conjugation order is switched between the vector notation and the bracket notation, such that

  a† b = ⟨a, b⟩∗.   (2.39)

This switch is performed to be consistent with standard conventions. When using the phrase "inner product," we will use both forms interchangeably. Hopefully, the appropriate conjugation will be clear from context.


While we will not be particularly concerned about the technical details, the higher-dimensional space in which the inner products are operating is sometimes referred to as a Hilbert space. This space can be extended to an infinite-dimensional space. For example, a vector a can be indexed by the variable x,

  a → f_a(x),   (2.40)

where the function is defined along the axis x. Inner products in this complex infinite-dimensional space are given by integrating over the indexing parameter; in this case it is x. The complex infinite-dimensional inner product between functions f(x) and g(x) that represent two infinite-dimensional vectors is denoted

  ⟨f(x), g(x)⟩ = ∫ dx f(x) g∗(x).   (2.41)

With this form, a useful inequality can be expressed. The Cauchy–Schwarz inequality is given by

  ‖⟨f(x), g(x)⟩‖² ≤ ⟨f(x), f(x)⟩ ⟨g(x), g(x)⟩.   (2.42)

This concept can be extended to include a weighting or a measure over the variable of integration. For example, if the measure is p(x), then the inner product is given by

  ⟨f(x), g(x)⟩ = ∫ dx p(x) f(x) g∗(x).   (2.43)

The outer product of two vectors a and b is indicated by

  a b†.   (2.44)

2.1.5 Matrix products

For matrices A ∈ C^{M×K}, B ∈ C^{K×N}, and C ∈ C^{M×N}, the standard matrix product is given by

  C = A B,  {C}_{m,n} = Σ_k A_{m,k} B_{k,n}.   (2.45)

For matrices A ∈ C^{M×N}, B ∈ C^{M×N}, and C ∈ C^{M×N}, the Hadamard or element-by-element product is denoted · ⊙ · such that

  C = A ⊙ B,  {C}_{m,n} = {A ⊙ B}_{m,n} = A_{m,n} B_{m,n}.   (2.46)


For matrices A ∈ C^{M×N}, B ∈ C^{J×K}, and C ∈ C^{MJ×NK}, the standard definition of the Kronecker product [130] is denoted · ⊗ · and is given by

  C = A ⊗ B =
      [ {A}_{1,1} B  {A}_{1,2} B  {A}_{1,3} B  · · ·
        {A}_{2,1} B  {A}_{2,2} B  {A}_{2,3} B
        {A}_{3,1} B  {A}_{3,2} B
        ⋮                                  ⋱  ] .   (2.47)

This definition is unfortunate because it is inconsistent with the standard definition of the vector operation. As a consequence, the forms that include interactions between vector operations and Kronecker products are unnecessarily twisted. Nonetheless, we will keep with the traditional notation. A few useful relationships are given here:

  (A ⊗ B)^T = A^T ⊗ B^T   (2.48)
  (A ⊗ B)∗ = A∗ ⊗ B∗   (2.49)
  (A ⊗ B)† = A† ⊗ B†   (2.50)
  (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹,   (2.51)

where it is assumed that A and B are not singular for the last relationship. The Kronecker product obeys distributive and associative properties,

  (A + B) ⊗ C = A ⊗ C + B ⊗ C   (2.52)
  (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).   (2.53)

The product of Kronecker products is given by

  (A ⊗ B)(C ⊗ D) = (A C) ⊗ (B D).   (2.54)

For square matrices A ∈ C^{M×M} and B ∈ C^{N×N},

  tr{A ⊗ B} = tr{A} tr{B}   (2.55)
  |A ⊗ B| = |A|^N |B|^M,   (2.56)

where the trace and determinant are defined in Section 2.2. Note that the exponents M and N are for the size of the opposing matrix. The vector operation and Kronecker product are related by

  vec(a b^T) = b ⊗ a   (2.57)
  vec(A B C) = (C^T ⊗ A) vec(B).   (2.58)

If the dimensions of A and B are the same and the dimensions of C and D are the same, then the Hadamard and Kronecker products are related by

  (A ⊙ B) ⊗ (C ⊙ D) = (A ⊗ C) ⊙ (B ⊗ D).   (2.59)
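For readers who wish to verify such identities numerically, the following short Python/NumPy sketch (ours, not part of the original text; variable names are arbitrary) checks the vec/Kronecker relationship of Equation (2.58), using column-major ordering to match the vec operation of Equation (2.35):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 4))
    C = rng.standard_normal((4, 5))

    # Column-stacking vec operation, as in Equation (2.35)
    vec = lambda X: X.flatten(order="F")

    # vec(A B C) = (C^T kron A) vec(B), Equation (2.58)
    print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))  # True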

2.2 Norms, traces, and determinants

In signal processing for multiple-antenna systems, determinants, traces, and norms are useful operations.

2.2.1 Norm

The absolute value of a scalar and the L2-norm of a vector are indicated by either ‖·‖ or ‖·‖₂. We reserve the notation |·| exclusively for the determinant of a matrix. The absolute value of a scalar a is thus ‖a‖, and the norm of a vector a is denoted as follows:

  ‖a‖ = √( Σ_m ‖(a)_m‖² ) = √(a† a).   (2.60)

The p-norm of a vector for values other than 2 is indicated by

  ‖a‖_p = ( Σ_m ‖(a)_m‖^p )^{1/p},   (2.61)

for p ≥ 1. The Frobenius norm of a matrix is indicated by

  ‖M‖_F = √( Σ_{m,n} ‖(M)_{m,n}‖² ) = √( tr{M M†} ).   (2.62)

2.2.2 Trace

The trace of a square matrix M ∈ C^{m×m} of size m is the sum of its diagonal elements and is indicated by

  tr{M} = Σ_m (M)_{m,m}.   (2.63)

The trace of a matrix is invariant under a change of basis. The product of two matrices commutes under the trace operation, tr{A B} = tr{B A}. This property can be extended to the product of three (or more) matrices such that

  tr{A B C} = tr{C A B} = tr{B C A}.   (2.64)

2.2.3 Determinants

The determinant of a square matrix A is indicated by |A| and can be evaluated by a cofactor expansion along any row m,

  |A| = Σ_n (A)_{m,n} (−1)^{m+n} |M_{m,n}|,   (2.65)


where the submatrix M_{m,n} is here defined to be the minor of A, which is constructed by removing the mth row and nth column of A (not to be confused with the mth, nth element of A or M). The determinant of a 2 × 2 matrix is given by

  | a  b ;  c  d | = a d − b c.   (2.66)

The determinant has a number of useful relationships. The determinant of the product of square matrices is equal to the product of the matrix determinants,

  |A B| = |A| |B| = |B A|.   (2.67)

For some scalar c ∈ C and matrix M ∈ C^{m×m}, the determinant of the product is the scalar raised to the mth power times the determinant of the matrix,

  |c M| = c^m |M|.   (2.68)

The determinant of the identity matrix is given by

  |I| = 1,   (2.69)

and the determinant of a unitary matrix U has unit magnitude,

  ‖|U|‖ = 1.   (2.70)

Consequently, the determinant of the unitary transformation, with unitary matrix U defined in Equation (2.33), of a matrix A is the determinant of A,

  |U A U†| = |U A| |U†| = |U† U A| = |A|.   (2.71)

The product of matrices plus the identity matrix commutes under the determinant,

  |I + A B| = |I + B A|,   (2.72)

where A ∈ C^{m×n} and B ∈ C^{n×m} are not necessarily square (although A B and B A are). The inverse of a matrix determinant is equal to the determinant of the matrix inverse,

  |M|⁻¹ = |M⁻¹|.   (2.73)

The Hadamard inequality bounds the determinant of a matrix A whose mth column is denoted by a_m as follows:

  |A| ≤ Π_m ‖a_m‖.   (2.74)


Suppose that A is a positive-definite square matrix. We can write A in terms of another square matrix B as follows:

  A = B† B   (2.75)
  |A| = |B|∗ |B| = ‖|B|‖² ≤ Π_m ‖b_m‖² = Π_m {A}_{m,m},   (2.76)

noting that b†_m b_m = {A}_{m,m}, and where the inequality is an application of Equation (2.74). That is to say, the determinant of a positive-definite matrix is less than or equal to the product of its diagonal elements.

2.3 Matrix decompositions

2.3.1 Eigen analysis

One of the essential tools for signal processing is the eigenvalue decomposition. A complex square matrix M ∈ C^{M×M} has M eigenvalues and eigenvectors. The mth eigenvalue λ_m, based on some metric for ordering, such as magnitude, and the corresponding mth eigenvector v_m of the matrix M are given by the solution of

  M v_m = λ_m v_m.   (2.77)

Sometimes, for clarity, the mth eigenvalue of a matrix M is indicated by λ_m{M}. A matrix is denoted positive-definite if all the eigenvalues are real and positive (λ_m{M} > 0 ∀ m ∈ {1, . . . , M}), and is denoted positive-semidefinite if all the eigenvalues are real and nonnegative (λ_m{M} ≥ 0 ∀ m ∈ {1, . . . , M}). In cases in which there are duplicated or degenerate eigenvalues, the eigenvectors are only determined to within a subspace whose dimension equals the number of degenerate eigenvalues. Any vector from an orthonormal basis in that subspace would satisfy Equation (2.77). The extreme example is the identity matrix, for which all the eigenvalues are the same. In this case, the subspace is the entire space; thus, any vector satisfies Equation (2.77).

The sum of the diagonal elements of a matrix is indicated by the trace. The trace is also equal to the sum of the eigenvalues of the matrix,

  tr{M} = Σ_m (M)_{m,m} = Σ_m λ_m.   (2.78)

The determinant of a matrix is equal to the product of its eigenvalues,

  |M| = Π_m λ_m.   (2.79)
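As a quick numerical illustration (our own sketch, not part of the original text), the following Python/NumPy snippet checks Equations (2.78) and (2.79) for a random complex matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    lam = np.linalg.eigvals(M)

    # Trace equals the sum of eigenvalues; determinant equals their product.
    print(np.allclose(np.trace(M), lam.sum()))        # True
    print(np.allclose(np.linalg.det(M), lam.prod()))  # True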

While, in general, for some square matrices A and B the eigenvalues of the sum do not equal the sum of the eigenvalues,

  λ_m{A + B} ≠ λ_m{A} + λ_m{B},   (2.80)


for the special case of I + A the eigenvalues add,

  λ_m{I + A} = 1 + λ_m{A}.   (2.81)

2.3.2 Eigenvalues of 2 × 2 Hermitian matrix

Given the 2 × 2 Hermitian matrix M ∈ C^{2×2} (that is, a matrix that satisfies M = M†),

  M = [ a  c∗ ;  c  b ],   (2.82)

the eigenvalues of M can be found by exploiting the known relationships between the eigenvalues and the trace and determinant. Because the matrix is Hermitian, the diagonal values (a and b) are real. The trace of M is given by

  tr{M} = λ₁ + λ₂ = a + b.   (2.83)

The determinant of the Hermitian matrix M is given by

  |M| = λ₁ λ₂ = a b − ‖c‖².   (2.84)

By combining these two results, the eigenvalues can be found explicitly. The eigenvalues are given by

  λ₁ + λ₂ = a + b
  λ₁² + λ₁ λ₂ = (a + b) λ₁
  0 = λ² − (a + b) λ + a b − ‖c‖²
  λ = ( a + b ± √( (a + b)² − 4(a b − ‖c‖²) ) ) / 2
    = ( a + b ± √( (a − b)² + 4 ‖c‖² ) ) / 2.   (2.85)
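A minimal numerical check of the closed form (our own sketch; the matrix entries are arbitrary) against NumPy's Hermitian eigensolver:

    import numpy as np

    a, b = 2.0, 0.5
    c = 0.7 - 0.3j
    M = np.array([[a, np.conj(c)], [c, b]])

    # Closed-form eigenvalues from Equation (2.85), sorted ascending
    disc = np.sqrt((a - b) ** 2 + 4 * abs(c) ** 2)
    closed_form = np.sort([(a + b - disc) / 2, (a + b + disc) / 2])

    print(np.allclose(np.linalg.eigvalsh(M), closed_form))  # True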

As will be discussed in Section 2.3.3, Hermitian matrices constructed from quadratic forms are positive-semidefinite.

2.3.3 Singular-value decomposition

Another important concept is the singular-value decomposition (SVD). The SVD decomposes a matrix into three matrices: a unitary matrix, a diagonal matrix containing the singular values, and another unitary matrix,

  Q = U S V†,   (2.86)


where U and V are unitary matrices and the diagonal matrix

  S = [ s₁  0   0   · · ·
        0   s₂  0
        0   0   s₃
        ⋮            ⋱  ]   (2.87)

contains the singular values s₁, s₂, . . .. In the decomposition, there is sufficient freedom to impose the requirement that the singular values are real and positive. Note that the singular matrix S need not be square. In fact, the dimensions of S are the same as the dimensions of Q since both the left and right singular matrices U and V are square. The mth column in either U or V is said to be the mth left-hand or right-hand singular vector associated with the mth singular value, s_m.

The eigenvalues of the quadratic Hermitian form Q Q† are equal to the squares of the singular values of Q,

  Q Q† = U S V† V S† U† = U S S† U†,   (2.88)

where S S† = diag{‖s₁‖², ‖s₂‖², . . .}. The columns of U are the eigenvectors of Q Q†. The eigenvalues of a Hermitian form Q Q† are greater than or equal to zero, and thus the form Q Q† is said to be positive-semidefinite,

  λ_m{Q Q†} = (S S†)_{m,m} ≥ 0.   (2.89)

Notationally, a matrix with all positive eigenvalues is said to be positive-definite, as defined in Equation (2.32), and is indicated by

  M > 0 → λ_m{M} > 0 ∀ m,   (2.90)

and a positive-semidefinite matrix, as defined in Equation (2.31), is indicated by

  M ≥ 0 → λ_m{M} ≥ 0 ∀ m.   (2.91)

The rank of a matrix is the number of nonzero eigenvalues,

  rank{M} = #{m : λ_m{M} ≠ 0},   (2.92)

where #{·} is used to indicate the number of entries that satisfy the condition.
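The relationship of Equation (2.88) is easy to verify numerically; the following Python/NumPy sketch (ours, with arbitrary dimensions) checks that the squared singular values of Q equal the eigenvalues of Q Q†:

    import numpy as np

    rng = np.random.default_rng(1)
    Q = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

    U, s, Vh = np.linalg.svd(Q)
    eig = np.linalg.eigvalsh(Q @ Q.conj().T)  # ascending order

    # Squared singular values match the eigenvalues of Q Q†.
    print(np.allclose(np.sort(s ** 2), eig))  # True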

2.3.4 QR decomposition

Another common matrix decomposition is the QR factorization. In this decomposition, some matrix M is factored into a unitary matrix Q and an upper right-hand triangular matrix R, where an upper right-hand triangular matrix


has the form

  R = [ r_{1,1}  r_{1,2}  r_{1,3}  · · ·  r_{1,n}
        0        r_{2,2}  r_{2,3}  · · ·  r_{2,n}
        0        0        r_{3,3}  · · ·  r_{3,n}
        ⋮                         ⋱
        0        0        0        · · ·  r_{n,n} ] .   (2.93)

If the matrix M has symmetric dimensions n × n, then the decomposition is given by

  M = Q R.   (2.94)

For a rectangular matrix M ∈ C^{m×n} with m > n, the QR decomposition can be constructed so that

  M = Q [ R ; 0 ],   (2.95)

where the upper triangular matrix has dimensions R ∈ C^{n×n}, and the zero matrix 0 has dimensions (m − n) × n.
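As a numerical illustration of the tall-matrix form of Equation (2.95) (our own sketch; NumPy's "complete" mode is one way to obtain the square Q):

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((5, 3))  # m = 5, n = 3

    Q, R = np.linalg.qr(M, mode="complete")

    print(np.allclose(Q.conj().T @ Q, np.eye(5)))  # Q is unitary
    print(np.allclose(Q @ R, M))                   # factorization holds
    print(np.allclose(R[3:], 0))                   # zero block below the n x n R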

2.3.5 Matrix subspaces

Given some vector space, subspaces are some portion of that space. This can be defined by some linear basis contained within the larger vector space. It is often useful to describe subspaces by employing projection operators that are orthogonal to any part of the vector space not contained within the subspace. Vector spaces can be constructed by either column vectors or row vectors depending upon the application. The matrix M ∈ C^{m×n} can be decomposed into components that occupy orthogonal subspaces, denoted by the matrices M_A ∈ C^{m×n} and M_{A⊥} ∈ C^{m×n}, such that

  M = M_A + M_{A⊥}.   (2.96)

The matrix M_A can be constructed by projecting M onto the subspace spanned by the columns of the matrix A ∈ C^{m×m′}, whose number of columns is less than or equal to the number of rows, that is, m′ ≤ m. It is assumed here that A† A is invertible and that we are operating on the column space of the matrix M, although there is an equivalent row-space formulation. We can construct a projection matrix or projection operator P_A ∈ C^{m×m} that is given by

  P_A = A (A† A)⁻¹ A†.   (2.97)

For some matrix of an appropriate dimension B, this projection matrix operates on the column space of B by multiplying the operator by the matrix, P_A B.


Figure 2.1 Illustration of projection operation (the vector v and the subspace span(A)).

As an aside, it is worth noting that projection matrices are idempotent, i.e., P_A P_A = P_A. The matrix M_A, which is the projection of M onto the subspace spanned by the columns of A, is given by

  M_A = P_A M.   (2.98)

The rank of M_A is bounded by the number of columns in A,

  rank{M_A} ≤ m′.   (2.99)

The orthogonal projection matrix P⊥_A is given by

  P⊥_A = I − P_A = I − A (A† A)⁻¹ A†.   (2.100)

We define the matrix M_{A⊥} to be the matrix projected onto the basis orthogonal to A, M_{A⊥} = P⊥_A M. Consequently, the matrix M can be decomposed into the matrices

  M = I M = (P_A + P⊥_A) M = M_A + M_{A⊥}.   (2.101)

To illustrate, consider Figure 2.1. The projection matrix P_A projects the vector v onto the subspace spanned by the columns of the matrix A, which is illustrated by the shaded region. The projected vector is illustrated by the dashed arrow. The associated orthogonal projection P⊥_A projects the vector v onto the subspace orthogonal to that spanned by the columns of A, resulting in the vector illustrated by the dotted arrow.
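A minimal numerical sketch of the projection operators of Equations (2.97) through (2.101) (ours; the dimensions are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((6, 2))   # m = 6, m' = 2
    M = rng.standard_normal((6, 4))

    P = A @ np.linalg.inv(A.conj().T @ A) @ A.conj().T
    P_perp = np.eye(6) - P

    print(np.allclose(P @ P, P))                # idempotent, P_A P_A = P_A
    print(np.allclose(P @ M + P_perp @ M, M))   # M = M_A + M_{A perp}
    print(np.linalg.matrix_rank(P @ M))         # 2, bounded by m' as in (2.99)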


2.4 Special matrix forms

In signal processing applications, a number of special forms of matrices occur commonly.

2.4.1 Element shifted symmetries

Toeplitz matrices are of particular interest because they are produced in certain physical examples, and there are fast inversion algorithms for Toeplitz matrices. While inverting a general square matrix of size n takes order n³ operations, a Toeplitz matrix can be inverted in order n² operations [117]. An n × n Toeplitz matrix is a matrix in which the values are equal along diagonals,

  M = [ a_0      a_{−1}   a_{−2}   · · ·  a_{−n+1}
        a_1      a_0      a_{−1}          a_{−n+2}
        a_2      a_1      a_0
        ⋮                         ⋱      a_{−1}
        a_{n−1}  a_{n−2}  · · ·   a_1    a_0      ] .   (2.102)

The Toeplitz matrix is defined by 2n − 1 values. An n × n circulant matrix is a special form of a Toeplitz matrix such that each row or column is a cyclic permutation of the previous row or column:

  M = [ a_0      a_{n−1}  a_{n−2}  · · ·  a_1
        a_1      a_0      a_{n−1}         a_2
        a_2      a_1      a_0
        ⋮                         ⋱      a_{n−1}
        a_{n−1}  a_{n−2}  · · ·   a_1    a_0     ] .   (2.103)

The circulant matrix is defined by n values. An additional property of circulant matrices is that they can be inverted in the order of n log n operations.
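The n log n inversion is possible because circulant matrices are diagonalized by the discrete Fourier transform (Section 2.10.2). The following Python sketch (ours, assuming SciPy is available) solves C y = b by element-wise division in the frequency domain:

    import numpy as np
    from scipy.linalg import circulant

    c = np.array([4.0, 1.0, 0.5, 0.25])
    C = circulant(c)                   # first column is c
    b = np.array([1.0, 2.0, 3.0, 4.0])

    # Solve C y = b with FFTs: C acts as circular convolution by c.
    y = np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)).real
    print(np.allclose(C @ y, b))  # True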

2.4.2 Eigenvalues of low-rank matrices

A low-rank matrix is a matrix for which some (and usually most) of the eigenvalues are zero. In a variety of applications, such as spatial covariance matrices, low-rank matrices are constructed from the outer products of vectors.

Rank-1 matrix. For example, a rank-1 square matrix M is constructed by using complex n-vectors v ∈ C^{n×1} and w ∈ C^{n×1},

  M = v w†.   (2.104)


This matrix has an eigenvector proportional to v and eigenvalue w† v. The eigenvalue can be determined directly by noting that the trace of the matrix is equal to the sum of the eigenvalues, which for a rank-1 matrix are all zero except for one. For comparison, this matrix has a single nonzero singular value given by ‖w‖ ‖v‖.

Rank-2 matrix. A Hermitian rank-2 matrix M can be constructed by using two n-vectors x ∈ C^{n×1} and y ∈ C^{n×1},

  M = x x† + y y†.   (2.105)

The eigenvalues can be found by using the hypothesis that the eigenvector is proportional to x + a y, where a is some undetermined constant. The nonzero eigenvalues of M are given by λ₊ and λ₋,

  λ±{M} = ( ‖x‖² + ‖y‖² ± √( (‖x‖² − ‖y‖²)² + 4 ‖x† y‖² ) ) / 2.   (2.106)
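Both low-rank results are easy to check numerically; the following Python/NumPy sketch (ours, with arbitrary vectors) verifies the rank-1 eigenvalue w† v and the rank-2 eigenvalues of Equation (2.106):

    import numpy as np

    rng = np.random.default_rng(4)
    cvec = lambda n=5: rng.standard_normal(n) + 1j * rng.standard_normal(n)

    # Rank-1 matrix v w†: the single nonzero eigenvalue equals the trace, w† v.
    v, w = cvec(), cvec()
    lam1 = np.linalg.eigvals(np.outer(v, w.conj()))
    print(np.isclose(lam1.sum(), w.conj() @ v))  # True

    # Rank-2 Hermitian matrix xx† + yy†: compare with Equation (2.106).
    x, y = cvec(), cvec()
    M2 = np.outer(x, x.conj()) + np.outer(y, y.conj())
    nx, ny = np.linalg.norm(x) ** 2, np.linalg.norm(y) ** 2
    disc = np.sqrt((nx - ny) ** 2 + 4 * abs(x.conj() @ y) ** 2)
    lam_pm = np.sort([(nx + ny - disc) / 2, (nx + ny + disc) / 2])
    print(np.allclose(lam_pm, np.linalg.eigvalsh(M2)[-2:]))  # True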

2.5 Matrix inversion

For a square nonsingular matrix, that is, a matrix with all nonzero eigenvalues so that |M| ≠ 0, the matrix inverse of M satisfies

  M⁻¹ M = M M⁻¹ = I.   (2.107)

The inverse of the product of nonsingular square matrices is given by

  (A B)⁻¹ = B⁻¹ A⁻¹.   (2.108)

The inverse commutes with the Hermitian-conjugate and transpose operations,

  (M†)⁻¹ = (M⁻¹)† and (M^T)⁻¹ = (M⁻¹)^T.   (2.109)

In terms of the SVD M = U D V†, discussed in Section 2.3.3, the inverse of a matrix is given by

  M⁻¹ = (U D V†)⁻¹ = V D⁻¹ U†.   (2.110)

It is often convenient to consider 2 × 2 matrices. Their inverse is given by

  [ a  b ;  c  d ]⁻¹ = (1/(a d − b c)) [ d  −b ;  −c  a ].   (2.111)

The general inverse of a partitioned matrix is given by

  [ A  B ;  C  D ]⁻¹ =
      [ (A − B D⁻¹ C)⁻¹           −A⁻¹ B (D − C A⁻¹ B)⁻¹
        −D⁻¹ C (A − B D⁻¹ C)⁻¹    (D − C A⁻¹ B)⁻¹        ] .   (2.112)


2.5.1 Inversion of matrix sum

A general form of Woodbury's formula is given by

  (M + A B)⁻¹ = M⁻¹ − M⁻¹ A (I + B M⁻¹ A)⁻¹ B M⁻¹.   (2.113)

A special and useful form of Woodbury's formula is used to find the inverse of the identity matrix plus a rank-1 matrix,

  (I + v w†)⁻¹ = I − v w† / (1 + w† v).   (2.114)

The inverse of the identity matrix plus two rank-1 matrices is also useful. Here the special case of a Hermitian matrix is considered. The matrix to be inverted is given by

  I + a a† + b b†,   (2.115)

where a and b are column vectors of the same size. The inverse is given by

  (I + a a† + b b†)⁻¹ = I − ( 1 + ‖a† b‖²/γ ) ( a a†/(1 + a† a) + b b†/(1 + b† b) )
                          + (1/γ) ( (a† b) a b† + (b† a) b a† ),   (2.116)

where here

  γ = 1 + a† a + b† b + a† a b† b − ‖a† b‖².   (2.117)

This result can be found by employing Woodbury's formula with M in Equation (2.113) given by

  M = I + b b†,  M⁻¹ = I − b b† / (1 + b† b).   (2.118)

Consequently, Woodbury's formula provides the form

  (I + a a† + b b†)⁻¹ = (I + b b†)⁻¹ − (I + b b†)⁻¹ a ( 1 + a† [I + b b†]⁻¹ a )⁻¹ a† (I + b b†)⁻¹,   (2.119)

which, after a bit of manipulation, is given by the form in Equation (2.116).
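The following Python/NumPy sketch (ours, with arbitrary vectors) checks the rank-1 identity of Equation (2.114) and the two-rank-1 form of Equations (2.116) and (2.117):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    I = np.eye(n)

    # Rank-1 form, Equation (2.114)
    lhs = np.linalg.inv(I + np.outer(a, b.conj()))
    rhs = I - np.outer(a, b.conj()) / (1 + b.conj() @ a)
    print(np.allclose(lhs, rhs))  # True

    # Two rank-1 updates, Equations (2.116)-(2.117)
    aa, bb, ab = a.conj() @ a, b.conj() @ b, a.conj() @ b
    gamma = 1 + aa + bb + aa * bb - abs(ab) ** 2
    lhs2 = np.linalg.inv(I + np.outer(a, a.conj()) + np.outer(b, b.conj()))
    rhs2 = (I
            - (1 + abs(ab) ** 2 / gamma)
            * (np.outer(a, a.conj()) / (1 + aa) + np.outer(b, b.conj()) / (1 + bb))
            + (ab * np.outer(a, b.conj()) + np.conj(ab) * np.outer(b, a.conj())) / gamma)
    print(np.allclose(lhs2, rhs2))  # True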

2.6 Useful matrix approximations

2.6.1 Log determinant of identity plus small-valued matrix

Motivated by a multiple-input multiple-output (MIMO) capacity expression, a common form seen throughout this text is

  c = log₂ |I + M|,   (2.120)


where M is a Hermitian matrix. Because the determinant of a matrix is given by the product of the eigenvalues and

  λ_m{I + M} = 1 + λ_m{M},   (2.121)

where the mth eigenvalue of a matrix is indicated by λ_m{·}, the capacity expression in Equation (2.120) is equal to

  c = log₂ ( Π_m (1 + λ_m{M}) )
    = Σ_m log₂ (1 + λ_m{M})
    = log₂(e) Σ_m log (1 + λ_m{M})
    ≈ log₂(e) Σ_m λ_m{M} = log₂(e) tr{M},   (2.122)

if it is assumed that λ_m{M} ≪ 1 ∀ m. Here the approximation log(1 + x) ≈ x for small values of x is employed.
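A quick numerical check of the approximation (our own sketch; the 0.01 scale makes the eigenvalues small):

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    M = 0.01 * (A @ A.conj().T)  # small Hermitian positive-semidefinite matrix

    exact = np.log2(np.abs(np.linalg.det(np.eye(4) + M)))
    approx = np.log2(np.e) * np.trace(M).real

    print(exact, approx)  # the two values agree closely for small M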

2.6.2 Hermitian matrix raised to large power

Consider the Hermitian matrix M† = M ∈ C^{n×n}. Under the assumption that there is a single largest eigenvalue of M, the eigenvalue and dominant subspace can be approximated by repeatedly multiplying the matrix by itself,

  M^k = (U Λ U†)(U Λ U†) · · · (U Λ U†) = U Λ^k U†
      = Σ_{m=1}^{n} λ_m^k u_m u_m†
      ≈ λ₁^k u₁ u₁†,   (2.123)

where U is a unitary matrix constructed from the eigenvectors u_m of M and Λ = diag{λ₁, λ₂, . . . , λ_n} is a diagonal matrix containing the eigenvalues of M. The largest eigenvalue λ₁ grows faster than the other eigenvalues as the number of multiplies grows and eventually dominates the resulting matrix. Here it is assumed that there is a strict ordering of the largest two eigenvalues, λ₁ > λ₂.
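The dominance of the largest eigenvalue is easy to see numerically; the following sketch (ours) compares M^k with the rank-1 approximation of Equation (2.123):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    M = A @ A.conj().T               # Hermitian, almost surely distinct eigenvalues

    k = 20
    Mk = np.linalg.matrix_power(M, k)
    lam, U = np.linalg.eigh(M)       # ascending eigenvalues
    approx = lam[-1] ** k * np.outer(U[:, -1], U[:, -1].conj())

    # Relative error shrinks like (lambda_2 / lambda_1)^k.
    print(np.linalg.norm(Mk - approx) / np.linalg.norm(Mk))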

2.7 Real derivatives of multivariate expressions

The real derivatives of multivariate expressions follow directly from standard scalar derivatives [217]. For the real variable α, the derivatives of complex

N-vector z and complex matrix M are given by

  ∂z/∂α = [ ∂{z}₁/∂α ;  ∂{z}₂/∂α ;  ⋮ ;  ∂{z}_N/∂α ]   (2.124)

and

  { ∂M/∂α }_{m,n} = ∂{M}_{m,n}/∂α,   (2.125)

respectively. A few useful expressions follow [217]. Under the assumption that the complex vector z and matrix A are functions of α, the derivative of the quadratic form z† A z with respect to the real parameter α is given by

  ∂/∂α ( z† A z ) = ( ∂z†/∂α ) A z + z† ( ∂A/∂α ) z + z† A ( ∂z/∂α ).   (2.126)

The derivative for the complex invertible matrix M with respect to the real parameter α can be found by considering the derivative of ∂/∂α (M M⁻¹) = 0, and it is given by

  ∂M⁻¹/∂α = −M⁻¹ ( ∂M/∂α ) M⁻¹.   (2.127)

The derivatives of the determinant and the log determinant of a nonsingular matrix M with respect to the real parameter α are given by

  ∂|M|/∂α = |M| tr{ M⁻¹ ∂M/∂α }   (2.128)

and

  ∂ log|M| /∂α = tr{ M⁻¹ ∂M/∂α }.   (2.129)

The derivative of the trace of a matrix is equal to the trace of the derivative of the matrix,

  ∂ tr{M} /∂α = tr{ ∂M/∂α }.   (2.130)

2.7.1 Derivative with respect to real vectors

Calculus involving vectors and matrices is useful for problems involving filtering and multiple-antenna processing. Here vectors in real space are considered. Derivatives with respect to complex variables are considered in Section 2.8.


For a real column vector x of size N, the derivative of a scalar function f(x) with respect to x is defined to be a row vector given by

  ∂f(x)/∂x = [ ∂f(x)/∂{x}₁  ∂f(x)/∂{x}₂  · · ·  ∂f(x)/∂{x}_N ].   (2.131)

This is the typical, but not the only, convention possible. Under certain circumstances, it is convenient to use the gradient operator, which produces a vector or matrix of the same dimension as the object with which the derivative is taken,

  ∇_x f(x) = [ ∂f(x)/∂{x}₁ ;  ∂f(x)/∂{x}₂ ;  ⋮ ;  ∂f(x)/∂{x}_N ],   (2.132)

where the scalar function is indicated by f(·) and the gradient is with respect to the vector x ∈ R^{N×1}, and

  { ∇_A f(A) }_{m,n} = ∂f(A)/∂{A}_{m,n},   (2.133)

where the scalar function is indicated by f(·) and the gradient is with respect to the matrix A ∈ R^{M×N}. The Laplacian operator [11] is given by

  ∇²_x f(x) = ∇_x · ∇_x f(x).   (2.134)

Note that the term "Laplacian" can be used to describe several different quantities or operators. In the context of this book, in particular in Chapters 13 and 14, we also make reference to the Laplacian of a random variable, which is the Laplace transform of the probability density function of the random variable. In a Euclidean coordinate system, the Laplacian operator is given by

  ∇²_x f(x) = Σ_{m=1}^{N} ∂²f(x)/∂{x}_m².   (2.135)

In a three-dimensional space that is defined in polar coordinates with cylindrical radius ρ, azimuthal angle φ in radians, and height z, the Laplacian operator is given by

  ∇² f(ρ, φ, z) = (1/ρ) ∂/∂ρ ( ρ ∂f(ρ, φ, z)/∂ρ ) + (1/ρ²) ∂²f(ρ, φ, z)/∂φ² + ∂²f(ρ, φ, z)/∂z².   (2.136)


In a three-dimensional space that is defined in spherical coordinates with radius r, azimuthal angle φ in radians, and angle from zenith (or equivalently from the north pole) θ, the Laplacian operator is given by

  ∇² f(r, φ, θ) = (1/r²) ∂/∂r ( r² ∂f/∂r ) + (1/(r² sin θ)) ∂/∂θ ( sin θ ∂f/∂θ ) + (1/(r² sin²θ)) ∂²f/∂φ²,   (2.137)

where · here indicates the inner product of the gradient operators.

Some useful evaluations of derivatives are presented in the following. For an arbitrary vector a that is not a function of the real column vector x, the derivative of the product of the column vectors is given by

  ∂/∂x ( a^T x ) = [ a^T e₁  a^T e₂  a^T e₃  · · · ] = a^T,   (2.138)

where e_m is the column vector of all zeros with the exception of the mth element, which has a value of 1,

  e_m = [ 0 ;  ⋮ ;  1 ;  ⋮ ;  0 ].   (2.139)

Similarly, the derivative of the transpose of the product of the vectors with respect to the transpose of x is given by

  ∂/∂x^T ( x^T a ) = [ e₁^T a ;  e₂^T a ;  ⋮ ] = a.   (2.140)

Finally, taking the derivatives of the inner product with respect to the opposite transpositions of x gives the forms

  ∂/∂x ( x^T a ) = [ e₁^T a  e₂^T a  e₃^T a  · · · ] = a^T   (2.141)

and

  ∂/∂x^T ( a^T x ) = [ a^T e₁ ;  a^T e₂ ;  ⋮ ] = a.   (2.142)


The matrix A can be decomposed into a set of column vectors a₁, a₂, a₃, . . . ,

  A = ( a₁  a₂  a₃  · · · ),   (2.143)

or into a set of row vectors b₁, b₂, b₃, . . . ,

  A = [ b₁ ;  b₂ ;  b₃ ;  ⋮ ].   (2.144)

The derivative of the matrix-vector product with respect to x is given by

  ∂/∂x ( A x ) = ( A e₁  A e₂  A e₃  · · · ) = A   (2.145)

and

  ∂/∂x^T ( x^T A ) = [ e₁^T A ;  e₂^T A ;  ⋮ ] = [ b₁ ;  b₂ ;  ⋮ ] = A.   (2.146)

Another common expression is the quadratic form x^T A x. The derivative of the quadratic form with respect to x is given by

  ∂/∂x ( x^T A x ) = [ e₁^T A x  e₂^T A x  e₃^T A x  · · · ] + [ x^T A e₁  x^T A e₂  x^T A e₃  · · · ]
                   = x^T A^T + x^T A = x^T (A + A^T).   (2.147)

Similarly, the derivative of the quadratic form with respect to x^T is given by

  ∂/∂x^T ( x^T A x ) = [ e₁^T A x ;  e₂^T A x ;  ⋮ ] + [ x^T A e₁ ;  x^T A e₂ ;  ⋮ ]
                     = A x + A^T x = (A + A^T) x.   (2.148)
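The quadratic-form derivative of Equation (2.147) can be checked by finite differences; a minimal Python/NumPy sketch (ours):

    import numpy as np

    rng = np.random.default_rng(10)
    A = rng.standard_normal((3, 3))
    x = rng.standard_normal(3)

    eps = 1e-6
    # Forward-difference estimate of d(x^T A x)/dx, one coordinate at a time
    grad_fd = np.array([
        ((x + eps * e) @ A @ (x + eps * e) - x @ A @ x) / eps
        for e in np.eye(3)
    ])
    print(np.allclose(grad_fd, x @ (A + A.T), atol=1e-4))  # True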

2.8 Complex derivatives

Because it is useful to represent many signals in communications with complex variables, many problems involve functions of complex variables. In evaluating the derivative of a function of complex variables, it is observed that the derivative can be dependent upon the direction in which the derivative is taken, for example, along the real axis versus along the imaginary axis. Functions whose


derivatives are independent of direction are said to be holomorphic. Many discussions of complex analysis focus upon holomorphic (or analytic) functions. Holomorphic functions, which have unique derivatives with respect to the complex variable, satisfy the Cauchy–Riemann equations [53]. However, holomorphic functions occupy a very special and small subset of all possible complex functions. The focus on holomorphic functions is problematic because many of the functions that are important to signal analysis are not holomorphic. Here, derivatives of holomorphic functions are considered in Section 2.8.1, and nonholomorphic functions are considered by employing the Wirtinger calculus in Section 2.8.2.

2.8.1 Cauchy–Riemann equations

In this section, holomorphic functions are discussed, followed by an overview of the calculus for a more general class of functions. If a function of the complex variable z is composed of real functions u and v,

  f(z) = f(x, y) = u(x, y) + i v(x, y),  z = x + i y,  z∗ = x − i y,   (2.149)

where x and y are real variables, the derivative of f with respect to z is given by

  df/dz = lim_{z→z₀} ( f(z) − f(z₀) ) / ( z − z₀ )
        = lim_{x→x₀, y→y₀} ( [u(x, y) − u(x₀, y₀)] + i [v(x, y) − v(x₀, y₀)] ) / ( [x − x₀] + i [y − y₀] ).   (2.150)

Because the path to z₀ cannot matter for holomorphic functions, there is freedom to approach the point at which the derivative is evaluated by moving along x or along y. Consequently, the derivative can be expressed by

  df/dz = lim_{x→x₀} ( [u(x, y₀) − u(x₀, y₀)] + i [v(x, y₀) − v(x₀, y₀)] ) / ( x − x₀ )
        = ( ∂u/∂x + i ∂v/∂x ) |_{z=z₀}.   (2.151)

With equal validity, the derivative can be taken along y, so that

  df/dz = lim_{y→y₀} ( [u(x₀, y) − u(x₀, y₀)] + i [v(x₀, y) − v(x₀, y₀)] ) / ( i [y − y₀] )
        = (1/i) ( ∂u/∂y + i ∂v/∂y ) |_{z=z₀}.   (2.152)

In order for the derivative to be independent of direction, the real and imaginary components of the derivative must be consistent, so the holomorphic function


must satisfy

  ∂u/∂x = ∂v/∂y,  ∂u/∂y = −∂v/∂x.   (2.153)

These relationships are referred to as the Cauchy–Riemann equations.

2.8.2 Wirtinger calculus for complex variables

Unfortunately, many useful functions do not satisfy the Cauchy–Riemann equations, and alternative formulations are useful [352, 145]; for example, the real function

  g(z) = ‖z‖² = z z∗ = (x + i y)(x − i y) = x² + y²,
  u(x, y) = x² + y²,  v(x, y) = 0.   (2.154)

The Cauchy–Riemann equations are not satisfied in general,

  ∂u(x, y)/∂x = 2x ≠ ∂v(x, y)/∂y = 0
  ∂u(x, y)/∂y = 2y ≠ −∂v(x, y)/∂x = 0.   (2.155)

However, all is not lost. Real derivatives can be employed that mimic the form of the complex variables. These derivatives can be used to find stationary points for optimizations of real functions of complex variables and other calculations such as Cramér–Rao estimation bounds under special conditions. For the complex scalar z, where the real and imaginary components are denoted x and y, a vector form of the complex scalar is given by

  [ x ;  y ] = [ ℜ{z} ;  ℑ{z} ].   (2.156)

With this notation, a new set of real variables ζ and ζ̄ can be constructed with a transformation that is proportional to a rotation in the complex plane,

  [ ζ ;  ζ̄ ] = [ 1  i ;  1  −i ] [ x ;  y ] = [ x + i y ;  x − i y ].   (2.157)

Consequently, the real variables {ζ, ζ̄} can be directly related to the complex variable z and its complex conjugate z∗. The real components of z can be found


in terms of the complex variable "doppelgangers"³ {ζ, ζ̄} by using the inverse of the transformation matrix,

  [ x ;  y ] = [ 1  i ;  1  −i ]⁻¹ [ ζ ;  ζ̄ ] = (1/2) [ 1  1 ;  −i  i ] [ ζ ;  ζ̄ ].   (2.158)

By using the above transformation, complex doppelganger derivatives can be defined by

  df/dζ = (1/2) ( ∂f/∂x − i ∂f/∂y )   (2.159)

and

  df/dζ̄ = (1/2) ( ∂f/∂x + i ∂f/∂y ),   (2.160)

where the terms z and z∗ in the expression f are replaced with the complex doppelgangers ζ and ζ̄, respectively. It is worth stressing that the complex doppelgangers are not complex variables. If great care is taken, one can use the notation in which z and z∗ are used as the complex doppelgangers directly. It is probably clear that this approach is ripe for potential confusion because ·∗ is both an operator and an indicator of an alternate variable. Furthermore, in using the Wirtinger calculus, we take advantage of underlying symmetries. While taking a derivative with respect to a single doppelganger variable may be useful for finding a stationary point (as evaluated in Equation (2.170)), it is not the complete derivative. As an example, when the value of the gradient is of interest, typically the full gradient with both derivatives is necessary.

This derivative form [5, 262, 172] is sometimes referred to as Wirtinger calculus, or complex-real (CR) calculus. Given that this is just a derivative under a change of variables, it is probably unnecessary to give the approach a name. However, for notational convenience within this text, this approach is referenced as the Wirtinger calculus. It is worth noting that this definition is not unique. As an aside, the Cauchy–Riemann equations can be expressed by taking the derivative with respect to the "conjugate" doppelganger variable,

  ∂f/∂ζ̄ = 0.   (2.161)

³ The term "doppelgangers" has not been in common use previously. It is used here to stress the difference between the complex variable and its conjugate, and the two real variables used in their place.


Evaluation of stationary point by using Wirtinger calculus. The standard approach to evaluating the extrema (maximum, minimum, or inflection) of a function is to find a stationary point. The stationary point of a real function g(x, y) satisfies both

  ∂g(x, y)/∂x = 0 and ∂g(x, y)/∂y = 0.   (2.162)

When one is searching for a stationary point of a real function with complex parameter z, it is useful to "rotate" the independent real variables {x, y} into the space of the doppelganger complex variables {ζ, ζ̄}. The function of the complex variable z is given by

  g(z) = g(x, y) = g̃(ζ, ζ̄).   (2.163)

The Wirtinger calculus is clearer when conjugation of compound expressions is evaluated and expressed in terms of the variables {ζ, ζ̄}. However, here, at the risk of some confusion, the doppelganger variables will be expressed by {z, z∗}. By using the Wirtinger calculus, the following differentiation rules are found:

  ∂z/∂z = ∂z∗/∂z∗ = 1,  ∂z∗/∂z = ∂z/∂z∗ = 0.   (2.164)

For example, under Wirtinger calculus, the derivatives with respect to the doppelganger variables of the expression z³ z∗² are given by

  ∂/∂z ( z³ z∗² ) = 3 z² z∗²   (2.165)

and

  ∂/∂z∗ ( z³ z∗² ) = 2 z³ z∗.   (2.166)

In particular, the stationary point of a real function of complex variables expressed in terms of the conjugate variables g̃(z, z∗) satisfies

  ∂ g̃(z, z∗)/∂z = 0   (2.167)

and

  ∂ g̃(z, z∗)/∂z∗ = 0.   (2.168)

This result is somewhat nonintuitive if you consider the meaning of z and z∗. However, by remembering that here z and z∗ represent real doppelganger variables, it is slightly less disconcerting.

and

38

Notational and mathematical preliminaries

A case of particular interest is if f (z, z ∗ ) is real valued. In this case, the derivatives with respect to z and z ∗ will produce the same stationary point,   ∂f 1 ∂f ∂f + i = ∂z ∗ 2 ∂x ∂y  ∗ 1 ∂f ∂f = −i 2 ∂x ∂y  ∗ ∂f . (2.169) = ∂z In other words, the relationships ∂f ∂f =0 = 0 and ∂z ∗ ∂z produce the same solution for z.

2.8.3

(2.170)

Multivariate Wirtinger calculus Given the complex column N -vector z, the Wirtinger calculus discussed in Section 2.8.2 considers {z}m and {z}∗m independent variables. For a vector function f (z, z∗ ), where {z}m and {z}∗m indicate the doppelganger variables, the derivative with respect to z is given by ⎛ ⎞ ∂ ∂ ∂ ∗ ∗ · · · ∂ {z} {f (z, z∗ )}1 ∂ {z}1 {f (z, z )}1 ∂ {z}2 {f (z, z )}1 M ⎜ ∂ ∂ ∂ ∗ ∗ · · · ∂ {z} {f (z, z∗ )}2 ⎟ ⎜ ∂ {z}1 {f (z, z )}2 ⎟ ∂ ∂ {z}2 {f (z, z )}2 M ⎟. f (z, z∗ ) = ⎜ . ⎜ ⎟ .. ∂z ⎝ ⎠ ∂ ∂ ∂ ∗ ∗ ∗ {f (z, z )} {f (z, z )} · · · {f (z, z )} N N N ∂ {z}1 ∂ {z}2 ∂ {z}M (2.171)

Similarly, the derivative with respect to z∗ is given by ⎛ ∂ ∂ ∗ ∗ ∂ {z ∗ }1 {f (z, z )}1 ∂ {z ∗ }2 {f (z, z )}1 ⎜ ∂ ∂ ∗ ∗ ⎜ ∂ {z ∗ }1 {f (z, z )}2 ∂ ∂ {z ∗ }2 {f (z, z )}2 ∗ ⎜ f (z, z ) = .. ⎜ ∂z∗ ⎝ . ∂ ∂ ∗ {f (z, z )} {f (z, z∗ )}N ∗ ∗ N ∂ {z }1 ∂ {z }2

··· ···

∂ ∂ {z ∗ }M ∂ ∂ {z ∗ }M

···

∂ ∂ {z ∗ }M

⎞ {f (z, z∗ )}1 {f (z, z∗ )}2 ⎟ ⎟ ⎟. ⎟ ⎠ {f (z, z∗ )}N (2.172)

By using Wirtinger calculus, the differential of f is given by df (z, z∗ ) =

2.8.4

∂ ∂ f (z, z∗ ) dz + ∗ f (z, z∗ ) dz∗ . ∂z ∂z

(2.173)

Complex gradient In many gradient optimization operations, gradients are employed to find the direction in the tangent space of a function at some point that has the greatest change. The gradient of some function of the complex vector z = x + iy

2.9 Integration over complex variables

39

constructed from real vectors x and y is most clearly defined by its derivation in the real space, where the real gradient was discussed in Section 2.7.1. For some real function f (z), the gradient of f (z) = f (x, y) is probably clearest when expressed by building a vector from stacking x and y,   x , (2.174) v= y so that the gradient is given by ∇v f (x, y) .

(2.175)

This gradient can be remapped into a complex gradient by expressing the components associated with y as being imaginary, ∇x f (x, y) + i ∇y f (x, y) .

(2.176)

Some care needs to be taken in using this form because it can be misleading. To evaluate a complex gradient, it is sometime useful to evaluate it by using Wirtinger calculus. The problem with using the Wirtinger calculus to describe the gradient is that it is not a complete description of the direction of maximum change. There is some confusion in the literature in how to deal with this issue [47]. Here we will first employ an explicit real gradient as the reference. Second, we will abuse the notation of gradient slightly by defining a complete gradient of a real function as being different from the Wirtinger gradient. As an example, consider the function f (z) = z† z = xT x + y T y .

(2.177)

The gradient is given by ∇x (xT x + yT y) + i ∇y (xT x + yT y) = 2x + i 2y .

(2.178)

By interpreting Equations (2.159) and (2.160) as gradients, the complete gradient of a real function in terms of the Wirtinger calculus is given by ∇x f (x, y) + i ∇y f (x, y) = 2 ∇z ∗ f (z) .

(2.179)

For the above example, the complete gradient is then given by ∇x (z† z) + i ∇y (z† z) = 2∇z ∗ (z† z) = 2z.

2.9

(2.180)

Integration over complex variables For signal processing applications, two types of integral are used commonly: contour integrals and volume integrals. A volume may be a simple area as in the case of a single complex variable, or a hypervolume in the case of a vector space of complex variables.

40

Notational and mathematical preliminaries

2.9.1

Path and contour integrals A path integral, or line integral, or contour integral, over the complex variable z follows a path in z defined by S. The integral over some function f (z) is represented by  dz f (z) . (2.181) S

If the path is closed (forming a loop), then the term contour integral is often used [54, 180, 40]. In contour integration, a particularly useful set of tools is available if the function is differentiable, which is identified as a holomorphic or analytic function with some countable number of poles. A pole is a point in the space of z at which the function’s value goes to ± infinity. The integrals are often the result of evaluating transformations of functions. If the path S forms a loop, then the path is said to be closed and the notation is given by  dz f (z) . (2.182) S

A common approach is to convert a line integral to a closed contour integral by adding a path at a radius of infinity. This technique can be done without modifying the integral if f (z) goes to zero sufficiently quickly when approaching infinity so that  ∞  dz f (z) = dz f (z) , (2.183) −∞

S

where the original integral is along the real axis. Contour integrals over holomorphic functions are particularly interesting because the evaluation of the integral is the same under deformations of the path so long as no poles are crossed by the deformation. A pole is a point in the space of z where the value of f (z) becomes unbounded. For example, the function 1 z−a

f (z) =

(2.184)

has a simple pole at z = a. Note that this function is holomorphic (∂f (z)/∂z ∗ = 0 in a Wirtinger calculus sense). An integral over a closed path that encloses no poles can be deformed to a path of zero length with a finite integrand and therefore evaluates to zero,  dz f (z) = 0 . (2.185) Sno p oles

In order to deform an integral past a pole, a residue is left. The integral is then given by the sum of these residues created by deforming past the poles located at am enclosed within the original path,   (2.186) dz f (z) = 2πi Resa m {f (z)} , S

m

2.9 Integration over complex variables

41

where Resa m {f (z)} indicates the residue located at the mth enclosed pole located at am , of the function f (z). In general, f (z) can be expressed in terms of a Laurent series [53] about the mth pole am f (z) =

∞ 

n = −∞

bn (z − am )n ,

(2.187)

where bn is the coefficient of the expansion. The residue is given by b−1 , Resa m {f (z)} = b−1 .

(2.188)

A function is said to have a pole of order N at am if it can be expressed as the ratio of a holomorphic function h(z), such that h(am ) = 0, and term (z − am )N , f (z) =

h(z) . (z − am )N

(2.189)

Under the assumption that f(z) has a pole of order N at a_m, the residue is given by

  Res_{a_m}{f(z)} = (1/(N − 1)!) lim_{z→a_m} ∂^{N−1}/∂z^{N−1} [ (z − a_m)^N f(z) ].   (2.190)

For the special form f(z) = e^{zt} g(z), which is commonly generated in transform problems, the residue for the kth-order pole at a_m of g(z) is given by

  Res^{(k)}_{a_m}{f(z)} = (1/(k − 1)!) lim_{z→a_m} ∂^{k−1}/∂z^{k−1} [ (z − a_m)^k e^{zt} g(z) ].   (2.191)

The line integral along the real axis of the function f(z) = e^{iωz}/(z² + 1) is an example,

  φ = ∫_{−∞}^{∞} dz e^{iωz}/(z² + 1) = ∮ dz e^{iωz}/(z² + 1),   (2.192)

where it is assumed that if ω > 0, then the upper path (via i∞) is taken, as shown in Figure 2.2. Otherwise, the lower path is taken. The poles are at z = ±i. Only z = i is enclosed by the contour when following the upper path. Thus, because this is a simple pole, the residue is given by

  Res_{a=i}{f(z)} = lim_{z→i} (z − i) e^{iωz}/(z² + 1) = e^{−ω}/(2i).   (2.193)

The integral is given by

  φ = 2πi Res_{a=i}{f(z)} = π e^{−ω},   (2.194)


Figure 2.2 Contour of integration using the upper half plane with poles at ±i.

when ω > 0. Similarly, if ω < 0, the lower path encloses the pole at z = −i, so

  φ = π e^{−‖ω‖}.   (2.195)

We have addressed the cases of poles enclosed by a path and poles outside a path. In the case in which a path is constrained such that a pole is on the path, the residue evaluates to 1/2 of that if the pole were enclosed. Because of the potential subtleties involved in evaluating contour integrals, consulting a complex analysis reference is recommended [53].
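The residue result above can also be checked by brute-force numerical integration; a minimal Python sketch (ours, assuming SciPy is available) evaluates the real part of the integrand, since the imaginary part vanishes by symmetry:

    import numpy as np
    from scipy.integrate import quad

    omega = 1.5
    val, _ = quad(lambda x: np.cos(omega * x) / (x ** 2 + 1), -np.inf, np.inf)

    # Compare the numerical integral with pi * exp(-|omega|).
    print(val, np.pi * np.exp(-abs(omega)))  # the two values agree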

2.9.2 Volume integrals

The integral over a volume (in this case an area) in the complex plane can be formally denoted

  ∫ dz dz∗ f(z, z∗).   (2.196)

Here the notation is borrowed from the Wirtinger calculus, with dz and dz∗ formally acting as independent variables. These integrals are often the result of evaluating probabilities. Often the slightly lazy notation of f(z) is used rather than f(z, z∗). Notationally, this form can be extended to the complex n-vector space z ∈ C^{n×1} using the notation

  ∫ dⁿz dⁿz∗ f(z),   (2.197)

where dⁿz and dⁿz∗ are shorthand notation for d{z}₁ d{z}₂ · · · d{z}_n and d{z}∗₁ d{z}∗₂ · · · d{z}∗_n. In general, the integrals need to be converted to the real space of x and y for evaluation. When convenient, the notation

  d²z = dx dy   (2.198)


is employed. Also used is the notation dΩ_z, indicating the differential hypervolume over the real and imaginary components of z, given by

  dΩ_z = dx₁ dy₁ dx₂ dy₂ · · · dx_n dy_n,   (2.199)

where x_m = ℜ{z}_m and y_m = ℑ{z}_m. In the case of a matrix Z, the differential volume dΩ_Z includes differential contributions associated with all elements of the matrix. For the real vector x, the differential hypervolume is given by

  dΩ_x = d{x}₁ d{x}₂ · · · d{x}_n.   (2.200)

If z is a complex scalar, then the differential hypervolume is indicated by dΩ_z. Because the two definitions of the differential volumes differ by the value of the Jacobian (determinant of the Jacobian matrix), it is important to stress that dΩ_z indicates the differential volume in terms of the real and imaginary components of the complex variable and not in terms of the complex variable and its conjugate.

As an example of using differential volume, the integration over the complex circular Gaussian distribution is considered. The complex circular Gaussian distribution, with probability density p̃(z, z∗) for Wirtinger parameters and p(x, y) for real parameters, is given by

  p̃(z, z∗) dz dz∗ = (1/(2πσ²)) e^{−‖z‖²/σ²} dz dz∗.   (2.201)

By evaluating the Jacobian, the probability can be mapped to the space of x and y,

  J = | ∂z/∂x  ∂z/∂y ;  ∂z∗/∂x  ∂z∗/∂y | = | 1  i ;  1  −i | = −2i.   (2.202)

Consequently, the probability density functions are related by

  p̃(z, z∗) dz dz∗ = p̃(z, z∗) ‖J‖ dx dy = p(x, y) dΩ_z.   (2.203)

By using the substitutions x = (z + z∗)/2 and y = (z − z∗)/(2i), the probability density function in terms of x and y is given by

  p(x, y) dx dy = (1/(πσ²)) e^{−(x² + y²)/σ²} dx dy = (1/(πσ²)) e^{−(x² + y²)/σ²} dΩ_z.   (2.204)

This equation can be rewritten as the product of two real Gaussian distributions by noting that the variance of the complex distribution, σ² = σ²_Re + σ²_Im (where σ²_Re is the variance of the real part of z and σ²_Im is the variance of the imaginary part of z), is twice the variance of either of the two real distributions


σ²_Re, so that σ² = 2σ²_Re. Consequently, the probability density is given by

  p(x, y) dx dy = (1/√(2πσ²_Re)) e^{−x²/(2σ²_Re)} · (1/√(2πσ²_Re)) e^{−y²/(2σ²_Re)} dx dy.   (2.205)

As a result, the integral over x and y is given by

  ∫ dx dy p(x, y) = ∫ dx (1/√(2πσ²_Re)) e^{−x²/(2σ²_Re)} ∫ dy (1/√(2πσ²_Re)) e^{−y²/(2σ²_Re)} = 1 · 1 = 1.   (2.206)

Similarly, the second moment of the zero-mean complex circular Gaussian distribution, which is formally given by

  φ = ∫ dz dz∗ ‖z‖² e^{−‖z‖²/σ²} / (2πσ²),   (2.207)

is given by

  φ = ∫ dx dy (x² + y²) e^{−(x² + y²)/σ²} / (πσ²)
    = ∫ dx x² e^{−x²/σ²} / √(πσ²) + ∫ dy y² e^{−y²/σ²} / √(πσ²)
    = σ²/2 + σ²/2 = σ².   (2.208)

2.10

2

2

Fourier transform A particularly useful transformation for engineering and the sciences is the Fourier transform. The typical interpretation is that the Fourier transform relates a time domain function to a frequency domain function. The transform of a complex function g with a real parameter t (time) to the complex function G with a real parameter f (frequency) is given by  ∞ dt e−i 2π t f g(t) . (2.209) G(f ) = −∞

The inverse transform is given by  g(t) =



−∞

df ei 2π t f G(f ) .

(2.210)

2.10 Fourier transform

45

In terms of angular frequency ω = 2π f , these transforms4 are indicated by  ∞ G(ω) = dt e−i t ω g(t) . (2.213) −∞

The inverse transform is given by 1 g(t) = 2π

2.10.1





dω ei t ω G(ω) .

(2.214)

−∞

Useful Fourier relationships If G(f ) is the Fourier transform of g(t), then ∞ −∞ ∞ −∞

dt e−i 2π f t

∞ −∞

−i 2π f t

g(t)

= G(f )

dt e−i 2π f t g(t − a) = e−i a 2π f G(f )   ∞ f 1 −i 2π f t dt e g(a t) = G a a −∞ ∞ −∞

dt e

dt e−i 2π f t

∂m ∂ tm

(2.215)

m

g(t) = (i2πf ) G(f )   1 θ(a t) = a sinc fa ,

where θ(x) is a function equal to 1 if x ≤ 1/2 and is 0 otherwise. Note that θ(x) is sometimes denoted in the literature by rect(x). Also, sinc(x) is given by sin(πx)/(πx). If G(f ) and H(f ) are the Fourier transforms of g(t) and h(t), the convolution of g(t) and h(t) is indicated by  (f ∗ h)(t) = dx f (x) h(t − x) , (2.216) where x is a variable of integration to perform the convolution. The Fourier transform of the convolution is given by  ∞ dt e−iω t (g ∗ h)(t) = G(f ) H(f ) . (2.217) −∞

Parseval’s theorem is given by   dt g(t) h∗(t) = df G(f ) H ∗(f ) , 4

(2.218)

In some of the literature, the normalization is defined for symmetry for the angular frequency variable  ∞ 1 ˜ dt e−i ω t g(t) . (2.211) G(ω) = √ 2π −∞ The inverse transform is given by 1 g(t) = √ 2π



∞ −∞

˜ dω ei ω t G(ω) .

(2.212)

46

Notational and mathematical preliminaries

which also implies that the integral over the magnitude squared in either domain is the same,   dt g(t) 2 = df G(f ) 2 . (2.219)

2.10.2

Discrete Fourier transform The discrete form of the Fourier transform is often useful for digital systems that employ sampled data. It is useful for considering the spectral composition of a finite temporal extent of data that has bounded spectral content satisfying the Nyquist criteria. The standard description of the Nyquist criteria states that the spectral content of a signal can be represented exactly for a bandlimited signal of single-sided bandwidth Bs.s. , if the regular real samples are spaced by Ts such that Ts ≤ 1/(2Bs.s. ). While there are both positive and negative frequencies produced by the Fourier transform, they provide redundant information for a real signal; thus, the single-side bandwidth is often considered. In our discussion, because we typically assume that we are working with samples of a complex signal (for example, we might have an analog-to-digital converter5 at the receiver for both the real and imaginary components), so we have two real samples for every sample point in time, the full bandwidth (both positive and negative frequencies) contains useful information. We denote this full bandwidth B = 2Bs.s. . Consequently, we require that the sample period must satisfy Ts ≤

1 . B

(2.220)

It is worth noting that if the spectrum is known to be sparse, then compressive sampling techniques [88] can be used that reduce the total number of samples, but that discussion is beyond the scope of this text. If the spectral content extends just a little beyond that supported by the Nyquist criteria, then the spectral estimate at the spectral edges will be contaminated by aliasing in which spectral components at one edge extend beyond the estimated spectral limits and contaminate the estimates at the other end. A set of regularly spaced samples is assumed here. This set may be of finite length. The samples in the time domain (organized as a vector) are represented here by {x}m = g([m − 1]Ts ) . The samples in the frequency domain are represented here by   1 . {y}m = G [m − 1] Ts 5

Here we are ignoring quantization effects.

(2.221)

(2.222)

2.10 Fourier transform

47

Under the assumption of unitary normalization, the discrete Fourier transform (DFT) for equally spaced samples is given by n −1 1  −i 2 π m k n √ e {x}m {y}k = n m =0

; k = 0, . . . , n − 1 .

(2.223)

; m = 0, . . . , n − 1 .

(2.224)

The inverse DFT is given by n −1 1  i 2π mk {x}m = √ e n {y}k n k=0

Analogous to aliasing in the frequency domain, when we sample signals in the frequency domain, we may introduce aliasing in the time domain if the signals of interest have temporal extent beyond the Nyquist criterion. Here a symmetric (or unitary) normalization is used, in which the summation is multiplied by 1/√n for both transformations. It is also common to use the normalization in which the forward transformation is missing the normalization term and the inverse transformation has the term 1/n. Depending upon the situation, each normalization has advantages. In this text, the symmetric version is preferred.

It is sometimes useful to think of the DFT in terms of a unitary matrix operator, denoted here by F, whose elements are

  {F}_{m,k} = (1/√n) e^{−i2π(m−1)(k−1)/n} ;  m, k ∈ {1, . . . , n}.   (2.225)

This DFT matrix satisfies the unitary characteristics,

  F⁻¹ = F†,  F† F = F F† = I.   (2.226)

By using this definition, the vectors x and y are given by

  y = F x,  x = F† y.   (2.227)

A computationally efficient implementation of the DFT is the fast Fourier transform (FFT). Often the terms DFT and FFT are used interchangeably. The FFT enables the evaluation of the DFT with order O(n log(n)) operations rather than the O(n2 ) of a matrix-vector multiply observed in Equation (2.227). This potential computational savings motivates a number of signal processing and communications techniques. As an example, orthogonal-frequency-division multiplexing (OFDM), discussed in Section 10.5.3, typically exploits this computational savings.
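A short Python/NumPy sketch (ours) of the unitary DFT matrix of Equation (2.225), checking its unitarity and its agreement with NumPy's FFT under the "ortho" normalization:

    import numpy as np

    n = 8
    m = np.arange(n)
    F = np.exp(-2j * np.pi * np.outer(m, m) / n) / np.sqrt(n)

    x = np.random.default_rng(8).standard_normal(n)

    print(np.allclose(F.conj().T @ F, np.eye(n)))           # F is unitary
    print(np.allclose(F @ x, np.fft.fft(x, norm="ortho")))  # y = F x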


2.11 Laplace transform

Another transform that is often encountered is the Laplace transform, which can be viewed as a generalization of the Fourier transform. The Laplace transform of a function f(·) is defined as

  L{f(·)}(s) = ∫ dx f(x) e^{−sx},   (2.228)

where s is a complex number, s ∈ C. If the integral above does not converge for all s, the region of convergence of the Laplace transform needs to also be specified. The region of convergence refers to the range of values of s for which the Laplace transform converges. The Laplace transform has properties that are similar to the Fourier transform. Suppose that the Laplace transforms of the functions f(t) and g(t) are F(s) and G(s). Then the following properties hold:

  L{a f(t) + b g(t)} = a F(s) + b G(s)   (2.229)
  L{(d/dt) f(t)} = s F(s).   (2.230)

2.12 Constrained optimization

2.12.1 Equality constraints

In many situations, we may wish to optimize a function subject to an equality constraint. This type of problem can be described by

  min f(x) = f(x₁, x₂, . . . , x_n)   (2.231)

such that

  g(x) = g(x₁, x₂, . . . , x_n) = 0,   (2.232)

where x ∈ Rn ×1 . The function to be optimized, f (x) ∈ R, is known as the objective function and g(x) is the constraint function. In other words, the optimization problem described above aims to find the smallest value of f (x) over all points where g(x) = 0. The method of Lagrange multipliers establishes necessary conditions for a point x = (x1 , x2 , . . . , xn ) to be a stationary point (a local minima, maxima or inflection point). The principle behind the Lagrange multiplier has a satisfying geometric interpretation. The main property used in the method of Lagrange multipliers is that the gradient of the constraint and the objective functions must be parallel at stationary points. To see why the gradients must be parallel at a given ¯ , consider a two-dimensional example where n = 2. For illusstationary point x trative purposes, consider Figures 2.3 and 2.4, where an example of an objective function f (x1 , x2 ) and a constraint function g(x) = x1 − x2 = 0

Figure 2.3 Constrained optimization with equality constraints.

are shown in surface and contour plots, respectively. The minimum occurs at the point marked by the dot in Figure 2.3. As we trace out the path of the constraint function g(x) = 0 on the surface of the objective function f(x), observe that the constraint path crosses the contours of the objective function, except at stationary points (the point from which the dashed arrow originates in Figure 2.4), where the constraint path just touches the contour of the objective function but never crosses it. In other words, the constraint path is tangent to a contour of the objective function at each stationary point. Since the gradient of a function is always perpendicular to its contours, the gradient vector of the constraint function must be parallel to the gradient vector of the objective function at stationary points. Hence, if a point x̄ is a stationary point that satisfies the constraint equation

g(x) = 0 ,   (2.233)

the gradient vectors of the objective function f(x) and the constraint function g(x) are parallel at x̄. In other words, the gradient vectors must be linearly related.

Figure 2.4 Constrained optimization with equality constraints, contour plot.

Thus, there must exist a λ such that

\nabla_x f(\bar{x}) = -\lambda \, \nabla_x g(\bar{x}) ,   (2.234)

and

g(\bar{x}) = 0 .   (2.235)

The term λ is known as the Lagrange multiplier. We can combine the two equations above by defining a function Λ(x̄, λ) as follows:

\Lambda(\bar{x}, \lambda) = f(\bar{x}) + \lambda \, g(\bar{x}) ,

and then writing

\nabla_{x,\lambda} \Lambda(\bar{x}, \lambda) = 0 .   (2.236)

The gradient operator in the equation above is with respect to the elements of x and λ. Taking the gradient with respect to the elements of x ensures that the gradient vectors of the objective function f(x) and the constraint function g(x) are parallel, that is to say, Equation (2.234) is satisfied. Taking the gradient with respect to the Lagrange multiplier λ ensures that the constraint equation is satisfied, since taking the derivative of Equation (2.236) with respect to λ results precisely in Equation (2.235). Hence, by solving for λ and x in Equation (2.236), one can find all the stationary points.

With multiple constraint functions, the method of Lagrange multipliers can be generalized by using essentially the same arguments presented before. Suppose that the constraint equations are

g_1(x) = 0
g_2(x) = 0
\vdots
g_K(x) = 0 .   (2.237)

At any stationary point x̄, there must exist λ_1, λ_2, \ldots, λ_K such that

\nabla_{x, \lambda_1, \ldots, \lambda_K} \left[ f(\bar{x}) + \sum_{k=1}^{K} \lambda_k \, g_k(\bar{x}) \right] = 0 .   (2.238)
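As a hedged illustration (not from the text), the following Python sketch solves the stationarity conditions of Equation (2.236) symbolically; the objective f(x1, x2) = x1² + 2x2² and the constraint x1 + x2 − 1 = 0 are arbitrary choices for demonstration.

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
f = x1**2 + 2*x2**2          # example objective (arbitrary choice)
g = x1 + x2 - 1              # example equality constraint g(x) = 0

# Lagrangian of Equation (2.236): Lambda = f + lambda * g
Lagrangian = f + lam * g

# Set the gradient with respect to (x1, x2, lambda) to zero and solve
stationary = sp.solve([sp.diff(Lagrangian, v) for v in (x1, x2, lam)],
                      (x1, x2, lam), dict=True)
print(stationary)   # [{x1: 2/3, x2: 1/3, lambda: -4/3}]
```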

2.12.2 Inequality constraints

We may also wish to optimize an objective function with both equality and inequality constraints. This may be stated mathematically as follows:

\min f(x) = f(x_1, x_2, \ldots, x_n)   (2.239)

such that

h_1(x) = 0
h_2(x) = 0
\vdots
h_K(x) = 0 ,   (2.240)

with the following inequality constraints:

g_1(x) \le 0
g_2(x) \le 0
\vdots
g_M(x) \le 0 .   (2.241)

If a point x̄ is a local minimum that satisfies the constraints, x̄ must satisfy the Karush–Kuhn–Tucker conditions, which are as follows. There exist μ̄ = (μ̄_1, μ̄_2, \ldots, μ̄_M) and λ̄ = (λ̄_1, λ̄_2, \ldots, λ̄_K) such that

\nabla f(\bar{x}) + \sum_{m=1}^{M} \bar{\mu}_m \nabla g_m(\bar{x}) + \sum_{k=1}^{K} \bar{\lambda}_k \nabla h_k(\bar{x}) = 0 ,
\bar{\mu}_m \, g_m(\bar{x}) = 0 \quad \text{for } m = 1, 2, \ldots, M ,
\bar{\mu}_m \ge 0 \quad \text{for } m = 1, 2, \ldots, M .   (2.242)


Note that x̄ must satisfy the constraints of the problem, which are known as the primal feasibility constraints, since without them x̄ cannot be a solution to the optimization problem:

g_m(\bar{x}) \le 0 \quad \text{for } m = 1, 2, \ldots, M
h_k(\bar{x}) = 0 \quad \text{for } k = 1, 2, \ldots, K .   (2.243)

The Karush–Kuhn–Tucker conditions above are sufficient for optimality if f(x) and the g_m(x) are convex for m = 1, 2, \ldots, M (and the equality constraints h_k(x) are affine). Figure 2.5 illustrates a convex function f(x_1, x_2) with the following inequality constraint:

x_1 - x_2 \le 0 .

The arrows in the plot indicate the region of the x_1–x_2 plane that satisfies the boundary conditions. The global minimum is marked by the dot. In this case, the Karush–Kuhn–Tucker conditions are given by

\nabla f(\bar{x}_1, \bar{x}_2) + \bar{\mu} \, \nabla (\bar{x}_1 - \bar{x}_2) = 0   (2.244)
\bar{\mu} (\bar{x}_1 - \bar{x}_2) = 0   (2.245)

and

\bar{\mu} \ge 0 .

The feasibility conditions are satisfied by any point in the region indicated by the arrows. Observe that Equation (2.245) can be satisfied either if x̄_1 − x̄_2 = 0, i.e., the optimal point is at the boundary of the constraint function, or if μ̄ = 0, in which case the optimal point is not on the boundary. If the point is not on the boundary, then μ̄ = 0, and Equation (2.244) becomes

\nabla f(\bar{x}_1, \bar{x}_2) = 0 ,   (2.246)

which is a global minimum because of the convexity of f(x_1, x_2). Now suppose that the constraint function is

x_2 - x_1 \le 0 ,

and the objective function f(x_1, x_2) is the same as before. Figures 2.6 and 2.7 illustrate this case, where the dot indicates the optimal point. The Karush–Kuhn–Tucker conditions for this optimization problem are

\nabla f(\bar{x}_1, \bar{x}_2) + \bar{\mu} \, \nabla (\bar{x}_2 - \bar{x}_1) = 0   (2.247)
\bar{\mu} (\bar{x}_2 - \bar{x}_1) = 0   (2.248)

and

\bar{\mu} \ge 0 ,

Figure 2.5 Karush–Kuhn–Tucker theorem with interior minima.

with the feasibility constraints satisfied by points in the region of the x_1–x_2 plane indicated by the arrows. Note that at the minimum, Equation (2.248) is satisfied for any μ̄ because x̄_2 − x̄_1 = 0. Hence, Equation (2.247) remains unchanged and is identical to the Lagrange multiplier technique for optimization with equality constraints, for which the optimal point is apparent from Figure 2.7. The global optimality of this point follows from the convexity of f(x_1, x_2).
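To make the boundary-versus-interior distinction concrete, here is a small Python sketch (an illustration, not from the text) that minimizes an assumed convex objective under the inequality constraint x_2 − x_1 ≤ 0 using scipy; note that SLSQP expects inequality constraints in the form g(x) ≥ 0, so the sign is flipped accordingly.

```python
import numpy as np
from scipy.optimize import minimize

# Example convex objective with unconstrained minimum at (1, 2) (arbitrary choice)
f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2

# Inequality constraint x2 - x1 <= 0, written as x1 - x2 >= 0 for SLSQP
cons = [{"type": "ineq", "fun": lambda x: x[0] - x[1]}]

res = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)  # approximately [1.5, 1.5]: the constraint is active (boundary minimum)
```

Because the unconstrained minimum (1, 2) violates x_2 ≤ x_1, the solver returns a point on the constraint boundary, matching the μ̄ > 0 case discussed above.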

2.12.3 Calculus of variations

A function mapping members of a set of functions to real values is a functional.⁶ In many contexts, we may wish to find a function that maximizes or minimizes a functional, subject to some constraints. For instance, one could be asked to describe the shortest path between two points on the surface of a sphere. Solutions to problems of this form can be found by using the calculus of variations. Suppose that we wish to find the function y(x) that minimizes the following quantity,

I(y) = \int_a^b dx \, g(x, y(x), y'(x)) ,   (2.249)

⁶ Note that the general definition of a functional is a mapping from a vector space (the space of functions is a vector space) to the real numbers.

Figure 2.6 Karush–Kuhn–Tucker theorem with boundary minima.

where the functional is g(·, ·, ·). We use the notation y'(x) to represent the derivative of y with respect to x, i.e.,

y'(x) = \frac{d}{dx} y(x) .   (2.250)

For all continuous functions that maximize or minimize the quantity I in Equation (2.249), the function y(x) must satisfy

\frac{dg}{dy} - \frac{d}{dx}\!\left( \frac{dg}{dy'} \right) = 0 .   (2.251)

The equation above is known as the Euler–Lagrange differential equation. The canonical example used to illustrate calculus of variations is the problem of proving the fact that the shortest path between two points on a plane is given by a straight line connecting the two points. Consider Figure 2.8 and suppose that we wish to travel from the point (a, y(a)) to the point (b, y(b)) along a function y(x). The length of the curve connecting the point (a, y(a)) to the point (b, y(b)) is given by

\int_a^b dx \, \sqrt{1 + \left( \frac{d\,y(x)}{dx} \right)^2} .   (2.252)

Figure 2.7 Karush–Kuhn–Tucker theorem with boundary minima, contour plot.

Figure 2.8 Shortest path between two points determined by using calculus of variations.

We start by defining I(y(·)) as the following operation on the function y(·):

I(y(\cdot)) = \int_a^b dx \, \sqrt{1 + \left( \frac{d\,y(x)}{dx} \right)^2} .   (2.253)


If the function y(·) minimizes I(y(·)), then it must satisfy Equation (2.251). Writing the Euler–Lagrange differential equation, we have

0 = \frac{dg}{dy} - \frac{d}{dx}\!\left( \frac{y'(x)}{\sqrt{1 + (y'(x))^2}} \right)
= -\frac{y''(x)}{\sqrt{1 + (y'(x))^2}} + \frac{(y'(x))^2 \, y''(x)}{\left(1 + (y'(x))^2\right)^{3/2}}
= -\frac{y''(x)}{\left(1 + (y'(x))^2\right)^{3/2}} .   (2.254)

Hence, the curve that minimizes the distance between the two points (a, y(a)) and (b, y(b)) must have a second derivative that equals zero for all x in [a, b]. If the second derivative is zero in the interval, then for all x in [a, b], it must be the case that

\frac{d}{dx} y(x) = \left. \frac{d}{dx} y(x) \right|_{x=a} .   (2.255)

Integrating both sides with respect to x yields

y(x) = x \left. \frac{d}{dx} y(x) \right|_{x=a} + A ,   (2.256)

where A is a constant. Hence, we have proved that if y(x) is the curve that minimizes the distance between the points (a, y(a)) and (b, y(b)), it must be a straight line. Note that, technically, we have not proved that there is a curve that minimizes the distance between those points; but assuming that there is such a curve, we have shown that it is a straight line.

In order to prove that it is necessary for any function y(x) that minimizes (2.249) to satisfy the Euler–Lagrange equation, we first observe that any function y(x) that minimizes Equation (2.249) must satisfy I(y(x)) ≤ I(y_ε(x)), where y_ε(x) is a perturbed version of y(x). The perturbed function y_ε(x) is given by the following:

y_\epsilon(x) = y(x) + \epsilon \, h(x) .

Here h(x) is any other function that has the following properties:

h(a) = h(b) = 0 .

The last property implies that y(a) = y_ε(a) and y(b) = y_ε(b). Additionally, assume that h(x) is continuous and has a continuous derivative. If y(x) minimizes I(y(·)), it must be the case that y_ε(x) at ε = 0 minimizes I(y(·)). Hence, the derivative of I(y_ε(x)) with respect to ε must be zero. This


requirement leads to

\frac{d}{d\epsilon} I(y_\epsilon(x)) = \frac{d}{d\epsilon} \int_a^b dx \, g\!\left(x, y_\epsilon, \frac{d\,y_\epsilon(x)}{dx}\right) .   (2.257)

Moving the derivative into the integral, one finds that

\frac{d}{d\epsilon} I(y_\epsilon(x)) = \int_a^b dx \, \frac{d}{d\epsilon} g\!\left(x, y_\epsilon, \frac{d}{dx} y_\epsilon(x)\right)   (2.258)
= \int_a^b dx \left( \frac{dg}{dx}\frac{dx}{d\epsilon} + \frac{dg}{d y_\epsilon}\frac{d y_\epsilon}{d\epsilon} + \frac{dg}{d y'_\epsilon}\frac{d y'_\epsilon}{d\epsilon} \right) .   (2.259)

Substituting y_\epsilon(x) = y(x) + \epsilon h(x) yields

\frac{d}{d\epsilon} I(y_\epsilon(x)) = \int_a^b dx \left( \frac{dg}{d y_\epsilon} h(x) + \frac{dg}{d y'_\epsilon} h'(x) \right) .   (2.260)

Because y(x) = y_0(x) minimizes I(y(·)), \frac{d}{d\epsilon} I = 0 at ε = 0. Therefore, at ε = 0, we have

\int_a^b dx \left( \frac{dg}{dy} h(x) + \frac{dg}{dy'} h'(x) \right) = 0 .   (2.261)

We can then use integration by parts to rewrite the above integral as follows:

0 = \int_a^b dx \left( \frac{dg}{dy} - \frac{d}{dx} \frac{dg}{dy'} \right) h(x) .

Since the function h(x) is an arbitrary continuous function with a continuous derivative, the above equation can hold only if

\frac{dg}{dy} - \frac{d}{dx}\!\left( \frac{dg}{dy'} \right) = 0 ,

which is the Euler–Lagrange differential equation.
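A quick numerical sanity check of this result (an illustration, not from the text): the sketch below compares the arc length of Equation (2.253) for the straight line against perturbed curves y(x) + εh(x), where h vanishes at assumed endpoints; the length is smallest at ε = 0.

```python
import numpy as np

a, b, ya, yb = 0.0, 1.0, 0.0, 2.0      # assumed endpoints (arbitrary choice)
x = np.linspace(a, b, 10001)

def arc_length(y):
    # I(y) = integral of sqrt(1 + (dy/dx)^2), Equation (2.253)
    dy = np.gradient(y, x)
    return np.trapz(np.sqrt(1.0 + dy**2), x)

straight = ya + (yb - ya) * (x - a) / (b - a)
h = np.sin(np.pi * (x - a) / (b - a))  # perturbation with h(a) = h(b) = 0

for eps in (0.0, 0.1, 0.5):
    print(eps, arc_length(straight + eps * h))
# The printed length is smallest at eps = 0, i.e., for the straight line.
```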

2.13 Order of growth notation

In many areas of engineering, it is useful to understand the order of growth of one function with respect to another. For instance, we may wish to know the rate at which the per-link data rate in an ad hoc wireless network declines with increasing numbers of nodes. The so-called big-O/little-o notation is useful to describe this. This notation is also used in other fields such as computer science and approximation theory, and is also referred to as the Landau notation.

Consider two real functions f(x) and g(x). We say that f(x) is "big-O" of g(x), or f(x) = O(g(x)), when there exist constants A and X such that

f(x) < A \, g(x)   (2.262)


for all x > X. In other words, the function g(x) grows at least as fast with x as the function f(x), up to a constant factor. We say that f(x) is "little-o" of g(x), or f(x) = o(g(x)), when

\frac{f(x)}{g(x)} \to 0   (2.263)

as x → ∞. We say that f(x) is "theta of" g(x), or f(x) = Θ(g(x)), if there exist an X and constants A_1 and A_2 such that for all x > X,

A_1 \, g(x) \le f(x) \le A_2 \, g(x) .   (2.264)

In other words, f(x) and g(x) grow at the same rate for sufficiently large x. Confusingly, it is common practice to use this notation to describe the order of growth of functions when x is close to zero as well as when x is large as described above. The little-o notation is most commonly used in this context, whereby we say that f(x) is little-o of g(x), or f(x) = o(g(x)), when

\frac{f(x)}{g(x)} \to 0   (2.265)

as x → 0. A common application of the little-o notation is in writing Taylor series expansions of functions for small arguments. For instance, one may write the Taylor series expansion of log(1 + x) for small x as

\log(1 + x) = x + o(x) .   (2.266)

2.14 Special functions

In this section, we briefly summarize some special functions that are often encountered in wireless communications in general and in this text in particular.

2.14.1 Gamma function

The gamma function Γ(z) is an extension of the factorial function to real and complex numbers and is defined as

\Gamma(z) = \int_0^\infty d\tau \, \tau^{z-1} e^{-\tau} .   (2.267)

The integral requires analytic continuation to evaluate in the left half plane, and the gamma function is not defined for non-positive integer real values of z. For the special case of integer arguments, the gamma function can be expressed in terms of the factorial,

\Gamma(n) = (n-1)!   (2.268)
= \prod_{m=1}^{n-1} m .   (2.269)
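As a brief numerical check (an illustration, not from the text), scipy exposes these functions directly; note that scipy's gammainc is the regularized lower incomplete gamma, so it must be multiplied by Γ(s) to recover γ(s, x) as defined below in Equation (2.271).

```python
import math
from scipy.special import gamma, gammainc

# Gamma at integer arguments reproduces the factorial, Equations (2.268)-(2.269)
for n in range(1, 6):
    assert math.isclose(gamma(n), math.factorial(n - 1))

# scipy's gammainc(s, x) is the *regularized* lower incomplete gamma,
# gamma(s, x) / Gamma(s); multiply by Gamma(s) to recover Equation (2.271)
s, x = 2.5, 1.0
lower_incomplete = gammainc(s, x) * gamma(s)
print(lower_incomplete)
```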


Two related functions are the upper and lower incomplete gamma functions, defined respectively as

\Gamma(s, x) = \int_x^\infty d\tau \, \tau^{s-1} e^{-\tau} ,   (2.270)

and

\gamma(s, x) = \int_0^x d\tau \, \tau^{s-1} e^{-\tau} .   (2.271)

The lower incomplete gamma function γ(s, x) is often encountered in communication systems since the cumulative distribution function of a χ² distributed random variable is proportional to the lower incomplete gamma function. The following asymptotic expansion of the lower incomplete gamma function is useful for analyzing the probability of error of wireless communications systems:

\gamma(s, x) = \frac{1}{s} x^s + o(x^s) .   (2.272)

2.14.2 Hypergeometric series

The hypergeometric series [129] is defined as

{}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; x) = \sum_{k=0}^{\infty} \frac{ \prod_{m=1}^{p} (a_m)_k }{ \prod_{n=1}^{q} (b_n)_k } \, \frac{x^k}{k!} ,   (2.273)

where here (a)_k is known as the Pochhammer symbol, defined as

(a)_k = \frac{\Gamma(a + k)}{\Gamma(a)} = a(a+1)(a+2) \cdots (a+k-1) .   (2.274)

From its definition, it can be observed that the hypergeometric series does not exist for non-positive integer values of b_n because it would result in terms with zero denominators. The hypergeometric series arises in a variety of contexts; in particular, the Gauss hypergeometric function (p = 2, q = 1) has some special properties. For convenience of notation, let a = (a)_1, b = (a)_2, and c = (b)_1; when the argument is unity, we have

{}_2F_1(a, b; c; 1) = \frac{\Gamma(c)\,\Gamma(c - a - b)}{\Gamma(c - a)\,\Gamma(c - b)} .   (2.275)
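A numerical spot-check of this identity (an illustration, not from the text) using scipy's Gauss hypergeometric routine; the parameter values are arbitrary, chosen so that c − a − b > 0 for convergence at unit argument, and the snippet also previews the logarithm identity stated next as Equation (2.276).

```python
import math
from scipy.special import hyp2f1, gamma

a, b, c = 0.5, 1.0, 3.0   # arbitrary values with c - a - b > 0
lhs = hyp2f1(a, b, c, 1.0)
rhs = gamma(c) * gamma(c - a - b) / (gamma(c - a) * gamma(c - b))
assert math.isclose(lhs, rhs, rel_tol=1e-10)

# Equation (2.276): log(1 + x) = x * 2F1(1, 1; 2; -x)
x = 0.7
assert math.isclose(math.log(1 + x), x * hyp2f1(1, 1, 2, -x), rel_tol=1e-10)
```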


Other special values include the following:

\log(1 + x) = x \, {}_2F_1(1, 1; 2; -x)   (2.276)
(1 - x)^{-a} = {}_2F_1(a, b; b; x) \quad \text{for all } b \text{ except non-negative integers},   (2.277)
\arctan(x) = x \, {}_2F_1\!\left(\tfrac{1}{2}, 1; \tfrac{3}{2}; -x^2\right) ,   (2.278)
\arcsin(x) = x \, {}_2F_1\!\left(\tfrac{1}{2}, \tfrac{1}{2}; \tfrac{3}{2}; x^2\right) ,   (2.279)
\log\!\left(x + \sqrt{1 + x^2}\right) = x \, {}_2F_1\!\left(\tfrac{1}{2}, \tfrac{1}{2}; \tfrac{3}{2}; -x^2\right) ,   (2.280)

and numerous other values which can be found in the literature. One particular identity that is not widely available in the literature is the following, which applies when p and q are integers with 0 < p < q:

{}_2F_1\!\left(1, \tfrac{p}{q}; 1 + \tfrac{p}{q}; x\right) = \tfrac{p}{q} \, \mathrm{Lerch}\!\left(x, 1, \tfrac{p}{q}\right)   (2.281)
= -\frac{p}{q} \, x^{-p/q} \sum_{k=0}^{q-1} \zeta_q^{-kp} \log\!\left(1 - \zeta_q^k \, x^{1/q}\right) ,   (2.282)

where \zeta_q = e^{\frac{2\pi i}{q}} is the qth root of unity and the Lerch transcendent is defined as

\mathrm{Lerch}(x, s, a) = \sum_{k=0}^{\infty} \frac{x^k}{(a + k)^s} ,   (2.283)

which has the following property,

\mathrm{Lerch}(x, s, a) = x \, \mathrm{Lerch}(x, s, a + 1) + \frac{1}{(a^2)^{s/2}} .   (2.284)

(2.284)

(2.285) (2.286) (2.287)

k=0

Euler's hypergeometric transforms can be used to manipulate the parameters of the Gauss hypergeometric function. The latter two of the following are useful for numerically evaluating the hypergeometric series with negative arguments, which have alternating-sign terms. The latter two transforms convert negative arguments x to non-negative arguments:

{}_2F_1(a, b; c; x) = (1 - x)^{c-a-b} \, {}_2F_1(c - a, c - b; c; x)   (2.288)
{}_2F_1(a, b; c; x) = (1 - x)^{-b} \, {}_2F_1\!\left(c - a, b; c; \frac{x}{x - 1}\right)   (2.289)
{}_2F_1(a, b; c; x) = (1 - x)^{-a} \, {}_2F_1\!\left(a, c - b; c; \frac{x}{x - 1}\right) .   (2.290)

2.14.3 Beta function

The beta function B(x, y) is defined by

B(x, y) = \int_0^1 du \, u^{x-1} (1 - u)^{y-1} = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)} .   (2.291)

Incomplete beta function

The incomplete beta function is defined by

B(z; x, y) = \int_0^z du \, u^{x-1} (1 - u)^{y-1} = \frac{z^x}{x} \, {}_2F_1(x, 1 - y; x + 1; z) ,   (2.292)

where {}_2F_1(a_1, a_2; b; z) is the hypergeometric function defined in Section 2.14.2.

Regularized beta function

The regularized beta function is given in terms of the ratio of an incomplete to a complete beta function and is defined as

I(z; x, y) = \frac{B(z; x, y)}{B(x, y)} .

2.14.4 Lambert W function

The Lambert W (or product logarithm) function, which is often denoted by W(z), is a function that has multiple branches and cannot be expressed in terms of elementary functions. It is defined as the functional inverse of

z = W(z) \, e^{W(z)} .   (2.293)

For real arguments, the Lambert W function W(z) has only two real branches: the principal branch W_0(z) and another branch that is simply denoted by W_{-1}(z).


The principal branch has the following special values:

W_0\!\left(-\tfrac{1}{e}\right) = -1
W_0(0) = 0
W_0(e) = 1 .

Additionally, the principal branch has a series representation which is valid for |x| \le \tfrac{1}{e},

W_0(x) = \sum_{k=1}^{\infty} (-1)^{k-1} \frac{k^{k-2}}{(k-1)!} \, x^k .
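A quick numerical check of the defining relation and a special value (an illustration, not from the text), using scipy's Lambert W implementation:

```python
import numpy as np
from scipy.special import lambertw

# Verify the defining relation z = W(z) e^{W(z)}, Equation (2.293)
z = 2.5
w = lambertw(z, k=0).real   # principal branch W_0
assert np.isclose(w * np.exp(w), z)

# Special value W_0(-1/e) = -1
assert np.isclose(lambertw(-1 / np.e, k=0).real, -1.0)
```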

2.14.5 Bessel functions

Bessel functions [129, 343, 53] are given by the functional solutions for f_α(x) to the differential equation

x^2 \frac{\partial^2}{\partial x^2} f_\alpha(x) + x \frac{\partial}{\partial x} f_\alpha(x) + (x^2 - \alpha^2) f_\alpha(x) = 0 .   (2.294)

There are two "kinds" of solution to this equation. Solutions of the first kind are denoted by J_α(x), and solutions of the second kind are denoted Y_α(x) (and sometimes N_α(x)). The parameter α indicates the "order" of the function. The Bessel function of the second kind is defined in terms of the first kind by

Y_\alpha(x) = \frac{J_\alpha(x)\cos(\alpha\pi) - J_{-\alpha}(x)}{\sin(\alpha\pi)} .   (2.295)

Bessel functions are often the result of integrals of exponentials of trigonometric functions. The Bessel function and the confluent hypergeometric function {}_0F_1(\cdot; \cdot) are related by

J_\alpha(x) = \frac{\left(\frac{x}{2}\right)^\alpha}{\Gamma(\alpha + 1)} \, {}_0F_1\!\left(\alpha + 1; -\frac{x^2}{4}\right) .   (2.296)

For integer values of order m, the Bessel function can also be expressed by using the contour integral form given by

J_m(x) = \frac{1}{2\pi i} \oint_C dz \, z^{-m-1} e^{\frac{x}{2}(z - 1/z)} ,   (2.297)

where the counterclockwise contour C encloses the origin [11]. Modified Bessel functions of the first and second kind are denoted by I_α(x) and K_α(x), respectively. The modified Bessel function of the first kind is proportional to the Bessel function with a transformation in the complex plane of the form

I_\alpha(x) = (i)^{-\alpha} J_\alpha(i x) .   (2.298)


For integer order, this form can be expressed in terms of the contour integral

I_m(x) = (i)^{-m} \frac{1}{2\pi i} \oint_C dz \, z^{-m-1} e^{\frac{i x}{2}(z - 1/z)}
= \frac{1}{2\pi i} \oint_C dz \, z^{-m-1} e^{\frac{x}{2}(z + 1/z)} ,   (2.299)

where the contour C encloses the origin. The modified Bessel function of the second kind is given by

K_\alpha(x) = \frac{\pi}{2} (i)^{\alpha+1} \left[ J_\alpha(i x) + i \, Y_\alpha(i x) \right] .   (2.300)
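As a numerical check of Equation (2.298) (an illustration, not from the text), scipy's Bessel routines accept complex arguments; the order and argument below are arbitrary choices:

```python
import numpy as np
from scipy.special import jv, iv

# Check I_alpha(x) = i^{-alpha} J_alpha(i x), Equation (2.298)
alpha, x = 1.5, 2.0
lhs = iv(alpha, x)
rhs = (1j ** (-alpha) * jv(alpha, 1j * x)).real
assert np.isclose(lhs, rhs)
```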

2.14.6 Error function

The error function, often denoted erf(·), is defined by

\mathrm{erf}(x_0) = \frac{2}{\sqrt{\pi}} \int_0^{x_0} dx \, e^{-x^2} .   (2.301)

2.14.7 Gaussian Q-function

The Gaussian Q-function, often denoted Q(·), is defined by

Q(x_0) = \frac{1}{\sqrt{2\pi}} \int_{x_0}^{\infty} dx \, e^{-\frac{x^2}{2}} .   (2.302)
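In numerical work the Q-function is usually computed from the complementary error function via the identity Q(x) = ½ erfc(x/√2); a minimal sketch (not from the text):

```python
import math
from scipy.special import erfc
from scipy.stats import norm

def gaussian_q(x):
    # Q(x) = 0.5 * erfc(x / sqrt(2)), equivalent to Equation (2.302)
    return 0.5 * erfc(x / math.sqrt(2.0))

x0 = 1.3
assert math.isclose(gaussian_q(x0), norm.sf(x0), rel_tol=1e-12)
```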

2.14.8 Marcum Q-function

The generalized Marcum Q-function [293, 280, 232, 255] is defined by

Q_M(\nu, \mu) = \frac{1}{\nu^{M-1}} \int_\mu^\infty dy \, y^M e^{-\frac{y^2 + \nu^2}{2}} \, I_{M-1}(\nu y)
= e^{-\frac{\mu^2 + \nu^2}{2}} \sum_{m=1-M}^{\infty} \left( \frac{\nu}{\mu} \right)^m I_m(\mu \nu)
= 1 - \frac{e^{-\frac{\mu^2 + \nu^2}{2}}}{2\pi i} \oint_S dz \, \frac{ e^{\frac{\mu^2}{2z} + \frac{\nu^2 z}{2}} }{ z^M (1 - z) } ,   (2.303)

where the contour S encloses the pole at zero but not the pole at 1. If M is not specified, a value of 1 is assumed.

Problems

2.1 Evaluate the following expressions.
(a) \Re\!\left\{ \sqrt{e^{-i\pi}} \right\}
(b) \log_4(1 + i)
(c) \int_{-\infty}^{\infty} dx \, \delta(x - 1) \, \frac{\cosh[\pi(x-1)]}{\sqrt{2 - x^2}}
(d) \left\| \mathbf{I}_4 \, a \right\|
(e) \Gamma(2)

2.2 For complex vectors a and b, evaluate the following expressions.
(a) \mathrm{rank}\{a b^\dagger\}
(b) [I - a(a^\dagger a)^{-1} a^\dagger] \, a
(c) [I - a(a^\dagger a)^{-1} a^\dagger] \, b (b^\dagger b)^{-1} b^\dagger a
(d) (I + a a^\dagger)^{-1} b \ \text{if } a^\dagger b = 0
(e) (I + a a^\dagger)^{-1} b \ \text{if } a^\dagger b = 1/2
(f) \log_2 |I + a a^\dagger| \ \text{if } \|a\| = 1

2.3 For unit-norm complex vectors a and b, evaluate the following expressions.
(a) \lambda_m\{I + a a^\dagger + b b^\dagger\} \ \text{if } \|a^\dagger b\|^2 = 1/2
(b) \mathrm{tr}\{I + a a^\dagger + b b^\dagger\}
(c) |I + a a^\dagger + b b^\dagger|

2.4 For matrices A ∈ C^{m,p} and B ∈ C^{m,q}, show that

|I + A A^\dagger + B B^\dagger| \ge |I + A A^\dagger| .

2.5 Evaluate the following Wirtinger derivatives (where z^* is interpreted as the doppelganger variable for the conjugation of z).
(a) \frac{\partial}{\partial z^*} \sum_{m=0}^{\infty} m z^m
(b) \frac{\partial}{\partial z^*} z^\dagger z
(c) \frac{\partial}{\partial z^\dagger} z^\dagger z
(d) \frac{\partial}{\partial z^\dagger} z^\dagger A z
(e) \frac{\partial}{\partial z^*} z^\dagger B z

2.6 Evaluate the following integrals under the assumption that the closed contour encloses a radius of 10 of the origin.
(a) \oint dz \, \frac{1}{(z-1)^2 \, z}
(b) \oint dz \, \frac{1}{(z-20)^2 \, z}
(c) \oint dz \, \frac{(z-3)}{(z-2)(z-1)^2 \, z}
(d) \oint dz \, \frac{z}{(z^2 - 1)}
(e) \oint dz \, \frac{e^z}{(z-1)}

2.7 Evaluate the following integrals where V indicates the entire volume spanned by the variables of integration.
(a) For real variables x and y, \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \, (x^2 + y^2) \, e^{-x^2} e^{-y^2}

(b) For complex variable z, \int_V d\Omega_z \, \|z\|^2 \, e^{-\|z\|^2}
(c) For the complex n-vector z, \int_V d\Omega_z \, \|z\|^2 \, e^{-\|z\|^2}

2.8 Evaluate the following Gauss hypergeometric function expressions in terms of common functions.
(a) {}_2F_1(1, 2; 4; 1)
(b) {}_2F_1(1, 1; 2; -1)
(c) {}_2F_1(1/2, 1/2; 3/2; -3)
(d) {}_2F_1(-2, 1/2; 1/2; 1/2)

2.9 By using the calculus of variations, find the shortest distance between a point on the zenith of a sphere (the north pole) and a point on the equator.

3 Probability and statistics

3.1 Probability

While it is often suppressed when confusion is unlikely, it is pedagogically useful to differentiate between a random variable and a particular value that a random variable takes. In this section, we will be explicit in our notation for random variables or realizations of them. However, throughout the rest of the text, the formalism will be employed only if confusion is likely. Imagine that a random variable is denoted X and a value for that random variable is given by x. The probability Pr{x ∈ S; a} of a continuous real random variable X having a value x within some set of values S, given some parameter of the distribution a, is given by the integral of the probability density function (PDF) p_X(x; a) over S,

\Pr\{x \in S; a\} = \int_S dx \, p_X(x; a) .   (3.1)

Depending upon the situation, the explicit dependency upon parameters may be suppressed. The cumulative distribution function (CDF) is the probability P_X(x_0) that some random variable X is less than or equal to some threshold x_0,

P_X(x_0) = \Pr\{x \le x_0\} = \int_{-\infty}^{x_0} dx \, p_X(x; a) .   (3.2)

3.1.1 Bayes' theorem

There are a variety of probability densities including: prior probability density,¹ marginal probability density, posterior probability density,² and conditional probability density, which are denoted here as

p_X(x) : prior probability density
p_Y(y) : marginal probability density
p_X(x|y) : posterior probability density
p_Y(y|x) : conditional probability density.   (3.3)

¹ Often the Latin form a priori is used to denote the probability.
² Often the Latin form a posteriori is used to denote the probability.


For single variables, and p_Y(y) > 0, this relationship (Bayes' theorem) can be written in the important form

p_X(x|y) = \frac{p_Y(y|x) \, p_X(x)}{p_Y(y)} .   (3.4)

A useful interpretation of this form is to consider the random variable X as the input to a random process that produces the random variable Y that can be observed. Thus, the likelihood of a given value x for the random input variable X is found given the observation y of the output distribution Y . Throughout statistical signal processing research, a common source of debate is the use of implicit and sometimes unstated priors in analysis. These priors can dramatically affect the performance of various algorithms when exercised by using measured data that often have contributions that do not match simple models.

3.1.2 Change of variables

Consider a random variable X with probability density p_X(x), and a new random variable Y = f(X). Assuming that the function f(x) is one-to-one and differentiable, we can find the probability density p_Y(y) of Y using the following transformation:

p_Y(y) = \frac{ p_X(f^{-1}(y)) }{ \left| \frac{\partial}{\partial x} f(x) \right|_{x = f^{-1}(y)} } ,   (3.5)

where x = f^{-1}(y) indicates the inverse function of f(x), and the notation ·|_{x=x_0} indicates evaluating the expression to the left with the value x_0. However, it is not uncommon for the inverse to have multiple solutions. If the jth solution given by x at some value y to the inverse is given by x = f_j^{-1}(y), then the transformation of densities is given by

p_Y(y) = \sum_j \frac{ p_X(f_j^{-1}(y)) }{ \left| \frac{\partial}{\partial x} f(x) \right|_{x = f_j^{-1}(y)} } .   (3.6)

Consider a multivariate distribution involving M random variables, which is discussed in greater detail in Section 3.1.7. Define the random vectors X ∈ R^{M×1}, whose realizations are denoted by x, and Y ∈ R^{M×1}, whose realizations are denoted by y. The probability density functions of these random vectors are given by p_X(x) and p_Y(y), respectively. Let the vector function f(x) map x to y such that y = f(x). If at the point y there are multiple solutions for x, then the functional inverse f_j^{-1}(y) is the jth solution. The relationship between the probability density functions of X and Y is then given by

p_Y(y) = \sum_j \frac{ p_X(f_j^{-1}(y)) }{ \left| \frac{\partial f(x)}{\partial x} \right|_{x = f_j^{-1}(y)} } ,   (3.7)


where |\partial f(x)/\partial x| is the Jacobian associated with the two random vectors, and the notation |\cdot|_{x=x_0} indicates that the absolute value of the quantity within the bars is evaluated with the parameter x = x_0.
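To illustrate Equation (3.6) numerically (a sketch, not from the text), the snippet below compares a histogram of Y = X² for standard Gaussian X against the two-branch density transformation, where the inverse solutions are ±√y:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = x**2                      # Y = f(X) = X^2 has two inverse branches +-sqrt(y)

def p_y(y):
    # Equation (3.6): sum over inverse solutions x_j = +-sqrt(y), with |f'(x)| = 2|x|
    p_x = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
    return (p_x(np.sqrt(y)) + p_x(-np.sqrt(y))) / (2 * np.sqrt(y))

hist, edges = np.histogram(y, bins=60, range=(0.05, 4.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - p_y(centers))))  # small: histogram matches (3.6)
```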

3.1.3 Central moments of a distribution

The characteristics of the distribution of random variables are often represented by the various moments about the mean of the distribution. The expectation of some function f(x) of the random variable X with probability density function p_X(x) is indicated by

\langle f(X) \rangle = \int dx \, f(x) \, p_X(x) .   (3.8)

The mean value of the random variable X is given by

\langle X \rangle = \int dx \, x \, p_X(x) .   (3.9)

The mth central³ moment about the mean, indicated here by μ_m, is given by

\mu_m = \int dx \, (x - \langle X \rangle)^m \, p_X(x) .   (3.10)

By construction, the value of μ_1 is zero. The following central moments, denoted here as μ_2, μ_3, and μ_4, are related to the variance, skew, and kurtosis excess of a distribution. The variance of random variable X is given by

\sigma_X^2 = \int dx \, (x - \langle X \rangle)^2 \, p_X(x) = \mu_2 .   (3.11)

Note that in situations where the random variable concerned is clear, we shall omit the subscript, denoting the variance simply as σ². The skewness of random variable X is an indication of the asymmetry of a distribution about its mean. It is given by the third central moment normalized by the variance to the 3/2 power; thus, it is unitless,

\mathrm{skew}\{X\} = \frac{ \int dx \, (x - \langle X \rangle)^3 \, p_X(x) }{ (\sigma^2)^{3/2} } = \frac{\mu_3}{(\sigma^2)^{3/2}} .   (3.12)

Finally, the kurtosis of random variable X is a measure of a distribution's "peakiness." It is given by the fourth central moment normalized by the variance squared; thus, it is unitless. The excess kurtosis is the ratio of the fourth cumulant to the square of the second cumulant; cumulants are discussed in Section 3.1.6. The excess kurtosis is given by subtracting 3 from the kurtosis,

\mathrm{excess\ kurtosis}\{X\} = \frac{ \int dx \, (x - \langle X \rangle)^4 \, p_X(x) }{ (\sigma^2)^2 } - 3 = \frac{\mu_4}{(\sigma^2)^2} - 3 ,   (3.13)

³ Central indicates that it is the fluctuation about the mean that is being evaluated.

which has the desirable characteristic of evaluating to zero for Gaussian distributions. Unfortunately, there is sometimes confusion in the literature as to whether "kurtosis" indicates kurtosis or excess kurtosis.

Jensen's inequality

Jensen's inequality can be used to relate the mean of a function of a random variable to the function of the mean of the random variable. Specifically, Jensen's inequality states that for every convex function f(·) of a random variable X,

\langle f(X) \rangle \ge f(\langle X \rangle) .   (3.14)

Similarly, for every concave function g(·) of a random variable X,

\langle g(X) \rangle \le g(\langle X \rangle) .   (3.15)

3.1.4 Noncentral moments of a distribution

The noncentral moments of a distribution for the random variable X are similar to the central moments, with the exception that the mean is not removed from the expectation. The mth noncentral moment, indicated here by μ'_m, is given by

\mu'_m = \langle X^m \rangle = \int dx \, x^m \, p_X(x) .   (3.16)

A tool that is sometimes useful in working with problems involving moments is the moment-generating function M(t; X) for random variable X and dummy variable t, which is given by

M(t; X) = \int dx \, p_X(x) \, e^{t x} = \left\langle e^{t X} \right\rangle = \left\langle \sum_{m=0}^{\infty} \frac{1}{m!} (t X)^m \right\rangle .   (3.17)

The mth moment is found by noting that the mth derivative with respect to t evaluated at t = 0 leaves only the mth term in a Taylor expansion of the exponential,

\left. \frac{\partial^m M(t; X)}{\partial t^m} \right|_{t=0} = \langle X^m \rangle .   (3.18)
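As a small numerical companion (not from the text), here are sample-based estimates of these quantities for an exponential random variable, whose skewness is 2 and excess kurtosis is 6:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)

mean = x.mean()
mu2 = ((x - mean)**2).mean()                        # second central moment (variance)
skew = ((x - mean)**3).mean() / mu2**1.5            # Equation (3.12)
excess_kurt = ((x - mean)**4).mean() / mu2**2 - 3   # Equation (3.13)

print(skew, stats.skew(x))                          # both near 2
print(excess_kurt, stats.kurtosis(x))               # both near 6
```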


3.1.5 Characteristic function

For a variety of applications, it is useful to consider transforms of the probability density. The characteristic function is proportional to the inverse Fourier transform in terms of angular frequency. The integral transform of some density p_X(x) of real random variable X is denoted by the characteristic function φ_X(s) and is given by

\phi_X(s) = \int dx \, e^{i x s} \, p_X(x) ,   (3.19)

for which s is the transformed variable for x that corresponds to the angular frequency. Note that the characteristic function of a random variable is essentially the Fourier transform (see Section 2.10) of its probability density function. The moment-generating function, on the other hand, is essentially the Laplace transform (see Section 2.11) of the PDF evaluated at real values.

3.1.6 Cumulants of distributions

The concepts of cumulants and moments are closely related. Estimating the cumulants of observed signals can be useful in disentangling or detecting signals [200]. The mth cumulant k_m of a probability distribution is implicitly defined in terms of the characteristic function of the random variable X by

\log \phi_X(s) = \sum_{m=1}^{\infty} \frac{k_m \, (i s)^m}{m!} .   (3.20)

In terms of the central moments μ_m, the first few cumulants are given by

k_1 = \mu
k_2 = \mu_2
k_3 = \mu_3
k_4 = \mu_4 - 3\mu_2^2
k_5 = \mu_5 - 10 \mu_2 \mu_3
k_6 = \mu_6 - 15 \mu_2 \mu_4 - 10 \mu_3^2 + 30 \mu_2^3 .   (3.21)

3.1.7 Multivariate probability distributions

The probability density function of multiple random variables indicates the probability that values of the random variables are within some infinitesimal hypervolume about some point in the variable space. The probability density is denoted

p_{X_1, X_2, \ldots}(x_1, x_2, \ldots) .   (3.22)


If the random variables are independent, then the joint probability density function is equal to the product of the individual probability densities,

p_{X_1, X_2, \ldots}(x_1, x_2, \ldots) = \prod_m p_{X_m}(x_m) .   (3.23)

Given some parameter A, the probability of a complex random matrix variable X having a value X that is contained within a space defined by S is given by

P_X(X \in S; A) = \int_S d\Omega_X \, p_X(X; A) = \int_S d\Omega_X \, p_X((X)_{1,1}, (X)_{1,2}, \ldots, (X)_{2,1}, \ldots; A) ,   (3.24)

where dΩ_X, discussed in Section 2.9.2, is the notation for the measure and is given by

d\Omega_X = d\{\Re X\}_{1,1} \, d\{\Re X\}_{1,2} \cdots d\{\Re X\}_{m,n} \cdot d\{\Im X\}_{1,1} \, d\{\Im X\}_{1,2} \cdots d\{\Im X\}_{m,n} .   (3.25)

Note that the measure is expressed in terms of the real and imaginary components of the complex random variable. This convention is not employed universally, but will be assumed typically within this text. In the case of a real random variable, the imaginary differentials are dropped. The probability density function of a given set of random variables x_m given, or conditioned, on particular values for another set of variables y_n is denoted by

p_{X_1, X_2, \ldots}(x_1, x_2, \ldots | y_1, y_2, \ldots) .   (3.26)

Bayes' theorem relates the conditional and prior probability densities,

p_{X_1, X_2, \ldots, Y_1, Y_2, \ldots}(x_1, x_2, \ldots, y_1, y_2, \ldots) = p_{X_1, X_2, \ldots}(x_1, x_2, \ldots | y_1, y_2, \ldots) \, p_{Y_1, Y_2, \ldots}(y_1, y_2, \ldots)
= p_{Y_1, Y_2, \ldots}(y_1, y_2, \ldots | x_1, x_2, \ldots) \, p_{X_1, X_2, \ldots}(x_1, x_2, \ldots) .   (3.27)

3.1.8 Gaussian distribution

The Gaussian distribution is an essential distribution in signal processing. Because of the central limit theorem, processes that combine the effects of many distributions often converge to the Gaussian distribution. Also, for a given mean and variance, the entropy (which is used in the evaluation of mutual information) is maximized for Gaussian distributed signals. The analysis of multiple-antenna systems will often take advantage of multivariate Gaussian distributions. The probability density function for a real Gaussian random variable X with value x, mean μ = ⟨X⟩, and variance σ² = ⟨(X − μ)²⟩ is given by

p_X(x; \mu, \sigma) \, dx = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx .   (3.28)


This normal distribution is often identified by N(μ, σ²). The complex normal (or Gaussian) distribution assuming circular symmetry for a complex random variable Z with value z, mean μ, and variance σ² is given by

p_Z(z; \mu, \sigma) \, d\Re z \, d\Im z = \frac{1}{\pi\sigma^2} \, e^{-\frac{\|z-\mu\|^2}{\sigma^2}} \, d\Re z \, d\Im z .   (3.29)

The complex version of the distribution is often denoted CN(μ, σ²). The probability density for an m by k random matrix Z with value Z ∈ C^{m×k} drawn from a multivariate complex Gaussian is given by

p_Z(Z; X, R) \, d\Omega_Z = \frac{1}{|R|^k \, \pi^{m k}} \, e^{-\mathrm{tr}\{(Z-X)^\dagger R^{-1} (Z-X)\}} \, d\Omega_Z ,   (3.30)

where the mean of Z is given by X ∈ C^{m×k}. The covariance of the rows of the random matrix is R ∈ C^{m×m} under the assumption of independent columns (note that it is possible to define a more general form of Gaussian random matrix with dependent columns). The notation dΩ_Z indicates the differential hypervolume in terms of the real parameters ℜ{Z}_{p,q} and ℑ{Z}_{p,q}, indicating the real and imaginary parts of the elements of Z, where p and q are here indices identifying elements in the matrix.

The covariance matrix is an important concept used repeatedly throughout adaptive communications. For some complex random matrix Z with values Z and mean X, the covariance matrix is given by

R = \left\langle \frac{(Z - \langle Z \rangle)(Z - \langle Z \rangle)^\dagger}{k} \right\rangle = \int d\Omega_Z \, \frac{(Z - X)(Z - X)^\dagger}{k} \, p_Z(Z; X) .   (3.31)

The covariance is Hermitian, R = R† . In this form, the covariance matrix is a measure of the cross correlation between the elements along the columns of Z.
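A small simulation sketch (not from the text) estimating the covariance of Equation (3.31) from k independent columns of a complex Gaussian matrix; the dimensions and the assumed true covariance are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 100_000
R_true = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed true covariance (arbitrary)

# Draw k i.i.d. zero-mean circularly symmetric Gaussian columns with covariance R_true
L = np.linalg.cholesky(R_true)
W = (rng.standard_normal((2, k)) + 1j * rng.standard_normal((2, k))) / np.sqrt(2)
Z = L @ W

R_hat = (Z @ Z.conj().T) / k     # sample version of Equation (3.31) with zero mean
print(np.round(R_hat, 2))        # close to R_true, and Hermitian: R_hat = R_hat^H
```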

3.1.9 Rayleigh distribution

The magnitude of a random variable drawn from a complex, circularly symmetric Gaussian distribution with variance σ² follows the Rayleigh distribution. Here we will identify this random variable as Q with value q. Suppose that the Gaussian variable is denoted by Z with value z, and has variance σ². If its magnitude is denoted by

q = \|z\| ,   (3.32)


then the probability density⁴ for the real Rayleigh variable Q is given by

p_{\mathrm{Ray}}(q) \, dq = \begin{cases} \frac{2q}{\sigma^2} \, e^{-q^2/\sigma^2} \, dq & ; q \ge 0 \\ 0 \, dq & ; \text{otherwise}, \end{cases}   (3.33)

where σ² is the variance of the complex Gaussian variable z. The cumulative distribution function P_{\mathrm{Ray}}(q) for the Rayleigh variable q is given by

P_{\mathrm{Ray}}(q) = \int_0^q dx \, p(x) = \begin{cases} 1 - e^{-q^2/\sigma^2} & ; q \ge 0 \\ 0 & ; \text{otherwise}. \end{cases}   (3.34)
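A quick empirical check (not from the text) that the magnitude of a circularly symmetric complex Gaussian sample follows the CDF of Equation (3.34); σ² = 2 and the test point are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 2.0                                  # complex variance (arbitrary)
z = np.sqrt(sigma2 / 2) * (rng.standard_normal(500_000)
                           + 1j * rng.standard_normal(500_000))
q = np.abs(z)

q0 = 1.2
empirical = np.mean(q <= q0)
analytic = 1 - np.exp(-q0**2 / sigma2)        # CDF of Equation (3.34)
print(empirical, analytic)                    # nearly equal
```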

3.1.10 Exponential distribution

The square of a Rayleigh distributed random variable follows an exponential distribution. This is a useful quantity in analyzing the received power of a narrowband transmission through a Rayleigh fading channel, because the amplitude of the channel coefficient in that case follows a Rayleigh distribution. Another example application is the interarrival times of a one-dimensional Poisson point process (see Section 3.4). The exponential random variable is parameterized by the inverse of its mean, λ. The probability density function of the exponential random variable is

p_{\mathrm{Exp}}(x) \, dx = \begin{cases} \lambda \, e^{-\lambda x} \, dx & ; x \ge 0 \\ 0 \, dx & ; x < 0. \end{cases}   (3.35)

The cumulative distribution function of the exponential random variable is

P_{\mathrm{Exp}}(x) = \begin{cases} 1 - e^{-\lambda x} & ; x \ge 0 \\ 0 & ; x < 0. \end{cases}   (3.36)

Its mean and variance are \frac{1}{\lambda} and \frac{1}{\lambda^2}, respectively.

3.1.11 Central χ² distribution

The sum q of the magnitude squared of real, zero-mean, unit-variance Gaussian variables X_m with values x_m is characterized by the χ² distribution. The sum of k independent Gaussian variables denoted by q is

q = \sum_{m=1}^{k} x_m^2 .   (3.37)

⁴ This density assumes that the complex variance is given by σ², which is different from a common assumption that the variance is given for a real variable. Consequently, there are some subtle scaling differences.


Since the random variables X_m have unit variance and zero mean,

\left\langle \|X_m\|^2 \right\rangle = 1 .   (3.38)

The distribution p_{\chi^2}(q; k) for the sum of the magnitude squared q of k independent, zero-mean, unit-variance real Gaussian random variables is given by

p_{\chi^2}(q; k) \, dq = \begin{cases} \frac{1}{2^{k/2}\,\Gamma(k/2)} \, q^{k/2-1} e^{-q/2} \, dq & ; q \ge 0 \\ 0 \, dq & ; \text{otherwise}. \end{cases}   (3.39)

The cumulative distribution function P_{\chi^2}(q; k) of the χ² random variable is given by

P_{\chi^2}(q; k) = \int_0^q dr \, p_{\chi^2}(r; k) = \begin{cases} \frac{1}{\Gamma(k/2)} \, \gamma\!\left(\frac{k}{2}, \frac{q}{2}\right) & ; q \ge 0 \\ 0 & ; \text{otherwise}, \end{cases}   (3.40)

where Γ(·) is the standard gamma function, and γ(·, ·) is the lower incomplete gamma function given by Equation (2.271).

Complex χ² distribution

With a slight abuse in terminology, we define the complex χ² distribution as the distribution of the sum q of n independent complex, circularly symmetric Gaussian random variables Z_m with values z_m. The sum q is given by [173]

q = \sum_{m=1}^{n} \|z_m\|^2 ,   (3.41)
\left\langle \|Z_m\|^2 \right\rangle = \sigma^2 .   (3.42)

To be clear, the variance detailed here is in terms of the complex Gaussian variable, and we include the variance explicitly as a parameter of the distribution. By employing Equation (3.5) and noting that the number of real degrees of freedom is twice the number of complex degrees of freedom (k = 2n), the distribution p^C_{\chi^2}(q; n, \sigma^2) for the sum of the magnitude squared q ≥ 0 is given by

p^C_{\chi^2}(q; n, \sigma^2) \, dq = \frac{1}{\sigma^2/2} \, p_{\chi^2}\!\left( \frac{q}{\sigma^2/2}; 2n \right) dq
= \frac{2}{\sigma^2} \, \frac{1}{2^n \Gamma(n)} \left( \frac{2q}{\sigma^2} \right)^{n-1} e^{-\frac{q}{\sigma^2}} \, dq
= \frac{q^{n-1}}{(\sigma^2)^n \, \Gamma(n)} \, e^{-\frac{q}{\sigma^2}} \, dq ,   (3.43)


where it is assumed that the variance σ² of z_m is the same for all m, and the density is zero for q < 0. The cumulative distribution for q is given by

P^C_{\chi^2}(q; n, \sigma^2) = \int_0^q dr \, p^C_{\chi^2}(r; n, \sigma^2) = \frac{1}{\Gamma(n)} \, \gamma\!\left(n, \frac{q}{\sigma^2}\right) .   (3.44)
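A numerical check of Equation (3.44) (an illustration, not from the text), comparing a simulated complex χ² sum against the regularized lower incomplete gamma function; n and σ² are arbitrary:

```python
import numpy as np
from scipy.special import gammainc   # regularized lower incomplete gamma

rng = np.random.default_rng(4)
n, sigma2 = 3, 1.5                  # assumed parameters (arbitrary)
z = np.sqrt(sigma2 / 2) * (rng.standard_normal((200_000, n))
                           + 1j * rng.standard_normal((200_000, n)))
q = np.sum(np.abs(z)**2, axis=1)

q0 = 4.0
print(np.mean(q <= q0))             # empirical CDF
print(gammainc(n, q0 / sigma2))     # Equation (3.44): gamma(n, q/sigma^2)/Gamma(n)
```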

3.1.12 Noncentral χ² distribution

The sum of k nonzero-mean, unit-variance Gaussian random variables follows a noncentral χ² distribution. Assume that the variable q is drawn from the noncentral χ² distribution. Here, the set of k random variables is defined such that the mth random variable X_m with value x_m has mean μ_m (not to be confused with the notation used for the central moments) and is drawn from real unit-variance Gaussian distributions (plural since they have different means),

X_m \sim N(\mu_m, 1) .   (3.45)

The random variable q is given by the sum of the magnitude squared of the real independent Gaussian variables,

q = \sum_{m=1}^{k} x_m^2 ,   (3.46)
\mu_m = \langle X_m \rangle ,   (3.47)
\sigma^2 = \left\langle \|X_m - \mu_m\|^2 \right\rangle = 1 ,   (3.48)

where here μ_m indicates the mth mean (and not the moment). The probability density for q ≥ 0 is given by [174]

p_{\chi^2}(q; k, \nu) = \frac{1}{2} \, e^{-(q+\nu)/2} \left( \frac{q}{\nu} \right)^{k/4 - 1/2} I_{k/2-1}(\sqrt{\nu q}) ,   (3.49)

where I_m(·) indicates the mth-order modified Bessel function of the first kind (discussed in Section 2.14.5), and the density is zero for q < 0. The noncentrality parameter ν is given by the sum of the standard-deviation-normalized means,

\nu = \sum_{m=1}^{k} \mu_m^2 .   (3.50)

The cumulative distribution function for q ≥ 0 is given by [232, 173]

P_{\chi^2}(q; k, \nu) = \int_0^q dr \, p_{\chi^2}(r; k, \nu)
= 1 - Q_{k/2}(\sqrt{\nu}, \sqrt{q})
= e^{-\nu/2} \sum_{m=0}^{\infty} \frac{\left(\frac{\nu}{2}\right)^m}{m!} \, \frac{\gamma(m + k/2, q/2)}{\Gamma(m + k/2)} ,   (3.51)


where Q_M(·, ·) is the Marcum Q-function discussed in Section 2.14.8, and γ(·, ·) is the lower incomplete gamma function.

Complex noncentral χ² distribution

If n random variables are drawn from a circularly symmetric complex Gaussian distribution with complex means μ_m and variance σ², then the complex noncentral χ² distribution can be found by noting that n complex degrees of freedom correspond to k = 2n real degrees of freedom, and the complex noncentrality parameter is given by

\nu^C = \sum_{m=1}^{n} \|\mu_m\|^2 .   (3.52)

In converting from the real to the complex distribution, a factor of 2 occurs in multiple changes of variables. In addition to the change in the number of degrees of freedom, the real and imaginary variances are half the complex variance, \sigma^2_{re} = \sigma^2_{im} = \sigma^2/2, where we include the variance σ² as a parameter of the distribution. Consequently, k = 2n, \nu = 2\nu^C/\sigma^2, and q \to 2q/\sigma^2; thus, the complex noncentral χ² distribution for q ≥ 0 is given by

p^C_{\chi^2}(q; n, \sigma^2, \nu^C) \, dq = \frac{1}{\sigma^2} \, e^{-(q+\nu^C)/\sigma^2} \left( \frac{q}{\nu^C} \right)^{(n-1)/2} I_{n-1}\!\left( \frac{2\sqrt{\nu^C q}}{\sigma^2} \right) dq ,   (3.53)

and is zero for q < 0. The cumulative distribution function for q ≥ 0 is given by

P^C_{\chi^2}(q; n, \sigma^2, \nu^C) = \int_0^q dr \, p^C_{\chi^2}(r; n, \sigma^2, \nu^C) = 1 - Q_n\!\left( \sqrt{\frac{2\nu^C}{\sigma^2}}, \sqrt{\frac{2q}{\sigma^2}} \right) .   (3.54)
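scipy does not expose the Marcum Q-function directly, but the relationship in Equation (3.51) means it can be obtained from the noncentral χ² survival function; a sketch (not from the text, with arbitrary evaluation parameters):

```python
from scipy.stats import ncx2

def marcum_q(M, nu, mu):
    # From Equation (3.51): Q_{k/2}(sqrt(v), sqrt(q)) = 1 - P_{chi2}(q; k, v),
    # so Q_M(nu, mu) equals the survival function of a noncentral chi-squared
    # random variable with k = 2M degrees of freedom and noncentrality nu^2,
    # evaluated at mu^2.
    return ncx2.sf(mu**2, df=2 * M, nc=nu**2)

print(marcum_q(1, 1.0, 2.0))   # evaluates Q_1(1, 2) via the survival function
```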

3.1.13 F distribution

The F distribution is the probability distribution of the ratio of two independent central χ² distributed random variables. It is parameterized by two parameters n_1 and n_2 and has the density function

p_F(x; n_1, n_2) = \begin{cases} \frac{ n_1^{n_1/2} \, n_2^{n_2/2} \, x^{n_1/2 - 1} }{ B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right) (n_1 x + n_2)^{\frac{n_1}{2} + \frac{n_2}{2}} } & ; x \ge 0 \\ 0 & ; \text{otherwise}. \end{cases}   (3.55)

Here B(·, ·) refers to the beta function defined in Section 2.14.3. The cumulative distribution function of the F-distributed random variable is given by

P_F(x; n_1, n_2) = I\!\left( \frac{n_1 x}{n_2 + n_1 x}; \frac{n_1}{2}, \frac{n_2}{2} \right) ,   (3.56)

where I(·; ·, ·) is the regularized beta function defined in Section 2.14.3.


When viewed as the ratio of two χ² random variables normalized by their degrees of freedom, the parameters n_1 and n_2 are the degrees of freedom of the numerator and denominator random variables. In other words, if the random variables X_1 and X_2 follow χ² distributions with n_1 and n_2 degrees of freedom, respectively, then the random variable

Y = \frac{X_1 / n_1}{X_2 / n_2}   (3.57)

follows an F distribution with parameters n_1 and n_2.
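An empirical confirmation of Equation (3.57) (not from the text), comparing a ratio of normalized χ² draws to scipy's F CDF at an arbitrary point:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)
n1, n2 = 4, 7                                  # arbitrary degrees of freedom
x1 = rng.chisquare(n1, size=500_000)
x2 = rng.chisquare(n2, size=500_000)
y = (x1 / n1) / (x2 / n2)                      # Equation (3.57)

x0 = 1.5
print(np.mean(y <= x0), f.cdf(x0, n1, n2))     # nearly equal
```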

3.1.14 Rician distribution

The magnitude of the sum of a real scalar a and a random complex variable z sampled from a circularly symmetric complex Gaussian distribution with zero mean and variance σ² is given by the random variable Y with value y,

y = \|a + z\| = \sqrt{(a + \Re\{z\})^2 + \Im\{z\}^2} .   (3.58)

The random variable y follows the Rice or Rician distribution, whose probability density function p_{\mathrm{Rice}}(y) is

p_{\mathrm{Rice}}(y) \, dy = \begin{cases} \frac{2y}{\sigma^2} \, I_0\!\left( \frac{2 a y}{\sigma^2} \right) e^{-(y^2 + a^2)/\sigma^2} \, dy & ; y \ge 0 \\ 0 \, dy & ; \text{otherwise}, \end{cases}   (3.59)

where I_0(·) is the zeroth-order modified Bessel function of the first kind (discussed in Section 2.14.5). In channel phenomenology, it is common to describe this distribution in terms of the Rician K-factor, which is the ratio of the coherent power to the fluctuating power,

K = \frac{a^2}{\sigma^2} .   (3.60)

It may be worth noting that the Rician distribution is often described in terms of two real Gaussian variables. Consequently, the distribution given here differs from the common form by replacing σ² with σ²/2. The cumulative distribution function for a Rician variable with value greater than zero is given by

P_{\mathrm{Rice}}(y_0) = \int_0^{y_0} dy \, p_Y(y) = \int_0^{y_0} dy \, \frac{2y}{\sigma^2} \, I_0\!\left( \frac{2 a y}{\sigma^2} \right) e^{-(y^2 + a^2)/\sigma^2}
= 1 - Q_{M=1}\!\left( \frac{a\sqrt{2}}{\sigma}, \frac{y_0\sqrt{2}}{\sigma} \right) ,   (3.61)


where QM (ν, μ) is the Marcum Q-function discussed in Section 2.14.8. The distribution for the square of a Rician random variable q = y 2 is the complex noncentral χ2 distribution with one complex degree of freedom.

3.1.15 Nakagami distribution

The Nakagami distribution was developed to fit experimental data for wireless propagation channels that are not well modeled by either the Rayleigh or Rician distributions. For random variable X with value x, a Nakagami distribution p_{\mathrm{Nak}}(x; m, ω) is parameterized by two variables and takes the following form:

p_{\mathrm{Nak}}(x; m, \omega) \, dx = \frac{2 m^m}{\Gamma(m) \, \omega^m} \, x^{2m-1} \, e^{-\frac{m}{\omega} x^2} \, dx ,   (3.62)

where ω is the second moment of the random variable and m is a parameter known in the communications literature simply as the "m-parameter." The mean and variance are given by

\langle X \rangle = \frac{\Gamma\!\left(m + \frac{1}{2}\right)}{\Gamma(m)} \left( \frac{\omega}{m} \right)^{\frac{1}{2}}   (3.63)

and

\mathrm{var}[X] = \omega - \frac{\omega}{m} \left( \frac{\Gamma\!\left(m + \frac{1}{2}\right)}{\Gamma(m)} \right)^2 .   (3.64)

Observe that for m = 1 the Nakagami distribution reduces to the Rayleigh distribution as follows:

p_{\mathrm{Nak}}(x; 1, \omega) = \frac{2}{\omega} \, x \, e^{-\frac{x^2}{\omega}} .   (3.65)

With m = \frac{(K+1)^2}{2K+1}, the Nakagami distribution is close to a Rician distribution with K-factor K. Hence, the Nakagami distribution can be used to model a wider range of channels than the Rayleigh and Rician channels alone.

3.1.16 Poisson distribution

The Poisson distribution is a discrete probability distribution that is useful for modeling independent events that occur in some interval (or volume in general). The probability of n events is characterized by a rate μ and has the following probability mass function (PMF):

p_n = \frac{\mu^n e^{-\mu}}{n!} .   (3.66)

The cumulative distribution function of the Poisson distribution is

P_n = e^{-\mu} \sum_{k=0}^{n} \frac{\mu^k}{k!} ,   (3.67)

and its mean and variance are both μ.


The Poisson distribution is useful in calculating the number of arrivals of a point process in a given interval of length t (or volume in general), where μ/t is the rate of arrivals. Suppose that the interarrival times of buses at a bus stop are completely independent and the rate of arrivals is λ. Then the PMF of the number of buses arriving in an interval of duration τ is

p_n = \frac{(\lambda\tau)^n \, e^{-\lambda\tau}}{n!} .   (3.68)
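A short numeric illustration of Equation (3.68) (not from the text), with an assumed arrival rate of λ = 3 buses per hour over an interval of τ = 0.5 hours:

```python
import math

lam, tau = 3.0, 0.5          # arbitrary rate (per hour) and interval (hours)
mu = lam * tau               # expected number of arrivals in the interval

for n in range(4):
    p_n = mu**n * math.exp(-mu) / math.factorial(n)   # Equation (3.68)
    print(n, round(p_n, 4))
# n=0: 0.2231, n=1: 0.3347, n=2: 0.2510, n=3: 0.1255
```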

3.1.17 Beta distribution

The beta distribution can be used to describe the fraction of the vector norm squared retained if a complex random Gaussian vector of size k + j is projected into a subspace of size k, as discussed in Reference [173]. The random variable X with value x is described by the ratio of sums of the magnitude squared of a set of random identically distributed, circularly symmetric complex Gaussian variables,

x = \frac{ \sum_{m=1}^{k} \|g_m\|^2 }{ \sum_{m=1}^{k} \|g_m\|^2 + \sum_{m=1}^{j} \|g'_m\|^2 } ,   (3.69)

where G'_m is a Gaussian random variable with the same statistics as G_m (with values g'_m and g_m, respectively). The probability density of the beta distribution p_β(x; j, k) is given by

p_\beta(x; j, k) \, dx = \frac{\Gamma(j + k)}{\Gamma(j)\,\Gamma(k)} \, x^{j-1} (1 - x)^{k-1} \, dx ,   (3.70)

and the corresponding CDF P_β(x_0; j, k) is given by

P_\beta(x_0; j, k) = \int_0^{x_0} dx \, p_\beta(x; j, k) = \frac{\Gamma(j + k)}{\Gamma(j)\,\Gamma(k)} \, B(x_0; j, k) ,   (3.71)

where B(x; j, k) is the incomplete beta function that is discussed in Section 2.14.3. Note that while the beta distribution can be used to describe the retained fraction of the norm square of a (k + j)-dimensional Gaussian random vector projected onto a k-dimensional space, the beta distribution is more general than that. As such, the parameters j and k could be non-integers as well.
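A simulation sketch of this projection interpretation (not from the text); in scipy's parameter convention, the fraction of Equation (3.69), with k terms in the numerator, follows a Beta(k, j) distribution:

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(6)
k, j = 3, 2
g = (rng.standard_normal((100_000, k + j))
     + 1j * rng.standard_normal((100_000, k + j)))
frac = (np.abs(g[:, :k])**2).sum(axis=1) / (np.abs(g)**2).sum(axis=1)

# Fraction retained in a k-dimensional subspace, tested against Beta(k, j)
print(kstest(frac, beta(k, j).cdf).pvalue)   # p-value not small: consistent
```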

3.1.18 Logarithmically normal distribution

One of the standard issues in employing Gaussian distributions to represent various types of phenomenology is that many real-life distributions have long tails (that is, the probability of large deviations is greater than the Gaussian distribution would suggest). One distribution with much longer tails is the logarithmically normal distribution (or, more commonly, log-normal distribution). For the log-normal random variable X with value x, the probability density function is given by

p_{\log\mathrm{Norm}}(x; \mu, \sigma^2) \, dx = \frac{1}{x\sqrt{2\pi\sigma^2}} \, e^{-\frac{(\log x - \mu)^2}{2\sigma^2}} \, dx .   (3.72)

The cumulative distribution function is given by

P_{\log\mathrm{Norm}}(x_0; \mu, \sigma^2) = \frac{1}{2} \left[ 1 + \mathrm{erf}\!\left( \frac{\log x_0 - \mu}{\sigma\sqrt{2}} \right) \right] ,   (3.73)

where erf(·) is the error function discussed in Section 2.14.6.

3.1.19 Sum of random variables

For a variety of applications, the distribution of the sum of random variables is desired. The distribution of the sum of independent random variables is given by the convolution of the distributions of the individual random variables. To show this, consider two independent random variables X and Y, with x drawn from p_X(x) and y drawn from p_Y(y), respectively. The sum of the random variables is Z, such that z = x + y. The value z is drawn from p_Z(z),

p_Z(z) = \int dx \, dy \, p_{X,Y}(x, y) \, p_Z(z|x, y) = \int dx \, dy \, p_X(x) \, p_Y(y) \, p_Z(z|x, y) ,   (3.74)

where p_{X,Y}(x, y) is the joint probability density for x and y, p_Z(z|x, y) is the probability density of z conditioned upon the values of x and y, and x and y are assumed to be independent. Because z = x + y, the conditional probability is simple, given by

p_Z(z|x, y) = \delta(x + y - z) ,   (3.75)

where δ(·) is the Dirac delta function. Consequently, the distribution for Z is given by

p_Z(z) = \int dx \, dy \, p_X(x) \, p_Y(y) \, \delta(x + y - z) = \int dx \, p_X(x) \, p_Y(z - x) ,   (3.76)

which is the convolution of the distributions for X and Y. This same result can be evaluated by using characteristic functions of the distributions. The characteristic function φ_X(s) of a distribution for the random variable X, discussed in Section 3.1.5, is given by

\phi_X(s) = \int dx \, e^{i s x} \, p_X(x) .   (3.77)


Because the convolution observed in Equation (3.76) corresponds to a product in the transform domain, the characteristic function for Z is given by

\phi_Z(s) = \phi_X(s) \, \phi_Y(s) ,   (3.78)

and the corresponding probability density function for Z is given by

p_Z(z) = \frac{1}{2\pi} \int ds \, e^{-i s z} \, \phi_X(s) \, \phi_Y(s) .   (3.79)
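A numerical illustration of Equations (3.76)-(3.78) (not from the text): the density of the sum of two independent exponentials, obtained by direct convolution, matches the known gamma density:

```python
import numpy as np
from scipy.stats import gamma

lam = 1.0
x = np.linspace(0, 20, 4001)
dx = x[1] - x[0]
p = lam * np.exp(-lam * x)               # exponential density

# Equation (3.76): density of the sum is the convolution of the densities
p_sum = np.convolve(p, p)[:x.size] * dx
print(np.max(np.abs(p_sum - gamma(a=2, scale=1/lam).pdf(x))))  # small
```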

3.1.20 Product of Gaussians

Another distribution of interest results from the product of random variables. To show this, consider two independent real random variables X and Y, with x drawn from p_X(x) and y drawn from p_Y(y), respectively. The product of the random variables is Z, such that

z = x y .   (3.80)

The distribution of z is p_Z(z). The probability distribution is given by

p_Z(z) = \int dx \, dy \, p_X(x) \, p_Y(y) \, \delta(x y - z) .   (3.81)

The distribution of the product of real zero-mean Gaussian variables is given by

p_Z(z) = \int dx \, dy \, \frac{ e^{-\frac{x^2}{2\sigma_x^2}} }{ \sqrt{2\pi\sigma_x^2} } \, \frac{ e^{-\frac{y^2}{2\sigma_y^2}} }{ \sqrt{2\pi\sigma_y^2} } \, \delta(x y - z)
= \frac{1}{\pi \sigma_x \sigma_y} \, K_0\!\left( \frac{\|z\|}{\sigma_x \sigma_y} \right) ,   (3.82)

where K0 (·) is the modified Bessel function of the second kind of order zero discussed in Section 2.14.5.

3.2 Convergence of random variables

What does it mean for a random variable to converge? While the convergence of a sequence of deterministic variables to some limit is straightforward, convergence of random variables to limits is more complicated due to their probabilistic nature. In the following, we define several different modes of convergence of random variables, starting with modes of convergence that are typically viewed as stronger modes of convergence, followed by weaker ones. The proofs of these properties can be found in standard probability texts such as References [131] and [171].


3.2.1 Convergence modes of random variables

Consider a sequence of random variables X_n, n = 1, 2, \ldots and another random variable X.

Almost-sure convergence

We say that X_n converges with probability 1, or almost surely, to a random variable X if

\Pr\!\left\{ \lim_{n\to\infty} X_n = X \right\} = 1 .   (3.83)

Almost-sure convergence is typically denoted by either

X_n \xrightarrow{a.s.} X   (3.84)

or

X_n \xrightarrow{w.p.1} X .   (3.85)

Almost-sure convergence simply means that the event that X_n fails to converge to X has zero probability.

Convergence in quadratic mean

We say that X_n converges in quadratic mean, in the mean-square sense, or in the L_2 sense to X if

\lim_{n\to\infty} \left\langle \|X_n - X\|^2 \right\rangle = 0 .   (3.86)

Convergence in quadratic mean is denoted by either

X_n \xrightarrow{L_2} X   (3.87)

or

X_n \xrightarrow{q.m.} X .   (3.88)

This mode of convergence simply means that the mean of the squared deviation of X_n from its limiting value X goes to zero. Note that almost-sure convergence does not in general imply convergence in mean square, or vice versa. However, suppose that the sum of the mean-square deviations of X_n from its limiting value X is finite, i.e.,

\sum_{n=1}^{\infty} \left\langle \|X_n - X\|^2 \right\rangle < \infty ;   (3.89)

then it can be shown [131] that

X_n \xrightarrow{q.m.} X   (3.90)

and

X_n \xrightarrow{a.s.} X .   (3.91)

The idea of convergence in quadratic mean can be generalized to convergence in kth mean for k > 0. We say that X_n converges in kth mean to X if

\lim_{n\to\infty} \left\langle \|X_n - X\|^k \right\rangle = 0 .   (3.92)

Additionally, if ℓ ≤ k, then the above expression implies

\lim_{n\to\infty} \left\langle \|X_n - X\|^\ell \right\rangle = 0 .   (3.93)

In other words, convergence in the kth mean of a random variable implies convergence in all lower-order means as well.

Convergence in probability

We say that X_n converges in probability to X if for every ε > 0,

\lim_{n\to\infty} \Pr\{ |X_n - X| \ge \epsilon \} = 0 .   (3.94)

Convergence in probability is typically denoted by

X_n \xrightarrow{P} X .   (3.95)

Convergence in probability simply means that the probability that the random variable X_n deviates from X by any positive amount goes to zero as n → ∞.

Convergence in distribution

We say that X_n converges in distribution to a random variable X if the cumulative distribution functions of X_n converge to the cumulative distribution function of X as n → ∞. In other words,

\lim_{n\to\infty} P_{X_n}(x) = P_X(x)   (3.96)

at all points x where P_X is continuous. That is to say, the cumulative distribution function of X_n converges to the cumulative distribution function of X. Note that this form of convergence is not really a convergence of the random variables themselves but rather of their probability distributions. Convergence in distribution is denoted by either

X_n \xrightarrow{D} X   (3.97)

or

X_n \xrightarrow{d} X .   (3.98)

Convergence in probability implies convergence in distribution.

3.2.2 Relationship between modes of convergence

The different modes of convergence described above are closely related to each other. In general, almost-sure convergence is the strongest frequently encountered form of convergence, implying most other modes of convergence (with the notable exception of convergence in the kth mean). Convergence in distribution is generally considered the weakest form of convergence and is implied by all the other modes of convergence. The following subsections list some of the relationships between the modes of convergence, starting with relationships that hold for all random variables, followed by some special cases.


General relationships

Convergence of a random variable X_n to a random variable X with probability 1, or almost surely, implies that the random variable X_n converges in probability to X as well, since convergence in probability is a weaker notion of convergence than convergence with probability 1. Similarly, the convergence of X_n to X in quadratic mean implies convergence in probability of X_n to X. Since convergence in distribution is weaker than convergence in probability, convergence of random variables X_n to X in probability implies convergence in distribution of X_n to X. In other words, the cumulative distribution function of X_n converges to that of X if X_n converges to X in probability. Mathematically, these relationships can be written as follows:

X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{P} X   (3.99)
X_n \xrightarrow{q.m.} X \implies X_n \xrightarrow{P} X   (3.100)
X_n \xrightarrow{P} X \implies X_n \xrightarrow{D} X ,   (3.101)

where \implies indicates a mathematical implication.

Some restricted relationships

The previous section describes the relationships between the different modes of convergence that hold in general, for all random variables. For some special cases, additional properties may hold. One special case that proves useful in analyzing communication systems is the convergence of a random variable that is a continuous function of another random variable. In the context of communication systems, a useful example is the convergence of the spectral efficiency \log_2(1 + \mathrm{SINR}) in the case where the signal-to-interference-plus-noise ratio (SINR) converges to some value. The basic result here is that convergence under each mode is preserved under continuous functions. Mathematically speaking, if f(X) is a continuous function of X, then

X_n \xrightarrow{a.s.} X \implies f(X_n) \xrightarrow{a.s.} f(X)   (3.102)
X_n \xrightarrow{P} X \implies f(X_n) \xrightarrow{P} f(X)   (3.103)
X_n \xrightarrow{D} X \implies f(X_n) \xrightarrow{D} f(X) .   (3.104)

Note that if f(X) is bounded in addition to being continuous, i.e., f(X) < A for some finite constant A, then in addition to the property above we also have

f(X_n) \xrightarrow{D} f(X) \implies X_n \xrightarrow{D} X .   (3.105)

While convergence in distribution does not imply convergence in probability in general, when a sequence of random variables converges in distribution to a constant, this property does indeed hold. Observe that if a sequence of random variables converges in distribution to a constant, the limiting cumulative distribution function will be a step function with the step at the limiting value. In this case, it can be shown that convergence in probability holds as well. More formally, suppose that $A$ is a constant; then
$$X_n \xrightarrow{D} A \implies X_n \xrightarrow{P} A. \tag{3.106}$$

Sums of random variables maintain their convergence properties for almost-sure convergence, convergence in probability, and convergence in quadratic mean. That is to say, if two sequences of random variables each converge in some fashion to a limit, the sum of the random variables also converges in the same manner to the sum of the limits. More formally, consider an additional sequence of random variables $Y_n$ for $n = 1, 2, \ldots$. The following properties then hold:
$$X_n \xrightarrow{a.s.} X \text{ and } Y_n \xrightarrow{a.s.} Y \implies X_n + Y_n \xrightarrow{a.s.} X + Y \tag{3.107}$$
$$X_n \xrightarrow{P} X \text{ and } Y_n \xrightarrow{P} Y \implies X_n + Y_n \xrightarrow{P} X + Y \tag{3.108}$$
$$X_n \xrightarrow{q.m.} X \text{ and } Y_n \xrightarrow{q.m.} Y \implies X_n + Y_n \xrightarrow{q.m.} X + Y. \tag{3.109}$$

Note that the above property does not hold in general for convergence in distribution. However, we have the following property, known as Slutsky's theorem, which applies when one of the sequences of variables converges to a constant $A$:
$$X_n \xrightarrow{D} X \text{ and } Y_n \xrightarrow{D} A \implies X_n + Y_n \xrightarrow{D} X + A. \tag{3.110}$$

Note that even almost-sure convergence does not imply convergence of means in general. In other words, even if a sequence of random variables $X_n$ converges with probability 1 to a random variable $X$, it is not necessarily the case that the expected values of the $X_n$, i.e., $\langle X_n \rangle$, converge to $\langle X \rangle$. The reason for this apparent paradox is that convergence of the mean of a random variable depends on the rate at which the probabilities associated with that random variable converge, whereas convergence in probability or almost-sure convergence do not depend on the rate of convergence of the probabilities associated with the random variables. A simple example that is often given in textbooks on probability is the following. Let the random variable $X_n$ take on the following values:
$$X_n = \begin{cases} n & \text{with probability } \frac{1}{n} \\ 0 & \text{with probability } 1 - \frac{1}{n}. \end{cases} \tag{3.111}$$
The mean of $X_n$ is always 1, but as $n \to \infty$, the probability that $X_n \ne 0$ approaches zero. In other words, $X_n$ converges with probability 1 to zero whereas $\langle X_n \rangle = 1$ for all $n$. What is needed for almost-sure convergence or convergence in probability to imply convergence of the mean is a property called uniform integrability, which is defined as
$$\lim_{\nu \to \infty} \sup_n \left\langle |X_n|\, 1_{\{|X_n| > \nu\}} \right\rangle = 0, \tag{3.112}$$
where $1_{\{A\}} = 1$ if $A$ is true and 0 otherwise.


The contents of the expectation operator in (3.112) are nonzero only for suitably large values of $X_n$, i.e., $|X_n| > \nu$. In other words, the expectation is averaging only "large" values of $X_n$. The supremum over $n$ outside the expectation looks for the value of $n$ for which the average value of $|X_n|$, when small values of $|X_n|$ are forced to zero, is largest. Finally, $\nu$ is taken to infinity, which means that the values of $X_n$ that are not zeroed in the averaging operation get successively larger. This property ensures that the mean value of $X_n$ converges at the correct rate such that convergence in probability implies convergence of the means. We then have the following property, which states that the mean absolute deviation of $X_n$ from $X$ converges to zero if and only if $X_n$ converges in probability to a random variable $X$ and $X_n$ is uniformly integrable. Mathematically, this can be expressed as
$$X_n \xrightarrow{P} X \text{ and } X_n \text{ uniformly integrable} \iff \langle |X_n - X| \rangle \to 0. \tag{3.113}$$

While uniform integrability is hard to prove in many cases, a stronger requirement can be used instead of uniform integrability. Suppose that
$$|X_n| \le W \quad \forall\, n \tag{3.114}$$
and
$$\langle W \rangle < \infty \tag{3.115}$$
and
$$X_n \xrightarrow{P} X, \tag{3.116}$$
then
$$\langle |X_n - X| \rangle \to 0. \tag{3.117}$$

This property is known as the dominated convergence theorem as applied to random variables. The variable Xn is dominated by another random variable W that has finite mean. The finite mean of W ensures that the mean of Xn converges at such a rate that convergence in probability will imply convergence of the means. The proof and detailed analyses of uniform integrability and the dominated convergence theorem are beyond the scope of this text and can be found in advanced probability texts such as [24].
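The gap between convergence in probability and convergence of means is easy to see numerically. The following minimal Python sketch (the seed, trial count, and values of $n$ are arbitrary choices, not from the text) simulates the random variable of Equation (3.111): the empirical mean remains near 1 for every $n$, while the fraction of nonzero samples shrinks like $1/n$.

import numpy as np

rng = np.random.default_rng(0)
trials = 200000
for n in [10, 100, 1000, 10000]:
    # X_n = n with probability 1/n, and 0 otherwise, as in Equation (3.111)
    x = np.where(rng.random(trials) < 1.0 / n, float(n), 0.0)
    print(n, x.mean(), np.mean(x != 0))   # mean stays near 1; Pr{X_n != 0} ~ 1/n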

3.3 Random processes

While random variables are mappings from an underlying space of events to real numbers, random processes can be viewed as mappings from an underlying space of events onto functions. Random processes are essentially random functions and are useful for describing transmitted signals when the underlying signal sources are nondeterministic. Figure 3.1 illustrates a random process $X(t)$ in which elements in an underlying space of events (which need not be discrete) map onto functions.

Figure 3.1 Random processes as a mapping from an underlying event space to functions; $x_1(t)$ and $x_2(t)$ are possible realizations of the process $X(t)$.

The set of all possible functions that $X(t)$ can take is called the ensemble of realizations of the process $X(t)$. Note that $X(t)$ for any particular $t$ is simply a random variable. A complete statistical characterization of a random process requires the description of the joint probability densities (or distributions) of the random variables $X(t)$ for all possible values of $t$. Note that since $t$ is in general uncountable and hence cannot be enumerated, the joint distribution in general needs to be specified for a continuum of values of $t$. In general, the joint density for all possible $t$ is very difficult to obtain for real-world signals. If we restrict ourselves to ergodic processes, which, loosely speaking, are random processes for which single realizations of the process contain the statistical properties of the entire ensemble, it is possible to estimate certain statistical properties of the ensemble from a single realization of the process. Of particular interest are the second-order statistics of ergodic random processes, which are the mean function
$$\mu(t) = \langle X(t) \rangle \tag{3.118}$$
and the autocorrelation function
$$R_X(\tau_1, \tau_2) = \langle X(\tau_1)\, X^*(\tau_2) \rangle. \tag{3.119}$$
It is also possible to define the cross correlation between two random processes $X(t)$ and $Y(t)$ as follows:
$$R_{XY}(\tau_1, \tau_2) = \langle X(\tau_1)\, Y^*(\tau_2) \rangle. \tag{3.120}$$

Note that the expectations above are taken with respect to the ensemble of possible realizations of the processes X(t) and Y (t), jointly.


3.3.1 Wide-sense stationary random processes

Random processes for which the mean function is a constant and the autocorrelation function is dependent only on the difference in the time indices are called wide-sense stationary (WSS) random processes. A random process $X(t)$ is wide-sense stationary if the following two conditions hold:
$$\langle X(t) \rangle = \mu \tag{3.121}$$
$$R_X(\tau_1, \tau_2) = \langle X((\tau_1 - \tau_2) + t)\, X^*(t) \rangle \quad \forall\, t. \tag{3.122}$$
The autocorrelation function for wide-sense stationary processes is written with a single index corresponding to the time lag as
$$R_X(\tau) = \langle X(t + \tau)\, X^*(t) \rangle. \tag{3.123}$$
Two processes are jointly wide-sense stationary if they are each wide-sense stationary and their cross correlation is just a function of the time lag. The cross correlation of wide-sense stationary processes is usually written with a single index as follows:
$$R_{XY}(\tau) = \langle X(t + \tau)\, Y^*(t) \rangle. \tag{3.124}$$

The power spectral density (PSD) of a wide-sense stationary random process is a measure of the average density of power of the process in the frequency domain. It can be defined as follows:
$$S_X(f) = \lim_{T \to \infty} \frac{1}{T} \left\langle \left| \int_{-T/2}^{T/2} x(t)\, e^{-2\pi i f t}\, dt \right|^2 \right\rangle.$$
The Einstein–Wiener–Khinchin theorem further states that, if the integral exists, the PSD is given by
$$S_X(f) = \int_{-\infty}^{\infty} d\tau\, R_X(\tau)\, e^{-i 2\pi f \tau}. \tag{3.125}$$

Observe here that the power-spectral density is the Fourier transform of the autocorrelation function. Wide-sense stationary processes are good approximations for many nondeterministic signals encountered in the real world, including white noise. Additionally, the effect of linear-time-invariant (LTI) systems on wide-sense stationary random processes can be characterized readily.

3.3.2 Action of linear-time-invariant systems on wide-sense stationary random processes

Consider a linear-time-invariant system as in Figure 3.2, where $h(t)$ is the impulse response of the system. If $x(t)$ is a deterministic signal, $y(t) = x * h(t)$, where $*$ denotes the convolution operation.

Figure 3.2 Linear-time-invariant system with input $x(t)$, impulse response $h(t)$, and output $y(t)$.

If $x(t)$ is a realization of a random process, this relationship holds, but in many scenarios observing the output signal $y(t)$ in response to one realization of $x(t)$ may not be very useful to characterize the behavior of the LTI system. Much more meaningful results can be obtained by characterizing the second-order statistics of $X(t)$ and $Y(t)$. Suppose that $h(t)$ is absolutely integrable, i.e.,
$$\int_{-\infty}^{\infty} dt\, |h(t)| < \infty.$$
Then it can be shown that $X(t)$ and $Y(t)$ are jointly wide-sense stationary, provided that $X(t)$ is wide-sense stationary. The cross-correlation function can be found as follows:
$$R_{YX}(\tau) = \langle Y(t + \tau)\, X^*(t) \rangle = \left\langle \int_{-\infty}^{\infty} d\alpha\, h(\alpha)\, X(t + \tau - \alpha)\, X^*(t) \right\rangle$$
$$= \int_{-\infty}^{\infty} d\alpha\, h(\alpha)\, \langle X(t + \tau - \alpha)\, X^*(t) \rangle = \int_{-\infty}^{\infty} d\alpha\, h(\alpha)\, R_X(\tau - \alpha)$$
$$= h * R_X(\tau). \tag{3.126}$$
Note that the expectation can be taken into the integral because $h(t)$ is absolutely integrable. Using a similar set of steps, it can be shown that
$$R_{XY}(\tau) = \overleftarrow{h} * R_X(\tau), \tag{3.127}$$
where $\overleftarrow{h}(t) = h(-t)$ is a time-reversed version of $h(t)$. Similarly, it can be shown that
$$R_Y(\tau) = h * \overleftarrow{h} * R_X(\tau). \tag{3.128}$$

3.3.3 White-noise processes

A random process $N(t)$ is called a white-noise process if it is wide-sense stationary and its autocorrelation function is given by
$$R_N(\tau) = N_0\, \delta(\tau). \tag{3.129}$$
Note that $N_0$ here is the value of the power spectral density of the white-noise process since the power spectral density is simply the Fourier transform of the autocorrelation.


Hence, white-noise processes have a flat PSD since the Fourier transform of an impulse is a constant. This fact implies that white-noise processes have infinite bandwidth and so infinite power. In practice, however, all systems have limited bandwidth and the observed noise is not white. Additionally, zero-mean, white-noise processes are uncorrelated at different time samples since
$$\langle N(t_1)\, N^*(t_2) \rangle = R_N(t_1 - t_2) = 0 = \langle N(t_1) \rangle \langle N(t_2) \rangle \quad \text{if } t_1 \ne t_2. \tag{3.130}$$

Note that the variance of a sample of a zero-mean, white-noise process is infinite since
$$\mathrm{var}(N(t)) = \langle |N(t)|^2 \rangle - |\langle N(t) \rangle|^2 = R_N(0) = \int_{-\infty}^{\infty} df\, S(f) = \infty. \tag{3.131}$$

The last step follows from the fact that $S(f)$ is a constant for all $f$. Note that by taking the Fourier transform of Equation (3.128), we find
$$S_Y(f) = |H(f)|^2\, S_X(f), \tag{3.132}$$
where $H(f)$ is the Fourier transform of $h(t)$. Hence, if a white-noise process $N(t)$ is filtered through a band-pass filter, the resulting output is no longer white and may have finite variance.

Perhaps the most commonly used wide-sense stationary random process in the analysis of wireless communication systems is the white-Gaussian-noise (WGN) process. The white-Gaussian-noise process is a white-noise process with zero mean and amplitude distributed as a Gaussian random variable. Since white-noise processes are uncorrelated at different time instances and uncorrelated Gaussian random variables are also independent, samples of a white-Gaussian-noise process at different time instances are independent random variables. As an example, consider a zero-mean white-Gaussian-noise process $N(t)$ with power spectral density $N_0$ that is filtered through an ideal low-pass filter with cut-off frequency of $\pm W$ and unit height in the pass band. The variance of the output of the low-pass filter can then be found as follows. Let the output of the filter be $Y(t)$. Then the variance of $Y(t)$ is
$$\langle |Y(t)|^2 \rangle = R_Y(0) = \int_{-\infty}^{\infty} df\, S_Y(f) = \int_{\text{pass band}} df\, S_Y(f) = 2\, W N_0. \tag{3.133}$$
Since linear combinations of Gaussians are still Gaussian, $Y(t)$ is a Gaussian distributed random variable with zero mean and variance $2 W N_0$.

3.4 Poisson processes

A Poisson process is a simple stochastic process that is commonly used to model discrete events that occur at random times, such as the start times of telephone calls, radioactive decay of particles, and in simplified models of buses arriving at a bus stop. The Poisson process is defined in terms of a function $N(t)$, which counts the number of occurrences of these events (or arrivals as they are commonly referred to) that occur from some reference time (typically $t = 0$) to the time $t$. In other words, $N(t)$ counts the number of arrivals from time 0 up to time $t$. The defining characteristic of a Poisson process is that the numbers of arrivals in disjoint intervals are independent random variables. That is to say, the random variables $N(b) - N(a)$ and $N(d) - N(c)$ are independent if the intervals $[a, b)$, i.e., $t$ between $a$ and $b$, and $[c, d)$, i.e., $t$ between $c$ and $d$, are disjoint. The Poisson process is characterized by its intensity function $\lambda(t)$. The intensity function defines the mean number of arrivals in an interval $[a, b)$ as follows:
$$\langle N(b) - N(a) \rangle = \int_a^b dt\, \lambda(t). \tag{3.134}$$

The homogeneous Poisson process is a Poisson process for which $\lambda(t) = \lambda$, i.e., the mean number of arrivals in any interval is simply proportional to the length of the interval. The following are the main characteristics of a homogeneous Poisson process with intensity $\lambda$.
(1) The numbers of arrivals in disjoint intervals are independent random variables.
(2) The number of arrivals in a duration $\tau$ is a Poisson random variable with parameter $\lambda\tau$, i.e., the probability mass function is
$$\Pr\{k \text{ arrivals in any interval of length } \tau\} = \frac{1}{k!} (\lambda\tau)^k e^{-\lambda\tau}. \tag{3.135}$$
(3) The time between two consecutive arrivals is an exponential random variable with mean $\frac{1}{\lambda}$.
The Poisson process can also be defined in $\mathbb{R}^d$, where it is referred to as a Poisson point process (PPP). The defining characteristic of the Poisson point process is that the numbers of points in disjoint subsets of $\mathbb{R}^d$ are independent random variables. The number of points in any subset $B \subset \mathbb{R}^d$ is a Poisson random variable with mean
$$\int_B dx\, \lambda(x). \tag{3.136}$$

Thus, for a homogeneous Poisson point process, the number of points in $B$ is a Poisson random variable with mean $\lambda\, \mathrm{Vol}\{B\}$, where $\mathrm{Vol}\{B\}$ is the volume of $B$. For $d = 2$, the Poisson point process is a useful model to describe planar wireless networks with a completely random distribution of users.


For the homogeneous Poisson point process, the probability distributions of the distance to the nearest, second-nearest user, and so forth, are useful in the analysis of wireless networks. The probability density function of the distance $r_k$ between an arbitrary point and the $k$th-nearest point of a two-dimensional Poisson point process can be found as follows [210]. Suppose that the point is the origin; then
$$\Pr\{r_k \le r\} = \Pr\{\text{greater than } k - 1 \text{ points of the PPP in a circle of radius } r\}$$
$$= 1 - e^{-\lambda\pi r^2} \sum_{m=0}^{k-1} \frac{(\lambda\pi r^2)^m}{m!} \tag{3.137}$$
$$f_{r_k}(r) = \frac{d}{dr} \left( 1 - e^{-\lambda\pi r^2} \sum_{m=0}^{k-1} \frac{(\lambda\pi r^2)^m}{m!} \right) \tag{3.138}$$
$$= 2\, e^{-\lambda\pi r^2} \sum_{m=0}^{k-1} \frac{1}{r\, m!} \left[ (\lambda\pi r^2)^{m+1} - m (\lambda\pi r^2)^m \right]. \tag{3.139}$$
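As a quick numerical check of Equation (3.137), the following Python sketch (with assumed intensity, window size, and test radius) simulates a homogeneous PPP in a square around the origin and compares the empirical probability that the $k$th-nearest point lies within a test radius against the closed-form cumulative distribution. The finite window introduces negligible edge effects for the chosen parameters.

import math
import numpy as np

rng = np.random.default_rng(2)
lam, half, k, r_test, trials = 1.0, 5.0, 3, 1.2, 20000   # assumed illustrative values
hits = 0
for _ in range(trials):
    npts = rng.poisson(lam * (2 * half) ** 2)            # Poisson count for the square
    pts = rng.uniform(-half, half, size=(npts, 2))       # conditionally uniform locations
    d = np.sort(np.hypot(pts[:, 0], pts[:, 1]))          # ordered distances from the origin
    if len(d) >= k and d[k - 1] <= r_test:
        hits += 1
a = lam * math.pi * r_test ** 2
cdf = 1.0 - math.exp(-a) * sum(a ** m / math.factorial(m) for m in range(k))
print(hits / trials, cdf)                                # the two values should nearly agree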

3.5 Eigenvalue distributions of finite Wishart matrices

Consider a matrix $\mathbf{G} \in \mathbb{C}^{m \times n}$ with $m \le n$, and entries drawn independently and randomly from a complex circular Gaussian distribution of unit variance, and define
$$\mathbf{M} = \frac{1}{n} \mathbf{G} \mathbf{G}^\dagger. \tag{3.140}$$
The matrix $\mathbf{M}$ is known as a Wishart matrix. Let $\lambda_1, \lambda_2, \ldots, \lambda_m$ denote the ordered eigenvalues of $\mathbf{M}$, where $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_m < \infty$. Then, the joint probability density function of $\lambda_1, \lambda_2, \ldots, \lambda_m$ is [159]
$$f_{\lambda_1, \lambda_2, \ldots, \lambda_m}(\lambda_1, \lambda_2, \ldots, \lambda_m) = K \prod_{i=1}^{m} e^{-\lambda_i}\, \lambda_i^{n - m} \prod_{i < j}^{m} (\lambda_i - \lambda_j)^2, \tag{3.141}$$
where $K$ is a constant that ensures the joint distribution integrates to unity. Note that the marginal probability density of the $k$th largest eigenvalue, $\lambda_k$, is known and can be found in references such as Reference [359].

3.6 Asymptotic eigenvalue distributions of Wishart matrices

For a matrix $\mathbf{G} \in \mathbb{C}^{m \times n}$ with entries drawn independently and randomly from a complex circular Gaussian distribution, the distribution of eigenvalues of the Hermitian matrix $\mathbf{M}$ defined in Equation (3.140) converges to an asymptotic distribution as $m \to \infty$ and $n \to \infty$ under the constraint that the ratio of $m$ to $n$ is fixed,
$$r = \frac{m}{n}. \tag{3.142}$$


While the eigenvalues of $\mathbf{M}$ are random for finite values of $m$ and $n$, the distribution of eigenvalues converges to a fixed distribution (the Marcenko–Pastur distribution) as $m$ and $n$ approach $\infty$ [347, 206, 168, 286, 323, 259, 315, 33]. Because $\mathbf{M}$ grows to be infinite in size, there are correspondingly an infinite number of eigenvalues for $\mathbf{M}$. The technical tools to develop the resulting eigenvalue distribution are discussed in the following section (Section 3.6.1). The distribution for the eigenvalues is given by the sum of a continuous probability distribution $f_r(\lambda)$ and a discrete point at zero weighted by $c_r$:
$$f_r(\lambda) + c_r\, \delta(\lambda), \tag{3.143}$$
where the constant associated with the "delta function" at 0 is given by
$$c_r = \max\left(0,\; 1 - \frac{1}{r}\right). \tag{3.144}$$
The first term of the probability measure, $f_r(\lambda)$, is given by
$$f_r(\lambda) = \begin{cases} \dfrac{\sqrt{(\lambda - a_r)(b_r - \lambda)}}{2\pi\, \lambda\, r} & a_r \le \lambda \le b_r \\ 0 & \text{otherwise,} \end{cases} \tag{3.145}$$
where
$$a_r = (\sqrt{r} - 1)^2, \quad b_r = (\sqrt{r} + 1)^2. \tag{3.146}$$
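The quality of this asymptotic description at finite size is easy to probe numerically. The following Python sketch (matrix dimensions and seed are assumed illustrative values) draws one finite Wishart matrix as in Equation (3.140) and compares an eigenvalue histogram with the Marcenko–Pastur density of Equation (3.145).

import numpy as np

rng = np.random.default_rng(3)
m, n = 200, 400
r = m / n
G = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2)  # unit-variance entries
M = G @ G.conj().T / n                          # Wishart matrix of Equation (3.140)
eig = np.linalg.eigvalsh(M)
a_r, b_r = (np.sqrt(r) - 1) ** 2, (np.sqrt(r) + 1) ** 2
lam = np.linspace(a_r + 1e-9, b_r - 1e-9, 500)
f = np.sqrt((lam - a_r) * (b_r - lam)) / (2 * np.pi * lam * r)   # Equation (3.145)
hist, edges = np.histogram(eig, bins=30, range=(a_r, b_r), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - np.interp(centers, lam, f))))  # small deviation for moderate m, n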

The largest eigenvalue of $\mathbf{M}$ is known to converge to $b_r$, which for $n = m$ equals 4. The infinite-dimensional form can be employed with reasonable fidelity at surprisingly small values of $m$ and $n$. Depending upon the details of the problem being addressed, values as small as 4 can be approximated by the infinite-dimensional distribution. A useful approximation for the $k$th largest eigenvalue when $m$ and $n$ are moderately large is obtained by finding the value of $\lambda$ for which the limiting distribution function of the eigenvalues $F_r(\cdot)$ takes the value of $(m - k + 1)/m$. This approximation can be expressed as follows,
$$\lambda_k \approx F_r^{-1}((m - k + 1)/m), \tag{3.147}$$
where $F_r^{-1}(\cdot)$ indicates the function inverse and, for $a_r \le x \le b_r$,
$$F_r(x) = \frac{1}{2\pi r} \left[ \pi\left( \frac{a_r + b_r}{4} - \frac{\sqrt{a_r b_r}}{2} \right) + \sqrt{(b_r - x)(x - a_r)} + \frac{a_r + b_r}{2} \arcsin\!\left( \frac{2x - a_r - b_r}{b_r - a_r} \right) + \sqrt{a_r b_r}\, \arctan\!\left( \frac{2\, a_r b_r - (a_r + b_r)\, x}{2 \sqrt{a_r b_r (b_r - x)(x - a_r)}} \right) \right], \tag{3.148}$$


which for the special case of $m = n$ reduces to
$$F_1(x) = \begin{cases} \dfrac{\pi + \sqrt{4x - x^2} + 2 \arcsin\left(-1 + \frac{1}{2}x\right)}{2\pi} & \text{if } 0 \le x < 4 \\ 1 & x \ge 4. \end{cases} \tag{3.149}$$

3.6.1 Marcenko–Pastur theorem

One of the seminal results in the analyses of asymptotic eigenvalue distributions [315] of random matrices is the Marcenko–Pastur theorem, which can be used to find the probability measure in Equation (3.145). In other words, the Marcenko–Pastur theorem tells us the distribution of the eigenvalues of random matrices that take the form of covariance matrices, as the dimensions of these matrices grow large. The theorem was first derived in Reference [206] and strengthened in Reference [12], and has proven useful in analyzing the limiting signal-to-interference-plus-noise ratio (SINR) of wireless links with multiple antennas and/or code-division-multiple-access (CDMA) systems with random codes as given in references such as [313] and [323]. The theorem can be condensed into a form suitable for wireless communications applications as follows. Consider a matrix $\mathbf{G} \in \mathbb{C}^{N \times n}$ where the entries of $\mathbf{G}$ are independent, identically distributed, zero-mean complex random variables with unit variance, and an $n \times n$ diagonal matrix $\mathbf{T} = \mathrm{diag}(\tau_1, \tau_2, \ldots, \tau_n)$ where $\tau_i \in \mathbb{R}$. Assume that $n, N \to \infty$ such that $n/N \to c > 0$, a constant. In this asymptotic regime, assume that the empirical distribution function (e.d.f.) of $\tau_1, \tau_2, \ldots, \tau_n$ converges with probability 1 to a limiting probability distribution function $H(\tau)$, and that $\mathbf{G}$ and $\mathbf{T}$ are independent. Note that the empirical distribution function of a set of variables is defined as the proportion of those variables that are less than, or equal to, the argument of the function. That is, consider a set $A$ of $N$ real numbers. The empirical distribution function of the members of the set $A$ is
$$\mathrm{e.d.f.}\{A\}(\tau) = \frac{\text{Number of elements in } A \text{ less than or equal to } \tau}{N}. \tag{3.150}$$

In the limit as $n, N \to \infty$, the empirical distribution function of the eigenvalues of $\mathbf{B} = \mathbf{G}\mathbf{T}\mathbf{G}^\dagger$ converges with probability 1, at all points where the empirical distribution function is continuous, to a nonrandom probability distribution function $f(\tau)$ whose Stieltjes transform $m(z)$, defined for complex $z$, satisfies the following equation:
$$z\, m(z) + 1 = m(z)\, c \int_0^\infty dH(\tau)\, \frac{\tau}{1 + \tau\, m(z)}. \tag{3.151}$$

Note that Equation (3.145) can be found by setting $dH(\tau)$ equal to the Dirac measure at 1 and solving Equation (3.151). Also, the Stieltjes transform of $d\phi(t)$ is denoted by $m_\phi(z)$ and is defined as
$$m_\phi(z) = \int_{-\infty}^{\infty} d\phi(t)\, \frac{1}{t - z} \quad \text{for } \operatorname{Im}(z) < 0. \tag{3.152}$$

3.7 Estimation and detection in additive Gaussian noise

3.7.1 Estimation in additive Gaussian noise

A problem frequently encountered in communication systems (for an example see Chapter 8) and many other fields is the estimation of a vector $\mathbf{s} \in \mathbb{C}^{p \times 1}$ from a noisy observation $\mathbf{z} \in \mathbb{C}^{n \times 1}$, where $\mathbf{s}$ and $\mathbf{z}$ are related by the following equation:
$$\mathbf{z} = \mathbf{H}\mathbf{s} + \mathbf{n}. \tag{3.153}$$

The matrix $\mathbf{H} \in \mathbb{C}^{n \times p}$ is called the mixing matrix or, in the case of MIMO communications (discussed in Chapter 8), the channel matrix, and $\mathbf{n} \in \mathbb{C}^{n \times 1}$ is a vector of noise samples. Given the noisy observation $\mathbf{z}$, one may wish to obtain the maximum-likelihood (ML) estimate of $\mathbf{s}$, which we denote here by $\hat{\mathbf{s}}$. The maximum-likelihood estimator is defined as
$$\hat{\mathbf{s}} = \underset{\mathbf{s}}{\arg\max}\; p(\mathbf{z}|\mathbf{s}). \tag{3.154}$$

While more practical receiver algorithms are considered in Chapter 9, as an introduction, the maximum-likelihood signal estimator with known channel and interference parameters is considered here. By assuming that the noise is sampled from a complex Gaussian distribution $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$, and that the interference-plus-noise covariance matrix $\mathbf{R}$ and channel matrix $\mathbf{H}$ are known, the maximum-likelihood estimate for the transmitted signal $\mathbf{s}$ is
$$\hat{\mathbf{s}} = \left( \mathbf{H}^\dagger \mathbf{R}^{-1} \mathbf{H} \right)^{-1} \mathbf{H}^\dagger \mathbf{R}^{-1} \mathbf{z}. \tag{3.155}$$

The previous equation can be derived by starting with the PDF of the conditional probability in Equation (3.154),
$$p(\mathbf{z}|\mathbf{s}) = \frac{1}{\pi^n |\mathbf{R}|} \exp\left( -(\mathbf{z} - \mathbf{H}\mathbf{s})^\dagger \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}) \right). \tag{3.156}$$

Because the exponential is a monotonically increasing function of its argument, maximizing Equation (3.156) is equivalent to minimizing
$$(\mathbf{z} - \mathbf{H}\mathbf{s})^\dagger \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}). \tag{3.157}$$


By exploiting the Wirtinger calculus discussed in Section 2.8.2, setting the derivative to zero yields
$$\frac{d}{d\mathbf{s}^\dagger} (\mathbf{z} - \mathbf{H}\mathbf{s})^\dagger \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}) = 0$$
$$\mathbf{H}^\dagger \mathbf{R}^{-1} \mathbf{H}\, \mathbf{s} - \mathbf{H}^\dagger \mathbf{R}^{-1} \mathbf{z} = 0$$
$$\mathbf{s} = \left( \mathbf{H}^\dagger \mathbf{R}^{-1} \mathbf{H} \right)^{-1} \mathbf{H}^\dagger \mathbf{R}^{-1} \mathbf{z}. \tag{3.158}$$
We have assumed here that $\mathbf{H}^\dagger \mathbf{R}^{-1} \mathbf{H}$ is positive-definite (and hence invertible), even though it is only guaranteed to be non-negative-definite by virtue of the fact that $\mathbf{R}$ is a covariance matrix.
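The estimator of Equation (3.155) is straightforward to exercise numerically. The following Python sketch (all dimensions, the random channel, the covariance construction, and the QPSK-like transmit vector are assumed illustrative choices) generates one noisy observation and applies the closed-form estimator; the estimate should be close to the transmitted vector.

import numpy as np

rng = np.random.default_rng(4)
n, p = 8, 2
H = (rng.normal(size=(n, p)) + 1j * rng.normal(size=(n, p))) / np.sqrt(2)
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
R = A @ A.conj().T / n + np.eye(n)              # a positive-definite noise covariance
s = np.array([1 + 1j, -1 - 1j]) / np.sqrt(2)    # transmitted vector (assumed symbols)
L = np.linalg.cholesky(R)                       # colour the noise so that cov = R
noise = L @ ((rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2))
z = H @ s + noise                               # observation per Equation (3.153)
Ri = np.linalg.inv(R)
s_hat = np.linalg.solve(H.conj().T @ Ri @ H, H.conj().T @ Ri @ z)  # Equation (3.155)
print(s, s_hat)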

3.7.2 Detection in additive Gaussian noise

Vector detection
Consider a system described by Equation (3.153), but where the vector $\mathbf{s}$ can only take on one of the values $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_K$. Given the noisy observation $\mathbf{z}$, we wish to detect which of the possible $\mathbf{s}$ vectors was actually present such that the probability of making an error $P_e$ is minimized, where
$$P_e = \Pr\{\hat{\mathbf{s}} \ne \mathbf{s}\}. \tag{3.159}$$

For a given observation $\mathbf{z}$, the probability of error is minimized if the estimated value $\hat{\mathbf{s}}$ is such that the conditional probability of $\mathbf{s}$ given $\mathbf{z}$ is maximized. That is to say, the minimum probability of error estimator of $\mathbf{s}$ is
$$\hat{\mathbf{s}} = \underset{\mathbf{s}_\ell \in \{\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_K\}}{\arg\max}\; p(\mathbf{s} = \mathbf{s}_\ell | \mathbf{z}). \tag{3.160}$$

To illustrate this problem, it is instructive to consider the case when the random vector of interest can take one of two possible values, i.e., $K = 2$. We can write the conditional probability above using Bayes rule, discussed in Section 3.1.7, as
$$p(\mathbf{s}|\mathbf{z}) = \frac{p(\mathbf{z}|\mathbf{s})\, p(\mathbf{s})}{p(\mathbf{z})}. \tag{3.161}$$

Thus, $\hat{\mathbf{s}} = \mathbf{s}_1$ if
$$\frac{p(\mathbf{z}|\mathbf{s} = \mathbf{s}_1)\, p(\mathbf{s} = \mathbf{s}_1)}{p(\mathbf{z})} \ge \frac{p(\mathbf{z}|\mathbf{s} = \mathbf{s}_2)\, p(\mathbf{s} = \mathbf{s}_2)}{p(\mathbf{z})}$$
$$\frac{p(\mathbf{z}|\mathbf{s} = \mathbf{s}_1)}{p(\mathbf{z}|\mathbf{s} = \mathbf{s}_2)} \ge \frac{p(\mathbf{s} = \mathbf{s}_2)}{p(\mathbf{s} = \mathbf{s}_1)}. \tag{3.162}$$

The quantity on the left-hand side is known as the likelihood ratio. Assuming that $\mathbf{s}_1$ and $\mathbf{s}_2$ are equally likely and that $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$, we can write Equation (3.162) as
$$\frac{1}{\pi^n |\mathbf{R}|} \exp\left( -(\mathbf{z} - \mathbf{H}\mathbf{s}_1)^\dagger \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}_1) \right) \ge \frac{1}{\pi^n |\mathbf{R}|} \exp\left( -(\mathbf{z} - \mathbf{H}\mathbf{s}_2)^\dagger \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}_2) \right). \tag{3.163}$$


Simplifying by using the fact that the exponential is a monotonically increasing function yields
$$(\mathbf{z} - \mathbf{H}\mathbf{s}_1)^\dagger \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}_1) \le (\mathbf{z} - \mathbf{H}\mathbf{s}_2)^\dagger \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}_2). \tag{3.164}$$

If the noise samples in the vector $\mathbf{n}$ are uncorrelated and have equal variance, $\mathbf{R} = \sigma^2 \mathbf{I}$ and the expression above becomes
$$\|\mathbf{z} - \mathbf{H}\mathbf{s}_1\|^2 \le \|\mathbf{z} - \mathbf{H}\mathbf{s}_2\|^2. \tag{3.165}$$

Since $\mathbf{H}\mathbf{s}_\ell$ equals $\mathbf{z}$ if $\mathbf{s} = \mathbf{s}_\ell$ and no noise is present, the equation above can be thought of as nearest-neighbor detection. If $\mathbf{s} = \mathbf{s}_1$, an error occurs with the following probability:
$$\Pr\{\hat{\mathbf{s}} = \mathbf{s}_2 | \mathbf{s} = \mathbf{s}_1\} = \Pr\left\{ \|\mathbf{z} - \mathbf{H}\mathbf{s}_1\|^2 \ge \|\mathbf{z} - \mathbf{H}\mathbf{s}_2\|^2 \,\middle|\, \mathbf{s} = \mathbf{s}_1 \right\}$$
$$= \Pr\left\{ \|\mathbf{n}\|^2 \ge \|\mathbf{n} + \mathbf{H}(\mathbf{s}_1 - \mathbf{s}_2)\|^2 \right\}$$
$$= \Pr\left\{ (\mathbf{s}_1 - \mathbf{s}_2)^\dagger \mathbf{H}^\dagger \mathbf{n} + \mathbf{n}^\dagger \mathbf{H}(\mathbf{s}_1 - \mathbf{s}_2) \ge \|\mathbf{H}(\mathbf{s}_1 - \mathbf{s}_2)\|^2 \right\}$$
$$= \Pr\left\{ 2\, \Re\left\{ (\mathbf{s}_1 - \mathbf{s}_2)^\dagger \mathbf{H}^\dagger \mathbf{n} \right\} \ge \|\mathbf{H}(\mathbf{s}_1 - \mathbf{s}_2)\|^2 \right\}$$
$$= \Pr\left\{ v \ge \|\mathbf{H}(\mathbf{s}_1 - \mathbf{s}_2)\|^2 \right\}, \tag{3.166}$$
where $v \sim \mathcal{N}\left(0,\; 2\sigma^2 \|\mathbf{H}(\mathbf{s}_1 - \mathbf{s}_2)\|^2\right)$. Hence Equation (3.166) evaluates to
$$\Pr\{\hat{\mathbf{s}} = \mathbf{s}_2 | \mathbf{s} = \mathbf{s}_1\} = Q\left( \frac{\|\mathbf{H}(\mathbf{s}_1 - \mathbf{s}_2)\|}{\sqrt{2\sigma^2}} \right), \tag{3.167}$$

which by symmetry (because $\mathbf{s}_1$ and $\mathbf{s}_2$ are equally likely) equals the probability of error. Extending this analysis to systems with a larger number of possible values of the vector $\mathbf{s}$, i.e., $K \ge 2$ (for a general $\mathbf{R}$), yields the following expression for the minimum probability of error estimator for $\mathbf{s}$ when $\mathbf{s}$ is uniformly distributed among $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_K$:
$$\hat{\mathbf{s}} = \underset{\mathbf{s}_\ell \in \{\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_K\}}{\arg\min}\; (\mathbf{z} - \mathbf{H}\mathbf{s}_\ell)^\dagger \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}_\ell). \tag{3.168}$$

If the noise samples are uncorrelated,
$$\hat{\mathbf{s}} = \underset{\mathbf{s}_\ell \in \{\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_K\}}{\arg\min}\; \|\mathbf{z} - \mathbf{H}\mathbf{s}_\ell\|^2. \tag{3.169}$$
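A real-valued Monte Carlo sketch of the nearest-neighbor detector of Equation (3.169) follows (in Python; the dimensions, noise variance, candidate set, and trial count are assumed illustrative values, and real arithmetic is used for brevity). The measured error rate can be compared against the Q-function expression developed above.

import numpy as np

rng = np.random.default_rng(5)
n, sigma2, trials = 4, 0.1, 5000
H = rng.normal(size=(n, 2))
S = [np.array([1.0, 1.0]), np.array([1.0, -1.0]),
     np.array([-1.0, 1.0]), np.array([-1.0, -1.0])]   # candidate vectors s_1..s_4
errors = 0
for _ in range(trials):
    idx = rng.integers(len(S))                         # pick a transmitted vector
    z = H @ S[idx] + rng.normal(scale=np.sqrt(sigma2), size=n)
    dists = [np.linalg.norm(z - H @ s) for s in S]     # Equation (3.169)
    errors += int(np.argmin(dists) != idx)
print("error rate:", errors / trials)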

The probability of error can be bounded from above by finding the worst-case difference between $\mathbf{H}\mathbf{s}_\ell$ and $\mathbf{H}\mathbf{s}_m$. Let
$$d_{\min} = \min_{\substack{\ell, m \in \{1, 2, \ldots, K\} \\ \ell \ne m}} \|\mathbf{H}\mathbf{s}_\ell - \mathbf{H}\mathbf{s}_m\|. \tag{3.170}$$


Then, the probability of error when $\mathbf{s}$ is one of $K$ equally likely vectors is bounded from above by
$$P_e \le Q\left( \frac{|d_{\min}|}{\sqrt{2\sigma^2}} \right). \tag{3.171}$$

Matrix detection in white Gaussian noise
We can extend the vector detection problem of the previous section to a matrix detection problem where the observations are matrices $\mathbf{Z} \in \mathbb{C}^{n \times m}$, the mixing matrix $\mathbf{H} \in \mathbb{C}^{n \times p}$, the signal matrix $\mathbf{S} \in \mathbb{C}^{p \times m}$, and the noise matrix $\mathbf{N} \in \mathbb{C}^{n \times m}$. This class of problem is often encountered in multiple-antenna systems where the rows of the observation matrix represent the samples received at a given antenna of a receiver over multiple time samples. We can thus write a system equation analogous to Equation (3.153) as
$$\mathbf{Z} = \mathbf{H}\mathbf{S} + \mathbf{N}, \tag{3.172}$$
where $\mathbf{S}$ can take values of $\mathbf{S}_1, \mathbf{S}_2, \ldots, \mathbf{S}_K$ with equal probability. The matrix detection problem can be rewritten by vectorizing the matrices $\mathbf{Z}$, $\mathbf{S}$, and $\mathbf{N}$, whereby the vectors $\bar{\mathbf{z}} \in \mathbb{C}^{nm \times 1}$, $\bar{\mathbf{s}} \in \mathbb{C}^{pm \times 1}$, and $\bar{\mathbf{n}} \in \mathbb{C}^{nm \times 1}$ are obtained by stacking up the columns of $\mathbf{Z}$, $\mathbf{S}$, and $\mathbf{N}$, respectively. Additionally, define $\bar{\mathbf{H}} \in \mathbb{C}^{nm \times pm}$ as a block diagonal matrix whose diagonal blocks comprise the matrix $\mathbf{H}$, that is to say,
$$\bar{\mathbf{H}} = \mathbf{I}_m \otimes \mathbf{H}. \tag{3.173}$$
Equation (3.172) can thus be written as
$$\bar{\mathbf{z}} = \bar{\mathbf{H}}\, \bar{\mathbf{s}} + \bar{\mathbf{n}}. \tag{3.174}$$
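Because vectorization conventions are easy to get backwards, the following short Python sketch (matrix sizes and random data are assumed, purely for illustration) numerically checks the identity underlying Equations (3.172)-(3.174) for the column-stacking convention used here.

import numpy as np

rng = np.random.default_rng(6)
n, p, m = 4, 2, 3
H = rng.normal(size=(n, p))
S = rng.normal(size=(p, m))
z_bar = (H @ S).flatten(order="F")        # stack the columns of Z = H S
H_bar = np.kron(np.eye(m), H)             # block-diagonal matrix with H on the diagonal
s_bar = S.flatten(order="F")              # stack the columns of S
print(np.allclose(z_bar, H_bar @ s_bar))  # True: z_bar = H_bar s_bar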

The probability of error can be bounded by writing
$$\bar{d}_{\min} = \min_{\substack{\ell, m \in \{1, 2, \ldots, K\} \\ \ell \ne m}} \|\bar{\mathbf{H}}\bar{\mathbf{s}}_\ell - \bar{\mathbf{H}}\bar{\mathbf{s}}_m\| \tag{3.175}$$
$$= \min_{\substack{\ell, m \in \{1, 2, \ldots, K\} \\ \ell \ne m}} \|\mathbf{H}\mathbf{S}_\ell - \mathbf{H}\mathbf{S}_m\|_F. \tag{3.176}$$

Recall that $\|\mathbf{A}\|_F$ is the Frobenius norm of $\mathbf{A}$, which is the square root of the sum of the squared magnitudes of all entries of the matrix $\mathbf{A}$. The probability of error is thus bounded from above by
$$P_e \le Q\left( \frac{|\bar{d}_{\min}|}{\sqrt{2\sigma^2}} \right). \tag{3.177}$$

3.7.3 Receiver operating characteristics

We can consider an example problem in which there are two hypotheses: signal of interest present $H_1$ or signal of interest absent $H_0$. A common technique


for displaying the performance of a particular detection test statistic $\phi(\mathbf{Z})$ as a function of some observed data is the receiver operating characteristic (ROC) curve. Here, the test statistic is implicitly a function of any parameters known about the transmitted signals of interest or environment. Detection is declared if the test statistic achieves or exceeds some threshold $\eta$,
$$\phi(\mathbf{Z}) \ge \eta. \tag{3.178}$$

Given an ensemble of observations defined by the density $p(\mathbf{Z}|H_1)$ in which signal is present $H_1$, the probability of detection $P_d(\eta)$ is defined by
$$P_d(\eta) = \Pr\{\phi(\mathbf{Z}) \ge \eta\} = \int d\Omega_{\mathbf{Z}}\, p(\mathbf{Z}|H_1)\, \theta\{\phi(\mathbf{Z}) - \eta\},$$
where the function $\theta\{x\}$ for some real variable $x$ is defined here to be
$$\theta\{x\} = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0. \end{cases}$$

(a) Show that the mean signal power at the center of the disk $I$ is infinite. (b) Show that the signal power at the center of the disk $I$ is finite with probability 1.
3.11 Use Equation (3.5) to derive Equation (3.59) from Equation (3.53).

4 Wireless communications fundamentals

4.1 Communication stack

For convenience in design, the operations of radios are often broken into a number of functional layers. The standard version of this stack is referred to as the open systems interconnection (OSI) model [291], as seen in Figure 4.1. The model has two groups of layers: host and media. The host layers are the application, presentation, session, and transport layers. The media layers are the network, data-link, and physical layers. In many radio systems, some of these layers are trivial or the division between the layers may be blurred. The OSI stack is commonly interpreted in terms of wired networks such as the internet. Depending upon the details of an implementation, various tasks may occupy different layers. Nonetheless, the OSI layered architecture is useful as a common reference for discussing radios. In this text, the media layers are of principal importance.

The network layer indicates how data are routed from an information source to a sink node, as seen in Figure 4.2. In the case of a network with two nodes, this routing is trivial. In the case of an ad hoc wireless network, the routing may be both complicated and time varying. The network layer may break a data sequence at the source node into smaller blocks and then reassemble the data sequence at the sink node. It also may provide notification of errors to the transport layer.

The data-link layer controls the flow of data between adjacent nodes in a network. This layer may provide acknowledgments of received data, and may or may not contain error checking or correction. Sometimes this layer is broken into the logical-link-control and media-access-control (MAC) sublayers. The MAC is used to control the network's reaction to interference. The interference might be internal, that is, caused by the network's own links, or external, that is, caused by a source not under the network's control. The logical-link-control sublayer is used by the protocol to control data flow. The logical-link-control sublayer interprets frame headers for the data-link layer. The MAC specifies a local hardware address and control of a channel.

The physical layer defines the mapping of information bits to the radiated signal. The physical layer includes error-correction coding, modulation, and spectral occupancy. It also includes all the signal processing, at both the transmitter and receiver.


Figure 4.1 OSI stack.

Figure 4.2 Network of nodes with a connection between a source node and a sink node.

4.2 Reference digital radio link

The basic physical layer of a digital radio link has nine components: data source, encoding, modulation, upconversion, propagation, downconversion, demodulation, decoding, and data sink. While not all digital radios conform to this structure, this structure is flexible enough to capture the essential characteristics for discussion in this text. Here we have distinguished between up/downconversion and modulation. This distinction is a convenient convention for digital radios.

In practice there is a large variety of data sources. The classic modern example is the cellular or mobile phone [260]. The modern mobile phone is used for internet access, data, video, and occasionally voice. For this discussion, we will focus on voice communications. In the uplink, voice data are sent from the phone to the base station. The analog voice signal is digitized and compressed by using a vocoder. There are a variety of approaches to vocoders that in general provide significant source compression. The raw digitized signal might require a data rate of as much as 200 kbits/s or more. The signals compressed by vocoders typically require around 10 kbits/s. These data, along with a number of control parameters, are the data source.


Figure 4.3 Examples of modulations: BPSK, QPSK, 8-PSK, and 16-QAM.

The encoding of the data typically includes some approach to compensate for noisy data, denoted forward-error-correction (FEC) encoding. Error-correction codes introduce extra parity data to compensate for noise in the channel. With a strong code, multiple errors caused by noise in the channel can be corrected. The theoretical limit for the amount of data (that is, information not parity) that can be transmitted over a link in a noisy channel with essentially no errors is given by the Shannon limit and is discussed in Section 5.3. Modern codes allow communication links that can closely approach this theoretical limit. Coding performance and computation complexity can vary significantly.

Following the data encoding is the modulation. Depending upon the details of the forward-error-correction coding scheme, the error-correction algorithms used may or may not be strongly coupled with the modulation. As an example, trellis coding strongly couples modulation and coding [316, 317]. The modulation translates the digital data to a baseband signal for transmission. The baseband signal is centered at zero frequency. Associated with each transmit antenna are an in-phase and a quadrature signal. It is often convenient to view these signals as being complex, with the real component corresponding to the in-phase signal and the imaginary component corresponding to the quadrature signal. The variety of modulation schemes vary in complexity. Some examples shown in Figure 4.3 are binary phase-shift keying (BPSK), which uses symbols
$$\text{BPSK: } \{-1, +1\}, \tag{4.1}$$

quadrature phase-shift keying (QPSK), which uses symbols
$$\text{QPSK: } \left\{ \frac{+1 + i}{\sqrt{2}},\; \frac{-1 + i}{\sqrt{2}},\; \frac{-1 - i}{\sqrt{2}},\; \frac{+1 - i}{\sqrt{2}} \right\}, \tag{4.2}$$
M-ary phase-shift keying (PSK), which uses symbols
$$\text{PSK: } \{e^{2\pi i\, n/M}\};\; n \in \{0, \ldots, M - 1\}, \tag{4.3}$$
and M-ary quadrature amplitude modulation (QAM), which uses symbols
$$\text{QAM: } \{\pm p \pm i\, q\}, \tag{4.4}$$

such that the regularly spaced values of $p$ and $q$ form a square lattice with $M$ points. The set of symbols observed in the complex plane is commonly referred to as a constellation. The overall scale of these modulations is somewhat arbitrary. The important scale is relative to the interference-plus-noise amplitude at the receiver. For real systems, channels, filters, and other effects distort these idealized modulations.

Orthogonal-frequency-division multiplexing (OFDM), which is discussed in greater detail in Section 10.5.3, is a common modulation approach that builds a composite symbol from a sequence of simpler symbols, such as QPSK. It does this by transmitting over time the inverse fast Fourier transform (IFFT) of a sequence of symbols. As a consequence, each simple symbol is associated with a bin of the fast Fourier transform (FFT), and, given a sufficiently narrow FFT subcarrier, the communication system comprises a set of flat-fading channels. This is a useful approach for environments with frequency-selective fading, particularly if the channels are relatively static over time.

For analog communications, it was common to think of signal modulation and frequency upconversion both as modulation. For digital communications, it seems more natural to make the distinction between modulation and frequency upconversion clearer. The modulation is typically done digitally. The frequency upconversion may be either digital, analog, or both, depending upon the system. Some systems perform the frequency upconversion in multiple steps. For example, it is often convenient to upconvert to an intermediate frequency (IF) digitally, then to upconvert to the carrier frequency using analog circuitry. This approach is basically a modern version of a superheterodyne transmitter. Mathematically, upconverting to a carrier frequency, $f_0$, can be performed by multiplying the complex baseband signal as a function of time, $s(t)$, by the term $e^{-i\omega t}$, where $\omega = 2\pi f_0$ is the angular frequency. The physical signal is given by the real part of this product,
$$\Re\{e^{-i\omega t}\, s(t)\} = \Re\{s(t)\}\cos(\omega t) + \Im\{s(t)\}\sin(\omega t). \tag{4.5}$$

This approach takes advantage of the orthogonality of sin and cos. For analog upconversion, the IF signal is multiplied by a real carrier then filtered, as seen in Figure 4.4.¹ The mixer creates images at the sum and difference of the IF frequency and the analog upconversion frequency. Because of filter design constraints, it is helpful to keep the IF frequency reasonably high so that the filter can easily select one of these images. For logistical reasons, even more stages are sometimes used.

¹ Many modern systems employ direct conversion, avoiding the IF stage because of integrated circuit (IC) advantages.

Figure 4.4 Example of a communication system with a digital IF upconversion to the frequency $f_{\mathrm{IF}}$, followed by an analog upconversion to the carrier frequency $f_{\mathrm{IF}} + f_{\mathrm{UP}}$.

Figure 4.5 Examples of multipath scattering in an environment that is observed by multiple receivers.

4.2.1 Wireless channel

Channels have a wide variety of characteristics. A simple static line-of-sight channel with a single transmit antenna and single receive antenna can be characterized by a single complex number. Channels between multiple-antenna transmitters and multiple-antenna receivers are considered in detail in Chapter 8. More typical and more interesting channels are subject to the effects of time-varying multipath scattering. For such channels, the transmitted signal bounces off various scatterers in the environment. Because of the spatial distribution of scatterers, the receiver observes the transmitted signal coming from a distribution of angles at various delays, displayed in Figure 4.5. The signal that is observed at the receiver is the result of the sum of the delayed versions of the transmitted signal, as discussed in Chapter 10. If these delays are significant compared to the inverse of the bandwidth of the signal, then there is said to be delay spread. Delay spread introduces channel sensitivity to frequency. Consequently, channels


with delay spread are said to be frequency selective. If there is motion in the environment or if the transmitter or receiver is moving, the channel will change over time. Because the directions to various scatterers are not typically identical, motion introduces a range of Doppler frequency shifts. In this regime, it is said that the channel has Doppler spread. The effects of this complicated environment can be mitigated or even exploited by using adaptive techniques as discussed in Chapter 10.

For the sake of convenience, the channel attenuation that is caused by multipath scattering is often factored into a term associated with fading that incorporates the variation in the channel due to relative delays and motion, and a term associated with overall average attenuation. The average attenuation is typically parameterized by the link length, $r$. Typically, average signal power in ad hoc wireless networks is assumed to decay with distance $r$ as $r^{-\alpha} e^{-\gamma r}$, where $\alpha$ is known as the path-loss exponent and $\gamma$ is an absorption coefficient. In most works in the literature, $\gamma$ is set to zero and $\alpha > 2$.

4.2.2 Thermal noise

The introduction of noise is usually associated with the internal noise of the receiver. In 1928, John Johnson [167] observed that thermal noise in a conductor was proportional to bandwidth and temperature. This result was discussed by Harry Nyquist [233]. This noise is associated with the black-body radiation [236]. It turns out that proportionality to temperature and bandwidth is a low-frequency approximation. At higher frequencies, the effects of quantum mechanics reduce the noise spectral density. However, for frequencies $f$ of interest, the classical approximation is accurate because the frequency is far from the quantum limit,
$$f \ll k_B T_K / h \approx 6\ \text{THz at room temperature}, \tag{4.6}$$
where $k_B \approx 1.38 \cdot 10^{-23}$ J/K is the Boltzmann constant in SI units, $h \approx 6.624 \cdot 10^{-34}$ J s is the Planck constant in SI units, and $T_K$ is the absolute temperature that is expressed here in kelvin. The observed receive noise power $P_n$ is bounded from below by the thermal noise
$$P_n \ge k_B T_K B. \tag{4.7}$$
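As a quick numerical illustration of Equation (4.7) (the bandwidth $B$ and noise figure below are assumed illustrative values), the following Python sketch computes the room-temperature thermal noise floor in dBm; for a 1 MHz bandwidth the thermal limit is about −114 dBm, to which a receiver's noise figure is added.

import math

kB, TK, B, nf_dB = 1.38e-23, 290.0, 1e6, 3.0   # J/K, kelvin, Hz, dB (assumed values)
Pn = kB * TK * B                               # thermal-limit noise power, in watts
Pn_dBm = 10 * math.log10(Pn / 1e-3)
print(Pn_dBm, Pn_dBm + nf_dB)                  # about -114 dBm, plus the noise figure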

The bandwidth is given by $B$. In practice, the noise power is a number of decibels higher than the thermal limit. In order to characterize the noise in a system, it is common to cite an effective temperature such as $T_K = 1500$ K, which is much higher than the real room temperature of around $T_K = 290$ K. Alternatively, the noise of the system is characterized by a noise figure, $f_n$, which is usually presented in decibels. Expressed on a linear scale, the noise figure multiplies the right-hand side of Equation (4.7). The noise figure of a good receiver might be two to three decibels. However, it is not uncommon to have noise figures a few decibels higher.

The channel also includes external interference that may come from unintended spectral sidelobes of nearby spectral occupants or competing users at the same frequency. External interference is a common issue in the industrial-scientific-medical (ISM) band in which WiFi operates [150]. In the case of ad hoc wireless networks, this interference may be caused by other users in one's own network.

Similar to upconversion, downconversion is used to transform the signal at carrier frequency to a complex baseband signal $s(t)$. The downconversion may be performed in a single step or in multiple steps using an intermediate frequency. These conversions may be performed digitally or by using analog circuitry. As an example, a single-stage downconversion can be notionally achieved by multiplying the received signal by the complex conjugate of the upconversion, $e^{i\omega t}$,
$$e^{i\omega t}\, \Re[s(t)\, e^{-i\omega t}] = e^{i\omega t} \left( \Re\{s(t)\}\cos(\omega t) + \Im\{s(t)\}\sin(\omega t) \right)$$
$$= \Re\{s(t)\} \left( \cos^2(\omega t) + i \sin(\omega t)\cos(\omega t) \right) + \Im\{s(t)\} \left( i \sin^2(\omega t) + \sin(\omega t)\cos(\omega t) \right)$$
$$= \Re\{s(t)\} \left( \frac{1}{2} + \frac{1}{2}\cos(2\omega t) + \frac{i}{2}\sin(2\omega t) \right) + \Im\{s(t)\} \left( \frac{i}{2} - \frac{i}{2}\cos(2\omega t) + \frac{1}{2}\sin(2\omega t) \right). \tag{4.8}$$

It is clear from the above form that the signal is broken into a high-frequency component centered at $2\omega$ and a baseband component near zero frequency. Under the assumption that $\omega$ is large compared with $2\pi B$, where $B$ is the signal bandwidth, the baseband signal can be recovered by applying a lowpass filter. This will remove the $2\omega t$ terms, giving the downconverted signal,
$$\frac{1}{2}\left( \Re\{s(t)\} + i\, \Im\{s(t)\} \right) = \frac{1}{2} s(t). \tag{4.9}$$
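A small numerical sketch of the chain in Equations (4.5), (4.8), and (4.9) follows (in Python; the sample rate, carrier, and baseband tone are assumed bin-aligned values chosen so that the FFT-based brick-wall filter is exact). A complex baseband tone is upconverted to a real passband signal, mixed back by $e^{i\omega t}$, low-pass filtered, and scaled by 2 to recover $s(t)$.

import numpy as np

fs, n = 1.0e6, 4096
df = fs / n                                    # FFT bin spacing; tones chosen bin-aligned
f0, fb = 512 * df, 8 * df                      # carrier and baseband tone (assumed values)
t = np.arange(n) / fs
s = np.exp(1j * 2 * np.pi * fb * t)            # a simple complex baseband signal
passband = np.real(s * np.exp(-1j * 2 * np.pi * f0 * t))   # Equation (4.5)
mixed = passband * np.exp(1j * 2 * np.pi * f0 * t)         # Equation (4.8)
spectrum = np.fft.fft(mixed)
freqs = np.fft.fftfreq(n, d=1.0 / fs)
spectrum[np.abs(freqs) > f0] = 0.0             # low-pass filter removes the 2*omega image
recovered = 2 * np.fft.ifft(spectrum)          # Equation (4.9) yields s(t)/2, so scale by 2
print(np.max(np.abs(recovered - s)))           # near machine precision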

Figure 4.6 Example of an ensemble of points drawn from a complex QPSK modulation with amplitude 1 in the presence of Gaussian noise with an SNR of 4 dB. The points at various shades of gray correspond to four originating points; the axes are the real and imaginary parts of the received signal.

Demodulation covers a range of approaches to working with the received baseband signal. For multiple-antenna receivers, some form of signal combining can be used to improve demodulation performance by increasing signal-to-noise ratio (SNR) or mitigating cochannel interference. Two basic classes of decoders are the hard and the soft decoder. As an example, consider a complex baseband QPSK signal in noise, displayed in Figure 4.6. Because of the noise, the received points lie in a region near but not at the transmitted QPSK symbol. A hard decision on the two modulated bits can be made based on the quadrant in which the signal is observed. These hard decisions can be sent to the decoder. In the example depicted in Figure 4.6, some hard decisions (for example, the dark gray dots in the bottom left quadrant) will be incorrect. In modern receivers, it is more common to estimate the likelihood of the possible bit states and pass these "soft decisions" to the decoder. In general, hard decisions reduce computation complexity, while soft decisions improve performance. Soft decisions also blur the line between demodulation and decoding.

The decoder is intimately tied to the encoder. By using the hard or soft decisions provided by the demodulator, the decoder extracts an estimate of the original data sequence. Strong decoders can compensate for large symbol error rates. Finally, the data sink uses these data. In the case of a mobile phone, the vocoded signal is reproduced. In the case of a wireless network, the data may be repackaged and retransmitted along the next link in the network.

4.3 Cellular networks

The cellular network topology is commonly found in mobile telephone systems. A typical cellular network comprises multiple access points or base stations that are connected to a tethered infrastructure, and mobile units such as telephones that communicate with one or more base stations, although each mobile typically communicates with one base station. This topology enables coverage over large areas; additionally, through base-station control, it ensures that nearby nodes operate with minimal interference to each other. The link carrying data from a mobile unit to a base station is called the uplink, and the converse is called the downlink. For a given cell, the uplink is a many-to-one network, as multiple mobile units typically connect to a single base station, and is called the multiple-access channel in information theory. The downlink between a single base station and multiple mobile units forms a one-to-many network, and is called the broadcast channel in information theory.

Figure 4.7 Cellular model with Poisson cells, with base stations denoted by dots.

Base-station locations are selected on the basis of many factors, including usage patterns, availability of space, and terrain, thereby making detailed mathematical analysis extremely difficult. For analytical tractability, two simple models are often used to describe cellular networks: the Poisson-cell model and the hexagonal-cell model. In both cases, it is typically assumed that mobile units communicate with their nearest (in Euclidean distance) base station although, in practice, mobile units typically communicate with the base station with which they have the highest signal-to-interference-plus-noise ratio (SINR).

In the Poisson-cell model, base stations are assumed to be distributed on a plane according to a Poisson point process (PPP) with a constant average likelihood (or area density) $\rho_b$. The Poisson point process is in a sense the most random distribution of points, as it assumes that every point is located independently from other points and the numbers of points in any two disjoint regions are independent random variables. Figure 4.7 illustrates the Poisson-cell model in which base stations are denoted by circles and the lines divide the plane into cells. A mobile unit that falls within a given cell is typically modeled as communicating with the base station associated with that cell. Alternatively, base stations can be modeled as located on a hexagonal grid, which results in hexagonal cells in a honeycomb pattern as in Figure 4.8. Assuming that the coverage area of a base station is a disk centered on the base

Figure 4.8 Cellular model with hexagonal cells, with base stations denoted by dots.

station, the hexagonal-cell model results in the most efficient coverage of a given two-dimensional area; that is, the fewest number of base stations are required to cover a given area. Note that the Poisson-cell model and the hexagonal-cell model are opposite extremes. In reality, base-station locations are carefully planned, but are subject to geographical and usage variances.

4.3.1 Frequency reuse

In many cellular systems, it is desirable for the base stations to have minimal interaction with one another. To minimize interference from nearby cells (intercell interference), the total communication bandwidth is divided into $\kappa$ orthogonal channels, and nearby cells operate on different channels. Hence, mobile units in nearby cells do not interfere with each other. The number $\kappa$ is called the frequency-reuse factor. Note that some authors define the frequency-reuse factor as $1/\kappa$. For the Poisson-cell model, the minimum value for $\kappa$ so that no two adjacent cells share the same channel is four by the celebrated four-color theorem, which states that the minimum number of colors required to color an arbitrary two-dimensional map such that no two adjacent countries have the same color is four. For regular cells such as in the hexagonal-cell model, a smaller number of channels may be possible. For instance, Figure 4.9 shows a channel assignment in which no two adjacent cells share the same band with $\kappa = 3$. In practice, $\kappa$ may take on values as large as 11, which is done to reduce intercell interference in systems with small cell sizes.

Figure 4.9 Channel assignment for hexagonal cells with reuse factor three.

Typically, within each cell, transmissions are orthogonalized using some multiple-access approach such as time-division multiple access (TDMA) or code-division multiple access (CDMA), which are sometimes combined with space-division multiple access (SDMA) using sectored or adaptive antennas. Hence, we may view cellular systems as comprising large-scale frequency-division multiple access (FDMA) to mitigate intercell interference combined with T/C/SDMA to mitigate intracell interference. It is not uncommon for CDMA systems to employ a frequency-reuse factor of 1. Consequently, they suffer from intercell interference, which reduces the number of simultaneous users that they can support.

4.3.2 Multiple access in cells

Since nearby cells often operate in different frequency bands (large-scale FDMA), it is common to assume that, within each cell, nodes do not experience interference from other cells, although there has been recent work that explicitly models out-of-cell interference; that is described in Chapter 13. Additionally, the uplink from the mobile units to base stations and downlinks from base stations to mobile units typically occupy different frequency bands. Within each cell, time-division multiple access or code-division multiple access is typically used and sometimes combined with sectored antennas that essentially divide users spatially. Time-division multiple access in cellular networks is conceptually straightforward and is used in systems such as Global System

Figure 4.10 Time slots for time-division multiple access: a frame is divided into time slots (Slot 0 through Slot K−1) separated by guard times/intervals.

for Mobiles (GSM). The base station assigns mobile users to noninterfering time slots and provides an accurate common timing reference for all in-cell users.

Basic time-division multiple access
In time-division multiple-access systems, the base station divides time into individual slots and assigns each slot to a single mobile unit for transmission, as illustrated in Figure 4.10. A duration of time called a frame is divided into $K$ time slots with guard intervals between the time slots to handle timing offsets at different mobiles caused by propagation delays and mismatches in timing synchronization.

Basic code-division multiple access
For code-division multiple-access systems, users are separated by encoding their information with waveforms that allow their signals to be detangled at the base station. For instance, on the downlink, the base station typically encodes users' signals by using orthogonal functions such as the Walsh codes illustrated in Figure 4.11. For a simple illustration of orthogonal CDMA, consider a toy example with four mobile units in a given cell where the base station wishes to communicate one data symbol per mobile. For the length-4 Walsh functions shown in Figure 4.11, the base station may assign the $k$th function (or code) to the $k$th user. To communicate the data symbol $x_k$ to the $k$th user, the base station transmits $x_k c_k(t)$. Assuming an ideal channel, the $k$th user receives the following signal:
$$y_k(t) = \sum_{j=1}^{4} x_j\, c_j(t) + n_k(t), \tag{4.10}$$

where $n_k(t)$ is the noise process at the $k$th receiver. The $k$th receiver may recover a noise-corrupted version of $x_k$ by filtering the received signal through a filter matched to its code $c_k(t)$ and sampled at time $t = 0$,
$$r_k(0) = \frac{1}{4} \int_{-\infty}^{\infty} d\tau\, c_k^*(\tau) \left( \sum_{j=1}^{4} x_j\, c_j(\tau) + n_k(\tau) \right) = x_k + n_k, \tag{4.11}$$

Figure 4.11 Walsh functions $c_1(t)$ through $c_4(t)$.

where $n_k = \frac{1}{4} \int_{-\infty}^{\infty} d\tau\, c_k^*(\tau)\, n_k(\tau)$. Thus, the interference is eliminated, as $r_k(0)$ does not contain any contribution intended for the other mobile units. Note that length-$M$ Walsh functions can also be represented as length-$M$ vectors, where $c_k(t)$ for $M = 4$ can be represented by the following vectors:
$$\mathbf{c}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad \mathbf{c}_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}, \quad \mathbf{c}_3 = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}, \quad \mathbf{c}_4 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}. \tag{4.12}$$

With this representation, the operation of matched filtering followed by sampling can be interpreted as a simple inner product. A matched filter, discussed for spatial processing in Section 9.2.1, has a form that has the same structure as that expected of the signal. Here the matched filter is given by the structure of the Walsh function spreading sequence. On the uplink, mobile users do not encode their signals using orthogonal codes because the transmitted signals from the mobile units pass through different channels, thus destroying orthogonality. Additionally, the Doppler spread (discussed in detail in Chapter 10) resulting from relative motions of the different mobile units causes signals received by the base station from different mobiles to no longer be orthogonal. For these reasons, practical CDMA systems utilize random spreading codes on the uplink from the mobiles to base stations.
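A toy numerical version of the orthogonal-CDMA downlink above follows (a Python sketch; the data symbols, noise level, and seed are assumed values, and the length-4 vectors of Equation (4.12) stand in for the continuous-time codes). The matched-filter inner product recovers each user's symbol with the interference removed.

import numpy as np

C = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, -1, -1, 1],
              [1, 1, -1, -1]], dtype=float).T   # columns are the Walsh vectors c_1..c_4
x = np.array([1.0, -1.0, 1.0, 1.0])              # one data symbol per mobile (assumed)
tx = C @ x                                       # base station transmits sum of x_k c_k
rng = np.random.default_rng(8)
rx = tx + 0.1 * rng.normal(size=4)               # ideal channel plus a little noise
x_hat = C.T @ rx / 4                             # matched filter: inner products with c_k
print(x_hat)                                     # close to x; interference eliminated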


Consider the following simple example in which two mobile users (nodes 1 and 2) are in the same cell. Let $h_k(t)$ denote a linear-time-invariant (LTI) channel impulse response between the base station and node $k$. Furthermore, we shall assume reciprocity. On the downlink, let the base station encode the signal intended for the $k$th user by using the function $d_k(t)$. The received signal at the $k$th mobile user is given by
$$y_k(t) = x_1\, h_k * d_1(t) + x_2\, h_k * d_2(t) + n_k(t). \tag{4.13}$$

Suppose that the $k$th mobile user is able to invert the channel perfectly by using an equalizer whose impulse response is $f_k(t)$. The equalized signal for user $k$ is given by
$$\tilde{y}_k(t) = x_1\, d_1(t) + x_2\, d_2(t) + f_k * n_k(t). \tag{4.14}$$

Mobile node 1 can then match filter $\tilde{y}_k(t)$ with $d_1(t)$ and sample at time 0 to remove the interference. Node 2 can do the same with $d_2(t)$. The situation is different on the uplink from the mobile units to the base stations. Let the $k$th user encode its transmit symbol $x_k$ by using the function $c_k(t)$. The received signal at the base station is given by
$$y(t) = x_1\, h_1 * c_1(t) + x_2\, h_2 * c_2(t) + n(t). \tag{4.15}$$

In this case, unless h_1(t) and h_2(t) take very specific forms, it is not possible to simultaneously invert h_1(t) and h_2(t). Any orthogonality associated with c_1(t) and c_2(t) will be lost, making it not very useful to employ orthogonal codes on the uplink. Instead, pseudorandom (but known at the base station) codes are used. If the codes are sufficiently long, they will be nearly orthogonal, as the following analysis illustrates. Consider a collection of M codes of length M where each code vector has M zero-mean i.i.d. entries of variance 1/M. Let the jth entry of c_i be denoted by c_{ij}. We then have

c_k^\dagger c_k = \sum_{j=1}^{M} c_{kj}^* c_{kj} = \frac{1}{M} \sum_{j=1}^{M} |\sqrt{M}\, c_{kj}|^2.  (4.16)

By the law of large numbers, for large M,

c_k^\dagger c_k = \frac{1}{M} \sum_{j=1}^{M} |\sqrt{M}\, c_{kj}|^2 \approx \mathrm{var}\{\sqrt{M}\, c_{kj}\} = 1.  (4.17)-(4.18)

Similarly, when M is large, we have the following for i \neq k:

c_i^\dagger c_k = \frac{1}{M} \sum_{j=1}^{M} M\, c_{ij}^* c_{kj} \approx \langle M\, c_{ij}^* c_{kj} \rangle = M \langle c_{ij}^* \rangle \langle c_{kj} \rangle = 0,  (4.19)-(4.20)

since c_{ij} and c_{kj} are zero-mean and uncorrelated random variables.
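The law-of-large-numbers argument above is easy to check numerically; the following sketch (our own illustration, with an assumed code length of M = 1000 and the ±1/√M entries discussed below) compares the self and cross inner products of two random spreading codes.

```python
import numpy as np

M = 1000
rng = np.random.default_rng(0)
# Two random spreading codes with i.i.d. entries of +/- 1/sqrt(M).
c_i = rng.choice([-1.0, 1.0], size=M) / np.sqrt(M)
c_k = rng.choice([-1.0, 1.0], size=M) / np.sqrt(M)

print(c_k @ c_k)   # exactly 1 for +/- 1/sqrt(M) entries, matching Eq. (4.18)
print(c_i @ c_k)   # near 0 (standard deviation 1/sqrt(M)), matching Eq. (4.20)
```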


Figure 4.12 Illustration of coverage of a nominal sectored cell using directional antennas: three directional antenna beam patterns combine to cover the cell in three sectors out to the cell boundary.

Hence, we conclude that long random codes are nearly orthogonal. Note that, typically, the c_{ij} are assigned values of \pm 1/\sqrt{M} with equal probability, although the above analysis holds for other distributions as well.

Space division multiple access with sectored antennas

Base stations are often equipped with sectored antennas, which are antennas that focus their energy in several specific directions. A cell is typically divided up into sectors with one or more antennas assigned to radiate with a beam pattern that covers that sector with little energy spillage into adjacent sectors. Sectored antennas allow base stations to simultaneously transmit to multiple mobiles in a given cell if they are in different sectors. Figure 4.12 illustrates a nominal cell that is divided into three sectors. The solid lines represent the beam patterns of the three directional antennas used in this sectored antenna. The base station can transmit to three mobile units in different sectors simultaneously with minimal interference, except when the mobiles are near the sector boundaries.

4.4 Ad hoc wireless networks

A simpler network topology is an ad hoc wireless network as illustrated in Figure 4.13. Such a network does not have central controllers (such as base stations in cellular networks), and data links are typically between a single transmitter and receiver pair, although variations that include one-to-many and many-to-one links also exist. Because of the lack of central control, any algorithms used in ad hoc networks have to be distributed in nature. From a practical standpoint, simple algorithms


Figure 4.13 Ad hoc network with transmitters denoted by dots and receivers by circles.

are attractive in such networks because the overhead required to synchronize a spatially distributed ad hoc network is high. Characterizing the capacity of such networks is very difficult because there are many different ways that nodes can cooperate with each other. At the time of writing, even the capacity of networks with just two transmitter-receiver pairs (2 x 2 networks) is still unknown, although it is known in certain regimes, and approximations to the capacity region have been found. Thus, the problem of characterizing the capacity region of such networks is very difficult. Most work on the capacity of ad hoc wireless networks to date has focused on capacity-scaling laws, which are the rates at which the throughput in the network changes with the number of nodes in the network n. Two general models are used. Dense networks are networks whose area is constant, so that the density of nodes increases with n. Extended networks are networks whose area increases with the number of nodes such that the area density of nodes is constant. Note that scaling laws are typically given as "order-of-growth" expressions. The pre-constants of the order-of-growth expressions are often ignored, and, as such, these results are usually only useful when n is very large. Figure 4.14 illustrates some of the key results on the capacity-scaling laws of ad hoc wireless networks. The results illustrated here apply to dense networks with some differences in network models. For instance, the Gupta-Kumar model [132] does not include channel fading, and the Ozgur et al. [240] model uses a specific fading model. For TDMA systems, time is split up among the n nodes and each node is assigned a time slot in a round-robin fashion. Since each node gets only 1/nth of the time to communicate, its throughput decays as 1/n. The Gupta-Kumar work assumes multi-hop communications in which each physical-layer link is


Figure 4.14 Illustration of key results on the capacity-scaling laws of ad hoc wireless networks: per-link capacity versus the number of nodes n, with TDMA scaling as 1/n, multi-hop (Gupta & Kumar, 2000, etc.) scaling as 1/\sqrt{n}, and the Ozgur, Leveque, Tse (2007) hierarchical scheme approximately constant. Note that the per-link rates illustrated in this figure are qualitative.

between a given node and one of its nearby nodes. Since the distances between nodes and their nearest neighbors decay as 1/\sqrt{n}, the signal power received by a node from its nearest neighbors increases as n^{\alpha/2}. However, so does the interference power. Thus, the physical-layer links can maintain approximately constant signal-to-interference ratios and approximately constant data rates with increasing n. Because the number of hops required to traverse a fixed distance increases as \sqrt{n}, the data rate also decays approximately as 1/\sqrt{n}. Gupta and Kumar showed this principle for a specific traffic pattern, and it was extended to random traffic patterns by Franceschetti et al. [101]. The Ozgur et al. result uses a hierarchical cooperation scheme with distributed multiple-input multiple-output (MIMO) links, where collections of nearby nodes act as virtual antenna arrays. More details on these results are given in Chapter 14.
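The 1/\sqrt{n} decay of nearest-neighbor distances that drives this argument can be verified with a short simulation; the sketch below (our own illustration, with nodes placed uniformly in a unit square as a stand-in for a dense network) shows the mean nearest-neighbor distance times \sqrt{n} staying approximately constant.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in [100, 400, 1600]:
    pts = rng.random((n, 2))                     # dense network: unit area
    d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                 # ignore self-distances
    d_nn = np.sqrt(d2.min(axis=1)).mean()        # mean nearest-neighbor distance
    print(n, d_nn, d_nn * np.sqrt(n))            # the product stays near 0.5
```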

4.4.1 Achievable data rates in ad hoc wireless networks

Single-antenna systems

A different approach used in analyzing ad hoc wireless networks is to find the data rates achievable for a given outage probability in networks in which nodes utilize specific communication schemes. The primary tool for such analyses has been stochastic geometry, where nodes are typically modeled as distributed on a plane according to a homogeneous Poisson point process that places nodes randomly with uniform spatial probability density. For narrowband systems with nodes distributed according to a Poisson point process and transmitting with i.i.d. power levels, the aggregate interference power seen at a typical node in the network can be characterized by its characteristic function, although the CDF and PDF of the interference power are not known in closed form. For the case of an ad hoc wireless network with \alpha > 2 and the density of transmitting nodes equaling \rho, the characteristic function of the interference


power I with transform variable s is given in Reference [133]:

\langle e^{-sI} \rangle = \exp\left( -\rho \pi \langle (hP)^{2/\alpha} \rangle\, \Gamma(1 - 2/\alpha)\, s^{2/\alpha} \right),  (4.21)

where \langle (hP)^{2/\alpha} \rangle is the expected value of the product of the channel fading coefficients h and the transmit powers P of the transmitting nodes. The characteristic function is particularly useful in computing the probability that the signal-to-interference ratio (SIR) of a representative link exceeds some threshold in Rayleigh fading channels. This property is the result of the fact that the received signal power conditioned on the transmit power and link length in Rayleigh fading channels is distributed as a unit-mean exponential random variable (see Section 3.1.10) whose CDF P_{Exp}(x) is

P_{Exp}(x) = \begin{cases} 1 - e^{-x}, & x \geq 0, \\ 0, & x < 0. \end{cases}  (4.22)

For x \geq 0, the complementary cumulative distribution function (CCDF) is 1 - P_{Exp}(x) = e^{-x}. With S and I representing the received signal and interference powers respectively, the probability that the SIR is greater than some threshold \tau is given by

\Pr(S/I > \tau \,|\, I) = \Pr(S > \tau I \,|\, I)  (4.23)
 = \exp(-\tau I).  (4.24)

Writing the probability density function of the interference as p_I and integrating out with respect to the interference power I yields

\Pr(S/I > \tau) = \int dI\, \Pr(S/I > \tau \,|\, I)\, p_I(I)  (4.25)
 = \int dI\, e^{-\tau I} p_I(I)  (4.26)
 = \langle e^{-\tau I} \rangle  (4.27)
 = \exp\left( -\rho \pi \langle (hP)^{2/\alpha} \rangle\, \Gamma(1 - 2/\alpha)\, \tau^{2/\alpha} \right).  (4.28)

Using tools from stochastic geometry, Weber et al. [340] introduced the idea of transmission capacity, which is the product of the data rate and the maximum density of nodes in an ad hoc network that achieves a particular SINR for a given outage probability, weighted by the probability. This quantity enables a more direct analysis of the achievable data rates in such networks. The authors in Reference [340] use transmission capacity to compare direct-sequence CDMA that uses matched-filter receivers with frequency-hopping CDMA in networks with spatially distributed nodes. They find that the order-of-growth of the transmission capacity with spreading length of frequency-hopping CDMA systems is larger than that of direct-sequence CDMA systems.
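Equation (4.28) is straightforward to check against a direct simulation; the sketch below (our own illustration, assuming unit transmit powers and Rayleigh fading so that \langle (hP)^{2/\alpha} \rangle = \Gamma(1 + 2/\alpha), and a unit-mean exponential signal power S) compares the closed form to a Monte Carlo estimate over Poisson-distributed interferers on a large disk.

```python
import numpy as np
from math import exp, gamma, pi

rng = np.random.default_rng(2)
rho, alpha, tau, R = 0.1, 4.0, 1.0, 50.0  # density, path-loss exponent, SIR threshold, disk radius

# Closed form, Eq. (4.28): unit powers and Rayleigh fading give <(hP)^(2/a)> = Gamma(1 + 2/a).
p_theory = exp(-rho * pi * gamma(1 + 2 / alpha) * gamma(1 - 2 / alpha) * tau ** (2 / alpha))

trials, success = 20000, 0
for _ in range(trials):
    n_int = rng.poisson(rho * pi * R**2)      # Poisson number of interferers on the disk
    r = R * np.sqrt(rng.random(n_int))        # uniform placement over the disk
    I = np.sum(rng.exponential(size=n_int) * r ** (-alpha))
    success += rng.exponential() > tau * I    # unit-mean exponential signal power S
print(p_theory, success / trials)             # the two probabilities agree closely
```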


More formally, consider a wireless network with nodes distributed according to a Poisson point process with intensity (or user density) \rho. Suppose that links in the network have a target data rate of r_t, which is achievable if the SINR on each link exceeds a threshold \gamma. Then define the contention density \rho_\epsilon as the maximum density of nodes such that the probability that the SINR is less than or equal to the threshold \gamma is less than or equal to \epsilon. In other words,

\rho_\epsilon = \max \rho \ \ \text{such that} \ \ \Pr(\mathrm{SINR} \leq \gamma) \leq \epsilon.  (4.29)

The transmission capacity c_T is then defined as

c_T = \rho_\epsilon (1 - \epsilon)\, r_t.  (4.30)

Note that the transmission capacity c_T is the product of the maximum density of nodes that achieves the target SINR, the probability of a link achieving the target SINR, and the communication rate that is supportable given that SINR.

Multiple-antenna systems

Antenna arrays in ad hoc wireless networks are useful both in terms of spatial multiplexing (that is, enabling a single user to transmit multiple data streams) as well as for SDMA. By nulling out the interference contribution from nearby nodes, it is possible to get significant increases in SINR and hence data rates. With N antennas per receiver, it is possible to null out the interference from N - 1 sources in ideal situations; however, this interference mitigation may come at the expense of some loss of signal SNR at the output of the mitigating receiver. Alternatively, it is also possible to increase signal power by a factor of N relative to noise by coherently adding signals from N antennas coming from a target signal source. The following simple heuristic argument shows that it is possible to achieve signal-to-interference-plus-noise ratio scaling on the order of (N/\rho)^{\alpha/2} in ad hoc wireless networks with a power-law path-loss model. Consider a receiver with N antennas at the center of a circular network of interferer density \rho, as illustrated by the square in Figure 4.15. Suppose this receiver uses a fraction \zeta of its degrees of freedom to null the \zeta N - 1 interferers closest to it. When N is large, these nulled interferers occupy a circle of radius approximately equal to

r_a = \sqrt{(\zeta N - 1)/(\pi \rho)} \approx \sqrt{\zeta N/(\pi \rho)}.  (4.31)

Assuming that the interferers are distributed continuously from r_a to infinity and integrating their interference contribution, we find that the residual interference grows as r_a^{2-\alpha} \sim (N/\rho)^{1-\alpha/2}. Suppose that the remaining (1 - \zeta)N degrees of freedom are used by the receiver to increase the SINR relative to thermal noise and residual interference by a factor N. Then, the SINR grows as (N/\rho)^{\alpha/2}. In networks with \alpha > 2, which are common, the SINR growth with the number of antennas is greater than linear, which would be the case for simple coherent combination. Additionally, this heuristic analysis indicates that it may be possible to


Figure 4.15 Illustration of the interference contribution from a planar network with nulling of nearby interference: a nulled region around the receiver, the closest unnulled interferer, and a continuous distribution of unnulled interferers beyond.

increase the density of users and maintain the same SINR by linearly increasing the number of antennas per receiver with the node density. This has been shown independently for different sets of assumptions in References [56], [123], and [164].
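The heuristic above reduces to a few lines of arithmetic; the sketch below (our own illustration in the interference-limited regime, with assumed values of \alpha, \rho, and \zeta, using the continuum approximation for the unnulled interferers) shows the SINR tracking the (N/\rho)^{\alpha/2} growth.

```python
import numpy as np

alpha, rho, zeta = 3.5, 1.0, 0.5   # assumed path-loss exponent, density, nulling fraction
for N in [8, 32, 128, 512]:
    r_a = np.sqrt(zeta * N / (np.pi * rho))                     # Eq. (4.31): radius of nulled disk
    I_res = 2 * np.pi * rho * r_a ** (2 - alpha) / (alpha - 2)  # unnulled continuum interference
    sinr = (1 - zeta) * N / I_res                               # remaining DoF boost the signal
    print(N, sinr, sinr / (N / rho) ** (alpha / 2))             # last ratio is ~ constant
```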

4.5 Sampled signals

There are some subtleties in considering sampled signals and channels. Some of these effects with regard to channels are explored in more detail in Section 10.1. From the basic physics, in some sense, all electromagnetic signals are quantized because the signal is mediated by photons that have quantized energies. However, at frequencies of interest for most wireless communications, the energy of a single photon is so small that it is not useful, although it is an important consideration for optical communications. The number of photons received at an antenna is so large that the signal is statistically modeled well as continuous. Somewhat amusingly, because essentially all modern communications are digital, the physical signals are quantized in energy and time, although this quantization has little to do with the physical photon quantization. In general in this text, it is assumed that any subtleties associated with sampling have been addressed and we work with the complex baseband signal. However, we will address some of the sampling issues here. The fundamental issue is that sampled signals need to be sampled at a rate that satisfies the Nyquist criterion (while "Nyquist" is widely used to indicate this criterion, work by John Whittaker, Vladimir Kotelnikov, and Claude Shannon could justify use of their names, and the criterion is sometimes denoted the WKS criterion [25]); that is, for a band-limited signal of width


B (including both positive and negative frequencies), the complex sampling rate f_s must be greater than the bandwidth B. (Note that we drop the common factor of two here because we are assuming complex samples, so that B includes both the positive and negative frequencies.) Translating between the continuous and the sampled domains typically requires an anti-aliasing filter or pulse-shaping filter. A common example of the pulse-shaping filter is the raised-cosine filter [255]. This filtering is required because a sampled signal has spectral images at multiples of the inverse of the sample period. For a continuous signal s(t) \in \mathbb{C} as a function of time t, the spectrum S(f) as a function of frequency f is given by

S(f) = \int dt\, e^{-i 2\pi t f} s(t).  (4.32)

The spectrum of the sampled signal S_s(f) with sample spacing T is given by

S_s(f) = \int dt\, e^{-i 2\pi t f} s(t)\, T \sum_m \delta(t - mT) = \sum_m e^{-i 2\pi m T f}\, T\, s(mT) = \sum_m S(f - m/T),  (4.33)

where \delta(\cdot) is the delta function, and the scaling by T is for convenience. The evaluated form in Equation (4.33) is the discrete-time Fourier transform of the sampled form of the signal. Two issues can be observed from Equation (4.33). First, for a received signal, if the spectral width of the signal S(f) is greater than 1/T, the various images of the continuous signal would overlap, resulting in an inaccurate sampled image. This phenomenon is referred to as aliasing. Second, if one were to transmit this signal, it would occupy infinite bandwidth, which is undesirable and physically unrealizable. A pulse-shaping filter is typically applied to reduce the spectral width of the signal. To get perfect reconstruction of a band-limited signal (of bandwidth B \leq 1/T), one can theoretically employ a perfect spectral "brick-wall" filter \theta(fT) that is spectrally flat within -1/(2T) to 1/(2T) and zero everywhere else. The impulse response of this filter is given by the sinc function, so that the reconstructed signal is given by

s(t) = \sum_m s(mT)\, \mathrm{sinc}\!\left( \frac{t - mT}{T} \right).  (4.34)

Unfortunately, this approach is not achievable because the sinc function is infinite in extent; however, reconstruction filters that require a small number of samples can be designed that still work well.
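A short numeric sketch of Equation (4.34) (our own illustration; the test signal and the truncation of the sinc sum to a finite number of samples are arbitrary assumptions) shows the reconstruction error staying small for a signal sampled above its Nyquist rate.

```python
import numpy as np

T = 1.0                                    # sample spacing, so B = 1/T
def s(t):                                  # a test signal band-limited well below 1/(2T)
    return np.cos(2 * np.pi * 0.11 * t) + 0.5 * np.sin(2 * np.pi * 0.31 * t)

samples = np.arange(-40, 41) * T           # s(mT) on a truncated set of sample instants
t = np.linspace(-5, 5, 1001)               # evaluate away from the truncation edges

# Equation (4.34); np.sinc(x) = sin(pi x) / (pi x)
s_hat = sum(s(mT) * np.sinc((t - mT) / T) for mT in samples)
print(np.max(np.abs(s_hat - s(t))))        # small residual from truncating the sum
```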

Problems

4.1 In considering various constellations in the presence of additive Gaussian noise, the largest probability of confusing one point with another point on


Figure 4.16 QPSK bit assignments: the constellation points 1, i, -1, and -i are assigned the bit pairs (00), (01), (10), and (11), respectively.

the constellation is driven by the distance between the closest points. Compared to a BPSK constellation, find the relative average power required to hold this minimum amplitude distance equal for the following constellations: (a) QPSK, (b) 8-PSK, (c) 16-QAM, (d) 64-QAM, (e) 256-QAM.

4.2 Under the assumption of a 30 °C temperature measured at the receiver, find the observed noise power for the following parameters: (a) noise figure of 2 dB, bandwidth of 10 kHz; (b) noise figure of 6 dB, bandwidth of 10 MHz.

4.3 In considering a two-stage superheterodyne downconversion to complex baseband for a system with a carrier frequency of 1 GHz and a bandwidth of 40 MHz, find IF frequency ranges such that undesirable images are suppressed by more than the square of the relative filter sidelobe levels under the assumption of the same filter being used at each stage.

4.4 By assuming that nodes in a wireless network communicate directly with their nearest neighbor, evaluate the capacity-scaling laws for networks that are constrained to (a) a linear geometry, (b) a three-dimensional geometry.

4.5 Consider the constellation diagram for a QPSK system shown in Figure 4.16 with the constellation points assigned to 2-bit sequences. If possible, find an alternative assignment of bits that leads to a lower average probability of bit error.


4.6 Consider the following equation:

Z = S + N.  (4.35)

Let N be distributed according to a zero-mean, circularly symmetric, Gaussian random variable with variance \frac{1}{2}\sigma^2 per real dimension that is independent of S. The random variable Z is used to estimate S such that the probability of error in making the estimation is minimized.
(a) Suppose that S = \pm V with equal probability. Find the probability of error in terms of the Q function. Recall that the Q function is the integral of the tail of a standard Gaussian probability density function, i.e.,

Q(t) = \int_t^{\infty} dx\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.  (4.36)

(b) Suppose that S = \pm U \pm U i. Find the probability of error.
(c) How should U and V be related such that the probabilities of error in the previous two parts are equal? (This question really is about the SNR requirement for a QPSK system and a BPSK system to have the same probability of symbol error.)

M 

sj cj + n ,

(4.37)

j=1

where the si terms take on values of ±1 with equal probability and n contains independent, identically distributed Gaussian random variables with zero mean and variance 0.01. Let an estimate of s1 be given by sˆ1 = sign c†1 y .

(a) Suppose that c†j c1 = 0 for j = 1. Find the probability that sˆ1 = s1 . (b) Suppose that√ the entries of the vectors cj are i.i.d. random variables taking values of ±1/ M with equal probability. Find the probability that aˆ1 = a1 . 4.8 Consider a network comprising interferers distributed according to a Poisson point process with density of interferers ρ and subject to the standard inversepower-law path-loss model with path-loss exponent α > 2. Consider a link of length r in this network between a receiver that is not part of the process and an additional transmitter at a distance r away. Assuming that the signals are subject to Nakagami fading with shape parameter μ equaling a positive integer, find the cumulative distribution function (CDF) of the signal-to-interference ratio of the link in question. Hint: the upper incomplete gamma function Γ(s, x) for positive integers s can be expressed as follows: Γ(s, x) = (s − 1)! e−x This problem was inspired by Reference [148].

s−1 k  x

k=0

k!

.

(4.38)

5

Simple channels

For most wireless communications, channels (what happens between the transmitter and receiver) are complicated things. For the sake of introduction, in this section we consider a single transmit antenna and receive antenna, residing in a universe without scatterers or blockage.

5.1

Antennas The study and design of antennas is a rich field [15]. Here, we focus on a small set of essential features. The first important concept is that antennas do not radiate power uniformly in direction or in polarization. The radiated power as a function of direction is denoted the radiation pattern. If the antenna is small compared with the wavelength (for example, if the antenna fits easily within radius of a 1/8 wavelength), then the shape of the radiation pattern is relatively smooth. However, if the antenna is large compared with the wavelength, then the radiation pattern can be complicated. Antenna patterns are often displayed in terms of decibels relative to a notional isotropic antenna (denoted dBi). The notional isotropic antenna has the same gain over all 4π of solid angle.1 Gain is an indication of directional preference in the transmission and reception of power. The axisymmetric radiation pattern for an electrically small (small compared with a wavelength) dipole antenna is displayed in Figure 5.1. In the standard spherical coordinates of r, θ, φ, which correspond to the radial distance, the polar angle, and the azimuthal angle, respectively, the far-field electric field is limited to components along the direction of θ, denoted eθ . For electrically small dipoles, the radiation pattern is proportional to [290, 154] eθ 2 ∝ 1

1 sin2 θ . r2

(5.1)

Consider an object in a three-dimensional space projected onto a unit, from a point at the origin, the solid angle encodes the fraction is the area of the unit sphere that is occupied on the unit sphere. Solid angle is typically normalized such that 4π covers the entire viewable angular area.

142

Simple channels

One can find this relationship by noting that the radiation pattern must satisfy the differential equation

  1 ∂2 2 ∇ − 2 2 e(r, t) = 0 , c ∂t

(5.2)

to satisfy Maxwell’s equations, where e(r, t) is the electric field vector as a function of time t and position r, and is determined by radial distance, polar angle, and azimuthal angle indicated by r, θ, φ. The speed of light is indicated by c, and ∇2 is the Laplacian operator discussed in Section 2.7.1. Solutions to this equation are proportional to gradients of spherical harmonics [11] denoted Yl,m (θ, φ), where l indicates the degree of the harmonic and m is the order of the harmonic. By observing various symmetries of the antenna and of the spherical harmonic function, one can determine the contributing degree and order. Here these observations are made without proof, and the interested reader can find more thorough discussions of spherical harmonics. These symmetries are presented in various texts discussing spherical harmonics [11]. Based on the axial symmetry of the antenna, the radiated power must be axisymmetric. Solutions with m = 0 are a function of φ and therefore have azimuthal structure (that is, they are a function of φ); consequently, order zero m = 0 is required. Furthermore, the value of the θ-direction component of {e}θ is the same under parity inversion such that the direction vector r → −r. A result of this symmetry is that there is a symmetry in the value eθ above and below the θ = π/2 plane. Spherical harmonics that observe this symmetry require that only odd values of degree l are allowed. Here it is assumed that the antenna is small compared with a wavelength. Given the short length of the antenna compared to a wavelength, it is difficult to induce complicated radial functions of current flow [15]. The coupling of current to induce a particular spherical harmonic is proportional to the spherical Bessel function jl (k d) [11], where k is the wavenumber and d is the distance along the antenna. The l = 1 spherical Bessel function moves the quickest from zero as a function of k d, and thus corresponds to the solution with the largest current and thus radiated power. The lowest-order spherical harmonic satisfying all the symmetries and radiating the most power for an electrically small dipole is Yl= 1,m = 0 (θ, φ). Consequently, the electric field that is proportional to the gradient of this function is proportional to sin(θ). The gain is therefore given by Equation (5.1). The notional isotropic antenna would radiate equal power in all directions. Consequently, if the isotropic antenna and the small dipole antenna radiated the same total power, such that the integrals over a surface at some distance r are the same, then the peak gain Gdipole , which is at the horizon (θ = π/2), is

5.2 Line-of-sight attenuation

Dipole e

143

2





Figure 5.1 Radiation pattern of a small (length ≪ λ) dipole antenna.

∼1.76 dBi,

sin2 (π/2) Gdipole = % 2 & sin (θ) =

=

sin2 (π/2)

dφ dθ sin θ sin 2 (θ ) dφ dθ sin θ

3 ≈ 1.76 dBi . 2

(5.3)

Typically, the peak gain increases as the size of the antenna increases in units of wavelength. A second important concept is that the magnitude of the radiation pattern is the same for transmitting and for receiving. This property is because Maxwell’s equations are symmetric under time reversal [290, 154]. Consequently, a transmit and receive pair of antennas observe reciprocity; that is, they will observe the same channel and antenna gain when the link direction is reversed. If an antenna tends to radiate power in a particular direction, then it will prefer receiving power from that same direction. Signals being received from other directions will have some relative attenuation.

5.2

Line-of-sight attenuation The attenuation between two antennas in a line-of-sight environment in the far field is proportional to the gain of the transmitter times the effective area of the receiver divided by the distance between antennas squared. The motivation for this relationship is given by the effect of the density of the power spreading on the surface of a sphere. For a given power transmitted in some direction (proportional to antenna gain), the power received is then given by the product of the flux of power in that direction times the effective area of the receive antenna, as seen in Figure 5.2.

144

Simple channels

Surface 2 Area ~ r

Effective Area

Gain

Figure 5.2 Spherical propagation about transmitter.

5.2.1

Gain versus effective area Consider a signal propagating in both directions between a pair of line-of-sight antennas (antenna 1 and antenna 2). The transmitted and received powers are denoted Pt,1 and Pt,2 , and Pr,1 and Pr,2 , respectively. The transmit gains and receive effective areas are denoted Gt,1 and Gt,2 , and Aeff ,1 and Aeff ,2 , respectively. The gain is an indication of how much power is radiated or received in a particular direction. The concept of effective area is a large antenna approximation. If one considered a very large antenna that was many wavelengths across in both dimensions, then its ability to collect photons is essentially given by the physical cross section upon which the photons impinge. As the antenna grows smaller, the correspondence between physical area and effective area becomes less clear. In the limiting case of a thin wire dipole antenna of finite length but very small width, the effective area has little to do with the physical area. Nonetheless, it is still a useful concept. The relationships between the transmit and receiver powers are then given by Pr,2 = α Aeff ,2 Gt,1 Pt,1 Pr,1 = α Aeff ,1 Gt,2 Pt,2 ,

(5.4)

where α is some constant that incorporates effects such as attenuation due to propagation. From reciprocity, the attenuation for the link in either direction must be the same. Consequently, the ratios of the transmit to receive powers must be the same, Pr,2 /Pt,1 = Pr,1 /Pt,2 , or equivalently, Gt,2 Aeff ,2 = . Aeff ,1 Gt,1

(5.5)

Thus, the effective area of an antenna in some direction is proportional to the gain of the antenna in that direction. The next question is to determine the

5.2 Line-of-sight attenuation

145

Antenna

Load

Figure 5.3 Antenna in thermal equilibrium.

constant of proportionality between the effective area of an antenna Aeff and gain of that antenna G. To determine the constant of proportionality between the gain and effective area, a thermodynamic argument is invoked [236]. The effective area is defined by the power received by the antenna divided by the power spectral-density flux impinging upon the antenna, Aeff =

P , S

(5.6)

where P is the received power spectral density (in units of W/Hz) and S is the power spectral-density flux (in units of W/(m2 · Hz) for example) under the assumption that the polarization and direction maximize the received power. An antenna is in thermal equilibrium with its background in a chamber, as seen in Figure 5.3. For radio frequencies, the total power spectral-density flux of blackbody radiation Φf (in units of W/m2 /Hz, for example) can be approximated by [264] 2 f 2 kB T , (5.7) c2 which is known as the Rayleigh–Jeans law, where f is the frequency, c is the speed of light, kB is the Boltzmann constant and T is the absolute temperature. For the system to be in equilibrium, the incoming power and the outgoing power must be equal. Because power is being received at the antenna from all directions, P is the sum of power received from all directions, and the effective area and blackbody power spectral-density flux are denoted Aeff (θ, φ) and S(θ, φ). The differential of the solid angle is indicated by dΩ = dφ dθ sin(θ). The received power spectral density is given by  P = dΩ Aeff (θ, φ) S(θ, φ)  2π  π  2π  π Φf Φf ·1+ · 0, dφ dθ Aeff (θ, φ) dφ dθ sin(θ) Aeff (θ, φ) = 2 2 0 0 0 0 (5.8) Φf =

146

Simple channels

where the first term on the right-hand side of Equation (5.8) includes the radiation that matches the polarization of the antenna and the second term includes the radiation that is orthogonal to the polarization of the antenna. Consequently, the received power spectral density from the blackbody radiation is given by   π 2f 2 kB T 2π P = dφ dθ Aeff (θ, φ) 2 c2 0 0 f 2 kB T = 4π Aeff (θ, φ) . (5.9) c2 As discussed in Section 4.2.2, at lower frequencies, the thermal power spectral density due to the resistor and radiated by the antenna is given by P = kB T .

(5.10)

By equating the incoming and outgoing power, the average effective area is found to be Aeff (θ, φ) =

λ2 , 4π

(5.11)

by noting that the wavelength is given by λ = c/f . By construction, the average gain is one: G(θ, φ) = 1 .

(5.12)

Because the effective area and gain are proportional and their averages are determined here, they are related by Aeff (θ, φ) = G(θ, φ) , λ2 /4π

(5.13)

under the assumption that their polarizations are matched. When the direction parameters are dropped, maximum gain and effective area are typically assumed, so that the gain and effective area are related by G=

4π Aeff . λ2

(5.14)

As mentioned previously, the effective area of an antenna is somewhat different from the physical area. For moderately sized antennas (a few wavelengths by a few wavelengths), effective area is typically smaller than the physical area. Effective area is difficult to interpret for wire antennas and electrically small antennas, although if one imposes the constraint that no physical dimension can be smaller than some reasonable fraction of a wavelength (something on the order of 1/3), then it is at least somewhat consistent with the effective area. While it is generally assumed that the gain indicates the peak gain from the antenna exclusively, sometimes the inefficiencies due to impedance mismatches between the amplifier or receiver and the antenna are included with the gain. In this case, directionality is the gain of the ideal antenna, and the gain is the product of the directionality and the adverse effects of inefficiencies [15].

5.2 Line-of-sight attenuation

147

Figure 5.4 Notional radiation beamwidth for antenna with lh × lw effective area.

5.2.2

Beamwidth The beam shape is dependent upon the details of the antenna shape, but the shape of the main lobe can be approximated by considering a rectangular antenna with area Aeff ≈ A = lw lh , where lw and lh are the width and height of the antenna, as seen in Figure 5.4 The beamwidths in each direction are approximately λ/lw and λ/lh . For a square antenna l = lw = lh of gain G, the beamwidth, Δθ, in each direction is approximately , 4π λ . (5.15) Δθ ≈ = l G There are advantages and disadvantages to having higher-gain antennas. In a line-of-sight propagation environment, the increased gain provides more signal power at the receiver. However, this comes at the cost of requiring greater accuracy in pointing the antennas. Furthermore, in non-line-of-sight environments with significant multipath, there is no one good direction for collecting energy from the impinging wavefronts. Collecting energy in complicated scattering environments is one of the advantages of using adaptive antenna arrays versus high-gain antennas. As an example, if we imagine a square 20 dBi antenna, the effective area is given by A=

G 2 λ ≈ 8 λ2 4π

and the approximate beamwidth is given by , 4π ≈ 0.35 rad . Δθ ≈ G

(5.16)

(5.17)

In terrestrial communications, it is not uncommon for communication links to be blocked by walls, buildings, or foliage. Even when the transmitter and receiver have a direct path, the propagation is often complicated by the scatterers in the environment. Nonetheless, line-of-sight propagation is useful as a reference propagation loss.

148

Simple channels

For a line-of-sight environment, the received power Pr is related to the transmit power Pt by the free-space attenuation, which is given by Pr = a 2 Pt Gt Aeff a 2 = 4π r2 Gt Gr λ2 = , (4π r)2

(5.18)

where a is the complex attenuation of the signal, Gt and Gr are the gains of the transmit and receive antennas toward each other, and Aeff is the effective area of the receive antenna. The distance from the transmitter to the receiver is r. Geosynchronous orbit link It is amusing to consider the link from a satellite in geosynchronous orbit to a ground-based transmitter. Geosynchronous orbits match the orbital period to the earth’s rotation. Geosynchronous orbits are convenient for communication satellites because the satellites appear to be approximately stationary as the earth rotates. Consequently, the pointing direction of the ground antenna can be fixed. We can find the channel attenuation between a geosynchronous satellite and a ground-based transmitter for a few typical parameters. The altitude r of a geosynchronous satellite is about 36 000 km, as seen in Figure 5.5. This is a relatively long link. One of the digital TV bands occupies spectrum a little above 10 GHz. Here we will pick 12 GHz or a wavelength λ of about 25 mm. At this frequency, relatively small antennas can have fairly high gain. Compared to low carrier frequency, it is relatively easy to achieve 30 dBi gain Gr . Satellites’ antennas often have even higher gains. One important limitation is the broadcast coverage area. If the gain is too high, the beam might not illuminate the required area on the ground. If one desired a beam using a square antenna that covered the continental USA, the peak gain Gt is about 28 dBi, although smarter beam shaping could be used to increase the gain. In practice, a satellite could cover the same region by using a larger antenna and multiple feeds. Each feed illuminates the main antenna from a slightly different angle. Consequently, the antenna illuminates a different region from each feed. With the nominal parameters given in the previous paragraph, the attenuation can be calculated. The attenuation through the channel is given by a 2 =

Gt Gr λ2 (4π r)2

= 35 + 30 + 10 log10 {λ2 } − 10 log10 {(4π)2 } − 10 log10 {r2 }

≈ 35 + 30 + (−32) − 22 − 151 = −140

[dB] ,

[dB] (5.19)

where the arithmetic was performed on a decibel scale. The attenuation in a real environment will be slightly worse because of atmospheric and other nonideal effects.

5.3 Channel capacity

149

r ~ 36 000 km Gr

Gt

Figure 5.5 Notional geometry of satellite broadcast received on earth.

5.3

Channel capacity The single-input single-output (SISO) channel link capacity was described by Claude Shannon [284, 68]. The capacity provided a theoretical bound on the data rate that can be successfully transmitted and received through a noisy channel with arbitrarily low error rate. Before this result, it was unclear if essentially error-free performance was achievable given some positive channel noise. The bound does not provide guidance on how to design a practical link. However, the notion that the bound is achievable has driven communication systems ever since. The bound is based on a few simple ideas. The transmit signal as a function of time is given by s(t). The signal at the receiver has some average power. The channel, represented by the complex attenuation a, is known. Added to the communication signal is additive noise as a function of time n(t). The received signal as a function of time z(t) is given by z(t) = a s(t) + n(t) .

(5.20)

Next, two approaches for motivating channel capacity are discussed. The first is geometric. The second is based on the concept of mutual information. For a more thorough discussion see [68, 314], and the original discussion [284].

5.3.1

Geometric interpretation A heuristic approach to motivate channel capacity can be constructed geometrically. Consider a sequence of ns transmitted symbols, s(t0 ), s(t1 ), . . . s(tn s −1 ), and corresponding received signals, z(t0 ), z(t1 ), . . . z(tn s −1 ). This sequence is used to construct a codebook of allowed transmitted symbol sequences such that there is little probability of confusing one entry in the codebook from another entry even in the presence of the additive complex Gaussian noise. The question is, how many different distinguishable sequences can exist for a given number of symbols, signal power, and noise power? As a simple example, consider a system in which there are only two transmitted symbols, s(t0 ) = ±2, and s(t1 ) = ±2. Furthermore, consider a very strange non-Gaussian noise structure such that the values {−3/2, −1/2, 1/2, 3/2} can be

150

Simple channels

added to s(t0 ), and s(t1 ) with equal probability. The complete list of 64 possible channel output states is displayed in Figure 5.6. The four values of the s(t0 ), s(t1 ) pair are represented by the dots, and all of the potential output states of z(t0 ), z(t1 ) are represented by the intersections of the grid lines. • The total number of states is 64. • The number of noise states is 16. • The number of information states is 4 by construction. Because of the careful construction of the noise, all four of these states can be recovered. Consequently, the system reliably can communicate 2 bits. The potential number of useful information bits can also be seen by considering the entropy of the received signal and the noise. By analogy to statistical mechanics, the entropy with equally likely states is given by the logarithm of the number of potential states. Thus, the entropy associated with z(t0 ), z(t1 ), denoted here as Hz , is given by Hz = log2 (64) = 6 ,

(5.21)

and the entropy associated with noise, denoted here as Hn , is given by Hn = log2 (16) = 4 .

(5.22)

The total number of information bits or the capacity C of this channel is given by C = Hz − Hn

= 6 − 4 = 2.

(5.23)

If one thinks of the sequence of transmitted symbols as occupying a complex ns -dimensional space, the maximum number of codewords (useful transmitted symbol sequences) is given by the ratio of the number of total states divided by the number states occupied by the noise. Finding the number of codewords in this manner assumes a best case packing of noise and symbol spacing (as seen in Figure 5.6). For continuous distributions, a state is a notional concept, convenient for developing a geometric understanding. A state is occupied if a draw from a random distribution lies within some differential hypervolume around a given point. For states to be occupied with equal probability, the differential hypervolume must be proportional to the inverse of the probability density at the local value. The total number of receive states is a function of the received signal power and the noise power. One can think of a particular codeword as a vector in the ns -dimensional space and the noise as a fuzzy ball around the endpoint of the vector. We assume that the fuzzy ball is given by a Gaussian distribution, whose selection is discussed in greater precision in the next section. For any finite value of ns complex symbols, the Gaussian fuzzy ball has an arbitrarily large extent. However, as ns → ∞, the sphere associated with the Gaussian distribution

5.3 Channel capacity

151

z(t1) 4

2

-4

-2

2

4

z(t0)

-2

-4 Figure 5.6 Simple entropy example. There are 16 possible noise states and 64 possible

received states.

hardens. The notion of hardening indicates that the fluctuation about a central value decreases as the number of symbols increases. Consequently, essentially all draws from the Gaussian distribution will be arbitrarily close to the surface of the sphere. As the number of complex symbols ns increases, the probability of large fluctuations in noise distance decreases. A second implication of considering the large dimensional limit is that the ns -dimensional hypervolume is dominated by the contributions at the surface, that is the vectors associated with codewords are nearly always close to the surface. Thus, by assuming that the distribution of codewords is statistically uniform across the surface, the number of states is proportional to the volume. The capacity is given by the ratio of the volume of hyperspheres in which signal-plus-noise vectors occupy to the volume of hyperspheres in which noise vectors occupy. It is shown later in this section that this packing is achievable. A set of code words generated by drawing from a complex Gaussian distribution satisfies the requirements of sphere hardening and uniformity. In Section 5.3.2, the optimality of the Gaussian distribution is discussed. Sphere hardening The magnitude-squared norm (here denoted x) of an n-dimensional real noise vector n with elements sampled from a real zero-mean unit-variance complex

152

Simple channels

Gaussian distribution is given by x = n 2 =

 m

{n}m 2 .

(5.24)

The probability distribution for the random variable x (magnitude-squared norm of n) is given by a complex χ2 distribution fχC2 (x; n, σ 2 ), discussed in Section 3.1.11, with a variance of σ 2 . The mean of the magnitude-squared norm x for the distribution fχC2 (x; n, σ 2 ) is given by x =



0



dx x fχC2 (x; n, σ 2 )

= n σ2 .

(5.25)

The variance of the magnitude-squared norm x is given by  ∞ % 2& 2 x − x = dx x2 fχC (x; n, σ 2 ) − n2 σ 4 0

= n(n + 1) σ 4 − n2 σ 4 = n σ 4 .

(5.26)

For a fuzzy Gaussian ball, the square of the radius is given by x and the standard deviation of the fluctuation about the mean of x is given by the square root of the variance of x. The fuzziness ratio indicated by the standard deviation to the mean square of the radius is given by √ 2 nσ → 0 ; as n becomes large. (5.27) n σ2 Because the ratio goes to zero, it is said that the noise sphere has hardened. This effect can be observed in Figure 5.7. As n becomes larger, the density function becomes sharply peaked about the mean. For larger values of n, the noise about the codeword vector is modeled well by a hard sphere. Volume of hypersphere For some point in a k-dimensional real space, the volume of the hypersphere of some radius r is given by [334] V (ℜ) (k, r) =

π k /2 rk . Γ[k/2 + 1]

(5.28)

For some point in an m-dimensional complex space, the volume of the hypersphere is given by V (m, r) =

πm 2 m πm r2 m = r . Γ[m + 1] m!

(5.29)

Note, this is not the volume of an m-dimension hypersphere in a real space because of the doubling of dimensions due to the complex space.

5.3 Channel capacity

153

4

1 f␹C2 x, n, n

3

2

1

0 0.0

0.5

1.0 x

1.5

2.0

n Figure 5.7 Probability density function for fχC2 (x; n, 1/n), with n = 10, 20, 40, . . . , 100.

The volume Vn of the fuzzy complex noise ball for large m is approximated by πm m x m! πm 2 (σ m)m , = m!

Vn (m, σ) ≈

(5.30)

where x denotes the magnitude-squared norm of the noise used in the previous section. By using essentially the same argument and by noting that the signal and noise are independent, the variances of the noise and the signal power add. The volume of the hypersphere associated with the received signal Vz is approximated by Vz (m,



σ 2 + Pr ) ≈

πm ([σ 2 + Pr ] m)m , m!

(5.31)

where Pr is the average receive signal power observed at the receiver in the absence of noise. Geometric capacity construction To simplify the discussion, it is assumed that a = 1 in Equation (5.20) without loss of generality. For a large ns -dimensional complex space, corresponding to the m complex symbols transmitted, the number of separable codewords ncode can be bounded by the number of fuzzy noise balls that fit into the total hypervolume. This number can be approximated by the ratio of the volumes of the hyperspheres

Simple channels

6

Capacity b symbol

154

5 4 3 2 1 0 10

5

0

5 10 SNR dB

15

20

Figure 5.8 Channel capacity in terms of bits per symbol as a function of SNR.

associated with the noise-plus-signal power to the noise power, √ Vz (ns , σ 2 + Pr ) ncode ≤ Vn (ns , σ) (σ 2 + Pr )n s ≤ , (σ 2 )n s

(5.32)

in the limit of a large number of symbols ns . Thus, the upper limit on the number of bits per complex symbol c, which is an outer bound on capacity, is given by the log2 of the number of possible codewords, 1 log2 (ncode ) ns   Pr = log2 1 + 2 . σ

c=

(5.33)

The ratio of the signal power Pr to noise power σ 2 is the SNR. The capacity channel as a function of SNR is displayed in Figure 5.8. This geometric argument sets an upper bound on the data rate. It is based on some notion of perfectly packing the noise spheres such that there is no potential confusion between symbols. Somewhat surprisingly, this bound is theoretically achievable in the limit of a large dimensional space. Achievability To demonstrate that the capacity bound is asymptotically achievable, a particular implementation is proposed. For a given data rate, the probability of error must go to zero as the number of symbols goes to infinity. The effect of sphere hardening is exploited here. Consider a random code over m complex symbols with ncode codewords constructed using vectors drawn randomly from a complex Gaussian distribution. These codewords would randomly fill the space (lying near the surface with high probability). Given that a particular codeword was

5.3 Channel capacity

n-Dimensional Space

155

Potentially Confused Symbol

Noise Symbol

Figure 5.9 Notional representation of two symbols. The first symbol is displayed

including some distribution of noise. The second symbol is potentially confused with the first symbol. In this case, the second symbol does not cause confusion because it is outside the noise volume of the first symbol.

transmitted, the probability that another randomly generated codeword would (1) fall within a confusion distance (determined by the noise power) per r is given by the ratio of the volumes of the hypersphere corresponding to the noise to the hypersphere corresponding to the signal plus noise, Vn (m, σ 2 ) Vz (m, σ 2 + Pr )  m σ2 = , Pr + σ 2

p(1) er r =

(5.34)

where the equality is asymptotically valid using the sphere-hardening approximation. By employing the union bound (which exploits the observation that summing the probabilities of pairwise errors as if there were independent variables produces is an upper bound on the real error probability), the probability that the ncode − 1 erroneous codeword might lie within the noise sphere of the correct codeword is bounded by per r



σ2 ≤ (ncode − 1) Pr + σ 2  m σ2 < ncode . Pr + σ 2

m (5.35)

By using the definition that the coding rate r is the number of bits encoded by the codebook normalized by the m complex symbols used r=

log2 (ncode ) , m

(5.36)

156

Simple channels

the probability of error is bounded by m σ2 Pr + σ 2 2 = 2m (r −log 2 [1+P r /σ ]) ,

per r < 2r m



(5.37)

where the relationship x = 2log 2 (x) is exploited. The error is driven to zero as the exponent of the right-hand side of Equation (5.37) becomes large. For Pr > 0, the exponent of the right-hand side tends to −∞ as n → ∞, if   Pr r < log2 1 + 2 σ = c.

(5.38)

Thus, given the Gaussian random code construction, error-free decoding is possible as n → ∞ for rates approaching the channel capacity   Pr c = log2 1 + 2 . (5.39) σ Here the capacity has been developed under the assumption of a complex baseband signal. Consequently, there are two orthogonal degrees of freedom (real and imaginary). In other texts, the capacity is sometimes developed under the assumption of real variables. In that case, the capacity is half what is displayed in Equation (5.39). In addition, the real and imaginary components of the noise would be considered separately. The variance of the real or imaginary component of the noise would then be half of the complex noise. In this chapter, c is used to represent the capacity in terms of bits/symbol or in terms of bits/s/Hz. Elsewhere in the text, c is used to represent the speed of light. Hopefully, the usage will be obvious from context and will not cause any confusion.

5.3.2

Mutual information In previous sections, channel capacity was discussed in terms of a geometric argument. A description of channel capacity in terms of mutual information and entropy is provided here. The capacity is the mutual information with the optimal input probability distribution for a given noise distribution and for a given set of channel constraints [68]. In the channel model used in the previous section, z = as + n,

(5.40)

it is convenient to set the units of power such that the coefficient a is 1. Here we have suppressed the explicit dependence upon time. Throughout this section, more care is taken notationally with regard to random variables than is taken in most of the text. A random variable X is indicated with an uppercase character.

5.3 Channel capacity

157

Some instance of that variable x is drawn from the probability distribution associated with X. Throughout this section, it is assumed that the random variables are complex. The maximum information per symbol (or data rate) is given by the mutual information I(S; Z) between the random variables S and Z when the distribution for S is optimized. The mutual information is given by   p(s, z) 2 2 I(S; Z) = d s d z p(s, z) log2 p(s) p(z) = h(Z) − h(Z|S)

= h(S) − h(S|Z) ,

(5.41)

where the temporal parameter t of the random variables has been suppressed for s and z, and the differential area in the complex space d^2s is described in Section 2.9.2. Differential entropy for some random variable is indicated by h(\cdot). The conditional differential entropy is indicated by h(\cdot|\cdot), where the second term is the constraining condition. The joint probability distribution of S and Z is indicated by p(s, z). While formally the notation p_{S,Z}(s, z) might be clearer, it is assumed that dropping the subscripts will not cause confusion. Similarly, the probability distributions of S and Z are indicated by p(s) and p(z), respectively. The differential entropy h(\cdot) and conditional differential entropy h(\cdot|\cdot) are named, making a connection with statistical mechanics [264]. The use of the modifier "differential" indicates that this is the entropy used for continuous random variables. While the derivation will not be presented explicitly, the motivation for the mutual information being given by the difference in the entropy terms is directly related to the geometric discussion in the previous section. In statistical thermodynamics, entropy is proportional to the log of the number of possible states \Omega. Each state is assumed to be equally likely with probability p = 1/\Omega. Consequently, statistical mechanical entropy is given by

k_B \log \Omega = k_B \log \frac{1}{p} = -k_B \log p,  (5.42)

where kB is the Boltzmann constant. This expression is a measure of entropy in units of joules/kelvin, which relates energy with temperature. In information theory, it is convenient to ignore the energy discussion and use base 2 rather than the natural logarithm because information is typically measured in terms of the number of bits. In addition, the constant of proportionality is dropped. It is worth noting that in the literature the natural log is used sometimes, and the units of information are given in “nats.” Specifically, the log2 is replaced with a natural log in Equation (5.39). A single nat is equivalent to 1/ log(2) ≈ 1.44 bits. In this text bits are preferred. Unlike in the typical statistical thermodynamics discussion, the probability of each state may not be equally likely. The entropy of a random variable X is the expected value of the number of bits required to


specify a state taken over all values of X,

h(X) = \left\langle \log_2 \frac{1}{p(x)} \right\rangle.  (5.43)

For continuous variables, h(X) is called the differential entropy and is given by

h(X) = \int d^2x\, p(x) \log_2 \frac{1}{p(x)} = -\int d^2x\, p(x) \log_2[p(x)].  (5.44)

Similarly, conditional entropy is given by

h(X|Y) = \left\langle \log_2 \frac{1}{p(x|y)} \right\rangle = -\int d^2x\, d^2y\, p(x, y) \log_2[p(x|y)],  (5.45)

where p(x|y) is the probability density of x assuming a given value for y. The difference between the entropy and the conditional entropy is given by

h(X) - h(X|Y) = -\int d^2x\, p(x) \log_2[p(x)] + \int d^2x\, d^2y\, p(x, y) \log_2[p(x|y)]
 = -\int d^2x\, d^2y\, p(x, y) \log_2[p(x)] + \int d^2x\, d^2y\, p(x, y) \log_2\left[ \frac{p(x, y)}{p(y)} \right]
 = \int d^2x\, d^2y\, p(x, y) \log_2\left[ \frac{p(x, y)}{p(x)\, p(y)} \right]
 = I(X; Y),  (5.46)

= h(X) − h(X|Y ) .

(5.47)

The above form is somewhat satisfying heuristically. If the entropy is expressed in units of bits, then the entropy is the average number of bits required to specify

5.3 Channel capacity

159

the state of a random variable. If Z is the observed random variable and S is the source random variable, then h(Z) is the average number of bits required to specify the observed state, and h(Z|S) is the average number of bits required to specify the observed state if the source state is known (this is the average number of bits required to specify the noise). Consequently, the difference must be the number of information bits that can be communicated per symbol. The capacity c is given by maximizing the mutual information with respect to the transmit probability distribution p(s), c = max I(S; Z) . p(s)

5.3.3

(5.48)

Additive Gaussian noise channel By using the definition in Equation (5.20), the three random variables are given by the received signal Z, the transmitted signal S, and the complex Gaussian noise N associated with n. By using Equation (5.47), the mutual information between the channel input S and the received signal Z is given by I(S; Z) = h(Z) − h(Z|S) .

(5.49)

The differential entropy h(Z|S) is simply the differential entropy associated with the noise h(N ), h(Z|S) = h(S + N |S) = h(N ) .

(5.50)

This differential entropy evaluation can be seen directly by noting that under the change of variables z = s + n, the probability p(s + n|s) = p(n), h(Z|S) = − =− =− =−









d2 s d2 z p(z, s) log2 [p(z|s)] d2 s d2 n p(s + n|s) p(s) log2 [p(s + n|s)] d2 s d2 n p(n) p(s) log2 [p(n)] d2 n p(n) log2 [p(n)]

= h(N ) .

(5.51)

Here we use the observation that the integrals over p(s + n|s) and p(n) are the same even though the distributions themselves are not the same. The differential

160

Simple channels

entropy for the Gaussian noise can be evaluated directly: 

d2 n p(n) log2 [p(n)] ⎛ n 2 ⎞ n 2  − σ2 − n e e σ n2 ⎠ ⎝ = − d2 n log 2 πσn2 πσn2

h(N ) = −



n 2 2 σn

  n 2 2 log + log [πσ ] (e) 2 2 n πσn2 σn2 n 2    − σ2 n n 2 2 e 2 = log2 [πσn ] + log2 (e) d n πσn2 σn2 =



2

d n

e

= log2 [πσn2 ] + log2 (e) = log2 [π σn2 e] .

(5.52)

The capacity is given by finding the probability distribution that maximizes the mutual information between the received signal Z and the transmitted signal S in Equation (5.49). To find this distribution, the calculus of variations is employed as discussed in Section 2.12.3. To be meaningful, this maximization must have some physical constraints. In particular, if the power is allowed to go to infinity, then the capacity can be infinite. The mutual information is maximized under an average power constraint P0 , % 2& s ≤ P0 .

(5.53)

The mutual information is maximized by maximizing the entropy of the received signal Z under the average power constraint. The entropy is given by
$$ h(Z) = -\int d^2z\; p(z)\,\log_2[p(z)]\,. \quad (5.54) $$

For some random complex variable X, the distribution p(x) that maximizes the entropy can be found by using Lagrangian constrained optimization. There are three basic constraints that p(x) must satisfy. These are the basic constraints on a probability,
• $p(x) \ge 0$
• $\int d^2x\; p(x) = 1\,,$
and, for the problems under consideration here, there is an average power constraint on X such that ⟨|x|²⟩ = σ_x²,
• $\int d^2x\; |x|^2\, p(x) = \sigma_x^2\,.$


The variational functional φ is constrained by the Lagrangian multipliers λ₁ and λ_σ,
$$
\phi = \int d^2x\; p(x)\,\log_2[p(x)] - \lambda_1\left(\int d^2x\; p(x) - 1\right) - \lambda_\sigma\left(\int d^2x\; |x|^2\, p(x) - \sigma^2\right). \quad (5.55)
$$
The optimal probability density is found by setting the total variation of the functional δφ to zero,
$$
\delta\phi = \int d^2x\left(\log_2[p(x)] - \frac{1}{\log 2} - \lambda_1 - \lambda_\sigma |x|^2\right)\delta p(x) = 0\,, \quad (5.56)
$$
where the relationship log₂(x) = log(x)/log(2) is used. By solving for p(x) such that the total variation is zero for some arbitrary variation δp(x), the following form is found,
$$
0 = \log_2[p(x)] - \frac{1}{\log 2} - \lambda_1 - \lambda_\sigma |x|^2\,, \qquad
p(x) = e\, 2^{(\lambda_1 + \lambda_\sigma |x|^2)} = a\, e^{\lambda_\sigma |x|^2}\,, \quad (5.57)
$$
where a is a constant incorporating $e\,2^{\lambda_1}$ (the residual factor of log 2 in the exponent is absorbed into a redefined λ_σ). Applying the constraints to determine λ₁ and λ_σ, a distribution p(x) that maximizes the entropy is found to be the complex Gaussian,
$$
1 = \int d^2x\; a\, e^{\lambda_\sigma |x|^2} = -a\,\frac{\pi}{\lambda_\sigma}
\;\Rightarrow\; a = -\frac{\lambda_\sigma}{\pi}\,, \quad (5.58)
$$
and, by using the notation x = x_r + i x_i,
$$
\sigma_x^2 = -\frac{\lambda_\sigma}{\pi}\int d^2x\; |x|^2\, e^{\lambda_\sigma |x|^2}
= -\frac{\lambda_\sigma}{\pi}\int dx_r\, dx_i\; (x_r^2 + x_i^2)\, e^{\lambda_\sigma(x_r^2 + x_i^2)}
$$
$$
= -2\,\frac{\lambda_\sigma}{\pi}\int dx_i\; e^{\lambda_\sigma x_i^2}\int dx_r\; x_r^2\, e^{\lambda_\sigma x_r^2}
= -2\,\frac{\lambda_\sigma}{\pi}\,\sqrt{\frac{\pi}{-\lambda_\sigma}}\,\frac{\sqrt{\pi}}{2(-\lambda_\sigma)^{3/2}}
= -\frac{1}{\lambda_\sigma}
\;\Rightarrow\; \lambda_\sigma = -\frac{1}{\sigma_x^2}\,. \quad (5.59)
$$

Figure 5.10 Discrete memoryless channel with random state.

Consequently, the maximum differential entropy is given by the complex Gaussian distribution,
$$ p(x) = \frac{1}{\pi\,\sigma_x^2}\; e^{-|x|^2/\sigma_x^2}\,. \quad (5.60) $$
Therefore, the distribution for Z that maximizes the mutual information is Gaussian. Because the noise is Gaussian by construction, the distribution for S must also be Gaussian. Because the variance of the sum of independent Gaussian variables is the sum of the variances, the variance of Z is the sum of the received signal power and noise power P_r + σ_n², and the channel capacity bound is given by
$$
I(S;Z) = h(Z) - h(Z|S) = h(Z) - h(N)
= \log_2[\pi\,(P_r + \sigma_n^2)\, e] - \log_2[\pi\,\sigma_n^2\, e]
= \log_2\!\left[1 + \frac{P_r}{\sigma_n^2}\right]. \quad (5.61)
$$

Because the Gaussian distribution maximizes entropy, Gaussian noise also has the greatest detrimental effect on the mutual information observed in Equation (5.49).
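The capacity expression in Equation (5.61) can also be exercised numerically: estimating h(Z) and h(N) by Monte Carlo for a Gaussian input reproduces log₂(1 + P_r/σ_n²). A minimal sketch of ours, with parameters of our choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
Pr, sigma_n2, n_samp = 4.0, 1.0, 1_000_000

def cgauss(var, n):
    # circularly symmetric complex Gaussian samples with variance var
    return (rng.normal(scale=np.sqrt(var / 2), size=n)
            + 1j * rng.normal(scale=np.sqrt(var / 2), size=n))

s, n = cgauss(Pr, n_samp), cgauss(sigma_n2, n_samp)
z = s + n                      # received signal; variance is Pr + sigma_n2

def h_gauss(x, var):
    # Monte Carlo differential entropy using the known Gaussian density
    p = np.exp(-np.abs(x) ** 2 / var) / (np.pi * var)
    return np.mean(-np.log2(p))

I_mc = h_gauss(z, Pr + sigma_n2) - h_gauss(n, sigma_n2)
print(I_mc, np.log2(1 + Pr / sigma_n2))   # both near 2.32 bits/symbol
```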

5.3.4 Additive Gaussian noise channel with state

Another canonical channel in information theory is the discrete memoryless channel (DMC) with random state known at the encoder. For a single channel use, this channel can be described by the following equation and Figure 5.10:
$$ z = s + t + n\,, \quad (5.62) $$
where n ∼ N(0, σ²) is additive noise distributed as a zero-mean Gaussian random variable of variance σ², s is the transmitted symbol, and t is an interfering signal (also known as the state) that is known noncausally at the transmitter.


While it may seem unrealistic that the transmitter knows the interfering signal perfectly, there are many situations in which this model is applicable. For instance, in broadcast channels where one transmitter has two different information symbols to send to two different receivers, the signal intended for a particular receiver is interference to an unintended receiver. Moreover, since the transmitter is the source of both symbols, it must know the interfering signal perfectly. Another possible application is in intersymbol-interference (ISI) channels where successive symbols interfere with one another because of dispersion by the channel. The capacity of this channel when the interfering signal t is not necessarily Gaussian has been found by Gel'fand and Pinsker [108]. To achieve capacity, an auxiliary random variable U is used to aid in the encoding of the message via a binning strategy. Suppose that the transmitter wishes to send a message m (see Figure 5.10) to the receiver. In the canonical additive-white-Gaussian-noise (AWGN) channel, the transmitter will transmit an ns-symbol-long codeword, which is used to represent the message m, whereby each message maps to a single codeword. In the Gel'fand–Pinsker scheme, each message m maps to several possible codewords, each of length ns symbols. All the codewords corresponding to a particular message m are said to come from the same bin. The codeword that is ultimately selected to be transmitted is based on the value of the ns state symbols that will occur during the transmission of the ns symbols that represent the message m. Hence, since the transmitted codeword is dependent on the state symbols that will occur during the transmission of that codeword, the transmitter can precompensate for the effect of the interfering state symbols. By using random, jointly typical coding and decoding (see, for example, Reference [68]), Gel'fand and Pinsker show that the capacity of this channel is

$$ C = \max_{p(u,s|t)}\, \{ I(U;Z) - I(U;T) \}\,, \quad (5.63) $$

where U is an auxiliary random variable that is used to generate the codewords. A detailed treatment of the general result of Gel’fand–Pinsker is beyond the scope of this text and can be found in specialized texts on information theory such as [68]. However, an example based on systems with Gaussian-distributed noise, and Gaussian-distributed state variables, called dirty-paper coding (DPC) (introduced in Reference [67]) can be used to illustrate the Gel’fand–Pinsker result. Dirty-paper coding or Costa precoding is a technique used to select an appropriate auxiliary random variable U . Costa finds that U = S + μT can be used to achieve capacity when the noise random variable N , and the interfering signal random variable T are Gaussian with variances σn2 , σt2 , respectively. Using this


choice of U, observe that
$$
I(U;Z) = h(Z) - h(Z|U) = h(S+T+N) - h(S+T+N\,|\,S+\mu T)
$$
$$
= h(S+T+N) - h(S+T+N,\, S+\mu T) + h(S+\mu T)
= \frac{1}{2}\ln\!\left[\frac{(P+\sigma_n^2+\sigma_t^2)(P+\mu^2\sigma_t^2)}{P\,\sigma_t^2\,(1-\mu)^2 + \sigma_n^2\,(P+\mu^2\sigma_t^2)}\right] \quad (5.64)
$$
and that, by making S Gaussian,
$$ I(U;T) = \frac{1}{2}\ln\!\left[1 + \mu^2\,\frac{\sigma_t^2}{P}\right]. \quad (5.65) $$

Suppose that the rate is given by
$$
I(U;Z) - I(U;T) = \frac{1}{2}\ln\!\left[\frac{(P+\sigma_n^2+\sigma_t^2)(P+\mu^2\sigma_t^2)}{P\,\sigma_t^2\,(1-\mu)^2 + \sigma_n^2\,(P+\mu^2\sigma_t^2)}\right] - \frac{1}{2}\ln\!\left[1 + \mu^2\,\frac{\sigma_t^2}{P}\right]; \quad (5.66)
$$
then Costa finds that by setting
$$ \mu = \frac{P}{P+\sigma_n^2}\,, \quad (5.67) $$
the rate becomes
$$ R = I(U;Z) - I(U;T) = \frac{1}{2}\ln\!\left[1 + \frac{P}{\sigma_n^2}\right]. \quad (5.68) $$
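The algebra in Equations (5.66)–(5.68) can be confirmed with a direct sweep over μ: the maximum of the rate expression occurs at μ = P/(P+σ_n²) and equals ½ln(1+P/σ_n²). A short sketch of ours, with arbitrary powers:

```python
import numpy as np

P, sn2, st2 = 1.0, 0.5, 3.0     # signal, noise, and state (interference) powers

def rate(mu):
    # Equation (5.66): I(U;Z) - I(U;T) for the Gaussian dirty-paper channel
    iuz = 0.5 * np.log(((P + sn2 + st2) * (P + mu**2 * st2))
                       / (P * st2 * (1 - mu)**2 + sn2 * (P + mu**2 * st2)))
    iut = 0.5 * np.log(1 + mu**2 * st2 / P)
    return iuz - iut

mus = np.linspace(0.0, 1.0, 1001)
best = mus[np.argmax(rate(mus))]
print(best, P / (P + sn2))                    # optimum mu matches Eq. (5.67)
print(rate(best), 0.5 * np.log(1 + P / sn2))  # and the rate matches Eq. (5.68)
```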

This rate equals the capacity as if the interfering signal did not exist! Note that all along we have assumed that ⟨|s|²⟩ ≤ P. Thus, the "presubtraction" of the interference is done in such a manner that the average power of the transmitted signal is unchanged. If a naive presubtraction strategy is used, the average transmit power would be higher, as the following example illustrates. Consider a system in which the transmitted symbol at a given time s equals the difference between the codeword associated with the message m, denoted by s̃, and the state t, that is,
$$ s = \tilde{s} - t\,. \quad (5.69) $$
Since the received signal is a superposition of s and t, the receiver only sees the codeword associated with the message m, i.e., s̃. The average transmit power for this scheme is
$$ \langle |s|^2 \rangle = P + \sigma_t^2\,. \quad (5.70) $$

This power is, of course, higher than the transmit power budget. Thus, dirty-paper coding precompensates for the interference by cleverly encoding the transmitted signal without a power penalty. There is a nice geometric interpretation of the dirty-paper coding technique that is depicted in the following figures, which is based on the presentation in [314]. Consider a two-dimensional codeword space, and suppose that there are


four possible codewords, each represented by a different shape as depicted in Figure 5.11. The dirty-paper coding technique extends this constellation by generating additional codewords that can be used to represent the original four codewords. Figure 5.12 depicts one such extension. Note that a given constellation point from Figure 5.11 may be represented by any one of the four constellation points of the same shape in Figure 5.12. For instance, suppose that the transmitter wishes to send the codeword corresponding to the circle. It looks in the extended constellation for the circle that is closest to a scaled version of the state vector μt. We shall represent the corresponding codeword by the vector u (note that this is a vector of ns auxiliary variables). The transmitter then sends the difference between this auxiliary vector u and the scaled interfering signal,
$$ s = u - \mu\, t\,. \quad (5.71) $$

The scaling by μ ensures that the correct amount of power is devoted to presubtracting the interference signal. In the extreme case where there is no noise, μ = 1 and the entire interfering signal is presubtracted. When the noise is much larger than the signal, μ ≪ 1 and most of the transmit energy is used to transmit the codeword. The receiver multiplies the received vector y by μ to get
$$ \mu\, y = \mu\, u - \mu^2\, t + \mu\, t + \mu\, n \quad (5.72) $$

and finds the closest constellation point to μy in the extended constellation. Note that when the number of symbols per codeword ns is large, the vector μu − μ²t + μt = μs + μt in the previous expression lives with high probability in an ns-dimensional sphere of radius $\mu\sqrt{n_s P}$ around the point μt. The scaled noise vector μn, that is, the third term on the right-hand side of Equation (5.72), with high probability lives in an ns-dimensional sphere of radius $\mu\sqrt{n_s \sigma_n^2}$. Thus, as the block size increases, that is, ns → ∞, with high probability the total number of codewords (without the extension) that can be distinct is simply the ratio of the volume of an ns-dimensional sphere with radius $\mu\sqrt{n_s P}$ to the volume of an ns-dimensional sphere of radius $\mu\sqrt{n_s \sigma_n^2}$. Following the analysis of Section 5.3.1, as ns → ∞, the base-two logarithm of the number of distinct codewords converges to
$$ n_s\,\log_2\!\left[1 + \frac{P}{\sigma_n^2}\right], \quad (5.73) $$
which, per symbol, is the capacity of the additive white Gaussian noise channel without interference! This interpretation of the dirty-paper coding scheme is related to Tomlinson–Harashima precoding (see Problems).

5.4 Energy per bit

For a Shannon channel capacity c in bits/symbol or bits/second/hertz, the actual link data rate R in bits/second for a frequency-band-limited signal is bounded


Figure 5.11 Original constellation for Gel'fand–Pinsker/dirty-paper coding.

Figure 5.12 Extended constellation for Gel'fand–Pinsker/dirty-paper coding. The figure shows how the codeword corresponding to the circle is transmitted when the state vector t is known.

by R ≤ Bc, where B is the bandwidth of the complex signal. If the symbol rate is equal to the bandwidth, then the bound on spectral efficiency in terms of bits/second/hertz is also c. The noise power σ_n² can be expressed in terms of the noise spectral density N₀,
$$ \sigma_n^2 = N_0\, B = k_B\, T\, B\,, \quad (5.74) $$
where k_B is the Boltzmann constant (≈ 1.38·10⁻²³ J/K) and T is the absolute temperature, as discussed in Section 4.2.2. The bound on the spectral efficiency c ≥ R/B in bits/second/hertz is given by
$$ c = \log_2\!\left[1 + \frac{P_r}{N_0\, B}\right], \quad (5.75) $$


where P_r is the receive power of the transmitted signal, N₀ is the noise spectral density (assuming complex noise), and B is the bandwidth. For this simple channel, the SNR is given by
$$ \mathrm{SNR} = \frac{P_r}{N_0\, B}\,. \quad (5.76) $$

A related useful measure of signal energy to noise is E_b/N₀, which is sometimes unfortunately pronounced "ebb-no." This is the energy per information bit at the receiver divided by the noise spectral density. The energy is given by the power divided by the bit rate. The energy per information bit normalized by the noise spectral density E_b/N₀ is given by
$$ \frac{E_b}{N_0} = \frac{P_r}{R}\,\frac{1}{N_0} = \frac{B}{R}\,\mathrm{SNR}\,. \quad (5.77) $$
Consequently, capacity in terms of E_b/N₀ is given by
$$ c \ge \log_2\!\left[1 + \frac{E_b}{N_0}\,\frac{R}{B}\right]. \quad (5.78) $$

The equality is satisfied if the communication rate density R/B is equal to the capacity c. Thus, the implicit relationship between the bounding spectral efficiency and E_b/N₀ is defined by
$$ c = \log_2\!\left[1 + \frac{E_b}{N_0}\, c\right]. \quad (5.79) $$
We can solve for E_b/N₀ for a capacity-achieving link:
$$ 2^c = 1 + \frac{E_b}{N_0}\, c \;\Rightarrow\; \frac{E_b}{N_0} = \frac{2^c - 1}{c}\,. \quad (5.80) $$
In the limit of low spectral efficiency, we can use the fact that the log of 1 plus a small number is approximately the small number, as presented in Equation (2.14),
$$ c = \log_2(e)\,\log\!\left[1 + \frac{E_b}{N_0}\, c\right] \approx \log_2(e)\,\frac{E_b}{N_0}\, c\,. \quad (5.81) $$
Consequently, for small spectral efficiencies c ≪ 1, there is a limiting E_b/N₀ that is independent of the exact value of spectral efficiency,
$$ \frac{E_b}{N_0} \approx \frac{1}{\log_2(e)} \approx -1.59\ \mathrm{dB}\,. \quad (5.82) $$

Furthermore, this is the smallest E_b/N₀ required for any nonzero spectral efficiency. Links with large spectral efficiencies c ≫ 1 require larger E_b/N₀. The required E_b/N₀ as a function of the channel capacity is displayed in Figure 5.13.
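Equation (5.80) is simple to tabulate; the sketch below (ours) reproduces the trend shown in Figure 5.13, including the −1.59 dB floor at low spectral efficiency.

```python
import numpy as np

c = np.array([0.01, 0.1, 0.5, 1.0, 2.0, 5.0])  # spectral efficiency, b/s/Hz
ebno_db = 10 * np.log10((2.0**c - 1) / c)      # Equation (5.80), in dB
for ci, e in zip(c, ebno_db):
    print(f"c = {ci:5.2f} b/s/Hz -> Eb/N0 = {e:6.2f} dB")
# As c -> 0, the values approach 10*log10(ln 2), about -1.59 dB
```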

Figure 5.13 The required noise-normalized energy per information bit as a function of the channel capacity in terms of bits per second per hertz.

As the capacity falls below about 0.1 b/s/Hz, the required Eb /N0 does not change appreciably. The implication is that in this constant Eb /N0 regime the data rate is proportional to power. The capacity region where this is true is sometimes denoted the noise-limited regime. Above about 1 b/s/Hz, increasing the data rate requires an exponential increase in power. The capacity region in which this is true is sometimes denoted the power-limited regime.

Problems

5.1 For a satellite in geosynchronous orbit about the earth centered over the continental United States, (a) find the antenna gain required to cover the continental United States well (approximate 3 dB coverage from the northeast corner of Maine to the southwest corner of California, about 5000 km); and (b) evaluate the approximate effective area assuming a carrier frequency of 10 GHz.

5.2 In a line-of-sight environment without scatterers, find the largest achievable data rate between two short dipoles with the same orientation, separated by 1 km, transmitting 1 W, operating at a carrier frequency of 1 GHz, with a receiver at a temperature of 300 K.

5.3 Consider Figure 5.14, which is a block diagram of a Tomlinson–Harashima precoder developed in the 1970s to mitigate intersymbol interference. The principles of its operation are very similar to the Costa precoding described in this chapter.
(a) Suppose that the box marked f(·, V) is removed, i.e., x[k] equals m[k] with the output of the filter g[k] subtracted out. Find g[k] such that y[k] = x[k].


Figure 5.14 Tomlinson–Harashima precoder.

(b) Suppose that ‖m[k]‖ ≤ M. What is the largest possible value that x[k] can take, assuming that the box marked f(·, V) is still not present?
(c) Please specify f(·, V) such that ‖x[k]‖ ≤ V ∀k and show that y[k] = x[k].

5.4 By noting that at low spectral efficiency the best-case Eb/N0 ≈ −1.59 dB, evaluate the minimum received energy required to decode 1000 bits at a temperature of 300 K.

5.5 Evaluate the differential entropy for a real Rayleigh random variable.

6 Antenna arrays

Arrays of antennas can be used to improve the signal-to-noise ratio (SNR) and to mitigate interference. For many communication links, the propagation environment is complicated by scattering that can distort an incoming signal both in direction and delay. In this chapter’s introductory discussion, a simplifying assumption is employed. Within this chapter, it is assumed that there is no scattering. An example would be a line-of-sight link in a large anechoic chamber. In addition, it is assumed that the antenna array is small compared with the ratio c/B of the propagation speed c to the bandwidth B. As a consequence, it is assumed that the signal is not dispersive across the antenna array. A dispersive channel would have resolvable delay spread across the antenna array. These restrictions will be removed in subsequent chapters.

6.1 Wavefront

Consider a single transmitter that is a long distance away from an array of receive antennas. The wavefront that expands in a sphere about the transmitter can be approximated by a plane near the antenna array, as seen in Figure 6.1. In other words, the transmitter is far enough from the receive antenna array such that the phase error associated with the plane wave approximation is small. This is a valid approximation¹ when
$$ R \gg \frac{L^2}{\lambda}\,, \quad (6.1) $$
if R is the distance from the source, L is the size of the array (L is the largest distance between any two antennas), and λ is the wavelength. Here it is assumed that each receive antenna is identical. Because the receive antennas are at slightly different distances from the source in general, the plane wave impinges upon each antenna with some relative delay. Under the assumption of a narrowband signal, that is, a signal that does not have sufficient signal bandwidth B to resolve the relative antenna locations,
$$ B \ll \frac{c}{L}\,, \quad (6.2) $$

¹ The notation ≫ indicates much greater, which is not a precise notion, but is dependent upon the allowed error in the approximation.


Figure 6.1 Propagation of wavefront across an array that has wavevector k, wavelength λ, and transmitted signal s(t). The time delay of the signal observed between two antennas is Δt.

the delays can be approximated well by phase differences of the carrier wave. Far from a source, the propagating signal can be approximated by a plane wave with angular frequency ω = 2πf₀, where f₀ is the carrier frequency. The plane wave is given by a solution ψ(x, t) to the wave equation [154] represented by the partial differential equation
$$ \nabla^2 \psi(\mathbf{x},t) - \frac{1}{c^2}\,\frac{\partial^2}{\partial t^2}\,\psi(\mathbf{x},t) = 0\,, \quad (6.3) $$

where t is time, and x ∈ ℝ³ˣ¹ is a location in space. The characteristic direction information of the plane wave is contained within the wavevector, k ∈ ℝ³ˣ¹. The wavevector points along the direction of propagation and has the magnitude
$$ \|\mathbf{k}\| = \frac{2\pi}{\lambda} = 2\pi\,\frac{f_0}{c}\,. \quad (6.4) $$

There are a variety of ways of interpreting the wave equation, depending upon the application. As an example, the electric field along some direction (the direction perpendicular to the plane displayed in Figure 6.1 would be useful) could be given by the real part of ψ(x, t). More complicated polarizations and geometries could be constructed by using different solutions to the wave equation ψ(x, t) for the electric field along each axis. The complex amplitude of a propagating plane wave as a function of time and location (the solution of the wave equation²) [154] is characterized by
$$ \psi(\mathbf{x},t) = a\; e^{-i\omega t + i\mathbf{k}\cdot\mathbf{x}}\,, \quad (6.5) $$

² If the signs of the terms in the exponent in Equation (6.5) are inverted, then Equation (6.3) is still satisfied. Unfortunately, both conventions are employed.


where the complex amplitude attenuation is given by a ∈ C (the phase of the complex attenuation is defined by the position and phase of the source), the distance x ∈ R3×1 is measured from some arbitrary origin, and k · x = kT x indicates the inner product. The location referenced by x is valid for any point in the far field by satisfying Equation (6.1). For an antenna used by the receiver, the point x at which the field is measured is often called the phase center of the antenna. Theoretically the field observed by an object of extended size can be represented by the measurement at a point; however, in practice, the exact position of the phase center can be a complicated function of the surrounding environment and of the frequency of operation. A communication signal can be carried by modulating (slowly, compared with the carrier wave frequency, modifying the phase and amplitude) of this plane wave with a complex baseband signal, s(t). By assuming that the measurable electric field in some direction is given by the real part of ψ(x, t), the resulting complex amplitude of the wavefront has the form ψ(x, t) = a e−iω t+ ik·x s(t − τ0 ) ,

(6.6)

where τ₀ is the time delay for the signal to propagate from the transmitter to some local axis origin. The location x is measured from this origin. After frequency downconversion that shifts the signal down to complex baseband, removing the carrier frequency term e^{−iωt}, the received baseband signal at the mth receive antenna is given by
$$ z_m(t) = a\; e^{i\mathbf{k}\cdot\mathbf{x}_m}\; s(t-\tau_0) + n_m(t)\,, \quad (6.7) $$

where x_m is the location for the mth receive antenna and n_m(t) is the additive noise as a function of time. For the sake of clarity in the following discussion, we will assume in this section that the noise is negligibly small. The received signal under the narrowband baseband signal approximation is then an attenuated and delayed version of the transmitted signal,
$$ z_m(t) \approx a\; e^{i\mathbf{k}\cdot\mathbf{x}_m}\; s(t-\tau_0)\,. \quad (6.8) $$

6.1.1 Geometric interpretation

The e^{ik·x_m} term can be understood intuitively by recognizing that the phase difference at each receive antenna is the result of the relative time delay for the wavefront to impinge upon the various receive antennas, as seen in Figure 6.1. This time delay is proportional to the relative position of the antennas along the direction of the wavevector. The component of the displacement from the origin to the antenna along the wavevector for the mth receive antenna is given by the inner product of the normalized wavevector and the antenna location vector,
$$ \frac{\mathbf{k}}{\|\mathbf{k}\|}\cdot\mathbf{x}_m\,. \quad (6.9) $$


The delay relative to the origin for the mth receive antenna is given by the relative distance divided by the propagation speed,
$$ \Delta t_m = \frac{\dfrac{\mathbf{k}}{\|\mathbf{k}\|}\cdot\mathbf{x}_m}{c}\,. \quad (6.10) $$
The relative phase is given in the argument of e^{−iω(t−Δt_m)}. By focusing on the relative phase of the baseband signal, the following antenna-dependent phase term is found,
$$ e^{i\omega\,\Delta t_m} = e^{i\frac{\omega}{c}\,\frac{\mathbf{k}}{\|\mathbf{k}\|}\cdot\mathbf{x}_m} = e^{i\|\mathbf{k}\|\,\frac{\mathbf{k}}{\|\mathbf{k}\|}\cdot\mathbf{x}_m} = e^{i\mathbf{k}\cdot\mathbf{x}_m}\,, \quad (6.11) $$
where the relationship ‖k‖ = 2π/λ = ω/c is used.

6.1.2 Steering vector

The phase relationships, given a set of receive antennas, can be represented compactly by using a vector notation when it is assumed that there is no dispersion across the array. In the absence of noise (in the very high SNR limit), the received complex baseband signal is proportional to the steering vector,
$$ \mathbf{z} \propto \mathbf{v}(\mathbf{k})\; s(t-\tau_0)\,. \quad (6.12) $$
Depending upon what is convenient for the analysis, two different normalizations are commonly employed for the steering vector: ‖v(k)‖² = 1 or ‖v(k)‖² = n_r. For this discussion, the former is used. Both forms are used in the text. The receive steering vector v(k) ∈ ℂ^{n_r×1} is defined to be
$$ \{\mathbf{v}(\mathbf{k})\}_m = \frac{e^{i\mathbf{k}\cdot\mathbf{x}_m}}{\sqrt{n_r}}\,. \quad (6.13) $$
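Equation (6.13) translates directly into a few lines of numerical code. The helper below is our own sketch (not from the text): it builds the unit-norm steering vector for arbitrary antenna positions in the {x}₁–{x}₂ plane and a source at angle φ, with the wavevector pointing along the direction of propagation (φ + π), consistent with the convention above.

```python
import numpy as np

def steering_vector(positions, phi, wavelength=1.0):
    """Unit-norm steering vector, Equation (6.13).

    positions : (n_r, 2) array of antenna locations (same units as wavelength)
    phi       : angle toward the source, radians
    """
    # The wavevector points along the propagation direction (phi + pi),
    # which flips the sign relative to the direction toward the source.
    k = -(2 * np.pi / wavelength) * np.array([np.cos(phi), np.sin(phi)])
    n_r = positions.shape[0]
    return np.exp(1j * positions @ k) / np.sqrt(n_r)

# Example: eight-element circular array of radius one wavelength
m = np.arange(8)
pos = np.stack([np.cos(2 * np.pi * m / 8), np.sin(2 * np.pi * m / 8)], axis=1)
v = steering_vector(pos, phi=0.0)
print(np.linalg.norm(v))   # 1.0 by construction
```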

As an aside, a potential source of confusion is the use of transmit versus receiver steering vectors by different authors. This decision of usage changes the sign on the wavevector. Here, it is assumed that the wavevector points along the direction of propagation. It is important to note that for this form to be valid, it is assumed that the antenna response of each antenna is identical and that each antenna has the same polarimetric and angular response for all angles. An example of this is a set of identical vertical dipole antennas, which are assumed to not interact. At best, all of these assumptions are only approximately true. As will be discussed in the following section, the assumption of having a flat phase response as a function of angle can often be relaxed because many metrics, such as beam patterns, are insensitive to overall phase, so long as the phase response as a function of angle is identical for all antennas. For applications for which knowledge of the array response is important, careful element and array calibration [104, 181] must be performed. There are numerous sources of errors that create the requirement for


calibration. These include position errors, electrical mismatch of antennas, and coupling of antennas to the local structure as well as to other antennas. In general, it is possible to extend the steering vector response to include the individual responses of each antenna in the array, in which case the steering vector becomes
$$ \{\mathbf{v}(\mathbf{k})\}_m = A_m(\mathbf{k})\,\frac{e^{i\mathbf{k}\cdot\mathbf{x}_m}}{\sqrt{n_r}}\,, \quad (6.14) $$

where A_m(k) is the direction- and antenna-dependent amplitude factor. While it is typically not useful from a terrestrial communications point of view, for a variety of applications (such as communications between airborne platforms, geolocation, and radar), it is sometimes useful to find the direction or angle of arrival of the signal from a transmitter. Under the assumption that a single plane wave is impinging upon the array, an estimate of the direction to a source can be found by maximizing the magnitude of the inner product between the observed array response vector z (which contains the array's phase and amplitude response to the impinging wavefronts plus noise) and a steering vector as a function of angle,
$$ \hat{\phi} = \operatorname*{argmax}_{\phi}\; \frac{\|\mathbf{v}^{\dagger}(\phi)\,\mathbf{z}\|}{\|\mathbf{v}^{\dagger}(\phi)\|}\,. \quad (6.15) $$

If multiple observations are given of the receive array response, then a rich set of approaches is available. Some of these techniques are discussed in Chapter 7. The difference between array response and steering vector is rather subtle. The steering vector is the theoretical set of antenna array phases and amplitudes that is a function of the angle between the wavefront and some reference angle on the array. In the absence of the multipath scattering that is assumed in this chapter, the array response and steering vector are the same for some given wavefront propagation direction. The observed receive array response may include noise of the observation. In the more typical environment in which there is multipath, the array response will be more complicated than is allowed by the simple model assumed by the steering vector. Both of these terms are differentiated from a transmit or receive beamformer (discussed in Section 6.2) in that the beamformer is selected by the radio to achieve some goal. The inner product between the receiver beamformer and the observed array response can be used as an estimator of the transmitted signal. A reasonable choice for a beamformer in a scatterer-free environment is to employ the steering vector associated with the direction to the other radio.

6.2 Array beam pattern

A receive beamformer w ∈ ℂ^{n_r×1} contains coefficients that modify the phases and amplitudes of the n_r signals received by an array of antennas and then

sums the results, producing a single stream of data. The beamformer can be considered a form of spatial filtering, which is analogous to spectral filtering. A beamformer can be constructed to selectively receive energy from some given direction. The resulting array beam pattern is a measure of power at the output of the beamformer (or array factor in amplitude) for signals coming from other physical directions. In general, the term beamformer can be applied to either transmission or reception. The mathematical formulation is the same up to a reversal of time. There are a variety of approaches for constructing the beamformer. Various approaches are considered in Chapter 9. The implementation of the beamformer can be at the carrier frequency; however, in modern digital receivers the beamforming is typically applied to the complex baseband signal with the equivalent effect. A transmit beamformer is the same up to a time reversal (causing a conjugation), so that, for transmitting, w* would be employed rather than w. The average amount of power P_w at the output of a beamformer w ∈ ℂ^{n_r×1} applied to the received data stream as a function of time z(t) ∈ ℂ^{n_r×1}, built from the vector of received signals in Equation (6.7), is given by
$$ P_w = \frac{\left\langle \|\mathbf{w}^{\dagger}\mathbf{z}(t)\|^2 \right\rangle}{\|\mathbf{w}\|^2} = \frac{\|\mathbf{w}^{\dagger}\mathbf{v}(\mathbf{k})\|^2\; \|a\|^2\; \langle \|s(t)\|^2 \rangle}{\|\mathbf{w}\|^2}\,, \quad (6.16) $$

where it is assumed that the received signal consists of a single wavefront propagating along the wavevector k. The normalizing term ‖w‖² in the denominator keeps the noise power constant for different scales of the beamformer. Once again, noise is assumed to be negligibly small for this discussion. The steering vector, defined in Equation (6.13), indicates the relative phases and amplitudes for the incoming signal. The mean square of the transmitted signal ⟨‖s(t)‖²⟩ = P_t is associated with the transmit power and does not affect the shape of the beam pattern. It is sometimes useful to consider the normalized beam pattern ρ_w(k). It is constructed so that the matched response (when w ∝ v(k)) would be unity and is given by
$$ \rho_w(\mathbf{k}) = \frac{\|\mathbf{w}^{\dagger}\mathbf{v}(\mathbf{k})\|^2}{\|\mathbf{w}\|^2\; \|\mathbf{v}(\mathbf{k})\|^2}\,. \quad (6.17) $$

The relative power at the output of the beamformer is given by the square of the normalized inner product between the beamformer and the steering vector for wavefronts propagating along various directions.


The beamformer that maximizes ρ_w(k) for a wavefront propagating along some direction k₀ is the matched beamformer,
$$ \mathbf{w} = \operatorname*{argmax}_{\mathbf{w}}\; \frac{\|\mathbf{w}^{\dagger}\mathbf{v}(\mathbf{k}_0)\|^2}{\|\mathbf{w}\|^2\; \|\mathbf{v}(\mathbf{k}_0)\|^2} \propto \mathbf{v}(\mathbf{k}_0)\,. \quad (6.18) $$

Here, the solution for w is invariant under a change in some nonzero multiplicative complex constant, so that for some constant a the solution a w would produce the same normalized beam pattern, ρ_{aw}(k) = ρ_w(k).

6.2.1 Beam pattern in a plane

For geometries such that the source and the array are constrained to be in a plane (at least approximately), the direction information contained in k can be expressed with a single angle φ relative to some axis (typically along {x}₁ from Figure 6.1, which is assumed in this discussion). Consequently, the relative beam pattern can be expressed as
$$ \rho_w(\phi) = \frac{\|\mathbf{w}^{\dagger}\mathbf{v}(\phi)\|^2}{\|\mathbf{w}\|^2\; \|\mathbf{v}(\phi)\|^2}\,. \quad (6.19) $$
Here, the angle φ indicates the angle between the axis along {x}₁ and a ray pointing toward the transmitter. In an environment free of scatterers or multipath, a beamformer that maximizes the power from a direction φ₀ is given by solving Equation (6.18) given the two-dimensional constraint, resulting in the form
$$ \mathbf{w} = \mathbf{v}(\phi_0)\,. \quad (6.20) $$

This beamformer is equal to the expected array response, which in this environment is given by the steering vector. For a beamformer matched to the array response of a wavefront coming from φ₀ (relative to the {x}₁ axis), the acceptance of power at the output of the beamformer from angle φ is given by
$$ \rho_{w=v(\phi_0)}(\phi) = \frac{\|\mathbf{v}^{\dagger}(\phi_0)\,\mathbf{v}(\phi)\|^2}{\|\mathbf{v}(\phi_0)\|^2\; \|\mathbf{v}(\phi)\|^2}\,. \quad (6.21) $$

For a wave propagating along the direction φ + π, the mth element of the receive steering vector as a function of φ (pointing back toward the source) is given by
$$
\{\mathbf{v}(\phi)\}_m = \frac{e^{i\frac{2\pi}{\lambda}[\{\mathbf{x}_m\}_1\cos(\phi+\pi) + \{\mathbf{x}_m\}_2\sin(\phi+\pi)]}}{\sqrt{n_r}}
= \frac{e^{-i\frac{2\pi}{\lambda}[\{\mathbf{x}_m\}_1\cos(\phi) + \{\mathbf{x}_m\}_2\sin(\phi)]}}{\sqrt{n_r}}\,. \quad (6.22)
$$

Note that the direction of propagation is in the opposite direction of the angle to the source. This direction inversion induces a sign flip in the exponential


compared with what one might expect when using the direction of wavefront propagation convention. A sidelobe is a local peak of received power in a direction different from the intended beamformer direction. For many applications, such as geolocation, the levels of these sidelobes can be important because they can cause confusion with regard to the direction to the signal source. In environments with interfering users, the beam pattern can be used to reduce power from interfering users at different directions. High sidelobes indicate the potential for higher levels of interference power at the output of a receive beamformer. A reasonable question is, do sidelobes matter? For many wireless communication applications, they are not important. If there is significant multipath such that the notion of line-of-sight beam patterns is not valid, then the idea of sidelobes for a line-of-sight beam pattern has little applicability. Also, if there is a single line-of-sight source, then accepting energy from other directions in addition to receiving energy from the intended direction will cause no adverse effects. Similarly, given a small number of interferers, if adaptive processing is used, then the sidelobes can be distorted to avoid accepting energy from the potential interferers. Once again, there are no adverse effects for most applications. Conversely, if line-of-sight propagation is a reasonable model, and either there is a very large number of interferers so that adaptivity is not effective, or adaptivity is not possible, then sidelobe levels can be important. If the array is being used for direction-of-arrival estimation in the presence of significant noise, then the sidelobes can be important because of the potential of confusing the correct angle of arrival with an angle corresponding to a sidelobe direction.

Circular array example

As an example, consider a regular circular receive array with n_r element positions given by x_m, such that
$$ \{\mathbf{x}_m\}_1 = r\cos(2\pi m/n_r)\,, \qquad \{\mathbf{x}_m\}_2 = r\sin(2\pi m/n_r)\,. \quad (6.23) $$

The steering vector is given by
$$ \{\mathbf{v}(\phi)\}_m = \frac{e^{-i2\pi r[\cos(2\pi m/n_r)\cos(\phi) + \sin(2\pi m/n_r)\sin(\phi)]}}{\sqrt{n_r}}\,, \quad (6.24) $$

where r is the radius of the array measured in wavelengths, and φ indicates the direction to the source. Consider an eight-antenna regular circular array with a radius of one wavelength. The geometry is displayed in Figure 6.2. Here we employ a matched-filter beamformer optimized for φ = 0. This angle is sometimes denoted “boresight.” Conversely, the angle along the array is sometimes denoted “end fire.” While for a circular array these definitions make little sense, they are used regularly, and are more sensible for linear arrays. The matched-filter beamformer is equal to the anticipated array response. Relative power at the output of the beamformer as a function of transmitter angle is given in Figure 6.3.
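The pattern in Figure 6.3 can be regenerated from Equations (6.21) and (6.24) with a few lines of code; the sketch below (ours, not from the text) computes the normalized beam pattern of this eight-element circular array for a matched beamformer pointed at φ = 0.

```python
import numpy as np

n_r, r = 8, 1.0                      # elements; radius in wavelengths
m = np.arange(n_r)

def v(phi):
    # Circular-array steering vector, Equation (6.24)
    phase = -2 * np.pi * r * (np.cos(2 * np.pi * m / n_r) * np.cos(phi)
                              + np.sin(2 * np.pi * m / n_r) * np.sin(phi))
    return np.exp(1j * phase) / np.sqrt(n_r)

w = v(0.0)                           # matched beamformer at boresight
phis = np.deg2rad(np.linspace(-180, 180, 721))
rho = np.array([np.abs(w.conj() @ v(p)) ** 2 for p in phis])  # Eq. (6.21)
print(10 * np.log10(rho.max()))      # 0 dB at the mainlobe, phi = 0
```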

Figure 6.2 Antenna array geometry for an eight-element array of radius one wavelength.

Figure 6.3 The beam pattern for an eight-element circular array with radius of one wavelength.

The relative power at the output of the beamformer is typically denoted an antenna array beam pattern. The region around φ = 0 is the mainlobe of the pattern. The width of the mainlobe in terms of radians is very approximately given by the wavelength divided by the aperture, in this case about 1/2 or


a little less than 30 degrees. The aperture is the length of the array.3 For a beamformer optimized for φ = 0, the amount of power accepted from other directions can be relatively high. In this example, at the angle of about ±85 degrees, the attenuation is only down by 5 dB. This region of relatively low attenuation is denoted a sidelobe. It is often desirable to minimize the height of these sidelobes to minimize interference from undesired sources. The various approaches to do this include both adaptive and nonadaptive techniques.

6.3 Linear arrays

For a linear receive antenna array in the {x}₁–{x}₂ plane with the antennas along the {x}₂ axis starting at the origin (as seen in Figure 6.4), with regular antenna spacing d, the positions are given by
$$ \{\mathbf{x}_m\}_2 = (m-1)\, d\,. \quad (6.25) $$

The inner product between the wavevector and the position of the antenna (determined by angle φ) is given by
$$ \mathbf{k}\cdot\mathbf{x}_m = -\frac{2\pi}{\lambda}\,(m-1)\, d\,\sin(\phi)\,. \quad (6.26) $$
In the special case of a line-of-sight transmitter, the array response is given by
$$ \{\mathbf{v}(\phi)\}_m = \frac{e^{-i\frac{2\pi(m-1)d\sin(\phi)}{\lambda}}}{\sqrt{n_r}}\,, \quad (6.27) $$
where the arbitrary normalization is chosen by using the term √n_r so that the magnitude of the array response is 1,
$$ \|\mathbf{v}(\phi)\| = 1\,. \quad (6.28) $$

Given this formulation, the beam pattern is given by
$$
\rho_{v(\phi_0)}(\phi) = \|\mathbf{v}^{\dagger}(\phi)\,\mathbf{v}(\phi_0)\|^2
= \left|\frac{1}{n_r}\sum_{m=0}^{n_r-1} e^{i\frac{2\pi m d \sin(\phi)}{\lambda}}\, e^{-i\frac{2\pi m d \sin(\phi_0)}{\lambda}}\right|^2
= \frac{1}{n_r^2}\left|\sum_{m=0}^{n_r-1} e^{i\frac{2\pi m d[\sin(\phi)-\sin(\phi_0)]}{\lambda}}\right|^2. \quad (6.29)
$$

In particular, consider an eight-antenna regular linear array with spatial sampling of 1/2 wavelength. This sampling is the spatial equivalent of Nyquist sampling in the temporal domain. The geometry of the array is displayed in 3

The notion of aperture is not precisely defined. Sometimes it is useful to define it in terms of the root-mean-square size of an array, because this corresponds to a parameter developed by the Cramer–Rao bound. Often a sufficient definition is the largest length between any two elements.

Antenna arrays

2

Axis wavelengths

3

2

1

x

180

0 2

1 x

0 1 1 Axis wavelengths

2

Figure 6.4 Antenna array geometry for an eight-element linear array with spacing of

1/2 wavelength.

Figure 6.4. For a matched-filter beamformer optimized for φ = 0 (along the {x}1 axis), with steering vector 1 v(φ0 ) = √ nr ⎛ 1 ⎜ 1 ⎜ 1=⎜ . ⎝ .. 1



⎟ ⎟ ⎟, ⎠

(6.30)

the relative power at the output of the beamformer as a function of transmitter angle is given in Figure 6.5. As with the circular array, the region around φ = 0 is the mainlobe of the pattern. The width of the mainlobe is approximately given in radians by the wavelength divided by the aperture. In this case, the beamwidth is about 1/4 radians or a little less than 15 degrees. In this example, the peak sidelobes are lower than the peak by about 13 dB. In fact, because of the rotational symmetry about the {x}2 axis, energy received from various angles will be equal in response for the line along a cone at the given angle as displayed in Figure 6.6. In this particular example in a plane, the rotational symmetry creates an exact forward–backward ambiguity in the beamformer, causing the acceptance of power to be equal at φ = 0 and φ = 180 degrees. For isotropic antenna elements, the Nyquist spacing for an antenna array is d = λ/2. At this spacing, there will be no ambiguities over the range of angles φ ∈ (−π/2, π/2) for signals in the {x}1 –{x}2 plane. Out of this plane, there can be some confusion. The direction from which energy is preferentially received

Figure 6.5 The beam pattern for an eight-element linear array with spacing of 1/2 wavelength.

Figure 6.6 For a linear array, the direction to the source and the cone of ambiguity.

is determined unambiguously up to a cone generated by rotating a ray at the steering angle about an axis along which the antenna array lies as seen in Figure 6.6. If each antenna in the array has some beam pattern so that it only receives energy over some limited range of angles, the Nyquist sampling distance would be larger without ambiguity. This discussion assumes that each of the antennas is pointed in the same direction. While it is convenient to discuss isotropic antennas here, many practical systems employ antenna arrays with antennas or subarrays that have some gain on their own.
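For this uniform linear array, Equation (6.29) is a one-line computation; the sketch below (ours) evaluates the pattern of the eight-element, half-wavelength-spaced array and recovers the roughly 13 dB peak-sidelobe level noted above.

```python
import numpy as np

n_r, d = 8, 0.5                  # elements; spacing in wavelengths
m = np.arange(n_r)
phi0 = 0.0                       # boresight pointing direction

phis = np.deg2rad(np.linspace(-90, 90, 1801))
# Equation (6.29): |(1/n_r) sum_m exp(i 2 pi m d [sin(phi) - sin(phi0)])|^2
rho = np.abs(np.exp(1j * 2 * np.pi * d
                    * np.outer(np.sin(phis) - np.sin(phi0), m))
             .sum(axis=1) / n_r) ** 2

# Peak sidelobe: largest response outside the mainlobe (first nulls fall at
# sin(phi) = 1/(n_r d) for half-wavelength spacing)
mainlobe = np.abs(np.sin(phis)) < 1 / (n_r * d)
print(10 * np.log10(rho[~mainlobe].max()))   # about -13 dB
```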


6.3.1 Beam pattern symmetry for linear arrays

Because of the construction of linear arrays, matched-filter array beam patterns evaluated for these arrays have a reflection symmetry about the pointing direction. The beam pattern for an array pointing in direction φ₀ receiving power from φ is given by
$$
\rho_{v(\phi_0)}(\phi) = \|\mathbf{v}^{\dagger}(\phi)\,\mathbf{v}(\phi_0)\|^2
= \frac{1}{n_r^2}\left|\sum_{m=0}^{n_r-1} e^{i\frac{2\pi m d[\sin(\phi)-\sin(\phi_0)]}{\lambda}}\right|^2
$$
$$
= \frac{1}{n_r^2}\left|\sum_{m=0}^{n_r-1}\cos\!\left(\frac{2\pi m d[\sin(\phi)-\sin(\phi_0)]}{\lambda}\right)\right|^2
+ \frac{1}{n_r^2}\left|\sum_{m=0}^{n_r-1}\sin\!\left(\frac{2\pi m d[\sin(\phi)-\sin(\phi_0)]}{\lambda}\right)\right|^2. \quad (6.31)
$$

The array response is symmetric under the transformation
$$ [\sin(\phi)-\sin(\phi_0)] \;\to\; -[\sin(\phi)-\sin(\phi_0)]\,. \quad (6.32) $$

If the reference direction is along boresight φ0 = 0, then the symmetry is observed for the transformation φ → −φ. This symmetry is broken in two-dimensional arrays.

6.3.2 Fourier transform interpretation

By considering a discrete sampling in sin(φ), the beamformer can be constructed from the discrete Fourier transform (DFT) of the antenna weighting vector, which is defined by the existence of antennas on a regular lattice. We consider a periodic or regular sampling of the sine of the angle φ,
$$ u = \sin(\phi) \in [-1, 1]\,. \quad (6.33) $$

The samples in u are represented by regular samples qΔu, where q ∈ {−M/2, ..., M/2} is the index parameter and Δu is the distance between samples in u. The value of Δu is chosen so that there are M+1 samples from u = −1 to u = 1. For convenience, it is assumed that M is an even integer. The value of the angular step size Δu is given by
$$ \Delta u = \frac{2}{M+1-1} = \frac{2}{M}\,. \quad (6.34) $$


For this discussion, it will be convenient to define the antenna weighting vector a ∈ ℂ^{M×1}, indexed from 0 to M−1,
$$
\{\mathbf{a}\}_m = \begin{cases} \{\mathbf{v}(\phi_0)\}_m & m \in \{0,\ldots,n_r-1\} \\ 0 & m \in \{n_r,\ldots,M-1\} \end{cases} \quad (6.35)
$$
The vector contains information about both the existence of an antenna at some lattice position and the phasing of that element. Given these definitions and the assumption of λ/2 antenna spacing, the beam pattern is given by
$$
\rho_a(\phi = \arcsin[q\,\Delta u]) = \|\mathbf{v}^{\dagger}(\phi)\,\mathbf{v}(\phi_0)\|^2
= \left|\sum_{m=0}^{n_r-1} \{\mathbf{v}(\arcsin[q\,\Delta u])\}_m^{*}\,\{\mathbf{a}\}_m\right|^2
= \left|\frac{1}{\sqrt{n_r}}\sum_{m=0}^{n_r-1} e^{i\pi m q\,\Delta u}\,\{\mathbf{a}\}_m\right|^2
$$
$$
= \left|\frac{1}{\sqrt{n_r}}\sum_{m=0}^{M-1} e^{i\pi m q\,\Delta u}\,\{\mathbf{a}\}_m\right|^2
= \left|\frac{1}{\sqrt{n_r}}\sum_{m=0}^{M-1} e^{i\frac{2\pi m q}{M}}\,\{\mathbf{a}\}_m\right|^2, \quad (6.36)
$$

so that the beam pattern is now evaluated at a discrete set of points determined by qΔu. Here we employ the observation that extending the range of summation from n_r−1 to M−1 indices has no effect because of the zero entries in the antenna weighting vector a. The argument of the norm operator, which we will denote the complex beam pattern b(q) as a function of q, is given by
$$ b(q) = \frac{1}{\sqrt{n_r}}\sum_{m=0}^{M-1} e^{i\frac{2\pi m q}{M}}\,\{\mathbf{a}\}_m\,. \quad (6.37) $$

The M+1 values of q are given by
$$ q \in \{-M/2,\; -M/2+1,\; \ldots,\; M/2-1,\; M/2\}\,. \quad (6.38) $$

However, because the exponential has the same value for arguments with imaginary components separated by integral multiples of 2π, the first and the (M+1)th indices are redundant,
$$ e^{i\frac{2\pi}{M} m q} = e^{i\frac{2\pi}{M} m (q+M)}\,. \quad (6.39) $$

So only the values of
$$ q \in \{-M/2,\; -M/2+1,\; \ldots,\; M/2-1\} \quad (6.40) $$


are necessary. For convenience, consider the admittedly strange ordering of possible values of q,
$$ q \in \left\{0,\, 1,\, \ldots,\, \frac{M}{2}-2,\, \frac{M}{2}-1,\, -\frac{M}{2},\, -\frac{M}{2}+1,\, -\frac{M}{2}+2,\, \ldots,\, -1\right\}. \quad (6.41) $$
Notice that the negative values have been moved to the right-hand side of the list. Furthermore, because of the same modularity characteristic used above, a new index variable q′ is constructed spanning the same space of angles,
$$ q' \in \{0,\, 1,\, \ldots,\, M-2,\, M-1\}\,. \quad (6.42) $$

The difference between q and q′ is analogous to the difference in considering the discrete Fourier transform of a time-domain sequence. The spectrum can be represented in an approximately symmetric domain about zero frequency, or the spectrum can be represented by a domain covering zero to approximately twice the maximum physical frequency. Given this new index variable q′, the complex amplitude beam pattern b(q′ → q), now indexed by q′ rather than q, is given by
$$ b(q' \to q) = \frac{1}{\sqrt{n_r}}\sum_{m=0}^{M-1} e^{i\frac{2\pi m q'}{M}}\,\{\mathbf{a}\}_m\,. \quad (6.43) $$

By defining the vector b,
$$ \mathbf{b} = \begin{pmatrix} b(q'=0) \\ b(q'=1) \\ \vdots \\ b(q'=M-1) \end{pmatrix}, \quad (6.44) $$

(6.44)

the explicit relationship between the complex beam pattern and the discrete Fourier transform of the antenna weighting vector can be found, √ M (6.45) b = √ Fa, nr where F is the discrete Fourier transform matrix defined in Equation (2.225). Consequently, the beam pattern is given by the discrete Fourier transform of the antenna weighting vector, ρa (arcsin[q Δu]) =

6.3.3

M 2 {F a}q ′ . nr

(6.46)
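Equation (6.46) says that the lattice-array beam pattern is just an FFT of the zero-padded antenna weighting vector. The sketch below (ours) demonstrates this for the eight-element half-wavelength array; the same code accepts the sparse weighting vectors of Section 6.4.1 unchanged.

```python
import numpy as np

n_r, M = 8, 64                   # antennas on the lattice; angular samples
a = np.zeros(M, dtype=complex)
a[:n_r] = 1 / np.sqrt(n_r)       # Equation (6.35) with boresight phasing

# Equation (6.37) uses the +i kernel; numpy's ifft supplies it with a 1/M
# factor, so scale by M to recover b(q').
b = M * np.fft.ifft(a) / np.sqrt(n_r)
rho = np.abs(b) ** 2             # beam pattern, Equation (6.46)

u = 2 * np.fft.fftfreq(M)        # map q' back to u = sin(phi) in [-1, 1)
print(rho[0])                            # unity response at boresight
sidelobes = rho[np.abs(u) >= 2 / n_r]    # outside the mainlobe nulls
print(10 * np.log10(sidelobes.max()))    # peak sidelobe, about -13 dB
```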

6.3.3 Continuous Fourier transform approximation

It is sometimes useful to consider a continuous approximation of the antenna array because it enables convenient intuition. This approximation is developed by considering the limiting forms of Equations (6.35) and (6.46). The development is similar to the discrete version discussed previously.


In the continuous version of the vector inner product, one can think of an infinite-dimensional vector indexed by some continuous parameter, as introduced in Section 2.1.4. While we will not be particularly involved in the technical details, this vector space is sometimes referred to as an infinite-dimensional Hilbert space. For example, the antenna weighting vector a can be indexed by the position along the linear array x, under the assumption of some pointing direction u. The continuous antenna weighting function is denoted
$$ \mathbf{a} \;\to\; f_a(x; u)\,, \quad (6.47) $$

where the function is defined as a function of the distance x along the antenna array in units of wavelength. Inner products in this complex infinite-dimensional space are given by integrating over the indexing parameter; in this case it is x. The complex infinite-dimensional inner product between functions f(x) and g(x) is denoted
$$ \langle f(x), g(x) \rangle = \int dx\; f(x)\, g^{*}(x)\,. \quad (6.48) $$

(6.49)

where function is defined in terms of the direction u = sin(φ). In a continuous analog to the steering vector, the phasing of fa (x; u) is given by ⎧ ⎪ 0 ;xL

where L is the length of the antenna array in units of wavelength. The inner product between the continuous steering vectors at some direction u and boresight u = 0 is given by  fa (x; u), fa (x; 0) = dx fa (x; u) fa∗ (x; 0)  1 L dx e−i2π u x = L 0  −1  −i2π u L e −1 = i2π u L  e−iπ u L  iπ u L e − e−iπ u L = i2π u L sin (πuL) = e−iπ u L πuL (6.51) = e−iπ u L sinc (uL) ,

186

Antenna arrays

where the normalized sinc function4 is given by sinc(x) = sin(πx)/(πx). This inner product is similar to the Fourier transform of the continuous antenna weighting pointed toward boresight fb (u). The continuous version of the complex pattern b is given by the Fourier transform of the antenna weighting function fa (x; u),  fb (u) = dx e−i2π u x fa (x; 0)  L 1 dx e−i2π u x =√ L 0 L 1 i −i2π u x  √ e =  L 2πu 0

1 1 (1 − e−i2π u L ) =√ L i2πu √ = L e−iπ u L sinc(uL) .

(6.52)

Similar to the discrete case, the inner product between steering vectors is related to the Fourier transform of the array weighting vector, 1 fa (x; 0), fa (x; u) = √ fb (u) . L

(6.53)

The normalized beam pattern is given by the magnitude squared of the normalized inner product of continuous steering vectors or functions, 2

ρ(u) =

fa (x; 0), fa (x; u) fa (x; 0), fa (x; 0) fa (x; u), fa (x; u) 2

= fa (x; 0), fa (x; u)

= sinc2(uL) 1 2 fb (u) . = (6.54) L By using this analysis, the peak sidelobe in the beam power pattern ρ(u) is about 13 dB below the peak, as seen in Figure 6.7. The value of the peak sidelobe can be reduced by using windowing or tapering techniques [137]. However, windowing increases the width of the mainlobe and causes loss in peak gain.

6.4

Sparse arrays

6.4.1

Sparse arrays on a regular lattice There is no fundamental reason to require filled regular arrays. In fact, for many applications, performance can be improved by using sparse arrays given the same number of antennas. For a given number of antenna elements, the width of the 4

While it sometimes unfortunately causes confusion, there are two commonly used normalizations for the sinc function. In this text, the normalized sinc function (preferred in the communications and signal processing literature) is used rather than the unnormalized sinc function, sin(x)/(x), that is commonly used in mathematics.

6.4 Sparse arrays

187

fa x;0

(a)

1 0.5 0 0.5

0

0.5 xL

1

(b)

Beam Pattern, ␳ u

0 5 10 15 20 25 30

3

2

1 0 1 Direction, u L

2

3

Figure 6.7 (a) The continuous antenna weighting function in terms of position along

the antenna array, assuming the array is phased to point perpendicularly to the array (along boresight). (b) The beam pattern of the continuous antenna array approximation as a function of the product of the direction u = sin(φ) and the array side L.

mainlobe narrows if a sparse array is used. The approximate mainlobe width of a linear array is proportional to the inverse of the root-mean-square antenna position, under the assumption that the average element position is zero. However, the sidelobe levels increase. Depending upon the application, the increase in sidelobe levels may or may not be of interest. For linear antenna arrays constrained to a regular lattice of spatial sample points, Equation (6.46) can be employed by using a sparse antenna weighting vector a in which the values of 1 and 0 are mixed over some aperture (assuming that the array is phased to point perpendicularly to the array). For a given number of antennas, a sparse array will have a narrower main beam and higher peak sidelobes. An example form of a beam pattern for sparse arrays on a regular lattice is given by
$$ \rho(\phi = \arcsin[q\,\Delta u]) = \frac{M}{n_r}\,\|\{\mathbf{F}\,\mathbf{a}\}_{q'}\|^2 = \|\{\mathbf{b}\}_{q'}\|^2\,, \qquad \mathbf{b} = \frac{\sqrt{M}}{\sqrt{n_r}}\,\mathbf{F}\,\mathbf{a}\,, \quad (6.55) $$

where a notional example of a sparse array is given by
$$ \mathbf{a} = (1 \cdots 0 \cdots 1 \cdots 0 \cdots 1\; 0\; 0 \cdots 0)^{T}\,. \quad (6.56) $$
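The FFT formulation applies to a sparse weighting vector without modification; the sketch below (ours) compares a filled eight-element array against eight elements scattered over a 32-point lattice, illustrating the narrower mainlobe of the sparse geometry.

```python
import numpy as np

def pattern(occupied, M=512):
    # Beam pattern of a boresight-phased lattice array, Eqs. (6.35) and (6.46)
    n_r = len(occupied)
    a = np.zeros(M, dtype=complex)
    a[np.asarray(occupied)] = 1 / np.sqrt(n_r)
    return np.abs(M * np.fft.ifft(a) / np.sqrt(n_r)) ** 2

def half_power_width(rho):
    # Walk outward from boresight (q' = 0) until the pattern drops 3 dB
    right = next(q for q in range(len(rho)) if rho[q] < 0.5)
    return 2 * right * 2 / len(rho)          # width in u = sin(phi) units

rng = np.random.default_rng(7)
filled = np.arange(8)                        # eight contiguous lattice sites
sparse = np.sort(rng.choice(32, size=8, replace=False))  # eight of 32 sites

for name, occ in (("filled", filled), ("sparse", sparse)):
    print(name, half_power_width(pattern(occ)))  # sparse mainlobe is narrower
```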


For applications such as direction finding, with sparse arrays the sidelobes can become sufficiently high to cause angle-estimation confusion. For communications, these sidelobes are not typically an issue because, in complicated multipath environments, knowledge of the direction is not particularly meaningful. However, the sparse array geometry can improve the spatial diversity, typically improving performance.

6.4.2 Irregular random sparse arrays

There is no fundamental reason to require that antennas are placed on a regular lattice. Random arrays with continuous irregular spacing are considered here. For the sake of discussion, it is assumed that the antennas are constrained to a linear array. Although we typically think of these arrays in the context of reception, so that we use n_r to indicate the number of antennas, here the arrays could be used for either transmit or receive. The sidelobe distribution is the same. Approximations for the distribution of peak sidelobes for random arrays are addressed in References [83, 4, 234, 294]. Given an array of length L with n_r elements placed with a uniform random distribution within the region of length L in units of wavelength, the probability for the ratio of the magnitude of the peak sidelobe to the mainlobe (r² = s.l./m.l.) in terms of power to be less than some value η² is approximately given by
$$
\Pr(r < \eta) \approx \left[1 - e^{-n_r \eta^2}\right] \exp\left\{-2L\sqrt{\frac{4\pi^2\, n_r\, \eta^2}{12\pi}}\; e^{-n_r \eta^2}\right\}, \quad (6.57)
$$

(6.58)

The probability of being below the threshold at a given point Pbelow (η) is found by observing that the sum of independent random phasors tends toward a Gaussian distribution because of the central limit theorem. The power received at the output of a matched-spatial-filter beamformer constructed for direction u0 = sin φ0 given an array response from some other direction u = sin φ, for angles φ0 and φ from boresight, is given by the inner product between the two steering vectors v† (u0 ) v(u), where the steering vector v(u) ∈ Cn r ×1 . For some direction

6.4 Sparse arrays

189

u, the value of the complex ratio of the sidelobe to mainlobe amplitude z is given by z(u) =

v† (u0 ) v(u) v† (u0 ) v(u0 )

= v† (u0 ) v(u) r(u) = z(u) ,

(6.59)

where, in this section, it is assumed that the norm of the steering vector is 1, v(u) = 1 ∀ u.

(6.60)

To make contact with previous sections, the square of the sidelobe to mainlobe amplitude ratio is the normalized beam pattern, which is given by ρ(u) = r2 (u) .

(6.61)

It is assumed that the steering vector can be constructed by using a simple plane wave model for a linear array. The element of the steering vector associated with the mth randomly placed antenna is given by 1 {v(u)}m = √ eik u x m , nr

(6.62)

where k = 2π/λ is the wavenumber for wavelength λ, and xm is the position of the mth randomly placed antenna. Note that because we assume a linear array, the distance along the array can be expressed by using the scalar xm . The value of the ratio of sidelobe to mainlobe amplitude is given by z(u) = v† (u0 ) v(u) 1  −ik u x m ik u 0 x m = e e nr m 1  −ik x m (u −u 0 ) e . = nr m

(6.63)

By invoking the central limit theorem (assuming u = u0 ) in the limit of a large number of antennas nr , the probability distribution for z(u) in the sidelobe region is approximated well by a complex Gaussian distribution. The distribution for the magnitude of the ratio r(u) is then given by a Rayleigh distribution. The variance of the sum of independent phasors is given by the number of elements nr . Because of the 1/nr amplitude normalization, the mean of sidelobe-to-mainlobe power ratio r2 (or the variance of sidelobe-to-mainlobe power ratio) is 1/nr , dPbelow (r) = 2 nr r e−n r

r2

dr .

(6.64)

By integrating the probability density from zero to the threshold η, the value for the probability of being below the threshold at some sidelobe level at a specific

Antenna arrays

Prob SL Below Threshold

190

1.0 0.8 0.6 0.4 0.2 30

25

20 15 10 Threshold dB

5

0

Figure 6.8 Probability that sidelobe-to-mainlobe ratio for sparse arrays with 25 (light

gray), 20, 15, and 10 (black) randomly placed antennas is less than some value.

direction away from the mainlobe is given by Pbelow (η) = =



η

0 η

dPbelow (r) dr 2 nr r e−n r

r2

0

= 1 − e−n r

η2

.

(6.65)

This approximation for the probability of ratio of sidelobe-to-mainlobe amplitude r to be less than some threshold is displayed in Figure 6.8. This probability is displayed for 10, 15, 20, and 25 antenna elements. Because the approximation employs the central limit theorem, a relatively large number of elements is assumed. It is interesting to note that this distribution is not explicitly dependent upon the size of the array, although it is assumed that the array is sparse, that is, the aperture is large in units of wavelength compared to the number of antenna elements. If the array were small in number of elements or aperture, then the validity of the central limit theorem approximation would be questionable. Simple approximation At this point, one could observe that the Nyquist sampling density is 1/(2L), and that the scanned space for u is from −1 to 1, so that there are up to approximately 2 · 2L/λ − 1 distinct sidelobes. Under the generous approximation that they are independent, the probability of exceeding the threshold is approximately given by  Pr(r < η) ≈ 1 − e−n r

η2

4L /λ−1

.

(6.66)

6.4 Sparse arrays

191

Threshold crossing formulation A better approximation [83, 4], by using techniques discussed in Reference [27] and references therein, is given by constructing the probability Pn oC r (η) that there are no crossings within the field of view (the domain of potential sidelobe directions). The probability density of no sidelobe crossing from below to above the threshold within some small region du of u is denoted pcr (u; η). The probability of not crossing over the entire observed region is given by integrating over the probability density as a function of direction u for threshold η,  Pn oC r (η) = 1 − du pcr (u; η) pcr (u; η) du = Pr {η − r′ (u) du < r(u) < η}  η  ∞ ′ pcr (u; η) du = dr dr p(r, r′ ) , 0

=





η −r ′ (u )du

dr′ p(η, r′ ) r′

0



du ,

(6.67)

where r′ = r′ (u) = ∂r(u)/∂u is the derivative of the sidelobe-to-mainlobe amplitude ratio with respect to u that is evaluated at u, and p(r, r′ ) is the joint probability density values of sidelobe-to-mainlobe ratio r and the derivative of the ratio with respect to direction u. The explicit functional dependence of the sidelobe-to-mainlobe ratio r and and its derivative r′ is usually suppressed, but is sometimes displayed for clarity. The integral over r′ is only for positive values because only upward crossings are considered. Gaussian probability density model The probability density for the probability of a particular sidelobe-to-mainlobe amplitude ratio r and the derivative of the ratio with respect to direction u (denoted r′ ) is given by − r e p(r, r ) =  ′2 2 2πσr σr ′



r2 2σ r



+ 2 σr ′2 r



,

(6.68)

where σr² and σr'² are the variances of the real part of the complex amplitude ratio and of the derivative of the real part, respectively. To develop this density, a few intermediate results are required. It is assumed that the probability of the value of the real zr and imaginary zi parts of the sidelobe-to-mainlobe complex amplitude ratio z and their derivatives (zr' and zi') can be represented by real independent Gaussian distributions. The probability density for these variables is given by

$$p(z_r, z_i, z_r', z_i') = \frac{1}{(2\pi)^2\, \sigma_r \sigma_i \sigma_{r'} \sigma_{i'}}\; e^{-\frac{1}{2}\left(\frac{z_r^2}{\sigma_r^2} + \frac{z_i^2}{\sigma_i^2} + \frac{z_r'^2}{\sigma_{r'}^2} + \frac{z_i'^2}{\sigma_{i'}^2}\right)}\,, \qquad (6.69)$$

where σi² and σi'² are the variances of the imaginary portion of the amplitude ratio and its derivative with respect to the direction parameter u. In the sidelobes, it is expected that there is symmetry between the real and imaginary portions of the amplitude ratio. Consequently, the variances for the real and imaginary parts are equal, σi² = σr² and σi'² = σr'². The probability density for the real and imaginary parts of the amplitude ratio can be expressed in terms of polar coordinates, which are given by

$$z_r = r \cos(\theta)\,, \quad z_i = r \sin(\theta)\,, \qquad (6.70)$$

where the parameter θ = arctan(zi/zr) is implicitly a function of u. The derivatives with respect to the direction parameter u of the real and imaginary parts of the amplitude ratio are given by

$$z_r' = r' \cos(\theta) - r\, \theta' \sin(\theta)\,, \quad z_i' = r' \sin(\theta) + r\, \theta' \cos(\theta)\,, \qquad (6.71)$$

where θ' is the derivative of θ with respect to u. The sums of the squares of the real and imaginary components of the amplitude ratio and of their derivatives are given by

$$z_r^2 + z_i^2 = r^2\,, \quad z_r'^2 + z_i'^2 = r'^2 + r^2 \theta'^2\,. \qquad (6.72)$$

The probability density in terms of the polar coordinates is given by

$$p(r, r', \theta, \theta') = \frac{r^2}{(2\pi)^2\, \sigma_r^2\, \sigma_{r'}^2}\; e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2 + r^2\theta'^2}{\sigma_{r'}^2}\right)}\,. \qquad (6.73)$$

To find the probability density for the magnitude of the amplitude ratio and its derivative, an integration over the angular components is performed,

$$p(r, r') = \int d\theta \int d\theta'\; p(r, r', \theta, \theta')
= \int d\theta \int d\theta'\; \frac{r^2}{(2\pi)^2\, \sigma_r^2\, \sigma_{r'}^2}\; e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2 + r^2\theta'^2}{\sigma_{r'}^2}\right)}$$
$$= \int d\theta\; \frac{r^2}{(2\pi)^2\, \sigma_r^2\, \sigma_{r'}^2}\; \frac{\sqrt{2\pi}\, \sigma_{r'}}{r}\; e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2}{\sigma_{r'}^2}\right)}
= \frac{r}{\sqrt{2\pi}\, \sigma_r^2\, \sigma_{r'}}\; e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2}{\sigma_{r'}^2}\right)}\,, \qquad (6.74)$$

where the integral over θ is from 0 to 2π, and the integral over θ' is from −∞ to ∞. The next issue is to determine the variances of the amplitude ratio and its derivative. The variance of the sidelobe-to-mainlobe ratio σ² is 1/nr. Consequently, the variance of the real (and equivalently the imaginary) component of the amplitude ratio is given by

$$\sigma_r^2 = \frac{1}{2\, n_r}\,. \qquad (6.75)$$

The evaluation of the variance of the derivative of the amplitude ratio is slightly more involved. The variance of the derivative term is given by the negative second derivative of the autocovariance as its argument approaches zero [241],

$$\sigma_{r'}^2 = -\lim_{\delta u \to 0} \frac{\partial^2}{\partial \delta u^2}\, R_{z_r, z_r}(\delta u)\,, \qquad (6.76)$$

where R_{z_r,z_r}(δu) is the autocovariance⁵ of the real component of the amplitude ratio zr. This relationship can be understood by noting that for a zero-mean stationary (which means the statistics are independent of angle far from the mainlobe) process a(u), the autocovariance R_{a,a}(u1, u2) is given by

$$R_{a,a}(u_1, u_2) = \left\langle a(u_1)\, a^*(u_2) \right\rangle$$
$$R_{a',a}(u_1, u_2) = \frac{\partial}{\partial u_1}\, R_{a,a}(u_1, u_2)$$
$$R_{a',a'}(u_1, u_2) = \frac{\partial^2}{\partial u_1\, \partial u_2}\, R_{a,a}(u_1, u_2)\,. \qquad (6.77)$$

If the process is stationary (angle independent) in u (which is a reasonable assumption far from the mainlobe if the probability distribution of the locations of the antennas changes slowly across the aperture), then the autocovariance is a function of the difference δu = u1 − u2 and is given by

$$R_{a',a'}(\delta u) = R_{a',a'}(u_1, u_2) = \frac{\partial^2}{\partial u_1\, \partial u_2}\, R_{a,a}(u_1 - u_2) = -\frac{\partial^2}{\partial \delta u^2}\, R_{a,a}(\delta u)\,, \qquad (6.78)$$

where ∂/∂u1 → ∂/∂δu and ∂/∂u2 → −∂/∂δu because δu = u1 − u2. Consequently, the variance of a'(u) (which is the derivative of some random process a with respect to the variable u), denoted σ_{a'}², is found by evaluating the negative second derivative of the autocovariance as its argument approaches zero:

$$\sigma_{a'}^2 = -\lim_{\delta u \to 0} \frac{\partial^2}{\partial \delta u^2}\, R_{a,a}(\delta u)\,. \qquad (6.79)$$

In the sidelobe region, it is expected that the autocovariances of the real and imaginary components are approximately equal.

⁵ Here we use R_{·,·}(·) to indicate covariance rather than the correlation used elsewhere in the text.


To calculate the autocovariance of the amplitude ratio, we will make a couple of observations. Because, in this section, length is represented in units of wavelength, the wavenumber k is equal to 2π, and the contribution of the mth antenna is proportional to e^{i2π ym u}. It is assumed here that the antenna elements are distributed randomly with uniform probability. Consequently, the probability density for the elements' positions is given by p(y) = 1/L. Thus, the autocovariance of the real component of the amplitude ratio is given by

$$R_{z_r,z_r}(\delta u) = \frac{1}{n_r^2} \sum_{m=1}^{n_r} \left\langle \cos(2\pi y_m u)\, \cos(2\pi y_m [u + \delta u]) \right\rangle
= \frac{1}{n_r} \int dy\; p(y)\, \cos(2\pi y u)\, \cos(2\pi y [u + \delta u])$$
$$= \frac{1}{2\, n_r} \left( \frac{\sin(\pi L\, \delta u)}{\pi L\, \delta u} + \frac{\sin[\pi L (\delta u + 2u)]}{\pi L (\delta u + 2u)} \right)
\approx \frac{1}{2\, n_r}\, \mathrm{sinc}(\delta u\, L)\,, \qquad (6.80)$$

where, because we are considering the sidelobes, the second term in the second line of the above equation (decaying rapidly with the term L[δu + 2u]) is small. The variance of the derivative of the real part of the amplitude ratio is given by

$$\sigma_{r'}^2 = -\lim_{\delta u \to 0} \frac{\partial^2}{\partial \delta u^2}\, R_{z_r,z_r}(\delta u)
= -\frac{1}{n_r}\lim_{\delta u \to 0} \left( \frac{\sin(\delta u\, L \pi)}{L \pi\, \delta u^3} - \frac{\cos(\delta u\, L \pi)}{\delta u^2} - \frac{L \pi\, \sin(\delta u\, L \pi)}{2\, \delta u} \right)
= \frac{L^2 \pi^2}{6\, n_r}\,. \qquad (6.81)$$

By substituting the values for the variances, the joint probability density for the sidelobe-to-mainlobe ratio and the derivative of the ratio is given by

$$p(r, r') = \frac{r}{\sqrt{2\pi}\, \sigma_r^2\, \sigma_{r'}}\; e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2}{\sigma_{r'}^2}\right)}
= \frac{r}{\sqrt{2\pi}\, \frac{1}{2 n_r} \sqrt{\frac{L^2\pi^2}{6 n_r}}}\; e^{-\frac{1}{2}\left(\frac{r^2}{1/(2 n_r)} + \frac{r'^2}{L^2\pi^2/(6 n_r)}\right)}
= r\, \sqrt{\frac{12\, n_r^3}{\pi^3 L^2}}\; e^{-\frac{1}{2}\left(2\, n_r\, r^2 + \frac{6\, n_r\, r'^2}{L^2 \pi^2}\right)}\,, \qquad (6.82)$$

where Equations (6.75) and (6.81) are employed to determine the values for σr² and σr'². The probability pcr(u; η) of crossing the threshold at some direction u [formulated in Equation (6.67)] is given by integrating the joint probability density over all derivative values near the threshold value of interest, dr ≈ r' du. The resulting probability density is a function of the direction and threshold level only, and is given by

$$p_{cr}(u; \eta)\, du = \left( \int_0^{\infty} dr'\; r'\, p(\eta, r') \right) du = \sqrt{\frac{\pi\, n_r}{3}}\; \eta\, L\, e^{-n_r \eta^2}\, du\,. \qquad (6.83)$$

From the above discussion, the total probability of the peak sidelobe being below the threshold η is approximated by

$$P(r < \eta) \approx P_{noCr}(\eta)\; P_{below}(\eta) = P_{noCr}(\eta) \left(1 - e^{-n_r \eta^2}\right)$$
$$P_{noCr}(\eta) = \lim_{M \to \infty} \prod_{m=1}^{M} \left[ 1 - \frac{u_{max} - u_{min}}{M}\; p_{cr}(u; \eta) \right]
= \lim_{M \to \infty} \prod_{m=1}^{M} \left[ 1 - \frac{u_{max} - u_{min}}{M}\; p_{cr}\!\left(u_{min} + \frac{m-1}{M}(u_{max} - u_{min}); \eta\right) \right]$$
$$= \lim_{M \to \infty} \prod_{m=1}^{M} \left[ 1 - \frac{2}{M} \int_0^{\infty} dr'\; r'\, p(\eta, r') \right]
= \lim_{M \to \infty} \prod_{m=1}^{M} \left[ 1 - \frac{2}{M} \sqrt{\frac{\pi\, n_r}{3}}\; \eta\, L\, e^{-n_r \eta^2} \right]
= e^{-\sqrt{4\pi n_r / 3}\; \eta\, L\, e^{-n_r \eta^2}}\,, \qquad (6.84)$$

where u_max and u_min are the limits of direction and are given by 1 and −1 respectively, and pcr(u; η) is evaluated by using Equation (6.83). Here we employ the observations that the argument of the product above is independent of u and that, from Equation (2.15), the product can be expressed as an exponential by using the following relationship,

$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x\,. \qquad (6.85)$$

Consequently, the product of the probability of not exceeding the threshold at a given initial angle times the probability of not crossing the threshold η over the visible region is given by

$$\Pr(r < \eta) \approx \left[1 - e^{-n_r \eta^2}\right] e^{-\sqrt{4\pi n_r / 3}\; \eta\, L\, e^{-n_r \eta^2}}\,. \qquad (6.86)$$

This approximation for the probability that the ratio of peak sidelobe-to-mainlobe power r² is less than some threshold is displayed in Figure 6.9. Here an ensemble of random arrays with uniform element likelihood under an aperture constraint of 50 wavelengths is displayed for 10, 15, 20, and 25 antenna elements.

Figure 6.9 Under an aperture constraint of 50 wavelengths, the probability that the peak sidelobe-to-mainlobe power ratio r² for random sparse arrays with 25 (light gray), 20, 15, and 10 (black) randomly placed antennas is less than some threshold (in dB).

There are a couple of useful interpretations of the probability Equation (6.86), depending upon how one wishes to use this result. One use of this result is for applications in which the random distribution of elements over some given area cannot be designed. An example might be for randomly distributed nodes in a sensor network. Given some desirable peak sidelobe level, the number of nodes or size of aperture can be modified so that there is a high probability that the peak sidelobe is no larger than some design value. Alternatively, one could use these probabilities as a system design tool. One can quickly define system parameters for which some good sparse antenna array exists with some high likelihood. Later, a specific array can be designed either by some optimization process or by simply simulating multiple throws of random arrays until one finds an array that satisfies the sidelobe requirements.
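As a sketch of the design use just described, the following Python fragment (a minimal illustration; the target level, confidence, and search range are our arbitrary choices) evaluates Equation (6.86) and searches for the smallest number of elements that keeps the peak sidelobe below a desired level with a desired probability.

```python
import numpy as np

def p_peak_below(eta, n_r, L):
    """Equation (6.86): probability that the peak sidelobe-to-mainlobe
    amplitude ratio is below eta, for an aperture of L wavelengths."""
    return (1.0 - np.exp(-n_r * eta**2)) * np.exp(
        -np.sqrt(4.0 * np.pi * n_r / 3.0) * eta * L * np.exp(-n_r * eta**2))

L = 50.0                      # aperture in wavelengths (illustrative)
eta = 10 ** (-5.0 / 20.0)     # -5 dB peak sidelobe target (amplitude ratio)
target = 0.9                  # required confidence

# Smallest element count meeting the requirement (brute-force search).
for n_r in range(2, 200):
    if p_peak_below(eta, n_r, L) >= target:
        print(f"n_r = {n_r} gives Pr(peak < -5 dB) = "
              f"{p_peak_below(eta, n_r, L):.3f}")
        break
```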

6.5 Polarization-diverse arrays

It is common in discussions of antenna arrays to ignore the existence of polarization. This is a reasonable approximation if the antennas in the array all have identical polarization. Any mismatch in the polarizations between the transmitter and receiver is folded into the overall attenuation. However, by ignoring polarization, some opportunities for increasing diversity and improving angle estimation are missed.

6.5.1 Polarization formulation

A wavefront propagating in free space has the freedom to propagate with some linear combination of two independent polarizations [154, 290, 229]. These polarizations are often expressed as horizontal and vertical, or, by a simple transformation, left and right circular polarizations.

Figure 6.10 Propagation of wavefront along the {x}3-axis with horizontal and vertical polarization.

The horizontal and vertical polarization axes are defined by the directions of the electric field oriented horizontally and vertically in the plane perpendicular to the direction of propagation. While line-of-sight propagation is considered within this chapter, more generally it is worth noting that because many channels have complicated multipath scattering, energy propagating along any direction has some probability of getting from the transmitter to the receiver. Consequently, all directions of propagation and polarization are of interest. In particular, one can imagine employing an array of three crossed dipole antennas, all centered at some point. The simple plane wave description in Equation (6.5) is extended here to include polarization. The vector of electric field components e(x, t) for each direction as a function of location and time, under the assumption that a plane wave is propagating along the {x}3-axis such that {k}1 = {k}2 = 0, is given by

$$\mathbf{e}(x, t) = \Re\left\{ \begin{pmatrix} \psi_1(k, t) \\ \psi_2(k, t) \\ 0 \end{pmatrix} \right\}
= \Re\left\{ \begin{pmatrix} a_1 \\ a_2 \\ 0 \end{pmatrix} e^{i(\{k\}_3 \{x\}_3 - \omega t)} \right\}\,, \qquad (6.87)$$

where ψ1(k, t) and ψ2(k, t) are the plane wave solutions as a function of time t to the wave equation propagating along k, associated with polarization along the {x}1-axis and polarization along the {x}2-axis, respectively. The parameters a1 and a2 are the complex amplitudes along the 1 and 2 axes. Given the defined geometry (as seen in Figure 6.10), the horizontal and vertical polarizations of the wavefront are associated with the {x}1-axis and {x}2-axis, respectively. The third element is 0 because there is no electric field along the direction of propagation in free space. A horizontally polarized wavefront (as seen in Figure 6.10) is characterized by

$$a_1 \neq 0\,, \ \text{and} \ a_2 = 0\,. \qquad (6.88)$$

Similarly, a vertically polarized wavefront is characterized by

$$a_2 \neq 0\,, \ \text{and} \ a_1 = 0\,. \qquad (6.89)$$

An arbitrary linear polarization is given by

$$a_1 = r_1\, e^{i\phi}\,, \ \text{and} \ a_2 = r_2\, e^{i\phi}\,, \qquad (6.90)$$

where r1 and r2 are real parameters, and a1 and a2 have a common complex phase, φ. This basis corresponds to starting with a horizontally or vertically polarized wave and rotating the axes (or physically rotating the antenna) about the direction of propagation. An arbitrary elliptical polarization allows for complex values

$$a_1 \neq 0\,, \ \text{and} \ a_2 \neq 0\,. \qquad (6.91)$$

Right- and left-handed circularly⁶ polarized wavefronts correspond to

$$a_1 = b\,, \ \text{and} \ a_2 = \pm i\, b\,, \qquad (6.92)$$

where b is some complex-valued parameter. The positive sign for the ±i term indicates a right-handed polarization, and the negative sign indicates a left-handed polarization. The circular basis for electric polarization, {right, left, propagation}, for e_cir is related to the linear basis for a wavefront propagating along {x}3 by

$$\mathbf{e}_{cir}(x, t) = \Re\left\{ \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \psi_1(k, t) \\ \psi_2(k, t) \\ 0 \end{pmatrix} \right\}\,. \qquad (6.93)$$

⁶ The definition of left versus right is arbitrary. Some authors employ the reverse of the definition used here.
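The following small Python sketch (an illustration only; the helper name to_circular is ours) applies the transformation in Equation (6.93) to the complex amplitudes and confirms that the right- and left-handed wavefronts of Equation (6.92) each map onto a single circular component.

```python
import numpy as np

# Linear-to-circular change of basis from Equation (6.93),
# acting on the complex amplitudes (a1, a2, 0).
T = np.array([[1 / np.sqrt(2), -1j / np.sqrt(2), 0],
              [1 / np.sqrt(2),  1j / np.sqrt(2), 0],
              [0,               0,               1]])

def to_circular(a1, a2):
    """Return {right, left, propagation} components for linear amplitudes."""
    return T @ np.array([a1, a2, 0.0])

b = 1.0 + 0.5j
print("right-handed (a2 = +i b):", np.round(to_circular(b,  1j * b), 6))
print("left-handed  (a2 = -i b):", np.round(to_circular(b, -1j * b), 6))
print("horizontal   (a2 = 0)  :", np.round(to_circular(b, 0.0), 6))
```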

Problems

6.1 Considering the plane wave approximation for the reception of a narrowband signal on a continuous linear antenna array in a plane with a source located along boresight of the array, evaluate the root-mean-square error as a function of range R, length L, and signal wavelength.

6.2 Construct unit-normalized steering vectors as a function of azimuthal angle φ and angle from zenith θ for the following geometries: (a) an eight-element square array with elements at the corners at ±1 wavelength and at the point halfway between each pair of corners along the periphery of the square; (b) an 11-element spiral that begins at the origin and ends at 2 wavelengths along the {x}1 axis, following the polar form in which the radius of the nth element is rn = a(n − 1) and the angle of the nth element is φn = b(n − 1), where a and b are undetermined coefficients.

6.3 For a four-element linear regular array with 1 wavelength spacing that incorporates the array element amplitude pattern

$$a(\theta) = 2 \cos[\sin(\theta)\, \pi / 2]\,,$$

where the angle θ is measured from boresight of the array: (a) formulate an unnormalized steering vector in a plane; (b) assuming the array is pointed at boresight, evaluate the ratio of the power beam pattern of this array to that of an eight-element array with isotropic elements and half-wavelength spacing; (c) assuming the array is pointed at θ = π/4, evaluate the ratio of the power beam pattern of this array to that of an eight-element array with isotropic elements and half-wavelength spacing.

6.4 For the continuous array construction discussed in Section 6.3.3 that exists over the spatial domain 0 ≤ x ≤ L, find the normalized power beam pattern under the assumption that the receive array uses the following tapering or windowing functions.
(a) Triangular:
$$w(x) = \begin{cases} \dfrac{2x}{L}\,, & 0 \le x \le L/2 \\[4pt] 2 - \dfrac{2x}{L}\,, & L/2 < x \le L \end{cases}\,.$$
(b) Hamming:
$$w(x) = 0.54 - 0.46 \cos\!\left(\frac{2\pi x}{L}\right)\,.$$

6.5 Consider the linear sparse array problem with randomly placed elements that have uniform probability density, assuming 32 isotropic antennas; find the aperture in terms of wavelengths such that the peak sidelobe is no worse than −5 dB 90% of the time.

6.6 Consider the linear sparse array design problem assuming 32 isotropic antennas; find the aperture in terms of wavelengths such that a designer would likely find an array with peak sidelobe no worse than −5 dB after ten random array evaluations.

6.7 By assuming that a source is in the plane spanned by {x}1 and {x}2, construct the unnormalized steering vector for an array of three phase centers with half-wavelength spacing along the {x}2 axis, assuming that the elements are constructed with small electric dipoles and that: (a) the array elements and single source are vertically (along {x}3) polarized; (b) the array elements are horizontally polarized along the {x}2 axis and the single source is horizontally polarized (in the {x}1–{x}2 plane) and perpendicular to the direction of propagation; (c) with the array phased to point at the source, find the ratio of received power for the horizontally polarized to vertically polarized systems as a function of angle.


6.8 By assuming that a source is in the plane spanned by {x}1 and {x}2, construct the unnormalized steering vector for an array of three phase centers with half-wavelength spacing along the {x}2 axis, assuming that the elements are constructed with small electric dipoles and that at each phase center there is an electric dipole along each axis ({x}1, {x}2, and {x}3), and that: (a) the source is vertically (along {x}3) polarized; (b) the source has arbitrary polarization; (c) with the array phased to point at the source, find the ratio of received power for the arbitrarily polarized to vertically polarized sources as a function of angle.

7 Angle-of-arrival estimation

Although angle estimation of a source is not typically of significant interest in communication systems, because angle-of-arrival estimation is commonly addressed in treatments of multiple-antenna systems, we will consider it here in this chapter. In addition, some of the tools and intuition are helpful when considering adaptive multiple-antenna receivers. Furthermore, there are special cases of communications systems for which line-of-sight propagation is valid and for which angle-of-arrival estimation is of value. Angle estimation to the source is sometimes denoted direction finding. There is a large body of work addressing this topic [294, 223], and numerous useful approaches (for example, those in References [16, 266]), many of which will be skipped for brevity. In general, the direction to the source requires both azimuthal and elevation information, but for most examples here it is assumed that the source is in the plane of the array, so only azimuthal information encoded in the angle φ is required. A few approaches are considered here as an introduction to the area. Within this chapter, it is assumed that any multipath scattering is minimal and that the signal is not dispersive across the array; that is, the array is small compared with the speed of light divided by the bandwidth of the signal. This assumption is sometimes denoted the narrowband signal assumption. Furthermore, to simplify the introduction, it is assumed that the direction can be characterized by a single angle φ. Here it is assumed that multiple samples of the received signal are available. The number of samples is denoted ns. The model of the received signal Z ∈ C^{nr×ns} for the nr receive antennas is given by

$$Z = \sum_{m=1}^{n_t} a_m\, v(\phi_m)\, s_m + N\,, \qquad (7.1)$$

where am is the common complex attenuation from the transmitter to the receiver for the mth (of the nt ) sources that has array response v(φm ) ∈ Cn r ×1 (or steering vector), which contains the phase differences in propagation because of small relative delays from the transmitter to each receive antenna as discussed in Section 6.1.2. These phases are a function of the propagation wavevector km ∈ C3×1 . Array responses for a single incoming signal are expected to exist somewhere along the array manifold defined by the continuous set of vectors defined by v(φ) for all φ (or more generally v(k) for all wavevectors k). The additive noise for the receiver is given by N ∈ Cn r ×n s . Here it is assumed that


the entries in N are such that the columns are independently drawn from a unit-variance complex Gaussian distribution with potentially correlated rows. The transmitted complex baseband signal for the mth single-antenna transmitter is given by sm ∈ C^{1×ns}. For many of the examples discussed here, it is assumed that the entries in sm are unknown and are independently drawn from a complex Gaussian distribution with unit variance, although estimation bounds are considered for both known signal and Gaussian signal models. On the basis of the assumptions described here, the signal-plus-noise spatial covariance matrix Q ∈ C^{nr×nr} is given by

$$Q = \frac{1}{n_s}\left\langle Z\, Z^{\dagger}\right\rangle
= \sum_{m=1}^{n_t} |a_m|^2\, v(\phi_m)\, \frac{\left\langle s_m\, s_m^{\dagger}\right\rangle}{n_s}\, v^{\dagger}(\phi_m) + \frac{1}{n_s}\left\langle N\, N^{\dagger}\right\rangle
= \sum_{m=1}^{n_t} |a_m|^2\, v(\phi_m)\, v^{\dagger}(\phi_m) + R\,, \qquad (7.2)$$

where the external interference-plus-noise covariance matrix is given by R = ⟨N N†⟩/ns, so that the columns in N are drawn from the complex Gaussian distribution CN(0, R), and an average unit-variance signal ⟨sm sm†⟩ = ns is used. When attempting to estimate the angle of arrival, it is often assumed that the noise is not spatially correlated. However, by spatially whitening the data with respect to the interference-plus-noise covariance matrix R, many traditional approaches can be exploited. An effect of spatial whitening is to flatten the eigenvalues along the directions in Hilbert space associated with the whitening matrix. Consequently, if a matrix is whitened with respect to itself, then the result is the identity matrix, which has all unit eigenvalues. A whitened spatial signal covariance matrix Q̃ is given by

$$\tilde{Q} = R^{-1/2}\, Q\, R^{-1/2} = \frac{1}{n_s}\, R^{-1/2} \left\langle Z\, Z^{\dagger} \right\rangle R^{-1/2}
= \sum_{m=1}^{n_t} |a_m|^2\, R^{-1/2} v(\phi_m)\, \frac{\langle s_m s_m^{\dagger}\rangle}{n_s}\, v^{\dagger}(\phi_m)\, R^{-1/2} + \frac{1}{n_s} \left\langle R^{-1/2}\, N\, N^{\dagger}\, R^{-1/2} \right\rangle$$
$$= \sum_{m=1}^{n_t} |a_m|^2\, R^{-1/2}\, v(\phi_m)\, v^{\dagger}(\phi_m)\, R^{-1/2} + I\,, \qquad (7.3)$$

where the square root of a matrix R−1/2 satisfies the relationship R−1/2 R−1/2 = R−1 . Thus, environments with more complicated correlated noise can be considered. One complication of operating in the whitened space is that the norm of the steering vector R−1/2 v(φ) may be dependent upon direction.
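As an illustration of this model, the following Python sketch (parameter values are arbitrary choices for demonstration, and the exponential-correlation interference model is ours) generates data according to Equation (7.1), estimates the covariance of Equation (7.2), and whitens it per Equation (7.3) using a matrix square root from an eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_s = 8, 200                      # antennas, samples (illustrative)
y = np.arange(n_r) * 0.5               # element positions in wavelengths

def v(phi):
    """Steering vector for a half-wavelength-spaced linear array."""
    return np.exp(2j * np.pi * y * np.sin(phi))

# One Gaussian source plus spatially correlated interference-plus-noise.
phi0, a0 = 0.3, 1.0
R = 0.5 ** np.abs(np.subtract.outer(np.arange(n_r), np.arange(n_r)))  # toy R
s = (rng.standard_normal(n_s) + 1j * rng.standard_normal(n_s)) / np.sqrt(2)
N = np.linalg.cholesky(R) @ (rng.standard_normal((n_r, n_s))
                             + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
Z = a0 * np.outer(v(phi0), s) + N                    # Equation (7.1)

Q_hat = Z @ Z.conj().T / n_s                          # estimate of Q, (7.2)
w_vals, w_vecs = np.linalg.eigh(R)
R_inv_half = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.conj().T
Q_tilde = R_inv_half @ Q_hat @ R_inv_half             # whitened, (7.3)
print("largest whitened eigenvalues:",
      np.round(np.sort(np.linalg.eigvalsh(Q_tilde))[-3:], 2))
```

With one source, all but one eigenvalue of the whitened covariance should cluster near unity, consistent with the identity-plus-rank-one structure in Equation (7.3).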

7.1 Maximum-likelihood angle estimation with known reference

For some applications, much may be known about the waveform that is being transmitted. The details of what is known about the signal being transmitted may vary from something about the statistics of the signal to knowing the exact transmitted signal [31], such as a known training sequence. This knowledge of the waveform can be exploited to improve the angle-estimation performance. In this discussion, it is assumed that there is a single source antenna, nt = 1. If the signal of the transmitter of interest s ∈ C^{1×ns} is known, in the presence of Gaussian spatially correlated noise with known spatial covariance R ∈ C^{nr×nr}, then the probability density function of an observed data matrix Z conditioned upon the known transmitted signal s, the unknown overall complex attenuation a, and the unknown azimuthal angle φ is given by

$$p(Z|s, a, \phi) = \frac{1}{\pi^{n_r n_s}}\; e^{-\mathrm{tr}\{[Z - a\, v(\phi)\, s]^{\dagger} R^{-1} [Z - a\, v(\phi)\, s]\}}\,. \qquad (7.4)$$

Because s is a known reference, we can define ‖s‖² = ns, which is stronger than just knowing that its expectation equals ns. To find an estimate of the signal direction, the likelihood is maximized. Because the logarithm monotonically increases with its argument, maximizing the likelihood is equivalent to maximizing the logarithm of the likelihood. If the log-likelihood is denoted f(Z|s, a, φ), then it is given by

$$f(Z|s, a, \phi) = -\mathrm{tr}\{[Z - a\, v(\phi)\, s]^{\dagger}\, R^{-1}\, [Z - a\, v(\phi)\, s]\} + b$$
$$= -\mathrm{tr}\{R^{-1/2}\, [Z - a\, v(\phi)\, s]\, [Z - a\, v(\phi)\, s]^{\dagger}\, R^{-1/2}\} + b$$
$$= -\mathrm{tr}\{R^{-1/2}\, [Z Z^{\dagger} - a\, v(\phi)\, s Z^{\dagger} - a^{*}\, Z s^{\dagger} v^{\dagger}(\phi) + n_s\, |a|^2\, v(\phi)\, v^{\dagger}(\phi)]\, R^{-1/2}\} + b\,, \qquad (7.5)$$

where b = −log(π^{nr ns}) is a constant containing parameters not dependent upon direction or attenuation. The matrix identity tr{A B} = tr{B A} has been employed. To remove the nuisance parameter a containing the overall complex attenuation, the log-likelihood is maximized with respect to a,

$$\frac{\partial}{\partial a^{*}}\, f(Z|s, a, \phi) = \mathrm{tr}\{s^{\dagger}\, v^{\dagger}(\phi)\, R^{-1}\, [Z - a\, v(\phi)\, s]\}\,, \qquad (7.6)$$

where Wirtinger calculus, discussed in Section 2.8.2, is invoked. Because the log-likelihood is negative and the expression is quadratic in attenuation, the stationary point must be a maximum. The likelihood is maximized when

$$0 = \mathrm{tr}\{s^{\dagger}\, v^{\dagger}(\phi)\, R^{-1}\, [Z - a\, v(\phi)\, s]\}
= \mathrm{tr}\{v^{\dagger}(\phi)\, R^{-1}\, [Z - a\, v(\phi)\, s]\, s^{\dagger}\}
= \mathrm{tr}\{v^{\dagger}(\phi)\, R^{-1}\, Z s^{\dagger} - a\, v^{\dagger}(\phi)\, R^{-1}\, v(\phi)\, n_s\}$$
$$a_{max} = \frac{v^{\dagger}(\phi)\, R^{-1}\, Z\, s^{\dagger}}{n_s\, v^{\dagger}(\phi)\, R^{-1}\, v(\phi)}\,. \qquad (7.7)$$


Consequently, the log-likelihood optimized for the nuisance parameter a = a_max is given by

$$f(Z|s, a_{max}, \phi) = -\mathrm{tr}\Big\{R^{-1/2}\big[Z Z^{\dagger} - a_{max}\, v(\phi)\, s Z^{\dagger} - a_{max}^{*}\, Z s^{\dagger}\, v^{\dagger}(\phi) + n_s\, |a_{max}|^2\, v(\phi)\, v^{\dagger}(\phi)\big] R^{-1/2}\Big\} + b$$
$$= -\mathrm{tr}\{R^{-1/2}\, Z Z^{\dagger}\, R^{-1/2}\} + 2\, \frac{|v^{\dagger}(\phi)\, R^{-1}\, Z s^{\dagger}|^2}{n_s\, v^{\dagger}(\phi)\, R^{-1}\, v(\phi)} - \frac{|v^{\dagger}(\phi)\, R^{-1}\, Z s^{\dagger}|^2}{n_s\, v^{\dagger}(\phi)\, R^{-1}\, v(\phi)} + b$$
$$= -\mathrm{tr}\{R^{-1/2}\, Z Z^{\dagger}\, R^{-1/2}\} + \frac{|v^{\dagger}(\phi)\, R^{-1}\, Z s^{\dagger}|^2}{n_s\, v^{\dagger}(\phi)\, R^{-1}\, v(\phi)} + b\,, \qquad (7.8)$$

where a_max from Equation (7.7) has been substituted; the two cross terms each contribute the quadratic-form ratio once, and the quadratic term in a_max contributes its negative.

The portion of the log-likelihood that is a function of the direction is given by

$$f(Z|s, a_{max}, \phi) = \frac{|v^{\dagger}(\phi)\, R^{-1}\, Z s^{\dagger}|^2}{n_s\, v^{\dagger}(\phi)\, R^{-1}\, v(\phi)} + b_2\,, \qquad (7.9)$$

where b₂ = b − tr{R^{−1/2} Z Z† R^{−1/2}} is another constant independent of direction. The term

$$z = \frac{Z\, s^{\dagger}}{n_s} \qquad (7.10)$$


can be interpreted as an estimator of a vector proportional to the steering vector. The maximum-likelihood estimator of the direction under the assumption of a known reference signal is given by

$$\hat{\phi} = \mathrm{argmax}_{\phi}\; \frac{|v^{\dagger}(\phi)\, R^{-1}\, z|^2}{v^{\dagger}(\phi)\, R^{-1}\, v(\phi)}\,. \qquad (7.11)$$

If the norm of the steering vector is independent of direction and the interference-plus-noise covariance matrix R is white (that is, proportional to the identity matrix), then the estimate is given by

$$\hat{\phi} = \mathrm{argmax}_{\phi}\; |v^{\dagger}(\phi)\, z|^2\,. \qquad (7.12)$$
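A compact, standalone Python sketch of this estimator follows (the scenario, grid resolution, and unit-modulus training sequence are our arbitrary choices). It forms z per Equation (7.10) and scans Equation (7.11) over a grid of angles with R = I.

```python
import numpy as np

rng = np.random.default_rng(2)
n_r, n_s = 8, 64
y = np.arange(n_r) * 0.5                       # element positions, wavelengths

def v(phi):
    return np.exp(2j * np.pi * y * np.sin(phi))

# Known reference (unit-modulus training sequence), white noise R = I.
phi_true, a = 0.4, 0.3
s = np.exp(2j * np.pi * rng.random(n_s))        # ||s||^2 = n_s
Z = a * np.outer(v(phi_true), s) + (rng.standard_normal((n_r, n_s))
    + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)

z = Z @ s.conj() / n_s                           # Equation (7.10)
grid = np.linspace(-np.pi / 2, np.pi / 2, 2001)
metric = [abs(v(p).conj() @ z) ** 2 / (v(p).conj() @ v(p)).real for p in grid]
phi_hat = grid[int(np.argmax(metric))]           # Equation (7.11) with R = I
print(f"true angle {phi_true:.3f} rad, estimate {phi_hat:.3f} rad")
```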

7.2 Maximum-likelihood angle estimation with unknown signal

By assuming that transmitted signals of interest are randomly drawn from a Gaussian distribution with unknown but deterministic overall complex attenuations am, with steering vectors determined by unknown azimuthal angles φ1, φ2, ..., φnt, and with interference signals drawn from a Gaussian distribution as described in the previous section, the likelihood of a given value of Z is given by

$$p(Z|Q) = \frac{1}{\pi^{n_r n_s}\, |Q|^{n_s}}\; e^{-\mathrm{tr}\{Z^{\dagger}\, Q^{-1}\, Z\}}\,. \qquad (7.13)$$

For a given number of transmitters of interest in the presence of some external interference and noise characterized by interference-plus-noise covariance matrix R, the receive covariance matrix Q is characterized by

$$Q = \sum_{m=1}^{n_t} |a_m|^2\, v(\phi_m)\, v^{\dagger}(\phi_m) + R\,. \qquad (7.14)$$

The maximum-likelihood solution for the angle estimates is given by

$$\{\hat{\phi}_1, \hat{\phi}_2, \cdots, \hat{\phi}_{n_t}\} = \mathrm{argmax}_{a_m, \phi_m}\; p(Z|Q)$$
$$= \mathrm{argmax}_{a_m, \phi_m}\; \frac{1}{\pi^{n_r n_s}\, \left|\left[\sum_{m=1}^{n_t} |a_m|^2\, v(\phi_m)\, v^{\dagger}(\phi_m)\right] + R\right|^{n_s}}\; e^{-\mathrm{tr}\left\{Z^{\dagger} \left[\left(\sum_{m=1}^{n_t} |a_m|^2\, v(\phi_m)\, v^{\dagger}(\phi_m)\right) + R\right]^{-1} Z\right\}}\,, \qquad (7.15)$$

where the values of am are considered nuisance parameters. While this approach provides the maximum-likelihood solution, for many problems it is too expensive computationally. In addition, often the number of sources is not known, although the number of sources of interest can be estimated [339] by considering the eigenvalue distribution of the whitened spatial covariance matrix R−1/2 Q R−1/2 .

7.3 Beamscan

The most direct and possibly the most intuitive approach to estimate the angle of arrival is to scan a matched filter over all possible expected array responses. This

is similar to the analysis discussed in Section 6.2, in which the beam pattern is discussed, and is sometimes denoted beamscan. The beamscan approach is developed by considering the maximum-likelihood solution under the condition of a single transmitter and of spatially white noise. Under the assumption that there is a single random complex Gaussian source (nt = 1), the maximum-likelihood solution simplifies to

$$\hat{\phi} = \mathrm{argmax}_{a, \phi}\; p(Z|Q) = \mathrm{argmax}_{a, \phi}\; \frac{1}{\pi^{n_r n_s}\, \left|\left[|a|^2\, v(\phi)\, v^{\dagger}(\phi)\right] + R\right|^{n_s}}\; e^{-\mathrm{tr}\{Z^{\dagger}\, [|a|^2\, v(\phi)\, v^{\dagger}(\phi) + R]^{-1}\, Z\}}\,. \qquad (7.16)$$

Under the assumption of a single source, the determinant of the signal-plus-noise spatial covariance matrix Q is given by

$$|Q| = \left| |a|^2\, v(\phi)\, v^{\dagger}(\phi) + R \right|
= \left| |a|^2\, R^{-1/2}\, v(\phi)\, v^{\dagger}(\phi)\, R^{-1/2} + I \right| |R|
= \left( |a|^2\, v^{\dagger}(\phi)\, R^{-1}\, v(\phi) + 1 \right) |R| = \left( |a|^2\, \kappa + 1 \right) |R|\,, \qquad (7.17)$$

where the whitened inner product κ = v†(φ) R⁻¹ v(φ) is defined for convenience. Because the whitened signal-plus-noise spatial covariance matrix R^{−1/2} Q R^{−1/2} is represented by an identity matrix plus a rank-1 matrix (as presented in Equation (2.114)), its inverse is given by

$$Q^{-1} = \left( |a|^2\, v(\phi)\, v^{\dagger}(\phi) + R \right)^{-1}
= \left[ R^{1/2} \left( |a|^2\, R^{-1/2}\, v(\phi)\, v^{\dagger}(\phi)\, R^{-1/2} + I \right) R^{1/2} \right]^{-1}$$
$$= R^{-1/2} \left( |a|^2\, R^{-1/2}\, v(\phi)\, v^{\dagger}(\phi)\, R^{-1/2} + I \right)^{-1} R^{-1/2}
= R^{-1/2} \left( I - \frac{|a|^2\, R^{-1/2}\, v(\phi)\, v^{\dagger}(\phi)\, R^{-1/2}}{1 + |a|^2\, \kappa} \right) R^{-1/2}\,. \qquad (7.18)$$

Thus, the probability density for the received signal is given by

$$p(Z|Q) = \frac{1}{\pi^{n_r n_s}\, \left[ (|a|^2 \kappa + 1)\, |R| \right]^{n_s}}\; e^{-\mathrm{tr}\left\{ Z^{\dagger} R^{-1/2} \left( I - \frac{|a|^2\, R^{-1/2} v(\phi)\, v^{\dagger}(\phi)\, R^{-1/2}}{1 + |a|^2 \kappa} \right) R^{-1/2}\, Z \right\}}\,. \qquad (7.19)$$

This likelihood is maximized when the log of the likelihood is maximized, which is equivalent to maximizing

$$\hat{\phi} = \mathrm{argmax}_{a, \phi}\; \mathrm{tr}\left\{ Z^{\dagger} R^{-1/2} \left( \frac{|a|^2\, R^{-1/2}\, v(\phi)\, v^{\dagger}(\phi)\, R^{-1/2}}{1 + |a|^2 \kappa} \right) R^{-1/2}\, Z \right\} - n_s \log(1 + |a|^2 \kappa)\,. \qquad (7.20)$$


Now a couple of simplifying assumptions are employed. The interference-plus-noise covariance matrix R is assumed to be spatially white with unit-variance-normalized thermal noise so that it can be given by the identity matrix, and the norm of the array response vector v(φ) is a constant,

$$R = I\,, \quad \kappa = \|v(\phi)\|^2 = n_r\,. \qquad (7.21)$$

The maximization simplifies to

$$\hat{\phi} = \mathrm{argmax}_{a, \phi}\; \mathrm{tr}\left\{ \frac{|a|^2\, Z^{\dagger}\, v(\phi)\, v^{\dagger}(\phi)\, Z}{1 + |a|^2 \kappa} \right\} - n_s \log(1 + |a|^2 \kappa)
= \mathrm{argmax}_{\phi}\; v^{\dagger}(\phi)\, Z\, Z^{\dagger}\, v(\phi)\,, \qquad (7.22)$$

where terms independent of angle φ have been discarded, and the optimization of angle φ decouples from the optimization of the attenuation. When plotted, the quadratic term being maximized is a useful estimate of the energy received as a function of angle, ηbs(φ),

$$\eta_{bs}(\phi) = \frac{v^{\dagger}(\phi)\, Z\, Z^{\dagger}\, v(\phi)}{n_s} = v^{\dagger}(\phi)\, \hat{Q}\, v(\phi)\,, \qquad (7.23)$$

where Q̂ = Z Z†/ns is an estimate of the signal-plus-noise covariance matrix. When the scanned beam in Equation (7.23) points far from a source of energy, the value of the beamscan test statistic is approximately the noise energy per antenna times the number of antennas. When the scanned beam points toward a source, the output is approximately the received source power per antenna times the number of antennas squared, under the assumption of a fairly strong source. The term beamscan (or sometimes beamsum) is employed because the matched filter is scanned across possible angles. Beamscan is an estimate of the spatial energy response of the array. A variety of techniques with a similar angle-estimation goal to this approach are available. These estimators provide an estimate of the energy as a function of direction in a manner similar to the way spectral estimators provide estimates of energy as a function of frequency. Other directional energy estimation approaches impose additional constraints that distort the energy estimation and are often denoted pseudospectral estimators.
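A direct Python sketch of the beamscan statistic of Equation (7.23) follows (standalone and illustrative; the array and source parameters are our arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(3)
n_r, n_s = 10, 50
y = np.arange(n_r) * 0.5                          # half-wavelength spacing

def v(u):
    """Steering vector parameterized by u = sin(phi)."""
    return np.exp(2j * np.pi * y * u)

# One 0 dB Gaussian source at u = -0.55 in white noise.
s = (rng.standard_normal(n_s) + 1j * rng.standard_normal(n_s)) / np.sqrt(2)
N = (rng.standard_normal((n_r, n_s))
     + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
Z = np.outer(v(-0.55), s) + N

Q_hat = Z @ Z.conj().T / n_s                       # covariance estimate
u_grid = np.linspace(-1, 1, 801)
eta_bs = np.array([(v(u).conj() @ Q_hat @ v(u)).real for u in u_grid])
print("beamscan peak at u =", round(u_grid[int(np.argmax(eta_bs))], 3))
```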

7.4 Minimum-variance distortionless response

For some receive beamformer w ∈ C^{nr×1}, which is a function of a reference direction φ, and the total receive signal-plus-interference-plus-noise covariance matrix defined in Equation (7.2), the energy at the output of the beamformer is given by

$$\eta_w(\phi) = w^{\dagger}\, Q\, w\,. \qquad (7.24)$$


The minimum-variance distortionless response (MVDR) pseudospatial-spectral estimator (sometimes denoted the Capon method [51]) attempts to minimize the energy accepted by the receive beamformer w, while requiring a distortionless response (where distortionless indicates that the inner product of the beamformer and the ideal array response is a known constant),

$$w^{\dagger}\, v(\phi) = n_r\,. \qquad (7.25)$$

To minimize ηw(φ) as a function of w subject to the constraint, a Lagrange multiplier λ can be used. The value of w that minimizes ηw(φ) is given by

$$\mathrm{argmin}_{w}\; w^{\dagger}\, Q\, w - \lambda\, w^{\dagger}\, v(\phi) \;\;\Rightarrow\;\; w = \lambda\, Q^{-1}\, v(\phi)\,. \qquad (7.26)$$

By imposing the distortionless response constraint w† v(φ) = nr, the values of λ, and thus w, are found,

$$\lambda = \frac{n_r}{v^{\dagger}(\phi)\, Q^{-1}\, v(\phi)}\,, \quad w = \frac{n_r\, Q^{-1}\, v(\phi)}{v^{\dagger}(\phi)\, Q^{-1}\, v(\phi)}\,. \qquad (7.27)$$

The minimum-variance distortionless-response energy estimator is then given by

$$\eta_{mvdr}(\phi) = w^{\dagger}\, Q\, w = \frac{n_r\, v^{\dagger}(\phi)\, Q^{-1}}{v^{\dagger}(\phi)\, Q^{-1}\, v(\phi)}\; Q\; \frac{n_r\, Q^{-1}\, v(\phi)}{v^{\dagger}(\phi)\, Q^{-1}\, v(\phi)} = \frac{n_r^2}{v^{\dagger}(\phi)\, Q^{-1}\, v(\phi)}\,. \qquad (7.28)$$

Because only estimates of the spatial receive covariance matrix are available, Q is replaced with the estimate Q̂, so that the estimator is given by

$$\eta_{mvdr}(\phi) \approx \frac{n_r^2}{v^{\dagger}(\phi)\, \hat{Q}^{-1}\, v(\phi)}\,. \qquad (7.29)$$

With this normalization, when the pseudospectrum estimator defined in Equation (7.29) is pointed far from a source, the output is approximately the product of the noise energy per sample and the number of receive antennas. Alternatively, when the pseudospectrum estimator is pointed toward an isolated source, the output is approximately the product of the signal energy per sample and the number of receive antennas squared, although the level of the output is sensitive to any sources with similar array responses.
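The MVDR pseudospectrum of Equation (7.29) can be sketched in a few lines of Python (again standalone and illustrative; the small diagonal loading term is our addition to keep the estimated covariance well conditioned with few samples).

```python
import numpy as np

rng = np.random.default_rng(4)
n_r, n_s = 10, 50
y = np.arange(n_r) * 0.5

def v(u):
    return np.exp(2j * np.pi * y * u)

# Two closely spaced 0 dB sources at u = -0.1 and u = 0.1 in white noise.
S = (rng.standard_normal((2, n_s)) + 1j * rng.standard_normal((2, n_s))) / np.sqrt(2)
A = np.stack([v(-0.1), v(0.1)], axis=1)
N = (rng.standard_normal((n_r, n_s))
     + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
Z = A @ S + N

Q_hat = Z @ Z.conj().T / n_s + 1e-3 * np.eye(n_r)   # diagonal loading (our choice)
Q_inv = np.linalg.inv(Q_hat)
u_grid = np.linspace(-1, 1, 801)
eta = np.array([n_r**2 / (v(u).conj() @ Q_inv @ v(u)).real for u in u_grid])
i = int(np.argmax(eta))
print(f"MVDR global peak at u = {u_grid[i]:.3f}")
```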

7.5 MuSiC

Another common spatial pseudospectral estimator is the multiple-signal classification (MuSiC) approach [276, 296]. Here it is presented in the context of a known noise and interference environment characterized by the spatial interference-plus-noise covariance matrix R ∈ C^{nr×nr}. In this approach, the signal subspace and the noise subspace in the estimate of the nr receive antenna spatial covariance matrix Q ∈ C^{nr×nr} [defined in Equation (7.2)] are identified. This identification is often done by setting some threshold in the eigenvalue distribution of the spatial covariance matrix Q, so that eigenvalues less than some threshold are considered noise eigenvalues, and their associated eigenvectors can be used to span the noise space. For sorted eigenvalues λ_{m+1} > λ_m ∀ 1 ≤ m < nr, the spatially whitened covariance matrix can be decomposed as

$$R^{-1/2}\, Q\, R^{-1/2} = \underbrace{\sum_{m=1}^{M} \lambda_m\, e_m\, e_m^{\dagger}}_{noise} + \underbrace{\sum_{m=M+1}^{n_r} \lambda_m\, e_m\, e_m^{\dagger}}_{signal}\,, \qquad (7.30)$$

where the first M eigenvalues are below the signal threshold,

$$M = \max\{m : \lambda_m \le \mathrm{threshold}\}\,, \qquad (7.31)$$

and the eigenvector for the mth eigenvalue is denoted em. Under the assumption of unit-norm eigenvectors, the projection operator (as discussed in Section 2.3.5) for the noise subspace P_noise ∈ C^{nr×nr} is given by

$$P_{noise} = \sum_{m=1}^{M} e_m\, e_m^{\dagger}\,. \qquad (7.32)$$

If there is energy coming from some direction φ, then it is expected that the quadratic form

$$v^{\dagger}(\phi)\, R^{-1/2}\, P_{noise}\, R^{-1/2}\, v(\phi) \qquad (7.33)$$

would be small, because array responses would be contained in the signal space, which is orthogonal to the noise projection matrix P_noise. Conversely, in other directions with "noise-like" spatial responses, this quadratic form would be approximately equal to v†(φ) R⁻¹ v(φ). Thus, the ratio of these two values would be a reasonable indicator of energy, and the MuSiC spatial pseudospectral estimator η_music(φ) is given by

$$\eta_{music}(\phi) = \frac{v^{\dagger}(\phi)\, R^{-1}\, v(\phi)}{v^{\dagger}(\phi)\, R^{-1/2}\, P_{noise}\, R^{-1/2}\, v(\phi)}\,. \qquad (7.34)$$

Because the spatial signal-plus-noise covariance matrix (or a whitened version of it) can typically only be estimated, the MuSiC spatial pseudospectral estimator is generally implemented using the estimated spatial covariance matrix Q̂ (and, if whitening is used, R̂). To be clear, MuSiC is a relatively poor estimator for the energy of received signals. With this normalization, the pseudospectrum is approximately unity when directed far from any source. When the pseudospectrum is pointed toward a source, the output is approximately equal to the energy per sample times the number of antennas squared. However, any given example is strongly dependent upon the instantiation of noise, so it fluctuates significantly.

Figure 7.1 Beamscan, minimum-variance distortionless response, and MuSiC pseudospectra for a regular, filled, 10-antenna array with 1/2-wavelength spacing. The SNR of the source per receive antenna is 0 dB, and the uncorrelated sources are located at sin(φ) = −0.55, −0.1, 0.1. The receive covariance is estimated using 50 samples, and the assumed noise floor is 0 dB per antenna.
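The following Python sketch (standalone, with arbitrary parameters; keeping the n_t largest eigenvalues as the signal subspace is our heuristic stand-in for the threshold of Equation (7.31)) implements Equations (7.30)-(7.34) for white noise (R = I).

```python
import numpy as np

rng = np.random.default_rng(5)
n_r, n_s, n_t = 10, 50, 3
y = np.arange(n_r) * 0.5

def v(u):
    return np.exp(2j * np.pi * y * u)

u_src = [-0.55, -0.1, 0.1]
A = np.stack([v(u) for u in u_src], axis=1)
S = (rng.standard_normal((n_t, n_s)) + 1j * rng.standard_normal((n_t, n_s))) / np.sqrt(2)
N = (rng.standard_normal((n_r, n_s))
     + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
Z = A @ S + N

Q_hat = Z @ Z.conj().T / n_s
lam, E = np.linalg.eigh(Q_hat)             # ascending eigenvalues
E_noise = E[:, : n_r - n_t]                # noise subspace (R = I, no whitening)
P_noise = E_noise @ E_noise.conj().T       # Equation (7.32)

u_grid = np.linspace(-1, 1, 2001)
eta = np.array([(v(u).conj() @ v(u)).real /
                (v(u).conj() @ P_noise @ v(u)).real for u in u_grid])
print("MuSiC pseudospectrum maximum near u =",
      round(u_grid[int(np.argmax(eta))], 3))
```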

7.6 Example comparison of spatial energy estimators

In Figure 7.1, an example of a comparison of pseudospectra is displayed. In this example, the receive array consists of ten isotropic antenna elements at half-wavelength spacing. The signal-to-noise ratio is assumed to be 0 dB per receive antenna. Three sources are located at sin(φ) = −0.55, −0.1, and 0.1. The receive signal-plus-noise covariance matrix is estimated using 50 samples. As can be observed in the figure, the isolated source at sin(φ) = −0.55 can be identified easily by using any of the statistics. The beamscan pseudospectrum has the broadest peak, and the MuSiC approach has the narrowest peak. It is tempting to think that the widths of the peaks are an indication of the angular accuracy of the approach; however, this is misleading. For the two peaks that are close together, sin(φ) = −0.1 and 0.1, the story is somewhat more complicated. The minimum-variance distortionless response and the MuSiC approaches attempt to isolate the two sources, while in the beamscan pseudospectrum the two peaks might be confused with a single peak. The two peaks are most easily identified when MuSiC is used. In this example, it was assumed that the characteristics of the receive antenna arrays were known perfectly. In practice, errors in antenna position or other mismatches between the assumed array manifold and the real manifold


(denoted calibration errors) can be significant. Each pseudospectrum estimator has different sensitivities to these errors. It is not uncommon for pseudospectrum estimators to look promising theoretically, but perform badly in practice because of calibration errors [356].

7.7 Local angle-estimation performance bounds

For a given array geometry and SNR of the signal of interest, there are two performance metrics of interest, as seen in Figure 7.2. The first is the angle-estimation error bound, asymptotic in the number of samples, given by the Cramer–Rao formulation discussed in Section 3.8 and in References [77, 312, 172]. The Cramer–Rao parameter performance bound is a local bound. It assumes that the probability for an estimator to confuse the value of a parameter with a value far from the actual value is zero. The second metric is the threshold point. This is the point at which an estimator diverges dramatically from the asymptotic estimation bound. Because of similarities in the array response (array manifold) at different physical angles, there is some probability of making large angle-estimation errors by confusing an observed array response in noise with the wrong region of the array manifold. These regions of potential confusion can be seen in the regions of relatively high sidelobes in the array response as a function of angle. The high sidelobes are an indication that, while the angles are significantly different, the array responses are similar. Consequently, when the angle is estimated in the presence of significant noise, the array response associated with the erroneous angle at a large sidelobe can sometimes be a closer match to the observed array response than the array response associated with the correct angle. The threshold point is not well defined. However, because the average estimation error typically diverges quickly as a function of SNR, the exact definition is typically not a significant concern. There are a variety of bounds that attempt to incorporate the nonlocal effects, such as the Bhattacharyya and the Bobrovsky–Zakai bounds. Many of these bounds are special cases of the more general Weiss–Weinstein bound [342].

7.7.1 Cramer–Rao bound of angle estimation

By employing the reduced Fisher information matrix from Equation (3.204), a relatively simple form for the angle-estimation performance bound can be found. The Cramer–Rao bound is discussed in Section 3.8. In the discussion in this chapter, it is assumed that, while potentially spatially correlated, the noise is drawn independently in each temporal sample. To simplify the discussion, it is assumed that the source and linear array lie in a plane. The angle φ is measured from the boresight (the direction perpendicular to the axis along which the antenna array lies) and is associated with some direction u = sin(φ). Here two possible models for the signal source are considered.

Figure 7.2 Notional performance of parameter estimation (log estimation variance versus SNR in dB). The high-SNR performance is characterized by the Cramer–Rao bound. Below some threshold point, the estimation diverges from the Cramer–Rao bound.

• The signaling sequence is known, s ∈ C^{1×ns}, so that the signal is modeled by the mean of the received signal-plus-noise vector.
• The signaling sequence is random and drawn from a random distribution, so that the signal contribution is parameterized in its contribution to the received spatial covariance matrix.

7.7.2 Cramer–Rao bound: signal in the mean

For the first case, a known signal is transmitted. An example might be a known training or pilot sequence. In this case, the probability distribution of the observed signal is given by

$$p(Z|s, u; R) = \frac{1}{|R|^{n_s}\, \pi^{n_r n_s}}\; e^{-\mathrm{tr}\{[Z - a\, v(u)\, s]^{\dagger}\, R^{-1}\, [Z - a\, v(u)\, s]\}}\,, \qquad (7.35)$$

where a is the complex attenuation, and v(u) is the steering vector as a function of the direction parameter u = sin(φ), with φ indicating the angle from boresight. In this case, the mean is given by

$$\mu = a\, v(u)\, s\,, \qquad (7.36)$$

where the reference sequence s is normalized so that ‖s‖² = ns. The covariance matrix R is given by the covariance matrix of the external interference plus noise. The Fisher information matrix for all ns samples is ‖s‖² = ns times the Fisher information matrix for a single sample. From Section 3.8, in Equation (3.259), the reduced Fisher information matrix is given by

$$J_{u,u}^{(r)}(\{s\}_m) = 2\, |a|^2\, |\{s\}_m|^2\, \Re\left\{ \dot{x}^{\dagger}\, P^{\perp}_{x(u)}\, \dot{x} \right\}$$
$$J_{u,u}^{(r)} = \sum_m J_{u,u}^{(r)}(\{s\}_m) = 2\, |a|^2\, n_s\, \Re\left\{ \dot{x}^{\dagger}\, P^{\perp}_{x(u)}\, \dot{x} \right\}\,, \qquad (7.37)$$

213

where the spatially whitened vector and derivative matrix are defined by x(u) = R−1/2 v(u) ∂ v(u) , (7.38) x˙ = R−1/2 ∂u and the projection operator (discussed in Section 2.3.5) for the subspace orthogonal to the column space spanned by the whitened array response x(u) is given by † −1 † P⊥ x (u) . x(u ) = I − x(u) [x (u) x(u)]

(7.39)

For the sake of discussion, it is assumed that there is no external interference and the units of power are scaled so that R = I.

(7.40)

The reduced Fisher information simplifies to  † ∂v (u) ⊥ ∂v(u) Ju(r,u) = 2 ns a 2 ℜ Pv(u ) . ∂u ∂u

(7.41)

As discussed in Section 6.1, the components of the array response or steering vector v(u) ∈ Cn r ×1 are given by {v(u)}m = eik y m

u

,

(7.42)

under the assumption of the normalization v(u) 2 = nr , where ym is the position of the mth antenna along the linear array in units of wavelength and k is the wavenumber or equivalently the magnitude of the wavevector. The derivative with respect to the direction variable u is given by  ∂v(u) = ik ym v(u) . (7.43) ∂u m The reduced Fisher information is then given by  † ∂v (u) ⊥ ∂v(u) (r ) 2 Pv(u ) Ju ,u = 2 ns a ℜ ∂u ∂u 7 + † †  ∂v (u) ∂v(u) (u) v(u) v 2 ym − = 2 ns a 2 ℜ k 2 ∂u nr ∂u m ⎧ $2 ⎫ $ $ ⎬ ⎨ 1 $ $ $ 2 = 2 ns a 2 k 2 yn $ ym − . $ $ ⎭ ⎩m nr $ n

(7.44)

By setting the origin of the y-axis so that the average element position is zero, " n yn = 0, the second term in the braces of Equation (7.44) goes to zero and the Fisher information is given by + 7  (r ) 2 2 2 ym Ju ,u = 2 ns a k m

= 2 ns a 2 k 2 nr σy2 ,

(7.45)

214

Angle-of-arrival estimation

" 2 and the notation for σy2 = m ym /nr indicates the mean-squared antenna position, under the assumption that the mean position is zero. Consequently, the reduced Fisher information is given by the direction term exclusively. The variance in the estimate of direction u is limited by 1 " 2 2 k 2 a 2 ns m ym 1 , = 2 k 2 a 2 ns nr σy2

% & ˆ u − u 2 ≥

(7.46)

where u ˆ is the estimate of u = sin φ. It is interesting to note that the estimation variance bound decreases as the square of the mean-square antenna position σy2 and decreases as the integrated SNR (given by a 2 ns recalling that the noise has unit variance). Here the phrase integrated SNR indicates the ratio of the coherently integrated signal power, which grows as n2s , over the incoherently integrated noise, which grows as ns .

7.7.3

Cramer–Rao bound: random signal In this scenario, the signal is random. In particular, it is assumed that the signal is drawn from a zero-mean complex circular Gaussian distribution with variance P = a 2 per receive antenna per sample. By construction, the mean of the signal is zero, and thus its derivatives, are zero. The nr × nr receive spatial covariance of this signal is given by Q = R + P v(u) v† (u) ,

(7.47)

where the interference-plus-noise receive covariance matrix is given by R. For discussion, it is assumed here that there is no interference and the units of power are defined so that R = I. In this section, it is assumed that the steering vectors v(u) are normalized so that v(u) 2 = nr . For ns independent observations of this signal Z ∈ Cn r ×n s , the probability density function is given by p(Z|P, u) =

|Q|n s

† −1 1 e−tr{Z Q Z} n n r s π

= [p(z|P, u)]

ns

,

(7.48)

under the assumption of independent columns in Z, where the probability density for a single column of Z (with a single column denoted z) is given by p(z|P, u) =

† −1 1 e−tr{z Q z} . |Q| π n r

(7.49)

7.7 Local angle-estimation performance bounds

215

Consequently, because the Fisher information matrix is a function of derivatives of logarithms of the probability density function, the Fisher information matrix for Z is ns times the Fisher information matrix for z, log p(Z|P, u) = ns log p(z|P, u) J(Z) = ns J(z) .

(7.50)

As defined in Equation (3.200), the mean portion of the signal implicitly incorporates the multiple samples; however, the covariance portion of the Fisher information matrix does not, and thus includes the coefficient ns . Given some vector of parameters θ, the {m, n}th component of the Fisher information matrix, is given by  ∂Q(θ) −1 ∂Q(θ) . {J}m ,n = ns tr Q−1 (θ) Q (θ) ∂{θ}m ∂{θ}n

(7.51)

The reduced Fisher information for the real direction u and power P parameters is given by Ju(r,u) = Ju ,u − Ju ,P J−1 P ,P JP ,u .

(7.52)

The derivative of the receive spatial covariance matrix is given by  ∂  ∂ I + P v(u) v† (u) Q(u) = ∂u ∂u   † ˙ ˙ , = P v(u) v† (u) + v(u) v(u)

(7.53)

˙ where the notation v(u) indicates the derivative of the steering vector with respect to the direction parameter ˙ v(u) =

∂ v(u) . ∂u

(7.54)

By using Equation (2.114), the inverse of the rank-1 plus identity receive spatial covariance is given by Q−1 = I −

P v(u) v† (u) . 1 + nr P

(7.55)

For notational convenience, the explicit functional dependence with respect to the direction parameter u is dropped for the remainder of this discussion. The

216

Angle-of-arrival estimation

direction component of the Fisher information matrix is given by    †  P v v† P v˙ v + v v˙ † I− 1 + nr P      †  P v v† † P v˙ v + v v˙ · I− 1 + nr P      † P v v† P v˙ v + v v˙ † = ns v † I − 1 + nr P    P v v† P v˙ + c.c. · I− 1 + nr P      † P v v† = 2 ns ℜ v† I − P v˙ v + v v˙ † 1 + nr P    P v v† P v˙ · I− 1 + nr P +   2  †  P v v† 2  = 2 ns P ℜ v I − v˙  1 + nr P      P v v† P v v† † † v v˙ I− I− +v v˙ 1 + nr P 1 + nr P    † 2 nr P v v˙  = 2 ns P 2 ℜ nr − 1 + nr P      P v v† n2r P † I− + nr − v˙ v˙ , 1 + nr P 1 + nr P

Ju ,u = ns tr



(7.56)

where the notation c.c. indicates the complex conjugate of the previous terms. The mth element of the array response or steering vector associated with the element at position ym along the antenna array is given by

$$\{v\}_m = e^{i k\, y_m\, u}\,. \qquad (7.57)$$

The derivative of the steering vector with respect to the direction parameter is given by

$$\{\dot{v}\}_m = \frac{\partial}{\partial u}\, \{v\}_m = i\, k\, y_m\, e^{i k y_m u}$$
$$v^{\dagger}\, \dot{v} = i k \sum_m e^{-i k y_m u}\, y_m\, e^{i k y_m u} = i k \sum_m y_m\,. \qquad (7.58)$$

Similar to the development in the previous section, to simplify the evaluation, the origin of the axis can be chosen so that the average position of the antennas is zero, Σ_m y_m/n_r = 0. The inner product between the steering vector and the derivative of the steering vector is then zero,

$$\sum_m y_m = 0 \;\rightarrow\; v^{\dagger}\, \dot{v} = 0\,. \qquad (7.59)$$

The inner product between the derivatives of the steering vector is given by

$$\dot{v}^{\dagger}\, \dot{v} = k^2 \sum_m e^{-i k y_m u}\, y_m^2\, e^{i k y_m u} = k^2\, n_r\, \sigma_y^2\,. \qquad (7.60)$$

By using these results, the component of the Fisher information matrix associated with the direction parameter u is given by

$$J_{u,u} = 2\, n_s\, P^2 \left( \frac{n_r + n_r^2 P - n_r^2 P}{1 + n_r P} \right) \dot{v}^{\dagger}\, \dot{v} = 2\, n_s\, P^2 \left( \frac{n_r}{1 + n_r P} \right) k^2\, n_r\, \sigma_y^2\,. \qquad (7.61)$$

The component of the Fisher information matrix associated with the received signal power is given by

$$J_{P,P} = n_s\, \mathrm{tr}\left\{ \left( I - \frac{P\, v\, v^{\dagger}}{1 + n_r P} \right) v\, v^{\dagger} \left( I - \frac{P\, v\, v^{\dagger}}{1 + n_r P} \right) v\, v^{\dagger} \right\}
= n_s \left[ v^{\dagger}\left( I - \frac{P\, v\, v^{\dagger}}{1 + n_r P} \right) v \right]^2 = n_s \left( \frac{n_r}{1 + n_r P} \right)^2\,. \qquad (7.62)$$

The cross-parameter component of the Fisher information matrix is given by

$$J_{u,P} = n_s\, \mathrm{tr}\left\{ Q^{-1}\, P\left( \dot{v}\, v^{\dagger} + v\, \dot{v}^{\dagger} \right) Q^{-1}\, v\, v^{\dagger} \right\}
= n_s\, P \left[ \left( v^{\dagger}\, Q^{-1}\, \dot{v} \right) + \left( \dot{v}^{\dagger}\, Q^{-1}\, v \right) \right] v^{\dagger}\, Q^{-1}\, v = 0\,, \qquad (7.63)$$

which vanishes because v† v̇ = 0.

Because the cross-parameter term is zero, from Equation (7.59) (and the power term is nonzero), the reduced Fisher information in Equation (7.52) is the same as the Fisher information matrix without the nuisance parameters, J^{(r)}_{u,u} = J_{u,u}.


The variance of an unbiased estimate σ_u² of u cannot be better than

$$\sigma_u^2 \ge J_{u,u}^{-1} = \frac{1 + n_r\, P}{2\, n_r^2\, n_s\, P^2\, k^2\, \sigma_y^2}\,. \qquad (7.64)$$

As the SNR P becomes large, the variance on the estimation bound converges to that of the deterministic signal in Equation (7.46) from above.
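The following brief Python sketch (illustrative parameters) compares the known-signal bound of Equation (7.46) with the random-signal bound of Equation (7.64) as a function of SNR, showing the convergence noted above.

```python
import numpy as np

n_r, n_s = 10, 50
k = 2 * np.pi
y = (np.arange(n_r) - (n_r - 1) / 2) * 0.5
sigma_y2 = np.mean(y**2)

for snr_db in (-10, 0, 10, 20):
    P = 10 ** (snr_db / 10)
    crb_known = 1 / (2 * k**2 * P * n_s * n_r * sigma_y2)               # Eq. (7.46)
    crb_random = ((1 + n_r * P)
                  / (2 * n_r**2 * n_s * P**2 * k**2 * sigma_y2))        # Eq. (7.64)
    print(f"SNR {snr_db:>3} dB: known {crb_known:.2e}, random {crb_random:.2e}")
```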

7.8 Threshold estimation

The threshold point occurs at the SNR at which the probability of confusing a mainlobe with a sidelobe starts contributing significantly to the angle-estimation error. This notion is not a precise definition. Depending upon the details, various systems may have varying sensitivities to the probability of confusion. A variety of techniques are available to extend parameter-estimation bounds to include nonlocal effects. One example is the Weiss–Weinstein bound [342]. Here an approximation is considered. By using the method of intervals [263], nonlocal contributions to the variance are introduced in an approximation. The total parameter-estimation variance is approximated by considering the local contributions associated with the mainlobe, which are characterized by the Cramer–Rao bound, and the nonlocal contributions, which are approximated by adding the variance contributed by a small number of large sidelobes. These sidelobes correspond to array responses that are similar to that of the mainlobe. This estimation assumes that the variance is the sum of the variance contributed by the Cramer–Rao bound, times the probability that there is no sidelobe confusion, plus the squared error introduced near some sidelobe peak. For some parameter φ, its estimate φ̂ is given by maximizing some test statistic t(φ) (or equivalently some spatial spectral estimator),

$$\hat{\phi} = \mathrm{argmax}\{t(\phi)\}\,. \qquad (7.65)$$

The test statistic is that which maximizes the likelihood, given a model for the signal. As an example, consider the single Gaussian signal model in the absence of interference. In this case, finding the peak of the beamscan test statistic is the maximum-likelihood solution. Consequently, t(φ) is given by

$$t(\phi) = \frac{1}{n_s}\, v^{\dagger}(\phi)\, Z\, Z^{\dagger}\, v(\phi)\,. \qquad (7.66)$$

The method-of-intervals parameter-estimation variance estimate is given by

$$\sigma_{\phi}^2 \approx P_{m.l.}(\mathrm{SNR})\; \sigma_{CR,\phi}^2(\mathrm{SNR}) + \sum_m P_{s.l.(m)}(\mathrm{SNR})\; \phi_{s.l.,m}^2\,, \qquad (7.67)$$

where σ²_{CR,φ}(SNR) is the variance bound provided by the Cramer–Rao bound at some SNR, P_{m.l.}(SNR) is the probability of not being confused by a sidelobe at some SNR, P_{s.l.(m)}(SNR) is the probability of being confused by the mth sidelobe at some SNR, and φ_{s.l.,m} is the location of the peak of the mth sidelobe. This form can be simplified further by noting that the nonlocal contributions to the error are typically dominated by the largest sidelobe. The probability of confusing the observed array response with the largest sidelobe is denoted P_{s.l.}(SNR). Consequently, for mainlobe direction φ₀, the variance is approximated by

$$\sigma_{\phi}^2 \approx \left[1 - P_{s.l.}(\mathrm{SNR})\right] \sigma_{CR,\phi}^2(\mathrm{SNR}) + P_{s.l.}(\mathrm{SNR})\, (\phi_{s.l.} - \phi_0)^2\,. \qquad (7.68)$$

The probability of confusing a sidelobe φ_{s.l.} for a mainlobe φ₀ is given by

$$P_{s.l.}(\mathrm{SNR}) = \Pr\{t(\phi_{s.l.}) > t(\phi_0)\} = \Pr\left\{ \|v^{\dagger}(\phi_{s.l.})\, Z\|_F^2 > \|v^{\dagger}(\phi_{m.l.})\, Z\|_F^2 \right\}\,. \qquad (7.69)$$

Throughout this section, we will not attempt to be precise about Pr{t(φs.l. ) > t(φ0 )} versus Pr{t(φs.l. ) ≥ t(φ0 )} because it will not introduce a meaningful difference.

7.8.1 Types of transmitted signals

Similar to the variety of assumptions about the transmitted signal discussed in Section 7.7, a number of assumptions can be made about the transmitted signal when considering the probability of being confused by a sidelobe. Some possible assumptions are
(1) known (deterministic) sequence,
(2) single observation with deterministic amplitude,
(3) unknown sequence of random complex Gaussian signals,
(4) unknown sequence of deterministic amplitude.

Items (1) and (2) have the same test statistic up to a simple scaling. The probability of confusion for these types of signals will be considered in Sections 7.8.3 and 7.8.4. The third type of signal with a sequence of random complex Gaussian signals is considered in Section 7.8.5. If the length of the sequence is long, then an unknown sequence of constant amplitude can be approximated by the Gaussian signal assumption. However, we will not explicitly evaluate the probability of confusion for the deterministic signal here.

7.8.2 Known reference signal test statistic

For a known reference s, normalized such that ‖s‖² = ns, the variable z = Z s†/√ns is equivalent to a single observation with a deterministic amplitude. The √ns normalization is to keep the Gaussian noise from growing as the number of samples ns increases, by employing the result that the inner product of the vector of Gaussian variables and the reference is a Gaussian variable. The beamscan angle-of-arrival test statistic for a single observation simplifies to

$$t(\phi) = v^{\dagger}(\phi)\, \hat{Q}\, v(\phi) = \|v^{\dagger}(\phi)\, z\|^2\,, \qquad (7.70)$$

where the observed response is given by z ∈ C^{nr×1},

$$z = \frac{Z\, s^{\dagger}}{\sqrt{n_s}} = \frac{(\tilde{a}\, v(\phi_{m.l.})\, s + N)\, s^{\dagger}}{\sqrt{n_s}} = \tilde{a}\, v(\phi_{m.l.})\, \sqrt{n_s} + n = a\, v(\phi_{m.l.}) + n\,. \qquad (7.71)$$

Here ã indicates the received signal amplitude per receive antenna (implying the steering vector normalization ‖v(φ)‖² = nr), and N ∈ C^{nr×ns} is the additive noise. In the case for which the multiple observations under the assumption of a known reference collapse to a single observation, the amplitudes for the two cases are related by a = √ns ã. The single-observation amplitude a indicates the received signal amplitude per receive antenna (implying the steering vector normalization ‖v(φ)‖² = nr), and n ∈ C^{nr×1} is the additive noise. The probability of selecting a sidelobe over the mainlobe is given by the probability that the inner product of the theoretical array response and the observed response is larger for the sidelobe than for the mainlobe,

$$\Pr\{t(\phi_{s.l.}) > t(\phi_0)\} = \Pr\{\|v^{\dagger}(\phi_{s.l.})\, z\|^2 > \|v^{\dagger}(\phi_{m.l.})\, z\|^2\} = \Pr\{\|v^{\dagger}(\phi_{s.l.})\, z\| > \|v^{\dagger}(\phi_{m.l.})\, z\|\}\,. \qquad (7.72)$$

Define the normalized inner product between the array responses ρ,

$$\rho = \frac{v^{\dagger}(\phi_{s.l.})\, v(\phi_{m.l.})}{n_r}\,. \qquad (7.73)$$

The probability of selecting the wrong lobe is developed in Sections 7.8.3 and 7.8.4 and is given by

$$P_{s.l.} = \frac{1}{2}\left[ 1 - Q_M\!\left( \sqrt{\frac{|a|^2 n_r}{2}\left(1 + \sqrt{1 - |\rho|^2}\right)},\; \sqrt{\frac{|a|^2 n_r}{2}\left(1 - \sqrt{1 - |\rho|^2}\right)} \right) \right.$$
$$\left. +\; Q_M\!\left( \sqrt{\frac{|a|^2 n_r}{2}\left(1 - \sqrt{1 - |\rho|^2}\right)},\; \sqrt{\frac{|a|^2 n_r}{2}\left(1 + \sqrt{1 - |\rho|^2}\right)} \right) \right]\,, \qquad (7.74)$$

where the Marcum Q-function QM (·) is discussed in Section 2.14.8.
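Equation (7.74) is straightforward to evaluate numerically. As a sketch (our implementation choice: the Marcum Q-function Q_M(a, b) equals the survival function of a noncentral chi-square distribution with two degrees of freedom and noncentrality a², evaluated at b²), the following Python fragment computes the sidelobe-confusion probability as a function of SNR.

```python
import numpy as np
from scipy.stats import ncx2

def marcum_q(a, b):
    """Marcum Q_1(a, b) via the noncentral chi-square survival function."""
    return ncx2.sf(b**2, df=2, nc=a**2)

def p_sidelobe(a2_nr, rho):
    """Equation (7.74): probability of picking the sidelobe over the mainlobe.
    a2_nr = |a|^2 n_r is the array-integrated SNR, rho the lobe correlation."""
    hi = np.sqrt(a2_nr / 2 * (1 + np.sqrt(1 - abs(rho)**2)))
    lo = np.sqrt(a2_nr / 2 * (1 - np.sqrt(1 - abs(rho)**2)))
    return 0.5 * (1 - marcum_q(hi, lo) + marcum_q(lo, hi))

for snr_db in (-5, 0, 5, 10):
    a2_nr = 10 ** (snr_db / 10) * 10          # n_r = 10 antennas (illustrative)
    print(f"SNR {snr_db:>3} dB/antenna: P_sl = {p_sidelobe(a2_nr, 0.5):.2e}")
```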

7.8.3  Independent Rician random variables

To find the probability of confusing a sidelobe for a mainlobe presented in Equation (7.74), it is noted that this is the probability that one Rician random variable fluctuates above another Rician random variable. This is equivalent to asking if one noncentral χ² variable fluctuates above another noncentral χ² random variable. To complicate the issue, these two Rician variables are correlated (they share the same signal). To begin, the probability that one Rician fluctuates above another when the two Rician distributions are independent is discussed in References [280, 293]. The mth (specifying 1 or 2) Rician is given by

r_m = |a_m + z_m| ,    (7.75)

where the random central complex Gaussian variable z_m has variance σ_m². Without loss of generality, the mean parameter a_m is assumed to be real. The probability density for r_m is given by the Rician distribution,

f_m(r_m)\, dr_m = \frac{2\, r_m}{\sigma_m^2}\, I_0\!\left(\frac{2\, a_m r_m}{\sigma_m^2}\right) e^{-(r_m^2 + a_m^2)/\sigma_m^2}\, dr_m ,    (7.76)

where I_0(·) indicates the modified Bessel function of the first kind, discussed in Section 2.14.5. The probability of the wrong Rician fluctuating to a level higher than the other is given in Reference [293]. The probability of Rician r_2 exceeding r_1 is given by

Pr\{r_2 > r_1\} = Q_M\!\left(\sqrt{\frac{2\, a_2^2}{\sigma_1^2+\sigma_2^2}},\ \sqrt{\frac{2\, a_1^2}{\sigma_1^2+\sigma_2^2}}\right) - \frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\, e^{-\frac{a_1^2+a_2^2}{\sigma_1^2+\sigma_2^2}}\, I_0\!\left(\frac{2\, a_1 a_2}{\sigma_1^2+\sigma_2^2}\right) ,    (7.77)

where the result follows from the integral over the second Rician variable r_2. In the following discussion, we evaluate this probability by noting that the complementary CDF of the noncentral χ² distribution is given by the Marcum Q-function that is discussed in Section 2.14.8. By using the relationship developed in Problem 7.5,

Q_M(\sqrt{2a}, \sqrt{2b}) + Q_M(\sqrt{2b}, \sqrt{2a}) = 1 + e^{-(a+b)}\, I_0(2\sqrt{a\, b}) ,    (7.78)

the probability is given by

Pr\{r_2 > r_1\} = \frac{1}{\sigma_1^2+\sigma_2^2}\left[\sigma_1^2 + \sigma_2^2\, Q_M\!\left(\sqrt{\frac{2\, a_2^2}{\sigma_1^2+\sigma_2^2}},\ \sqrt{\frac{2\, a_1^2}{\sigma_1^2+\sigma_2^2}}\right) - \sigma_1^2\, Q_M\!\left(\sqrt{\frac{2\, a_1^2}{\sigma_1^2+\sigma_2^2}},\ \sqrt{\frac{2\, a_2^2}{\sigma_1^2+\sigma_2^2}}\right)\right] .    (7.79)

To evaluate the probability that one Rician variable fluctuates higher than another Rician variable under the assumption of independence that is given in


Equation (7.77), the probability can be written formally as

Pr\{r_2 > r_1\} = \int_0^\infty dr_1 \int_{r_1}^\infty dr_2\, f_1(r_1)\, f_2(r_2)
= \int_0^\infty dr_1\, \frac{2\, r_1}{\sigma_1^2}\, I_0\!\left(\frac{2\, a_1 r_1}{\sigma_1^2}\right) e^{-(a_1^2 + r_1^2)/\sigma_1^2}\, Q_M\!\left(\frac{a_2\sqrt{2}}{\sigma_2},\ \frac{r_1\sqrt{2}}{\sigma_2}\right) ,    (7.80)



where the probability density is defined in Equation (7.76), and the discussion in Section 3.1.14 provides the form for the definite integral. As discussed in Section 2.14.5, the modified Bessel function of the first kind can be expressed as a contour integral [343, 53]. This integral is given by

I_m(z) = \frac{1}{2\pi i} \oint_C dx\, x^{-m-1}\, e^{\frac{z}{2}(x + 1/x)} ,    (7.81)

where C is a contour that encircles the origin. The zeroth-order modified Bessel function is given by

I_0(a\, x) = \frac{1}{2\pi i} \oint_C dp\, \frac{e^{a x (p + 1/p)/2}}{p} = \frac{1}{2\pi i} \oint_C dp\, \frac{e^{(a^2 p + x^2/p)/2}}{p} ,    (7.82)

where the substitution p → p a/x is used. By substituting this integral form for the Bessel function into the integral form for the Marcum Q-function, the Marcum Q-function can be expressed as a form with a single integral

Q_M(a, b) = \int_b^\infty dx\, e^{-\frac{a^2+x^2}{2}}\, x\, I_0(a x)
= \int_b^\infty dx\, e^{-\frac{a^2+x^2}{2}}\, x\, \frac{1}{2\pi i} \oint_C dp\, \frac{e^{(a^2 p + x^2/p)/2}}{p} .    (7.83)

Because the path of a contour integral over holomorphic functions can be deformed without consequence as long as no poles are crossed, and because the contribution of the integrand vanishes in the left-half plane and along a connecting path at infinite radius into the right-half plane, the contour integral can be expressed as a line integral at some finite constant positive offset γ from the


imaginary axis. The Marcum Q-function is then expressed as

Q_M(a, b) = \int_b^\infty dx\, e^{-\frac{a^2+x^2}{2}}\, x\, \frac{1}{2\pi i} \int_{\gamma-i\infty}^{\gamma+i\infty} dp\, \frac{e^{(a^2 p + x^2/p)/2}}{p} ; \quad \gamma > 0
= \frac{1}{2\pi i} \int_{\gamma-i\infty}^{\gamma+i\infty} \frac{dp}{p} \int_b^\infty dx\, x\, e^{(a^2 (p-1) + x^2 (1/p - 1))/2} ; \quad \gamma > 0
= \frac{1}{2\pi i} \int_{\gamma-i\infty}^{\gamma+i\infty} \frac{dp}{p} \int_{b^2}^\infty \frac{du}{2}\, e^{(a^2 (p-1) + u (1/p - 1))/2} ; \quad \gamma > 0
= -\frac{1}{2\pi i} \int_{\gamma-i\infty}^{\gamma+i\infty} dp\, \frac{e^{(a^2 (p-1) + b^2 (1/p - 1))/2}}{(1/p - 1)\, p} ; \quad (1/p - 1) < 0
= \frac{e^{-(a^2+b^2)/2}}{2\pi i} \oint dp\, \frac{e^{(a^2 p + b^2/p)/2}}{p - 1} ,    (7.84)

where the substitution p → p + 1 is employed, and the final contour encloses the pole at p = 1. For the sake of notational expedience, the following normalized variables are defined:

r = \frac{r_1 \sqrt{2}}{\sigma_1} , \quad \alpha_1 = \frac{a_1 \sqrt{2}}{\sigma_1} , \quad \alpha_2 = \frac{a_2 \sqrt{2}}{\sigma_2} , \quad \nu = \frac{\sigma_2}{\sigma_1} .    (7.85)

With these definitions, the probability of one Rician variable fluctuating above the other is given by

Pr\{r_2 > r_1\} = \int_0^\infty dr_1\, \frac{2\, r_1}{\sigma_1^2}\, I_0\!\left(\frac{2\, a_1 r_1}{\sigma_1^2}\right) e^{-(a_1^2+r_1^2)/\sigma_1^2}\, Q_M\!\left(\frac{a_2\sqrt{2}}{\sigma_2},\ \frac{r_1\sqrt{2}}{\sigma_2}\right)
= \int_0^\infty dr\, r\, e^{-\frac{\alpha_1^2 + r^2}{2}}\, I_0(\alpha_1 r)\, Q_M\!\left(\alpha_2,\ \frac{r}{\nu}\right)
= \int_0^\infty dr\, r\, e^{-\frac{\alpha_1^2 + r^2}{2}}\, I_0(\alpha_1 r)\, \frac{e^{-(\alpha_2^2 + r^2/\nu^2)/2}}{2\pi i} \oint dp\, \frac{e^{(\alpha_2^2 p + (r/\nu)^2/p)/2}}{p-1}
= \int_0^\infty dr\, r\, e^{-\frac{\alpha_1^2 + r^2}{2}}\, \frac{e^{-(\alpha_2^2 + r^2/\nu^2)/2}}{2\pi i} \oint dp\, \frac{e^{(\alpha_2^2 p + (r/\nu)^2/p)/2}}{p-1} \cdot \frac{1}{2\pi i} \oint dq\, \frac{e^{(\alpha_1^2 q + r^2/q)/2}}{q} ,    (7.86)


where the contour integrals enclose the poles at p = 1 and q = 0. Collecting the arguments of the exponentials in Equation (7.86), we find the sum of the arguments,

\frac{\alpha_2^2\, p + \alpha_1^2\, q - \alpha_1^2 - \alpha_2^2}{2} + \frac{r^2}{2}\left(\frac{1}{\nu^2 p} + \frac{1}{q} - \frac{1}{\nu^2} - 1\right) .    (7.87)

By using the substitution u = r^2, the probability of one Rician variable fluctuating higher than another Rician variable becomes

Pr\{r_2 > r_1\} = \frac{e^{-(\alpha_1^2+\alpha_2^2)/2}}{2\pi i} \oint dp\, \frac{1}{2\pi i} \oint dq \int_0^\infty \frac{du}{2}\, \frac{e^{\frac{\alpha_2^2 p + \alpha_1^2 q}{2}}\, e^{\frac{u}{2}\left(\frac{1}{\nu^2 p} + \frac{1}{q} - \frac{1}{\nu^2} - 1\right)}}{q\,(p-1)}
= \frac{e^{-(\alpha_1^2+\alpha_2^2)/2}}{2\pi i} \oint dp\, \frac{1}{2\pi i} \oint dq\, \frac{e^{\frac{\alpha_2^2 p + \alpha_1^2 q}{2}}}{\left(1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} - \frac{1}{q}\right) q\,(p-1)}
= \frac{e^{-(\alpha_1^2+\alpha_2^2)/2}}{2\pi i} \oint dp\, \frac{e^{\frac{\alpha_2^2 p}{2}}}{\left[1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p}\right](p-1)} \cdot \frac{1}{2\pi i} \oint dq\, \frac{e^{\frac{\alpha_1^2 q}{2}}}{q - \left[1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p}\right]^{-1}} .    (7.88)

The contour integral over q can be evaluated directly by using residues, as discussed in Section 2.9.1 and in Reference [53], and is given by

\frac{1}{2\pi i} \oint dq\, \frac{e^{\frac{\alpha_1^2 q}{2}}}{q - \left[1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p}\right]^{-1}} = e^{\frac{\alpha_1^2}{2}\left[1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p}\right]^{-1}} .    (7.89)

By incorporating this evaluation, the form for the probability is given by

Pr\{r_2 > r_1\} = \frac{e^{-(\alpha_1^2+\alpha_2^2)/2}}{2\pi i} \int_{\gamma_1-i\infty}^{\gamma_1+i\infty} dp\, \frac{e^{\frac{\alpha_2^2 p}{2}}\, e^{\frac{\alpha_1^2}{2}\left[1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p}\right]^{-1}}}{\left[1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p}\right](p-1)}
= \frac{e^{-(\alpha_1^2+\alpha_2^2)/2}}{2\pi i} \int_{\gamma_1-i\infty}^{\gamma_1+i\infty} dp\, \frac{e^{\frac{\alpha_2^2 p}{2}}\, e^{\frac{\alpha_1^2}{2}\left[\frac{p}{p + p/\nu^2 - 1/\nu^2}\right]}}{\left[1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p}\right](p-1)} .    (7.90)

The resulting argument of the product of exponentials within the integral is given by

\frac{1}{2}\left[\alpha_2^2\, p + \frac{\alpha_1^2\, p}{p + p/\nu^2 - 1/\nu^2}\right] = \frac{1}{2}\left[\frac{\alpha_2^2\, p^2\left(1 + \frac{1}{\nu^2}\right) - \frac{\alpha_2^2}{\nu^2}\, p + \alpha_1^2\, p}{p\left(1 + \frac{1}{\nu^2}\right) - 1/\nu^2}\right] .    (7.91)


The form of the exponential can be simplified by using the substitution

p\left(1 + \frac{1}{\nu^2}\right) - 1/\nu^2 \to p , \qquad p \to \frac{p + 1/\nu^2}{1 + 1/\nu^2} .    (7.92)

This substitution will not affect the contour integral if the contour encloses the poles. The probability of one Rician variable fluctuating above another then becomes

Pr\{r_2 > r_1\} = \frac{e^{-(\alpha_1^2+\alpha_2^2)/2}}{2\pi i} \oint dp\, \frac{p + 1/\nu^2}{1 + 1/\nu^2}\, \frac{e^{\frac{1}{2(1+\nu^2)}\left(\alpha_2^2 + \alpha_1^2\nu^2 + \alpha_2^2\nu^2 p + \alpha_1^2/p\right)}}{p\,(p-1)}
= \frac{\kappa}{2\pi i} \oint dp\, \frac{p + 1/\nu^2}{1 + 1/\nu^2}\, \frac{e^{\frac{1}{2(1+\nu^2)}\left(\alpha_2^2\nu^2 p + \alpha_1^2/p\right)}}{p\,(p-1)} ,    (7.93)

where the final contour integral encloses poles at p = 0 and p = 1, and the constant κ is given by

\kappa = e^{-(\alpha_1^2+\alpha_2^2)/2}\, e^{\frac{1}{2(1+\nu^2)}(\alpha_2^2 + \alpha_1^2\nu^2)} = e^{-\frac{1}{2}\, \frac{\alpha_1^2 + \nu^2 \alpha_2^2}{1+\nu^2}} .    (7.94)

By employing the partial fraction expansion for the denominator of the integrand,

\frac{p + \gamma}{p\,(p-1)} = \frac{1+\gamma}{p-1} - \frac{\gamma}{p} ,    (7.95)

the probability becomes

Pr\{r_2 > r_1\} = \frac{\kappa}{2\pi i} \oint dp\, \frac{p + 1/\nu^2}{1 + 1/\nu^2}\, \frac{e^{\frac{1}{2(1+\nu^2)}(\alpha_2^2\nu^2 p + \alpha_1^2/p)}}{p\,(p-1)}
= \frac{\kappa}{2\pi i} \oint dp\, \frac{e^{\frac{1}{2(1+\nu^2)}(\alpha_2^2\nu^2 p + \alpha_1^2/p)}}{1 + 1/\nu^2}\left(\frac{1 + 1/\nu^2}{p-1} - \frac{1/\nu^2}{p}\right)
= A_1 - A_2 ,    (7.96)

where the integrals A1 and A2 are defined implicitly by expanding the parenthetical term. By substituting the value of the parameter for κ in Equation (7.94),


the two integrals A_1 and A_2 are given by

A_1 = \frac{\kappa}{2\pi i} \oint dp\, \frac{e^{\frac{1}{2(1+\nu^2)}(\alpha_2^2\nu^2 p + \alpha_1^2/p)}}{p-1}
= Q_M\!\left(\sqrt{\frac{\alpha_2^2\, \nu^2}{1+\nu^2}},\ \sqrt{\frac{\alpha_1^2}{1+\nu^2}}\right)
= Q_M\!\left(\sqrt{\frac{2\, a_2^2}{\sigma_1^2+\sigma_2^2}},\ \sqrt{\frac{2\, a_1^2}{\sigma_1^2+\sigma_2^2}}\right) ,    (7.97)

and

A_2 = \frac{\kappa}{2\pi i}\, \frac{1/\nu^2}{1 + 1/\nu^2} \oint dp\, \frac{e^{\frac{1}{2(1+\nu^2)}(\alpha_2^2\nu^2 p + \alpha_1^2/p)}}{p}
= \frac{\kappa}{1+\nu^2}\, I_0\!\left(\frac{\alpha_1\, \alpha_2\, \nu}{1+\nu^2}\right)
= \frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\, e^{-\frac{a_1^2+a_2^2}{\sigma_1^2+\sigma_2^2}}\, I_0\!\left(\frac{2\, a_1 a_2}{\sigma_1^2+\sigma_2^2}\right) .    (7.98)

Consequently, the probability that one independent Rician variable fluctuates above another is given by

Pr\{r_2 > r_1\} = Q_M\!\left(\sqrt{\frac{2\, a_2^2}{\sigma_1^2+\sigma_2^2}},\ \sqrt{\frac{2\, a_1^2}{\sigma_1^2+\sigma_2^2}}\right) - \frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\, e^{-\frac{a_1^2+a_2^2}{\sigma_1^2+\sigma_2^2}}\, I_0\!\left(\frac{2\, a_1 a_2}{\sigma_1^2+\sigma_2^2}\right) .    (7.99)
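A direct Monte Carlo check of Equation (7.99) is straightforward; the following Python sketch uses illustrative (hypothetical) means and noise variances.

    import numpy as np
    from scipy.stats import ncx2
    from scipy.special import i0
    rng = np.random.default_rng(0)

    a1, a2, s1, s2 = 2.0, 1.5, 1.0, 1.3        # illustrative parameters
    t = s1**2 + s2**2

    def marcum_q(a, b):
        return ncx2.sf(b**2, 2, a**2)          # Q_1 via noncentral chi-squared

    # Closed form, Equation (7.99)
    p_closed = (marcum_q(np.sqrt(2 * a2**2 / t), np.sqrt(2 * a1**2 / t))
                - (s1**2 / t) * np.exp(-(a1**2 + a2**2) / t)
                * i0(2 * a1 * a2 / t))

    # Monte Carlo: r_m = |a_m + z_m|, z_m complex Gaussian of variance s_m^2
    n = 200_000
    z1 = s1 / np.sqrt(2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    z2 = s2 / np.sqrt(2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    p_mc = np.mean(np.abs(a2 + z2) > np.abs(a1 + z1))
    print(p_closed, p_mc)                      # should agree to a few digits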

7.8.4  Correlated Rician random variables

The discussion in the previous section described the evaluation of the probability of one Rician variable fluctuating above another under the assumption that the two variables are independent. In general, this is not true. Here the random variables are associated with inner products between the observed array response and the theoretical array response for the mainlobe and the sidelobe. These random variables are correlated. Here an approach to translate the results from the previous section to the problem of correlated Rician variables is discussed. A thorough discussion of correlated Rician random variables can be found in Reference [293], for example. The Rician variables r_1 and r_2 are given by the magnitudes of the complex Gaussian variables x_1 and x_2. In this section, uncorrelated variables are constructed by applying a transformation to the correlated variables. The newly constructed uncorrelated complex Gaussian variables will be indicated by the vector and scalars,

x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} ,    (7.100)

and the underlying correlated variables are indicated by

y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{1}{\sqrt{n_r}} \begin{pmatrix} v^\dagger(\phi_{m.l.})\, z \\ v^\dagger(\phi_{s.l.})\, z \end{pmatrix} ,    (7.101)

where the notation from Equation (7.71) is used, and the normalization of \sqrt{n_r} is employed so that the variance of the noise is 1. In both cases, the desire is to determine if |x_2| > |x_1| or |y_2| > |y_1|. With the above vector notation, this can be addressed with the form

x^\dagger \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} x < 0 .

Carrying the decorrelating transformation through the analysis of Section 7.8.3, the probability of selecting the sidelobe over the mainlobe is given by

P_{s.l.} = Pr\{r_2 > r_1\} = \frac{1}{2}\left[1 - Q_M\!\left(\sqrt{\frac{a^2 n_r}{2}\left(1+\sqrt{1-|\rho|^2}\right)},\ \sqrt{\frac{a^2 n_r}{2}\left(1-\sqrt{1-|\rho|^2}\right)}\right) + Q_M\!\left(\sqrt{\frac{a^2 n_r}{2}\left(1-\sqrt{1-|\rho|^2}\right)},\ \sqrt{\frac{a^2 n_r}{2}\left(1+\sqrt{1-|\rho|^2}\right)}\right)\right] ,    (7.124)

where the SNR per receive antenna under the assumption of a single observation is a^2.

7.8.5  Unknown complex Gaussian signal

For a transmitted signal with n_s samples that are drawn independently from a unit-variance complex Gaussian distribution, represented by the row vector s ∈ C^{1×n_s}, the beamscan test statistic t(φ) (discussed in Section 7.3) as a function of angle φ is given by

t(\phi) = v^\dagger(\phi)\, \hat{Q}\, v(\phi) = \frac{1}{n_s}\, \|v^\dagger(\phi)\, Z\|^2 ,    (7.125)

where the n_s observed independent samples are contained in the matrix Z ∈ C^{n_r×n_s},

Z = a\, v(\phi_{m.l.})\, s + N ,    (7.126)

a indicates the received signal amplitude per receive antenna (implying the steering vector normalization \|v(\phi)\|^2 = n_r), the complex Gaussian signal is indicated by s ∈ C^{1×n_s}, and N ∈ C^{n_r×n_s} is the additive noise. The probability of the test statistic at some sidelobe fluctuating above the mainlobe P_{s.l.}(SNR) is given by

P_{s.l.}(\mathrm{SNR}) = Pr\!\left\{\frac{\|v^\dagger(\phi_{s.l.})\, Z\|^2}{\|v^\dagger(\phi_{m.l.})\, Z\|^2} > 1\right\} = Pr\{\|v^\dagger(\phi_{s.l.})\, Z\|^2 - \|v^\dagger(\phi_{m.l.})\, Z\|^2 > 0\} .    (7.127)

We construct the output of the matched-filter response associated with the mainlobe y_1 and the sidelobe y_2, respectively:

y_1 = v^\dagger(\phi_{m.l.})\, Z = a\, n_r\, s + v^\dagger(\phi_{m.l.})\, N
y_2 = v^\dagger(\phi_{s.l.})\, Z = a\, n_r\, \rho\, s + v^\dagger(\phi_{s.l.})\, N ,    (7.128)

where the normalization v† (φ)v(φ) = nr is assumed and the normalized correlation variable is defined to be ρ = v† (φs.l. )v(φm .l. )/nr . It is convenient to


construct a matrix with the correlated variables that is denoted

Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \in C^{2×n_s} .    (7.129)

Similar to the discussion in Sections 7.8.2, 7.8.3, and 7.8.4, to evaluate the probability of confusion (which is a function of the F distribution), we need to construct a set of uncorrelated random variables by transforming correlated random variables. The related uncorrelated variables are denoted

X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in C^{2×n_s} .    (7.130)

With the original variables contained within Y and the related uncorrelated versions contained within X, the two forms are related by the transformation matrix A ∈ C^{2×2} under the transformation

X = A\, Y .    (7.131)

By noting that the magnitude squared of the inner product of the steering vector and the observation matrix can also be expressed by using a trace,

\|v^\dagger(\phi)\, Z\|^2 = \mathrm{tr}\{v^\dagger(\phi)\, Z\, Z^\dagger\, v(\phi)\} ,    (7.132)

the difference between the magnitudes squared of the inner products can be expressed by

\|v^\dagger(\phi_{m.l.})\, Z\|^2 - \|v^\dagger(\phi_{s.l.})\, Z\|^2 = \mathrm{tr}\left\{Y^\dagger \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} Y\right\} = \mathrm{tr}\left\{X^\dagger A^{-\dagger} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} A^{-1} X\right\} .    (7.133)

By exploiting the relationship found in Equation (7.111), the decorrelating transformation matrix is given by

A = \frac{1}{\sqrt{4\beta}} \begin{pmatrix} 1+\beta & (1-\beta)\, e^{-i\alpha} \\ 1-\beta & (1+\beta)\, e^{-i\alpha} \end{pmatrix} ,    (7.134)

where the parameters α and β need to be determined. Because the Gaussian variables have zero mean,

\langle X \rangle = \langle Y \rangle = 0 ,    (7.135)

and because the values in s are drawn independently from sample to sample, the covariance C_Y ∈ C^{2×2} of Y is given by

C_Y = \langle Y\, Y^\dagger \rangle = \begin{pmatrix} \langle y_1 y_1^\dagger \rangle & \langle y_1 y_2^\dagger \rangle \\ \langle y_2 y_1^\dagger \rangle & \langle y_2 y_2^\dagger \rangle \end{pmatrix} = n_s\, n_r^2 \begin{pmatrix} P+1 & \rho^* (P+1) \\ \rho\, (P+1) & P\, |\rho|^2 + 1 \end{pmatrix} ,    (7.136)

where the SNR per sample per receive antenna is given by P = a^2. In the end, the overall scale will not be significant, so it is convenient to consider the normalized covariance \tilde{C}_Y, given by

\tilde{C}_Y = \frac{1}{n_s\, n_r^2}\, C_Y .    (7.137)

The transformed normalized covariance matrix is given by

A\, \tilde{C}_Y\, A^\dagger = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} .    (7.138)

The parameters α and β for Equation (7.134) can be found to be

\alpha = \arg(\rho)    (7.139)

and

\beta^2 = \frac{(P+1) + (P\, |\rho|^2 + 1) + 2\, |\rho|\, (P+1)}{(P+1) + (P\, |\rho|^2 + 1) - 2\, |\rho|\, (P+1)} .    (7.140)

By evaluating Equation (7.138) with these forms for α and β, the variances for the uncorrelated variables are given by

\sigma_1^2 = \frac{2\, (P+1)(|\rho| + 1)}{\beta\, (P\, (1 - |\rho|) + 2) - P\, (|\rho| + 1)}
\sigma_2^2 = \frac{2\, (P+1)(|\rho| + 1)}{\beta\, (P\, (1 - |\rho|) + 2) + P\, (|\rho| + 1)} .    (7.141)

Because only the relative values of these variances are of interest, it is useful to consider the ratio,

\frac{\sigma_1^2}{\sigma_2^2} = \frac{\beta\, [P\, (1 - |\rho|) + 2] + P\, (|\rho| + 1)}{\beta\, [P\, (1 - |\rho|) + 2] - P\, (|\rho| + 1)} .    (7.142)

Because the noise and the signal are assumed to be Gaussian, the difference expressed in Equation (7.133) is the difference between random χ² variables. However, also from Equation (7.133), the test expressed as the difference between the magnitudes squared of the vectors can also be expressed as a test in terms of the ratio of these magnitudes squared. The ratio of two degree-normalized central χ² variables is given by the F distribution, as discussed in Section 3.1.13. The probability density of a given value of the ratio q is denoted q ∼ p_F(q; d_1, d_2), where d_1 and d_2 indicate the degrees of the χ² variables. In our case, the two χ² variables have the same degree. If the ratio of two equal-degree complex χ² random variables with degree n_s is denoted q,

q = \frac{\frac{1}{\sigma_2^2/2} \sum_{m=1}^{n_s} \left(\Re\{x_2\}_m^2 + \Im\{x_2\}_m^2\right)}{\frac{1}{\sigma_1^2/2} \sum_{m=1}^{n_s} \left(\Re\{x_1\}_m^2 + \Im\{x_1\}_m^2\right)} = \frac{\sigma_1^2}{\sigma_2^2}\, \tilde{q} ; \qquad \tilde{q} = \frac{\sum_{m=1}^{n_s} |\{x_2\}_m|^2}{\sum_{m=1}^{n_s} |\{x_1\}_m|^2} ,    (7.143)

where \tilde{q} is the unnormalized ratio associated with the test statistic. Confusion between the sidelobe and the mainlobe occurs when the random variable \tilde{q} > 1, so that the probability of selecting the sidelobe over the mainlobe P_{s.l.}(SNR) is given by

P_{s.l.}(\mathrm{SNR}) = Pr\{\tilde{q} > 1\} = Pr\left\{q > \frac{\sigma_1^2}{\sigma_2^2}\right\} = \int_{\sigma_1^2/\sigma_2^2}^\infty dq\, p_F(q;\, 2 n_s, 2 n_s) = 1 - P_F(\sigma_1^2/\sigma_2^2;\, 2 n_s, 2 n_s) ,    (7.144)

where P_F(q; d_1, d_2) is the cumulative distribution function for the F distribution discussed in Section 3.1.13. The cumulative distribution function can be expressed in terms of beta functions, discussed in Section 2.14.3. The cumulative distribution function for the F distribution is given by

P_F(q;\, d_1, d_2) = \frac{B\!\left(\frac{q\, d_1}{q\, d_1 + d_2};\, \frac{d_1}{2}, \frac{d_2}{2}\right)}{B\!\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} .    (7.145)

Consequently, the probability of selecting the sidelobe over the mainlobe P_{s.l.}(SNR) is given by

P_{s.l.}(\mathrm{SNR}) = 1 - \frac{B\!\left(\frac{(\sigma_1^2/\sigma_2^2)\, 2 n_s}{(\sigma_1^2/\sigma_2^2)\, 2 n_s + 2 n_s};\, n_s, n_s\right)}{B(n_s, n_s)} = 1 - \frac{B\!\left(\frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2};\, n_s, n_s\right)}{B(n_s, n_s)} ,    (7.146)

where the ratio of the decorrelated variable variances is given by Equation (7.142) using Equation (7.140) in which the SNR per sample per receive antenna is given by P .
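This probability is easy to evaluate numerically; SciPy provides the F-distribution survival function directly. The following Python sketch follows Equations (7.140), (7.142), and (7.144) as reconstructed above (in particular the form of β is assumed from that reconstruction), with illustrative parameter values.

    import numpy as np
    from scipy.stats import f as f_dist

    def p_sidelobe_gaussian(P, rho, n_s):
        # Probability of selecting the sidelobe for an unknown Gaussian
        # signal; P is the SNR per sample per antenna, rho the correlation
        r = np.abs(rho)
        beta = np.sqrt(((P + 1) + (P * r**2 + 1) + 2 * r * (P + 1)) /
                       ((P + 1) + (P * r**2 + 1) - 2 * r * (P + 1)))  # (7.140)
        ratio = ((beta * (P * (1 - r) + 2) + P * (r + 1)) /
                 (beta * (P * (1 - r) + 2) - P * (r + 1)))            # (7.142)
        return f_dist.sf(ratio, 2 * n_s, 2 * n_s)                     # (7.144)

    print(p_sidelobe_gaussian(P=1.0, rho=0.5, n_s=10))  # illustrative values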

Figure 7.3  Notional representation of a vector sensor. All three electric (E_1, E_2, E_3) and magnetic (H_1, H_2, H_3) fields are measured simultaneously at a single phase center. The electric fields are measured by using the dipole antennas, and the magnetic fields are measured by using the loop antennas.

7.9  Vector sensor

The antenna array discussed previously exploits the relative phase delay induced by the time delay of signals impinging upon each antenna to determine direction. A vector sensor employs an array of antennas at a single phase center. Consequently, there is no relative phase delay. Instead, the vector sensor finds the direction to a source by comparing the relative amplitudes [229]. The vector sensor employs elements that are sensitive to electric and magnetic fields along each axis, as seen in Figure 7.3. Depending upon the direction of the impinging signal, different elements will couple to the wavefront with different efficiencies. Because the polarization of the incoming signal is unknown, and different polarizations will couple to each antenna with different efficiencies, the incoming signal polarization must be determined as a nuisance parameter. The electric and magnetic fields are indicated by

e = \begin{pmatrix} E_1 \\ E_2 \\ E_3 \end{pmatrix}    (7.147)

and

h = \begin{pmatrix} H_1 \\ H_2 \\ H_3 \end{pmatrix} ,    (7.148)

respectively. Under the assumption of free space propagation, the Poynting vector [154] with power flux P is given by the cross product of the electric field and the magnetic field. Here, the unit-norm direction vector u indicates the direction from the receive array to the source (the opposite direction of the Poynting


vector):

u\, P = -e \times h
u \times e = -\frac{1}{P}\, e \times h \times e
u \times e = -\frac{\|e\|^2}{P}\, h ,    (7.149)

by using the relationship

a \times b \times c = b\,(a \cdot c) - c\,(a \cdot b) .    (7.150)

The six receive signals are given by the three electric field measurements

z_E(t) = e + n_E(t)    (7.151)

and

z_H(t) = -\frac{\|e\|^2}{P}\, h + n_H(t) ,    (7.152)

where the noise for the electric and magnetic field measurements are indicated by n_E(t) ∈ C^{3×1} and n_H(t) ∈ C^{3×1}, respectively. The six measured receive signals are a function of direction and polarization, and are given by

\begin{pmatrix} z_E(t) \\ z_H(t) \end{pmatrix} = \begin{pmatrix} I \\ [u\times] \end{pmatrix} V\, \xi(t) + n(t) ,    (7.153)

where the noise vector n(t) is given by

n(t) = \begin{pmatrix} n_E(t) \\ n_H(t) \end{pmatrix} ,    (7.154)

and the direction cross-product operator [u×] is given by

[u\times] = \begin{pmatrix} 0 & -\{u\}_3 & \{u\}_2 \\ \{u\}_3 & 0 & -\{u\}_1 \\ -\{u\}_2 & \{u\}_1 & 0 \end{pmatrix} ,    (7.155)

and for a given polarization vector ξ(t) ∈ C^{2×1}, the matrix V ∈ R^{3×2} that maps the two polarization components orthogonal to the direction of propagation to the three spatial dimensions is given by

V = \begin{pmatrix} -\sin\phi & -\cos\phi\, \sin\theta \\ \cos\phi & -\sin\phi\, \sin\theta \\ 0 & -\cos\theta \end{pmatrix} .    (7.156)

Here the angle φ is defined as the angle from the 1-axis in the 1–2 plane, and θ is the angle from the 3-axis. Because the vector sensor has no aperture, the intrinsic resolution is relatively poor, on the order of one radian. This intrinsic resolution can be determined from the multiplicative constant term in the Cramer–Rao bound [229]. To achieve reasonable angle-estimation performance requires beamsplitting under the assumption of a high-SNR signal.
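As a sketch of how Equation (7.149) can be exploited directly for direction finding (an illustrative construction, not an algorithm from the text), the direction estimate follows from the cross product of the measured fields; the geometry and noise level below are hypothetical.

    import numpy as np
    rng = np.random.default_rng(1)

    # The source direction is opposite the Poynting vector, so an estimate
    # is u ~ -Re{e x h*} up to a positive scale (here P = ||e||^2).
    u_true = np.array([0.6, 0.8, 0.0])            # unit direction to source
    e_pol = np.array([-0.8, 0.6, 0.0]) + 0.0j     # E field, orthogonal to u
    h_fld = -np.cross(u_true, e_pol)              # consistent with u P = -e x h
    z_e = e_pol + 0.05 * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
    z_h = h_fld + 0.05 * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
    s = -np.real(np.cross(z_e, np.conj(z_h)))     # estimated -Poynting flux
    print(s / np.linalg.norm(s))                  # compare with u_true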


Figure 7.4  Beam pattern (plotted in the cos φ–sin φ plane) as a function of angle φ from boresight of a vector sensor, assuming e_θ polarization, for elevations of 0 (black, in the plane of the transmitter), 45 degrees (dark gray), and 67.5 degrees (light gray).

In Figure 7.4, the beam pattern for a vector sensor is displayed. Only the response of the electric field along the polar direction (e_θ, using the notation from Section 5.1) is considered in the beam pattern. Because the vector sensor has no intrinsic aperture, the beamwidth is very wide. In addition to the beam pattern in the plane (0 degrees), patterns for elevations of 45 and 67.5 degrees are displayed; both are smaller in their response to the in-plane (0 degrees) excitation.

Problems

7.1 Considering a five-element regular linear array with 1/2-wavelength spacing and isotropic antennas, under the assumption of receive SNR per sample per antenna of 10 dB and 10 samples independently drawn from a complex Gaussian distribution for each source, evaluate and plot the pseudospectrum as a function of direction parameter u = sin φ for angle φ, where φ = 0 is along boresight:
(a) for beamscan and MVDR with a single source at sin φ = −0.3,
(b) for beamscan and MVDR with sources at sin φ = −0.3, 0.4,
(c) for beamscan and MVDR with sources at sin φ = −0.4, −0.3, 0.4,
(d) for beamscan and MVDR with sources at sin φ = −0.8, −0.4, −0.3, 0.0, 0.3, 0.4, 0.8.

7.2 Considering a five-element regular linear array with 1/2-wavelength spacing and isotropic antennas, under the assumption of receive SNR per sample per antenna of 10 dB where all sources coherently transmit the same 10 samples drawn from a complex Gaussian distribution, evaluate and plot the


pseudospectrum as a function of direction parameter u = sin φ for angle φ, where φ = 0 is along boresight:
(a) for beamscan and MVDR with a single source at sin φ = −0.3,
(b) for beamscan and MVDR with sources at sin φ = −0.3, 0.4,
(c) for beamscan and MVDR with sources at sin φ = −0.4, −0.3, 0.4,
(d) for beamscan and MVDR with sources at sin φ = −0.8, −0.4, −0.3, 0.0, 0.3, 0.4, 0.8.

7.3 Considering the best unbiased angle-estimator variance for a linear array with mean-squared antenna position σ_y², with receive SNR per sample per antenna P and direction parameter u = sin φ for angle φ,
(a) evaluate the ratio of the best variance of direction parameter u estimation for a single transmitted sequence that is drawn from a Gaussian distribution relative to a known sequence, and
(b) discuss the ratio in the regime of small SNR but large number of samples.

7.4 Considering the best unbiased angle-estimator variance for a linear array with mean-squared antenna position σ_y², with receive energy per sample per antenna P and direction parameter u = sin φ for angle φ,
(a) evaluate the best unbiased angle-estimator variance for angle estimation φ for a single transmitted sequence that is drawn from a Gaussian distribution, and
(b) discuss the variance as φ approaches end fire of the array.

7.5 Show that the following relationship between Marcum Q-functions and Bessel functions is true,

Q_M(\sqrt{2a}, \sqrt{2b}) + Q_M(\sqrt{2b}, \sqrt{2a}) = 1 + e^{-(a+b)}\, I_0(2\sqrt{a\, b}) .    (7.157)

7.6 Considering the problem of angle estimation based upon a single observation of a narrowband signal of wavelength λ for the antenna array with phase centers at positions {0, 1, 3, 5, 8} λ/2 along the {x}_2 axis, find the approximate probability of confusing a sidelobe for a mainlobe as a function of per-receive-antenna SNR.

7.7 Considering the problem of angle estimation based upon a single observation of a narrowband signal of wavelength λ for the antenna array with phase centers at positions {0, 1, 3, 5, 8} λ/2 along the {x}_2 axis, evaluate the variance bound using the method of intervals, keeping only the dominant sidelobe, as a function of per-receive-antenna SNR.

7.8 Show for Equation (7.134) that the values for the parameters α and β given in Equations (7.140) and (7.139):
(a) decorrelate the variables y_1 and y_2 found in Equation (7.129),
(b) produce the variances presented in Equation (7.141).

7.9 For a single source known to be in the {x}_1–{x}_2 plane, observed by a vector sensor, evaluate the Cramer–Rao angle-estimation bound as a function of integrated SNR and polarization.

8

MIMO channel

By using the diversity made available by multiple-antenna communications, links can be improved [349, 109]. For example, the multiple degrees of freedom can be used to provide robustness through channel redundancy or increased data rates by exploiting multiple paths through the environment simultaneously. These advantages can even potentially be employed to reduce the probability of interception [1]. Knowledge of the channel can even be used to generate cryptographic keys [332].

The basic concept of a multiple-input multiple-output (MIMO) wireless communication link is that a single source distributes data across multiple transmit antennas, as seen in Figure 8.1. Potentially independent signals from the multiple transmit antennas propagate through the environment, typically encountering different channels. The multiple-antenna receiver then disentangles the signal from the multiple transmitters [99, 209, 308, 33, 116, 258, 84]. There is a wide range of approaches for distributing the data across the transmitters and in implementations for the receiver.¹

While MIMO communication can operate in line-of-sight environments (at the expense of some of the typical assumptions used in MIMO communications), the most common scenario for MIMO communications is to operate in an environment that is characterized by complicated multipath scattering. Consequently, most, if not all, of the energy observed at the receive array impinges upon the array from directions different from the direction to the source. As a result, the line-of-sight environment assumption employed in Chapters 6 and 7 is not valid for most applications of MIMO communications. The environment in which the link is operating is referred to as the channel. The capacity of a MIMO link is a function of the structure of this channel, so a number of channel models are considered in this chapter.

8.1  Flat-fading channel

It is said that a signal is narrowband if the channel between each transmit and receive antenna can be characterized, to a good approximation, by a single

¹ Some sections of this chapter are © 2002 IEEE. Reprinted, with permission, from Reference [33].


Figure 8.1  Notional multiple-input multiple-output (MIMO) wireless communication link from a transmitter to a receiver. The transmitted signal propagates through some scattering field.

complex number. This is a valid characterization of the channel when the signal bandwidth B is small compared to the inverse of the characteristic delay spread Δt,

B \ll \frac{1}{\Delta t} .    (8.1)

This regime is also described as a flat-fading channel because the same complex attenuation can be used across the frequencies employed by the transmission and is consequently flat (as opposed to frequency-selective fading). The elements in the flat-fading channel matrix H ∈ C^{n_r×n_t} contain the complex attenuation from each transmitter to each receiver. For example, the path between the mth transmitter and nth receiver has a complex attenuation {H}_{n,m}. A received signal z(t) ∈ C^{n_r×1} as a function of time t is given by

z(t) = H\, s(t) + n(t) ,    (8.2)

where the transmitted signal vector and additive noise (including external interference) as a function of time are denoted s(t) ∈ C^{n_t×1} and n(t) ∈ C^{n_r×1}, respectively. It is often convenient to consider a block of data of n_s samples. The received signal for a block of data with n_s samples is given by

Z = H\, S + N ,    (8.3)

where the received signal is given by Z ∈ C^{n_r×n_s}, the transmitted signal is given by S ∈ C^{n_t×n_s}, and the noise (including external interference) is given by


N ∈ C^{n_r×n_s}. The notion that the channel is static for at least n_s samples is implicit in this model.
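A minimal simulation of this block model is sketched below; the dimensions, noise level, and i.i.d. complex Gaussian (Rayleigh) channel are illustrative choices, not requirements of the model.

    import numpy as np
    rng = np.random.default_rng(2)

    n_r, n_t, n_s = 4, 3, 100
    h = (rng.standard_normal((n_r, n_t)) +
         1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)   # channel H
    s = (rng.standard_normal((n_t, n_s)) +
         1j * rng.standard_normal((n_t, n_s))) / np.sqrt(2)   # signal block S
    n = 0.1 * (rng.standard_normal((n_r, n_s)) +
               1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
    z = h @ s + n          # received block, Z = H S + N, Equation (8.3)
    print(z.shape)         # (n_r, n_s)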

8.2  Interference

Historically, external interference (interference from other communication links) was easily avoided by employing frequency-division multiple access (FDMA). However, because of the significant increase in the use of wireless communications, external interference is becoming an increasingly significant problem. We typically identify interference as external interference. Interference from within a communication system's own transmitters is typically defined as internal interference. For a MIMO system this is particularly important because the signals from the multiple transmitters of a single node typically interfere with each other at the receiver.

One of the important wireless regimes is unregulated or loosely controlled frequency bands, such as the industrial, scientific, and medical (ISM) bands. In these bands, various communication systems compete in a limited spectrum. In addition, wireless ad hoc and cellular networks, discussed in detail in later chapters, can introduce cochannel interference (that is, interference at the same frequency). By decreasing sensitivity to interference in wireless networks, higher link densities or higher signal-to-noise ratio (SNR) links can be achieved; thus, network throughput is increased. In any case, the ability to operate in interference can significantly increase the range of potential applications.

To describe the effects of the interference, two essential characteristics need to be specified: the channel and knowledge of the interference waveform. The received signal described in Equation (8.3) is given by

Z = H\, S + N = H\, S + \sum_m J_m\, T_m + \tilde{N} ,    (8.4)

where the interference contained within N is expressed as the sum of the terms J_m T_m, and the term \tilde{N} is the remaining thermal noise. The interference channels J_m are typically statistically equivalent to the channel H for the signal of interest. The nature (really the statistics) of the external interference signal can have a significant effect on a communication system's performance; thus, priors on the probability distributions for the interference can have a dramatic effect on receiver design. As an example, if the interference signal and its channel J T are known exactly, then the interference has no effect because a receiver that is aware of these parameters can subtract the contributions of the interference from the received signal perfectly, assuming a receiver with ideal characteristics. However, a receiver that cannot take advantage of this knowledge will be forced


to operate in the presence of a large noise-like signal. From a practical point of view, the effect of known interference (even if the channel is unknown) can be minimized at the receiver by projecting onto a basis temporally orthogonal to the interfering signal,

\tilde{Z} = Z\, P^\perp_T , \qquad P^\perp_T = I - T^\dagger (T\, T^\dagger)^{-1} T .    (8.5)

In the limit of a large number of samples, n_s ≫ rank{T}, the loss associated with this projection approaches zero because the size of the total signal space is n_s, but the size of the transmitted signal subspace is fixed; thus the fraction of the potential signal subspace that is subtended by the projection operation goes to zero as n_s becomes large. If the interfering signal is known exactly by the transmitter, then, in theory, the adverse effect of the interference can be mitigated exactly by using "dirty-paper coding" [67] that is discussed in Section 5.3.4. However, useful implementations of dirty-paper coding remain elusive. In the context of information theory, the worst-case distribution for an unknown noise, or in this case interference signal, is Gaussian. As discussed in Section 5.3.3, this distribution maximizes the entropy associated with the interference and consequently minimizes the mutual information. As a somewhat amusing aside, when receivers are optimized for Gaussian interference and noise, non-Gaussian signals can sometimes be the most disruptive.
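The projection of Equation (8.5) is simple to demonstrate numerically. The sketch below (with a hypothetical single known interference waveform and arbitrary sizes) shows the interference contribution being removed from the received block.

    import numpy as np
    rng = np.random.default_rng(3)

    n_r, n_s = 4, 64
    t_wave = rng.standard_normal((1, n_s)) + 1j * rng.standard_normal((1, n_s))
    j_chan = rng.standard_normal((n_r, 1)) + 1j * rng.standard_normal((n_r, 1))
    z = j_chan @ t_wave + 0.01 * (rng.standard_normal((n_r, n_s)) +
                                  1j * rng.standard_normal((n_r, n_s)))
    # P_perp = I - T^dagger (T T^dagger)^{-1} T, Equation (8.5)
    p_perp = np.eye(n_s) - t_wave.conj().T @ np.linalg.inv(
        t_wave @ t_wave.conj().T) @ t_wave
    z_clean = z @ p_perp           # J T P_perp = 0, so only noise remains
    print(np.linalg.norm(z), np.linalg.norm(z_clean))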

8.2.1  Maximizing entropy

As discussed in Section 5.3.3, entropy is a measure of the number of bits required to specify the state of a random variable with a given distribution. To evaluate MIMO channel capacity (discussed in Section 8.3), it is useful to identify the probability distributions that maximize entropy. The differential entropy (entropy per symbol) of a multivariate random variable is given by

h(x) = -\int d\Omega_x\, p(x)\, \log_2[p(x)] ,    (8.6)

where p(x) is the probability density function for the random vector x ∈ C^{n_r×1} and dΩ_x indicates the differential hypervolume associated with the integration variable x, as discussed in Section 2.9.2. As discussed in Section 5.3.3, the Gaussian distribution maximizes entropy for a given signal variance. This property is also true in the case of multivariate distributions. The differential entropy of an n_r-dimensional multivariate mean-zero complex Gaussian distribution denoted


x with covariance matrix R ∈ C^{n_r×n_r} is

h(x) = -\int d\Omega_x\, \frac{e^{-x^\dagger R^{-1} x}}{|R|\, \pi^{n_r}}\, \log_2\!\left[\frac{e^{-x^\dagger R^{-1} x}}{|R|\, \pi^{n_r}}\right]
= -\int d\Omega_x\, \frac{e^{-x^\dagger R^{-1} x}}{|R|\, \pi^{n_r}}\, \left(-\log_2[|R|\, \pi^{n_r}] - \log_2[e]\, x^\dagger R^{-1} x\right)
= \log_2[|R|\, \pi^{n_r}] + \log_2[e]\int d\Omega_x\, \frac{e^{-x^\dagger R^{-1} x}}{|R|\, \pi^{n_r}}\, x^\dagger R^{-1} x
= \log_2[|R|\, \pi^{n_r}] + \log_2[e]\int d\Omega_y\, |R|\, \frac{e^{-y^\dagger y}}{|R|\, \pi^{n_r}}\, y^\dagger y ; \quad y = R^{-1/2} x
= \log_2[|R|\, \pi^{n_r}] + \log_2[e]\, n_r
= \log_2([\pi e]^{n_r}\, |R|) ,    (8.7)

where it is observed that the determinant of the interference-plus-noise covariance matrix |R| is the Jacobian associated with the change of variables from x to y, and the last set of integrals each evaluate to 1 because the integral over the probability density is 1, and the variance of the zero-mean Gaussian variable is ⟨|{y}_m|²⟩ = 1.
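A Monte Carlo check of Equation (8.7) makes the result concrete; the covariance below is an arbitrary illustrative choice.

    import numpy as np
    rng = np.random.default_rng(4)

    n_r = 3
    a = rng.standard_normal((n_r, n_r)) + 1j * rng.standard_normal((n_r, n_r))
    r_cov = a @ a.conj().T + np.eye(n_r)          # a valid covariance matrix
    l_chol = np.linalg.cholesky(r_cov)
    n = 200_000
    x = l_chol @ (rng.standard_normal((n_r, n)) +
                  1j * rng.standard_normal((n_r, n))) / np.sqrt(2)
    # Estimate differential entropy as the sample mean of -log2 p(x)
    quad = np.real(np.sum(np.conj(x) * np.linalg.solve(r_cov, x), axis=0))
    ln_p = -quad - n_r * np.log(np.pi) - np.log(np.real(np.linalg.det(r_cov)))
    h_mc = -np.mean(ln_p) / np.log(2)
    h_closed = np.log2((np.pi * np.e)**n_r * np.real(np.linalg.det(r_cov)))
    print(h_mc, h_closed)                         # should agree closely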

8.3  Flat-fading MIMO capacity

The maximum spectral efficiency at which the effective error rate can be driven to zero (or in other words capacity) for a flat-fading link is found by maximizing the mutual information [68], as introduced in Section 5.3.2. The spectral efficiency is defined by the data rate divided by the bandwidth and has the units of bits per second per hertz (b/s/Hz) or equivalently (b/[s Hz]). The units of bits per second per hertz are just bits, although it is sometimes useful to keep the slightly clumsy longer form because it is suggestive of the underlying meaning. To find the channel capacity, both an outer bound and an achievability (inner) bound must be evaluated, and it must be shown that these two bounds are equal. In the following discussion, it is assumed without proof that Gaussian distributions are capacity achieving for MIMO links. More thorough discussions are presented in [308, 68]. There are various levels of channel-state information available to the transmitter. The spectral efficiency bound increases along with the amount of information available to the transmitter. As we use it here, the term capacity is a spectral efficiency bound. However, not all useful spectral efficiency bounds are capacity; because of some other constraints or lack of channel knowledge, a given spectral efficiency bound may be less than the channel capacity given complete


channel knowledge. One might argue reasonably that only when the entire system has knowledge of the channel (with the exception of noise) is the maximum achievable spectral efficiency bound the channel capacity. However, in practice it is common to refer to multiple spectral efficiency bounds, each with different assumptions on system constraints, as capacity. Given this practice, some care must be taken when a given spectral efficiency bound is identified as channel capacity.

In maximizing the mutual information, a variety of constraints can be imposed. The most common constraint is the total transmit power. For the MIMO link, an additional requirement can be placed upon the optimization: knowledge of channel-state information (CSI) at the transmitter. If the transmitter knows the channel matrix, then it can alter its transmission strategy (which in theory can be expressed in terms of the transmit signal covariance matrix) to improve performance. Conversely, if the channel is not known at the transmitter, and this is more common in communication systems, then the transmitter is forced to employ an approach with lower average performance. Because the channel is represented by a matrix in MIMO communications compared to a scalar in SISO communications, the notion of channel knowledge is more complicated. In both cases, the channel state can be completely known exactly or statistically. However, in the case of MIMO, the notion of statistical knowledge is even more involved. As an explicit example, all flat-fading SISO channels of the same attenuation have the same capacity as a function of transmit power, but all MIMO channel matrices with the same Frobenius norm (which implies the same average attenuation) do not have the same capacity.

An issue in considering performance of communication systems is in relating theoretical and experimental analyses of performance. In general, this is true for both SISO and MIMO systems, although it is slightly more complicated for MIMO systems. Theoretical discussions of MIMO communications are typically in terms of average SNR per receive antenna. However, the SNR estimate produced from a channel measurement is not the same. Explicitly, this is understood by noting that the estimate of the SNR for a particular estimated channel instance is not the same as the average SNR for an ensemble of channels,

\widehat{\mathrm{SNR}} \propto \frac{\|\hat{H}\|_F^2}{n_r} \ne \langle \mathrm{SNR} \rangle \propto \frac{\langle \|H\|_F^2 \rangle}{n_r} ,    (8.8)

where the notation \hat{\cdot} indicates an estimated parameter. This difference is discussed in greater detail in Section 8.11. Implicit in this formulation of SNR is the notion that each transmit antenna excites the channel with independent signals with equal power (the optimal solution for the uninformed transmitter). If the transmit antennas incorporate correlations to take advantage of the channel-state information (an informed transmitter solution), then this discussion is even more complicated. A more thorough discussion of channel-state information is presented in Section 8.3.1.


If the channel capacity over some bandwidth B is indicated by C, then under the assumptions of a flat-fading or spectrally constant channel of interest, the bounding spectral efficiency c and the total data rate are related by

C = B\, c .    (8.9)

Often without specifying any underlying assumptions or constraints being considered (which can lead to confusion), both the bounding spectral efficiencies and the bounding total data rate are referred to as the channel capacity. The information theoretic capacity of MIMO systems has been discussed widely, for example in References [308, 33]. The development of the informed transmitter ("water filling" [68]) and uninformed transmitter approaches is discussed in Sections 8.3.2 and 8.3.3. The relative performance of these approaches is discussed in Section 8.3.4. Here the informed transmitter will have access to an accurate estimate of the channel matrix and a statistical estimate of the interference. The application of channel information at the transmitter is sometimes given the somewhat unfortunate name "precoding," although often this name implies a suboptimal linear precoding approach [330, 268, 274, 162, 197]. As will be developed in Sections 8.3.2 and 8.3.3, the capacities (the bounding spectral efficiency in units of bits per second per hertz) of the informed and uninformed transmitter in flat-fading environments are given by

c = \max_{P:\, \mathrm{tr}\{P\} \le P_o} \log_2\left|I + R^{-1} H\, P\, H^\dagger\right|    (8.10)

and

c = \log_2\left|I + \frac{P_o}{n_t}\, R^{-1} H\, H^\dagger\right| ,    (8.11)

respectively. In the informed transmitter case, the transmit spatial covariance matrix P ∈ C^{n_t×n_t} contains the optimized statistical cross correlations between transmit antennas. The total transmit power is indicated by P_o. The interference-plus-noise spatial covariance matrix is indicated by R ∈ C^{n_r×n_r}. For convenience, it is often assumed that the transmit spatial covariance matrix P and the interference-plus-noise spatial covariance matrix R ∈ C^{n_r×n_r} are expressed in units of thermal noise. Under this normalization, a thermal noise covariance matrix is given by I. As a reminder, we are considering signals in a complex baseband representation. Consequently, for each symbol there are two degrees of freedom (real and imaginary), so the "1/2" in the standard form of capacity "1/2 log_2(1 + SNR)" is not present.
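For a concrete sense of scale, the uninformed-transmitter bound of Equation (8.11) can be averaged over an ensemble of channels; the sketch below assumes white noise (R = I), an i.i.d. complex Gaussian channel, and illustrative dimensions and total power.

    import numpy as np
    rng = np.random.default_rng(5)

    n_r, n_t, p_o, trials = 4, 4, 10.0, 1000
    c = np.empty(trials)
    for k in range(trials):
        h = (rng.standard_normal((n_r, n_t)) +
             1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        m = np.eye(n_r) + (p_o / n_t) * h @ h.conj().T   # Equation (8.11)
        c[k] = np.log2(np.real(np.linalg.det(m)))
    print(c.mean())       # ergodic average spectral efficiency, b/s/Hz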

8.3.1  Channel-state information at the transmitter

In information theoretic discussions about MIMO links, a variety of models are used with regard to the knowledge of the channel. There is some confusion with regard to what is meant by the channel. In the most general sense, channel knowledge would include the complex attenuation from any transmit antenna to


any receive antenna as a function of frequency and time. It would also include any noise or interference introduced in the channel. In discussions of dirty-paper coding, introduced in Section 5.3.4, it is the noise or the interference that is referenced when the concept of channel-state information is considered. In this chapter, and in most practical wireless communications, the focus of channel knowledge is the complex attenuation between transmit and receive antennas that is represented by the channel matrix for MIMO links. Channel-state information may also include knowledge of the statistical properties of the interference and noise, typically represented by the interference-plus-noise covariance matrix.

It is typically assumed that the receiver can estimate the channel. This estimation can be done by employing joint data and channel estimation, or, more typically, by including a known training or pilot sequence with which the channel can be estimated as part of the transmission. At the transmitter, access to knowledge of the channel state is problematic. In rare circumstances, the transmitter can exploit knowledge of the geometry and an exact model for the environment (such as for line-of-sight channels), but this approach is rarely valid for terrestrial communications. If there is not a means for a transmitter to obtain channel-state information, then the transmitter is said to be uninformed. If there is a communication link from the receiver, then channel estimates can be sent back from the receiver to the transmitter, and the link has an informed transmitter. Approaches to efficiently encode channel estimates have been considered [195] and are discussed in Section 8.12.2. If the link is bidirectional, on the same frequency and using the same antennas, then reciprocity can be invoked so that the channel can be estimated while in the receive mode and then exploited during the transmit mode. As discussed in Section 8.12.1, there are some technical issues in using the reciprocity approach.

When using either channel-estimation feedback or reciprocity, time-varying channels can limit the applicability of these techniques [30]. If the channel is very stable, which may be true for some static environments, then providing the transmitter with channel-state information may be viable. If the channel is dynamic, as in the case of channels with moving transmitters and receivers, the channel may change significantly before the transmitter can use the channel-state information. In this case, it is said that the channel-state information is stale. In reaction to potentially stale channel-state information, one approach is to provide the transmitter access to statistical characteristics of the channel. As an example, if the typical distribution of the singular values of the channel matrix can be estimated, then space-time codes can be modified to take advantage of these distributions. Explicitly, if channels can be characterized typically by high-rank channel matrices, then codes with higher rates may be suggested. Conversely, if the channels can be characterized typically by low-rank channel matrices, then codes with high spatial redundancy may be suggested. Trading rate for diversity is discussed in Chapter 11.

In addition, there are different levels of knowledge of interference for the transmitter. If the interference signals are known exactly at the transmitters, then


dirty-paper coding techniques can be employed [67], although practical implementation of dirty-paper coding remains an open area of research. Also, the interference may be known in some statistical sense. The most common example would be for an interference-plus-noise spatial covariance to be estimated at the receiver and passed back to the transmitter. As will be shown in this chapter, for unknown Gaussian interference, the optimal channel-state information is the spatially whitened channel matrix. A whitened channel matrix is a channel matrix premultiplied by the inverse of the square root of the noise-plus-interference covariance matrix. Consequently, there are a variety of levels of knowledge of the channel state at the transmitter. Some common levels of channel-state information are listed here.

Channel matrix             Interference
unknown                    unknown
known transmitter power    unknown
known                      unknown
known                      known signal
known                      known statistically
known statistically        known statistically

This list is provided partly as a warning. It is not uncommon in the literature to make assumptions about the channel-state information at the transmitter without providing a clear description of those assumptions.

8.3.2  Informed-transmitter (IT) capacity

For narrowband MIMO systems, the coupling between the transmitter and receiver for each sample in time can be modeled by using Equation (8.2). In this section, it is assumed that the transmitter is informed with knowledge of the channel matrix. The transmitter also has knowledge of the spatial interference-plus-noise covariance matrix, so that the interference is known in a statistical sense assuming Gaussian interference. For notational convenience, it is also assumed that power is scaled so that the variance of the baseband complex thermal noise (the noise in the absence of external interference) associated with each receive antenna is 1. By using the definitions in Equation (8.2), the capacity is given by the maximum of the mutual information [68], as is discussed in Section 5.3.2,

I(z, s) = \left\langle \log_2 \frac{p(z|s)}{p(z)} \right\rangle = h(z) - h(z|s) ,    (8.12)

and is maximized over the source conditional probability density p(s|P) subject to various transmit constraints on the transmit spatial covariance matrix P ∈ C^{n_t×n_t}. The differential entropies for the received signal and for the


received signal conditioned by the transmitted signal are given by h(z) and h(z|s), respectively. For the sake of notational convenience, the explicit parameterization of time z(t) ⇒ z is suppressed. Here the maximum mutual information provides an outer bound on the spectral efficiency. As discussed in Section 5.3 and discussed for MIMO systems in [308], the mutual information is maximized by employing a Gaussian distribution for s. The worst-case noise plus interference is given by a Gaussian distribution for n. The probability distribution for the received signal given the transmitted signal p(z|s) is given by

p(z|s) = \frac{1}{|R|\, \pi^{n_r}}\, e^{-(z - H s)^\dagger R^{-1} (z - H s)} .    (8.13)

The probability distribution for the received signal without knowledge of what is being transmitted p(z) is typically modeled by

p(z) = \frac{1}{|Q|\, \pi^{n_r}}\, e^{-z^\dagger Q^{-1} z} ,    (8.14)

where the combined spatial covariance matrix Q ∈ C^{n_r×n_r} is given by

Q = R + H\, P\, H^\dagger .    (8.15)

The differential entropy for the received signal given knowledge of what is transmitted, h(z|s), is just the entropy of the Gaussian noise plus the interference h(n) because n = z − H s, and is given by

h(z|s) = h(n) = \log_2(\pi^{n_r}\, e^{n_r}\, |R|) .    (8.16)

Similarly, the entropy for the received signal is given by

h(z) = \log_2(\pi^{n_r}\, e^{n_r}\, |R + H\, P\, H^\dagger|) .    (8.17)

Consequently, the mutual information in units of bits per second per hertz is given by

I(z, s) = h(z) - h(z|s)
= \log_2([\pi e]^{n_r}\, |R + H\, P\, H^\dagger|) - \log_2([\pi e]^{n_r}\, |R|)
= \log_2\left|I + R^{-1} H\, P\, H^\dagger\right|
= \log_2\left|I + R^{-1/2} H\, P\, H^\dagger R^{-1/2}\right| .    (8.18)
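Equation (8.18) is directly computable for any sampled channel; the sketch below uses an arbitrary (illustrative) channel, an equal-power transmit covariance, and a single hypothetical interferer in the noise covariance.

    import numpy as np
    rng = np.random.default_rng(7)

    n_r, n_t = 4, 3
    h = (rng.standard_normal((n_r, n_t)) +
         1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    p_cov = np.eye(n_t) * (10.0 / n_t)        # equal-power transmit covariance
    j = rng.standard_normal((n_r, 1)) + 1j * rng.standard_normal((n_r, 1))
    r_cov = np.eye(n_r) + j @ j.conj().T      # thermal noise plus interference
    m = np.eye(n_r) + np.linalg.solve(r_cov, h @ p_cov @ h.conj().T)
    rate = np.log2(np.real(np.linalg.det(m))) # Equation (8.18)
    print(rate)                               # bits per second per hertz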

There are a variety of possible constraints on P, depending on the assumed transmitter limitations. As an example, one might imagine imposing a peak power constraint upon each transmit antenna. In the following discussion, it is assumed that the fundamental limitation is the total power transmitted. The optimization of the n_t × n_t noise-normalized transmit covariance matrix, P = ⟨s s^\dagger⟩, is constrained by the total thermal-noise-normalized transmit power, P_o. This optimization falls under the category of nonlinear programming. A


unique solution exists if the Karush–Kuhn–Tucker (KKT) conditions are satisfied, as discussed in Section 2.12.2. If different transmit powers are allowed at each antenna, the total power constraint can be enforced by using the form tr{P} ≤ P_o. The channel capacity is achieved if the channel is known by both the transmitter and receiver, giving

c_{IT} = \sup_{P:\, \mathrm{tr}(P) = P_o} \log_2\left|I_{n_r} + R^{-1/2} H\, P\, H^\dagger R^{-1/2}\right| .    (8.19)

To avoid radiating negative power, the additional constraint P ≥ 0 (that is, all the eigenvalues of P are greater than or equal to 0) requires that only a subset of transmit covariance eigenvalues will be used. Much of the literature invokes the water-filling solution for capacity [314, 68] by employing the standard KKT solution at this point, where the mth eigenvalue of the optimized transmit covariance matrix P is given by

\lambda_m\{P\} = \left(\nu - \frac{1}{\lambda_m\{R^{-1/2} H\, H^\dagger R^{-1/2}\}}\right)^+ .    (8.20)

The notation (a)^+ = \max(0, a) indicates here that the value of the argument is limited to non-negative values, and the parameter ν is varied so that the following condition is satisfied,

\sum_m \left(\nu - \frac{1}{\lambda_m\{R^{-1/2} H\, H^\dagger R^{-1/2}\}}\right)^+ = P_o .    (8.21)

The informed transmitter capacity is then given by

c_{IT} = \sum_m \log_2\left(1 + \lambda_m\{R^{-1/2} H\, H^\dagger R^{-1/2}\}\, \lambda_m\{P\}\right) .    (8.22)

We redevelop the same capacity explicitly, providing a useful form. The whitened channel R^{-1/2} H can be represented by the singular-value decomposition

R^{-1/2} H = (U\ \bar{U}) \begin{pmatrix} D & 0 \\ 0 & \bar{D} \end{pmatrix} (W\ \bar{W})^\dagger ,    (8.23)

where the nonzero singular values and corresponding singular vectors are partitioned into two sets. A subset of n_+ singular values of the whitened channel matrix is contained in the diagonal matrix D ∈ R^{n_+×n_+}, and the remaining min(n_r, n_t) − n_+ are contained in the diagonal matrix \bar{D}. In the following discussion, we develop the criteria for finding the subset of whitened channel matrix singular values. The corresponding left and right singular vectors are contained in U, \bar{U}, W, and \bar{W}. The columns of U ∈ C^{n_r×n_+} are orthonormal, and the columns of W ∈ C^{n_t×n_+} are orthonormal. For a given subset (contained in D) of whitened channel singular values, the subspace of the nonzero eigenvectors of P is constrained to be orthogonal to the columns of \bar{W},


so that the term R^{-1/2} H P H^\dagger R^{-1/2} simplifies to

R^{-1/2} H\, P\, H^\dagger R^{-1/2} = R^{-1/2} H\, P\, (W\ \bar{W}) \begin{pmatrix} D & 0 \\ 0 & \bar{D} \end{pmatrix}^\dagger (U\ \bar{U})^\dagger
= R^{-1/2} H\, P\, W\, D^\dagger U^\dagger
= U\, D\, W^\dagger P\, W\, D^\dagger U^\dagger .    (8.24)

Given this decomposition, the transmit covariance P is optimized for possible subsets (contained in D) of whitened channel singular values. The best solution that satisfies the constraint of positive transmit power P > 0 (indicating that all the eigenvalues of the transmit covariance matrix are positive) is selected. However, as one would expect, it is the singular values with larger magnitudes (modes with better propagation) that are more helpful. For a given test evaluation with n_+ channel modes, the spectral efficiency optimization can be written by using the noise-free receive covariance C ∈ C^{n_+×n_+} that satisfies the following relationship,

C = D\, W^\dagger P\, W\, D^\dagger .    (8.25)

The total transmit power is given by

P_o \ge \mathrm{tr}\{P\} = \mathrm{tr}\{W^\dagger P\, W\} = \mathrm{tr}\{D^{-1} C\, (D^\dagger)^{-1}\} = \mathrm{tr}\{(D^\dagger D)^{-1} C\}    (8.26)

because all of the power in P is contained in the subspace defined by the orthonormal matrix W; replacing the transmit covariance matrix with the quadratic form W^\dagger P W does not change the total power, tr{P} = tr{W^\dagger P W}. Because D is a real, symmetric, diagonal matrix, the transpose contribution of the Hermitian conjugate has no effect; thus, D^\dagger D = D^* D. The capacity (optimized spectral efficiency) is given by

c_{IT} = \sup_{P:\, \mathrm{tr}(P) = P_o} \log_2\left|I_{n_r} + U\, D\, W^\dagger P\, W\, D^\dagger U^\dagger\right|
= \sup_{C:\, \mathrm{tr}(C (D^\dagger D)^{-1}) = P_o} \log_2\left|I_{n_+} + C\right| .    (8.27)

By employing a Lagrangian multiplier η′ to enforce the constraint, the optimal noise-free receive covariance matrix can be found by evaluating the derivative with respect to some arbitrary parameter α of the noise-free receive covariance matrix C = C(α), so that

0 = \frac{\partial}{\partial \alpha}\left(\log_2|I + C| - \eta'\, \mathrm{tr}\{C (D^\dagger D)^{-1}\}\right)
= \log_2(e)\, \mathrm{tr}\left\{(I + C)^{-1} \frac{\partial C}{\partial \alpha}\right\} - \eta'\, \mathrm{tr}\left\{(D^\dagger D)^{-1} \frac{\partial C}{\partial \alpha}\right\} .    (8.28)


To simplify the expression, the notation η = η′/\log_2(e) is used. This relationship is satisfied if C is given by the diagonal matrix

C = \frac{1}{\eta}\, D^\dagger D - I_{n_+} .    (8.29)

The value for the Lagrangian multiplier η is found by imposing the total power constraint,

P_o = \mathrm{tr}\{P\} = \mathrm{tr}\{C\, (D^\dagger D)^{-1}\} = \mathrm{tr}\left\{\frac{I}{\eta} - (D^\dagger D)^{-1}\right\}
\frac{1}{\eta} = \frac{P_o + \mathrm{tr}\{(D^\dagger D)^{-1}\}}{n_+} .    (8.30)

Consequently, the noise-free receive covariance matrix C is given by

C = \left(\frac{P_o + \mathrm{tr}\{(D^\dagger D)^{-1}\}}{n_+}\right) D^\dagger D - I_{n_+} .    (8.31)

The non-negative power constraint is satisfied if the eigenvalues of the transmit covariance matrix are positive,

P \ge 0 , \qquad P = C\, (D^\dagger D)^{-1} \ge 0 .    (8.32)

The capacity is maximized by employing the largest subset of channel singular values such that the above constraint is satisfied using Equation (8.31). The constraint can be rewritten such that the values of the diagonal matrix D^\dagger D must satisfy

\{D^\dagger D\}_{m,m} \ge \frac{n_+}{P_o + \mathrm{tr}\{(D^\dagger D)^{-1}\}} .    (8.33)

Assuming that the selected singular values of the channel matrix contained in D are sorted by their magnitude, if Equation (8.33) is not satisfied for some \{D^\dagger D\}_{m,m}, it will not be satisfied for any smaller \{D^\dagger D\}_{m,m}. As a reminder, the diagonal entries in the diagonal matrix D^\dagger D ∈ R^{n_+×n_+} contain the n_+ largest eigenvalues of the whitened channel matrix,

\lambda_m\left\{R^{-1/2} H\, H^\dagger R^{-1/2}\right\} ,    (8.34)

where \lambda_m\{\cdot\} indicates the mth eigenvalue. The resulting capacity, by substituting Equation (8.31) in Equation (8.27), is given by

c_{IT} = \log_2\left|\frac{P_o + \mathrm{tr}\{(D^\dagger D)^{-1}\}}{n_+}\, D^\dagger D\right| .    (8.35)
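A compact implementation of this mode-selection and water-filling procedure is sketched below, assuming white noise (R = I, so the whitened channel is just H) and illustrative dimensions and total power.

    import numpy as np
    rng = np.random.default_rng(6)

    n_r, n_t, p_o = 4, 4, 10.0
    h = (rng.standard_normal((n_r, n_t)) +
         1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    lam = np.sort(np.real(np.linalg.eigvalsh(h.conj().T @ h)))[::-1]
    lam = lam[lam > 1e-12]                  # eigenvalues of D^dagger D
    for n_plus in range(len(lam), 0, -1):
        d2 = lam[:n_plus]
        level = (p_o + np.sum(1.0 / d2)) / n_plus
        if level >= 1.0 / d2[-1]:           # Equation (8.33), weakest mode
            break
    c_it = np.sum(np.log2(level * d2))      # Equation (8.35)
    print(n_plus, c_it)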


In this discussion, it is assumed that the environment is stationary over a period long enough for the error associated with channel estimation to vanish asymptotically. In order to study the typical performance of quasistationary channels sampled from a given probability distribution, capacity is averaged over an ensemble of quasistationary environments. Under the ergodic assumption (that is, the ensemble average is equal to the time average), the mean capacity ⟨c_{IT}⟩ is the channel capacity. It is worth noting that this informed transmitter capacity is based upon an average total transmit power, tr{P} ≤ P_o. For some practical systems, this may not be the best constraint. If the transmitter is operating near its peak power output, then the power limit may be imposed on a per transmit element basis so that

\{P\}_{m,m} \le \frac{P_o}{n_t} ,    (8.36)

which is a different optimization than the one discussed in this section and is beyond the discussion in this text. In addition, there are other typically suboptimal approaches, sometimes denoted precoding, that may be logistically desirable [330, 268, 274, 162, 197]. Precoding techniques can be extended to consider the interaction of channel dynamics and the accuracy of channel-state feedback [325].

8.3.3  Uninformed-transmitter (UT) capacity

If the channel and interference are stochastic and not known at the transmitter, then the optimal transmission strategy for an isolated link is to transmit equal power from each antenna, P = (P_o/n_t)\, I_{n_t} [308]. This optimization becomes more complicated in the context of a network, as discussed in Chapters 12, 13, and 14. Assuming that the receiver can accurately estimate the channel, but the transmitter does not attempt to optimize its output to compensate for the channel, the maximum spectral efficiency under the assumption of a diagonal transmit covariance matrix is given by

c_{UT} = \log_2\left|I_{n_r} + \frac{P_o}{n_t}\, R^{-1} H\, H^\dagger\right| .    (8.37)

This is a common transmit constraint as it may be difficult to provide the transmitter channel estimates. In the following discussion, it is shown that this transmitter strategy is optimal in an average sense. This strategy can be demonstrated under the assumption of a random Gaussian (sometimes denoted Rayleigh) channel with independent elements. Forms for the capacity under other channel-fading models, such as Rician [112], are also possible to develop. Under this assumption, any unitary transformation of the whitened channel matrix is just as likely as any other, such that the probability


density p(R^{-1/2} H) satisfies

p(R^{-1/2} H) = p(R^{-1/2} H\, U)    (8.38)

for any unitary matrix U. The goal is to optimize the transmit covariance matrix P. Because any Hermitian matrix can be constructed by U P U^\dagger, starting with a diagonal matrix P, there is no reason to consider any transmit covariance matrices with off-diagonal elements. Another way to view this is that under the random channel matrix assumption, the transmitter cannot have knowledge of any preferred direction. If the whitened channel matrix can be represented by the singular-value decomposition R^{-1/2} H = \tilde{U} \tilde{S} \tilde{V}^\dagger, then the ergodic (average over time) capacity is given by

\langle c_{UT} \rangle = \left\langle \log_2\left|I + \tilde{S}\, \tilde{S}^\dagger\, \tilde{V}^\dagger P\, \tilde{V}\right| \right\rangle = \left\langle \sum_m \log_2\left(1 + \lambda_m\{\tilde{S}\, \tilde{S}^\dagger\, \tilde{V}^\dagger P\, \tilde{V}\}\right) \right\rangle ,    (8.39)

where Ṽ is a random unitary matrix, and S̃ is a random diagonal matrix. Because the logarithm is compressive, the largest average sum will occur if the diagonal contributions in the power covariance matrix are equal. This is a consequence of the fact that, for some real values a_m, the sum of the logarithms of 1 + a_m, explicitly

Σ_m log(1 + a_m) ,    (8.40)

under the constraint Σ_m a_m = constant, is maximized when the various elements are equal, a_m = a_n. For a given total power, the maximum average capacity occurs when the variance in eigenvalues is minimized. This minimum variance occurs when the diagonal entries in the transmit covariance matrix P are equal. Consequently, under the assumption of an uninformed transmitter, when Equation (8.38) is satisfied, the optimal transmit covariance matrix is given by

P = (Po/nt) I .    (8.41)

High SNR limit
In the limit of the absence of external interference, R → I, the ratio of the capacity of the MIMO link c_UT to the single-input single-output (SISO) link capacity c_SISO in the limit of high SNR is given by the number of antennas used by the MIMO link. The SISO link has a channel attenuation a. By using the notation λ_m{·} to indicate the mth eigenvalue of the argument (sorted so that the largest eigenvalue is given by m = 1), the ratio of capacities is


given by

c_UT / c_SISO = log2| I_{nr} + (Po/nt) H H† | / log2(1 + a² Po)
             = Σ_{m=1}^{nr} log2( λ_m{ I_{nr} + (Po/nt) H H† } ) / log2(1 + a² Po)
             = Σ_{m=1}^{min(nr,nt)} log2( 1 + (Po/nt) λ_m{H H†} ) / log2(1 + a² Po)
             → Σ_{m=1}^{min(nr,nt)} [ log2(Po) + log2( λ_m{ (1/nt) H H† } ) ] / [ log2(Po) + log2(a²) ]
             → Σ_{m=1}^{min(nr,nt)} log2(Po) / log2(Po)
             = min(nr, nt)    (8.42)

in the limit of large transmit power. The convergence to this asymptotic result is very slow. Consequently, this often-quoted result is mildly misleading, because practical systems typically operate in SNR regimes for which this limit is not valid. Furthermore, the advantages of MIMO often lie in the statistical diversity it provides, which improves the robustness of the link. Nonetheless, the above result and the following sections can be used to provide some insight into potential performance improvements or limits when used properly.

High SNR and higher INR limit
Here we develop the ratio of the capacity with, c_{UT,I}, and without, c_{UT,NI}, external interference for ni infinite-power interference sources in the limit of high SNR. Implicit in the following discussion is the notion that the interference power is growing faster than the power of the intended signal. Furthermore, the assumption is employed that the interference can be modeled by asymptotically high-power Gaussian signals that are spatially correlated such that they are completely contained within a subspace of the interference-plus-noise covariance matrix. This assumption is important because, in practice, nonideal limitations of receivers typically cause the rank of the interference to grow as the power increases. This rank increase will overwhelm the degrees of freedom of the receiver. Here we will assume an ideal receiver and perfect parameter (that is, channel and interference-plus-noise covariance matrix) estimation. It is assumed here that the receive interference-plus-noise spatial covariance matrix has the form

R = I + J J† ,    (8.43)

where power is normalized so that thermal noise contributes the identity matrix I to the covariance matrix, and the receive spatial covariance matrix of the ni interferers is given by J J†. Here the columns of the interference channel matrix J ∈ C^{nr×ni} contain the array responses times the receive power for each interferer. As the interference power increases, the inverse of the spatial covariance matrix approaches a projection matrix P⊥_J that is orthogonal to the space spanned by J,

R^{-1} = ( I + J J† )^{-1}
       = I − J ( I + J† J )^{-1} J†
       → I − J (J† J)^{-1} J† = P⊥_J ,    (8.44)

by employing Equation (2.113) in the limit of large interference power compared to the noise. If the number of interfering antennas is equal to or larger than the number of receive antennas, ni ≥ nr, then the capacity is zero because the projection matrix is orthogonal to the entire space, except for a set of examples of zero measure in which the spatial responses of the interferers are contained completely in a subspace that is smaller than the number of interferers. The ratio of the high-interference to the no-interference capacities in the high-interference limit is given by

c_{UT,I} / c_{UT,NI} = log2| I_{nr} + (Po/nt) P⊥_J H H† | / log2| I_{nr} + (Po/nt) H H† |
                     = log2| I_{nr} + (Po/nt) P⊥_J H H† P⊥_J | / log2| I_{nr} + (Po/nt) H H† |
                     = Σ_{m=1}^{min(nr−ni, nt)} log2( 1 + (Po/nt) λ_m{ P⊥_J H H† P⊥_J } ) / Σ_{m=1}^{min(nr,nt)} log2( 1 + (Po/nt) λ_m{H H†} ) .    (8.45)

In the limit of high SNR, the capacity ratio is given by

c_{UT,I} / c_{UT,NI} → Σ_{m=1}^{min(nr−ni, nt)} [ log2(Po/nt) + log2( λ_m{ P⊥_J H H† P⊥_J } ) ] / Σ_{m=1}^{min(nr,nt)} [ log2(Po/nt) + log2( λ_m{H H†} ) ]
                     → Σ_{m=1}^{min(nr−ni, nt)} log2(Po/nt) / Σ_{m=1}^{min(nr,nt)} log2(Po/nt)
                     → min(nr − ni, nt) / min(nr, nt) ,    (8.46)


where we employ the observations that the summation only needs to run over arguments of the logarithm that are not unity, and that the eigenvalues of the finite channel components are small compared to the large power term. The convergence to the final result is relatively slow. In general, the theoretical capacity is not significantly affected as long as the number of antennas is much larger than the number of interferers. A practical issue with this analysis is that, at very high INR, the model in which J can be completely contained within a subspace fails because of more subtle physical effects. As examples, the effects of dispersion (that is, resolvable delay spread) across the array or of receiver nonlinearity can cause the rank of the interference covariance to increase. However, for many practical INRs, the analysis is a useful approximation.
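A small Monte Carlo sketch (not from the text) can illustrate how slowly the high-SNR scaling of Equation (8.42) is approached at practical SNRs; NumPy is assumed, and the trial counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
nr = nt = 4
trials = 200
for Po_dB in (0, 20, 40, 60):
    Po = 10.0 ** (Po_dB / 10)
    ratios = []
    for _ in range(trials):
        G = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
        lam = np.maximum(np.linalg.eigvalsh(G @ G.conj().T), 0.0)
        c_mimo = np.sum(np.log2(1 + (Po / nt) * lam))   # Equation (8.37), R = I
        c_siso = np.log2(1 + Po)                        # SISO reference with a = 1
        ratios.append(c_mimo / c_siso)
    print(Po_dB, "dB:", np.mean(ratios))   # creeps toward min(nr, nt) = 4
```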

8.3.4 Capacity ratio, c_IT/c_UT

In general, the informed transmitter has higher capacity than the uninformed transmitter. However, there is also increased overhead. It is reasonable for a system designer to ask whether the increase in performance is worth the overhead. In general, there are a number of subtleties related to system limitations and to the specifics of the assumed phenomenology that will often drive expected performance. However, here a few limiting cases are considered that will aid in developing system design intuition.

High SNR limit
At high SNR, c_IT and c_UT converge. At high SNR, for finite interference, all the available modes are employed; that is, n+ is equal to the minimum of the number of transmit and receive antennas, n+ = min(nr, nt). If the number of receive antennas is larger than or equal to the number of transmit antennas, nr ≥ nt, then this convergence can be observed in the large Po limit of the ratio of Equations (8.35) and (8.37),

c_IT / c_UT = log2| ( (Po + tr{(H† R^{-1} H)^{-1}}) / nt ) H† R^{-1} H | / log2| I_{nt} + (Po/nt) H† R^{-1} H |
            = [ nt log2( (Po + tr{(H† R^{-1} H)^{-1}}) / nt ) + log2| H† R^{-1} H | ] / log2| I_{nt} + (Po/nt) H† R^{-1} H | .    (8.47)

In the limit of large SNR (Po ≫ tr{(H† R^{-1} H)^{-1}}), the difference between the various channel eigenvalues becomes unimportant, and the capacity ratio is given by

c_IT / c_UT → [ nt log2(Po/nt) + log2| H† R^{-1} H | ] / [ nt log2(Po/nt) + log2| H† R^{-1} H | ]
            = 1 .    (8.48)

The convergence to one is relatively slow. If the number of transmit antennas is greater than the number of receive antennas, nt > nr (considered in Exercise 8.1), then the result is essentially the same.

Low SNR limit
At low SNR, the informed transmitter selects the dominant singular value (n+ = 1) of the whitened channel. Essentially, the system is selecting matched transmit and receive beamformers that have the best attenuation path through the whitened channel. The corresponding eigenvalue of the dominant mode is given by

d = λ_max{ R^{-1/2} H H† R^{-1/2} } = D† D ,    (8.49)

where in this limit the matrix D collapses to a scalar, the dominant singular value, because n+ = 1. In this limit of low SNR, the ratio of the informed to the uninformed capacity c_IT/c_UT is given by

c_IT / c_UT = log2( ( (Po + d^{-1}) / n+ ) d ) / log2| I_{nr} + (Po/nt) R^{-1/2} H H† R^{-1/2} |
            = log(1 + Po d) / Σ_m log( λ_m{ I_{nr} + (Po/nt) R^{-1/2} H H† R^{-1/2} } )
            = log(1 + Po d) / Σ_m log( 1 + λ_m{ (Po/nt) R^{-1/2} H H† R^{-1/2} } ) .    (8.50)

In the low SNR limit, the eigenvalues are small, so the lowest-order term in the logarithmic expansion about one is a good approximation; thus, the capacity ratio is given by

c_IT / c_UT → Po d / Σ_m λ_m{ (Po/nt) R^{-1/2} H H† R^{-1/2} }
            = λ_max{ R^{-1/2} H H† R^{-1/2} } / ( (1/nt) Σ_m λ_m{ R^{-1/2} H H† R^{-1/2} } )
            = λ_max{ H† R^{-1} H } / ( (1/nt) Σ_m λ_m{ H† R^{-1} H } ) ,    (8.51)

by using Equation (8.35) with n+ = 1 and Equation (8.37). Given this low SNR asymptotic result, a few observations can be made. The spectral-efficiency ratio is given by the ratio of the maximum to the average eigenvalue of the whitened channel matrix H† R^{-1} H. If the channel is rank one, such as in the case of a multiple-input single-output (MISO) system, the ratio is approximately equal to nt. Finally, in the special, if physically unlikely, case in which R^{-1/2} H H† R^{-1/2} has a flat (that is, all equal) eigenvalue distribution, the optimal transmit covariance matrix is not unique. Nonetheless, the ratio c_IT/c_UT approaches one. It is worth repeating here that, when a link is embedded within a wireless network, the optimization and potential performance benefits are not the same as for the isolated link discussed above.
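A short numerical check (not from the text) of the low-SNR limit in Equation (8.51) follows, assuming NumPy and R = I; the chosen SNR and channel draw are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
nr = nt = 4
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
lam = np.linalg.eigvalsh(H.conj().T @ H)          # eigenvalues of H^H R^{-1} H, R = I
Po = 1e-3                                         # low SNR regime
c_it = np.log2(1 + Po * lam.max())                # dominant mode only, n+ = 1
c_ut = np.sum(np.log2(1 + (Po / nt) * lam))
print(c_it / c_ut, lam.max() / lam.mean())        # the two quantities nearly agree
```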

8.4 Frequency-selective channels

In environments in which there is frequency-selective fading, the channel matrix H(f) and the interference-plus-noise spatial covariance matrix R(f) are functions of frequency f. Receiver approaches for frequency-selective channels are considered in more detail in Chapter 10; however, for completeness, the topic is discussed briefly here in the context of MIMO capacity. By exploiting the orthogonality of frequency channels, the capacity in frequency-selective fading can be calculated using an extension of Equations (8.35) and (8.37). As a reminder, in each frequency bin, only 1/nf of the total transmit power is employed. Similarly, in each frequency bin only 1/nf of the noise power is received. Consequently, if the power is evenly distributed among frequency bins, the noise-normalized transmit power Po is the same, independent of bin size. For the uninformed transmitter, this even distribution assumption leads to the frequency-selective spectral-efficiency bound,

c_{UT,FS} = ∫ df c_UT(Po; H(f), R(f)) / ∫ df
          ≈ Σ_{n=1}^{nf} Δf log2| I + (Po/nt) R^{-1}(f_n) H(f_n) H†(f_n) | / Σ_{n=1}^{nf} Δf
          ≈ (1/nf) log2| I + (Po/nt) Ř^{-1} Ȟ Ȟ† | ,    (8.52)

where the distance between frequency samples is given by Δf, and the nf-bin frequency-partitioned channel matrix Ȟ is given by the block-diagonal matrix

Ȟ ≡ blockdiag( H(f_1), H(f_2), …, H(f_{nf}) ) ,    (8.53)

and the frequency-partitioned interference-plus-noise spatial covariance matrix is given by

Ř ≡ blockdiag( R(f_1), R(f_2), …, R(f_{nf}) ) .    (8.54)

In order to construct the discrete approximation, it is assumed that any variation in the channel or the interference-plus-noise covariance matrix within a frequency bin is insignificant. For the informed transmitter channel capacity, power is optimally distributed among both spatial modes and frequency channels. The capacity can be expressed as

c_{IT,FS} ≈ max_{P̌} (1/nf) log2| I + Ř^{-1} Ȟ P̌ Ȟ† | ,    (8.55)

which is maximized by Equation (8.35) with the appropriate substitutions for the frequency-selective channel; the diagonal entries in D in Equation (8.33) are selected from the eigenvalues of Ȟ Ȟ†. Because of the block-diagonal structure of Ȟ, the (nt · nf) × (nt · nf) space-frequency noise-normalized transmit covariance matrix P̌ is a block-diagonal matrix, normalized so that in each frequency bin the average noise-normalized transmit power is Po, which can be expressed as tr{P̌}/nf = Po. There are a number of potential issues related to the use of discretely sampled channels. Some of these effects are discussed in greater detail in Section 10.1.
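A minimal sketch of the uninformed-transmitter bound of Equation (8.52), built from the block-diagonal matrices of Equations (8.53)–(8.54), follows; NumPy and SciPy are assumed, and the two-bin example inputs are illustrative only.

```python
import numpy as np
from scipy.linalg import block_diag

def c_ut_freq_selective(H_bins, R_bins, Po):
    """Equation (8.52) via the block-diagonal matrices of (8.53)-(8.54); a sketch."""
    nf = len(H_bins)
    H_check = block_diag(*H_bins)            # frequency-partitioned channel
    R_check = block_diag(*R_bins)            # frequency-partitioned covariance
    nt = H_bins[0].shape[1]
    M = np.eye(H_check.shape[0]) + (Po / nt) * np.linalg.solve(
        R_check, H_check @ H_check.conj().T)
    return np.linalg.slogdet(M)[1] / np.log(2) / nf

H_bins = [np.eye(2), 2.0 * np.eye(2)]        # two frequency bins, toy channels
R_bins = [np.eye(2), np.eye(2)]
print(c_ut_freq_selective(H_bins, R_bins, Po=10.0))
```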

8.5 2 × 2 Line-of-sight channel

While the 2 × 2 line-of-sight link is not a common terrestrial communications problem, it is useful to consider it for instructive purposes [33] because explicit analytic expressions are tractable. Here it is assumed that all antennas are identical and transmit and receive isotropically. If one imagines an environment in which the transmit and receive arrays exist in the absence of any obstructions, then this is a line-of-sight environment. While this phrase is used commonly in an informal way, there can be some confusion as to what is assumed. Here it is also assumed that there are no significant scatterers, so that the knowledge


of the antenna geometry is sufficient to determine the channel. It is also typically assumed that each array is relatively small, so that each array is not able to resolve the antennas of the opposing array. In the following discussion, the assumption of small arrays is explicitly and parametrically broken. If it is assumed that the arrays are small, so that the arrays cannot resolve the antennas of the opposing array, then the channel can be characterized by a rank-1 matrix. This matrix is proportional to the outer product of steering vectors from each array pointing at the other,

H = a v w† ,    (8.56)

where a is the overall complex attenuation, and v and w are the receive and transmit array steering vectors, respectively. To further study the line-of-sight model, we consider an example 2 × 2 channel in the absence of external interference, and in which the transmit and receive arrays grow. To visualize the example, one can imagine a receive array and a transmit array, each with two antennas, so that the antennas are located at the corners of a rectangle, as seen in Figure 8.2. The ratio of the larger to the smaller channel matrix eigenvalue can be changed by varying the shape of the rectangle. When the rectangle is very asymmetric (wide but short), with the arrays being far from each other, the rank-1 channel matrix is recovered. The columns of the channel matrix H can be viewed as the receive-array response vectors, one vector for each transmit antenna,

H = √2 ( a_1 v_1   a_2 v_2 ) ,    (8.57)

where a_1 and a_2 are constants of proportionality (equal to the root-mean-squared transmit-to-receive attenuation for transmit antennas 1 and 2, respectively) that take into account geometric attenuation and antenna gain effects, and v_1 and v_2 are unit-norm array response vectors of the receive array pointing at transmit antenna 1 and transmit antenna 2, respectively. The √2 compensates for the use of the unit-norm array response vectors. For the purpose of this discussion, it is assumed that the overall attenuations are equal, a = a_1 = a_2, which is valid if the rectangle deformation does not significantly affect overall transmitter-to-receiver distances. The capacity of the 2 × 2 MIMO system is a function of the channel singular values and the total transmit power. The eigenvalues of the channel matrix inner-product form

H† H = 2a² ( 1            v_1†v_2
             (v_1†v_2)*   1       )    (8.58)

Figure 8.2 Simple line-of-sight 2 × 2 channel.

are given by

μ_± = 2a² ( 1 + 1 ± √( (1 − 1)² + 4 |v_1†v_2|² ) ) / 2 ,
μ_1 = 2a² ( 1 + |v_1†v_2| ) ,
μ_2 = 2a² ( 1 − |v_1†v_2| ) ,    (8.59)

by using the results from Section 2.3.2. From the above result, the normalized inner product of the array responses pointed at each transmit antenna is the important parameter. To make contact with physical space, it is useful to parameterize the “distance” (the inner product) between array responses by the physical angle between them. To generalize this angle, it is useful to express the angle in units of beamwidths. There are a variety of ways in which beamwidths can be defined. A common definition is the distance in angle between the points at which the beam is down from its peak by 3 dB. Here a somewhat more formal definition is employed. The separation between receive array responses is described in terms of the unitless parameter of generalized beamwidths b introduced in Reference [96], where the distance in generalized beamwidths is defined by

b = (2/π) arccos( |v_1† v_2| ) .    (8.60)

It is assumed here that ‖v_1‖ = ‖v_2‖ = 1. The beamwidth separation indicates the angular difference normalized by the width of the beam. As it is defined, the generalized beamwidth separation varies from 0, corresponding to the same array response, to 1, corresponding to orthogonal array responses. For small angular separations, this definition of beamwidths closely approximates many ad hoc definitions for physical arrays. One of the useful applications of the generalized beamwidth definition is to complicated scattering environments for which the ad hoc or physical definitions might be difficult to interpret.
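The following sketch (not from the text) evaluates the generalized beamwidth of Equation (8.60) and the eigenvalues of Equation (8.59); NumPy is assumed, and the example array responses and inner product are illustrative.

```python
import numpy as np

def generalized_beamwidth(v1, v2):
    """Separation in generalized beamwidths, Equation (8.60); a sketch."""
    v1 = v1 / np.linalg.norm(v1)
    v2 = v2 / np.linalg.norm(v2)
    return (2 / np.pi) * np.arccos(np.clip(abs(v1.conj() @ v2), 0.0, 1.0))

# Eigenvalues of Equation (8.59) for an assumed inner product |v1^H v2| = 0.6.
a, ip = 1.0, 0.6
mu1, mu2 = 2 * a**2 * (1 + ip), 2 * a**2 * (1 - ip)
print(generalized_beamwidth(np.array([1, 1j]), np.array([1, 1])))  # 0.5 beamwidths
print(mu1, mu2)
```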

Figure 8.3 Eigenvalues of HH† for a 2 × 2 line-of-sight channel as a function of array generalized beamwidth separation. © IEEE 2002. Reprinted, with permission, from Reference [33].

The eigenvalues μ_1 and μ_2 are displayed in Figure 8.3 as a function of generalized beamwidth separation. When the transmit and receive arrays are small, indicated by a small separation in beamwidths, one eigenvalue is dominant. As the array apertures become larger, indicated by a larger separation, one array’s individual elements can be resolved by the other array. Consequently, the smaller eigenvalue increases. Conversely, the larger eigenvalue decreases slightly.
Equations (8.33) and (8.35) are employed to determine the capacity for the 2 × 2 system. The water-filling technique first must determine whether both modes in the channel are employed. Both modes are used if the following condition is satisfied,

μ_2 > 2 / ( Po + 1/μ_1 + 1/μ_2 ) ,
Po > 1/μ_2 − 1/μ_1
   = |v_1†v_2| / ( a² ( 1 − |v_1†v_2|² ) ) ,    (8.61)

assuming μ_1 > μ_2, where Po is the total noise-normalized power. If the condition is not satisfied, then only the stronger channel mode is employed, and the capacity, from Equation (8.35), is given by

c_IT = log2( 1 + μ_1 Po )
     = log2( 1 + 2a² [ 1 + |v_1†v_2| ] Po ) ;    (8.62)

Figure 8.4 The informed transmitter capacity of a 2 × 2 line-of-sight channel, assuming generalized beamwidth separations of 0.1 (solid) and 0.9 (dashed). © IEEE 2002. Reprinted, with permission, from Reference [33].

otherwise, both modes are used and the capacity is given by

c_IT = log2 | ( (Po + 1/μ_1 + 1/μ_2) / 2 ) ( μ_1  0 ; 0  μ_2 ) |
     = 2 log2( ( μ_1 μ_2 Po + μ_1 + μ_2 ) / ( 2 √(μ_1 μ_2) ) )
     = 2 log2( a² Po ( 1 − |v_1†v_2|² ) + 1 ) − log2( 1 − |v_1†v_2|² ) .    (8.63)
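A compact numerical sketch of Equations (8.61)–(8.63) follows, assuming NumPy; the function name and the parameter choices (a = 1, Po = 10) are illustrative only.

```python
import numpy as np

def c_it_2x2_los(a, ip, Po):
    """Informed-transmitter capacity of the 2x2 line-of-sight channel,
    Equations (8.61)-(8.63); a sketch with ip = |v1^H v2|."""
    mu1 = 2 * a**2 * (1 + ip)
    mu2 = 2 * a**2 * (1 - ip)
    if mu2 > 0 and Po > 1 / mu2 - 1 / mu1:      # both modes used, Equation (8.61)
        return 2 * np.log2(a**2 * Po * (1 - ip**2) + 1) - np.log2(1 - ip**2)
    return np.log2(1 + mu1 * Po)                # dominant mode only, Equation (8.62)

for b in (0.1, 0.9):                            # generalized beamwidth separations
    ip = np.cos(b * np.pi / 2)                  # invert Equation (8.60)
    print(b, c_it_2x2_los(a=1.0, ip=ip, Po=10.0))
```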

The resulting capacity as a function of a² Po (mean SISO SNR) for two beamwidth separations, 0.1 and 0.9, is displayed in Figure 8.4. At low a² Po, the capacity associated with small beamwidth separation performs best. In this regime, capacity is linear with receive power, and small beamwidth separation increases the coherent gain. At high a² Po, large beamwidth separation produces a higher capacity, as the optimal MIMO system distributes the energy between modes. The above discussion is useful to help develop some intuition, although it is unreasonable to expect communication systems to have access to widely separated antennas in most situations. However, most terrestrial communications are characterized by complicated multipath scattering. In complicated multipath environments, small arrays employ scatterers to create virtual arrays of a much larger effective aperture. The effect of the scatterers upon capacity depends on their number and distribution in the environment. The individual antenna elements can be resolved by the larger effective aperture produced by the scatterers. The larger effective aperture increases the distance between transmit antennas in terms of generalized beamwidths. As was demonstrated in Figure 8.3, the ability to

resolve antenna elements is related to the number of large singular values of the channel matrix and thus the capacity.

8.6 Stochastic channel models

In rich scattering environments, the propagation from each transmitter to each receiver appears to be uncorrelated for arrays that are not oversampled spatially. One can think of the channel matrix as a stochastic variable drawn from some distribution. For many applications, it is reasonable to assume that the channel is approximately static for some period of time, changing relatively slowly. Furthermore, it is often reasonable to assume that the distribution is stationary over even longer intervals of time.
The description of the MIMO channel can be relatively complicated. Each environment may have its own channel probability distribution. Because there is no one correct answer when one is considering channel phenomenology, it is useful to have a variety of approaches from which one can select [110, 301, 30], depending upon the characteristics of the particular environment of interest. Inevitably, a study of the channel phenomenology for each particular environment must be performed [30]; however, there is value in studies of system performance based on simple parametric models.
When characterizing system performance, it is useful to simulate a distribution of channels. As an example, some modulations or space-time codes may have an advantage relative to other codes, depending upon the channel distribution. This is easy to see when comparing modulation approaches in channels that do or do not have frequency selectivity. Assuming simple receivers, some modulations are sensitive to frequency-selective fading. Similarly, there is a rate-versus-diversity trade-off [361] when considering space-time coding approaches. Practical optimization of this coding trade-off is sensitive to the channel correlation.
Here the flat-fading (not frequency-selective) channel is considered. From Equation (8.11), it can be observed that capacity is a function of the eigenvalues of HH†,

c_UT = log2 | I + (Po/nt) H H† |
     = Σ_m log2( 1 + (Po/nt) λ_m{H H†} ) ,    (8.64)

under the simplification that the interference-plus-noise covariance matrix becomes the identity matrix, R = I. The total noise-normalized transmit power is given by Po, and λ_m{·} indicates the mth eigenvalue. The channel can be decomposed into the product of two unitary matrices U and V and a diagonal matrix D by using the singular value decomposition,

H = U D V† .    (8.65)


As discussed in Section 8.3, the capacity is insensitive to the structure in U and V, and is dependent upon the singular values contained in D,

λ_m{H H†} = λ_m{U D V† V D† U†}
          = λ_m{D D†}
          = |{D}_{m,m}|² .    (8.66)

By focusing on the capacity, it can be seen that the important characteristics of the channel are the overall attenuation and the shape of the singular-value distribution. It is often convenient to separate these two characteristics. To help disentangle them, a normalized channel F is defined here such that

H = a F ,    (8.67)

where the real parameter a is the average attenuation defined by

a² = ⟨ ‖H‖²_F ⟩ / (nt nr) ,    (8.68)

and

⟨ ‖F‖²_F ⟩ = nt nr .    (8.69)

By using this normalization, the average SNR per receive antenna is given by

SNR = a² Po .    (8.70)

Because of the extra and unnecessary freedom in amplitude among the parameters of average attenuation a², noise-normalized total transmit power Po, and channel matrix H, it is sometimes assumed that H = F, such that ⟨ ‖H‖²_F ⟩ = nt nr, depending upon the situation.

8.6.1 Spatially uncorrelated Gaussian channel model

At the other end of the spectrum of channel models from the line-of-sight model is the uncorrelated Gaussian channel model. If the signal associated with a particular transmit antenna seen at a receiver is the result of the superposition of a large number of independent random scatterers, then the central limit theorem suggests that the channel matrix elements can be drawn from a Gaussian distribution. This model is the most commonly assumed one in the literature discussing MIMO communication. The channel matrices are drawn independently from a circular complex Gaussian distribution, such that

H ∼ p(H) = ( 1 / ( π^{nt nr} a^{2 nt nr} ) ) e^{ −tr{ a^{-2} H H† } } .    (8.71)

This model is based upon the notion that the environment is full of scatterers. The signal seen at each receive antenna is the sum of a random set of wavefronts bouncing off the scatterers. For a SIMO system, under the assumptions of a nondispersive array response (bandwidths small compared with the ratio of the speed of light divided by the array size) and of scatterers in the far field of the array, the channel vector h_m ∈ C^{nr×1} (the mth column of H) is given by

h_m = Σ_n a_{m,n} v(k_{m,n})
    ∼ g ,    (8.72)

where the vector g is drawn from the limiting (that is, large number of scatterers) distribution, v(k_{m,n}) ∈ C^{nr×1} is the array response for a single wavefront associated with the direction of the wavevector k_{m,n}, and a_{m,n} is a random complex scalar. The values of a_{m,n} are determined by the propagation from the transmitter impinging on the array from direction k_{m,n}. For physically reasonable distributions of a_{m,n} and k_{m,n}, in complicated multipath environments, the central limit theorem [241] drives the probability distributions for the entries in h_m to independent complex circular Gaussian distributions. Consequently, by employing the assumption that all transmit–receive pairs are uncorrelated, the entries in H are drawn independently from a complex circular Gaussian distribution. The random matrix with elements drawn from a complex circular Gaussian distribution with unit variance is often indicated by G, such that

H = a G ,    (8.73)

where a² is the average SISO attenuation in power. The expected Frobenius norm squared of G is given by

⟨ ‖G‖²_F ⟩ = Σ_{m,n} ⟨ |{G}_{m,n}|² ⟩ = nr nt .    (8.74)
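A minimal sketch (not from the text) for drawing channels from this model and checking the normalization of Equation (8.74) follows; NumPy is assumed, and the function name is illustrative.

```python
import numpy as np

def draw_gaussian_channel(nr, nt, a=1.0, rng=np.random.default_rng()):
    """Draw H = aG with unit-variance complex circular Gaussian entries (8.73)."""
    G = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    return a * G

norms = [np.linalg.norm(draw_gaussian_channel(4, 4), 'fro')**2 for _ in range(2000)]
print(np.mean(norms))   # approaches nr * nt = 16, consistent with Equation (8.74)
```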

8.6.2 Spatially correlated Gaussian channel model

While the independent complex circular Gaussian model is an important stochastic model for describing distributions of channel matrices, it can miss some important characteristics sometimes present in real channels. In particular, because of the details of the environment, there can be spatial correlations in the directions to scatterers. The most general Gaussian model is constructed by using a coloring matrix M such that

vec{H} ∝ M vec{G′} ,    (8.75)

where the random matrix G′ ∈ C^{nr×nt} has entries drawn independently from a complex circular Gaussian distribution, and all the cross correlations are contained in M ∈ C^{nr nt × nr nt}. By employing a slightly more constrained physical model, a simplified formalism is constructed. Imagine that the environment is full of scatterers in the far field of the transmitter and receiver; however, the scattering field seen by either the transmit or the receive array subtends a limited field of view from the other array. As a consequence, the random channel has spatial correlation. For the described situation, the model for the channel [110, 30],

H ∝ M_r G′ M_t† ,    (8.76)

can be employed, so that

vec{H} ∝ ( M_t* ⊗ M_r ) vec{G′} .    (8.77)

Consequently, this model is sometimes denoted the Kronecker channel. The matrices M_r and M_t introduce spatial correlation associated with the receiver and transmitter, respectively. The spatially coloring matrices M_r and M_t can be decomposed by using a singular-value decomposition such that

M_r = U_r D_r V_r†    (8.78)

and

M_t = U_t D_t V_t† .    (8.79)

The spatially correlated channel can then be represented by

H = a U_r D_r V_r† G′ V_t D_t U_t†
  = a U_r D_r G D_t U_t† ,    (8.80)

where G and G′ are matrices with elements drawn independently from a complex circular unit-variance Gaussian distribution. The matrices G and G′ are related by a unitary transformation. The two matrices are statistically equivalent because a unitary transformation of a complex circular unit-variance Gaussian matrix with independent elements produces another complex circular unit-variance Gaussian matrix with independent elements.

Reduced-rank channels
When one is simulating channels, random unitary and Gaussian matrices can be generated for a given average attenuation a and diagonal matrices D_r and D_t. There is significant literature on selecting values for the average SISO attenuation a [140, 260, 188]. However, it is less clear how to determine values for D_r and D_t. One model is to assume that the diagonal values are given by some specified number of equal-valued elements and are zero otherwise, of the form

D_r = ( I_{m_r}  0 ; 0  0 ) ,    (8.81)

where m_r sets the rank of D_r. The form of D_t is given by replacing m_r with m_t. In the channel model given in Equation (8.80), the unitary matrices U_r and U_t are full rank by construction. The Gaussian matrix G can, in principle, have any rank; however, the size of the set of reduced-rank Gaussian matrices

is vanishingly small compared to the size of the set of full-rank matrices (that is, the matrix G is full rank with probability one). Thus, the set of reduced-rank Gaussian matrices forms a set of zero measure and can be ignored for any practical discussion. Because the rank of a matrix produced by the product of matrices can be no more than the smallest rank of the constituent matrices, this form would produce a channel matrix with a rank limited by the smaller of m_r and m_t. For the rank to be reduced further, the unitary matrices U_G and V_G in the singular value decomposition of the Gaussian matrix G = U_G D_G V_G† would have to transform (which is a rotation in some sense) the subspaces of D_r and D_t such that there is no overlap in some dimension. Given the random nature of the matrix G, this is extremely unlikely. Consequently, from any practical point of view, the rank of the channel is given by min(m_r, m_t). Under this model, the expected Frobenius norm squared of the channel matrix is given by

⟨ ‖H‖²_F ⟩ = a² ⟨ tr{ U_r D_r G D_t U_t† U_t D_t† G† D_r† U_r† } ⟩
           = a² ⟨ tr{ D_r G D_t (D_r G D_t)† } ⟩
           = a² Σ_{j_r=1}^{m_r} Σ_{j_t=1}^{m_t} ⟨ |{G}_{j_r,j_t}|² ⟩ = a² m_r m_t ,    (8.82)

where D_r and D_t are Hermitian and idempotent (D_r D_r = D_r) in this particular situation.
The notion of a reduced-rank channel model has a mixed set of implications. From a phenomenological point of view, it is unlikely for any environment to be so free of scatterers that the channel is actually reduced in rank. However, it is possible for the smaller (in magnitude) singular values to be small enough that they are of no value from a communication point of view under the assumption of finite SNR. For example, imagine a case in which equal power is transmitted in the strongest and the weakest singular values. If the magnitude squared of the smallest singular value is 40 dB smaller than that of the largest, then the power coupled into this smallest mode contributes only 0.01% of the total received power, and has little effect on the total receiver power. A water-filling solution would avoid using this small mode for any reasonable transmit power.

Exponential shaping model
An alternative approach for the values contained in D_r and D_t is to assume a shaped distribution of singular values. An approach that matches measured channel distributions with reasonable fidelity [30] is to assume an exponential shaping. The spatial correlation matrices can be factored so that the receive shaping matrix D_r = Δ_{α_r} and the transmit shaping matrix D_t = Δ_{α_t} are


positive-semidefinite diagonal matrices. The channel is then given by

H = a U_r Δ_{α_r} G Δ_{α_t} U_t† ,    (8.83)

Δ_α = √n · diag{ α⁰, α¹, …, α^{n−1} } / √( tr( diag{ α⁰, α¹, …, α^{n−1} }² ) ) ,    (8.84)

where the shaping parameter α and the number of antennas n can be either α_r and n_r or α_t and n_t, respectively. For many environments of interest, the environments at the transmitter and receiver are similar; here it is assumed that the numbers of transmit and receive antennas are equal and have similar spatial correlation characteristics. In this regime, the diagonal shaping matrices can be set equal, Δ_α = Δ_{α_r} = Δ_{α_t}, producing the new random channel matrix H. The form of the shaping matrix Δ_α given here is arbitrary, but it has the satisfying characteristics that in the limit α → 0, only one singular value remains large, and in the limit α → 1, a spatially uncorrelated Gaussian matrix is produced. Furthermore, empirically this model provides good fits to experimental distributions [30]. The normalization for Δ_α is chosen so that the expected value of ‖H‖²_F is a² nt nr.
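A sketch (not from the text) for drawing channels with the exponential shaping of Equations (8.83)–(8.84) follows, assuming NumPy, equal array sizes, and equal shaping parameters; the random unitary factors are generated by QR decomposition, which is one reasonable choice among several.

```python
import numpy as np

def shaped_channel(n, alpha, a=1.0, rng=np.random.default_rng()):
    """Exponentially shaped Kronecker channel, Equations (8.83)-(8.84); a sketch,
    assuming nt = nr = n and alpha_r = alpha_t = alpha."""
    d = alpha ** np.arange(n)                              # alpha^0, ..., alpha^(n-1)
    Delta = np.diag(np.sqrt(n) * d / np.linalg.norm(d))    # so that tr(Delta^2) = n
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Ur, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    Ut, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return a * Ur @ Delta @ G @ Delta @ Ut.conj().T

print(np.linalg.svd(shaped_channel(4, 0.3), compute_uv=False))  # shaped spectrum
```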

Rician MIMO channel
The correlating matrix approach is only one of a variety of possible modeling approaches. Another approach is to assume that there is a line-of-sight contribution in addition to the contribution from a rich scattering field. This approach is the spatial extension of the Rician channel [314]. If the transmit and receive arrays are small and far from each other, then the line-of-sight (specular) component of the channel can be characterized by the rank-1 contribution v w†, where v and w are deterministic, given by the array response to a plane wave. The stochastic contribution from the rich scattering field is given by G, which has entries drawn independently from a circular complex Gaussian distribution. The channel under this model is given by

H = a ( √( K/(K+1) ) √(nt nr) ( v w† / ( ‖v‖ ‖w‖ ) ) + √( 1/(K+1) ) G ) ,    (8.85)

where K is the K-factor, which sets the relative contributions of the specular and random components. The normalizations of the specular contribution and the Gaussian component are defined so that the expected square of the Frobenius norm of the channel is equal to a² nt nr. The expectation of the


norm squared is given by

⟨ ‖H‖²_F ⟩ = a² ⟨ tr{ ( √(K/(K+1)) √(nt nr) (v w†)/(‖v‖ ‖w‖) + √(1/(K+1)) G ) · ( √(K/(K+1)) √(nt nr) (v w†)/(‖v‖ ‖w‖) + √(1/(K+1)) G )† } ⟩
           = a² ( (K/(K+1)) nt nr + (1/(K+1)) ⟨ tr{G G†} ⟩ )
           = a² nt nr ,    (8.86)

where the observation that G has zero mean is used to remove the cross terms.
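A compact sketch (not from the text) for drawing channels from the Rician model of Equation (8.85) follows; NumPy is assumed, and the function name is illustrative.

```python
import numpy as np

def rician_channel(v, w, K, a=1.0, rng=np.random.default_rng()):
    """Draw the Rician MIMO channel of Equation (8.85); a sketch."""
    nr, nt = len(v), len(w)
    spec = np.sqrt(nt * nr) * np.outer(v, w.conj()) / (np.linalg.norm(v) * np.linalg.norm(w))
    G = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    return a * (np.sqrt(K / (K + 1)) * spec + np.sqrt(1 / (K + 1)) * G)

H = rician_channel(np.ones(4, complex), np.ones(4, complex), K=5.0)
print(np.linalg.norm(H, 'fro')**2)   # near nt * nr = 16 on average
```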

8.7 Large channel matrix capacity

As has been discussed previously, a common channel modeling approach is to construct a matrix G by independently drawing matrix elements from a unit-variance complex Gaussian distribution, mimicking independent Rayleigh fading,

H = a G .    (8.87)

This matrix is characterized by a relatively flat distribution of singular values and is an appropriate model for very rich multiple scattering environments. In the limit of a large channel matrix, the eigenvalue probability density function for a Wishart matrix with the form (1/nt )GG† asymptotically approaches the Marcenko–Pastur distribution [206, 315], as is discussed in Section 3.6. Of course, implemented systems will have a finite number of antenna elements; however, because the shape of the typical eigenvalue distributions quickly converges to that of the asymptotic distribution, insight can be gained by considering the infinite dimensional case.

8.7.1 Large-dimension Gaussian probability density

The probability that a randomly chosen eigenvalue of the nr × nr Wishart matrix (1/nt)GG† is less than μ is denoted P_κ(μ). Here G is an nr × nt matrix, and the ratio of nr to nt is given by κ = nr/nt. As discussed in Section 3.6, in the limit of nr → ∞, the probability measure associated with the distribution P_κ(μ) is given by

p_κ(μ) + c_κ δ(μ) ,    (8.88)

where the constant associated with the “delta function” or atom at 0 is given by

c_κ = max( 0, 1 − 1/κ ) .    (8.89)

Figure 8.5 Eigenvalue probability density function for the complex Gaussian channel ((1/nt)GG†), assuming an equal number of transmitters and receivers (κ = 1) in the infinite dimension limit. © IEEE 2002. Reprinted, with permission, from Reference [33].

The first term of the probability measure, p_κ(μ), is given by

p_κ(μ) = √( (μ − a_κ)(b_κ − μ) ) / ( 2π μ κ ) ,  for a_κ ≤ μ ≤ b_κ ,
p_κ(μ) = 0 ,  otherwise,    (8.90)

where

a_κ = ( √κ − 1 )²
b_κ = ( √κ + 1 )² .    (8.91)

The eigenvalue probability density function for this matrix expressed using a decibel scale is displayed in Figure 8.5. By using the probability density function, the large matrix eigenvalue spectrum can be constructed and is depicted in Figure 8.6.
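A sketch (not from the text) of the density of Equations (8.90)–(8.91), compared against an empirical eigenvalue draw, follows; NumPy is assumed, and the matrix size is illustrative.

```python
import numpy as np

def mp_density(mu, kappa):
    """Continuous term of Equations (8.90)-(8.91); a numerical sketch."""
    a = (np.sqrt(kappa) - 1) ** 2
    b = (np.sqrt(kappa) + 1) ** 2
    p = np.zeros_like(mu)
    inside = (mu > 0) & (mu >= a) & (mu <= b)
    p[inside] = np.sqrt((mu[inside] - a) * (b - mu[inside])) / (2 * np.pi * mu[inside] * kappa)
    return p

rng = np.random.default_rng(3)
n = 400
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
lam = np.linalg.eigvalsh(G @ G.conj().T / n)
print(lam.min(), lam.max())   # support approaches [0, 4] for kappa = 1 as n grows
```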

8.7.2 Uninformed transmitter spectral efficiency bound

In the large matrix limit, the uninformed transmitter spectral efficiency bound, defined in Equation (8.37) and discussed in References [259, 34, 33], can be expressed in terms of a continuous eigenvalue distribution,

c_UT = log2 | I_{nr} + (Po/nt) H H† |
     = log2 | I_{nr} + a² Po (1/nt) G G† |
     = Σ_m log2( 1 + a² Po λ_m{ (1/nt) G G† } )
     ≈ nr ∫_0^∞ dμ p_κ(μ) log2( 1 + μ a² Po ) ,    (8.92)


Figure 8.6 Peak-normalized eigenvalue spectrum for the complex Gaussian channel ((1/nt)GG†), assuming an equal number of transmitters and receivers (κ = 1) in the infinite dimension limit. © IEEE 2002. Reprinted, with permission, from Reference [33].

where λ_m{·} indicates the mth eigenvalue, and the continuous form is asymptotically exact. This integral is discussed in Reference [259].² The normalized asymptotic capacity as a function of a² Po and κ, c_UT/nr ≈ Φ(a² Po; κ), is given by

Φ(x; κ) = ν log2( x (w_+ − ν) / ν ) − log2( 1 − w_− ) − w_− / log(2) ,
w_± = (1/2)( 1 + ν + ν/x ) ± (1/2) √( ( 1 + ν + ν/x )² − 4ν ) ,   ν = 1/κ .    (8.93)

In the special case of M = nt = nr, the capacity is given by

c_UT / M ≈ ( a² Po / log(2) ) ₃F₂( [1, 1, 3/2], [2, 3], −4 a² Po )    (8.94)
         = ( √(4a²Po + 1) + 4a²Po log( √(4a²Po + 1) + 1 ) − 2a²Po ( 1 + log(4) ) − 1 ) / ( a² Po log(4) ) ,    (8.95)

where pFq is the generalized hypergeometric function discussed in Section 2.14.2.

² Equation (8.93) is expressed in terms of bits rather than in nats as it is in Reference [259].
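A direct numerical evaluation (not from the text) of the integral in Equation (8.92) follows, assuming NumPy and SciPy; it provides an independent check of the closed-form special case of Equation (8.95).

```python
import numpy as np
from scipy.integrate import quad

def phi_ut(x, kappa):
    """Asymptotic c_UT/nr of Equation (8.92) by numerical quadrature over the
    density of Equation (8.90); a sketch."""
    a = (np.sqrt(kappa) - 1) ** 2
    b = (np.sqrt(kappa) + 1) ** 2
    f = lambda mu: (np.sqrt((mu - a) * (b - mu)) / (2 * np.pi * mu * kappa)
                    * np.log2(1 + mu * x))
    return quad(f, a + 1e-12, b)[0]

print(phi_ut(1.0, 1.0))   # about 0.837 b/s/Hz per antenna at a^2 Po = 1, kappa = 1
```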

8.7.3 Informed transmitter capacity

Similarly, in the large matrix limit, the informed transmitter capacity, defined in Equation (8.35), can be expressed in terms of a continuous eigenvalue distribution [34, 33]. To make the connection with the continuous eigenvalue probability


density defined in Equation (8.88), D†D from Equation (8.35) is replaced with D†D = a² nt Λ, where the diagonal entries of Λ contain the selected eigenvalues of (1/nt) G G†,

c_IT = log2 | ( ( Po + (1/(a² nt)) tr{Λ^{-1}} ) / n_+ ) a² nt Λ |
     ≈ g nr log2( ( a² nt Po + nr ∫_{μ_cut}^∞ dμ′ p_κ(μ′) (1/μ′) ) / ( g nr ) )
       + nr ∫_{μ_cut}^∞ dμ p_κ(μ) log2(μ) ,    (8.96)

where g is the fraction of channel modes used by the transmitter,

g = n_+ / nr ≈ ∫_{μ_cut}^∞ dμ p_κ(μ) ,    (8.97)

and μ_cut is the minimum eigenvalue used by the transmitter, given by the solution of the continuous version of Equation (8.33),

d_m = nt a² μ > n_+ / ( Po + nr (1/(a² nt)) ∫_{μ_cut}^∞ dμ p_κ(μ) (1/μ) ) ,

μ_cut = κ ∫_{μ_cut}^∞ dμ′ p_κ(μ′) / ( a² Po + κ ∫_{μ_cut}^∞ dμ p_κ(μ) (1/μ) ) .    (8.98)

The approximations are asymptotically exact in the limit of large nr. For a finite transmit power, the capacity continues to increase as the number of antennas increases. Each additional antenna increases the effective area of the receive system. Eventually, this model breaks down as the number of antennas becomes so large that any additional antenna is electromagnetically shielded by existing antennas. However, finite random channel matrices quickly approach the shape of the infinite model. Consequently, it is useful to consider the antenna-number-normalized capacity c_IT/nr. The normalized capacity is given by

c_IT / nr ≈ g log2( ( a² Po + κ ∫_{μ_cut}^∞ dμ′ p_κ(μ′) (1/μ′) ) / ( κ g ) )
          + ∫_{μ_cut}^∞ dμ p_κ(μ) log2(μ) .    (8.99)

By using the asymptotic eigenvalue probability density function given in Equation (8.90), the integrals in Equations (8.98) and (8.99) can be evaluated. The relatively concise results for κ = 1 are displayed here:

∫_{μ_cut}^∞ dμ p_{κ=1}(μ) = 1 − ( √( μ_cut (4 − μ_cut) ) + 4 arcsin( √μ_cut / 2 ) ) / (2π) ,    (8.100)

Figure 8.7 Asymptotic large-dimension Gaussian channel antenna-number-normalized spectral efficiency bounds, c_IT/M (solid) and c_UT/M (dashed) (b/s/Hz/M), as a function of attenuated noise-normalized power (a² Po), assuming an equal number of transmitters and receivers (κ = 1, M = nt = nr). © IEEE 2002. Reprinted, with permission, from Reference [33].

and

∫_{μ_cut}^∞ dμ p_{κ=1}(μ) (1/μ) = −1/2 + (1/π) √( (4 − μ_cut)/μ_cut ) + (1/π) arcsin( √μ_cut / 2 ) .    (8.101)

8.9 SNR distributions

8.8

275

Outage capacity The term outage capacity is poorly named because it is not really a capacity. However, it is a useful concept for comparing various practical systems. In particular, it is useful for comparing various space-time codes and receivers. Given the assumption of a stochastic model of the channel drawn from a stationary distribution defined by a few parameters, the outage capacity is defined to be the rate achieved at a given SNR with some probability [253]. Because the capacity is dependent upon the given channel matrix, under the assumption of stochastic channel, the capacity becomes a stochastic variable. If the capacity for some attenuated total transmit power a2 Po (the average SISO SNR), channel matrix H, and interference-plus-noise spatial covariance matrix R is given by c(a2 Po , H, R) for some random distribution of channels H, then the P (c ≥ η) is the probability that the capacity for a given channel draw is greater than or equal to a given spectral efficiency η. When the capacity for a given channel draw is greater than or equal to the desired rate, then the link can close, theoretically. Explicitly, the probability that the link can close, Pclose , is given by 4 3 Pr c(a2 Po , H, R) ≥ η = Pclose .

(8.103)

If the link does not close, then it is said to be in outage. Implicit in this assumption is the assumption that the capacity is being evaluated for a single carrier link in a flat-fading environment. When the link has access to alternative types of diversity, the discussion becomes more complicated. Typically, the probability of closing is fixed and the SNR is varied to achieve this probability. As an example, the outage capacities under the assumptions of 90% and 99% probabilities of closure for an uncorrelated Gaussian 4 × 4 MIMO channel link in the absence of interference are displayed as a function of a2 Po (average SNR per receive antenna) in Figure 8.8. The curves in this figure are constructed by empirically evaluating the cumulative distribution function of spectral efficiency for each SNR value. The values for 90% and 99% probability of closing the link are extracted from the distributions.

8.9

SNR distributions In Section 8.3.4, the channel capacity in the limit of high and low SNR was considered. Here the discussion of approximations to the uninformed transmitter capacity in the limit of low SNR is extended. While capacity is the most fundamental metric of performance for a wireless communication system, it is often useful to consider the distribution of SNR, particularly at low SNR. At lower SNR, capacity is proportional to SNR. In addition, for practical systems, SNR is often much easer to measure directly.

MIMO channel

8 Spectral Efficiency (b/s/Hz)

276

90% prob close 99% prob close 6 4 2

0 −10

−5

2 0 a P o (dB)

5

10

Figure 8.8 Outage capacity for a 4 × 4 MIMO link under the assumption of an uncorrelated Gaussian channel as a function of SNR per receive antenna (a2 Po ). Link closure probabilities of 90% and 99% are displayed.

The uninformed transmitter spectral efficiency bound in the presence of interference is given by     Po −1 cU T = log2 In r + R H H†  . (8.104) nt

The form of the spectral efficiency bound can be simplified by considering low SNRs and strong interference limits. In the low SNR regime, the determinant can be approximated using the trace of the whitened received signal spatial covariance matrix by using the approximation M ≈ log(I + M) for small M,     Po −1 †  cU T = log2 In r + R HH  nt   Po −1 1 † tr log In r + R HH = log 2 nt  Po −1 1 † tr . (8.105) R HH ≈ log 2 nt

By defining R ≡ I + Pin t M M† and using Woodbury’s formula from Section 2.5, the inverse of the interference-plus-noise covariance matrix is given by  −1 † M , (8.106) (I + Pin t MM† )−1 = I − Pin t M I + Pin t M† M

which in the strong interference regime becomes the projection operator (I + Pin t MM† )−1 ≈ P⊥ M

† −1 M† , P⊥ M ≡ I − M(M M)

(8.107)

projecting onto a basis orthogonal to the space spanned by M. Consequently, the low SNR capacity in the presence of strong interference is given by

8.9 SNR distributions

 Po ⊥ 1 tr PM HH† log 2 nt $2 1 Po  $ $P⊥ $ , = M hn log 2 nt n

277

cU T ≈

(8.108)

where h_n is the nth column of the channel matrix H, and we have made use of the idempotent property of projection matrices. In this limit, the spectral efficiency bound can be expressed as the sum of beamformer outputs, each having an array-signal-to-noise ratio (ASNR), which is the SNR at the output of a receive beamformer,

c_UT ≈ (1/log 2) (Po/nt) Σ_n h_n† P⊥_M h_n
     ≡ (1/log 2) (Po/nt) Σ_{n=1}^{nt} | w_n† h_n |² ,   w_n = P⊥_M h_n / ‖ P⊥_M h_n ‖ ,
     = (1/log 2) Σ_{m=1}^{nt} ASNR_m
     ≡ (1/log 2) ζ ,    (8.109)

where ζ is the sum of ASNRs optimized for each transmit antenna.

8.9.1 Total power

As an example, the uncorrelated Gaussian channel model is assumed. The channel matrix is proportional to a matrix G ∈ C^{nr×nt}, where the entries are independently drawn from a unit-norm complex circular Gaussian distribution, such that

H = a G .    (8.110)

By using the notation that g_n is the nth column of G, the low SNR spectral efficiency bound (that is, when a² Po is small) in the presence of strong interference is given by

c_UT ≈ (1/log 2) (a² Po/nt) Σ_n ‖ P⊥_M g_n ‖²
     = (1/log 2) (a² Po/nt) Σ_n g_n† U U† P⊥_M U U† g_n
     = (1/log 2) (a² Po/nt) Σ_n (g_n′)† J_K g_n′
     = (1/log 2) (a² Po/nt) Σ_{m=1}^{K·nt} |g_m|² ,    (8.111)


where U is a unitary matrix that diagonalizes the projection matrix such that the first K diagonal elements are one and all the other matrix elements are zero, represented by J_K, assuming K is the rank of the projection matrix,

K = nr − ni ,    (8.112)

where ni is the number of interferers, for nr > ni. While the particular values change, the statistics of the Gaussian vector g are not changed by an arbitrary unitary transformation U g, so the statistics of each element of g_n and g_n′ are the same. Here g_m is used to indicate a set of random scalar variables sampled from a unit-norm complex circular Gaussian distribution. As a consequence, the statistical distribution of the approximate spectral efficiency bound is represented by a complex χ²-distribution. The array-signal-to-noise ratio, denoted ASNR_m, is the SNR at the output of the beamformer associated with the mth transmitter, assuming that the strong interferer is spatially mitigated. In the low SNR regime, the presence of the other transmitters does not affect the optimal beamformer output SNR. The probability density function of the complex χ²-distribution from Section 3.1.11 is given by

p^C_{χ²}(x; N) = ( x^{N−1} / Γ(N) ) e^{−x} .    (8.113)

Thus, the probability density function for the low SNR sum of ASNRs ζ can be expressed as

p(ζ) ≈ ( nt / (a² Po) ) p^C_{χ²}( nt ζ / (a² Po) ; (nr − ni) · nt ) .    (8.114)

Similarly, the cumulative distribution function (CDF) for the complex χ²-distribution is given by

P^C_{χ²}(x₀; N) = ∫_0^{x₀} dx p^C_{χ²}(x; N)
                = 1 − Γ(N, x₀) / Γ(N) ,    (8.115)

where Γ(N, x₀) is the upper incomplete gamma function. Consequently, the CDF for the sum of the ASNRs ζ is given by

P(ζ) = P^C_{χ²}( nt ζ / (a² Po) ; K · nt )
     = 1 − Γ( [nr − ni] · nt , nt ζ / (a² Po) ) / Γ( [nr − ni] · nt ) .    (8.116)

As an example, the CDFs for 2 × 2, 3 × 3, and 4 × 4 MIMO systems, with and without a single strong interferer, are compared in Figure 8.9. The horizontal axis is normalized such that the average SISO total receive power, normalized by a² Po, is 0 dB. In this flat, block-fading environment, the SISO system (not shown)

Figure 8.9 CDFs for total receive power at the output of beamformers, which is the sum of ASNRs (SASNR or ζ), for 2 × 2, 3 × 3, and 4 × 4 MIMO systems, with (solid) and without (dashed) a single strong interferer.

would perform badly. In the low SNR regime, the information-theoretic spectral efficiency bound is proportional to the sum of ASNRs ζ. Thus, the probability of outage is given by the complementary CDF (that is, 1 − CDF). For example, the 99% reliability (or outage capacity) is associated with the sum of ASNRs ζ at the probability of 0.01. Because of the spatial diversity and because of the receive-array gain, performance improves as the number of antennas increases. In the presence of a strong interferer, the 4 × 4 MIMO system receives more than 13 dB more power than the 2 × 2 system if 99% reliability is required. At this reliability, the 3 × 3 MIMO system has only lost 3 dB compared to the average SISO channel. At this reliability, a SISO system would suffer significant losses compared to the MIMO systems and would have infinite loss in the presence of a strong interferer.
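The CDF of Equation (8.116) can be evaluated directly, as in this sketch (not from the text), assuming NumPy and SciPy; the function name and example arguments are illustrative.

```python
import numpy as np
from scipy.special import gammainc

def sasnr_cdf(zeta, nr, nt, ni, a2Po):
    """CDF of the summed ASNRs, Equation (8.116); a numerical sketch.
    gammainc is the regularized lower incomplete gamma function, so this
    equals 1 - Gamma(N, x)/Gamma(N)."""
    N = (nr - ni) * nt
    return gammainc(N, nt * zeta / a2Po)

print(sasnr_cdf(zeta=1.0, nr=4, nt=4, ni=1, a2Po=1.0))
```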

8.9.2 Fractional loss

The distribution of the fractional loss caused by spatially mitigating strong interference can be found in a manner similar to that in Section 8.9.1. Using Equation (8.108), the ratio of the low SNR capacity with and without the presence of the strong interference is given by

η ≡ c_UT^{int} / c_UT ≈ Σ_{m=1}^{K·nt} |g_m|² / ( Σ_{n=1}^{K·nt} |g_n|² + Σ_{n=K·nt+1}^{nr·nt} |g_n|² ) ,    (8.117)

which is described by the beta distribution, discussed in Section 3.1.17, where the difference between the number of receivers and strong interferers is indicated by K = nr − ni. Here, the channel coefficients g_m ∼ CN(0, 1) are drawn from a circularly symmetric complex Gaussian distribution. The probability density of the beta distribution, which describes the statistical

Figure 8.10 CDFs for fractional power loss caused by spatial mitigation of a single strong interferer for 2 × 2, 3 × 3, and 4 × 4 MIMO systems.

distribution of the ratio

Σ_{m=1}^{j} |g_m|² / ( Σ_{m=1}^{j} |g_m|² + Σ_{m=1}^{k} |g_m′|² ) ,    (8.118)

where g_m′ is a Gaussian random variable with the same statistics as g_m, is given by

p_β(x; j, k) = ( Γ(j + k) / ( Γ(j) Γ(k) ) ) x^{j−1} (1 − x)^{k−1} ,    (8.119)

and the corresponding CDF is given by

P_β(x₀; j, k) = ∫_0^{x₀} dx p_β(x; j, k)
              = ( Γ(j + k) / ( Γ(j) Γ(k) ) ) B(x₀; j, k) ,    (8.120)

where B(x; j, k) is the incomplete beta function. Consequently, the CDF P(η) of the fractional loss η due to mitigating an interferer is given by

P(η) ≈ P_β( η ; K · nt , [nr − K] nt )
     ≈ P_β( η ; [nr − ni] · nt , ni · nt ) .    (8.121)

As an example, the comparison of the total ASNR loss CDFs for 2 × 2, 3 × 3, and 4 × 4 MIMO systems with a single strong interferer is shown in Figure 8.10. At a 99% reliability (or outage capacity), the loss in the sum of ASNRs ζ is no worse than −3.3 dB for a 4 × 4 MIMO system, but is worse than −12 dB for a 2 × 2 system. The advantage of multiple transmitters is illustrated in Figure 8.11. Given four receive antennas and the same total transmit power, there is a significant difference in the performance of a 1 × 4 system versus a 4 × 4 system. At the 99% reliability level, the losses in the sum of ASNRs ζ are −6.7 dB versus −3.3 dB,

Figure 8.11 CDFs for fractional power loss caused by spatial mitigation of a single strong interferer for 1, 2, or 4 transmit antennas, assuming 4 receive antennas.

respectively. At a 99.9% reliability, the difference between the sum of ASNRs ζ losses is nearly 6 dB. The difference in performance is caused by the ability of the multiple transmitters to distribute information among multiple spatial modes, ensuring that the spatial mitigation of the strong interferer does not accidentally remove signal-of-interest power.

8.10 Channel estimation

Although joint channel estimation and decoding is possible, for most decoding approaches and for any informed transmitter approach, an estimate of the channel is required. While, given some model for the environment, training or channel-probing sequences [139] can be designed to improve performance [23], here it is assumed that the sequences associated with each transmitter are independent. The flat-fading MIMO model, assuming nt transmit antennas, nr receive antennas, and ns complex baseband samples, is given by

Z = H S + N
  = H √(Po/nt) X + N
  = A X + N ,    (8.122)

as defined for Equation (8.3), where the amplitude-channel product is given by

A = √(Po/nt) H ,    (8.123)


and the normalized reference signal (also known as a training or pilot sequence) X ∈ C^{nt×ns} is given by

X = S / √(Po/nt)    (8.124)

and is normalized so that

⟨ X X† ⟩ = ns I .    (8.125)

The “thermal” noise at each receive antenna can be characterized by its variance. Here it is assumed that the units of power are defined so that the “thermal” noise variance is one. It is also assumed that the channel is temporally static. It is worth noting that, in the literature, the normalization of the channel is not always clear. Because the transmit power can be absorbed within the amplitude-channel product matrix estimate A or within the transmit signal S, its definition is ambiguous. Within the context of the discussion of theoretical capacity, it is often convenient to express the power explicitly, so that the transmitted signal contains the square root of the power; however, in channel estimation, it is often assumed that the reference signal has some arbitrary normalization, and the transmit power is subsumed into the channel estimate. While this ambiguity can cause confusion, it is typically reasonably clear from context. Nonetheless, within the context of this section, we will endeavor to be slightly more precise to avoid any confusion. Under the assumption of Gaussian noise and external interference, the probability of the ns received samples, given the received-signal model, is

p(Z|X; A, R) = e^{ −tr{ (Z − A X)† R^{-1} (Z − A X) } } / ( π^{nr ns} |R|^{ns} ) ,    (8.126)

where the interference-plus-noise spatial covariance matrix is given by

R = ⟨ N N† ⟩ / ns .    (8.127)

Maximizing with respect to an arbitrary parameter α of A gives the following estimator:

∂p(Z|X; A, R)/∂α = 0
⇒ (Z − Â X) X† = 0 ,
Â = Z X† (X X†)^{-1} .    (8.128)

This maximum-likelihood derivation of the estimator for the channel A is also the least-squared estimate of the channel. A remarkable observation is that the estimator does not use any knowledge of the interference-plus-noise spatial covariance matrix. However, channel-estimation performance is affected by interference and noise.
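A short demonstration (not from the text) of the least-squares estimator of Equation (8.128) follows, assuming NumPy; the sizes, noise level, and unit-modulus training choice are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
nt, nr, ns = 2, 4, 256
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
X = np.exp(2j * np.pi * rng.random((nt, ns)))          # random-phase training rows
N = 0.1 * (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
Z = A @ X + N
A_hat = Z @ X.conj().T @ np.linalg.inv(X @ X.conj().T)  # Equation (8.128)
print(np.linalg.norm(A_hat - A) / np.linalg.norm(A))    # small fractional error
```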

8.10.1 Cramer–Rao bound

Cramer–Rao bounds were introduced in Section 3.8. The variance of a parameter is given by the inverse of the Fisher information matrix,

var{ (Â)_{m,n} } = { J^{-1} }_{ {m,n},{m,n} } .    (8.129)

Here, for notational convenience, the couplet {m, n} indicates an index into a vector of size m · n. Similarly, {m, n}, {j, k} is used to specify an element of a matrix at row {m, n} and column {j, k}. This is done to avoid using the vec operation defined in Equation (2.35). For some known reference sequence X, the received signal mean Y is given by

Y = ⟨Z⟩ = ⟨A X + N⟩ = A X ,    (8.130)

under the assumption of zero-mean noise. Because the channel matrix is not present in the covariance ∂R/∂(A)m ,n = 0 ,

(8.131)

and the derivative of one conjugation with respect to the other is zero, ∂ A = 0, ∂{A}∗m ,n

(8.132)

the only contributing factor to the Fisher information from Section 3.8 for a Gaussian model is given by {J}{m ,n },{j,k } = −

∂2 log p(Z|A, X) ∂(A)∗m ,n ∂(A)j,k

∂2 log tr{(Z − A X)† R (Z − A X)} ∂(A)∗m ,n ∂(A)j,k  ∂(A X)† −1 ∂(A X) R = tr ∂(A)∗m ,n ∂(A)j,k  ∗ T † ∂(A ) −1 ∂(A) R X , (8.133) = tr X ∂(A)∗m ,n ∂(A)j,k =

where Wirtinger derivatives are being used. The derivatives of the channel are given by ∂A = ej eTk ∂(A)j,k ∂A† = en eTm ∂(A)∗m ,n

(8.134)

284

MIMO channel

where the em vector indicates a vector of zeros with a one at the mth row, ⎞ 0 ⎜ .. ⎟ ⎜ . ⎟ ⎟ ⎜ ⎟ =⎜ ⎜ 1 ⎟. ⎜ . ⎟ ⎝ .. ⎠ ⎛

em

(8.135)

0

Interference free

For the sake of discussion, first consider the interference-plus-noise covariance in the absence of interference, given by

R = I_{n_r} ,   (8.136)

where power is normalized so that the noise per channel is unity. The information matrix is then given by

{J}_{{m,n},{j,k}} = tr{ X† [∂(A*)^T/∂(A)*_{m,n}] R^{-1} [∂A/∂(A)_{j,k}] X }
= tr{ (e_m e_n† X)† I^{-1} e_j e_k† X }
= tr{ (e_m e_n† X)† e_j e_k† X }
= x_k x_n† δ_{m,j} ,   (8.137)

where x_m indicates a row vector containing the mth row of the reference sequence X, and δ_{m,j} is the Kronecker delta function. For sufficiently long sequences x_m,

x_k x_n† ≈ n_s δ_{k,n}   (8.138)

is a good approximation. Under this approximation, the sequences are approximately orthogonal. As a consequence, the Fisher information matrix becomes

{J}_{{m,n},{j,k}} ≈ n_s δ_{m,j} δ_{k,n} .   (8.139)

The information matrix is thus diagonal with equal values along the diagonal. For the interference-free channel estimate, the variance of the estimate is given by

var{(Â)_{m,n}} = {J^{-1}}_{{m,n},{m,n}} ≈ 1/n_s .   (8.140)
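A quick Monte Carlo check of Equation (8.140) is sketched below (Python/NumPy; a minimal sketch assuming unit-variance noise and approximately orthogonal training). Averaged over many noise draws, the per-element estimation variance should approach 1/ns.

import numpy as np

rng = np.random.default_rng(1)
nt, nr, ns, trials = 2, 4, 128, 2000

X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
XX_inv = np.linalg.inv(X @ X.conj().T)

err = 0.0
for _ in range(trials):
    N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
    A_hat = (A @ X + N) @ X.conj().T @ XX_inv
    err += np.mean(np.abs(A_hat - A) ** 2)

print("measured variance:", err / trials, " predicted 1/ns:", 1 / ns)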

It is sometimes useful to consider the estimation variance normalized by the mean variance of the channel because it is the fractional error that typically drives any performance degradation in communication systems. The mean variance of an element in the channel matrix is given by

⟨‖A‖²_F⟩ / (n_r n_t) = a² P_o / n_t   (8.141)

under the assumption that A is drawn randomly from a stationary distribution. The ratio of estimation variance to channel matrix element variance is then given by

var{(Â)_{m,n}} / [⟨‖A‖²_F⟩/(n_r n_t)] = var{(Â)_{m,n}} / (a² P_o/n_t)
= var{(Ĥ)_{m,n}} / a²
≈ n_t / (n_s a² P_o)
= n_t / (n_s SNR) ,   (8.142)

where SNR = a² P_o indicates the mean SISO signal-to-noise ratio.

In interference

From above in Equation (8.133), the Fisher information matrix is given by

{J}_{{m,n},{j,k}} = tr{ X† [∂(A*)^T/∂(A)*_{m,n}] R^{-1} [∂A/∂(A)_{j,k}] X } = tr{ x_n† e_m† R^{-1} e_j x_k } .   (8.143)

The information matrix is given by

{J}_{{m,n},{j,k}} = tr{ x_n† e_m† R^{-1} e_j x_k } = e_m† R^{-1} e_j x_k x_n† = {R^{-1}}_{m,j} x_k x_n†
J = R^{-1} ⊗ (X X†)
J^{-1} = R ⊗ (X X†)^{-1} .   (8.144)

The variance of the channel estimate is then given by

var{(Â)_{m,n}} = {R}_{m,m} {(X X†)^{-1}}_{n,n} .   (8.145)

By using the same approximation employed previously for the interference-free bound, for which the mean of the reference signal is zero and the covariance of the reference signal is approximately proportional to the identity matrix,

X X† ≈ n_s I_{n_t} ,   (8.146)


the information matrix becomes

J ≈ n_s R^{-1} ⊗ I_{n_t} ,  J^{-1} ≈ (1/n_s) R ⊗ I_{n_t} ,   (8.147)

and the corresponding channel-estimation variance is given by

var{(Â)_{m,n}} ≈ (1/n_s) {R}_{m,m} .   (8.148)

For a rank-1 interferer, the interference-plus-noise covariance matrix is given by

R = I_{n_r} + a_i² P_i v v† ,   (8.149)

where the array response v (containing the complex attenuations from the single interfering transmitter to each receive antenna) is constrained by ‖v‖² = n_r, with interference received power per antenna a_i² P_i. For this covariance structure under the approximately orthogonal signal assumption, the variance of the channel estimation becomes

var{(Â)_{m,n}} ≈ (1/n_s) {I_{n_r} + a_i² P_i v v†}_{m,m} ≈ (1/n_s) (1 + a_i² P_i |{v}_m|²) .   (8.150)

Once again, it is sometimes useful to consider the ratio of the channel estimation variance to the average attenuation,

var{(Â)_{m,n}} / [⟨‖A‖²_F⟩/(n_r n_t)] ≈ [n_t/(n_s a² P_o)] (1 + a_i² P_i |{v}_m|²) = n_t / (n_s SINR_m) ,   (8.151)

where SINR_m is the received signal-to-interference-plus-noise ratio at the mth receive antenna.
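The rank-1-interferer result in Equation (8.150) is easy to probe numerically. The sketch below (Python/NumPy; the scenario parameters are illustrative assumptions) adds a single Gaussian interferer with array response v and compares the measured per-element channel-estimation variance at each receive antenna against (1 + a_i² P_i |v_m|²)/n_s.

import numpy as np

rng = np.random.default_rng(2)
nt, nr, ns, trials = 2, 4, 128, 4000
int_power = 10.0  # a_i^2 * P_i, interference power per antenna

X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
v = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
v *= np.sqrt(nr) / np.linalg.norm(v)  # enforce ||v||^2 = nr
XX_inv = np.linalg.inv(X @ X.conj().T)

err = np.zeros(nr)
for _ in range(trials):
    N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
    s_int = (rng.standard_normal(ns) + 1j * rng.standard_normal(ns)) / np.sqrt(2)
    Z = A @ X + N + np.sqrt(int_power) * np.outer(v, s_int)
    A_hat = Z @ X.conj().T @ XX_inv
    err += np.mean(np.abs(A_hat - A) ** 2, axis=1)

print("measured :", err / trials)
print("predicted:", (1 + int_power * np.abs(v) ** 2) / ns)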

8.11 Estimated versus average SNR

It would seem that the SNR of a signal would be an easy parameter to define. However, its definition is somewhat problematic for block-fading MIMO channels. As was suggested in Section 8.10, the channel estimate, average attenuation, and transmit power are coupled. In attempting to analyze space-time codes, which are discussed in Chapter 11, in an absolute context, there are a number of technical issues. The first is that theoretical analyses of space-time codes often do not lend themselves to experimental interpretation. A primary concern is the definition of SNR. In addition, the performance of space-time codes is typically dependent upon channel delay properties. Delay spread in the channel translates to spectral diversity that can be exploited by a communication system.


8.11.1 Average SNR

In most discussions of space-time codes found in the literature, the code performance is evaluated in terms of bit error rate as a function of average SNR. This average is taken over random channels and noise. Often the entries in the channel and noise are assumed to be drawn independently from complex circularly symmetric Gaussian distributions, as discussed in Section 8.6.1. The random channel variable is often given in the context of a random channel F ∈ C^{n_r × n_t} and some overall average channel attenuation a such that the channel matrix is given by

H = a F ,   (8.152)

where F is a random variable with the normalization ⟨‖F‖²_F⟩ = n_t n_r. By evaluating the expectation over noise, channels, and transmitted signal, the average SNR per receive channel is given by

SNR_ave = ⟨SNR⟩
= ⟨‖H S‖²⟩ / ⟨‖N‖²⟩
= ⟨tr{H S S† H†}⟩ / ⟨tr{N† N}⟩
= ⟨tr{H† H S S†}⟩ / (n_s n_r)
= tr{⟨H† H⟩ ⟨S S†⟩} / (n_s n_r)
= tr{⟨H† H⟩ (P_o/n_t) I} / n_r
= a² (P_o/n_t) ⟨‖F‖²_F⟩ / n_r
= a² P_o ,   (8.153)

where ⟨S S†⟩ = n_s (P_o/n_t) I. This average SNR is equivalent to the average single-input single-output (SISO) SNR, assuming the same total power. When developing a simulation, it is often assumed that power is defined in units of noise and the average attenuation is set to one, a² = 1. One can generate random channels and noise while scaling the transmit power to determine the error performance.
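The simulation convention just described is sketched below (Python/NumPy, with a² = 1 as an assumption); averaged over many draws, the measured receive SNR should converge to a² P_o as in Equation (8.153).

import numpy as np

rng = np.random.default_rng(3)
nt, nr, ns, Po, trials = 2, 2, 256, 4.0, 2000

snr = 0.0
for _ in range(trials):
    H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    S = np.sqrt(Po / nt) * (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
    N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
    snr += np.linalg.norm(H @ S, "fro") ** 2 / np.linalg.norm(N, "fro") ** 2

print("measured average SNR:", snr / trials, " predicted a^2 Po:", Po)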

8.11.2 Estimated SNR

The determination of SNR in an experiment can be different because the expectation often cannot be taken over a set of channels with a fixed average attenuation. Consequently, the estimate of the average attenuation and channel


variation become coupled. Imagine the extreme case for which a single carrier system is employed with a static channel. The code is not exercised over an ensemble of channel matrices, and the estimated SNR is biased by the single particular channel draw. As described in Section 8.10, for some n_s samples, the single-carrier channel response Z ∈ C^{n_r × n_s} can be expressed by

Z = A X + N ,   (8.154)

where A = \sqrt{P_o/n_t} H denotes the amplitude-channel product, X ∈ C^{n_t × n_s} is the normalized transmitted signal, and N ∈ C^{n_r × n_s} is the additive noise. Under the assumption of a known transmit training sequence X, the channel can be estimated. In estimating the channel, an amplitude-normalized version X of the reference is typically employed such that

‖X‖²_F = n_t n_s .   (8.155)

The least-squared-error channel estimator is given by

Â = Z X† (X X†)^{-1} = A X X† (X X†)^{-1} + N X† (X X†)^{-1} = A + N X† (X X†)^{-1} ,   (8.156)

where ·̂ indicates the estimated parameter. Recall that the expected channel takes the form ⟨‖H‖²_F⟩ = a² n_t n_r. From this, we can define an estimated receive SNR, which is coupled with the particular realization of the channel,

â² (P_o/n_t) = ‖Â‖²_F / (n_t n_r) .   (8.157)

The estimated SNR per receive antenna for a given channel realization and estimation is given by

\widehat{SNR} = â² P_o = ‖Z X† (X X†)^{-1}‖²_F / n_r .   (8.158)

In the limit of large integrated SNR for the channel estimate, the estimation error approaches zero and

Â → A .   (8.159)

In this limit, a simple relationship between the estimated and average SNR can be found,

\widehat{SNR} = ‖A‖²_F / n_r = ‖H‖²_F (P_o/n_t) / n_r = [‖F‖²_F / (n_t n_r)] SNR_ave .   (8.160)
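The scaling in Equation (8.160) is a property of the particular channel draw. The short sketch below (Python/NumPy, illustrative parameters of our choosing) estimates the SNR from training for one fixed channel and compares it with the scaled average SNR.

import numpy as np

rng = np.random.default_rng(4)
nt, nr, ns, Po = 2, 2, 4096, 4.0

F = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
A = np.sqrt(Po / nt) * F  # a = 1, so SNR_ave = Po
X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
Z = A @ X + N

A_hat = Z @ X.conj().T @ np.linalg.inv(X @ X.conj().T)
snr_hat = np.linalg.norm(A_hat, "fro") ** 2 / nr

print("estimated SNR:", snr_hat)
print("||F||_F^2/(nt*nr) * SNR_ave:", np.linalg.norm(F, "fro") ** 2 / (nt * nr) * Po)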

To be clear, even though the channel is estimated perfectly, there is a bias in the SNR estimate. This discussion becomes somewhat complicated in the context of frequency-selective fading. Specifically, if we assume that orthogonal-frequency-division multiplexing (OFDM) modulation is employed where there is a large number of carriers, then an average SNR may be estimated. Under the assumptions of a constant average attenuation across frequency, and of significant resolvable delay spread, which indicates significant frequency-selective fading, the average SNR can be found by averaging across the SNR estimates (remembering to perform this on a linear scale). In this regime, the average estimated SNR converges to the average SNR,

⟨\widehat{SNR}⟩ → SNR .   (8.161)

However, if there is not significant delay spread, then there will not be a large set of significantly different channel matrices to average over. Consequently, the average SNR and the average estimated SNR will not be the same.

8.11.3 MIMO capacity for estimated SNR in block fading

It is assumed here that we have block fading, such that for each draw from the probability distribution for the channel matrix, the channel matrix is constant for a long period of time. There are various ways of describing the limiting performance of a MIMO link. The information-theoretic capacity of MIMO systems with uninformed transmitters is considered here in terms of the ergodic capacity and the outage capacity, each in terms of either the average SNR or the SNR measured for a given draw of the channel matrix, which is denoted here the estimated SNR. Here the estimated SNR is assumed to be accurate. From Equation (8.37), the capacity of a MIMO system for a particular channel in the absence of external interference is given by

c_{UT} = log₂ | I_{n_r} + (P_o/n_t) H H† | .   (8.162)

The ergodic capacity, introduced in Section 8.3.2, is the average capacity in which the expectation is taken over a distribution of channel matrices.


Consequently, the ergodic capacity in the absence of external interference is given by

c_e = ⟨c_{UT}⟩ = ⟨ log₂ | I_{n_r} + (P_o/n_t) H H† | ⟩ = ⟨ log₂ | I_{n_r} + (a² P_o/n_t) F F† | ⟩ ,   (8.163)

where the only remaining random variable is the channel distribution. From this expression, it can be observed that the standard formulation of the ergodic capacity is given in the context of the average SNR, which is a parameter that may or may not be able to be estimated in an experimental context. We can reformulate the relationship between the estimated and the average SNR for the asymptotic estimated channel (that is, the well-estimated channel) found in Equation (8.160) to replace the argument in the capacity,

\widehat{SNR} · n_r / ‖F‖²_F = a² P_o / n_t .   (8.164)

Consequently, the ergodic capacity as a function of estimated SNR is given by

c_{e,\widehat{SNR}} = ⟨ log₂ | I_{n_r} + \widehat{SNR} (n_r/‖F‖²_F) F F† | ⟩ ,   (8.165)

where the expectation is taken over channel realizations. Similar to the discussion with regard to the ergodic capacity, the outage capacity can be represented in terms of the estimated rather than the average SNR. The outage capacity c_{o,\widehat{SNR}} is found by implicitly solving

Pr{ c( \widehat{SNR} a² n_t n_r / ‖H‖²_F , H, R ) ≥ η } = P_close .   (8.166)

8.11.4 Interpretation of various capacities

In Figure 8.12, the ergodic and outage capacities for both the average and estimated SNRs are presented. In general, all four curves are useful in different regimes. In simulations it is easy to use the average SNR. Both the outage and ergodic capacities can be evaluated. Often in practical applications, it is the outage capacity that is of interest because it is a closer match to the use of modern communications that allow frames to be resent in the case of outages. That is, the receiver of the dropped frame asks the transmitter to resend the frame. For experimental applications with limited frequency or temporal diversity, only the estimated SNR is available. Good space-time codes can operate within a few decibels of this estimated SNR capacity. For applications with significant diversity, such as OFDM systems with very large symbols and significant delay spread, the coding should approach the ergodic capacity and the average estimated SNR should approach the average SNR.


[Figure 8.12 plot: spectral efficiency (b/s/Hz) versus SNR per receive antenna (dB) for the four capacity bounds.]

Figure 8.12 Performance bounds for a 2 × 2 MIMO system under the assumptions: “99Out Est SNR” – the outage capacity with a 99% probability of closing the link under the assumption that the SNR per receive channel is estimated from the received signal; “Est SNR” – the average or ergodic capacity under the assumption that the SNR is estimated from the received signal; “99Out Ave SNR” – the outage capacity with a 99% probability of closing the link under the assumption that the SNR is the average SNR given by a distribution of channels; “Ave SNR” – the average or ergodic capacity under the assumption that the SNR is the average SNR given by a distribution of channels.

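As a rough illustration, the following Monte Carlo sketch (Python/NumPy; the parameters are ours) computes the ergodic capacity and the 99%-outage capacity of Equation (8.162) over i.i.d. Rayleigh draws at a single average SNR; the estimated-SNR curves follow by replacing P_o/n_t with the rescaled argument in Equation (8.165).

import numpy as np

rng = np.random.default_rng(5)
nt = nr = 2
Po = 4.0  # average SNR per receive antenna (a^2 = 1), linear scale
trials = 20000

caps = np.empty(trials)
for k in range(trials):
    H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    M = np.eye(nr) + (Po / nt) * (H @ H.conj().T)
    caps[k] = np.log2(np.abs(np.linalg.det(M)))

print("ergodic capacity (b/s/Hz):", caps.mean())
print("99%-outage capacity:", np.quantile(caps, 0.01))  # rate supported 99% of the time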

8.12 Channel-state information at transmitter

To perform informed transmit approaches, channel-state information (CSI) must be known at the transmitter. There are two ways typically used to provide this information: the first is reciprocity, and the second is a feedback channel link. The channel may be tracked [190] or estimated in blocks of data. In either approach, a bidirectional link is assumed. Implicit in this discussion is the notion that the channel is sufficiently stable so that an estimate constructed at one point in time is a reasonable estimate at some later point in time [30]. Under some models, the changing channel can be predicted for some period of time [49, 302], although that possibility is not considered here.

8.12.1 Reciprocity

For two radios, the reciprocity approach takes advantage of the physical property that, after time reversal is taken into account, the channel is in principle the same


from radio 1 to radio 2 as it is from radio 2 to radio 1. As an example, in a flat-fading environment, if the channel from radio 2 to 1 is given by H_{2→1}, then the channel from radio 1 to 2 is given by

H_{1→2} = H_{2→1}^T .   (8.167)

There are, however, a few technical caveats to this expectation. Radios often use different frequencies for different link directions. In general, the channels in multipath environments decorrelate quickly as a function of frequency. If frequency-division multiple access (FDMA) is employed, then reciprocity may not provide an accurate estimate of the reverse link. Another potential practical issue is that the path within the radio is not exactly the same between the transmit and the receive paths. From a complex baseband processing point of view, these hardware chains are part of the channel. In principle, these effects can be mitigated by calibration. However, it is a new requirement placed upon the hardware that is not typically considered. The most significant potential concern is that the reciprocity approach only captures the interference structure at the local radio and does not capture the interference structure at the “other” radio. The optimal informed transmitter strategy is to use the interference-plus-noise whitened channel estimate. As an example, in a flat-fading environment the channel from radio 1 to 2 is given by H_{1→2}, the interference-plus-noise covariance matrix at radio 2 is denoted R_2, and the whitened channel matrix used by the optimal strategy is given by R_2^{-1/2} H_{1→2}. While a reasonable estimate for the channel can be made by reciprocity, the interference structure R_2 cannot be estimated by a reciprocity approach. Consequently, for many informed transmitter applications, reciprocity is not a valid option.
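A small sketch of the whitening step mentioned above (Python/NumPy; R_2 and the channel are synthetic stand-ins) computes R_2^{-1/2} H_{1→2} via an eigendecomposition of the Hermitian covariance.

import numpy as np

rng = np.random.default_rng(6)
nr = nt = 2

H12 = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

# Synthetic interference-plus-noise covariance at radio 2: unit noise plus one interferer.
v = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
R2 = np.eye(nr) + 10.0 * np.outer(v, v.conj())

# Hermitian inverse square root via eigendecomposition.
w, U = np.linalg.eigh(R2)
R2_inv_sqrt = U @ np.diag(1.0 / np.sqrt(w)) @ U.conj().T

H_whitened = R2_inv_sqrt @ H12  # the quantity the informed transmitter needs
print(H_whitened)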

8.12.2 Channel estimation feedback

A variety of techniques provide feedback of channel estimates. This can be done directly or indirectly and with various degrees of fidelity. As an indirect approach, the transmitter can search for the best channel. In this approach, the receiver provides the transmitter a simple feedback message. At each iteration, the feedback message contains information indicating whether the last iteration's transmit approach was better or worse. The message may be simple or provide derivative information. More commonly considered is a technique that provides a direct channel estimate in the feedback message. One of the issues when implementing this approach is how to represent the channel (really the whitened channel) for feedback. Typically, lossy channel estimation source compression is considered. As an example, having the receiver feed back beamforming vectors that are sparsely distributed on some Grassmannian manifold is considered in References [198, 195].


Problems

8.1 Develop the result in Equation (8.48) for the case in which n_t > n_r.

8.2 Under the assumption of an uninformed transmitter, a 4 × 4 MIMO system, and an i.i.d. complex Gaussian channel, find the probability that at least 50% of the energy remains after mitigating
(a) 1
(b) 2
(c) 3
strong interferers.

8.3 Evaluate the capacity in Equation (8.52) if the interference, noise, and channel are all not frequency selective.

8.4 Consider the received signal projected onto a subspace orthogonal to a known interference that is presented in Equation (8.5).
(a) Express the projection in terms of the least-squares channel estimation.
(b) Express the average fractional signal loss due to the temporal mitigation in terms of the number of samples.

8.5 Reevaluate the relations in Equation (8.48) under the assumption that the interference of rank n_i is much larger than the signal SNR (INR ≫ SNR).

8.6 Evaluate the informed to uninformed capacity ratio c_{IT}/c_{UT} in the limit of an infinite number of transmit and receive antennas (as discussed in Section 8.7) and in the limit of low SNR as a function of the ratio of receive to transmit antennas κ.

8.7 For a frequency-selective 2 × 2 MIMO channel that is characterized with reasonable accuracy by two frequency bins with channel values

H̆ = [ 1  −3   0   0
       3  −1   0   0
       0   0   1   2
       0   0  −2  −1 ] ,   (8.168)

evaluate the informed transmitter capacity expressed in Equation (8.55) as a function of per receive antenna SNR.

8.8 Consider the outage capacities (at 90% probability of closing) presented in Equation (8.103) for a 4 × 4 flat-fading channel matrix characterized by the exponential shaping model introduced in Equation (8.83). Assume that the transmit and receive shaping parameters have the same value α. As a function of per receive antenna SNR (−10 dB to 20 dB), numerically evaluate and plot the ratio of the outage capacity (at 90% probability of closing) for the values of exponential shaping parameter α:
(a) α = 0.25
(b) α = 0.5
(c) α = 0.75
to the outage capacity assuming that α = 1.

9 Spatially adaptive receivers

The essential characteristic of an adaptive receiver is to be aware of its environment and to adjust its behavior to improve its typical performance. One might notice that this description is relatively close to the definition of a cognitive radio, which will be discussed in Chapter 16, but the flexibility of an adaptive receiver is typically limited to estimating parameters used in its algorithms at the receiver. When arrays of antennas are employed, the adaptive receiver usually attempts to reduce the adverse effect of interference and to increase the gain on signals of interest [294, 246, 223, 205, 312, 248, 189, 26, 182]. This interference can be internal or external as discussed in Sections 4.1 and 8.2. In Figure 9.1, a typical communication chain is depicted. The information is encoded and modulated. While a continuous formulation is possible, most modern communication systems use sampled approaches, and the sampled approach will be employed here. The nt-transmitter-by-ns-sample signal is indicated by S ∈ C^{n_t × n_s}. For convenience, the signal S is represented at complex baseband. At the receiver, the transmitted signal, corrupted by the channel, is observed by n_r receive antennas, represented at complex baseband by Z ∈ C^{n_r × n_s}. In many receivers, to reduce complexity, an estimate of the transmitted signal is evaluated as Ŝ. For the approaches considered within this chapter, it is assumed that a portion of the transmitted signal, represented by S, or more precisely a normalized version of the transmitted signal X ∈ C^{n_t × n_s}, is “known” at the receiver. The normalization chosen here is

‖X‖²_F = n_t n_s ,   (9.1)

such that the transmitted signal matrix S is given by

S = \sqrt{P_o/n_t} X ,   (9.2)

where P_o is the total noise-normalized power (really energy per sample). It is common for communication systems to transmit a predefined signal for a portion of a transmission frame. This portion of the transmitted signal is referred to as a reference signal, as a pilot sequence, or as a training sequence. The beamformer is constructed by using these training data and is then applied to extract an estimate of the transmitted signal in some region of time near the training data.


Figure 9.1 Basic communication chain. Depending upon the implementation, the coding and modulation may be a single block. Similarly, the receiver and decoding may be a single block.

In some sense, the “right” answer is to not separate coding and modulation or even channel estimation. The optimal receiver would instead make a direct estimate of the transmitted information based upon the observed signal and a model of the channel. In reality, the definition of optimal is dependent upon the problem definition (for example, incorporating the cost of computations). Ignoring external constraints, the optimal receiver in terms of receiver performance is the maximum a posteriori (MAP) solution, that is, it maximizes the posterior probability. To begin, we consider the maximum a posteriori solution for a known channel. Consider an information symbol represented by a single element selection from a set α ∈ {α₁, α₂, …, α_M} (which may take many bits or a sequence of channel uses to represent). Given this notation, the maximum a posteriori solution is formally given by

α̂ = argmax_α p_α(α|Z) = argmax_α p(Z|α) p_α(α) / p(Z) = argmax_α p(Z|α) p_α(α) ,   (9.3)

where Bayes’ theorem, as introduced in Equation (3.4), has been applied. As introduced in Section 3.1.7, p_α(α|Z) indicates the posterior probability distribution for information symbols given some observed data, p_α(α) indicates the prior probability distribution of the information symbols, and p(Z|α) is the conditional probability distribution of observing Z given some information symbol. Because it is expected that all information symbols are equally likely, p_α(α_m) = p_α(α_n), the maximum a posteriori solution is given by the maximum-likelihood (ML) solution. As a specific example, in the case of the Gaussian MIMO channel, described in Equation (8.3), with external interference-plus-noise covariance matrix R ∈ C^{n_r × n_r}, under the assumption of a known channel matrix, the Gaussian likelihood is maximized by searching over each transmitted sequence hypothesis


α_m ⇒ S_m, where the subscript m indicates the mth hypothesis. The maximum-likelihood solution for the estimated symbol α̂, which corresponds to sequence Ŝ, is given by

α̂ ⇐ argmax_{S_m} [1/(π^{n_r n_s} |R|^{n_s})] e^{−tr{(Z − H S_m)† R^{-1} (Z − H S_m)}} .   (9.4)
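A brute-force version of this search is only feasible for tiny hypothesis sets, but it is instructive to code. The sketch below (Python/NumPy; QPSK symbols, a single sample, and a known channel are our illustrative assumptions) picks the transmit hypothesis that maximizes the Gaussian log-likelihood, which is equivalent to minimizing the whitened residual.

import numpy as np
from itertools import product

rng = np.random.default_rng(7)
nt, nr = 2, 2
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
R = np.eye(nr)  # interference-plus-noise covariance (white here)
R_inv = np.linalg.inv(R)

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
hypotheses = [np.array(s) for s in product(qpsk, repeat=nt)]  # 4^nt options

s_true = hypotheses[5]
z = H @ s_true + 0.1 * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr))

# Maximizing the Gaussian likelihood <=> minimizing (z - H s)† R^{-1} (z - H s).
def neg_log_like(s):
    r = z - H @ s
    return np.real(r.conj() @ R_inv @ r)

s_hat = min(hypotheses, key=neg_log_like)
print("correct detection:", np.allclose(s_hat, s_true))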

Evaluating this form directly for any but the smallest set of hypotheses is untenable, although approaches to reduce the cost of the search have been considered, as discussed in Section 9.6 and in Reference [72], for example. This optimization can be extended to a general maximum a posteriori receiver by including a search over all possible channels H and external-interference-plus-noise covariance matrices R, in which case the estimate of the symbol α̂, associated with S_m, is given by

α̂ ⇐ argmax_{S_m; H, R} [1/(π^{n_r n_s} |R|^{n_s})] e^{−tr{(Z − H S_m)† R^{-1} (Z − H S_m)}} p_H(H) p_R(R) p_α(α) ,   (9.5)

where p_H(H) is the probability density function for channel matrices, and p_R(R) is the probability density function for interference-plus-noise covariance matrices. More typically, the likelihoods for each possible symbol as a function of time are passed to a decoder that estimates the underlying information bits. The decoder used in practice is usually far less computationally expensive than the full maximum-likelihood or maximum a posteriori search. A suboptimal approximation of the maximum-likelihood receiver searches over potential beamformers and consequently does not require explicit training data. An example is a receiver in which beamformers are guessed. For each guess, decoding is attempted. Once reasonable decoding performance is achieved, remodulated data (decoded data that are passed through the encoding and modulation blocks to build the reference) can be used as training data. This process can be repeated and receiver performance iteratively improves. A version of this receiver is discussed in Section 9.6. For many practical receiver implementations, it is useful to employ training and to separate the receiver into components. One common approach is to perform adaptive spatial processing before decoding. At the output of the spatial processing stage, an estimate of the transmitted signal Ŝ, or more precisely X̂, is provided. This estimate may be in the form of hard decisions for the symbols or may be a continuous approximation of the transmitted signal that can be used to calculate likelihoods of various symbols. In this chapter, a number of approaches for adaptive processing are discussed. Often the adaptive spatial processing is employed to remove both internal and external interference. When mitigating interference for which the signal is known (as in the case of MIMO training sequences) or can be estimated, the interference can also be mitigated by using temporal interference mitigation. Most of the techniques discussed in this chapter focus on spatial processing; in Section 9.6, a combination of spatial and temporal processing is considered.


9.1 Adaptive spectral filtering

There is a strong connection between approaches used for spectral filtering and those used for adaptive spatial processing. As an introduction, the Wiener spectral filter [346, 142, 256, 273] is considered here for a single-input single-output (SISO) link. Specifically, we will consider a sampled Wiener filter applied to spectral compensation, which is the minimum-mean-squared-error (MMSE) rake receiver [254]. The name rake receiver comes from the appearance of the original physical implementation, in which the various mechanical taps off a transmission line, used to introduce contributions at various delays, looked like the teeth of a garden rake. This filter attempts to compensate for the effects of a frequency-selective channel. The effect of channel delay is to introduce intersymbol interference, the term used to describe delay spread in the channel introducing copies of previous symbols at the current sample. Given some finite bandwidth signal, the channel can be accurately represented with the sampled channel if the bandwidth B satisfies T_s < 1/B. Note that the standard Nyquist factor of two for real signals is absent because these are complex samples and, consequently, we are taking B to span both the positive and negative frequencies at baseband. To be clear, there are a number of issues related to the use of discretely sampled channels. In particular, scatterers placed off the sample points in time can require large numbers of taps to accurately represent the channel effect. These effects are discussed in greater detail in Sections 4.5 and 10.1.

9.1.1 Discrete Wiener filter

For a transmitted complex baseband signal s(t) ∈ C, a received complex baseband signal z(t) ∈ C, and channel impulse response h(t) ∈ C in additive Gaussian noise n(t) ∈ C, the received signal is given by the convolution of the transmitted signal and the channel plus noise,

z(t) = ∫ dτ h(τ) s(t − τ) + n(t) .   (9.6)

In order to approximately reconstruct the original signal with minimum error, a rake receiver with coefficients w_m is applied to the received signal,

ŝ(t) = Σ_m w_m* z(t − m T_s) ,   (9.7)

where ŝ(t) is the estimate of the transmitted signal, and T_s is the sample period, for which it is assumed that the sampling is sufficient to satisfy Nyquist sampling


requirements. For an error ǫ(t), the mean-squared error is given by

ǫ(t) = Σ_m w_m* z(t − m T_s) − s(t) ,
⟨|ǫ(t)|²⟩ = ⟨ | Σ_m w_m* z(t − m T_s) − s(t) |² ⟩ .   (9.8)

The MMSE solution is found by taking a derivative of the mean-squared error with respect to some parameter α of the filter coefficient w_m,

0 = ∂/∂α ⟨|ǫ(t)|²⟩
= ∂/∂α ⟨ [Σ_n w_n* z(t − n T_s) − s(t)]* [Σ_m w_m* z(t − m T_s) − s(t)] ⟩
= ∂/∂α [ Σ_{m,n} w_m* ⟨z(t − m T_s) z*(t − n T_s)⟩ w_n − Σ_m w_m* ⟨z(t − m T_s) s*(t)⟩ − Σ_n ⟨s(t) z*(t − n T_s)⟩ w_n + ⟨s(t) s*(t)⟩ ] .   (9.9)

Constructing the filter vector w, autocorrelation matrix Q, and the cross-correlation vector v by using the definitions

{w}_m = w_m ,  {Q}_{m,n} = ⟨z(t − m T_s) z*(t − n T_s)⟩ ,  {v}_m = ⟨z(t − m T_s) s*(t)⟩ ,   (9.10)

the derivative of the mean-squared error or average error power¹ can be written as

∂/∂α ⟨|ǫ(t)|²⟩ = ∂/∂α [ w† Q w − w† v − v† w + ⟨s(t) s*(t)⟩ ] = (∂w†/∂α) [Q w − v] + h.c. ,   (9.11)

where h.c. indicates the Hermitian conjugate of the first term. This equation is solved by setting the non-varying term to zero, so that the filter vector w is given

¹ Strictly speaking, the output of the beamformer should be parameterized in terms of energy per symbol, but it is common to refer to this parameterization in terms of power. Because the duration of a symbol is known, the translation between energy per symbol and power is a known constant.


by

Q w = v ,  w = Q^{-1} v ;  Q > 0 .   (9.12)

This result is known as the Wiener–Hopf equation [346, 256]. The result can be formulated in more general terms, but this approach is relatively intuitive. Thus, the MMSE estimate of the transmitted signal ŝ(t) is given by

ŝ(t) = Σ_m w_m* z(t − m T_s) ,  w_m = {Q^{-1} v}_m .   (9.13)
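A sample-estimate version of Equations (9.10)–(9.13) is sketched below (Python/NumPy; the two-tap channel is our toy example): the autocorrelation matrix Q and cross-correlation vector v are estimated from data, and the filter solves Q w = v.

import numpy as np

rng = np.random.default_rng(8)
ns, ntaps = 50000, 4
h = np.array([1.0, 0.5])  # toy two-tap channel

s = (rng.standard_normal(ns) + 1j * rng.standard_normal(ns)) / np.sqrt(2)
z = np.convolve(s, h)[:ns] + 0.1 * (rng.standard_normal(ns) + 1j * rng.standard_normal(ns))

# Sample estimates of {Q}_{m,n} = <z(t-m) z*(t-n)> and {v}_m = <z(t-m) s*(t)>.
lags = [np.roll(z, m) for m in range(ntaps)]
Q = np.array([[np.mean(lags[m] * lags[n].conj()) for n in range(ntaps)] for m in range(ntaps)])
v = np.array([np.mean(lags[m] * s.conj()) for m in range(ntaps)])

w = np.linalg.solve(Q, v)                        # Wiener-Hopf: Q w = v
s_hat = sum(w[m].conj() * lags[m] for m in range(ntaps))
print("residual MSE:", np.mean(np.abs(s_hat - s) ** 2))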

9.2 Adaptive spatial processing

By using the same notation as that found in Equation (8.122), the n_t-transmitter-by-n_r-receiver sampled flat-fading MIMO channel model can be given by either of two forms, depending upon whether the power parameter is absorbed into the transmitted signal or the channel matrix, in which the received data matrix Z is given by either

Z = H S + N = H \sqrt{P_o/n_t} X + N ;  S = \sqrt{P_o/n_t} X ,
or
Z = A X + N ;  A = \sqrt{P_o/n_t} H ,   (9.14)

where the received signal is indicated by Z ∈ C^{n_r × n_s}, the channel matrix is indicated by H ∈ C^{n_r × n_t}, the transmitted signal is indicated by S ∈ C^{n_t × n_s}, the noise plus interference is indicated by N ∈ C^{n_r × n_s}, the amplitude-channel product is indicated by A ∈ C^{n_r × n_t}, the normalized transmitted signal is indicated by X ∈ C^{n_t × n_s}, and the total thermal-noise-normalized power is indicated by P_o. It may be overly pedantic to differentiate between these two forms (X versus S) because it is typically clear from context. Nonetheless, for clarity in this chapter, we will maintain this notation. By employing a linear operator, denoted the beamforming matrix W ∈ C^{n_r × n_t}, an estimate of the normalized transmitted signals X̂ is given by

X̂ = W† Z ,   (9.15)

9.2 Adaptive spatial processing

301

Adaptive spatial processing is sometimes referred to as spatial filtering. The word filter is used because of the strong formal connection between spatial processing and spectral processing. The spatial location of the antennas corresponds to the delay taps in a spectral filter. The spatial direction corresponds to the frequency. There is an unlimited number of potential receive beamformer approaches. Four approaches are discussed here: matched filter in Section 9.2.1, minimum interference (zero forcing) in Section 9.2.2, MMSE in Section 9.2.3, and maximum SINR in Section 9.2.4.

9.2.1

Spatial matched filter The concept of a matched filter is used to construct a beamformer that maximizes the receive power to thermal noise ratio associated with a particular transmitter. The beamformer that maximizes this power has a structure that is matched to the received array response for a particular transmitter. This type of beamformer is sometimes denoted a maximum ratio combiner (MRC). Note, this formulation does not necessarily maximize the signal-to-interference-plus-noise ratio (SINR). In the case of a line-of-sight environment, the filter corresponds to the steering vector for the direction to the particular transmitter. Here the beamformer for each transmitter is constructed individually. The beamformer for the mth transmitter is wm ∈ Cn r ×1 , and the channel between the mth transmitter and the receive array (the mth column of the amplitudechannel product A) is given by am ∈ Cn r ×1 . The transmit sequence from the mth transmitter is given by xm ∈ C1×n s . The received signal matrix Z is given by Z = AX + N  a m xm + N . =

(9.16)

m

By ignoring the signal from other transmitters (the internal interference) and the external interference plus noise, the power at the output of the beamformer associated with a particular transmit antenna Qm is given by & 1 % † wm am xm 2 ns % & 1 † tr{wm am xm x†m a†m wm } = ns † = tr{wm am a†m wm } ,

Qm =

(9.17)

where the expectation is over the transmitted signal, and the average power of the signal transmitted by the mth antenna is normalized to be one. Because the unconstrained beamformer that maximizes the power at the output has infinite coefficients, some constraint on the beamformer is required. Here, it is required

302

Spatially adaptive receivers

that the squared norm of the beamformer be unity, wm 2 = 1 .

(9.18)

The beamformer that maximizes the average power under this constraint is found by using the method of Lagrangian multipliers discussed in Section 2.12.1:  ∂  Qm − λm wm 2 ∂α  ∂  † † tr{wm am a†m wm − λm wm wm } = ∂α  †  ∂wm  + h.c. , = tr am a†m wm − λm wm ∂α

0=

(9.19)

where λm is the Lagrangian multiplier, α is some arbitrary parameter of wm , and h.c. indicates the Hermitian conjugate of the first term. This form is solved by the nontrivial eigenvector that satisfies am a†m wm = λm wm am . wm = am

(9.20)

The beamformer is matched to the array response or equivalently the appropriate column of the channel matrix. This result is consistent with the intuition provided by the Cauchy–Schwarz inequality, which requires that † wm am ≤ wm am .

(9.21)

The equality is achieved only if wm ∝ am . At this point, it is worth mentioning that amplitude-channel vector am is typically not known prior to the symbol estimation and must be estimated. The transmitter of interest (the mth) emits the sequence xm . Similar to the development of Equation (8.128), by assuming a Gaussian model, the log-likelihood is given by 3 4 log p(Z) = −tr (Z − am xm − A m X m )† R−1 (Z − am xm − A m X m ) − ns log |R| + const. ,

(9.22)

where the channel matrix without the channel vector from the transmitter of interest A m ∈ Cn r ×(n t −1) is given by A m = (a1 · · · am −1

am + 1 · · · an t ) .

(9.23)

Similarly, the transmit signal matrix without the transmitter of interest X m ∈ C(n t −1)×n s is given by X m = (xT1 · · · xTm −1

xTm + 1 · · · xTn t )T .

(9.24)

9.2 Adaptive spatial processing

303

The maximum-likelihood estimate (evaluated in Problem 9.3) of the amplitudechannel vector am under the Gaussian model is given by ˆm = a

† Z P⊥ X m xm

† xm P⊥ X m xm

,

(9.25)

where the the projection operator P⊥ X m orthogonal to the row space of X m is defined by † † −1 X m . P⊥ X m = I − X m (X m X m )

(9.26)

To evaluate the beamformer, here it is assumed that X is a known training sequence. By using Equation (9.25), the beamformer wm is given by wm ≈

† Z P⊥ X m xm

† Z P⊥ X m xm

.

(9.27)

If the transmit sequences are approximately orthogonal, which is typically true, ˆm = Z x†m (xm x†m )−1 . then a

9.2.2

Minimum-interference spatial beamforming While the matched-filter beamformer discussed in the previous section ignored the presence of external interference or even other transmit antennas, minimuminterference beamformers focus on the effects of interference. Depending upon the constraints and regime of operation assumed in their development, these beamformers have a variety of names (such as minimum interference, zero forcing [199], and killer weights [32]) that are often used. The form of a minimum-interference receiver varies based upon the assumptions employed. The receiver may or may not assume that the external interference plus noise is spatially uncorrelated. In addition, the number of transmit and external interference sources may or may not be larger than the number of receive antennas. In the regime in which the number of receive antennas is equal to or larger than the number of transmit and interference sources, the minimum-interference beamformer has a convenient property that the signals associated with each beamformer output stream are uncorrelated. It is, consequently, a decorrelating beamformer. For analysis, this can be useful because correlations can complicate analytic results. However, the minimum-interference beamformer can often overreact to the presence of interference by attempting to null interference that is weak, unnecessarily using degrees of freedom that consequently reduce the signal-to-interference-plus-noise ratio (SINR) at the output of the beamformers. Channel inversion or zero forcing A common beamforming approach is the channel inversion technique, which is also denoted the zero-forcing receiver. This approach is a spatial extension to the

304

Spatially adaptive receivers

spectral zero-forcing equalizer [199] and reconstructs the transmitted sequences exactly in the absence of noise if nr ≥ nt . In this approach, the beamformers WZ F ∈ Cn r ×n t are constructed using the pseudoinverse of the amplitudechannel product, revealing the transmit sequence corrupted by noise, WZ F = A (A† A)−1 ˆ = W† Z = (A† A)−1 A† Z X ZF = (A† A)−1 A† A X + (A† A)−1 A† N = X + (A† A)−1 A† N .

(9.28)

The outputs of this beamformer, under the assumption of perfect channel knowledge and at least as many receive antennas as transmit antennas (nr ≥ nt ), have no contributions from other transmitters. The beamformer adapted for the mth transmitter wm is given by wm = WZ F em = A(A† A)−1 em ,

(9.29)

where the selection vector {em }n = δm ,n is given by the Kronecker delta, that is one if m and n are equal and zero otherwise. Because the channel is not known, this beamformer wm must be approximated by an estimate of the channel. By substituting the maximum-likelihood channel estimate, under the Gaussian interference and noise model that is found in Equation (8.128), into Equation (9.29), the estimated channel inversion beamformer is found, ˆ A ˆ † A) ˆ −1 em wm ≈ A(

= Z X† (X X† )−1 [(X X† )−1 X Z† Z X† (X X† )−1 ]−1 em = Z X† [X Z† Z X† ]−1 X X† em = Z X† [X Z† Z X† ]−1 X x†m ,

(9.30)

where to estimate the beamformer it is assumed that X is a known training sequence. Orthogonal beamformer Imagine a scenario in which there is a MIMO link with no external interference. The interference from other transmitters within the MIMO link is minimized by constructing a beamformer wm for each transmit antenna that is orthogonal to the spatial subspace spanned by the spatial responses of the other transmitters, wm ∝ P⊥ A m am ,

(9.31)

n r ×n r is the operator that projects onto a column space orthogwhere P⊥ A m ∈ C onal to the spatial response of the interfering transmitters. This construction is heuristically satisfying because the beamformer begins with the matched-filter

9.2 Adaptive spatial processing

305

array response and then projects orthogonal to subspace occupied by the internal interference. If there were no interference, then the beamformer would become the matched filter. We will show that the beamformer constructed by using this model is proportional to the zero-forcing beamformer and is therefore equivalent. By using Equation (9.23), the projection operator that projects onto a basis orthogonal to the receive array spatial responses of all the other transmitters P⊥ A m is given by P⊥ A m = I − PA m

PA m = A m (A †m A m )−1 A †m .

(9.32)

This form can be found by considering the beamformer that minimizes the interference. The average interference power at the output of a beamformer from (in t) other transmitters of the MIMO link Q m is given by * ) $2 1 $ (in t) † $wm A m X m $ Q m ∝ ns 5 6 1 † = wm A m X m X †m A †m wm ns † = wm A m A †m wm ,

(9.33)

where the expectation is evaluated over the ensemble of transmitted sequences. It is assumed that the cross correlation between normalized transmitted signals is zero and the power from each transmitter is one,2 5 6 X m X †m = ns In t −1 . (9.34)

By minimizing the expected interference power under this constraint, the beamformer is found. The constraint on the norm of wm is enforced by using the method of Lagrangian multipliers discussed in Section 2.12.1,  ∂  (in t) Q m − λm wm 2 0= ∂α  ∂  † † tr{wm A m A †m wm − λm wm wm } = ∂α    ∂w† † m + h.c. , (9.35) A m A m wm − λm wm = tr ∂α

where λm is the Lagrangian multiplier, α is some arbitrary parameter of wm , and h.c. indicates the Hermitian conjugate of the first term. The beamformer lives in the subspace spanned by the eigenvectors associated with the eigenvalues with zero value, 0 wm = A m A †m wm . 2

(9.36)

Implicit in this formulation is the assumption that the MIMO system is operating in an uninformed transmitter mode.

306

Spatially adaptive receivers

The relationship can be simplified by recognizing that a projection onto the space orthogonal to the column space of amplitude-channel product X m imposes the same constraint, 0 wm = PA m wm ,

(9.37)

by multiplying both sides of Equation (9.36) by A m (A †m A m )−2 A †m . An orthogonal beamformer must satisfy this relationship. If the numbers of receive and transmit antennas are equal, nr = nt , then the beamformer is uniquely determined by the above equation. However, if nr > nt , then the beamformer has only been determined up to a subspace. While any beamformer that satisfies the constraint in Equation (9.37) is a minimuminterference beamformer, the remaining degrees of freedom can be used to increase the expected power from the signal of interest. In other words, in the space of possible beamformers that satisfy the constraint, we want to find the beamformer that maximizes signal power. Consequently, it is desirable to maximize the inner product between the beamformer and the amplitude-channel product † am , subject to the orthogonality constraint. wm The manipulation is similar to that performed for the matched-filter beam† PA m wm = 0, former in Section 9.2.1 with the additional constraint that wm  ∂  † † † tr{wm am a†m wm − λm wm wm − ηm wm PA m wm } ∂α  †  ∂wm  † + h.c. , = tr am am wm − λm wm − ηm PA m wm ∂α

0=

(9.38)

where λm and ηm are Lagrangian multipliers, α is some arbitrary parameter of wm , and h.c. indicates the Hermitian conjugate of the first term. This relationship is satisfied when am a†m wm − λm wm − ηm PA m wm = 0 .

(9.39)

The second constraint can be satisfied by requiring that the beamformer be ⊥ limited to a subspace spanned by P⊥ A m because PA m PA m = 0. Consequently, the beamformer wm satisfies wm = P⊥ A m wm .

(9.40)

By substituting this relationship into Equation (9.39), the constrained form is found, ⊥ ⊥ 0 = am a†m P⊥ A m wm − λm PA m wm − ηm PA m PA m wm ⊥ = am a†m P⊥ A m wm − λm PA m wm

⊥ ⊥ † = P⊥ A m am am PA m wm − λm PA m wm ,

(9.41)

where the observation that projection operators are idempotent (which indicates the operation can be repeated without affecting the result) is employed. This

9.2 Adaptive spatial processing

307

eigenvalue problem is solved by wm =

P⊥ A m am

P⊥ A m am

.

(9.42)

Once again, the channel response is not typically known; however, by using a reference signal, the beamformer w_m can be estimated by employing Equation (9.25),

w_m ≈ P̂⊥_{A∖m} Z P⊥_{X∖m} x_m† / ‖ P̂⊥_{A∖m} Z P⊥_{X∖m} x_m† ‖ ,   (9.43)

where an estimate for the projection matrix is given by

P̂⊥_{A∖m} ≈ I − Â_{∖m} (Â_{∖m}† Â_{∖m})^{-1} Â_{∖m}† .   (9.44)

Similar to the development of Equation (9.25), the maximum-likelihood estimate for A_{∖m} is given by

Â_{∖m} = Z P⊥_{x_m} X_{∖m}† (X_{∖m} P⊥_{x_m} X_{∖m}†)^{-1} ,   (9.45)

where P⊥_{x_m} = I − x_m† x_m / (x_m x_m†).
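The orthogonal (projection) beamformer of Equation (9.42) can be sketched as below (Python/NumPy; a known channel A is assumed for clarity instead of the training-based estimates above); as the next subsection argues, it should be proportional to the zero-forcing beamformer.

import numpy as np

def orth_beamformer(A, m):
    """Orthogonal beamformer w_m proportional to P_perp_{A\\m} a_m (sketch, known channel)."""
    Ao = np.delete(A, m, axis=1)  # A without column m
    P = Ao @ np.linalg.inv(Ao.conj().T @ Ao) @ Ao.conj().T
    w = (np.eye(A.shape[0]) - P) @ A[:, m]
    return w / np.linalg.norm(w)

rng = np.random.default_rng(11)
nr, nt = 4, 2
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

w_orth = orth_beamformer(A, 0)
w_zf = (A @ np.linalg.inv(A.conj().T @ A))[:, 0]
w_zf /= np.linalg.norm(w_zf)

# Up to a complex phase, the two beamformers coincide.
print("|<w_orth, w_zf>| =", np.abs(w_orth.conj() @ w_zf))  # ~ 1.0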

Equivalence of the zero-forcing and orthogonal beamformers

The connection between the orthogonal beamformer found in Equation (9.31) and the zero-forcing beamformer found in Equation (9.29) can be shown by demonstrating that they are proportional,

A (A† A)^{-1} e_m ∝ P⊥_{A∖m} a_m .   (9.46)

To show this proportionality, we note that P⊥_{A∖m} a_m must lie within the subspace spanned by the matrix A, because P_A and P_{A∖m} can be jointly diagonalized and thus commute:

P⊥_{A∖m} P_A a_m = P_A P⊥_{A∖m} a_m = [ A (A† A)^{-1} A† ] P⊥_{A∖m} a_m .   (9.47)

Here the projector on the column space of the amplitude-channel product A has no effect on the right-hand side of the above equation because a_m is contained within the column space of A. Consequently, the two beamformers must be proportional to each other if

e_m ∝ A† P⊥_{A∖m} a_m = (P⊥_{A∖m} A)† a_m ,   (9.48)

where the Hermitian property of projection matrices is exploited. Without loss of generality, the first transmitter can be designated the transmitter of interest, so that m = 1. The channel response associated with the transmitter of interest


a_1 can be decomposed into two orthogonal subspaces defined by P_{∖1} = P_{A∖m} and P⊥_{∖1} = P⊥_{A∖m} for m = 1,

a_1 = P_{∖1} a_1 + P⊥_{∖1} a_1 .   (9.49)

The amplitude-channel matrix A can then be expressed

A = ( P_{∖1} a_1 + P⊥_{∖1} a_1   a_2 ⋯ a_{n_t} ) .   (9.50)

The projection onto the subspace orthogonal to that spanned by the columns of the channel matrix other than the first column is given by

P⊥_{A∖1} A = ( P⊥_{∖1} (P_{∖1} a_1 + P⊥_{∖1} a_1)   P⊥_{∖1} a_2 ⋯ P⊥_{∖1} a_{n_t} ) = ( P⊥_{∖1} a_1   0 ⋯ 0 ) ,   (9.51)

because P⊥_{A∖m}, which in this example is indicated by P⊥_{∖1}, is constructed to be orthogonal to the subspace containing the vectors a_2 ⋯ a_{n_t}. Consequently, a form proportional to the selection vector is found,

(P⊥_{A∖1} A)† a_1 = ( a_1† P⊥_{∖1} a_1 , 0 , ⋯ , 0 )^T ∝ e_1 .   (9.52)

Similarly, for any value of m, this relationship holds, so the two beamformers are the same up to an overall normalization.

Minimum interference in external interference

Up to this point in the discussion of the minimum-interference beamformer, it has been assumed that the channels to the interfering transmitters are known or can be estimated. External interference is that which is caused by sources for which there is insufficient information to estimate the channel explicitly. As an example, the timing and frame structure of external interference may not be known, and more importantly, the training sequences may not be known. However, the effect on the receiver as expressed in terms of the interference-plus-noise covariance matrix R ∈ C^{n_r × n_r} can be estimated. The interference may be sampled from any probability distribution, but it is often assumed that the interference is drawn from a Gaussian distribution. The Gaussian distribution represents the worst-case distribution in terms of the adverse effect on link capacity because it has the maximum entropy. Furthermore, many distributions can be modeled with reasonable fidelity by the Gaussian distribution. The important parameter of the distribution is the external

9.2 Adaptive spatial processing

interference-plus-noise covariance matrix R, & 1 % N N† . R= ns

309

(9.53)

Thus, the probability distribution for the received signal p(Z|X; A, R) is given by the complex Gaussian distribution p(Z|X; A, R) =

† −1 1 e−tr{(Z−A X ) R (Z−A X)} . π n s n r |R|n s

(9.54)

If the thermal noise is normalized to unity per receive antenna, then the interference-plus-noise covariance matrix R can be expressed by R = J J† + I ,

(9.55)

where the external interference is characterized by J J† . Along each column of J is the receiver array response times the amplitude associated with a particular interferer. Quoting the result found in Equation (8.128), under the Gaussian interference and noise, the maximum-likelihood estimate is given by ˆ = Z X† (X X† )−1 . A

(9.56)

By employing this estimate for the channel vector, the log-likelihood is given by ˆ R; x ) = −tr{(Z − Z X† (X X† )−1 X)† R−1 log p(Z|A, m

· (Z − Z X† (X X† )−1 X)} − log |R|n s + const.

† −1 (Z P⊥ = −tr{(Z P⊥ X) R X )}

− ns log |R| + const. ,

(9.57)

where the projection operator P⊥ X removes components within the subspace associated with the row space (that is, the temporal space) of the normalized transmit reference matrix X, † † −1 P⊥ X. X = I − X (X X )

(9.58)

By setting the derivative of the log-likelihood with respect to some parameter α of the interference-plus-noise covariance matrix R to zero, the maximum-likelihood estimator is found, ∂ ˆ R; X) = − ∂ [tr{(Z P⊥ )† R−1 (Z P⊥ )} + ns log |R|] log p(Z|A, X X ∂α   ∂α −2 ∂R ⊥ ⊥ † (Z PX ) = tr (Z PX ) R ∂α  −1 ∂R −ns tr R ∂α    ⊥ † −2 −1 ∂R = tr Z PX Z R − ns R ∂α = 0.

(9.59)

310

Spatially adaptive receivers

The solution of this equation provides an estimate for the interference-plus-noise ˆ given by covariance matrix R ˆ = 1 Z P⊥ Z† . R X ns

(9.60)

An eigenvalue decomposition can be used to find the subspace in which the interference exists,

R̂ = U D U† ,   (9.61)

where U is a unitary matrix and D is a diagonal matrix containing the eigenvalues of R̂. Selecting the q largest eigenvalues and collecting the corresponding columns in U, the matrix U_q ∈ C^{n_r × q} is constructed. There are a variety of techniques for selecting the correct value of q. A common approach is to set a threshold some fixed value above the expected noise level. The selection of eigenvalues for a related problem is discussed for angle-of-arrival estimation in Section 7.5. Once the orthonormal matrix U_q is selected, a modified version of the minimum-interference beamformer can be constructed. The projection operator P⊥_{[U_q A∖m]}, orthogonal to both the external interference associated with U_q and the internal interference associated with A_{∖m}, is given by

P⊥_{[U_q A∖m]} = I − [U_q A_{∖m}] ([U_q A_{∖m}]† [U_q A_{∖m}])^{-1} [U_q A_{∖m}]†
= I − P⊥_{U_q} A_{∖m} (A_{∖m}† P⊥_{U_q} A_{∖m})^{-1} A_{∖m}† − P⊥_{A∖m} U_q (U_q† P⊥_{A∖m} U_q)^{-1} U_q† .   (9.62)
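The subspace-selection step of Equations (9.60)–(9.61) is easy to sketch (Python/NumPy; the threshold rule shown is one common heuristic of the kind mentioned above, not the book's specific choice):

import numpy as np

rng = np.random.default_rng(12)
nr, ns, q_true = 4, 512, 1

# Noise plus one strong external interferer (training assumed already projected out).
v = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
s_int = (rng.standard_normal(ns) + 1j * rng.standard_normal(ns)) / np.sqrt(2)
N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
E = N + 3.0 * np.outer(v, s_int)

R_hat = E @ E.conj().T / ns   # sample covariance in the style of Eq. (9.60)
w, U = np.linalg.eigh(R_hat)  # ascending eigenvalues
keep = w > 4.0                # threshold: a fixed factor above the unit noise floor
Uq = U[:, keep]               # interference subspace from Eq. (9.61)

print("detected interference rank:", Uq.shape[1], "(true:", q_true, ")")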

The minimum-interference beamformer in external interference can be constructed by modifying Equation (9.31) such that the beamformer must be orthogonal to the other MIMO transmitters and orthogonal to the external interference. The beamformer w_m is given by

w_m ∝ P⊥_{[U_q A∖m]} a_m ,   (9.63)

where Equation (8.128) can be used to estimate a_m and A_{∖m}. This beamformer form does not have a sensible interpretation if the number of identified interferers q plus the number of transmitters n_t is greater than the number of receivers.

Over-constrained minimum interference

In the over-constrained regime, in which the number of external interferers plus the number of transmitters is greater than the number of receive antennas, the minimum-interference beamformer is proportional to the eigenvector associated with the minimum eigenvalue of the interference-plus-noise covariance matrix Q_{∖m} ∈ C^{n_r × n_r},

Q_{∖m} = A_{∖m} (⟨X_{∖m} X_{∖m}†⟩/n_s) A_{∖m}† + J J† + I ,   (9.64)


where the notation from Equation (9.23) is employed. Because the interference cannot be completely removed, the next best thing is to remove as much as you can. This corresponds to the minimum eigenvalue of the interference-plus-noise covariance matrix Q_{∖m}. The beamformer w that achieves this goal is given by the eigenvector e_min associated with the minimum eigenvalue λ_min of Q_{∖m} that satisfies

w ∝ e_min ,  λ_min e_min = Q_{∖m} e_min .   (9.65)

By using Equations (9.60) and (9.45), an estimate for the interference-plus-noise covariance matrix Q̂_{∖m} can be evaluated, so that

Q̂_{∖m} = Â_{∖m} Â_{∖m}† + (1/n_s) Z P⊥_X Z† ,   (9.66)

where for this estimation it is assumed that X is a known training sequence. An alternative estimate that is asymptotically equal to Equation (9.66) in the limit of a large number of samples³ is given by

Q̂_{∖m} = (1/n_s) Z P⊥_{x_m} Z† .   (9.67)

9.2.3 MMSE spatial processing

There is a strong formal similarity between the spectral Wiener filter discussed in Section 9.1 and the adaptive MMSE spatial filter. To be clear, it is assumed commonly, as it is here, that MMSE indicates a linear MMSE implementation for adaptive spatial processing. As one might expect, the performance of the MMSE beamformer is problematic when the number of transmitters is larger than the number of receivers [320]. The error matrix E ∈ C^{n_t × n_s} at the output of the beamformer W ∈ C^{n_r × n_t} is given by

E = W† Z − X .   (9.68)

The mean-squared error ⟨‖E‖²_F⟩ between the output of a set of beamformers and the transmitted signals is given by

⟨‖E‖²_F⟩ = ⟨‖W† Z − X‖²_F⟩ = ⟨tr{ (W† Z − X)(W† Z − X)† }⟩ .   (9.69)

³ This assumes that the signals from each transmit antenna are uncorrelated.


To minimize this error, the derivative with respect to some parameter α of the matrix of beamformers W is set to zero,

0 = ∂/∂α ⟨‖E‖²_F⟩
= ∂/∂α ⟨tr{ (W† Z − X)(W† Z − X)† }⟩
= tr{ (∂W/∂α)† ⟨Z (W† Z − X)†⟩ } + c.c.
= tr{ (∂W/∂α)† ( ⟨Z Z†⟩ W − ⟨Z X†⟩ ) } + c.c. ,   (9.70)

where c.c. indicates the complex conjugate of the first term. This relationship is satisfied if, for all variations in beamformers ∂W/∂α, the argument of the trace is zero. Consequently, the term within the parentheses is set to zero,

⟨Z Z†⟩ W − ⟨Z X†⟩ = 0 ,  W = ⟨Z Z†⟩^{-1} ⟨Z X†⟩ .   (9.71)

This form has an intuitive interpretation. The first term is proportional to the inverse of the receive covariance (signal-plus-interference-plus-noise) matrix Q ∈ C^{n_r × n_r}, and the second term is proportional to an array response estimator. Consequently, this beamformer attempts to point in the direction of the signals of interest, but points away from interference sources. With the assumptions that the transmit covariance matrix is proportional to the identity matrix,⁴ ⟨X X†⟩ = n_s I, and that the cross covariance is proportional to the channel, ⟨Z X†⟩ = n_s A, the mean-squared error for the MMSE beamformer is given by

⟨‖E‖²_F⟩ = tr{ ⟨( (⟨Z Z†⟩^{-1} ⟨Z X†⟩)† Z − X ) ( (⟨Z Z†⟩^{-1} ⟨Z X†⟩)† Z − X )†⟩ }
= tr{ ⟨( A† Q^{-1} Z − X )( Z† Q^{-1} A − X† )⟩ }
= n_s tr{ A† Q^{-1} Q Q^{-1} A − 2 A† Q^{-1} A + I }
= n_s tr{ I − A† Q^{-1} A }
= n_s tr{ I − A† (A A† + R)^{-1} A } .   (9.72)

For practical problems, the expectations in Equation (9.71) cannot be known exactly. The expectations can be approximated over some finite number of samples n_s. If n_s ≫ n_r and n_s ≫ n_t, then the expectations can be approximated well by

⟨Z Z†⟩ ≈ Z Z† ,  ⟨Z X†⟩ ≈ Z X† .   (9.73)

⁴ This assumption implies that the MIMO link is operating in an uninformed mode.


By using these relationships, the set of approximate MMSE beamformers in the columns of W is given by

W ≈ (Z Z†)^{-1} Z X† ,   (9.74)

which is also the least-squared error solution.
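Equation (9.74) is the workhorse sample-based beamformer; a compact sketch (Python/NumPy, with parameters of our choosing) follows.

import numpy as np

rng = np.random.default_rng(13)
nt, nr, ns = 2, 4, 512

X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
Z = A @ X + 0.3 * (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns)))

# Sample-based MMSE (least-squares) beamformers: W = (Z Z†)^{-1} Z X†.
W = np.linalg.solve(Z @ Z.conj().T, Z @ X.conj().T)
X_hat = W.conj().T @ Z

print("per-stream output MSE:", np.mean(np.abs(X_hat - X) ** 2, axis=1))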

9.2.4 Maximum SINR

A reasonable goal for a beamformer is for it to maximize the SINR performance for a given transmitter. Here it is assumed that both the internal and the external interference are mitigated spatially. To be clear, as will be discussed in Section 9.4, for MIMO systems this beamformer is not necessarily optimal in terms of capacity if the receiver does not take into account the correlations between beamformer outputs. The output of a beamformer x̂_m associated with the mth transmitter is given by

x̂_m = w_m† Z ,   (9.75)

where x̂_m ∈ C^{1 × n_s} is the output of the beamformer trained for the mth transmitter, and w_m ∈ C^{n_r × 1} is the receive beamformer. The received data matrix can be represented by

Z = A X + N = a_m x_m + A_{∖m} X_{∖m} + N ,   (9.76)

where the channel and transmitted signal associated with the mth transmitter are indicated by a_m ∈ C^{n_r×1} and x_m ∈ C^{1×n_s}, and A_{∖m} ∈ C^{n_r×(n_t−1)} and X_{∖m} ∈ C^{(n_t−1)×n_s} indicate the channel matrix without the column associated with the mth transmitter and the transmit signal matrix without the row associated with the mth transmitter, respectively. The SINR at the output of the mth beamformer is given by the ratio of the power at the output of the beamformer associated with the signal to the power associated with the interference plus noise. For a particular beamformer adapted to a particular channel realization (which implies that the expectation is taken over the noise and training sequences), the SINR_m is given by

SINR_m = ⟨‖w_m† a_m x_m‖²⟩ / ⟨‖w_m† (A_{∖m} X_{∖m} + N)‖²⟩
  = (w_m† a_m ⟨x_m x_m†⟩ a_m† w_m) / (w_m† (A_{∖m} ⟨X_{∖m} X_{∖m}†⟩ A_{∖m}† + ⟨N N†⟩) w_m)
  = (n_s w_m† a_m a_m† w_m) / (w_m† (n_s A_{∖m} A_{∖m}† + n_s R) w_m)
  = (w_m† a_m a_m† w_m) / (w_m† Q_{∖m} w_m) ,    (9.77)


where the covariance matrix of the received internal and external interference for the mth transmitter is indicated by Q_{∖m} = A_{∖m} A_{∖m}† + R, assuming external-interference-plus-noise covariance matrix R. It is assumed here that the transmit covariance matrix is proportional to the identity matrix, X_{∖m} X_{∖m}†/n_s = I_{n_t−1}. The beamformer w_m that maximizes the SINR for the mth transmitter is found from

w_m = argmax_{w_m} (w_m† a_m a_m† w_m) / (w_m† Q_{∖m} w_m) .    (9.78)

By employing the change of variables η_m = Q_{∖m}^{1/2} w_m, the optimization is equivalent to

w_m = Q_{∖m}^{−1/2} argmax_{η_m} (η_m† Q_{∖m}^{−1/2} a_m a_m† Q_{∖m}^{−1/2} η_m) / (η_m† η_m) .    (9.79)

The value of η_m that solves this form is proportional to the eigenvector of the matrix Q_{∖m}^{−1/2} a_m a_m† Q_{∖m}^{−1/2}, which is rank-1 and is constructed from the outer product of the interference-plus-noise whitened (as introduced in Section 8.3.1) channel vector. The eigenvector that solves this equation is proportional to the whitened channel vector. Consequently, the beamformer w_m for the mth transmitter that maximizes the SINR_m is given by

w_m = Q_{∖m}^{−1/2} η_m = Q_{∖m}^{−1/2} Q_{∖m}^{−1/2} a_m
  = Q_{∖m}^{−1} a_m .    (9.80)

While the structure of the beamformer is formally satisfying because the contributions of all interfering sources are reduced by the matrix inverse, the form assumes exact knowledge of the model parameters. However, by using either Equation (9.66) or Equation (9.67) along with Equation (9.25), an estimate of the beamformer can be evaluated.

Maximum SINR and MMSE beamformer equivalence

It is interesting that the maximum SINR and MMSE beamformers are proportional to each other and are consequently equivalent in SNR, SINR, and capacity terms under the assumption of a single transmitter or orthogonal training sequences, such that ⟨X X†⟩ ∝ I. To demonstrate this, consider the maximum SINR w_m^{MSINR} and MMSE w_m^{MMSE} beamformers optimized for the mth transmitter,

w_m^{MSINR} = Q_{∖m}^{−1} a_m
w_m^{MMSE} = ⟨Z Z†⟩^{−1} ⟨Z X†⟩ e_m = Q^{−1} a_m ,    (9.81)

where it has been assumed that training sequences from each antenna are orthogonal. By assuming the signals associated with each transmitter in the MIMO system are uncorrelated, the received signal covariance matrix Q and the


internal-plus-external interference-plus-noise covariance matrix Q_{∖m} are related by

Q_{∖m} = Q − a_m a_m† .    (9.82)

The equivalence between the beamformers is demonstrated if the two are shown to be proportional:

w_m^{MSINR} = Q_{∖m}^{−1} a_m
  = (Q − a_m a_m†)^{−1} a_m
  = (Q − a_m a_m†)^{−1} Q Q^{−1} a_m
  = (I − Q^{−1} a_m a_m†)^{−1} Q^{−1} a_m
  = (I + Q^{−1} a_m a_m† / (1 + a_m† Q^{−1} a_m)) Q^{−1} a_m
  = (1 + a_m† Q^{−1} a_m / (1 + a_m† Q^{−1} a_m)) Q^{−1} a_m
  = (1 + a_m† Q^{−1} a_m / (1 + a_m† Q^{−1} a_m)) w_m^{MMSE}
  ∝ w_m^{MMSE} .    (9.83)

9.3 SNR loss performance comparison

The SNR loss is used here as the metric of performance to compare the minimum-interference and MMSE beamformer approaches. The SNR loss provides a measure of the loss caused by mitigating interference. It is given by the ratio of the SNR at the output of an adaptive beamformer in the presence of interference to the SNR in its absence. This metric does not address how well the interference is mitigated. Rather, it provides insight into the SNR cost induced by attempting to mitigate the interference. This may be of value when comparing various system concepts that do or do not require interference mitigation, such as time-division multiple-access schemes. To simplify this analysis, it is assumed that there is a single source of interest and a single interferer. In the absence of interference, the received signal for some block of data Z from the transmitter of interest is given by

Z = a_0 x_0 + N .    (9.84)

In the above equation, the received signal matrix is indicated by Z ∈ C^{n_r×n_s}, the amplitude-channel product vector is indicated by a_0 ∈ C^{n_r×1}, the transmitted signal row vector is indicated by x_0 ∈ C^{1×n_s}, and the noise is indicated by N ∈ C^{n_r×n_s}.


In the line-of-sight environment without scatterers, the amplitude-channel vector a_0 is proportional to the steering vector v(θ_0),

a_0 = a_0 v(θ_0) ,    (9.85)

where a_0 is the constant of proportionality and θ_0 is the direction to the transmitter of interest. It is sometimes conceptually useful to display SNR loss as a function of the angle between the signal of interest and the interfering signal. However, for most of this discussion, the array response is represented in the more general form of a_0. The SNR at the output of the adaptive beamformer w_0 is given by the ratio of the signal power to the noise power and is defined to be ρ_0,

ρ_0 = ⟨‖w_0† a_0 x_0‖²⟩ / ⟨‖w_0† N‖²⟩
  = ⟨‖x_0‖²⟩ |w_0† a_0|² / (n_s w_0† I_{n_r} w_0)
  = |w_0† a_0|² / ‖w_0‖² .    (9.86)

For beamformer w_0, it is assumed that the noise covariance matrix is proportional to the identity matrix,

⟨N N†⟩ = n_s I_{n_r} ,    (9.87)

and that the transmit sequence is normalized such that it has unit variance per sample,

⟨‖x_0‖²⟩ = n_s .    (9.88)

As was discussed in Section 9.2.1, the optimal adaptive beamformer in the absence of interference is the matched spatial filter,

w_0 ∝ a_0 .    (9.89)

Consequently, the SNR at the output of the matched spatial filter is equal to the ratio ρ_0,

ρ_0 = |a_0† a_0|² / ‖a_0‖²
  = |a_0† a_0|² / (a_0† a_0)
  = ‖a_0‖² .    (9.90)

The received signal matrix Z in the presence of a single interferer is given by

Z = a_0 x_0 + a_1 x_1 + N ,    (9.91)


where the interfering signal is indicated by the subscript 1. The SNR ρ_{0|1} (as opposed to the SINR) at the output of a beamformer in the presence of an interferer is given by the ratio of the signal power at the output of the beamformer to the noise power at the output of the beamformer,

ρ_{0|1} = ⟨‖w† a_0 x_0‖²⟩ / ⟨‖w† N‖²⟩
  = ⟨‖x_0‖²⟩ |w† a_0|² / (w† ⟨N N†⟩ w)
  = |w† a_0|² / ‖w‖² .    (9.92)

The SNR loss is given by the ratio of the SNR after mitigating interference to the SNR in the absence of interference,

SNR loss = ρ_{0|1} / ρ_0
  = (|w† a_0|² / ‖w‖²) / ‖a_0‖²
  = |w† a_0|² / (‖w‖² ‖a_0‖²) .    (9.93)

As with many metrics in engineering, the SNR loss ratio is often expressed on a decibel scale. When expressed on a linear scale, its value is bounded by 0 and 1. However, when expressed on a decibel scale, the sign is sometimes inverted.

9.3.1 Minimum-interference beamformer

The SNR loss for the minimum-interference beamformer, assuming that there is a single-antenna transmitter and a single-antenna interferer, is found by substituting the minimum-interference beamformer into Equation (9.93). By using the notation that P_1^⊥ is the spatial projection matrix P_{A_{∖m}}^⊥ for m = 1, the beamformer w under this simplified scenario is given by

w = P_1^⊥ a_0 / ‖P_1^⊥ a_0‖
  ∝ P_1^⊥ a_0
  = [I − a_1 (a_1† a_1)^{−1} a_1†] a_0
  = (I − a_1 a_1† / ‖a_1‖²) a_0 .    (9.94)


By using this form of the minimum-interference beamformer, and noting that the projection (I − a_1 a_1†/‖a_1‖²) is Hermitian and idempotent, the SNR loss is given by

SNR loss_MI = |w† a_0|² / (‖w‖² ‖a_0‖²)
  = |a_0† (I − a_1 a_1†/‖a_1‖²) a_0|² / (‖a_0‖² a_0† (I − a_1 a_1†/‖a_1‖²) a_0)
  = (1/‖a_0‖²) a_0† (I − a_1 a_1†/‖a_1‖²) a_0
  = 1 − |a_1† a_0|² / (‖a_0‖² ‖a_1‖²) .    (9.95)

9.3.2 MMSE beamformer

Similarly, the SNR loss for the MMSE beamformer, assuming that there is a single-antenna transmitter and a single-antenna interferer, is found by substituting the MMSE beamformer into Equation (9.93). By quoting the result in Equation (9.71), the MMSE beamformers W are given by

W = ⟨Z Z†⟩^{−1} ⟨Z X†⟩ .    (9.96)

Under the assumption that there is a single transmitter of interest (otherwise multiple beamformers would be needed), the beamformer w ∈ C^{n_r×1} is given by

w = ⟨Z Z†⟩^{−1} ⟨Z x_0†⟩ ,    (9.97)

where x_0 is the normalized transmitted sequence of the signal of interest. The spatial received covariance matrix is given by

Q = (1/n_s) ⟨Z Z†⟩
  = (1/n_s) ⟨[a_0 x_0 + a_1 x_1 + N] [a_0 x_0 + a_1 x_1 + N]†⟩
  = a_0 a_0† + a_1 a_1† + I_{n_r} ,    (9.98)

where it is assumed that the noise, x_0, and x_1 are all independent and have unit variance per sample. From Equation (2.116), the inverse of the rank-2 matrix


plus the identity matrix is given by

(I + a_0 a_0† + a_1 a_1†)^{−1} = I − (1/γ) [(1 + a_1† a_1) a_0 a_0† + (1 + a_0† a_0) a_1 a_1†]
  + (1/γ) [a_0† a_1 a_0 a_1† + a_1† a_0 a_1 a_0†] ,    (9.99)

where

γ = 1 + a_0† a_0 + a_1† a_1 + a_0† a_0 a_1† a_1 − |a_0† a_1|²
  = (1 + a_0† a_0)(1 + a_1† a_1) − |a_0† a_1|² .    (9.100)

The received signal covariance matrix Q, represented by an identity matrix plus the sum of two rank-1 matrices, is given by

Q = I + a_0 a_0† + a_1 a_1† .    (9.101)

The inner product φ of the vectors a_0 and a_1 is given by

φ = a_0† a_1 = ‖a_0‖ ‖a_1‖ α
φ* = a_1† a_0 .    (9.102)

Here α represents the normalized inner product between the vectors a_0 and a_1, using the definition

α = a_0† a_1 / (‖a_0‖ ‖a_1‖) .    (9.103)

The received signal versus reference correlation term is given by

⟨Z x_0†⟩ = ⟨[a_0 x_0 + a_1 x_1 + N] x_0†⟩ = a_0 ⟨x_0 x_0†⟩ = a_0 n_s .    (9.104)

The MMSE beamformer w under this simplified scenario is given by

w = Q^{−1} a_0 .    (9.105)


By substituting the form found in Equation (9.99) and collecting the terms proportional to a_0 and to a_1, the MMSE beamformer is given by

w = a_0 − (1/γ) [(1 + a_1† a_1) a_0 a_0† + (1 + a_0† a_0) a_1 a_1†] a_0
  + (1/γ) [a_0† a_1 a_0 a_1† + a_1† a_0 a_1 a_0†] a_0
  = k_0 a_0 + k_1 a_1 ,    (9.106)

where k_0 and k_1 are used for notational convenience and are given by

γ = 1 + ‖a_0‖² + ‖a_1‖² + ‖a_0‖² ‖a_1‖² − |φ|²
k_0 = 1 − (1/γ) [(1 + ‖a_1‖²) ‖a_0‖² − |φ|²]
  = (1 + ‖a_1‖²) / γ
k_1 = (1/γ) [φ* ‖a_0‖² − (1 + ‖a_0‖²) φ*]
  = −φ*/γ .    (9.107)

It is worth noting that k_0 is real while k_1 is complex. Substituting the above form for w into Equation (9.93), the SNR loss for the MMSE beamformer is given by

SNR loss_MMSE = |w† a_0|² / (‖w‖² ‖a_0‖²) .    (9.108)

The two terms of interest are |w† a_0|² and ‖w‖², given by

|w† a_0|² = |a_0† (k_0 a_0 + k_1 a_1)|²
  = |k_0 ‖a_0‖² + k_1 φ|²
  = (1/γ²) [(1 + ‖a_1‖²) ‖a_0‖² − |φ|²]²
  = (1/γ²) [(1 + ‖a_1‖²) ‖a_0‖² − ‖a_0‖² ‖a_1‖² |α|²]²
  = (‖a_0‖⁴/γ²) [(1 + ‖a_1‖²) − ‖a_1‖² |α|²]²
  = (‖a_0‖⁴/γ²) [1 + ‖a_1‖² (1 − |α|²)]² ,    (9.109)

where α is the normalized inner product from Equation (9.103), and

‖w‖² = w† w
  = (k_0 a_0 + k_1 a_1)† (k_0 a_0 + k_1 a_1)
  = k_0² ‖a_0‖² + k_0 k_1 φ + k_0 k_1* φ* + |k_1|² ‖a_1‖²
  = (1/γ²) [(1 + ‖a_1‖²)² ‖a_0‖² − 2 (1 + ‖a_1‖²) |φ|² + |φ|² ‖a_1‖²]
  = (‖a_0‖²/γ²) [(1 + ‖a_1‖²)² − 2 (1 + ‖a_1‖²) ‖a_1‖² |α|² + ‖a_1‖⁴ |α|²]
  = (‖a_0‖²/γ²) [|α|² + (1 + ‖a_1‖²)² (1 − |α|²)] .    (9.110)

Consequently, the SNR loss is given by

SNR loss_MMSE = |w† a_0|² / (‖w‖² ‖a_0‖²)
  = [1 + ‖a_1‖² (1 − |α|²)]² / [|α|² + (1 + ‖a_1‖²)² (1 − |α|²)] .    (9.111)

In the limit of strong interference, the interfering term ‖a_1‖ becomes large, and the SNR loss converges to that of the minimum-interference beamformer described in Equation (9.95),

lim_{‖a_1‖→∞} SNR loss_MMSE = lim_{‖a_1‖→∞} [1 + ‖a_1‖² (1 − |α|²)]² / [|α|² + (1 + ‖a_1‖²)² (1 − |α|²)]
  = lim_{‖a_1‖→∞} [‖a_1‖² (1 − |α|²)]² / [(‖a_1‖²)² (1 − |α|²)]
  = 1 − |α|² .    (9.112)

As an aside, in the case of a single signal of interest and single interferer, the MMSE beamformer is the maximum SINR beamformer discussed in


Section 9.2.4. Consequently, the above analysis is valid for the maximum SINR beamformer for this particular problem definition.
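The closed form of Equation (9.111) and the strong-interference limit of Equation (9.112) can be checked numerically; the array responses and interference amplitude sweep below are assumed values for illustration.

import numpy as np

rng = np.random.default_rng(3)
n_r = 4
a0 = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)
u1 = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

alpha2 = lambda a, b: abs(a.conj() @ b) ** 2 / (np.linalg.norm(a) ** 2 * np.linalg.norm(b) ** 2)

for scale in [0.1, 1.0, 10.0, 100.0]:            # sweep interferer amplitude
    a1 = scale * u1
    al2 = alpha2(a0, a1)                         # |alpha|^2, Equation (9.103)
    b = np.linalg.norm(a1) ** 2
    loss = (1 + b * (1 - al2)) ** 2 / (al2 + (1 + b) ** 2 * (1 - al2))
    print(scale, loss, 1 - al2)                  # loss -> 1 - |alpha|^2 as scale grows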

9.4 MIMO performance bounds of suboptimal adaptive receivers

Many of the advantages made possible by MIMO systems, as discussed in Chapter 8, depend strongly upon the details of the receiver implementation [28, 32]. An important motivation for using suboptimal receivers is computational complexity. Some suboptimal receivers were discussed earlier in this chapter. The computational complexity of various receivers can vary by many orders of magnitude. The performance of MIMO links is strongly tied to the details of the coding and modulation in addition to the receiver, and there is no simple method for estimating overall link performance. The demodulation performance variation between various coding and receiver combinations can also be dramatic [8, 207]. However, it is desirable to develop a set of bounds that are independent of the details of the coding. Rather, these bounds assume ideal codes, but potentially suboptimal spatial receivers. By incorporating various receiver constraints as part of the MIMO channel, estimates of the performance bounds under these constraints can be found. For uninformed transmitter MIMO links, we consider in this section three receivers: the minimum-interference, the MMSE, and the optimal. The optimal receiver achieves capacity.

An additional variable considered in this section is the effect of estimating the interference-plus-noise covariance matrix. For computational and logistical reasons, many receivers make the simplifying assumption that the interference-plus-noise covariance matrix is proportional to the identity matrix. If a carrier-sense multiple access (CSMA) approach is employed, this simplifying assumption may have some validity, although it is common in wireless networks to operate in dynamic interference. In the presence of interference, this simplifying model assumption can have a significant adverse effect upon link performance compared to the optimal receiver.

The information-theoretic capacity of MIMO systems was discussed in Section 8.3 for a flat-fading environment. As discussed in that section, the bounds on spectral efficiency can be separated into two classes that are defined by whether the system has an informed transmitter (in which the transmitter has the channel matrix and a statistical characterization of the external interference) or an uninformed transmitter (in which only the receiver has channel state information). Here bounds for the uninformed transmitter are considered. The basic premise of the development of the bounds discussed in this section is that the receiver will disregard some of the information available to it. In particular, the performance bounds are developed by considering a set of new channels defined by the output of a set of beamformers optimized for each transmitter in turn. In principle, an invertible operation applied to the received signal, such as a beamformer, would have no effect upon the mutual


information between the transmitted signal and observed signal. However, here it is assumed that the receiver ignores any potential performance gain available from considering the correlations between beamformer outputs. In particular, it is assumed that a beamforming receiver can only decode a single signal at the output of each beamformer. This assumption is not valid for the optimal receiver or multiple-user type approaches that mix temporal and spatial mitigation as discussed in Section 9.6 and in References [98, 324, 323, 69]. However, this limitation is a reasonable approximation to the bounding performance for some receivers that separate the transmitted signal by using receive beamformers and ignoring the correlation between noise at the beamformer outputs.⁵

9.4.1 Receiver beamformer channel

In order to determine the effects of adaptive beamforming techniques, one can incorporate the beamformer as part of the channel [32]. This can be done by considering a set of adaptive receive beamformers in the columns of W ∈ C^{n_r×n_t}, each optimized for a given transmit antenna. If a single temporal sample of the observed signal at the receive array is given by z ∈ C^{n_r×1}, then the corresponding single sample of the signal vector y ∈ C^{n_t×1} at the output of a set of beamformers optimized for each transmitter is given by

y = W† z = W† H s + W† n ,    (9.113)

for the transmitted signal s ∈ C^{n_t×1} and external-interference-plus-noise n ∈ C^{n_r×1}. The dimension at the output of the beamformer is given by the number of transmit antennas rather than the number of receive antennas. This interpretation of the effective channel implies

H ⇒ W† H ,    (9.114)

that is, the beamformer is subsumed into the channel. It is typical for beamformers to attempt to reduce the correlations between beamformer outputs because they mitigate interference associated with other transmit antennas. However, there is typically some remaining correlation between beamformer outputs. Similarly, this interpretation implies that the noise-plus-interference covariance matrix becomes

⟨n n†⟩ ⇒ W† ⟨n n†⟩ W .    (9.115)

Depending upon the beamforming approach, the signals of interest may suffer SNR losses that may be significant, and the noise at the outputs of the beamformers may become correlated. The beamformers may or may not attempt to estimate parameters of the external interference and thus may or may not mitigate it.

⁵ Portions of this section are © IEEE 2004. Reprinted, with permission, from Reference [32].


To analyze the various performance bounds, the entropies associated with different beamformer models are developed. If the receiver can take into account the correlations between the beamformer outputs, then the entropy is given by the entropy for y = W† z from Equation (8.17) rather than z, under the replacements

R ⇒ W† R W
H P H† ⇒ W† H (P_o/n_t) I H† W .    (9.116)

The resulting bound on spectral efficiency, which is analogous to that developed in Section 8.3 for the uninformed transmitter, is given by

h_bf(y|H, R) = log₂ |πe (W† R W + (P_o/n_t) W† H H† W)|    (9.117)
h_bf(y|s, H, R) = log₂ |πe W† R W| .    (9.118)

The resulting capacity is given by

c_UT^{bf} = log₂ |I + (W† R W)^{−1} (P_o/n_t) W† H H† W| .    (9.119)

If W is invertible, then

c_UT^{bf} = log₂ |I_{n_r} + (W† R W)^{−1} (P_o/n_t) W† H H† W|
  = log₂ |I_{n_r} + (P_o/n_t) R^{−1} H H†|
  = c_UT ,    (9.120)

and the capacity is the same as in the absence of the beamformer. This is not surprising because the effect of the beamformers W on the channel H in Equation (9.114) can be reversed if W is invertible. In the case of a receiver based on beamformers that does not share information across beamformer outputs, such as MMSE or minimum interference discussed in Section 9.2, the form of the bound is modified. In this case, there is a separate beamformer optimized for each transmitter. The interference power that could be employed to jointly estimate signals instead contributes power to the noise-like entropy term of the capacity. We attempt to approximate the effects of ignoring the correlations between beamformer outputs by evaluating the entropies while ignoring the correlations. Because knowledge about {s}_m is not used by the beamformer to remove interference for {y}_k (for k ≠ m), the entropy for the noise-like component becomes the sum of entropies assuming independent sources.

and the capacity is the same as in the absence of the beamformer. This is not surprising because the effect of the beamformers W on the channel H in Equation (9.114) can be reversed if W is invertible. In the case of a receiver based on beamformers that does not share information across beamformer outputs, such as MMSE or minimum interference discussed in Section 9.2, the form of the bound is modified. In this case, there is a separate beamformer optimized for each transmitter. The interference power that could be employed to jointly estimate signals instead contributes power to the noise-like entropy term of the capacity. We attempt to approximate the effects of ignoring the correlations between beamformer outputs by evaluating the entropies while ignoring the correlations. Because knowledge about {s}m is not used by the beamformer to remove interference for {y}k (for k = m), the entropy for the noise-like component becomes the sum of entropies assuming independent sources.


The entropy for the mth beamformer h_{uc,m}(y|H, R) is bounded by

h_{uc,m}(y|H, R) ≤ log₂ [πe (w_m† R w_m + (P_o/n_t) w_m† H H† w_m)]
  = log₂ [πe (w_m† (R + (P_o/n_t) H_{∖m} H_{∖m}†) w_m + (P_o/n_t) w_m† h_m h_m† w_m)] .    (9.121)

The resulting noise-like entropy h_{uc,m}(y|s, H, R) for the mth beamformer is given by

h_{uc,m}(y|s, H, R) = log₂ [πe w_m† (R + (P_o/n_t) H_{∖m} H_{∖m}†) w_m] .    (9.122)

Here it is observed that the mean noise-like output (which includes residual interference signals) of each beamformer is given by

w_m† (R + (P_o/n_t) H_{∖m} H_{∖m}†) w_m .    (9.123)

The resulting approximate spectral-efficiency bound c_uc (which is not the channel capacity in general) for beamformers under the receiver assumption of uncorrelated residuals is defined by

c_uc = Σ_{m=1}^{n_t} [h_{uc,m}(y|H, R) − h_{uc,m}(y|s, H, R)]
  = Σ_{m=1}^{n_t} log₂ [1 + (w_m† (R + (P_o/n_t) H_{∖m} H_{∖m}†) w_m)^{−1} (P_o/n_t) |w_m† h_m|²] ,    (9.124)

where the beamformer represented by w_m depends upon the choice of receiver. As one would intuitively expect, the uninformed transmitter MIMO capacity is an upper bound on the beamformer channel capacity,⁶ c_UT ≥ c_uc.

⁶ This argument is due to suggestions made by Keith Forsythe.

For simplicity, it is assumed that R = I; the result can be generalized for a nonzero external covariance matrix (see Problem 9.6). In addition, to simplify this evaluation notationally, the amplitude-channel product A = √(P_o/n_t) H is employed. The inequality is demonstrated by

c_UT = log₂ |I + (P_o/n_t) H H†|
  = log₂ |I + A A†|
  ≥ c_uc

c_uc = Σ_{m=1}^{n_t} log₂ [1 + (w_m† (I + (P_o/n_t) H_{∖m} H_{∖m}†) w_m)^{−1} (P_o/n_t) |w_m† h_m|²]
  = Σ_{m=1}^{n_t} log₂ [1 + (w_m† (I + A A† − a_m a_m†) w_m)^{−1} |w_m† a_m|²] ,    (9.125)

where a_m is the amplitude-channel product associated with the mth transmitter. For the mth beamformer, the spectral efficiency bound denoted here as [c_uc]_m can be no larger than the spectral efficiency bound [c_uc^{MSINR}]_m under the assumption of the maximum SINR beamformer (which is equivalent to MMSE in this case, as discussed in Section 9.2.4) associated with the mth transmitter,

[c_uc]_m = log₂ [1 + (w_m† (I + A A† − a_m a_m†) w_m)^{−1} |w_m† a_m|²]
  ≤ [c_uc^{MSINR}]_m
  = log₂ λ_max{I + (I + A A† − a_m a_m†)^{−1/2} a_m a_m† (I + A A† − a_m a_m†)^{−1/2}}
  = log₂ [1 + a_m† (I + A A† − a_m a_m†)^{−1} a_m] ,    (9.126)

where λ_max{·} indicates the largest eigenvalue, such that Σ_{m=1}^{n_t} [c_uc]_m = c_uc and Σ_{m=1}^{n_t} [c_uc^{MSINR}]_m = c_uc^{MSINR}. Consequently, any spectral efficiency bound for beamformers under the receiver assumption of uncorrelated channels is bounded by

c_uc^{MSINR} = Σ_m log₂ [1 + a_m† (I + A A† − a_m a_m†)^{−1} a_m] .    (9.127)


The uninformed transmitter capacity can be rewritten in its successive interference cancellation form

c_UT = log₂ |I + A A†|
  = log₂ [ (|I + a_1 a_1†|/|I|) (|I + Σ_{m=1}^{2} a_m a_m†|/|I + a_1 a_1†|) (|I + Σ_{m=1}^{3} a_m a_m†|/|I + Σ_{m=1}^{2} a_m a_m†|) ⋯ ]
  = Σ_m log₂ ( |I + Σ_{j=1}^{m−1} a_j a_j† + a_m a_m†| / |I + Σ_{j=1}^{m−1} a_j a_j†| )
  = Σ_m log₂ |I + (I + Σ_{j=1}^{m−1} a_j a_j†)^{−1} a_m a_m†|
  = Σ_m log₂ [1 + a_m† (I + Σ_{j=1}^{m−1} a_j a_j†)^{−1} a_m]
  = Σ_m [c_UT]_m ,    (9.128)

where [c_UT]_m indicates the mth term in the sum. Finally, it can be seen that each term of the largest bound c_uc^{MSINR} (found in Equation (9.127)) is still less than the corresponding successive interference cancellation term of the uninformed transmitter capacity,

[c_UT]_m = log₂ [1 + a_m† (I + Σ_{j=1}^{m−1} a_j a_j†)^{−1} a_m]
  ≥ log₂ [1 + a_m† (I + A A† − a_m a_m†)^{−1} a_m]
  = [c_uc^{MSINR}]_m .    (9.129)

This bound is found by observing

I + A A† − a_m a_m† = (I + Σ_{j=1}^{m−1} a_j a_j†) + (Σ_{j=m+1}^{n_t} a_j a_j†)    (9.130)

and that, for any complex vector x, positive definite Hermitian matrix B, and positive semidefinite Hermitian matrix C,

x† (B + C)^{−1} x ≤ x† B^{−1} x ,    (9.131)

where the equality is achieved if C = 0.
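A numeric sanity check of the chain c_UT ≥ c_uc^{MSINR} is sketched below for an assumed i.i.d. Gaussian channel; the dimensions and the total power P_o are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
n_r, n_t, Po = 6, 4, 10.0
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
A = np.sqrt(Po / n_t) * H                         # amplitude-channel product

# Uninformed transmitter capacity, Equation (9.128)
c_ut = np.log2(np.linalg.det(np.eye(n_r) + A @ A.conj().T).real)

# Max-SINR beamformer bound, Equation (9.127)
c_msinr = 0.0
for m in range(n_t):
    a_m = A[:, m]
    B = np.eye(n_r) + A @ A.conj().T - np.outer(a_m, a_m.conj())
    c_msinr += np.log2(1 + (a_m.conj() @ np.linalg.solve(B, a_m)).real)

print("c_UT =", c_ut, ">= c_uc^MSINR =", c_msinr)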


9.5 Iterative receivers

Iterative receivers are useful when sample matrix inversion (SMI) is not computationally feasible. Implicitly, the typical sample matrix inversion approach assumes that the environment is blockwise stationary. In some sense, the underlying assumption of some continuously adapting iterative receivers, such as recursive least squares (RLS) or least mean squares (LMS), can be a better match to continuously changing environments. In practice, the choice between using a sample matrix inversion or an iterative approach is usually driven by logistical and computational considerations. More thorough investigations of RLS and LMS algorithms for adaptive spectral filtering can be found in Reference [142].

9.5.1 Recursive least squares (RLS)

The basic concept of the recursive least squares beamformer is to recursively estimate the two components of the estimated MMSE beamformer, which is approximated by the least-squares beamformer under finite sample support. The estimates of the inverse of the covariance matrix and of the cross-covariance matrix are modified at each update [142]. By quoting the result and by using the notation in Section 9.2.3, the MMSE beamformers W ∈ C^{n_r×n_t} are given by

W = ⟨Z Z†⟩^{−1} ⟨Z X†⟩ .    (9.132)

Here it is assumed that the reference signal X ∈ C^{n_t×n_s} is known. Decision feedback extensions to this approach, not discussed here, employ estimates of the transmitted signal as a reference. The spatial covariance matrix Q ∈ C^{n_r×n_r} and the data-reference cross-covariance matrix V ∈ C^{n_r×n_t} are given by

Q = ⟨Z Z†⟩ / n_s   and   V = ⟨Z X†⟩ / n_s ,    (9.133)

respectively, where n_s is the number of samples in the block of data. For the mth update, Q_m and V_m indicate estimates of the receive covariance matrix Q and the cross-covariance matrix V, respectively. A column of Z is denoted z_m ∈ C^{n_r×1} and is the mth observation. A column of X is denoted x_m ∈ C^{n_t×1} and is the mth vector of known transmitted symbols. The (m+1)th updated estimate of the data-reference cross-covariance matrix


V_{m+1} is given by

V_{m+1} = (m V_m + z_{m+1} x_{m+1}†) / (m + 1) .    (9.134)

So that the notation does not become too cumbersome, we have dropped the ˆ· notation for estimated values in this discussion. If the observed data vector z_m is drawn from a stationary distribution, then the estimated data-reference cross-covariance matrix converges to the exact solution V,

lim_{m→∞} V_m = V .    (9.135)

Similarly, the (m+1)th updated estimate of the receive spatial covariance matrix Q_{m+1} of the received signal is given by

Q_{m+1} = (m Q_m + z_{m+1} z_{m+1}†) / (m + 1) ,    (9.136)

and under the same assumption for the data vector z, the estimated receive spatial covariance matrix converges to the exact solution Q,

lim_{m→∞} Q_m = Q .    (9.137)

In practice, environments are not completely stationary. Consequently, it is of some value to include a memory limitation. This can be done by including a weighting parameter β rather than a function of m. The value of the weighting parameter β is typically fixed. For large m, the older contributions are given a smaller weighting compared to recent data. Over successive updates, the weight of older contributions falls exponentially. While this exponential weighting does allow the beamformer to adapt to nonstationary environments, the beamformer will not converge to the exact solution. Under this weighting, the estimation updates for the (m+1)th estimate of the data-reference cross-covariance matrix V_{m+1} and the (m+1)th estimate of the receive spatial covariance matrix Q_{m+1} are given by

V_{m+1} = (β V_m + z_{m+1} x_{m+1}†) / (β + 1)    (9.138)

and

Q_{m+1} = (β Q_m + z_{m+1} z_{m+1}†) / (β + 1) .    (9.139)

The value of the weighting parameter 0 ≤ β ≤ 1 is typically set near 1, although the exact value needs to be matched to the dynamics of the environment.


Updates for the estimate of the inverse of the covariance matrix can be found directly. From Equation (2.113), the Woodbury formula is given by

(M + A B)^{−1} = M^{−1} − M^{−1} A (I + B M^{−1} A)^{−1} B M^{−1} .    (9.140)

By using this relationship, the updated estimate of the inverse of the receive covariance matrix Q_{m+1}^{−1} can be found,

Q_{m+1}^{−1} = (β + 1) (β Q_m + z_{m+1} z_{m+1}†)^{−1}
  = (β + 1) [(β Q_m)^{−1} − ((β Q_m)^{−1} z_{m+1} z_{m+1}† (β Q_m)^{−1}) / (1 + z_{m+1}† (β Q_m)^{−1} z_{m+1})] .    (9.141)

By combining the results for the cross-covariance matrix update and the inverse of the receive covariance matrix update, the (m+1)th updated estimate of the beamformers W_{m+1} is given by

W_{m+1} = Q_{m+1}^{−1} V_{m+1}
  = [(β Q_m)^{−1} − ((β Q_m)^{−1} z_{m+1} z_{m+1}† (β Q_m)^{−1}) / (1 + z_{m+1}† (β Q_m)^{−1} z_{m+1})] (β V_m + z_{m+1} x_{m+1}†)
  = [Q_m^{−1} − (Q_m^{−1} z_{m+1} z_{m+1}† Q_m^{−1}) / (β + z_{m+1}† Q_m^{−1} z_{m+1})] (V_m + z_{m+1} x_{m+1}†/β)
  = W_m + Q_m^{−1} z_{m+1} x_{m+1}†/β
    − [(Q_m^{−1} z_{m+1} z_{m+1}† Q_m^{−1}) / (β + z_{m+1}† Q_m^{−1} z_{m+1})] (V_m + z_{m+1} x_{m+1}†/β) .    (9.142)

It is often convenient to express the (m+1)th update of the beamformer in terms of the error q_{m+1} between the output of the beamformer and the reference,

q_{m+1} = W_m† z_{m+1} − x_{m+1} .    (9.143)


By substituting x_{m+1} = W_m† z_{m+1} − q_{m+1} into the relationship for the beamformer update, and using V_m = Q_m W_m, the terms proportional to W_m cancel, giving the simpler form

W_{m+1} = W_m + Q_m^{−1} z_{m+1} (W_m† z_{m+1} − q_{m+1})†/β
    − [(Q_m^{−1} z_{m+1} z_{m+1}† Q_m^{−1}) / (β + z_{m+1}† Q_m^{−1} z_{m+1})] (Q_m W_m + z_{m+1} (W_m† z_{m+1} − q_{m+1})†/β)
  = W_m − (Q_m^{−1} z_{m+1} q_{m+1}†) / (β + z_{m+1}† Q_m^{−1} z_{m+1}) .    (9.144)
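A minimal sketch of this RLS beamformer recursion is given below, tracking Q_m^{−1} via the rank-1 update of Equation (9.141) and applying the beamformer update of Equation (9.144). The channel, noise level, and β are assumed values, and the scalar (β + 1) normalization of the text is dropped because it cancels in the beamformer.

import numpy as np

rng = np.random.default_rng(5)
n_r, n_s, beta = 4, 200, 0.99
a = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)    # assumed channel
x = (rng.integers(0, 2, n_s) * 2 - 1).astype(complex)           # known BPSK reference

Qinv = np.eye(n_r)                      # running estimate of Q_m^{-1} (up to scale)
w = np.zeros(n_r, dtype=complex)        # beamformer

for m in range(n_s):
    z = a * x[m] + 0.3 * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
    q_err = np.conj(w) @ z - x[m]                  # q_{m+1} = W_m† z_{m+1} - x_{m+1}
    Qinv_z = Qinv @ z
    denom = beta + (np.conj(z) @ Qinv_z).real
    w = w - Qinv_z * np.conj(q_err) / denom        # Equation (9.144)
    Qinv = (Qinv - np.outer(Qinv_z, np.conj(Qinv_z)) / denom) / beta  # Eq. (9.141)

print("final error magnitude:", abs(np.conj(w) @ z - x[-1]))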

9.5.2 Least mean squares (LMS)

The least mean squares algorithm has a simple interpretation. It attempts to modify the beamformer along a direction that minimizes the error based upon the current observation [345, 344]. As with recursive least squares, here it is assumed that the reference signal x is known. Decision feedback extensions to this approach, not discussed here, employ estimates of the transmitted signal as a reference. In this discussion, a beamformer for each transmit antenna is constructed independently, so the beamformer associated with each transmit antenna at the mth update, w_m, can be considered separately. The error ε_m for the current output with the current beamformer is

ε_m = w_m† z_m − x_m ,    (9.145)

where x_m is the transmitted signal for the mth update. The expected squared-error power at the mth update is given by

⟨|ε_m|²⟩ = ⟨|w_m† z_m − x_m|²⟩ .    (9.146)


The goal of the LMS algorithm is to minimize this error. The direction of steepest descent, discussed in Section 2.12, is given by evaluating the additive inverse of the derivative of the error with respect to each of the elements of the beamformer. The nth element of the beamformer is denoted {w_m}_n. The complete gradient, denoted by 2∇_{w_m*} as in Section 2.8.4, is a vector of Wirtinger calculus derivatives with respect to each of the beamformer elements. The gradient of the error is given by

2∇_{w_m*} ⟨|ε_m|²⟩ = 2∇_{w_m*} ⟨ε_m ε_m*⟩
  = 2∇_{w_m*} ⟨(w_m† z_m − x_m)(z_m† w_m − x_m*)⟩
  = 2 ⟨ε_m* z_m⟩ ,    (9.147)

so that the difference between the updated and the current receive beamformer is given by

w_{m+1} − w_m ∝ −2∇_{w_m*} ⟨|ε_m|²⟩ = −2 ⟨ε_m* z_m⟩ .    (9.148)

The main contribution of the LMS algorithm is to suggest the relatively questionable approximation that the expected value above can be replaced with the instantaneous squared error, associated with the form

2∇_{w_m*} ⟨|ε_m|²⟩ ≈ 2∇_{w_m*} |ε_m|² = 2 ε_m* z_m .    (9.149)

By using the above approximation, the LMS update to the beamformer is given by

w_{m+1} − w_m ∝ −2 ε_m* z_m
  = −2 (z_m† w_m − x_m*) z_m .    (9.150)

To reduce the sensitivity to noise, a gradient attenuation constant is introduced. If the constant is given by μ, then the updated beamformer w_{m+1} is given by

w_{m+1} = w_m − 2 μ ε_m* z_m .    (9.151)

Smaller values of the constant μ will improve the stability of the beamformer update by reducing its sensitivity to noise, while larger values of the constant μ will enable the beamformer to adapt more quickly. If the value of μ is smaller than the multiplicative inverse of the largest eigenvalue of the receive covariance matrix, then the beamformer will converge to the MMSE beamformer for a wide-sense stationary environment. It is sometimes useful to consider the normalized least-mean-squares (NLMS) update. For this version of the update, the constant of proportionality μ is replaced with μ̃/(z_m† z_m). This form reduces the sensitivity to the scale of z when selecting μ̃.
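A minimal LMS sketch following Equation (9.151) is given below; the channel, noise level, and step size μ are assumed values, with the NLMS variant noted in the comment.

import numpy as np

rng = np.random.default_rng(6)
n_r, n_s, mu = 4, 2000, 0.01
a = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)    # assumed channel
x = (rng.integers(0, 2, n_s) * 2 - 1).astype(complex)           # known reference

w = np.zeros(n_r, dtype=complex)
for m in range(n_s):
    z = a * x[m] + 0.3 * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
    eps = np.conj(w) @ z - x[m]                  # error, Equation (9.145)
    # LMS step, Equation (9.151); for NLMS replace mu with mu_t / (z† z)
    w = w - 2 * mu * np.conj(eps) * z

print("final |error|:", abs(np.conj(w) @ z - x[-1]))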

9.6 Multiple-antenna multiuser detector

The underlying assumption in this chapter is that multiple users are transmitting simultaneously in the same band. Often the transmitters use spreading sequences to reduce multiple-access interference. This interference could be from multiple antennas on a single transmit node, or from multiple nodes in a network. The significant difference between the receivers discussed here and those discussed previously is that here the separation in temporal structure between users is exploited in addition to the differences in spatial responses. Often it is assumed that these systems are employing a direct-sequence spread-spectrum technique.

There is some inconsistency in the use of "linear" versus "nonlinear" in discussions regarding multiple-user receivers. Often these receivers are implemented as iterative receivers in which the receiver operates on the same block of data multiple times. An iterative receiver is not linear in some sense. However, if the receiver employs a linear operator applied to some space, then it is generally denoted a linear receiver. In this case, the various receive states are separated by a hyperplane in some high-dimensional space. Conversely, nonlinear receivers separate receive states in both angle and amplitude [184]. To further complicate this discussion, receivers that exploit spatial and temporal structures simultaneously are linear in each domain.

There is a significant body of literature dedicated to multiuser detectors (MUD). A large portion of this literature is dedicated to systems with a single receive antenna and multiple cochannel users with single transmit antennas. Significant contributions to this area were made in References [324, 323]. Multiple-antenna multiuser detectors (also denoted multiple-channel multiuser detectors or MCMUD) have been discussed by a number of authors [98, 38, 335]. While these concepts were developed for cellular networks, they can be applied to MIMO receivers. In particular, they are well matched to bit-interleaved coded modulation approaches [28].

9.6.1 Maximum-likelihood demodulation

The maximum-likelihood formulation introduced in Equation (9.4) is extended here to take advantage of multiple receive antennas. The multiple-channel multiuser detector discussed in References [98, 38] is presented here.⁷

⁷ Portions of this section are © IEEE 1999. Reprinted, with permission, from Reference [98].

Under a Gaussian noise model assumption, the maximum-likelihood statistic is given by

max_{R,A} p(Z|X; R, A) = (πe/n_s)^{−n_s n_r} |Z P_X^⊥ Z†|^{−n_s} ,    (9.152)


where the matrix

P_X^⊥ = I_{n_s} − P_X
P_X = X† (X X†)^{−1} X ,    (9.153)

given that P_X^⊥ projects onto the orthogonal complement of the row space of X. The determinant of Z P_X^⊥ Z† is minimized to demodulate the signals for all transmitters jointly. This result is developed in the following derivation. Under the assumption of Gaussian external interference and noise, the probability density of the received signal is given by

p(Z|X; R, A) = (1 / (|R|^{n_s} π^{n_s n_r})) e^{−tr{(Z − A X)† R^{−1} (Z − A X)}} .    (9.154)

The maximum likelihood is found by jointly maximizing the probability density over X and the nuisance parameters of the channel matrix A and the external-interference-plus-noise covariance matrix R. These estimates were found in Equations (8.128) and (9.60), respectively. Those results are quoted here. As presented in Section 8.10, by maximizing the log of the probability distribution with respect to some arbitrary parameter of the channel matrix, the estimate of the channel Â is found,

Â = Z X† (X X†)^{−1} .    (9.155)

By substituting this form for the estimate of the channel Â, the probability density is given by

p(Z|X; R, Â) = (1 / (|R|^{n_s} π^{n_s n_r})) e^{−tr{(Z P_X^⊥)† R^{−1} (Z P_X^⊥)}} .    (9.156)

Similar to the result found in Equation (9.60), by maximizing the probability density with the above substitution for the nuisance parameter A, for an arbitrary parameter of the interference-plus-noise covariance matrix R, the estimate R̂ is given by

R̂ = (1/n_s) Z P_X^⊥ Z† .    (9.157)

By substituting this result into the probability density, only the received data matrix and the possible transmitted signals are left,

p(Z|X; R̂, Â) ∝ |Z P_X^⊥ Z†|^{−n_s} .    (9.158)

(9.158)

Although it is theoretically possible to use the form † Z P⊥ X Z

(9.159)

directly for demodulation, this is computationally very expensive. A more practical procedure is to pursue an iterative receiver. One approach is to choose a basis and optimize along each axis of the basis in turn in an alternating projections optimization approach [76]. By using the result from the previous optimization


step and then optimizing along the next axis, the optimization climbs towards a peak that is hopefully the global optimum. This iterative receiver can achieve the maximum-likelihood performance; however, because the optimization criterion is not guaranteed to be convex, convergence to the global maximum is not guaranteed. For many applications, it can be shown empirically that the probability of convergence to the global maximum is sufficient to warrant the significant reduction in computational complexity. A natural choice for bases is the signal transmitted by each individual transmitter. Consequently, the receiver cycles through the various rows of the transmitted signal matrix X. The transmitted signal matrix X ∈ C^{n_t×n_s} can be decomposed into the mth row, denoted here as x ∈ C^{1×n_s}, and the matrix with the mth row removed, X_{∖m} ∈ C^{(n_t−1)×n_s}. We can construct a reordered version X̃ of the matrix X, given by

X̃ = [ x ; X_{∖m} ]    (the row x stacked above X_{∖m}) .    (9.160)

Because row-space projection operators are invariant to reordering of rows (actually to any unitary transformation across the rows), the projection matrices for the matrices X and X̃ satisfy P_X^⊥ = P_X̃^⊥, where P_X̃^⊥ = I − X̃† (X̃ X̃†)^{−1} X̃. The matrix P_{X_{∖m}}^⊥ that projects onto a subspace orthogonal to the row space of X_{∖m} can be factored into the form

P_{X_{∖m}}^⊥ = I − X_{∖m}† (X_{∖m} X_{∖m}†)^{−1} X_{∖m}
  = U† U ,    (9.161)

where the rows of U ∈ C^{(n_s−n_t+1)×n_s} form an orthonormal basis for the complement of the row space of X_{∖m}. By using the definitions

Z_U = Z U†
x_U = x U† ,    (9.162)

the data and signal are projected onto a basis orthogonal to the estimates of the signals radiated from the other transmitters. It is useful to note that the two quadratic forms are the same in the original and the projected bases,

Z P_X^⊥ Z† = Z (P_{X_{∖m}}^⊥ + P_{X_{∖m}}) P_X^⊥ (P_{X_{∖m}}^⊥ + P_{X_{∖m}}) Z†
  = Z P_{X_{∖m}}^⊥ P_X^⊥ P_{X_{∖m}}^⊥ Z†
  = Z P_{X_{∖m}}^⊥ [I − X̃† (X̃ X̃†)^{−1} X̃] P_{X_{∖m}}^⊥ Z† .

Writing (X̃ X̃†)^{−1} in block form, with top-left entry α corresponding to the row x and with the remaining blocks denoted ·, the terms involving X_{∖m} are annihilated by P_{X_{∖m}}^⊥, so that

Z P_X^⊥ Z† = Z P_{X_{∖m}}^⊥ [I − x† α x] P_{X_{∖m}}^⊥ Z† ,    (9.163)


where · indicates terms in which we are not interested, and

α = [x x† − x X_{∖m}† (X_{∖m} X_{∖m}†)^{−1} X_{∖m} x†]^{−1}
  = (x P_{X_{∖m}}^⊥ x†)^{−1}
  = (x_U x_U†)^{−1} ,    (9.164)

so that from Equation (9.163)

Z P_X^⊥ Z† = Z P_{X_{∖m}}^⊥ [I − x† (x_U x_U†)^{−1} x] P_{X_{∖m}}^⊥ Z†
  = Z P_{X_{∖m}}^⊥ Z† − Z_U x_U† (x_U x_U†)^{−1} x_U Z_U†
  = Z_U Z_U† − Z_U P_{x_U} Z_U†
  = Z_U P_{x_U}^⊥ Z_U† .    (9.165)

Consequently, the determinant that is found in Equation (9.152) can be factored into a term with and a term without reference to x,

|Z_U P_{x_U}^⊥ Z_U†| = |Z_U Z_U† − Z_U P_{x_U} Z_U†|
  = |Z_U Z_U†| |I_{n_r} − Z_U P_{x_U} Z_U† (Z_U Z_U†)^{−1}|
  = |Z_U Z_U†| |I_{n_s} − P_{x_U} Z_U† (Z_U Z_U†)^{−1} Z_U|
  = |Z_U Z_U†| |I_{n_s} − P_{x_U} P_{Z_U}| .    (9.166)

Because the first term is free of x, demodulation is performed by minimizing the second term. Furthermore, because x is a row vector, the second term can be simplified and interpreted in terms of a beamformer,

|I_{n_s} − P_{x_U} P_{Z_U}| = 1 − (x_U x_U†)^{−1} x_U P_{Z_U} x_U†
  = 1 − (w† Z_U x_U†) / n_s ,    (9.167)

where

w = R̂_U^{−1} â ,
R̂_U ≡ (1/n_s) Z_U Z_U† = (1/n_s) Z P_{X_{∖m}}^⊥ Z† ,
â = Z_U x_U† (x_U x_U†)^{−1}
  = Z P_{X_{∖m}}^⊥ x† (x P_{X_{∖m}}^⊥ x†)^{−1} .    (9.168)

The n_r × 1 vector w contains the receive beamforming weights, R̂_U is the interference-mitigated signal-plus-noise covariance matrix estimate, and â is the channel estimate associated with x with X_{∖m} mitigated temporally. Demodulation is performed by maximizing the inner product of the beamformer output, w† Z_U, and the interference-mitigated reference signal, x_U.
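The quantities in Equation (9.168) for a single step of this alternating optimization can be sketched as follows; the simulated streams, channel, and noise are assumed draws, and the true transmitted rows stand in for the current signal hypotheses.

import numpy as np

rng = np.random.default_rng(7)
n_r, n_t, n_s = 6, 3, 256
A = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
X = (rng.integers(0, 2, (n_t, n_s)) * 2 - 1).astype(complex)     # BPSK streams
Z = A @ X + 0.5 * (rng.standard_normal((n_r, n_s)) + 1j * rng.standard_normal((n_r, n_s)))

m = 0
x = X[m]                                  # current hypothesis for user m
X_not_m = np.delete(X, m, axis=0)         # current estimates of the other users

# Projection onto the orthogonal complement of the rows of X_not_m, Eq. (9.161)
P = X_not_m.conj().T @ np.linalg.solve(X_not_m @ X_not_m.conj().T, X_not_m)
P_perp = np.eye(n_s) - P

R_U = (Z @ P_perp @ Z.conj().T) / n_s                          # mitigated covariance
a_hat = (Z @ P_perp @ x.conj()) / (x @ P_perp @ x.conj()).real # channel estimate
w = np.linalg.solve(R_U, a_hat)                                # beamformer, Eq. (9.168)

# Demodulation statistic: beamformer output against the mitigated reference
stat = np.conj(w) @ Z @ P_perp @ x.conj()
print("statistic:", stat)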

9.7 Covariance matrix conditioning

For a variety of reasons, such as insufficient training samples or correlated noise, matrices can be ill-conditioned or poorly conditioned. That is, the matrix will have eigenvalues that are exactly or nearly zero. Covariance matrix estimates that are formed from the outer product of data matrices will in general not be rank deficient if the number of samples is greater than the number of spatial dimensions. As an example, if it is assumed that in a flat-fading environment a single spatially correlated signal is present in addition to uncorrelated Gaussian noise, then the covariance estimate is full rank. With n_r receive antennas and n_s samples, the estimate of the spatial receive covariance matrix Q̂ ∈ C^{n_r×n_r} formed from the noisy data matrix Z ∈ C^{n_r×n_s} is given by

Q̂ = (1/n_s) Z Z† ,    (9.169)

where

Z = v s + N .    (9.170)

The array response vector and transmitted sequence are denoted v ∈ C^{n_r×1} and s ∈ C^{1×n_s}, respectively. The variable N ∈ C^{n_r×n_s} represents the Gaussian noise. Here the noise is normalized such that the expectation of the noise spatial covariance matrix is the identity matrix,

⟨(1/n_s) N N†⟩ = I_{n_r} .    (9.171)

The covariance matrix Q is given by

Q = ⟨(1/n_s) Z Z†⟩
  = P v v† + I_{n_r} ,    (9.172)

where the total noise-normalized received power is given by P = ⟨‖s‖²⟩/n_s. It is assumed that ‖v‖² = 1. The eigenvalues of the covariance matrix Q are given by

λ_1{Q} = P + 1
λ_2{Q} = ⋯ = λ_m{Q} = 1 .    (9.173)

It is important to note that the eigenvalues of the covariance matrix and of the estimated covariance matrix are not equal in general,

λ_m{Q} ≠ λ_m{Q̂} .    (9.174)

In particular, the smallest eigenvalues of the estimated covariance matrix can be dramatically different. If the number of samples is greater than or equal to the number of receivers ns ≥ nr , then the likelihood of an estimated covariance

[Figure 9.2 (not reproduced): comparison of estimated versus actual eigenvalue distributions, plotting eigenvalue (dB) against eigenvalue number. An ensemble of 10 estimated eigenvalue distributions is displayed. The number of independent samples is n_s = 16, and the number of receive antennas is n_r = 8. The total noise-normalized received signal power is 10.]

matrix with a zero eigenvalue has zero probability. In particular, while the “small” eigenvalues of the real covariance matrix are all 1, the noise eigenvalues of the estimated covariance matrix (ignoring any mixing with the received signal) are given by the eigenvalues of a Wishart distribution discussed in Section 3.5. An example, assuming ns = 16 samples and nr = 8 receive antennas, of the difference between the eigenvalues is displayed in Figure 9.2. The total noise-normalized received signal power is 10. Depending upon the algorithm in which the eigenvalues will be used, the difference between the small noise eigenvalues of the estimated versus the real covariance matrix may or may not be important. If the covariance matrix is inverted, then the small eigenvalues of the estimate can have a significant effect. This effect can motivate the use of regularization to limit the small eigenvalues. The range of eigenvalues can be much more dramatic in the case of space-time covariance matrices that are temporally oversampled. One approach to regularizing matrices is to perform an eigenvalue decomposition of the matrix of interest Q,

Q = U D U† ,

(9.175)

where U is a unitary matrix containing the eigenvectors of the matrix of interest, and diagonal matrix D contains the associated eigenvalues. The matrix is regularized by imposing a lower limit on the eigenvalues of the matrix of


interest,

{D̃}_{m,m} = {D}_{m,m}   if {D}_{m,m} > a
           = a           otherwise,
a = ε λ_max{Q} ,    (9.176)

where a is a constant of regularization. It is often set by some small scalar ε times the maximum eigenvalue of the matrix of interest. The regularized matrix Q̃ is then given by

Q̃ = U D̃ U† .    (9.177)

While effective, the above approach is relatively computationally expensive. A relatively inexpensive alternative is to exploit the observation that adding a term proportional to the identity matrix does not change the eigenvector structure. Furthermore, by using the fact that the trace of a matrix is the sum of its eigenvalues, combined with the observation that the sum of the eigenvalues can be used as a mediocre approximation to the peak eigenvalue, a reduced-computation regularized matrix Q̃ can be constructed,

Q̃ = Q + tr{Q} ε I .    (9.178)

In general, the value of the diagonal loading is determined by examining the performance of the algorithm in which the matrix Q̃ is being used.

Problems

9.1 At high SNR, compare the symbol error performance of ML and MAP decoding for an unencoded QPSK constellation under the assumption that the constellation points {±1, ±1} have
(a) equal symbol probability: p_{±1,±1} = 1/4,
(b) symbol probabilities defined by
    p_{1,±1} = 2/6
    p_{−1,±1} = 1/6 .

9.2 At an SNR of 20 dB per receive antenna (high SNR may be assumed), compare the symbol error performance of MMSE and MI beamformers for an unencoded QPSK constellation with equal probabilities for each symbol. Assume a four-antenna receiver in a line-of-sight environment in the far field with a signal of interest, and a single interferer of arbitrary power, all with known channels. Assume that the normalized inner product between the array responses of the signal of interest and the interferer is 1/√2.

9.3 Develop the estimators expressed in Equations (9.25) and (9.45).


9.4 By employing the Wirtinger calculus, show that Equation (9.74) is the least squared error solution for the estimator of X.

9.5 Evaluate the least-squares error beamformer that minimizes the squared Frobenius norm of the error matrix E defined by

E = W† Z − X ,    (9.179)

and show that it provides the same solution as the approximate MMSE beamformer found in Equation (9.74).

9.6 Extend the result in Equation (9.125) to include external Gaussian interference. Show that performance is still bounded by the uninformed transmitter capacity.

9.7 Show that the LMS beamformer solution converges to the MMSE solution in the limit of a large number of samples.

9.8 For a four-antenna receiver observing a known signal with 0 dB SNR per receive antenna in a block-fading i.i.d. Gaussian channel that is static for at least 50 samples over which the beamformers are estimated, numerically evaluate the average (over many channel draws) estimated signal error as a function of samples 1 to 50 for
(a) RLS
(b) LMS
(c) estimated MMSE using blocks of 10 samples,
where the RLS and LMS have no knowledge of the channel at the first sample.

9.9 For a four-antenna receiver observing a known signal with 0 dB SNR per receive antenna with a 10 dB INR per receive antenna interferer in a block-fading i.i.d. Gaussian channel that is static for at least 50 samples over which the beamformers are estimated, numerically evaluate the average (over many channel draws) estimated signal error as a function of samples 1 to 50 for
(a) RLS
(b) LMS
(c) estimated MMSE using blocks of 10 samples,
where the RLS and LMS have no knowledge of the channel at the first sample.

9.10 For a 10-antenna receiver observing a known signal with 0 dB SNR per receive antenna in a block-fading i.i.d. Gaussian channel that is static for the period of observation over which the beamformers are estimated, numerically evaluate the average (over many channel draws) estimated signal error using the estimated MMSE beamformer of the form

w = (Z Z†/n_s + ε I)^{−1} Z X†/n_s ,    (9.180)

using blocks of five samples, as a function of the diagonal loading for the form described in Equation (9.178).

10 Dispersive and doubly dispersive channels

Frequency-selective channels are caused by delay spread in the channel. When delay spread is introduced into the channel model, intersymbol interference is observed at the receiver. Intersymbol interference denotes the effect of the channel introducing contamination of the current sample by previous samples. If the communication system does not compensate for this effect, the performance of the link can be degraded significantly. The adverse effects of delay spread can be even more dramatic if a strong interferer that is observed by a multiple-antenna receiver has a channel that is frequency selective. For example, consider a single-antenna interferer. Without delay spread, a capable multiple-antenna receiver can mitigate the effects of the interference. In channels with significant delay spread, the interference spatial receive covariance matrix can grow from rank-1 to full rank, because each receive symbol can contain contributions from multiple transmit symbols at various relative delays, propagating through paths that produce independent spatial responses. Without changing the processing approach, this full-rank interference covariance matrix can overwhelm the communications link.

The frequency-selective channel can be represented in the frequency domain by employing a channel representation with coefficients at various frequencies, or in the time domain by employing a channel representation with coefficients at various delays (delay taps). To complicate the channel problem, if the channel is not static because of the motion of the transmitter, receiver, or scatterers, then compensating for delay spread can be more difficult. This dynamic channel can be represented by explicitly employing time-varying channels or by employing a channel representation with coefficients at various Doppler-frequency offsets (Doppler taps). The use of Doppler-frequency offsets covers a number of potential issues that are not technically caused by the Doppler effect. These include local oscillator (or frequency synthesizer) frequency offsets and low-frequency local oscillator phase noise. We will not be precise in differentiating these effects because they look similar from a channel and processing perspective. A channel with significant delay spread and Doppler-frequency spread is denoted doubly dispersive.


10.1 Discretely sampled channel issues

To represent a doubly dispersive channel, one can employ a channel that is continuous as a function of time and frequency. This channel can be approximated well (although not exactly in general) by a finite number of taps in delay and Doppler frequency. For a channel with a limited delay range for a bandwidth-limited signal, the number of channel delay taps is denoted n_d. For a channel with a limited Doppler-frequency range, the number of frequency taps is denoted n_f. Because a bandwidth-limited signal is not temporally limited and because a temporally limited signal is not bandwidth limited, these constraints are intrinsically incompatible. However, this is a typical problem in communications, and as a practical approximation for many problems, this formulation can work well. In general, all channel models are approximations. Similarly, a number of taps can be employed for delay and Doppler-frequency processing. These numbers of taps are denoted n_δ and n_ν, respectively. To be clear, the numbers of significant channel and processing taps are not typically equal.

For the sake of introduction, consider a static SISO channel. For a transmitted complex baseband signal s(t) ∈ C, a received complex baseband signal z(t) ∈ C, and an infinite-bandwidth channel impulse response h̃(t) ∈ C in additive Gaussian noise n(t) ∈ C, the received signal z(t) as a function of time is given by the convolution of the transmitted signal and the channel, plus noise. If the channel contains some set of discrete scatterers with amplitude a_m ∈ C at relative delay τ_m ∈ R, then the channel can be described by

z(t) = ∫ dτ h̃(τ) s(t − τ) + n(t)
h̃(τ) = Σ_m a_m δ(τ − τ_m) .    (10.1)

If the channel representation h̃(τ) is constructed from delta functions associated with point scatterers, it can support signals with infinite bandwidth. In general, it is not possible to represent this channel with a discrete regularly sampled channel model, which requires a finite channel spectral support. The solution is to represent the channel with the same spectral support as the complex signal s(t), which we assume has bandwidth B (including both positive and negative frequencies). By assuming that the noise and signal have the same spectral support (bandwidth B), the received signal can be represented in the spectral domain with the following temporal versus spectral correspondences:

z(t) ↔ Z(f)
s(t) ↔ S(f)
n(t) ↔ N(f)
h̃(t) ↔ H̃(f) .    (10.2)


Consequently, the frequency-domain version of Equation (10.1) is given by

Z(f) = H̃(f) S(f) + N(f) .    (10.3)

If the spectral support of the signal S(f) is limited to bandwidth B, then the signal is not changed by applying a perfect filter of bandwidth B (including both positive and negative frequencies), so that

S(f) = θ(f/B) S(f) ,    (10.4)

where the function θ(x) is 1 for −1/2 ≤ x < 1/2 and zero otherwise. Consequently, the frequency-domain version of the channel model in Equation (10.3) can be written as

Z(f) = H̃(f) θ(f/B) S(f) + N(f) ,    (10.5)

which corresponds to

z(t) = ∫ dτ h(τ) s(t − τ) + n(t)
h(τ) = ∫ df e^{i 2π τ f} H̃(f) θ(f/B)
  = ∫ dν h̃(ν) B sinc([τ − ν] B)
  = ∫ dν Σ_m a_m δ(ν − τ_m) B sinc([τ − ν] B)
  = Σ_m a_m B sinc([τ − τ_m] B) ,    (10.6)

where ν is a dummy variable corresponding to relative delay. Given this bandwidth-limited channel impulse response, a discrete version of the channel model can be constructed,

z(t) = Σ_m h_m s(t − m T_s) + n(t) ,    (10.7)

where h_m = T_s h(m T_s) indicates a discrete bandwidth-limited representation of the continuous complex channel attenuation associated with the mth delay tap of the channel, and T_s ≤ 1/B is the sample period. For many problems, it is assumed that the channel that is provided for analyses is from a bandwidth-limited sampled estimate, so this discussion is unnecessary. The issue often arises when channels are constructed from explicit physical channel models with arbitrary delays. In an approach that is formally similar to the discussion above, channels may have Doppler-frequency spread rather than delay spread. Furthermore, there is a discretized version of the effects of Doppler-frequency spread. Instead of limited bandwidth, the channel must have limited temporal extent to satisfy Nyquist in the Doppler-frequency domain.
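A short sketch of this construction is given below, mapping assumed scatterers at arbitrary delays into the regularly sampled taps h_m of Equation (10.7) via the sinc kernel of Equation (10.6); the bandwidth, amplitudes, and delays are assumed values.

import numpy as np

B = 1.0e6                        # signal bandwidth (Hz), assumed
Ts = 1.0 / B                     # sample period, Ts <= 1/B
a = np.array([1.0, 0.5 - 0.3j])  # scatterer amplitudes a_m, assumed
tau = np.array([0.4e-6, 2.7e-6]) # scatterer delays tau_m (not on the sample grid)

n_taps = 16
t = np.arange(n_taps) * Ts       # tap delays m * Ts

# h_m = Ts * h(m Ts), with h(tau) = sum_m a_m B sinc([tau - tau_m] B);
# numpy's sinc is the normalized sinc, sin(pi x)/(pi x)
h = Ts * np.sum(a[:, None] * B * np.sinc((t[None, :] - tau[:, None]) * B), axis=0)
print(np.round(np.abs(h), 3))    # tap energy concentrates near the scatterer delays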


While nearly all modern communication systems use sampled signals, there are some subtleties to be considered. As an example, consider a physical channel with an arbitrary delay relative to the signal sampling. In general, it will take an infinite number of channel taps to represent the channel. For many problems, this will have little effect on the analysis because it will only take a few channel taps to provide a sufficiently accurate approximation. However, for some problems that require precise representations (often these are found in theoretical analyses), misleading results can be generated.

Sampled doubly dispersive channels

In sampled representations of doubly dispersive channels (that is, channels that induce both delay and frequency spread), there is an intrinsic problem. In order to satisfy Nyquist in time, a bandwidth-limited signal is required. In general, a bandwidth-limited signal implies a signal of infinite temporal extent. In order to satisfy Nyquist in frequency, a temporally limited signal is required. This temporally limited signal implies a signal of infinite spectral extent. Consequently, theoretically, sampled doubly dispersive channels are problematic. However, practically, discrete doubly dispersive channel representations are useful. With a sufficient number of samples, the errors caused by the limited spectral and temporal extents can be made small.

10.2 Noncommutative delay and Doppler operations

The approaches used to implement delay and Doppler offsets observed throughout this chapter ignore the effects of noncommutative delay and Doppler operators. Because a dense set of delay and Doppler taps is assumed in the processing, the approach is not particularly sensitive to this oversight. However, when one is attempting to use sparse sets of delay and Doppler taps, more care is required. For a delay shift d and Doppler-frequency shift f, the effects on a signal s(t) are sometimes approximated by the operation

$$ e^{i 2\pi f t}\, s(t - d) \, . \tag{10.8} $$

Two assumptions were used in this formulation. First, the velocity difference is small enough that the frequency offset can be described by a frequency shift. Second, the delay-shifting operation is applied before the frequency-shifting operation. This choice was arbitrary. A useful model for considering the frequency shift is to induce the frequency shift via time dilation by a factor 1 + ε (we are not defining time dilation here in the special-relativity sense). With independent local oscillators, the time dilation is caused by one clock simply running faster than another. Consider the operators $T_d\{\cdot\}$ and $F_\epsilon\{\cdot\}$ that delay time by d and


dilate time by 1 + ε, respectively:

$$ T_d\{s(t)\} = s(t-d)\, , \qquad F_\epsilon\{s(t)\} = s([1+\epsilon]\, t) \, . \tag{10.9} $$

Note that the operators do not, in general, commute:

$$ T_d\{F_\epsilon\{s(t)\}\} \ne F_\epsilon\{T_d\{s(t)\}\}\, , \qquad s([1+\epsilon]\,t - d) \ne s([1+\epsilon][t-d]) \, . \tag{10.10} $$

However, if the product of the delay spread and the Doppler-frequency spread is small, then the difference between the two operator orderings is small.
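A minimal numerical sketch of this noncommutativity (Python/numpy; the pulse shape, delay, and dilation values are arbitrary illustrative choices):

import numpy as np

def s(t):
    # smooth test pulse (arbitrary choice)
    return np.exp(-t**2)

d, eps = 0.5, 0.05              # delay and dilation parameter (illustrative)
t = np.linspace(-5.0, 5.0, 2001)

s_a = s((1 + eps) * t - d)      # delay applied first, then dilation
s_b = s((1 + eps) * (t - d))    # dilation applied first, then delay

# The two orderings differ by a residual time shift of eps*d, per
# Equation (10.10); the mismatch shrinks with the delay-Doppler product.
print(np.max(np.abs(s_a - s_b)))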

10.3 Effect of frequency-selective fading

Here we consider a static channel with delay spread. For multiple-antenna receivers, delay spread can increase the rank of the receive spatial covariance matrix. To demonstrate this effect, consider the following simple two-tap channel model for the received signal $z(t) \in \mathbb{C}^{n_r\times 1}$, as a function of time t, of a transmitted signal s(t) impinging upon an array:

$$ z(t) = h_0\, s(t) + h_\tau\, s(t-\tau) + n(t) \, , \tag{10.11} $$

where $h_0 \in \mathbb{C}^{n_r\times 1}$ and $h_\tau \in \mathbb{C}^{n_r\times 1}$ are the receive-array responses for the first and second arriving wavefronts. The complex additive noise is given by $n(t) \in \mathbb{C}^{n_r\times 1}$. The second wavefront is delayed by time τ. The average transmitted power P is given by

$$ P = \langle s(t)\, s^*(t) \rangle \, . \tag{10.12} $$

The units of power are selected so that the spatial covariance of the thermal noise is assumed to be given by

$$ \langle n(t)\, n^\dagger(t) \rangle = \mathbf{I} \, , \tag{10.13} $$

so that the noise power per receive antenna is 1. The receive spatial covariance matrix $Q \in \mathbb{C}^{n_r\times n_r}$ is given by

$$ \begin{aligned} Q &= \langle z(t)\, z^\dagger(t) \rangle \\ &= h_0 h_0^\dagger\, \langle s(t)\, s^*(t)\rangle + h_0 h_\tau^\dagger\, \langle s(t)\, s^*(t-\tau)\rangle + h_\tau h_0^\dagger\, \langle s(t-\tau)\, s^*(t)\rangle + h_\tau h_\tau^\dagger\, \langle s(t-\tau)\, s^*(t-\tau)\rangle + \mathbf{I} \\ &= P\, h_0 h_0^\dagger + P \rho_\tau\, h_0 h_\tau^\dagger + P \rho_\tau^*\, h_\tau h_0^\dagger + P\, h_\tau h_\tau^\dagger + \mathbf{I} \, , \end{aligned} \tag{10.14} $$

where the autocorrelation parameter $\rho_\tau$ is given by

$$ \rho_\tau = \frac{\langle s(t)\, s^*(t-\tau) \rangle}{P} \, . \tag{10.15} $$


Temporally unresolved multipath scattering

In the case of unresolved multipath, the delay τ approaches zero, so that the transmitted signal and its slightly delayed version are nearly identical, s(t) ≈ s(t − τ). The receive spatial covariance matrix Q becomes a rank-1 matrix plus the identity matrix,

$$ Q \to P\, (h_0 + h_\tau)(h_0 + h_\tau)^\dagger + \mathbf{I} \, . \tag{10.16} $$

The eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_{n_r}\}$ of the spatial covariance matrix are given by

$$ \lambda_1 = P\, (h_0 + h_\tau)^\dagger (h_0 + h_\tau) + 1 \tag{10.17} $$

and

$$ \lambda_m = 1 \ \text{for } m > 1 \, . \tag{10.18} $$

Consequently, even though there are multiple channel paths, there is a single large signal eigenvalue.

Temporally resolved multipath scattering

In the case of resolved multipath, the relative delay is large enough that the received data appear to contain two signal versions. The transmitted signals are approximately independent because the autocorrelation between the two delays is small. For the sake of discussion, consider a scenario in which the autocorrelation $\rho_\tau$ at some large delay τ is zero to a good approximation,

$$ \rho_\tau \approx 0 \, . \tag{10.19} $$

From Equation (10.14), the receive spatial covariance matrix Q is then approximately given by

$$ Q \approx P\, h_0 h_0^\dagger + P\, h_\tau h_\tau^\dagger + \mathbf{I} \, . \tag{10.20} $$

The mth eigenvalue of the receive spatial covariance matrix $\lambda_m\{Q\}$ is given by

$$ \lambda_m\{Q\} \approx P\, \lambda_m\{h_0 h_0^\dagger + h_\tau h_\tau^\dagger\} + 1 \, . \tag{10.21} $$

For two-tap channels with taps that are well separated in delay, the eigenvalues are given by

$$ \begin{aligned} \lambda_1\{Q\} &= 1 + P\, \frac{\|h_0\|^2 + \|h_\tau\|^2 + \sqrt{(\|h_0\|^2 - \|h_\tau\|^2)^2 + 4\, |h_0^\dagger h_\tau|^2}}{2} \\ \lambda_2\{Q\} &= 1 + P\, \frac{\|h_0\|^2 + \|h_\tau\|^2 - \sqrt{(\|h_0\|^2 - \|h_\tau\|^2)^2 + 4\, |h_0^\dagger h_\tau|^2}}{2} \\ \lambda_m\{Q\} &= 1\, ; \quad m \in \{3, \ldots, n_r\} \, , \end{aligned} \tag{10.22} $$

where Equation (2.85) has been employed. For notational convenience, we will make the following definitions. The normalized inner product between the array


responses at the different delays is given by η, and the ratio of the norms of the array responses is given by γ,

$$ \eta = \frac{h_0^\dagger h_\tau}{\|h_0\|\, \|h_\tau\|}\, , \qquad \gamma = \frac{\|h_\tau\|}{\|h_0\|} \, . \tag{10.23} $$

By using these definitions, the eigenvalues $\lambda_1\{\cdot\}$, $\lambda_2\{\cdot\}$, and the rest $\lambda_m\{\cdot\}$ of the receive spatial covariance matrix are given by

$$ \begin{aligned} \lambda_1\{Q\} &= 1 + P\, \|h_0\|^2\, \frac{1 + \gamma^2 + \sqrt{(1-\gamma^2)^2 + 4\gamma^2\eta^2}}{2} \\ \lambda_2\{Q\} &= 1 + P\, \|h_0\|^2\, \frac{1 + \gamma^2 - \sqrt{(1-\gamma^2)^2 + 4\gamma^2\eta^2}}{2} \\ \lambda_m\{Q\} &= 1\, ; \quad m \in \{3, \ldots, n_r\} \, . \end{aligned} \tag{10.24} $$

In the special case of equal array response norms so that γ = 1, the first two eigenvalues are given by

$$ \{\lambda_1\{Q\},\, \lambda_2\{Q\}\} = 1 + P\, \|h_0\|^2\, (1 \pm \eta) \, . \tag{10.25} $$

The ratio of the second to the first eigenvalue in the high-power limit is given by

$$ \frac{\lambda_2\{Q\}}{\lambda_1\{Q\}} \approx \frac{1-\eta}{1+\eta} \, . \tag{10.26} $$

In another special case, if the array response norms are not equal, but the array responses are approximately orthogonal so that η ≈ 0, then the first two eigenvalues are given by

$$ \lambda_1\{Q\} = 1 + P\, \|h_0\|^2\, , \qquad \lambda_2\{Q\} = 1 + P\, \|h_0\|^2\, \gamma^2 \, . \tag{10.27} $$

The approximate orthogonality assumption may not be bad in situations in which the channels are random and the number of antennas is large. The ratio of the second to the first eigenvalue in the high-power limit is given by

$$ \frac{\lambda_2\{Q\}}{\lambda_1\{Q\}} \approx \gamma^2 \, . \tag{10.28} $$

Here, given resolvable delay spread in the received signal, Equations (10.26) and (10.28) show that the rank of the receive covariance matrix for a single transmitter increases from one to two. As the number of temporally resolvable multipath components increases, so does the rank of the receive spatial covariance matrix. This fact has implications for using the receiver spatial degrees of freedom for interference mitigation.
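A brief numerical check of this rank behavior (a sketch assuming numpy; the antenna count, power, and random array responses are illustrative) compares the eigenvalues of the two-tap covariance of Equation (10.20) against the closed form of Equation (10.24):

import numpy as np

rng = np.random.default_rng(0)
nr, P = 10, 10.0                 # receive antennas and signal power (assumed)
h0 = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
ht = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)

# Resolved multipath (rho_tau ~ 0): Q ~ P h0 h0' + P ht ht' + I, Eq. (10.20)
Q = P * np.outer(h0, h0.conj()) + P * np.outer(ht, ht.conj()) + np.eye(nr)
eigs = np.sort(np.linalg.eigvalsh(Q))[::-1]

# Closed form of Equation (10.24), with eta taken as |h0' ht| / (||h0|| ||ht||)
eta = abs(h0.conj() @ ht) / (np.linalg.norm(h0) * np.linalg.norm(ht))
gam = np.linalg.norm(ht) / np.linalg.norm(h0)
root = np.sqrt((1 - gam**2) ** 2 + 4 * gam**2 * eta**2)
lam1 = 1 + P * np.linalg.norm(h0) ** 2 * (1 + gam**2 + root) / 2
lam2 = 1 + P * np.linalg.norm(h0) ** 2 * (1 + gam**2 - root) / 2

print(eigs[:3])      # two large eigenvalues; the remaining nr - 2 are ~1
print(lam1, lam2)    # matches the two largest eigenvalues above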


10.4 Static frequency-selective channel model

From Equation (8.2), the received data vector z(t) using the standard flat-fading MIMO channel signal model is given by

$$ z(t) = \mathbf{H}\, s(t) + n(t) \, . \tag{10.29} $$

A dispersive channel is one that has temporally resolvable delay spread. This induces frequency-selective channel attenuation. As an extension of Equation (10.7), for a bandwidth-limited signal and a channel with a finite delay range, the frequency-selective channel characteristics are incorporated by including channel taps indicated by delay $\tau_m$,

$$ z(t) = \sum_{m=1}^{n_d} \mathbf{H}_{\tau_m}\, s(t - \tau_m) + n(t) \, , \tag{10.30} $$

where $\mathbf{H}_{\tau_m}$ indicates the channel matrix at the mth delay, and $\tau_m$ the $n_d$ resolvable delays. In general, a set of physical delay offsets that are not matched to the regularly sampled delay offsets will require an arbitrarily large number of sample delays to represent the channel perfectly. However, given a moderate set of sample delays $\tau_m$, a reasonably accurate frequency-selective channel can be constructed. For a channel represented by $n_d$ delays, the space-time channel matrix $\tilde{\mathbf{H}} \in \mathbb{C}^{n_r\times(n_t\cdot n_d)}$ is given by

$$ \tilde{\mathbf{H}} = \begin{pmatrix} \mathbf{H}_{\tau_1} & \mathbf{H}_{\tau_2} & \cdots & \mathbf{H}_{\tau_{n_d}} \end{pmatrix} \, . \tag{10.31} $$

Similarly, the stacked vector of the transmitted signal at the $n_d$ delays $\tilde{s}(t) \in \mathbb{C}^{(n_t\cdot n_d)\times 1}$ is given by

$$ \tilde{s}(t) = \begin{pmatrix} s(t-\tau_1) \\ s(t-\tau_2) \\ \vdots \\ s(t-\tau_{n_d}) \end{pmatrix} \, . \tag{10.32} $$

Consequently, the received signal is given by

$$ z(t) = \sum_m \mathbf{H}_{\tau_m}\, s(t-\tau_m) + n(t) = \tilde{\mathbf{H}}\, \tilde{s}(t) + n(t) \, . \tag{10.33} $$
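As a concrete illustration of Equations (10.31) through (10.33), the following sketch (assuming numpy; the dimensions and the random channel and signal draws are arbitrary) assembles the space-time channel matrix and the stacked signal vector and forms one received sample:

import numpy as np

rng = np.random.default_rng(1)
nr, nt, nd, ns = 4, 2, 3, 100    # illustrative dimensions

# Per-delay channel matrices H_{tau_m}, m = 1..nd, as in Equation (10.30)
H_taps = [rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))
          for _ in range(nd)]
Htilde = np.hstack(H_taps)       # Equation (10.31): nr x (nt*nd)

s = rng.standard_normal((nt, ns + nd)) + 1j * rng.standard_normal((nt, ns + nd))

def stilde(s, n, nd):
    # Stacked vector (s(t - tau_1); ...; s(t - tau_nd)), Equation (10.32),
    # taking tau_m = (m - 1) sample delays for simplicity
    return np.concatenate([s[:, n - m] for m in range(nd)])

# One received sample z(t) = Htilde stilde(t) + n(t), Equation (10.33)
noise = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)
z = Htilde @ stilde(s, nd + 5, nd) + noise
print(z.shape)                   # (nr,)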

10.5 Frequency-selective channel compensation

As discussed in Chapter 9, there are two different perspectives for compensating for channel distortions. One can modify the model to match the data, or one can modify the data to match the model. Under the modify-the-data category,


the typical approach for compensating for frequency-selective fading in SISO channels is equalization, which introduces a tap delay line into the processing. This approach can be extended to the multiple-antenna application by using what is denoted either space-time adaptive equalization (STAE) or space-time adaptive processing (STAP). Another approach for compensating for frequency-selective channels is to employ orthogonal-frequency-division multiplexing (OFDM). Either data-modifying or model-modifying approaches can employ an OFDM signaling structure. There are direct extensions to OFDM for multiple-antenna systems. For either approach, it is implicitly assumed that the channel does not vary significantly during the coherent processing interval. In the case of OFDM, the typical approach is to apply narrowband multiple-antenna processing within each OFDM carrier.

10.5.1 Eigenvalue distribution of space-time covariance matrix

In general, adaptive processing can be used to compensate for many potential distortions of the transmitted signal. The most common is the frequency-selective fading induced by resolvable delay spread in the channel. By defining the received signal data matrix $\mathbf{Z}_\tau \in \mathbb{C}^{n_r\times n_s}$ distorted by a delay τ,

$$ \mathbf{Z}_\tau = \begin{pmatrix} z(0\,T_s - \tau) & z(1\,T_s - \tau) & z(2\,T_s - \tau) & \cdots & z([n_s-1]\,T_s - \tau) \end{pmatrix} \, , \tag{10.34} $$

for a regularly sampled signal with sample period $T_s$, a space-time data matrix $\tilde{\mathbf{Z}} \in \mathbb{C}^{(n_r\cdot n_\delta)\times n_s}$ is constructed,

$$ \tilde{\mathbf{Z}} = \begin{pmatrix} \mathbf{Z}_{0\,\delta_\tau} \\ \mathbf{Z}_{1\,\delta_\tau} \\ \mathbf{Z}_{2\,\delta_\tau} \\ \vdots \\ \mathbf{Z}_{(n_\delta-1)\,\delta_\tau} \end{pmatrix} \, . \tag{10.35} $$

As a reminder, we use $n_\delta$ here rather than $n_d$ because $n_\delta$ indicates the number of delays used in the processing rather than the number required to represent the channel with some accuracy. There is potentially some confusion here because there is temporal sampling both along the traditional temporal dimension, which is encoded along each row of the space-time data matrix $\tilde{\mathbf{Z}}$, and in delay, which is mixed with the receive antennas down the columns of $\tilde{\mathbf{Z}}$. One of the reasons that this structure is interesting is that it can be used to compensate for the eigenvalue spread observed in the spatial covariance matrix in environments with resolvable delay spread. For the example of a single transmitter in an environment with resolvable delay spread, the fraction of nonnoise-level eigenvalues approaches $1/n_r$ as the number of delays and samples


becomes large. (The authors would like to thank Shawn Kraut for his thoughtful comments and suggestions on this topic area.) The space-time covariance $\tilde{\mathbf{Q}} \in \mathbb{C}^{n_r n_\delta \times n_r n_\delta}$ is given by

$$ \tilde{\mathbf{Q}} = \frac{1}{n_s} \left\langle \tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger \right\rangle \, . \tag{10.36} $$

For the example of a single transmitter, the delay-dependent channel matrix $\mathbf{H}_\tau$, from Equation (10.30), collapses to a vector $h_\tau \in \mathbb{C}^{n_r\times 1}$ as a function of delay τ. Consequently, the received signal, under the assumption of a sampled channel matrix, is given by

$$ z(t) = \sum_m h_{\tau_m}\, s(t-\tau_m) + n(t) \, , \tag{10.37} $$

where s(t) is the transmitted signal. The ability to mitigate interference is related to the fraction of the total space occupied by the interference, as determined by the distribution of eigenvalues. It will be useful to consider the discrete Fourier transform along the delay dimension. This transform does not affect the eigenvalue distribution. The eigenvalues of the space-time covariance matrix $\lambda_m\{\tilde{\mathbf{Q}}\}$ are a function of the delay spread and the number of delays used in the construction of $\tilde{\mathbf{Z}}$. The mth eigenvalue of the space-time covariance matrix is given by

$$ \lambda_m\{\tilde{\mathbf{Q}}\} = \frac{1}{n_s}\, \lambda_m\!\left\{ \left\langle \tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger \right\rangle \right\} = \frac{1}{n_s}\, \lambda_m\!\left\{ \left\langle \mathbf{U}\, \tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger\, \mathbf{U}^\dagger \right\rangle \right\} = \frac{1}{n_s}\, \lambda_m\!\left\{ \left\langle (\mathbf{F}_{n_\delta}^\dagger \otimes \mathbf{I}_{n_r})\, \tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger\, (\mathbf{F}_{n_\delta} \otimes \mathbf{I}_{n_r}) \right\rangle \right\} \, , \tag{10.38} $$

where $\mathbf{U}$ indicates some unitary matrix, and $\mathbf{F}_{n_\delta}$ is the discrete Fourier transform matrix of size $n_\delta$ with unitary normalization; thus, the form $\mathbf{F}_{n_\delta}^\dagger \otimes \mathbf{I}_{n_r}$ is unitary. Here the observation that the eigenvalue distribution of a matrix is invariant under arbitrary unitary transformations is employed.

Covariance matrix rank by using continuous approximation

In this section, we show that the fraction of the noise-free space-time covariance space occupied by the interference asymptotically approaches $1/n_r$. For discussion, consider a data matrix in which the noise is zero. The set of $n_r$ rows of the space-time data matrix $\tilde{\mathbf{Z}}$ associated with delay τ is given by the vector $z(t-\tau)$ as a function of sampled values of time t. To determine the asymptotic limit of the eigenvalue distribution of the space-time covariance matrix, we will consider the limiting continuous form of the Fourier transform along the channel delays. The limiting case will provide a channel formulation that allows for an infinite delay range and an infinitesimal channel sampling period. For a single transmitter propagating through a frequency-selective channel, this data vector


is given by the convolution of the transmitted signal with the channel,

$$ z(t-\tau) = \int dq\, h(q)\, s(t-\tau-q) \, , \tag{10.39} $$

where $h(q) \in \mathbb{C}^{n_r\times 1}$ is the channel response as a function of delay. In the continuous limit, the inverse Fourier transform of the space-time data matrix along the delay space $(\mathbf{F}_{n_\delta}^\dagger \otimes \mathbf{I}_{n_r})\, \tilde{\mathbf{Z}}$ is associated with the continuous form

$$ \begin{aligned} \mathcal{F}_\tau^{-1}\{z(t-\tau)\} &= \int d\tau\, e^{i 2\pi f \tau} \int dq\, h(q)\, s(t-\tau-q) \\ &= e^{i 2\pi f t} \int d\tau'\, e^{-i 2\pi f (\tau'+q)} \int dq\, h(q)\, s(\tau')\, ; \quad \tau' = t-\tau-q \\ &= e^{i 2\pi f t}\, \frac{1}{\sqrt{2\pi}} \left( \int dq\, e^{-i 2\pi f q}\, h(q) \right) \left( \int d\tau'\, e^{-i 2\pi f \tau'}\, s(\tau') \right) \\ &= e^{i 2\pi f t}\, h_f(f)\, s_f(f) \, , \end{aligned} \tag{10.40} $$

where we have abused the notation somewhat, such that here $h_f(f)$ is the Fourier transform of h(t), and $s_f(f)$ is the Fourier transform of s(t). Implicit in this formulation is the implementation of an infinite-dimensional delay space, which is an approximation to the case in which the space-delay matrix is very large compared with the delay spread of the channel. In evaluating the space-time covariance matrix, the expectation is evaluated over time and draws of the transmitted signal, but the channel as a function of delay is assumed to be deterministic. In a continuous analog to Equation (10.38), the $n_r\times n_r$ cross-covariance matrix associated with the frequencies $\{f, f'\}$ of the outer product of the inverse Fourier transform of the space-delay array response,

$$ \left\langle \mathcal{F}_\tau^{-1}\{z(t-\tau)\}\, \left( \mathcal{F}_{\tau'}^{-1}\{z(t-\tau')\} \right)^\dagger \right\rangle \, , \tag{10.41} $$

where the expectation is taken over time, is given by

$$ \left\langle \mathcal{F}_\tau^{-1}\{z(t-\tau)\}\, \left( \mathcal{F}_{\tau'}^{-1}\{z(t-\tau')\} \right)^\dagger \right\rangle \propto \left\langle e^{i 2\pi (f-f') t}\, s_f(f)\, s_f^*(f') \right\rangle h_f(f)\, h_f^\dagger(f') = \left\langle \|s_f(f)\|^2 \right\rangle h_f(f)\, h_f^\dagger(f) \, , \tag{10.42} $$

where the expectation over the exponential produces a delta function, under the assumption that the signal is uncorrelated at different frequencies. (As a counterexample, cyclostationary signals would have some correlation across frequencies.) The resulting covariance is block diagonal with the outer product of channel responses $h_f(f)\, h_f^\dagger(f)$ at each frequency. For the finite case, with $n_\delta$ delays, the corresponding space-time covariance matrix is of size $n_\delta\cdot n_r \times n_\delta\cdot n_r$. In the limit of $n_\delta$ becoming large, because each block is rank-1 out of an $n_r\times n_r$ matrix, the rank of the space-time covariance is given by $n_\delta$. Consequently, the fraction of the eigenvalues that are not zero is bounded by one over the number of receive channels, $1/n_r$.


Covariance matrix rank for finite samples

Once again, consider a data matrix in the absence of noise. We assume here that the channel can be represented by $n_d$ delays. With a finite number of delays in the space-time data matrix, the rank of the space-time covariance matrix can be bounded by

$$ \text{rank} \le n_d + n_\delta - 1 \tag{10.43} $$

out of a dimension of $n_\delta\, n_r$. Consequently, the fractional space-time covariance matrix rank (that is, the rank divided by the total number of degrees of freedom) is given by

$$ \text{frac rank} \le \frac{n_d + n_\delta - 1}{n_\delta\, n_r} \, , \tag{10.44} $$

which approaches $1/n_r$ in the limit of large $n_\delta$. To develop the result in Equation (10.44), consider a space-time data vector $\tilde{z}(t) \in \mathbb{C}^{n_\delta\cdot n_r\times 1}$, which is a single column from Equation (10.35) at some time t. The space-time covariance is then given by $\tilde{\mathbf{Q}} = \langle \tilde{z}(t)\, \tilde{z}^\dagger(t) \rangle$. Under the assumption that the channel and data matrix are sampled with the same period $\delta_\tau$, the space-time data vector has the form

$$ \tilde{z}(t) = \begin{pmatrix} \sum_{m=0}^{n_d-1} h_m\, s(t - [m+0]\,\delta_\tau) \\ \sum_{m=0}^{n_d-1} h_m\, s(t - [m+1]\,\delta_\tau) \\ \sum_{m=0}^{n_d-1} h_m\, s(t - [m+2]\,\delta_\tau) \\ \vdots \\ \sum_{m=0}^{n_d-1} h_m\, s(t - [m+n_\delta-1]\,\delta_\tau) \end{pmatrix} \, . \tag{10.45} $$

By rearranging the sum so that terms with the same value of delay $s(t-k\,\delta_\tau)$ are combined, the rank of the space-time covariance matrix can be bounded. The space-time data vector also has the form

$$ \tilde{z}(t) = \begin{pmatrix} h_0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} s(t - 0\,\delta_\tau) + \begin{pmatrix} h_1 \\ h_0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} s(t - 1\,\delta_\tau) + \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ h_{[n_d-1]} \end{pmatrix} s(t - [n_d - 1 + n_\delta - 1]\,\delta_\tau) \, . \tag{10.46} $$

Under the assumption that the channel and signal at each delay are independent, the rank of the space-time covariance matrix is given by the contribution of each of these nd + nδ − 1 terms; thus, Equations (10.43) and (10.44) are verified. As the fraction of space that a signal occupies decreases, the adverse effect it has


on the ability of a receiver to decode other signals typically decreases. Thus, by increasing the number of delays in processing, the typical performance of a receiver that is observing multiple signals improves; however, this comes at the cost of an increase in computational complexity.
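The finite-sample rank bound can be checked numerically. The sketch below (assuming numpy; the sizes and random draws are illustrative) builds a noise-free space-time data matrix as in Equation (10.35) for a single transmitter and compares the numerical rank of the covariance against the bound of Equation (10.43):

import numpy as np

rng = np.random.default_rng(2)
nr, nd, ndelta, ns = 4, 3, 6, 4000   # illustrative sizes

h = rng.standard_normal((nd, nr)) + 1j * rng.standard_normal((nd, nr))
s = rng.standard_normal(ns + nd + ndelta) \
    + 1j * rng.standard_normal(ns + nd + ndelta)

# Noise-free received stream z[n] = sum_m h_m s[n - m], as in Eq. (10.37)
z = np.zeros((nr, ns + ndelta), dtype=complex)
for n in range(ns + ndelta):
    for m in range(nd):
        z[:, n] += h[m] * s[nd + n - m]

# Space-time data matrix: ndelta delayed copies stacked, Equation (10.35)
Ztilde = np.vstack([z[:, ndelta - k : ndelta - k + ns] for k in range(ndelta)])
Q = (Ztilde @ Ztilde.conj().T) / ns

ev = np.linalg.eigvalsh(Q)
rank = int(np.sum(ev > 1e-9 * ev.max()))
print(rank, nd + ndelta - 1, ndelta * nr)   # rank, bound, full dimension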

10.5.2 Space-time adaptive processing

In general, approaches that are applicable to adaptive spatial processing can be extended to adaptive space-time processing. As an example, in an extension to the spatial beamformer discussed in Section 9.2.3, the estimate of the transmitted signal $\hat{\mathbf{S}}$ at the output of a linear space-time minimum-mean-square-error (MMSE) beamformer $\tilde{\mathbf{W}}$ is given by

$$ \hat{\mathbf{S}} = \tilde{\mathbf{W}}^\dagger\, \tilde{\mathbf{Z}} \, , \tag{10.47} $$

such that

$$ \left\langle \left\| \tilde{\mathbf{W}}^\dagger\, \tilde{\mathbf{Z}} - \mathbf{S} \right\|_F^2 \right\rangle \tag{10.48} $$

is minimized. The space-time adaptive beamformer is given by

$$ \tilde{\mathbf{W}} = \left\langle \tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger \right\rangle^{-1} \left\langle \tilde{\mathbf{Z}}\, \tilde{\mathbf{S}}^\dagger \right\rangle \approx (\tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger)^{-1}\, \tilde{\mathbf{Z}}\, \tilde{\mathbf{S}}^\dagger \, , \tag{10.49} $$

where the distorted transmitted training sequence is given by

$$ \tilde{\mathbf{S}} = \begin{pmatrix} \mathbf{S}_{0\,\delta_\tau} \\ \mathbf{S}_{1\,\delta_\tau} \\ \mathbf{S}_{2\,\delta_\tau} \\ \vdots \\ \mathbf{S}_{(n_\delta-1)\,\delta_\tau} \end{pmatrix} \in \mathbb{C}^{(n_t n_\delta)\times n_s} \, , \tag{10.50} $$

where $\mathbf{S}_\tau$ indicates the data matrix shifted to delay τ. The data matrix outer product term

$$ \tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger \tag{10.51} $$

in the beamformer definition in Equation (10.49) must be nonsingular in the approximate form. A necessary condition for this to be true is that

$$ n_s \ge n_r\, n_\delta \, . \tag{10.52} $$

However, because of the potential correlation between samples, this condition may not be sufficient. As an example, it is common to use temporally oversampled data, such that the sample rate is larger than that required to support the received bandwidth-limited signal. Approaches to address the resulting poorly conditioned matrices are discussed in Section 9.7. One of the most common techniques is diagonal loading, for which $\tilde{\mathbf{Z}}\tilde{\mathbf{Z}}^\dagger \to \tilde{\mathbf{Z}}\tilde{\mathbf{Z}}^\dagger + \epsilon\, \mathbf{I}$, where $\epsilon$ is


[Figure 10.1 Notional construction of an OFDM transmit signal: data bits pass through coding/modulation, an IFFT, and cyclic-prefix insertion to form the transmitted signal.]

an appropriately scaled small number. This technique was discussed in greater detail in Section 9.7.
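A minimal sketch of the space-time MMSE beamformer of Equations (10.49) and (10.50) with diagonal loading (assuming numpy; the loading level and its trace-based scaling are one common heuristic, not a prescription from the text):

import numpy as np

def stap_mmse(Ztilde, Stilde, eps=1e-3):
    # W = (Ztilde Ztilde' + eps I)^{-1} Ztilde Stilde', the diagonally
    # loaded form of Equation (10.49); eps is scaled by the average
    # diagonal power so the loading is relative to the data level.
    R = Ztilde @ Ztilde.conj().T
    load = eps * np.trace(R).real / R.shape[0]
    return np.linalg.solve(R + load * np.eye(R.shape[0]),
                           Ztilde @ Stilde.conj().T)

# Usage: with Ztilde and Stilde stacked over delays as in Equations (10.35)
# and (10.50), the signal estimate of Equation (10.47) is
# Shat = W.conj().T @ Ztilde.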

10.5.3 Orthogonal-frequency-division multiplexing

While there are a number of subtleties and variants of orthogonal-frequency-division multiplexing (OFDM) modulation, the basic premise is that blocks of symbols are defined and constructed in the frequency domain; an inverse fast Fourier transform (IFFT) then converts the signal to the time domain, and it is transmitted as seen in Figure 10.1. If the $n_s$ samples in the frequency domain for the $n_t$ transmitters are indicated by $\mathbf{S} \in \mathbb{C}^{n_t\times n_s}$, then the time-domain representation $\mathbf{X} \in \mathbb{C}^{n_t\times n_s}$ is given by

$$ \mathbf{X} = \mathbf{S}\, \mathbf{F}^\dagger\, , \qquad \{\mathbf{X}\}_{k,n} = \frac{1}{\sqrt{n_s}} \sum_{m=0}^{n_s-1} \{\mathbf{S}\}_{k,m}\, e^{i 2\pi \frac{m n}{n_s}} \, . \tag{10.53} $$

Thus, each symbol is placed in its own subcarrier. The approximate width of a frequency bin (approximate because each subcarrier has the spectral shaping of a sinc function) is given by the bandwidth of the complex baseband signal divided by the number of samples, $B/n_s$. If the width of a bin is small compared with the inverse of the standard deviation of the delay spread $\sigma_d$,

$$ \frac{B}{n_s} \ll \frac{1}{\sigma_d} \, , \tag{10.54} $$

then the frequency-selective fading will typically move slowly across the frequency bins. Consequently, representing the channel as a complex attenuation in each frequency bin is a good approximation. In this regime, performing narrowband processing within each bin works reasonably well. Approaches to address doubly dispersive channels (discussed later in this chapter) using OFDM waveforms have also been considered [200, 277]. At the OFDM receiver, an FFT is performed upon a block of received data $\mathbf{Z} \in \mathbb{C}^{n_r\times n_s}$ to attempt to recover an estimate of the original frequency-domain signal. However, because of temporal synchronization errors and multipath delay spread, the receiver cannot extract the exact portion of data that was transmitted. This mismatch in temporal alignment causes degradation in the


orthogonality assumption. Noting that a cyclic shift at the input of the FFT induces a benign phase ramp across the output, the adverse effects caused by delay spread and synchronization error can be mitigated by adding a cyclic prefix. A portion ($n_{cp}$ samples) of the time-domain signal from the beginning of the signal is added to the end of the signal at the transmitter, so that the transmitted signal $\mathbf{Y} \in \mathbb{C}^{n_t\times(n_s+n_{cp})}$ is given by

$$ \mathbf{Y} = \begin{pmatrix} \mathbf{X} & x_1 & x_2 & \cdots & x_{n_{cp}} \end{pmatrix} \, , \tag{10.55} $$

where $x_m$ is the mth column of $\mathbf{X}$. Because of the modularity of the exponential,

$$ e^{i 2\pi \frac{m (n + n_s)}{n_s}} = e^{i 2\pi \frac{m n}{n_s}} \, , \tag{10.56} $$

the transmitted signal with cyclic prefix has essentially the same form as the transmitted signal without the cyclic prefix,

$$ \{\mathbf{Y}\}_{k,n} = \frac{1}{\sqrt{n_s}} \sum_{m=1}^{n_s} \{\mathbf{S}\}_{k,m}\, e^{i 2\pi \frac{(m-1)(n-1)}{n_s}} \quad \forall\; n \in \{1, 2, \cdots, n_s + n_{cp}\} \, . \tag{10.57} $$

The sum over m can be considered the sum over subcarriers that produces the final time-domain signal. The received signal in the time domain $\mathbf{Z}$, for the nth sample in time and the jth receive antenna, is then given by

$$ \{\mathbf{Z}\}_{j,n} \approx \frac{1}{\sqrt{n_s}} \sum_{m=1}^{n_s} \sum_{k=1}^{n_t} \{\mathbf{H}_m\}_{j,k}\, \{\mathbf{S}\}_{k,m}\, e^{i 2\pi \frac{(m-1)(n-1)}{n_s}} + \{\mathbf{N}\}_{j,n} \, , \tag{10.58} $$

where here Hm ∈ Cn r ×n t is the channel matrix for the mth subcarrier. The result is an approximation because the model of a flat-fading channel within a subcarrier is approximate. The significant advantage of OFDM is the implicit frequency channelization that, given a sufficient density of frequency bins, enables narrowband processing within each channel. In addition, by employing FFTs, the computational complexity increases by order of the logarithm of the number of frequency channels per signaling chip (because it grows order ns log ns for the whole block of ns chips). This increase in computational complexity is much slower than most equalization approaches for single-carrier systems. One of the significant disadvantages of the OFDM approach is that the transmitted signal has a large peak-to-average power ratio, as discussed in Section 18.5. Ignoring the possibility of receivers that can compensate for nonlinearities, which would be very computationally intensive, the large peak-to-average power ratio imposes significant linearity requirements on the transmit amplifier. Typically, improved transmitter amplifier linearity comes at the expense of greater power dissipation. The large peak-to-average ratio can be understood by noting that the time-domain signal is constructed by adding a number of independent frequency-domain symbols together. Even if the starting frequency-domain symbols have a constant modulus, by the central limit theorem, the limiting transmitted signal distribution is Gaussian. While not likely, occasionally values


drawn from a Gaussian distribution can be several times larger than the standard deviation. Thus, the transmitted signal has a large peak-to-average power ratio.

Effects of external interference

Because OFDM uses a cyclic prefix to maintain orthogonality of carriers, OFDM can be particularly sensitive to external interference. Typically, such interference will not be synchronized with an appropriate cyclic prefix. In multipath, for multiple-antenna receivers, despite the multiple-carrier approach, the rank of the external interference in each frequency bin can increase rapidly with channel delay spread. When using FFTs for spectral analysis, nonrectangular windowing is typically used to reduce the spectral sidelobes [172]. There are many windowing approaches: for example, Hamming, Hanning, Blackman, and Chebyshev windows. However, these windowing approaches break the orthogonality between carriers. In principle, these effects can be traded off and the windows can be optimized [64].
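The cyclic-prefix mechanism of Equations (10.55) through (10.58) can be demonstrated in a few lines. This sketch (assuming numpy; it uses the common convention of prepending the prefix, while Equation (10.55) appends the first samples at the end, which is equivalent in effect; the channel taps are arbitrary) shows that, after stripping the prefix, each subcarrier sees a single complex gain:

import numpy as np

rng = np.random.default_rng(3)
ns, ncp = 64, 8                     # subcarriers and prefix length (assumed)

S = np.sign(rng.standard_normal(ns)) + 0j       # one BPSK symbol per carrier
x = np.fft.ifft(S) * np.sqrt(ns)                # time-domain block, Eq. (10.53)
y = np.concatenate([x[-ncp:], x])               # add the cyclic prefix

h = np.array([1.0, 0.5, 0.25])                  # short channel (arbitrary taps)
r = np.convolve(y, h)[: ns + ncp]               # channel output, noise omitted

Z = np.fft.fft(r[ncp : ncp + ns]) / np.sqrt(ns) # strip prefix, return to frequency
Hf = np.fft.fft(h, ns)                          # per-subcarrier channel gains
print(np.max(np.abs(Z - Hf * S)))               # ~0: one flat gain per carrier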

10.6 Doubly dispersive channel model

The doubly dispersive channel model includes the tap delay characteristics of the frequency-selective channel model and allows the model to vary as a function of time. A general model for the received signal $z(t) \in \mathbb{C}^{n_r\times 1}$ as a function of time t that extends the static sampled-delay channel model and allows time-varying channel coefficients is given by

$$ z(t) = \sum_{m=1}^{n_d} \mathbf{H}_{\tau_m}(t)\, s(t - \tau_m) + n(t) \, , \tag{10.59} $$

where the function $\mathbf{H}_{\tau_m}(t) \in \mathbb{C}^{n_r\times n_t}$ indicates the time-varying channel matrix at relative delay $\tau_m$ at time t for $n_d$ resolvable delays. Given a sufficient set of values for $\tau_m$, this model can accurately represent a time-varying frequency-selective channel for bandwidth-limited signals. Under the assumption of a bandwidth-limited signal with a channel that can be represented by $n_d$ delays, the time-varying space-time channel matrix $\tilde{\mathbf{H}}(t) \in \mathbb{C}^{n_r\times(n_t\cdot n_d)}$ is given by

$$ \tilde{\mathbf{H}}(t) = \begin{pmatrix} \mathbf{H}_{\tau_1}(t) & \mathbf{H}_{\tau_2}(t) & \cdots & \mathbf{H}_{\tau_{n_d}}(t) \end{pmatrix} \, . \tag{10.60} $$

Similar in form to the static case, the received signal z(t) is given by

$$ z(t) = \sum_m \mathbf{H}_{\tau_m}(t)\, s(t-\tau_m) + n(t) = \tilde{\mathbf{H}}(t)\, \tilde{s}(t) + n(t) \, . \tag{10.61} $$

10.6.1 Doppler-domain representation

While the temporal dynamics of the channel can be accurately expressed by employing time-varying coefficients in the channel, for some applications it is useful to consider a Doppler-domain representation for the time-varying channel. Here we will consider a continuous SISO channel first and then extend the discussion to a discrete MIMO channel.

Doppler-domain SISO channel

As an example, consider a time-varying, noise-free, SISO channel without delay spread,

$$ z(t) = h_t(t)\, s_t(t) \, , \tag{10.62} $$

where z(t), $h_t(t)$, and $s_t(t)$ indicate the complex baseband received signal, the complex attenuation, and the transmitted signal, respectively, in the temporal domain. Transforming to the frequency domain, the transmitted signal is given by

$$ s_f(f) = \int_{-\infty}^{\infty} dt\, e^{-i 2\pi f t}\, s_t(t) \, . \tag{10.63} $$

The complex attenuation can be expressed in the Doppler domain,

$$ h_D(f) = \int_{-\infty}^{\infty} dt\, e^{-i 2\pi f t}\, h_t(t) \, , \tag{10.64} $$

where $h_D(f)$ is the complex attenuation in the frequency domain (associated with Doppler-frequency shifts). Note that multiplication in the temporal domain is convolution in the frequency domain,

$$ h_t(t)\, s_t(t) \Leftrightarrow h_D(f) * s_f(f) \, . \tag{10.65} $$

Under the assumption of a temporally limited signal, the transform relationship is explicitly given by

$$ \begin{aligned} h_t(t)\, s_t(t) &= \int_{-\infty}^{\infty} df\, e^{i 2\pi f t}\, h_D(f) * s_f(f) \\ &= \int_{-\infty}^{\infty} df\, e^{i 2\pi f t} \int_{-\infty}^{\infty} df_D\, h_D(f_D)\, s_f(f - f_D) \\ &= \int_{-\infty}^{\infty} df_D\, h_D(f_D)\, e^{i 2\pi f_D t}\, s_t(t) \, , \end{aligned} \tag{10.66} $$

where the convolution $h_D(f) * s_f(f)$ is given by $\int df_D\, h_D(f_D)\, s_f(f - f_D)$.

Doubly dispersive SISO channel

As discussed in the introduction of this chapter, this formulation is strictly valid only if the signal is both bandwidth limited and temporally limited, which is not possible.


However, for many problems, this formulation can be employed to a good approximation. For the SISO case of a time-varying channel with delay spread, denoted $h_t(t,\tau)$ at time t and relative delay τ, the above time-varying channel can be extended to include delays, so that the noise-free received signal z(t) is given by

$$ z(t) = \int d\tau\, h_t(t,\tau)\, s_t(t-\tau) \, . \tag{10.67} $$

By repeating the discussion of transforming to the Doppler domain as above, and noting that the Fourier transform of the delayed transmitted signal is given by

$$ \int_{-\infty}^{\infty} dt\, e^{-i 2\pi f t}\, s_t(t-\tau) = e^{-i 2\pi f \tau} \int_{-\infty}^{\infty} dt'\, e^{-i 2\pi f t'}\, s_t(t') = e^{-i 2\pi f \tau}\, s_f(f) \, , \tag{10.68} $$

the received signal is approximated by

$$ \begin{aligned} z(t) &= \int d\tau\, h_t(t,\tau)\, s_t(t-\tau) \\ &= \int d\tau \int df\, e^{i 2\pi f t} \int df_D\, h_D(f_D,\tau)\, s_f(f-f_D)\, e^{-i 2\pi (f-f_D)\tau} \\ &= \int d\tau \int df_D\, h_D(f_D,\tau)\, e^{i 2\pi f_D \tau} \int df\, e^{i 2\pi f (t-\tau)}\, s_f(f-f_D) \\ &= \int d\tau \int df_D\, h_D(f_D,\tau)\, e^{i 2\pi f_D \tau} \int df'\, e^{i 2\pi (f'+f_D)(t-\tau)}\, s_f(f') \\ &= \int d\tau \int df_D\, h_D(f_D,\tau)\, e^{i 2\pi f_D t}\, s_t(t-\tau) \, . \end{aligned} \tag{10.69} $$

Doubly dispersive MIMO channel

By extending this discussion to the MIMO channel, the noise-free received signal as a function of time $z(t) \in \mathbb{C}^{n_r\times 1}$ is given by

$$ z(t) = \int d\tau \int df_D\, \mathbf{H}_D(f_D,\tau)\, e^{i 2\pi f_D t}\, s(t-\tau) \, , \tag{10.70} $$

where the function $\mathbf{H}_D(f_D,\tau) \in \mathbb{C}^{n_r\times n_t}$ indicates the channel in a delay and Doppler-domain representation.

10.6.2 Eigenvalue distribution of space-time-frequency covariance matrix

Received signal matrices with a lattice of various delay or frequency-offset distortions can be used in processing by the receiver. A space-time-frequency data matrix $\tilde{\mathbf{Z}} \in \mathbb{C}^{(n_r\cdot n_\delta\cdot n_\nu)\times n_s}$ that is regularly sampled in delay $\delta_\tau$ and in Doppler frequency $\delta_\nu$ is constructed by

$$ \tilde{\mathbf{Z}} = \begin{pmatrix} \mathbf{Z}_{0\,\delta_\tau,\, 0\,\delta_\nu} \\ \vdots \\ \mathbf{Z}_{(n_\delta-1)\,\delta_\tau,\, 0\,\delta_\nu} \\ \mathbf{Z}_{0\,\delta_\tau,\, 1\,\delta_\nu} \\ \vdots \\ \mathbf{Z}_{(n_\delta-1)\,\delta_\tau,\, (n_\nu-1)\,\delta_\nu} \end{pmatrix} \, , \tag{10.71} $$

where the data matrix for distortions of a particular delay offset τ and frequency offset ν is given by

$$ \mathbf{Z}_{\tau,\nu} = \begin{pmatrix} e^{i 2\pi \nu\, 0\, T_s}\, z(0\,T_s - \tau) & e^{i 2\pi \nu\, 1\, T_s}\, z(1\,T_s - \tau) & e^{i 2\pi \nu\, 2\, T_s}\, z(2\,T_s - \tau) & \cdots & e^{i 2\pi \nu\, [n_s-1]\, T_s}\, z([n_s-1]\,T_s - \tau) \end{pmatrix} \, . \tag{10.72} $$

The received space-time-frequency covariance matrix $\tilde{\mathbf{Q}} \in \mathbb{C}^{n_r\cdot n_\delta\cdot n_\nu \times n_r\cdot n_\delta\cdot n_\nu}$ is given by

$$ \tilde{\mathbf{Q}} = \frac{1}{n_s} \left\langle \tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger \right\rangle \, . \tag{10.73} $$

In the absence of noise, the rank of the space-time-frequency covariance matrix can be found by extension of the rank of the space-time covariance matrix found in Section 10.5.1. With a finite number of delay and frequency taps in the space-time-frequency data matrix, the rank of the space-time-frequency covariance matrix can be bounded by

$$ \text{rank} \lesssim (n_d + n_\delta - 1)(n_f + n_\nu - 1) \tag{10.74} $$

out of a dimension of $n_\delta\, n_\nu\, n_r$. Note that the approximation is due to the limitation of having both finite spectral and temporal extent. If it were possible to have signals with both finite spectral and finite temporal support, then the inequality would be exact. Consequently, the fraction of the space-time-frequency covariance matrix rank is given by

$$ \text{frac rank} \lesssim \frac{(n_d + n_\delta - 1)(n_f + n_\nu - 1)}{n_\delta\, n_\nu\, n_r} \, , \tag{10.75} $$

which approaches $1/n_r$ in the limit of large $n_\delta$ and $n_\nu$. To develop the above result, consider a space-time-frequency data vector $\tilde{z}(t) \in \mathbb{C}^{n_\delta\cdot n_\nu\cdot n_r\times 1}$, which is a single column from Equation (10.71) at some time t. The space-time-frequency covariance is then given by $\tilde{\mathbf{Q}} = \langle \tilde{z}(t)\, \tilde{z}^\dagger(t) \rangle$ (which is equivalent to the definition in Equation (10.73)). Under the assumption that the sampled channel in the delay-Doppler domain $h_{\delta_\tau,\delta_f}$ and the data matrix are sampled with the same period $\delta_\tau$ and frequency resolution $\delta_f$, the space-time-frequency


data vector has the form

$$ \tilde{z}(t) = \begin{pmatrix} \sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\delta_\tau,\, k\delta_f}\, s(t - [m+0]\,\delta_\tau;\, [k+0]\,\delta_f) \\ \sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\delta_\tau,\, k\delta_f}\, s(t - [m+0]\,\delta_\tau;\, [k+1]\,\delta_f) \\ \vdots \\ \sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\delta_\tau,\, k\delta_f}\, s(t - [m+0]\,\delta_\tau;\, [k+n_\nu-1]\,\delta_f) \\ \sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\delta_\tau,\, k\delta_f}\, s(t - [m+1]\,\delta_\tau;\, [k+0]\,\delta_f) \\ \sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\delta_\tau,\, k\delta_f}\, s(t - [m+1]\,\delta_\tau;\, [k+1]\,\delta_f) \\ \vdots \\ \sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\delta_\tau,\, k\delta_f}\, s(t - [m+n_\delta-1]\,\delta_\tau;\, [k+n_\nu-1]\,\delta_f) \end{pmatrix} \, , \tag{10.76} $$

where the notation $s(t; \delta_f)$ indicates the signal at time t shifted by frequency $\delta_f$. Similar in form to the space-time covariance matrix, by rearranging the sum so that terms with the same value of delay and frequency shift $s(t - m\delta_\tau;\, k\delta_f)$ are grouped, the rank of the space-time-frequency covariance matrix can be bounded (this argument is due to Shawn Kraut), and the number of contributions to the rank can be found. Because the frequency and delay contributions are independent, for any given frequency there are $n_d + n_\delta - 1$ delay contributions. Consequently, there are $(n_d + n_\delta - 1)(n_f + n_\nu - 1)$ contributing terms. This accounting can be observed in the rearranged space-time-frequency data vector that is given by

$$ \begin{aligned} \tilde{z}(t) = {}& \begin{pmatrix} h_{0\delta_\tau,\, 0\delta_f} \\ 0 \\ \vdots \\ 0 \end{pmatrix} s(t - 0\,\delta_\tau;\, 0\,\delta_f) + \cdots + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ h_{[n_d-1]\delta_\tau,\, 0\delta_f} \\ 0 \\ \vdots \\ 0 \end{pmatrix} s(t - [n_d-1+n_\delta-1]\,\delta_\tau;\, 0\,\delta_f) \\ &+ \cdots + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ h_{0\delta_\tau,\, 1\delta_f} \\ 0 \\ \vdots \\ 0 \end{pmatrix} s(t - 0\,\delta_\tau;\, 1\,\delta_f) + \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ h_{[n_d-1]\delta_\tau,\, [n_f-1]\delta_f} \end{pmatrix} s(t - [n_d-1+n_\delta-1]\,\delta_\tau;\, [n_f-1+n_\nu-1]\,\delta_f) \, . \end{aligned} \tag{10.77} $$

Under the assumption that the channel and signal at each delay and frequency are independent, the rank of the space-time-frequency covariance matrix is given by the contribution of each of these $(n_d+n_\delta-1)(n_f+n_\nu-1)$ terms; thus, Equations (10.74) and (10.75) are shown.

10.7 Space-time-frequency adaptive processing

Space-time-frequency adaptive processing is a direct extension of space-time adaptive processing. A space-time-frequency data matrix $\tilde{\mathbf{Z}} \in \mathbb{C}^{(n_r\cdot n_\delta\cdot n_\nu)\times n_s}$ is defined in Equation (10.71). The dimensionality of $\tilde{\mathbf{Z}}$ grows quickly with $n_r$, $n_\delta$, and $n_\nu$. For the processing to work well, the dimensionality of included distortions must cover those in the environment ($n_\delta > n_d$ and $n_\nu > n_f$); however, for large numbers of distortions, this processing approach may be untenable. In general, approaches that are applicable to adaptive spatial processing can be extended to adaptive space-time-frequency processing. Similar to space-time adaptive processing, the space-time-frequency MMSE receive estimate of the transmitted signal is given by

$$ \hat{\mathbf{S}} = \tilde{\mathbf{W}}^\dagger\, \tilde{\mathbf{Z}} \, . \tag{10.78} $$

The space-time-frequency adaptive beamformer is given by

$$ \tilde{\mathbf{W}} = \left\langle \tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger \right\rangle^{-1} \left\langle \tilde{\mathbf{Z}}\, \tilde{\mathbf{S}}^\dagger \right\rangle \approx (\tilde{\mathbf{Z}}\, \tilde{\mathbf{Z}}^\dagger)^{-1}\, \tilde{\mathbf{Z}}\, \tilde{\mathbf{S}}^\dagger \, , \tag{10.79} $$


where the distorted training sequence is given by

$$ \tilde{\mathbf{S}} = \begin{pmatrix} \mathbf{S}_{0\,\delta_\tau,\, 0\,\delta_\nu} \\ \mathbf{S}_{1\,\delta_\tau,\, 0\,\delta_\nu} \\ \mathbf{S}_{2\,\delta_\tau,\, 0\,\delta_\nu} \\ \vdots \\ \mathbf{S}_{(n_\delta-1)\,\delta_\tau,\, 0\,\delta_\nu} \\ \mathbf{S}_{0\,\delta_\tau,\, 1\,\delta_\nu} \\ \mathbf{S}_{1\,\delta_\tau,\, 1\,\delta_\nu} \\ \mathbf{S}_{2\,\delta_\tau,\, 1\,\delta_\nu} \\ \vdots \\ \mathbf{S}_{(n_\delta-1)\,\delta_\tau,\, (n_\nu-1)\,\delta_\nu} \end{pmatrix} \, . \tag{10.80} $$

Here, $\mathbf{S}_{\tau,\nu}$ is defined similarly to $\mathbf{Z}_{\tau,\nu}$. A necessary condition to evaluate the space-time-frequency beamformer is that

$$ n_s \ge n_r\, n_\delta\, n_\nu \, . \tag{10.81} $$

10.7.1 Sparse space-time-frequency processing

As discussed previously, the dimensionality of the space-time-frequency covariance matrix can grow quickly as the number of delay and Doppler-frequency taps increases. This dimensionality can quickly become an issue for some applications. For some special channels, the channel taps can be sparse. In this regime, it is useful to perform the processing by using an operator algebra approach [141]. There are numerous difficulties in applying sparse processing, and in a large portion of parameter space, it is simply not possible to implement a sparse solution. Determining the applicability is a strong function of degrees of freedom, phenomenology, and prior knowledge.

Problems

10.1 Consider a static SISO channel that is represented by

$$ \tilde{h}(\tau) = a\, \delta(\tau - T) \, , \tag{10.82} $$

where the notation in Section 10.1 is employed. For a complex signal bandwidth $B = 1/T_s$, evaluate the discrete channel representation for: (a) T = 0, (b) $T = T_s/2$, (c) $T = T_s/4$.

10.2 The frequency-shifting model (displayed in Equation (10.8)) is commonly used in analyses rather than the more accurate time-dilation approach (displayed in Equation (10.9)). A significant issue in practical systems is that the time


dilation eventually causes a chip slip that cannot be corrected by a simple phase correction. For a filtered BPSK signal with a carrier frequency of 1 GHz and a bandwidth of 1 MHz, with a relative fractional frequency error of $10^{-6}$ between the transmitter and receiver, evaluate the expected receiver loss because of chip misalignment as a function of time since a perfect synchronization.

10.3 By using the notation in Section 10.2, consider a loop in parameter space for delay and Doppler channel operators on a signal. Starting at some point, moving operators through the space of delay and Doppler along some path, and then returning to the original point, the signal should be unaffected, because the effect should be determined only by the parameters' values. However, evaluate the effect on some signal s(t) of the following sequence of operators, $T_d\, F_\epsilon\, T_{-d}\, F_{-\epsilon}$, and evaluate the error.

10.4 Consider the eigenvalues of the observed space-time covariance matrix that is observing critically sampled signals. For a 10-receive-antenna array and a single transmit antenna with a line-of-sight channel (equal channel responses across receive antennas), evaluate the eigenvalue distribution of the receive space-time covariance matrix at 0 dB SNR per receive antenna, assuming unit-variance noise per antenna. Evaluate the eigenvalues under the assumption that the space-time covariance matrix includes (a) 1 (spatial-only), (b) 2, or (c) 4 delay samples at Nyquist spacing.

10.5 Consider the eigenvalues of the observed space-time covariance matrix with two delays. The signal and noise are strongly filtered so that they are significantly oversampled (that is, the sampling rate is large compared to the Nyquist sample rate). For a 10-receive-antenna array and a single transmit antenna with a line-of-sight channel (equal channel responses across receive antennas), evaluate the eigenvalue distribution of the receive space-time covariance matrix at 0 dB SNR per receive antenna, assuming unit-variance noise per antenna in the region of spectral support. Evaluate the eigenvalues approximately under the assumption that the signal and noise are significantly temporally oversampled.

10.6 Consider the signal s(t) and the SISO doubly dispersive channel characterized by a time-varying channel $h_t(t,\tau)$ and the delay-frequency channel $h_D(f_D,\tau)$; develop the form of a bound on the average squared error in using the $h_D(f_D,\tau)$ form under the assumption of a temporally bounded (T) and spectrally bounded (B) signal.

10.7 Develop the results in Section 10.6.1 by using discrete rather than integral Fourier transforms.

10.8 Develop the Doppler-frequency analysis dual of Equation (10.7).


10.9 Consider a simple discrete-time channel whose impulse response is given by

$$ h(m) = \delta_{m,0} + \frac{1}{2}\,\delta_{m,1} + \frac{1}{4}\,\delta_{m,2} \, , $$

where $\delta_{m,k}$ is the Kronecker delta function defined in Section 2.1.3. Using a computer, generate 10 000 orthogonal-frequency-division-multiplexing (OFDM) symbols using 16 carriers, with each carrier transmitting a BPSK symbol taking values of ±1 with equal probability. Using the inverse fast Fourier transform (IFFT), convert each OFDM symbol into its time-domain representation. Create two vectors to store the time-domain samples, $s_{zp}$ and $s_{cp}$. The vector $s_{zp}$ should contain the time-domain representation of the OFDM symbols with three samples of zero padding between consecutive symbols, and $s_{cp}$ should contain a three-sample cyclic prefix. Convolve $s_{zp}$ and $s_{cp}$ with the channel impulse response h(m) and add pseudorandom complex Gaussian noise of mean zero and variance 0.09 per sample to the results of the convolutions. A convenient way to do this is to represent the impulse response in a vector h, create a vector of noise samples n, and write

$$ z_{zp} = h * s_{zp} + n \, , \qquad z_{cp} = h * s_{cp} + n \, . $$

Therefore, $z_{zp}$ and $z_{cp}$ simulate received samples in an OFDM system with zero padding and a cyclic prefix, respectively. Decode the OFDM symbols by selecting appropriate portions of $z_{zp}$ and $z_{cp}$, using a fast Fourier transform (FFT) to convert the time-domain samples into frequency-domain values, and checking the sign of the frequency-domain values to decode the BPSK data. Estimate the bit error rate by comparing the decoded data to the transmitted data for both the zero-padding and cyclic-prefix schemes. This computer exercise is intended to illustrate the effects of using a cyclic prefix versus simple zero padding in an OFDM system.

11 Space-time coding

As is true for single-input single-output (SISO) communication links, there are many approaches for coding multiple-input multiple-output (MIMO) systems in an attempt to approach the channel capacity. Many of the space-time coding approaches have analogs in the SISO regime. The multiple antennas of MIMO systems enable spatial diversity, increased data rates, and interference suppression. In general, there are trades in these characteristics, depending upon the coding and receiver approach. One of the most important trade-offs in this regard is the trade-off between data rate and diversity whereby the data communication rate is reduced to improve probability of error or outage. That is to say, a fraction of the data rate is sacrificed to improve robustness. There have been numerous contributions in the field of space-time coding. The major contributions include References [8, 99, 307, 305, 306, 361, 362, 314, 57, 58, 292, 86, 207, 166, 119, 87, 222, 270, 42, 43, 269, 226, 196]. Of particular note are Reference [8] which introduced what is now known as the Alamouti code, a simple and elegant space-time code for systems with two transmitter antennas, Reference [307] which introduced systematic methods to evaluate space-time codes, and Reference [361] which analyzed the fundamental trade-offs between diversity and rate in multiantenna systems.

11.1 Rate diversity trade-off

The multiple transmit antennas of a MIMO system can be employed to increase the diversity or the data rate of a given link (see References [361, 314]). The trade-off between the rate and diversity can be analyzed either for specific modulation schemes or in a more general form using an outage capacity formulation. In the former, a fraction of the maximum data rate achievable on a given link using a particular modulation scheme is sacrificed to reduce the probability of symbol or bit error. In the latter, a fraction of the maximum rate is sacrificed to ensure that, under different realizations of the fading process, the probability that the capacity of the link is below some target rate is reduced. Most of the practical work in the field of space-time coding has focused on the probability-of-error formulation, as it relates directly to specific coding and modulation schemes. The next two subsections describe these in more detail with specific examples.


The following sections of the chapter discuss various classes of space-time coding schemes.

11.1.1 Probability of error formulation

The average probability of bit or symbol error for a multiple-antenna communication system depends strongly on the type of modulation used (for example, binary phase-shift keying (BPSK) versus quadrature-amplitude modulation (QAM)) and the algorithms employed at the transmitters and receivers (for example, transmitting independent data through each antenna or repeating the same information on each antenna). Exact expressions for symbol- and bit-error probabilities of these systems may be difficult to find and, moreover, may differ significantly in form, complicating comparative analyses of such systems. In general, however, for high SNR, the probability of symbol error of most practical modulation schemes can be bounded in an order-of-growth sense. That is to say,

$$ p_{err}(\text{SNR}, R) \le a\, \text{SNR}^{-d} + O\!\left(\text{SNR}^{-d-1}\right) \, , \tag{11.1} $$

where d is known as the diversity coefficient or diversity order, and a is a constant. We use the notation $p_{err}(\text{SNR}, R)$ to represent the probability of error as a function of the SNR and rate R. For instance, for a SISO system using quadrature phase-shift keying (QPSK) and channel coefficient g, the probability of error is

$$ p_{err}(\text{SNR}\,|\,g) = Q\!\left(\sqrt{\|g\|^2\, \text{SNR}}\right) \, , $$

where Q(·) is known as the Q-function, defined in Section 2.14.7, which is the tail probability of the standard Gaussian probability density function,

$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty dx\, e^{-\frac{x^2}{2}} \, . $$

Note that we use the notation $p_{err}(\text{SNR}\,|\,g)$ to refer to the probability of error as a function of the SNR, given the channel coefficient g. With Rayleigh fading (complex Gaussian channel coefficients), the magnitude square of the channel coefficient, that is, $\|g\|^2$, is exponentially distributed (see Section 3.1.10). Hence, one can derive the marginal probability of error as follows [255], [314]:

$$ p_{err}(\text{SNR}) = \int_0^\infty d\|g\|^2\, Q\!\left(\sqrt{\|g\|^2\,\text{SNR}}\right) e^{-\|g\|^2} = \int_0^\infty d\tau\, Q\!\left(\sqrt{\tau\, \text{SNR}}\right) e^{-\tau} = \frac{1}{2} - \frac{1}{2}\sqrt{\frac{\text{SNR}}{2+\text{SNR}}} = \frac{1}{2\,\text{SNR}} + O\!\left(\frac{1}{\text{SNR}^2}\right) \, , $$

where τ is an integration variable over the channel attenuation, that is, $\tau = \|g\|^2$. The last equation indicates that a SISO QPSK system has diversity order 1, since the dominant SNR term has an exponent equal to −1. Note that in a fading system at high SNR, symbol- and bit-error events are typically due to a channel realization that is weak (rather than a spike in the


noise), in the sense that the norm square of the channel coefficients is small compared with the inverse of the SNR. Hence, for a SISO system with channel coefficient g, for most practical modulation schemes, the probability of error is approximated as follows [314]:

$$ p_{err} \approx \Pr\!\left( \|g\|^2 < \frac{a}{\text{SNR}} \right) \, , \tag{11.2} $$

where a is some constant that depends on the modulation scheme used.

11.1.2 Outage probability formulation

One can also analyze the diversity order of a system independent of the modulation scheme by using the capacity of the system, replacing the error event with an outage event. Specifically, we analyze the probability that the capacity of the system is below a target rate under fading. For example, consider the capacity of a slow-fading SISO link as discussed in Section 5.3. The capacity is given by

$$ c = \log_2\!\left(1 + \|g\|^2\, \text{SNR}\right) \, , \tag{11.3} $$

where we recall that g is the channel coefficient for the duration of communication. At very large SNR, the capacity can be approximated by

$$ c \approx \log_2\!\left(\|g\|^2\, \text{SNR}\right) \, . \tag{11.4} $$

Suppose that a link wishes to communicate at a rate R. The link is in outage if c < R. Outage occurs if g happens to be small for the duration of the communication. To reduce the probability of outage, suppose that the rate R is a fraction r of the capacity of the channel assuming g = 1. That is to say,

$$ R = r\, \log_2(1 + \text{SNR}) \approx r\, \log_2(\text{SNR}) \, . \tag{11.5} $$

Thus, the probability of outage, assuming Rayleigh fading, is [362]

$$ p_{out} = \Pr(c < R) \approx \Pr\!\left( \log_2(\|g\|^2\, \text{SNR}) < r \log_2 \text{SNR} \right) = \Pr\!\left( \|g\|^2\, \text{SNR} < \text{SNR}^r \right) = \Pr\!\left( \|g\|^2 < \text{SNR}^{r-1} \right) \, . \tag{11.6} $$

Because we assume Rayleigh fading, the magnitude square of the channel coefficient $\|g\|^2$ is exponentially distributed. Hence,

$$ \Pr\!\left( \|g\|^2 < \text{SNR}^{r-1} \right) = \int_0^{\text{SNR}^{r-1}} d\tau\, e^{-\tau} = 1 - e^{-\text{SNR}^{r-1}} = 1 - \left( 1 - \text{SNR}^{r-1} + o\!\left(\text{SNR}^{2(r-1)}\right) \right) \approx \text{SNR}^{r-1} = \text{SNR}^{d(r)} \tag{11.7} $$


for large SNR and multiplexing rate r < 1. The function d(r) is the diversity gain associated with the multiplexing rate r. The previous expression indicates the rate at which outage probability can be improved at the expense of data rate and is a fundamental relationship for fading channels. Consequently, for the SISO Rayleigh channel, the diversity gain d and multiplexing rate r are related by

$$ d(r) = r - 1 \tag{11.8} $$

for high SNR. Any real coding scheme is bounded by this relationship. For a general MIMO link with $n_t$ transmitter and $n_r$ receiver antennas, the optimal diversity-multiplexing trade-off curve was found by Zheng and Tse in Reference [362]. While a precise analysis of this result is quite complicated, we present a brief description of their findings here, based on the analysis given in References [362, 314]. The capacity of an uninformed-transmitter MIMO link with spatially uncorrelated noise, as discussed in Section 8.3, is given by

$$ c = \log_2\left| \mathbf{I} + \frac{P_o}{n_t}\, \mathbf{H}\mathbf{H}^\dagger \right| = \sum_m \log_2\!\left(1 + \frac{a^2 P_o}{n_t}\, \lambda_m\right) = \sum_m \log_2\!\left(1 + \frac{\text{SNR}}{n_t}\, \lambda_m\right) \, , \tag{11.9} $$

where $P_o$ is the total noise-normalized power and a is the average attenuation from transmit to receive antenna. The variable $\lambda_m$ is the mth eigenvalue of $\mathbf{G}\mathbf{G}^\dagger$, where the matrix $\mathbf{G} \in \mathbb{C}^{n_r\times n_t}$, such that $\mathbf{G} = \mathbf{H}/a$, is drawn from an identically distributed, complex, circularly symmetric, unit-variance Gaussian distribution. The term $a^2 P_o$ is also the total SNR per receive antenna. Note that at high SNR, the spectral efficiency for such a MIMO system can grow approximately linearly with the minimum of $n_r$ and $n_t$; that is, writing $n = \min(n_r, n_t)$, the MIMO link can support a spectral efficiency of approximately $n \log_2(\text{SNR})$, where n is the multiplexing gain provided by the multiple transmit and receive antennas. For some real system with spectral efficiency R operating with a multiplexing rate of $r \le n$,

$$ R \approx r\, \log_2(\text{SNR}) \, . \tag{11.10} $$

The probability of outage is given by

$$ p_{out} = \Pr(c < R) \approx \Pr\!\left( \sum_m \log_2\!\left(1 + \frac{\text{SNR}}{n_t}\, \lambda_m\right) < r \log_2 \text{SNR} \right) \, . \tag{11.11} $$

Hence, the outage probability is related to the joint distribution of the eigenvalues of the matrix HH† . The joint distribution of these eigenvalues has a complicated


[Figure 11.1 Optimal diversity-multiplexing trade-off for the MIMO channel: diversity gain d(r) versus multiplexing gain r, a piecewise-linear curve through the points (r, (n_r − r)(n_t − r)), from (0, n_r n_t) to (min(n_r, n_t), 0).]

relationship, which results in a complicated analysis of the diversity-multiplexing trade-off. The analysis carried out in [362] shows that the optimal diversity-multiplexing trade-off d(r) is a piecewise-linear function between the following points, in order:

$$ \{0,\, n_r n_t\},\ \{1,\, (n_r-1)(n_t-1)\},\ \{2,\, (n_r-2)(n_t-2)\},\ \ldots,\ \{n,\, (n_r-n)(n_t-n)\} \, . \tag{11.12} $$

This curve is given in Figure 11.1, which illustrates the systematic trade-off between the maximum multiplexing gain of $n = \min(n_r, n_t)$ and the maximum diversity gain of $n_r n_t$. Another observation that is often made is that if the numbers of antennas at the transmitter and receiver are each increased by one, the entire curve shifts to the right by one, increasing the multiplexing gain for a given diversity gain by one.
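The curve is simple to generate. A short sketch (assuming numpy; the antenna counts are illustrative) evaluates the piecewise-linear trade-off of Equation (11.12) by interpolating between the corner points:

import numpy as np

def dmt_curve(nr, nt, npts=200):
    # Corner points (k, (nr - k)(nt - k)) for k = 0..min(nr, nt), per
    # Equation (11.12); the optimal d(r) is piecewise linear between them.
    corners_r = np.arange(min(nr, nt) + 1)
    corners_d = (nr - corners_r) * (nt - corners_r)
    r = np.linspace(0.0, min(nr, nt), npts)
    return r, np.interp(r, corners_r, corners_d)

r, d = dmt_curve(4, 4)
print(d[0], d[-1])   # nr*nt = 16 at r = 0, and 0 at r = min(nr, nt)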

11.2 Block codes

In general, most practical space-time coding schemes encode information over multiple symbols and antennas. Consider a space-time processing system in which the transmitter has $n_t$ antennas, the receiver has $n_r$ antennas, and a block of length $n_s$ is used. The input-output relationship of the system when operating in frequency-flat fading can be represented by the following equation:

$$ \mathbf{Z} = \mathbf{H}\mathbf{C} + \mathbf{N} \, . \tag{11.13} $$

The matrix $\mathbf{C} \in \mathbb{C}^{n_t\times n_s}$ represents the transmitted codeword on each of the $n_t$ antennas for each sample, $\mathbf{H} \in \mathbb{C}^{n_r\times n_t}$ represents the channel coefficients


between the transmitter and receiver antennas, and $\mathbf{N} \in \mathbb{C}^{n_r\times n_s}$ is a matrix of additive noise samples. The structure of the codewords $\mathbf{C}$ is determined by the coding scheme used.

Maximal ratio transmission

Maximal ratio transmission is a simple signal-processing technique for multiple-input single-output (MISO) channels in which transmissions are encoded over one symbol. It is the transmit-side analog of the spatial matched-filter receiver described in Section 9.2.1, and is also the water-filling solution for the MIMO channel described in Section 8.3 with one antenna at the receiver. The main idea behind maximal ratio transmission is that the signals at the transmitter antennas are phased in such a manner that they add coherently at the receiver antenna. Consider a system with $n_t$ transmit antennas and a single receive antenna, and the transmission of a single symbol s, that is, $n_s = 1$. Since there is one antenna at the receiver, that is, $n_r = 1$, the matrices in Equation (11.13) are either row or column vectors as follows:

$$ \mathbf{H} = h = (h_1\ h_2\ \cdots\ h_{n_t})\, , \qquad \mathbf{C} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n_t} \end{pmatrix} s\, , \qquad \mathbf{N} = n \, , $$

where we recall that $n_t$ is the number of antennas at the transmitter, $h_1, \ldots, h_{n_t}$ are the channel coefficients between the antennas of the transmitter and the receiver, and n is a zero-mean, circularly symmetric, Gaussian random variable of variance $\sigma^2$, which represents the noise at the antenna of the receiver. The vector

$$ w = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n_t} \end{pmatrix} $$

can be thought of as weights applied to the signals on the antennas of the transmitter. Suppose that the transmitter uses the following w:

$$ w = \frac{1}{\|h\|}\, h^\dagger \, . \tag{11.14} $$

Since this w is a unit-norm vector, the transmitted power does not change with h. Additionally, note that this w precompensates for the phase offset introduced by the paths between each transmit antenna and the antenna of the receiver such


that the signal adds in phase at the receiver. Equation (11.13) then becomes

$$ z = \|h\|\, s + n \, . \tag{11.15} $$

From (11.2), the probability of error is approximately equal to the probability that $\|h\|^2$ is small compared with the inverse of the SNR, that is,

$$ \Pr\!\left( \|h\|^2 \le \frac{a}{\text{SNR}} \right) $$

for some positive a. If the channel coefficients are independently faded, circularly symmetric, Gaussian random variables, $\|h\|^2$ is a real $\chi^2$ random variable with $2 n_t$ degrees of freedom, or a complex $\chi^2$ random variable with $n_t$ degrees of freedom, as described in Section 3.1.11. The above probability is found using the CDF of $\chi^2$ random variables:

$$ \Pr\!\left( \|h\|^2 \le \frac{a}{\text{SNR}} \right) = \frac{1}{\Gamma(n_t)}\, \gamma\!\left(n_t,\, \frac{a}{2\,\text{SNR}}\right) \approx \frac{1}{\Gamma(n_t)}\, \frac{1}{n_t} \left(\frac{a}{2\,\text{SNR}}\right)^{n_t} + o\!\left(\frac{1}{\text{SNR}^{n_t}}\right) \, , $$

where the last expression follows from Equation (2.272). Hence, a diversity order of $n_t$ is possible with $n_t$ transmit antennas.
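A short Monte Carlo sketch of maximal ratio transmission (assuming numpy; BPSK signaling, the 0 dB SNR, and the trial count are illustrative choices) applies the weights of Equation (11.14):

import numpy as np

rng = np.random.default_rng(4)
nt, ntrials, snr = 4, 20000, 1.0     # nt antennas at 0 dB SNR (assumed)

errors = 0
for _ in range(ntrials):
    h = (rng.standard_normal(nt) + 1j * rng.standard_normal(nt)) / np.sqrt(2)
    w = h.conj() / np.linalg.norm(h)  # maximal ratio weights, Eq. (11.14)
    s = rng.choice([-1.0, 1.0])       # BPSK symbol
    n = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    z = np.sqrt(snr) * (h @ w) * s + n   # = ||h|| sqrt(snr) s + n, Eq. (11.15)
    errors += int(np.sign(z.real) != s)

# At high SNR the error rate decays as SNR^{-nt} (diversity order nt).
print(errors / ntrials)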

11.2.1 Alamouti's code

Alamouti's code, introduced in Reference [8], assumes two transmit antennas and possibly multiple receiver antennas, although we shall focus on the single-receiver-antenna case here. In other words, we assume $n_t = 2$ and $n_r = 1$. The Alamouti code is implemented over two time slots with two data symbols $s_1$ and $s_2$. The codeword matrix is constructed using the data symbols and their conjugates as follows:

$$ \mathbf{C} = \begin{pmatrix} s_1 & -s_2^* \\ s_2 & s_1^* \end{pmatrix} \, . \tag{11.16} $$

Let $h_j$ be the channel coefficient from the jth transmit antenna to the antenna of the receiver. Communication takes place over two time slots, and the channel matrix $\mathbf{H}$, which is a row vector h here and is assumed to be constant over the two time slots, is defined as follows:

$$ h = (h_1\ h_2) \, . \tag{11.17} $$

Hence, the received samples at times 1 and 2 are the entries of the row vector $z = \mathbf{Z}$ given by

$$ z_1 = h_1 s_1 + h_2 s_2 \, , \tag{11.18} $$

$$ z_2 = -h_1 s_2^* + h_2 s_1^* \, . \tag{11.19} $$


The receiver constructs a new vector $w \in \mathbb{C}^{2\times 1}$ whose first element is $z_1$ and whose second is $z_2^*$. We can then write w as follows:

$$ w = \begin{pmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} n_1 \\ n_2^* \end{pmatrix} \, . \tag{11.20} $$

We can thus recover estimates of $s_1$ and $s_2$ by premultiplying w by the following matrix,

$$ \frac{1}{\|h\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \, , \tag{11.21} $$

which yields the following expression:

$$ \begin{aligned} \begin{pmatrix} \hat{s}_1 \\ \hat{s}_2 \end{pmatrix} &= \frac{1}{\|h\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \begin{pmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \frac{1}{\|h\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \begin{pmatrix} n_1 \\ n_2^* \end{pmatrix} \\ &= \begin{pmatrix} \|h\| & 0 \\ 0 & \|h\| \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} \tilde{n}_1 \\ \tilde{n}_2 \end{pmatrix} = \|h\| \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} \tilde{n}_1 \\ \tilde{n}_2 \end{pmatrix} \, . \end{aligned} \tag{11.22} $$

Note that $\tilde{n}_1$ and $\tilde{n}_2$ are independent $\mathcal{CN}(0,1)$ random variables, since $n_1$ and $n_2^*$ are independent $\mathcal{CN}(0,1)$ variables and the matrix

$$ \frac{1}{\|h\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \tag{11.23} $$

has orthonormal columns. Hence, two independent data symbols can be transmitted over two time intervals. Since at high SNR the probability of bit error is mainly due to poor fading conditions, the probability of bit error assuming unit transmit power is approximately equal to the probability that $\|h\|^2$ is smaller than the inverse of the SNR, that is,

$$ \Pr\!\left( \|h\|^2 \le \frac{a}{\text{SNR}} \right) $$

for some a. If the vector h is a $1\times 2$ row vector of independent, circularly symmetric, Gaussian random variables, the norm square of the vector $\|h\|^2$ is distributed as a complex $\chi^2$ random variable with two degrees of freedom, or a real $\chi^2$ random variable with four degrees of freedom (see Section 3.1.11), with $\sigma^2 = 1$. The CDF of a complex $\chi^2$ random variable with two degrees of freedom is given in terms of $\gamma(k, x)$, the lower incomplete gamma function, as follows:

$$ P_{\chi^2_C}(x; 2; 1) = \gamma(2, x) = \frac{1}{2}\, x^2 + o(x^2) \, , $$


where the last expression follows from Equation (2.272). Hence the probability of error for the Alamouti code is

\[
\Pr\!\left(\|h\|^2 \le \frac{a}{\mathrm{SNR}}\right)
\approx \frac{1}{2}\left(\frac{a}{\mathrm{SNR}}\right)^2 + o\!\left(\frac{1}{\mathrm{SNR}^2}\right),
\]

which indicates that the diversity order of the Alamouti code is 2. Therefore, the Alamouti code enables one transmission per symbol time and obtains a diversity order of 2. This diversity order of 2 is achieved because the Alamouti code transmits each symbol over each antenna, so that, at high SNR, an error occurs only if the channel coefficients for both antennas are small. Note that this is the same diversity order as that achieved by maximal ratio transmission, which was shown previously to achieve a diversity order of nt with one symbol per unit time. Unlike maximal ratio transmission, however, the Alamouti code does not require the transmitter to have channel-state information. Transmitter channel-state information requires significant overhead, as channel parameters have to be estimated at receivers and then fed back to transmitters.
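A short Monte Carlo sketch of this transmit/receive chain is given below (ours, not from the text; NumPy and a QPSK mapping are our assumptions). It implements the received-signal model (11.18)-(11.19) and the matched filter (11.21), and can be used to observe the slope-2 decay of the error rate:

    import numpy as np

    rng = np.random.default_rng(1)
    QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

    def alamouti_ser(snr_db, trials=20_000):
        """Symbol error rate of the Alamouti code (nt = 2, nr = 1)."""
        snr = 10.0 ** (snr_db / 10.0)
        errors = 0
        for _ in range(trials):
            idx = rng.integers(0, 4, size=2)
            s = np.sqrt(snr / 2) * QPSK[idx]        # split power across antennas
            h = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
            n = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
            z1 = h[0] * s[0] + h[1] * s[1] + n[0]              # time slot 1
            z2 = -h[0] * s[1].conj() + h[1] * s[0].conj() + n[1]  # time slot 2
            w = np.array([z1, z2.conj()])
            A = np.array([[h[0], h[1]], [h[1].conj(), -h[0].conj()]])
            s_hat = A.conj().T @ w / np.linalg.norm(h)  # = ||h|| s + white noise
            s_hat /= np.linalg.norm(h) * np.sqrt(snr / 2)  # normalize back to QPSK
            dec = np.argmin(np.abs(s_hat[:, None] - QPSK[None, :]), axis=1)
            errors += np.count_nonzero(dec != idx)
        return errors / (2 * trials)

    for snr_db in (5, 10, 15, 20):
        print(f"{snr_db:2d} dB: SER ~ {alamouti_ser(snr_db):.2e}")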

11.2.2 Orthogonal space-time block codes

The Alamouti code is an example of an orthogonal space-time block code. Space-time block codes, like linear error-correction codes, can be described by a generator matrix G ∈ C^{ns×nt}. (Note that the generator matrix here is unrelated to generator functions.) Each row of the generator matrix describes the signals transmitted on the antennas at a given time slot. For instance, consider the following generator matrix:

\[
G = \begin{pmatrix}
g_{11} & g_{12} & \cdots & g_{1 n_t} \\
g_{21} & g_{22} & \cdots & g_{2 n_t} \\
\vdots & \vdots & \ddots & \vdots \\
g_{n_s 1} & g_{n_s 2} & \cdots & g_{n_s n_t}
\end{pmatrix}. \tag{11.24}
\]

The kth row represents the nt symbols transmitted at time slot k. Hence, the Alamouti space-time code can be described by using the following generator matrix,

\[
G = \begin{pmatrix} s_1 & s_2 \\ -s_2^* & s_1^* \end{pmatrix}, \tag{11.25}
\]

where we recall that the two symbols to be transmitted are s1 and s2. An orthogonal space-time block code is one in which the generator matrix of the code has orthogonal columns over the transmitted symbols. The orthogonal columns of the generator matrix enable relatively low-complexity decoding of the transmitted data, as the following example of the Alamouti code illustrates.


For the Alamouti space-time block code in Equation (11.16), observe that the product of the Hermitian transpose of the generator matrix with the generator matrix is diagonal for any choice of s1, s2, since

\[
\begin{pmatrix} s_1 & s_2 \\ -s_2^* & s_1^* \end{pmatrix}^{\dagger}
\begin{pmatrix} s_1 & s_2 \\ -s_2^* & s_1^* \end{pmatrix}
= \begin{pmatrix} |s_1|^2 + |s_2|^2 & 0 \\ 0 & |s_1|^2 + |s_2|^2 \end{pmatrix}. \tag{11.26}
\]

This property enables simple linear decoding of the space-time code, as shown in the previous section. For a general space-time block code for nt transmit antennas, let the generator matrix G have the property that

\[
G^{\dagger} G = \sum_{j=1}^{n_t} |s_j|^2\, I, \tag{11.27}
\]

where the entries of the matrix G are s1, −s1, s2, −s2, . . . , snt, −snt, which represent the transmitted symbols from the nt antennas. If the sj are real and Equation (11.27) holds, the matrix G is known as a real orthogonal design [305]. Orthogonal designs permit easy maximum-likelihood decoding using only linear operations, as described in [305], which also provides some further examples of real orthogonal designs for 4 × 4 and 8 × 8 generator matrices. Note that there are only a small number of real orthogonal designs with all nonzero entries. A 4 × 4 example from Reference [305] is the following:

\[
\begin{pmatrix}
s_1 & s_2 & s_3 & s_4 \\
-s_2 & s_1 & -s_4 & s_3 \\
-s_3 & s_4 & s_1 & -s_2 \\
-s_4 & -s_3 & s_2 & s_1
\end{pmatrix}. \tag{11.28}
\]

We refer the reader to specialized texts on space-time coding (for example, References [331, 157, 84]) for further examples and analyses of orthogonal block codes. The material in this subsection is based on Reference [157].
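The defining property (11.27) can be verified numerically for the 4 × 4 design in (11.28). A minimal check (ours; NumPy assumed, and the function name is our choice):

    import numpy as np

    rng = np.random.default_rng(2)

    def orthogonal_design_4x4(s):
        """The 4x4 real orthogonal design of Equation (11.28)."""
        s1, s2, s3, s4 = s
        return np.array([[ s1,  s2,  s3,  s4],
                         [-s2,  s1, -s4,  s3],
                         [-s3,  s4,  s1, -s2],
                         [-s4, -s3,  s2,  s1]])

    s = rng.standard_normal(4)   # arbitrary real symbols
    G = orthogonal_design_4x4(s)
    # Property (11.27): G^T G = (sum_j |s_j|^2) I, which enables linear decoding.
    print(np.allclose(G.T @ G, np.sum(s ** 2) * np.eye(4)))  # -> True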

11.3 Performance criteria for space-time codes

The benefits provided by a space-time code, in terms of the diversity and coding gains, can be systematically analyzed by using codeword difference matrices associated with the space-time coding scheme, as introduced in Reference [307], which is the basis for the discussion in this section. The codeword difference matrix between a pair of transmitted codewords Cℓ and Ck is simply the difference between the two codewords, Dkℓ = Cℓ − Ck. Recall that Ck ∈ C^{nt×ns}, so Dkℓ ∈ C^{nt×ns}. The codeword difference matrix Dkℓ is used to bound the probability of a transmitted codeword Ck being erroneously decoded as a codeword Cℓ at the receiver.


Assuming that the channels between different pairs of antennas of the transmitter and receiver are independent, identically distributed, circularly symmetric, Gaussian random variables of zero mean and unit variance, the error probability can be bounded from above. In order to write this bound, for notational simplicity, define the matrix Akℓ as the product of Dkℓ and its Hermitian transpose,

\[
A_{k\ell} = D_{k\ell} D_{k\ell}^{\dagger}, \tag{11.29}
\]

with λm denoting the mth largest nonzero eigenvalue of the matrix Akℓ. Using this notation, the probability of confusing Cℓ with Ck, denoted by Pr(Ck → Cℓ), is bounded from above as follows:

\[
\Pr\left(C_k \rightarrow C_\ell\right) \le
\left(\prod_{m=1}^{\mathrm{rank}(A_{k\ell})} \lambda_m\right)^{-n_r}
\times \left(\frac{\mathrm{SNR}}{4}\right)^{-\mathrm{rank}(A_{k\ell})\, n_r}. \tag{11.30}
\]

The derivation of this inequality, which uses bounds for the tail of the Gaussian probability density function and linear algebra properties, can be found in Reference [305]. From the right-hand side of Equation (11.30), the probability of decoding the codeword Ck as Cℓ decays as

\[
\frac{1}{\mathrm{SNR}^{\mathrm{rank}(A_{k\ell})\, n_r}}. \tag{11.31}
\]

Hence, to maximize the diversity order, the codewords should be chosen to maximize the rank of the codeword difference matrix Dkℓ over all distinct k and ℓ. This requirement is known as the rank criterion. Additionally, since the codeword matrices are of dimensions nt × ns, the largest possible rank of Akℓ is min(nt, ns). Since, in most systems, the latency associated with encoding across ns samples can be large compared to realistic numbers of antennas nt, it is typically the case that nt ≤ ns, which indicates that the maximum possible diversity order for realistic nr × nt MIMO systems is nr nt. Rewriting Equation (11.30) as

\[
\Pr\left(C_k \rightarrow C_\ell\right) \le
\left(\left(\prod_{m=1}^{\mathrm{rank}(A_{k\ell})} \lambda_m\right)^{\frac{1}{\mathrm{rank}(A_{k\ell})}}
\frac{\mathrm{SNR}}{4}\right)^{-\mathrm{rank}(A_{k\ell})\, n_r}, \tag{11.32}
\]

we can observe that the SNR is scaled by the factor

\[
\left(\prod_{m=1}^{\mathrm{rank}(A_{k\ell})} \lambda_m\right)^{\frac{1}{\mathrm{rank}(A_{k\ell})}},
\]

which means that, with good codewords, one can effectively increase the SNR. Thus, space-time coding can also provide coding gain to the system. To maximize the coding gain of a given space-time code, we need to choose the codewords such that the minimum over all codeword pairs of the quantity in the previous expression is maximized. In the literature, this is known as the determinant criterion, since the quantity in the parentheses of the previous expression equals the determinant of Akℓ if Akℓ is a full-rank matrix. Hence, the rank and determinant criteria can be used to design coding schemes that have the desired diversity order and coding gains.


The diversity order is increased by maximizing the minimum (over all pairs of codewords) rank of the matrix Akℓ, which is the product of the codeword difference matrix associated with the kth and ℓth codewords and its Hermitian transpose. The coding gain is maximized by maximizing the minimum over all codeword pairs of the determinant of this matrix.
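The rank and determinant criteria are simple to evaluate for small codebooks. The sketch below (ours; the helper name and the BPSK-Alamouti example inputs are our choices, not from the text) computes the worst-case rank and the coding-gain metric (Π λm)^{1/rank} over all codeword pairs:

    import numpy as np

    def pairwise_design_metrics(codewords):
        """Worst-case rank and determinant-criterion coding-gain metric over
        all codeword pairs, using A_kl = D_kl D_kl^H with D_kl = C_l - C_k."""
        worst_rank, worst_gain = np.inf, np.inf
        for k in range(len(codewords)):
            for l in range(k + 1, len(codewords)):
                D = codewords[l] - codewords[k]
                A = D @ D.conj().T
                lam = np.linalg.eigvalsh(A)
                lam = lam[lam > 1e-12]                    # nonzero eigenvalues
                gain = np.prod(lam) ** (1.0 / lam.size)   # (prod lambda_m)^(1/rank)
                worst_rank = min(worst_rank, lam.size)
                worst_gain = min(worst_gain, gain)
        return worst_rank, worst_gain

    # Example: Alamouti codewords over BPSK symbol pairs (hypothetical inputs).
    def alamouti(s1, s2):
        return np.array([[s1, -np.conj(s2)], [s2, np.conj(s1)]])

    codes = [alamouti(a, b) for a in (-1, 1) for b in (-1, 1)]
    print(pairwise_design_metrics(codes))   # rank 2 over all pairs: full diversity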

11.4 Space-time trellis codes

11.4.1 Trellis-coded modulation

Trellis-coded modulation (TCM) is a technique that combines error-control coding with modulation, and it typically results in better performance than systems that optimize error-control coding and modulation separately [316, 317]. The coding process is described in terms of a trellis diagram, which describes the state transitions and output symbols that correspond to input bits. In Figure 11.2, a 4-state trellis from [316] is illustrated. This code uses an 8-PSK constellation whose constellation points are labeled 0, 1, . . . , 7. Two uncoded data bits are mapped into a 3-bit, 8-PSK constellation point. The states are labeled 0, 1, 2, 3. The arcs represent state transitions that occur in response to two input bits. The labels on the arcs represent the constellation points that are transmitted. The particular mapping of arcs to input bit sequences is not important. For this example, suppose that the arcs emanating from each node correspond to input bit pairs 00, 01, 10, and 11 from top to bottom. Starting with state 0, the sequence 11 01 00 is encoded into the constellation points 6, 5, and 2. The state transitions that occur are 0 → 1, 1 → 2, and 2 → 1. The dashed, dotted, and bold lines in Figure 11.2 represent the transitions due to the first, second, and third bit pairs, respectively. At the receiver, maximum-likelihood decoding of the sequence of transmitted symbols could be performed based on the received symbols. The maximum-likelihood decoder finds the path along the trellis that is most likely given the sequence of received symbols. Since the maximum-likelihood decoder is typically computationally expensive, it is common to use an approximation to the maximum-likelihood decoder, such as the Viterbi decoder outlined in Section 11.4.2.
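To make the trellis walk concrete, here is a small sketch (ours; only the three transitions used by the worked example above are tabulated, since the full 4-state table lives in Figure 11.2 and Reference [316]; the table layout is hypothetical):

    def tcm_encode(bit_pairs, next_state, output, state=0):
        """Walk a TCM trellis: per input bit pair, emit an 8-PSK point and
        move to the next state."""
        points = []
        for b in bit_pairs:
            points.append(output[(state, b)])
            state = next_state[(state, b)]
        return points

    # Transitions from the worked example: 11 01 00 from state 0 -> points 6, 5, 2.
    next_state = {(0, '11'): 1, (1, '01'): 2, (2, '00'): 1}
    output = {(0, '11'): 6, (1, '01'): 5, (2, '00'): 2}
    print(tcm_encode(['11', '01', '00'], next_state, output))  # -> [6, 5, 2]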

11.4.2 Space-time trellis coding

Tarokh et al. [307] proposed using trellis coding in the context of space-time coding. They present several different trellis diagrams for systems with nt transmit antennas, nr receive antennas, and corresponding QPSK and 8-PSK constellations, which are shown in Figures 11.3 and 11.4, respectively. The trellis codes they specify include symbols that are to be transmitted on each antenna. For instance, in Figure 11.5, a code for two transmit antennas with QPSK and four states, originally proposed in Reference [307], is illustrated.



Figure 11.2 8-PSK trellis diagram showing the states and the allowed transitions from left to right. The labels on the arcs represent the transmit symbols corresponding to the state transitions.

The state labels correspond to the QPSK symbols to be transmitted on each antenna as a result of each sequence of bits. For instance, in the third state, if the data bits are 01, symbols 2 and 1 are transmitted from antennas 1 and 2, respectively, and the next state will be state 2. This idea can be generalized to the scenario of a larger number of transmit antennas, where each arc in the trellis diagram corresponds to a codeword q1 q2 · · · qnt, whereby qi is the constellation point transmitted on the ith antenna. In addition to being described by a trellis, space-time trellis codes can also be described in terms of an input–output relationship [307]. Consider a 2^M-ary phase-shift-keying transmission where the M bits associated with the tth transmit symbol are b1t, b2t, . . . , bMt. Let the output at time t at each of the nt antennas be contained in a vector xt ∈ C^{nt×1}. Then xt can be expressed as

\[
x_t = \sum_{m=1}^{M} \sum_{k=0}^{K_m - 1} b_{m(t-k)}\, c_{mk}, \tag{11.33}
\]

where the cmk are coefficient vectors whose entries are in {0, 1, . . . , 2^M − 1} and Km is the memory depth of the encoder associated with the mth bit of the symbol. Note that the summation in Equation (11.33) is done modulo 2^M.



Figure 11.3 QPSK constellation diagram and labelings for the Tarokh space-time trellis code.

The space-time trellis code described in Figure 11.5 can be written in the form of Equation (11.33) as follows:

\[
x_t = b_{2(t-1)} \begin{pmatrix} 2 \\ 0 \end{pmatrix}
+ b_{1(t-1)} \begin{pmatrix} 1 \\ 0 \end{pmatrix}
+ b_{1t} \begin{pmatrix} 0 \\ 1 \end{pmatrix}
+ b_{2t} \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \tag{11.34}
\]

where the addition is performed modulo 4. Another example from Reference [307] is an 8-PSK code given by

\[
x_t = b_{3(t-1)} \begin{pmatrix} 4 \\ 0 \end{pmatrix}
+ b_{2(t-1)} \begin{pmatrix} 2 \\ 0 \end{pmatrix}
+ b_{1(t-1)} \begin{pmatrix} 5 \\ 0 \end{pmatrix}
+ b_{1t} \begin{pmatrix} 0 \\ 1 \end{pmatrix}
+ b_{2t} \begin{pmatrix} 0 \\ 2 \end{pmatrix}
+ b_{3t} \begin{pmatrix} 0 \\ 4 \end{pmatrix}, \tag{11.35}
\]

where the addition is done modulo 8. The receiver uses a Viterbi decoder [327] to estimate the maximum-likelihood transmitted signal over the length of the space-time code. The decision metric used in the Viterbi decoder at a given symbol time t is the following:

\[
\sum_{\ell=1}^{n_r} \left| y_{\ell t} - \sum_{k=1}^{n_t} h_{k\ell}\, q_{kt} \right|^2, \tag{11.36}
\]

where yℓt is the received symbol at antenna ℓ at time t, qkt is the transmitted symbol from antenna k at time t, and hkℓ is the channel coefficient between transmit antenna k and receive antenna ℓ. Observe that the second summation corresponds to the received symbol at antenna ℓ at time t in the absence of noise.
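The feedforward form of Equation (11.33) is straightforward to implement. The sketch below (ours, in Python/NumPy; the function name and the bit-matrix layout are our choices) encodes the 4-state QPSK code of Equation (11.34):

    import numpy as np

    def sttc_encode(bits, coeffs, modulus):
        """Feedforward space-time trellis encoder per Equation (11.33):
        x_t = sum_m sum_k b_{m,t-k} c_{m,k} (mod 2^M), one column per time.
        coeffs[m][k] is the length-nt coefficient vector c_{m,k}."""
        M, T = len(coeffs), bits.shape[1]
        nt = len(coeffs[0][0])
        x = np.zeros((nt, T), dtype=int)
        for t in range(T):
            for m in range(M):
                for k, c in enumerate(coeffs[m]):
                    if t - k >= 0:
                        x[:, t] += bits[m, t - k] * np.asarray(c)
        return x % modulus

    # The 4-state QPSK code of Equation (11.34): delay diversity, modulo 4.
    coeffs = [[(0, 1), (1, 0)],   # bit 1: current -> (0,1)^T, delayed -> (1,0)^T
              [(0, 2), (2, 0)]]   # bit 2: current -> (0,2)^T, delayed -> (2,0)^T
    bits = np.array([[1, 0, 1, 1],     # b_{1t}
                     [0, 1, 1, 0]])    # b_{2t}
    print(sttc_encode(bits, coeffs, modulus=4))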



Figure 11.4 8-PSK constellation diagram and labelings for the Tarokh space-time trellis code.

Hence, the Viterbi decoder computes the path through the trellis with the lowest accumulated decision metric. Note that the analysis bounding the probability of error which led to (11.30) still holds in this case. The codewords correspond to the different valid sequences through the trellis. Hence, the rank and determinant criteria developed in Section 11.3 can also be used in the context of space-time trellis codes. Chen et al. [57, 58] proposed a different criterion to be used in designing space-time trellis codes when the product of the numbers of transmitter and receiver antennas is moderately high (nr nt > 3). They use a central-limit-theorem-based argument to show that the probability of error associated with interpreting a codeword Ck as Cℓ can be bounded from above in the limit as nr → ∞:

\[
\lim_{n_r \to \infty} \Pr\left(C_k \rightarrow C_\ell\right) \le
\frac{1}{4} \exp\left(-\frac{1}{4}\, n_r\, \mathrm{SNR} \sum_{i=1}^{n_t} \lambda_i\right), \tag{11.37}
\]

where the ith squared singular value of the codeword difference matrix, λi, is as defined in Section 11.3. Thus, when the number of receiver antennas is large, maximizing the trace of the matrix Akℓ should result in a smaller probability of error. This criterion is introduced as the trace criterion by Chen et al. and is used as a design tool to identify space-time trellis codes with low probabilities of error [57, 58].


Figure 11.5 Space-time trellis code for two transmit antennas with 4-PSK and two bits per symbol. The branch labels (rows 00 01 02 03, 10 11 12 13, 20 21 22 23, and 30 31 32 33) indicate the symbol pair transmitted: label jk denotes symbol j on antenna 1 and symbol k on antenna 2.

Using this scheme, Chen et al. provided several different space-time trellis codes for QPSK and 8-PSK systems in [58]. One that is equal in complexity to Tarokh's code from Figure 11.5 and Equation (11.34) is given by the following equation, where addition is done modulo 4:

\[
x_t = b_{2(t-1)} \begin{pmatrix} 2 \\ 3 \end{pmatrix}
+ b_{1t} \begin{pmatrix} 0 \\ 2 \end{pmatrix}
+ b_{1(t-1)} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ b_{2t} \begin{pmatrix} 2 \\ 0 \end{pmatrix}. \tag{11.38}
\]

This code is shown to outperform the Tarokh code by approximately 2.1 dB when the number of antennas at the receiver is nr = 4. That is to say, the probability of error achievable with the code in Equation (11.38) is equal to the probability of error achievable with the code in Equation (11.34) at approximately 2.1 dB higher SNR. For nr = 1, however, the two codes are comparable, which is unsurprising since the Tarokh code was primarily designed for one receiver antenna. The following 8-PSK space-time trellis code was given by Chen et al. [58] and has a complexity comparable to the code given in Equation (11.35):

\[
x_t = b_{3(t-1)} \begin{pmatrix} 3 \\ 4 \end{pmatrix}
+ b_{2(t-1)} \begin{pmatrix} 2 \\ 0 \end{pmatrix}
+ b_{1(t-1)} \begin{pmatrix} 4 \\ 0 \end{pmatrix}
+ b_{1t} \begin{pmatrix} 0 \\ 4 \end{pmatrix}
+ b_{2t} \begin{pmatrix} 4 \\ 6 \end{pmatrix}
+ b_{3t} \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \tag{11.39}
\]

where the addition is done modulo 8. The performance of this code is comparable to the code given in Equation (11.35) for nr = 1. For nr = 4, however, this code is better by approximately 1.7 dB.


Figure 11.6 Bit-interleaved coded modulation block diagram: binary coding, followed by interleaving, followed by binary labeling. The jth data bit is denoted by bj, and the jth interleaved bit by cj. The transmit symbol of the kth antenna is denoted by xk.

Note that the 1.7 dB figure is obtained based on an i.i.d., circularly symmetric Gaussian channel model with constant fading across a given block. We refer the reader to Reference [58] for details and additional space-time codes that were found using the trace criterion.

11.5 Bit-interleaved coded modulation

11.5.1 Single-antenna bit-interleaved coded modulation

Zehavi introduced the idea of bit-interleaved coded modulation (BICM) as a technique to improve the performance of code diversity over fading channels through bit-wise interleaving in Reference [360]. Detailed analyses and performance criteria for BICM can also be found in Reference [90]. The basic idea behind bit-interleaved coded modulation is to spread the bits corresponding to a codeword across multiple transmit symbols and, in the case of channels fading over time, multiple channel realizations. The spreading of the bits is accomplished by rearranging the sequence of modulated bits using an interleaver with some predetermined (typically pseudorandom) sequence. The basic structure of a bit-interleaved coded modulator is shown in Figure 11.6. A block of n data bits is first encoded using some binary error-control code of rate R = n/N to produce a codeword of length N bits. The N bits of the codeword produced by the binary encoder are permuted using some permutation function π. The bits in the permuted codeword are then broken into groups of M bits and mapped into constellation points corresponding to some 2^M-ary QAM constellation. With sufficient interleaving depth, each transmitted symbol would have contributions from bits from M different codewords. Furthermore, the bits of a given modulated symbol will be effectively uncorrelated, as they belong to different codewords. If a symbol error occurs at the decoder, the resulting erroneous bits will be spread over several codewords. Hence, each codeword will have a small number of errors, increasing the chances that the errors can be corrected.



Figure 11.7 Multiantenna bit-interleaved coded modulation block diagram: binary coding, interleaving, serial-to-parallel conversion, and per-antenna modulation.

In fast-fading channels, where channel realizations change rapidly over time, interleaving provides a diversity benefit whereby the bits corresponding to a particular codeword are transmitted through multiple channel realizations.
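A minimal sketch of the interleaving step (ours; the pseudorandom-permutation construction and the helper names are assumptions, not the book's reference design):

    import numpy as np

    rng = np.random.default_rng(3)

    def make_interleaver(n, seed=12345):
        """Pseudorandom interleaver pi and its inverse for an n-bit codeword."""
        perm = np.random.default_rng(seed).permutation(n)
        inv = np.empty(n, dtype=int)
        inv[perm] = np.arange(n)
        return perm, inv

    N, M = 24, 3                      # codeword length, bits per 8-ary symbol
    code_bits = rng.integers(0, 2, N)
    perm, inv = make_interleaver(N)
    tx_bits = code_bits[perm]         # interleave before mapping to symbols
    symbols = tx_bits.reshape(-1, M)  # each row labels one constellation point
    # Receiver side: deinterleave (soft metrics in practice, hard bits here).
    assert np.array_equal(code_bits, tx_bits[inv])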

11.5.2 Multiantenna bit-interleaved coded modulation

The extension of bit-interleaved coded modulation to multiantenna systems was introduced in Reference [46]. Results from physical implementations can be found in Reference [38]. The discussion in this subsection is based on these works. For multiantenna bit-interleaved coded modulation, the coded bits are interleaved as in the SISO case, and the resulting bits are parallelized and then modulated for transmission on each antenna, as shown in Figure 11.7. Note that the binary encoder takes a string of input data bits and produces a string of coded bits bj that are then permuted using an interleaver to produce a string of bits cj. Note that cj = bk for some k that is determined by the interleaver, and the mapping from the coded bits bj to the interleaved sequence of coded bits cj is known at the receiver. Blocks of m·nt bits from the interleaver are then parallelized into nt symbols of m bits each, which are then modulated and transmitted in parallel on the nt antennas. The modulations thus use constellations of size 2^m. The transmitted signal on antenna j for a given symbol time is represented by xj. Note that we have suppressed the time index for notational simplicity. Since the bits for a given codeword are spread by the interleaver and transmitted over multiple antennas, each codeword contains bits that were transmitted through multiple channel realizations, one for each antenna and one for each coherence interval (the duration for which channel parameters remain constant) over which the coding/decoding takes place. Thus, spatial and time diversity can be obtained.



Figure 11.8 Multiantenna bit-interleaved coded modulation receiver. The received symbol on the kth antenna is denoted by yk, with the log-likelihood ratio for the bit cj given by λ(cj). The log-likelihood ratios corresponding to the jth deinterleaved bit bj are denoted by λ(bj).

A general receiver structure for a bit-interleaved coded modulation system is shown in Figure 11.8. In general, we assume that the receiver performs some variant of maximum-likelihood decoding. As such, appropriate log-likelihood ratios (which are sufficient statistics) are all that are required for decoding. To illustrate the operation of the receiver, consider the transmission of the bits c1, . . . , cmnt, which we represent in a vector c for simplicity. Suppose that these bits are encoded into nt transmit symbols corresponding to a given symbol time, which we represent by the vector x ∈ C^{nt×1}, where

\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n_t} \end{pmatrix}.
\]

Let the mapping between the m·nt bits and the constellation points be denoted by ϕ, whereby x = ϕ(c). The sampled received signals from the antennas of the receiver are contained in the vector y, where

\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n_r} \end{pmatrix} = H x + w.
\]


The matrix H ∈ C^{nr×nt} contains the narrowband fading coefficient between transmit antenna k and receive antenna j in its jkth entry, which we denote by hjk, and the vector w ∈ C^{nr×1} contains i.i.d. circularly symmetric Gaussian random variables of variance σ², that is, w ∼ CN(0, σ²I). The receiver computes a log-likelihood ratio of each ck, k = 1, . . . , m·nt, based on the received samples y. That is to say, for each ck, the receiver computes the log-likelihood ratio λ(·),

\[
\lambda(c_k) = \log\left(\frac{\Pr\left(c_k = 1 \,|\, y\right)}{\Pr\left(c_k = 0 \,|\, y\right)}\right).
\]

Assuming that the transmit symbols are equally likely, and that the entries of y are conditionally independent if x is known, using some probabilistic manipulations we can write

\[
\lambda(c_k) = \log\left(
\frac{\sum_{x \in X_{k1}} \exp\left(-\frac{\|y - Hx\|^2}{\sigma^2}\right)}
{\sum_{x \in X_{k0}} \exp\left(-\frac{\|y - Hx\|^2}{\sigma^2}\right)}\right),
\]

where Xk0 is the set of x corresponding to all combinations of c1, . . . , cmnt for which ck = 0, and Xk1 is the set of x corresponding to all combinations of c1, . . . , cmnt for which ck = 1. That is to say,

\[
X_{k0} := \{x : c_k = 0 \text{ and } x = \phi(c)\}
\]
\[
X_{k1} := \{x : c_k = 1 \text{ and } x = \phi(c)\},
\]

where the := symbol denotes a definition. The de-interleaver reorders the likelihood ratios according to the inverse of the mapping used to order the bk into the ck. Hence, λ(bk) is the likelihood ratio of bit bk, which was computed based on the symbol time corresponding to the transmission of bk. The likelihood ratios can then be used by the binary decoder to estimate the transmitted bits.
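For small nt and constellation sizes, λ(ck) can be computed exactly by enumerating all labels, as in the following sketch (ours; the direct bits-to-constellation-index mapping stands in for a particular labeling φ, and NumPy is assumed):

    import numpy as np

    def bicm_llrs(y, H, constellation, m, sigma2):
        """Exact per-bit LLRs for MIMO BICM by enumerating all m*nt-bit
        labels c and symbol vectors x = phi(c) (brute force)."""
        nt = H.shape[1]
        nbits = m * nt
        num = np.full(nbits, -np.inf)   # log-sum for c_k = 1
        den = np.full(nbits, -np.inf)   # log-sum for c_k = 0
        for label in range(2 ** nbits):
            bits = [(label >> k) & 1 for k in range(nbits)]
            # phi: each group of m bits indexes one antenna's constellation point
            x = np.array([constellation[(label >> (m * a)) % (2 ** m)]
                          for a in range(nt)])
            metric = -np.linalg.norm(y - H @ x) ** 2 / sigma2
            for k in range(nbits):
                if bits[k]:
                    num[k] = np.logaddexp(num[k], metric)
                else:
                    den[k] = np.logaddexp(den[k], metric)
        return num - den

    # Tiny example: nt = nr = 2, QPSK (m = 2), hypothetical channel draw.
    rng = np.random.default_rng(4)
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
    x = qpsk[[0, 3]]   # bit labels 00 and 11
    y = H @ x + 0.1 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
    print(np.sign(bicm_llrs(y, H, qpsk, m=2, sigma2=0.01)))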

11.5.3 Space-time turbo codes

As with other forms of codes, turbo codes can also be used in the context of space-time coding [292]. Turbo codes are forward-error-correction codes, originally introduced for SISO channels in Reference [19], that perform close to the Shannon capacity. The general idea of a turbo code in a SISO channel is that the block of information bits is encoded by using two separate encoders. The input bits to one of the encoders are an interleaved version of the input bits to the other encoder. At the receiver, iterative decoding is performed, where the bits corresponding to one of the encoders are decoded using soft decisions; that is, the decoder specifies the probability that a given bit is a 1. The decoder then uses these probabilities to decode the bits from the other encoder, yielding more soft decisions, which are in turn used to decode the bits from the first encoder. This process is iterated a number of times. The transmit side of space-time turbo-coded systems is identical in principle to Figure 11.7, with the binary encoder implementing a turbo code.


Note that the interleaver in the encoder of Figure 11.7 is not the same as the interleaver used in turbo coding. In general, a different interleaver is used in the turbo encoder/decoder. The receiver side may be described as in Figure 11.8, with the turbo decoder taking the place of the binary decoder. Note that alternative strategies for decoding turbo space-time codes that make use of iterations of estimated prior probabilities for cj have also been proposed in works such as [46]. The interested reader should consult a text on turbo coding for space-time systems, such as [136], for further details.

11.6 Direct modulation

Linear block codes can be used to perform effective space-time coding by direct modulation of the encoder output [207]. Consider a system with nt transmit antennas and a constellation of size M; that is to say, an M-ary symbol is transmitted through each antenna. Since a constellation point can represent m = log2 M bits of information, there are nt m bits of information that can be represented using m-bit constellations on nt antennas. Hence, we can construct a "space-time symbol" that can represent nt m bits of information. Now suppose that the block code is defined over a finite field (Galois field) of order q. Each symbol in a codeword can be represented by log2 q bits, and if

\[
\log_2 q = n_t m, \quad \text{that is,} \quad q = 2^{n_t m},
\]

we can directly map each codeword symbol into a space-time symbol. Perhaps the best possible error-control codes for a direct-modulation coding scheme are low-density parity-check (LDPC) codes, which are a class of capacity-achieving, linear block codes. These codes are defined by their parity-check matrices C, which have a certain structure and comprise mostly zero entries, hence the term low-density. Using posterior-probability decoding algorithms, these codes have been shown to approach the Shannon capacity to within fractions of a decibel in signal-to-noise ratio. We refer the reader to standard texts on error-control coding, such as Reference [191], for further details. Low-density parity-check codes with direct modulation can perform close to the outage capacity of MIMO links, as illustrated in Figure 11.11, in which a low-density parity-check code over GF(256) performs close to the ideal outage capacity for 4 × 4 and 2 × 2 MIMO systems. The main drawback of this technique is that it is computationally intensive, as the receiver must perform decoding over a large finite field. One notable cost is associated with the fact that simple likelihood ratios are not sufficient for the iterative decoding process, since each codeword symbol can take one of q possible values. Using these codes and Bayesian belief networks for decoding, Margetts et al., in Reference [207], find that this technique consistently outperforms the space-time trellis codes proposed by Chen et al. [57], with a computational complexity of O(ns q log q), where ns is the number of symbols per decoding block.
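The direct mapping from a GF(q) codeword symbol to a space-time symbol is just a regrouping of bits. A minimal sketch (ours; the bit ordering is an arbitrary choice):

    def symbol_to_spacetime(q_symbol, nt, m):
        """Split one GF(q) codeword symbol (q = 2^(nt*m)) into nt
        constellation indices of m bits each, one per antenna."""
        return [(q_symbol >> (m * a)) & ((1 << m) - 1) for a in range(nt)]

    # q = 256 over 4 antennas with QPSK (m = 2): 8 bits -> four 2-bit indices.
    print(symbol_to_spacetime(0b11011000, nt=4, m=2))  # -> [0, 2, 1, 3]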


11.7 Universal codes

El-Gamal and Damen [86] introduced a framework for creating full-rate, full-diversity space-time codes out of underlying algebraic SISO codes. The general approach is to divide a space-time codeword into threads over which single-input single-output codes are used. Each thread is associated with a SISO codeword and defines the antenna and time slot over which symbols corresponding to the codeword are transmitted. Consider a space-time system with nt transmit antennas with a codeword spanning ns time slots. A thread refers to a set of pairs of antenna and time-slot indices such that all antennas (numbered 1, 2, . . . , nt) and time slots (numbered 1, 2, . . . , ns) appear, no two threads have identical antenna and time-slot pairs, and the nt antennas appear an equal number of times in each thread. These requirements ensure that each thread is active during every time slot, each thread uses all of the antennas equally, and at any given time slot, there is at most one thread active on each antenna. A simple example of a thread for a system with ns = nt, which we denote by ℓ1, is

\[
\ell_1 = \{(1, 1), (2, 2), (3, 3), \ldots, (n_t, n_t)\}, \tag{11.40}
\]

where the kth antenna is used at the kth time slot. From [86], by offsetting the antenna indices and incrementing them modulo nt, we arrive at the following generalization of the above thread for 1 ≤ j ≤ L ≤ nt:

\[
\ell_j = \{(\mathrm{mod}(j, n_t) + 1,\, 1),\ (\mathrm{mod}(j + 1, n_t) + 1,\, 2),\ \ldots,\ (\mathrm{mod}(j + n_s - 1, n_t) + 1,\, n_s)\}.
\]

As an example, consider a system with nt = ns = L = 4. The threads given by the expression above are shown in Figure 11.9. In conjunction with this framework, Reference [86] introduces a new class of space-time codes called threaded algebraic space-time (TAST) codes, for which an example that achieves full diversity is provided. The space-time codewords for this class of space-time codes are generated as follows. Suppose that there are K information-bearing symbols (for example, bits) in a vector u. For simplicity, let us assume that K/L is an integer. Partition the entries of u into L length-K/L vectors uj for j = 1, 2, . . . , L (note that this can be generalized to nonequal uj). A SISO code that achieves full diversity (called a component code) is used to encode the vector uj into a length-ns vector sj. The component codes could in general be different for each thread. If the component codes are the same on each thread, the threaded space-time code is called symmetric. Each codeword sj is multiplied by a coefficient φj. Multiplication by these coefficients enables the codewords to occupy independent algebraic subspaces. The coefficients φj are defined as

\[
\phi_1 = 1,\quad \phi_2 = \phi^{\frac{1}{n_t}},\quad \ldots,\quad \phi_L = \phi^{\frac{L-1}{n_t}}.
\]


Figure 11.9 Threads for universal space-time coding with four antennas, four time slots, and four threads. The horizontal and vertical dimensions represent the time slots and antennas, respectively. The numbers correspond to the threads. For illustration, the antenna and time-slot pairs corresponding to thread three are shaded.
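The thread construction is easy to generate programmatically. The following sketch (ours) uses the antenna index mod(j + t − 2, nt) + 1 for thread j at slot t, a reading of the offset formula that is consistent with Equation (11.40) and reproduces the assignment of Figure 11.9:

    def threads(nt, ns, L):
        """Thread l_j: time slot t uses antenna mod(j + t - 2, nt) + 1."""
        return [[((j + t - 2) % nt + 1, t) for t in range(1, ns + 1)]
                for j in range(1, L + 1)]

    # Reproduce the nt = ns = L = 4 grid of Figure 11.9: entry [ant][slot]
    # holds the index of the thread using that antenna in that slot.
    nt = ns = L = 4
    grid = [[0] * ns for _ in range(nt)]
    for j, thread in enumerate(threads(nt, ns, L), start=1):
        for ant, slot in thread:
            grid[ant - 1][slot - 1] = j
    for row in grid:
        print(row)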

A space-time formatter is used to assign the codeword φj sj to the jth thread, which determines the sequence of antenna and time-slot pairs over which the samples of the vector φj sj are transmitted. The number φ is carefully chosen and depends on the structure of the component codes that are used. Several examples of codes and corresponding φs are presented in [86]. These φj lie in linearly independent algebraic subspaces and, hence, φj sj and φk sk for k ≠ j lie in different algebraic subspaces and essentially do not interfere with each other in the appropriate algebraic space. As is shown in [86], one can find values of φ that achieve full diversity. Additionally, by choosing the number of threads L equal to the minimum of the number of antennas at the transmitter or receiver, that is, L = min(nt, nr), it is shown in Reference [86] that full rate can be achieved. Furthermore, with this choice of the number of threads L, maximum-likelihood decoding can be performed using a sphere decoder with polynomial complexity at moderate SNRs. We refer the interested reader to the original source, Reference [86], or a text specializing in space-time coding, such as Reference [157], for details.

D-BLAST

The Bell Labs layered space-time (BLAST) architecture is a family of space-time transmission schemes that were developed at Bell Labs. These schemes can be described by using the universal space-time coding framework discussed in the previous section. Diagonal BLAST (D-BLAST) [99] is the first of these. The D-BLAST scheme uses a diagonal threading scheme, as depicted in Figure 11.10, with coefficients φ1 = φ2 = · · · = φL = 1, where L is the number of threads. Each thread is at least nt symbols long and, hence, is transmitted through all antennas.



Figure 11.10 Antenna/time-slot assignments for Diagonal BLAST. Three threads are shown in this example. The shaded regions correspond to unused time/antenna combinations.

Since each thread includes transmissions over each of the nt transmitter antennas, and each symbol is received by nr receiver antennas, the full diversity of nt nr is achievable with proper coding by using D-BLAST. D-BLAST requires computationally intensive decoding, as the receiver has to perform joint maximum-likelihood decoding of all the streams. A simpler method of decoding is to use vertically aligned layers, in a method known as Vertical BLAST (V-BLAST) [353, 114]. Note that V-BLAST reduces to simply transmitting independent data on each antenna; that is to say, the codewords are not transmitted across multiple antennas.

11.8 Performance comparisons of space-time codes

For single-carrier systems, it is convenient to compare the performance of different space-time codes in terms of their distance from the outage capacity. In Figure 11.11, various space-time codes are compared to outage capacities assuming a 90% probability of closure. As an example, the 2 × 2 Alamouti space-time code sits a little under 6 dB off the 2 × 2 outage capacity.¹

Figure 11.11 Outage capacity for a 2 × 2 and 4 × 4 MIMO channel as a function of average SNR per receive antenna (a² Po), assuming a 90% probability of link closure (10% outage). The performance of various space-time codes is compared: Alamouti's code, bit-interleaved coded modulation (BICM), a 64-state space-time trellis code, and direct-modulation Galois-field low-density parity-check (LDPC) space-time codes with 16 and 256 symbols. Courtesy of Adam Margetts and Nicholas Chang.

¹ These results are courtesy of Adam Margetts and Nicholas Chang.

11.9 Computations versus performance



Since maximum-likelihood decoding of space-time codes is computationally prohibitive, it is common practice to use suboptimal space-time coding schemes, which are computationally efficient but may have suboptimal performance. A useful method to compare the performance of suboptimal space-time coding schemes is by using a metric called the excess SNR, which was introduced in [55]. The excess SNR is defined as the additional SNR required for a given suboptimal coding scheme to achieve the same frame error rate (FER) as the optimal coding scheme. Thus, the excess SNR for a given coding scheme characterizes the amount of additional transmit power required for that coding scheme to achieve a frame error rate equal to that of the optimal coding scheme. In Figure 11.11, the excess SNR corresponds to the difference in decibels between the specific coding schemes (for example, 4 × 4 bit-interleaved coded modulation) and the related system bound. In Figure 11.12, the excess SNR versus computational complexity for various space-time coding schemes is shown [55]. The vertical axis illustrates the gap between the ideal SNR required to achieve 10% outage and the SNR required to achieve the same outage probability for a particular coding scheme. The horizontal axis indicates the number of floating-point operations required per information bit.

Figure 11.12 Excess SNR versus computational complexity for space-time trellis codes, bit-interleaved coded modulation, and direct modulation. The code used to generate this figure is courtesy of Adam Margetts and Nicholas Chang.



Note that the direct LDPC modulation using GF(256) can achieve an excess SNR of less than 1 dB. This near-optimal performance comes at the significant computational cost of approximately 8 × 10⁴ floating-point operations per information bit.

Problems

11.1 Using Monte Carlo simulations to generate channel matrices H, plot the empirical outage probability versus r for a system with two transmit and two receive antennas, nr = nt = 2, utilizing Equation (11.11). You may use an SNR of 10 dB.

11.2 Evaluate the diversity order of a SIMO system in a Rayleigh-faded, additive white Gaussian noise channel when the receiver uses a spatial matched-filter receiver, as described in Section 9.2.1.

11.3 Consider a MIMO system with nt = nr = 2 antennas at the transmitter and receiver. Assume that the Alamouti scheme is used to encode transmissions, and let hjk denote the channel coefficient between the jth transmitter antenna and the kth receiver antenna. Additionally, let zjk be the sampled received signal on the jth antenna at the kth time slot, and write the following:

\[
\hat{s}_1 = h_{11}^* z_{11} + h_{12} z_{12}^* + h_{21}^* z_{21} + h_{22} z_{22}^*
\]
\[
\hat{s}_2 = h_{12}^* z_{11} - h_{11} z_{12}^* + h_{22}^* z_{21} - h_{21} z_{22}^*.
\]


By comparing the equations above with those of a SIMO system with four receiver antennas, show that a diversity order of 4 is achievable with the Alamouti scheme and two receiver antennas.

11.4 Consider a space-time coding system with nt = 2 transmit antennas, nr = 2 receive antennas, and coding performed over ns = 2 symbol times. Let the codewords be as follows:

\[
C_1 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad
C_2 = \begin{pmatrix} 1 & -j \\ j & j \end{pmatrix}, \quad
C_3 = \begin{pmatrix} 1 & 1+j \\ 1 & 1-j \end{pmatrix}, \quad
C_4 = \begin{pmatrix} 1-j & -j \\ 1-j & j \end{pmatrix}. \tag{11.41}
\]

Using the determinant criterion, find the maximum diversity gain achievable using this space-time code.

11.5 Using the constellation diagram in Figure 11.3 and the space-time trellis code given in Figure 11.5, list the transmitted symbols from each transmit antenna due to the following sequence of bits: 10 11 11 01 10. You should start at state zero.

11.6 Use the determinant criterion to compute the diversity gain of the Alamouti code with nr receiver antennas.

11.7 Perform a Monte Carlo simulation of an Alamouti space-time coding system with quadrature phase-shift keying (QPSK) symbols and single-antenna receivers. Show that the diversity order is what you expect by plotting the logarithm of the error probability at high SNR.

11.8 Show that the 4 × 4 space-time block code described by the real orthogonal generator matrix in (11.28) has full diversity.

11.9 Explain why the requirement that all antennas are used in any given thread in the universal space-time code framework described in Section 11.7 results in full diversity gain.

12 2 × 2 Network

12.1 Introduction

In this chapter, we analyze the performance of networks with two multiantenna transmit nodes and two multiantenna receive nodes. The canonical 2 × 2 network is illustrated in Figure 12.1. Transmitter 1, equipped with nt1 antennas, wishes to communicate with receiver 1, which has nr1 antennas, and transmitter 2, equipped with nt2 antennas, wishes to communicate with receiver 2, which has nr2 antennas. The signal from transmitter 1 acts as interference to receiver 2, and vice versa. Even for this simple network, fundamental capacity results are still unknown. For instance, the capacity region of the 2 × 2 network, even in the SISO case under general assumptions, is unknown. For certain special cases, it is possible to derive the capacity of such channels, in particular when the interfering signals are strong, as in References [52, 272] and [282]. Most works in the literature have focused on deriving outer bounds to the capacity region, such as References [177, 224, 10] and [89] for SISO systems, and References [243] and [281] for MIMO systems. Achievable rates of such networks under different sets of assumptions, such as in References [135, 271, 59] and [283], have also been found. Additionally, in Reference [89], the capacity region of the SISO Gaussian interference channel is derived to within one bit/second/Hz using an achievable rate region based on the Han–Kobayashi scheme introduced in Reference [135], and on novel outer bounds. Recently, the interference channel has been analyzed in the high-SNR regime, where interference alignment, introduced in Reference [50], has been shown to provide enormous network-wide performance improvements. The 2 × 2 interference channel can also be analyzed in the context of a cognitive radio link, whereby one of the links, designated as the cognitive link, needs to operate without disrupting an existing legacy link. The 2 × 2 network is useful in the context of larger networks as well. Since the overhead associated with cooperation, particularly for multiantenna networks, can be quite significant, real-world implementations of multiantenna networks will likely have cooperation between only a limited number of nodes. Additionally, it is worth noting that the simpler 2 × 1 and 1 × 2 Gaussian channels are described in the context of broadcast and multiple-access channels in Chapter 13.


Figure 12.1 2 × 2 MIMO channel. Solid arrows indicate signal paths and dashed arrows indicate interference paths.

The most common approach to analyzing the capacity of communication systems is to first find an upper bound to the capacity and show that the upper bound is achievable. As of this writing, achievable upper bounds to the capacity region of the interference channel for a general set of parameters are not known, but upper bounds that are achievable to within one bit are known through the findings of Etkin et al. in Reference [89]. A thorough treatment of this subject requires detailed information-theoretic arguments that are beyond the scope of this text. However, we summarize some general techniques used for finding the upper bound to the capacity region of the 2 × 2 interference network in this section and leave the motivated reader to consult a text on information theory such as Reference [68] for the details. We start by discussing the achievable rates of the 2×2 MIMO network followed by discussing upper bounds to the capacity region. We then analyze the 2 × 2 network in the cognitive-radio context whereby one of the links is assumed to be a legacy link and the other, a cognitive link which is not allowed to disrupt the legacy link.

12.2 Achievable rates of the 2 × 2 MIMO network

12.2.1 Single-antenna Gaussian interference channel

The general capacity region of the 2 × 2 network with single-antenna nodes has been an open problem for a long time in the field of information theory [68].


When the noise at each receiver is Gaussian, the 2 × 2 network is known as the Gaussian interference channel. The sampled received signals at receivers 1 and 2 of a narrowband, flat-fading, Gaussian interference channel can be respectively represented as follows:

\[
z_1 = h_{11} s_1 + h_{21} s_2 + n_1
\]
\[
z_2 = h_{22} s_2 + h_{12} s_1 + n_2,
\]

where hjk is the channel between transmitter j and receiver k, the nj are CN(0, σ²) random variables representing noise, and sj is the transmitted sample of transmitter j. The Gaussian interference channel assumes that the data transmitted by the jth transmitter are intended for the jth receiver only. Let the communication rate between transmitter j and receiver j be represented by Rj. The capacity region of this network is the set of rate pairs (R1, R2) for which communication at arbitrarily low probability of error is possible, subject to the power constraints¹ ⟨|sj|²⟩ = Pj and Pj ≤ P, for j = 1, 2.

Han–Kobayashi scheme

The Han–Kobayashi scheme is known to achieve rates within 1 b/s/Hz of the capacity region of the Gaussian interference channel, as shown in Reference [89]. The basic idea behind this scheme is that each transmitter partitions its data into two separate streams: a common or public stream that is intended to be decoded by both receivers, and a private stream that is intended to be decoded by just the target receiver. By dividing the transmit power (and hence data rates) appropriately between common and private streams, partial interference cancellation can be performed by each receiver. Suppose that the powers allocated by the jth transmitter to its private and common streams are Ppj and Pcj, and the rates of the private and common streams of link j are Rpj and Rcj, respectively. Additionally, suppose that the jth private and common symbols are spj and scj, and the jth transmitter transmits spj + scj. The sampled signals at receivers 1 and 2 are thus

\[
z_1 = h_{11} s_{c1} + h_{11} s_{p1} + h_{21} s_{c2} + h_{21} s_{p2} + n_1 \tag{12.1}
\]
\[
z_2 = h_{22} s_{c2} + h_{22} s_{p2} + h_{12} s_{c1} + h_{12} s_{p1} + n_2. \tag{12.2}
\]

The jth receiver decodes the common streams from both transmitters and subtracts the contribution of the common streams before decoding the private stream from the jth transmitter, treating the private stream from transmitter k ≠ j as noise. Thus, the data rates on the private streams must satisfy

\[
R_{p1} < \log_2\left(1 + \frac{P_{p1} |h_{11}|^2}{P_{p2} |h_{21}|^2 + \sigma^2}\right) \tag{12.3}
\]
\[
R_{p2} < \log_2\left(1 + \frac{P_{p2} |h_{22}|^2}{P_{p1} |h_{12}|^2 + \sigma^2}\right). \tag{12.4}
\]

¹ For expedience, we break with our own convention and use | · | to represent the absolute value in this chapter.



Figure 12.2 Rate region of common streams for Han–Kobayashi system, case 1.

The common streams are to be decoded by both receivers. For a given receiver, we can model the common streams as a channel between the two transmitters and the given receiver, with the private streams treated as noise. Hence, we can model this portion of the system as a multiple-access channel, for which the capacity region is known and discussed in more detail in Section 13.2. Thus, the common rates Rc1, Rc2 must fall into the intersection of two multiple-access channel capacity regions, each of which is a pentagon. The intersection of the two pentagons can take several different forms depending on the parameters of the system. The possible intersections (excluding cases where one capacity region is a subset of the other, and cases that can be constructed by reversing the roles of the transmit–receive pairs) are illustrated in Figures 12.2 to 12.4. For the common messages to be decoded with arbitrarily low probability of error by both receivers, the set of common rates Rc1, Rc2 must belong to the intersection of the two pentagons illustrated using the solid and bold lines. The three figures represent the different ways that the two multiple-access channel capacity regions can intersect. The achievable rate region using the Han–Kobayashi scheme is the union over all valid power allocations of all rate pairs (R1 = Rp1 + Rc1, R2 = Rp2 + Rc2) for which (Rc1, Rc2) fall into one of the rate regions above, with the private rates satisfying inequalities (12.3) and (12.4).
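The private-stream rates (12.3)-(12.4) are simple to tabulate for a candidate power split. A minimal sketch (ours; the channel magnitudes and the fixed split on link 2 are hypothetical):

    import numpy as np

    def hk_private_rates(h, Pp, sigma2=1.0):
        """Private-stream rates (12.3)-(12.4): each receiver decodes its
        private stream treating the other link's private stream as noise.
        h[j][k] = channel from transmitter j+1 to receiver k+1."""
        r1 = np.log2(1 + Pp[0] * abs(h[0][0]) ** 2
                     / (Pp[1] * abs(h[1][0]) ** 2 + sigma2))
        r2 = np.log2(1 + Pp[1] * abs(h[1][1]) ** 2
                     / (Pp[0] * abs(h[0][1]) ** 2 + sigma2))
        return r1, r2

    # Sweep the private/common power split on link 1 (link 2 fixed), P = 10.
    h = [[1.0, 0.4], [0.3, 1.0]]
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(alpha, hk_private_rates(h, Pp=[alpha * 10.0, 5.0]))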



Figure 12.3 Rate region of common streams for Han–Kobayashi system, case 2.


Figure 12.4 Rate region of common streams for Han–Kobayashi system, case 3.

Note that in the example given above, we have illustrated a successive decoding scheme where the common and private messages are decoded in a particular sequence. In general, however, better performance (that is, larger achievable rates) could be obtained by joint decoding of the common and private streams.


We refer the reader to references such as Reference [59], which provides a relatively compact description of the Han–Kobayashi achievable rate region when joint decoding is used.

12.2.2 Achievable rates of the MIMO interference channel

While Han–Kobayashi-type achievable rate regions can be constructed for the MIMO interference channel, their construction is significantly more complicated than in the SISO case, as described in Reference [283], on which the development in this subsection is based. The added complication arises from the fact that for MIMO channels, all the transmit covariance matrices have to be jointly optimized, compared to the SISO interference channel, where only the joint power allocations have to be optimized. Consider the following two equations, which describe the MIMO Gaussian interference channel:

\[
z_1 = H_{11} s_1 + H_{21} s_2 + n_1 \tag{12.5}
\]
\[
z_2 = H_{22} s_2 + H_{12} s_1 + n_2. \tag{12.6}
\]

The vectors z1 ∈ C^{nr1×1} and z2 ∈ C^{nr2×1} are the sampled received signals at the antennas of receivers 1 and 2, respectively. The signals transmitted by transmitters 1 and 2 are s1 ∈ C^{nt1×1} and s2 ∈ C^{nt2×1}, respectively, and the vectors n1 ∈ C^{nr1×1} and n2 ∈ C^{nr2×1} represent i.i.d. circularly symmetric complex Gaussian noise at the antennas of each receiver. Suppose that each transmitter partitions its transmit data into two independent streams: a common stream to be decoded at both receivers and a private stream to be decoded at the target receiver only. Let sc1 and sc2 be the transmitted signals at a given sample time that encode the common data from transmitters 1 and 2, and let sp1 and sp2 be the transmitted signals at a given sample time that encode the private data from transmitters 1 and 2, respectively. For the SISO case, we defined power allocations corresponding to these four streams. In the MIMO case, however, power allocations alone will not suffice, as the covariance matrices of the transmitted signals influence the spatial structure of the signals and interference. Hence, we need to define covariance matrices associated with the signals encoding the private and common data streams for each transmitter. Let the following respectively denote the covariance matrices associated with the common streams of transmitters 1 and 2 and the private streams of transmitters 1 and 2:

\[
K_{1c} = \left\langle s_{c1} s_{c1}^{\dagger} \right\rangle, \quad
K_{2c} = \left\langle s_{c2} s_{c2}^{\dagger} \right\rangle, \quad
K_{1p} = \left\langle s_{p1} s_{p1}^{\dagger} \right\rangle, \quad
K_{2p} = \left\langle s_{p2} s_{p2}^{\dagger} \right\rangle.
\]


Recall that R1c and R2c are the rates associated with the common data streams from transmitters 1 and 2, respectively. Similarly, recall that R1p and R2p are the rates associated with the private streams of transmitters 1 and 2, respectively. The various rates and covariance matrices need to satisfy certain requirements so that the common data streams are decodable at both receivers for a given decoding order. For all choices of decoding order, the private rates need to satisfy

\[
R_{1p} < \log\left|\, I + H_{11} K_{1p} H_{11}^{\dagger} \left(\sigma^2 I + H_{21} K_{2p} H_{21}^{\dagger}\right)^{-1} \right|
\]
\[
R_{2p} < \log\left|\, I + H_{22} K_{2p} H_{22}^{\dagger} \left(\sigma^2 I + H_{12} K_{1p} H_{12}^{\dagger}\right)^{-1} \right|.
\]

In the previous two expressions, observe that in decoding the private streams, each receiver only sees interference from the private stream corresponding to the other transmitter, as the common streams have all been decoded and subtracted out by the time the private messages are decoded. We can write different sets of inequalities corresponding to the different decoding orders of the common streams. For instance, suppose that receiver 1 decodes its common stream before decoding the common stream from transmitter 2, and, likewise, receiver 2 decodes its common stream before decoding the common stream from transmitter 1. Then the rates and covariance matrices must satisfy the following requirements. For receiver 1 to be able to decode the common stream from transmitter 1, we need

\[
R_{1c} < \log\left|\, I + H_{11} K_{1c} H_{11}^{\dagger} \left(\sigma^2 I + H_{11} K_{1p} H_{11}^{\dagger} + H_{21} \left(K_{2p} + K_{2c}\right) H_{21}^{\dagger}\right)^{-1} \right|. \tag{12.7}
\]

Observe that the matrix that is inverted in the previous expression contains contributions from the noise power, the private stream from transmitter 1, and the private and common streams from transmitter 2. For receiver 2 to be able to decode the common stream from transmitter 1, we need

\[
R_{1c} < \log\left|\, I + H_{12} K_{1c} H_{12}^{\dagger} \left(\sigma^2 I + H_{12} K_{1p} H_{12}^{\dagger} + H_{22} K_{2p} H_{22}^{\dagger}\right)^{-1} \right|. \tag{12.8}
\]

Observe that the matrix that is inverted in the previous expression contains contributions from the noise power, the private stream from transmitter 1, and the private stream from transmitter 2. Note that the common stream from transmitter 2 does not contribute to the above expression, as it is assumed to have been decoded before the receiver decodes the common stream from transmitter 1. Likewise, for receiver 1 to be able to decode the common stream from transmitter 2, we require that

\[
R_{2c} < \log\left|\, I + H_{21} K_{2c} H_{21}^{\dagger} \left(\sigma^2 I + H_{11} K_{1p} H_{11}^{\dagger} + H_{21} K_{2p} H_{21}^{\dagger}\right)^{-1} \right|, \tag{12.9}
\]


and for receiver 2 to be able to decode the common stream from transmitter 2, we need

\[
R_{2c} < \log\left|\, I + H_{22} K_{2c} H_{22}^{\dagger} \left(\sigma^2 I + H_{22} K_{2p} H_{22}^{\dagger} + H_{12} \left(K_{1p} + K_{1c}\right) H_{12}^{\dagger}\right)^{-1} \right|. \tag{12.10}
\]

Note that inequalities (12.7) to (12.10) refer to the specific case of the receivers decoding their respective common streams first, followed by the other common stream. One may write corresponding equations for other decoding orders. Thus, one can construct an achievable rate region of the MIMO interference channel as the convex hull of the rate pairs R1 = R1c + R1p and R2 = R2c + R2p. The convex hull is taken over all possible decoding orders of the common streams and over all possible covariance matrices K1c, K2c, K1p, and K2p that satisfy the requirements of their respective decoding orders. Furthermore, the covariance matrices must respect the following power constraints:

\[
\mathrm{trace}\left(K_{1c} + K_{1p}\right) \le P_1
\]
\[
\mathrm{trace}\left(K_{2c} + K_{2p}\right) \le P_2,
\]

where P1 and P2 are the power constraints on transmitters 1 and 2, respectively. Since the achievable rate region depends on the covariance matrices in a complicated way, visualizing the achievable rate region described above is difficult. Most works in the literature that deal with the capacity region of MIMO interference channels consider either the sum capacity (for example, Reference [283]) or specific regimes of operation. For instance, in Reference [282], the capacity is found for the case in which the interference is strong enough that it can be decoded perfectly and then subtracted out. Note that, as in the SISO case, joint decoding of the common and private streams by the receivers can improve the achievable rates compared to the sequential decoding described.
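Each of the inequalities (12.7)-(12.10) shares one log-det form, which the following sketch evaluates (ours; the isotropic covariance choice and the random channel draw are assumptions, not an optimized allocation):

    import numpy as np

    def rate_bound(Hs, Ks, Hi_list, Ki_list, sigma2=1.0):
        """log2 det(I + Hs Ks Hs^H (sigma^2 I + sum_i Hi Ki Hi^H)^{-1}),
        the form shared by (12.7)-(12.10) for a chosen decoding order."""
        nr = Hs.shape[0]
        interf = sigma2 * np.eye(nr, dtype=complex)
        for Hi, Ki in zip(Hi_list, Ki_list):
            interf += Hi @ Ki @ Hi.conj().T
        M = np.eye(nr) + Hs @ Ks @ Hs.conj().T @ np.linalg.inv(interf)
        return np.log2(np.linalg.det(M).real)

    # Check (12.7) for a random 2x2 draw: the common stream of transmitter 1
    # at receiver 1, with its own private stream and both of link 2's streams
    # (private and common) treated as interference.
    rng = np.random.default_rng(5)
    H11, H21 = (rng.standard_normal((2, 2, 2))
                + 1j * rng.standard_normal((2, 2, 2))) / np.sqrt(2)
    K = 0.5 * np.eye(2)   # stands in for each of K1c, K1p, K2c, K2p
    print(rate_bound(H11, K, [H11, H21, H21], [K, K, K]))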

12.3 Outer bounds of the capacity region of the Gaussian MIMO interference channel

The achievable rate regions described in the previous section can be combined with appropriate outer bounds in order to characterize the capacity region of the 2 × 2 network. Outer bounds to the capacity region are discussed in this section, and the discussion is based on the pioneering work of Etkin et al. [89], who originally derived these bounds and showed that the Han–Kobayashi scheme described in the previous section is within one bit of the outer bounds.

12.3.1 Outer bounds to the capacity region of the single-antenna Gaussian interference channel


channels, and/or to use genie-aided methods. Genie-aided methods refer to a class of methods in which one or more users is given information that it would not normally have access to. For instance, for the interference channel described by Equations (12.1) and (12.2), a possible genie-aided system would be one where receiver 1 knows s2, information that could be provided to receiver 1 by a genie. We refer the motivated reader to the original source, Reference [89], for details. An outer bound to the capacity region of the Gaussian interference channel can be described by a set of inequalities. For each of the inequalities, we shall summarize the general techniques used to find these bounds.

Strong interference bounds

The first set of bounds that can be written are based on interference-free communication. These bounds are achievable if h12 and h21 are zero, in which case there is no interference. Another scenario is the so-called strong interference regime, where the received interference power is greater than the received signal power, that is,

P2 |h21|² > P1 |h11|²   (12.11)

and

P1 |h12|² > P2 |h22|² ,   (12.12)

where the transmit powers of transmitters 1 and 2 are P1 and P2 respectively. When Equations (12.11) and (12.12) hold, the rates must satisfy

R1 ≤ log2 (1 + P1 |h11|²/σ²)   (12.13)

R2 ≤ log2 (1 + P2 |h22|²/σ²) ,   (12.14)

which is the set of rates achievable as if the interfering paths did not exist, that is, h12 = h21 = 0. In the high-interference case, the capacity region is the intersection of two multiple-access channel capacity regions, each corresponding to the multiple-access channel formed by the two transmitters and one of the receivers.

One-sided interference channel bounds

A second pair of bounds can be written by using two genie-aided channels. In the first, a genie provides s2 to receiver 1, and in the second a genie provides s1 to receiver 2. Since the latter is simply the former case with roles reversed, it is sufficient to analyze just one of the cases. A genie-aided system in which receiver 1 knows the transmitted signal of transmitter 2, s2 (presumably revealed by a genie), is equivalent to a one-sided interference channel, which is depicted in Figure 12.5, since receiver 1 can subtract out the signal from transmitter 2.
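Before moving on, the strong-interference conditions (12.11)-(12.12) and the interference-free bounds (12.13)-(12.14) above can be checked numerically. The gains and powers below are assumed example values, not values from the text.

```python
# Sketch: check the strong-interference regime and the interference-free bounds.
import numpy as np

P1, P2, sigma2 = 1.0, 1.0, 0.1
h11, h22 = 1.0, 1.0      # direct channel gains (hypothetical)
h21, h12 = 1.5, 1.4      # cross channel gains (hypothetical)

strong = (P2 * abs(h21)**2 > P1 * abs(h11)**2) and \
         (P1 * abs(h12)**2 > P2 * abs(h22)**2)     # (12.11)-(12.12)

R1_max = np.log2(1 + P1 * abs(h11)**2 / sigma2)    # (12.13)
R2_max = np.log2(1 + P2 * abs(h22)**2 / sigma2)    # (12.14)
print("strong interference:", strong, " R1 <=", R1_max, " R2 <=", R2_max)
```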


Figure 12.5 One-sided interference channel.

Since the case P2 |h21|² > P1 |h11|² is already treated by the bounds in Equations (12.11) and (12.12), it is sufficient to consider the case where P2 |h21|² < P1 |h11|². The sum capacity of the one-sided interference channel when P2 |h21|² < P1 |h11|² was found by Sason in Reference [271] for the general one-sided interference channel (that is, with the noise not necessarily Gaussian). For the case of Gaussian noise, the sum capacity is bounded from above by

log2 (1 + P1 |h11|²/σ²) + log2 (1 + P2 |h22|²/(σ² + P1 |h12|²)) .

Thus, one can write the following two bounds on the sum rate R1 + R2 of the 2 × 2 Gaussian interference channel,

R1 + R2 ≤ log2 (1 + P1 |h11|²/σ²) + log2 (1 + P2 |h22|²/(σ² + P1 |h12|²))

R1 + R2 ≤ log2 (1 + P2 |h22|²/σ²) + log2 (1 + P1 |h11|²/(σ² + P2 |h21|²)) .

Note that the second bound is simply the first with the roles of links 1 and 2 reversed.

Noisy-interference bounds

A third type of bound can be found using a different genie-aided channel in which the genie reveals to a particular receiver a noisy version of the interference that its own transmitter causes at the other receiver.


Figure 12.6 Genie-aided interference channel with interference plus noise revealed to cross receivers.

This bound can be explicitly described by defining the following variables, which represent the interference plus noise seen at the opposing receiver:

v1 = h12 s1 + n2
v2 = h21 s2 + n1 .

The genie reveals the noisy interference at receiver 2, v1, to receiver 1 and the noisy interference at receiver 1, v2, to receiver 2. Note that v1 is the interference plus noise seen at receiver 2 and v2 is the interference plus noise seen at receiver 1. This channel, where the broken lines represent information provided by the genie, is illustrated in Figure 12.6. This type of genie-aided network is different from the traditionally used genie-aided network in that the information provided by the genie cannot be used by any one node to perfectly cancel out interference. This technique provides a useful bound to the sum capacity in certain regimes. Using detailed information-theoretic techniques, it can be shown that an upper bound on the sum rate can be written as

R1 + R2 ≤ log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²)) + log2 (1 + P1 |h12|²/σ² + P2 |h22|²/(σ² + P2 |h21|²)) .

Again, we refer the reader to Reference [89] for details of the derivation.

Noisy-interference bounds with additional receiver

A fourth type of bound can be found by using a similar genie-aided system as before, but introducing a second receiver for link 1 that does not have the aid of the genie. This channel is depicted in Figure 12.7.


Figure 12.7 Genie-aided interference channel with interference plus noise revealed to cross receiver and an additional receiver without aid of the genie.

For this type of network, it can be shown that the sum rate, including the rate achieved at the additional receiver, satisfies

2R1 + R2 ≤ log2 (1 + (P1 |h11|² + P2 |h21|²)/σ²) + log2 ((σ² + P1 |h11|²)/(σ² + P1 |h12|²))
          + log2 (1 + P1 |h12|²/σ² + P2 |h22|²/(σ² + P2 |h21|²)) .   (12.15)

A similar bound can be written if an additional receiver 2 were introduced instead of an additional receiver 1, as follows:

R1 + 2R2 ≤ log2 (1 + (P2 |h22|² + P1 |h12|²)/σ²) + log2 ((σ² + P2 |h22|²)/(σ² + P2 |h21|²))
          + log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²)) .   (12.16)

Thus, we can say that every rate pair in the capacity region of the two-user Gaussian interference channel, when both interference channels are weaker than the corresponding direct channels, that is,

P2 |h21|² < P2 |h22|²

P1 |h12|² < P1 |h11|² ,


must simultaneously satisfy all of the following inequalities:

R1 ≤ log2 (1 + P1 |h11|²/σ²)   (12.17)

R2 ≤ log2 (1 + P2 |h22|²/σ²)   (12.18)

R1 + R2 ≤ log2 (1 + P1 |h11|²/σ²) + log2 (1 + P2 |h22|²/(σ² + P1 |h12|²))   (12.19)

R1 + R2 ≤ log2 (1 + P2 |h22|²/σ²) + log2 (1 + P1 |h11|²/(σ² + P2 |h21|²))   (12.20)

R1 + R2 ≤ log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²))   (12.21)
          + log2 (1 + P1 |h12|²/σ² + P2 |h22|²/(σ² + P2 |h21|²))   (12.22)

2R1 + R2 ≤ log2 (1 + (P1 |h11|² + P2 |h21|²)/σ²) + log2 ((σ² + P1 |h11|²)/(σ² + P1 |h12|²))
          + log2 (1 + P1 |h12|²/σ² + P2 |h22|²/(σ² + P2 |h21|²))   (12.23)

R1 + 2R2 ≤ log2 (1 + (P2 |h22|² + P1 |h12|²)/σ²) + log2 ((σ² + P2 |h22|²)/(σ² + P2 |h21|²))
          + log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²)) .   (12.24)

For the case where one of the interference channels is weaker than the direct channel but the other is not, that is,

P2 |h21|² ≥ P2 |h22|²

P1 |h12|² < P1 |h11|² ,

we can write the following bounds that all rate pairs must satisfy:

R1 ≤ log2 (1 + P1 |h11|²/σ²)   (12.25)

R2 ≤ log2 (1 + P2 |h22|²/σ²)   (12.26)

R1 + R2 ≤ log2 (1 + P1 |h11|²/σ²) + log2 (1 + P2 |h22|²/(σ² + P1 |h12|²))   (12.27)

R1 + R2 ≤ log2 (1 + (P1 |h11|² + P2 |h21|²)/σ²)   (12.28)

R1 + 2R2 ≤ log2 (1 + P2 |h22|²/σ²) + log2 (1 + P1 |h12|²/σ² + P2 |h22|²/(σ² + P2 |h21|²))
          + log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²)) .   (12.29)
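The weak-interference outer bounds (12.17)-(12.24) are simple enough to collect into a single routine. The sketch below assembles them for scalar channel gains; the example values are arbitrary assumptions chosen to satisfy the weak-interference conditions.

```python
# Sketch: assemble the weak-interference outer bounds (12.17)-(12.24).
import numpy as np

def weak_interference_outer_bounds(P1, P2, h11, h12, h21, h22, sigma2):
    s1, s2 = P1*abs(h11)**2, P2*abs(h22)**2          # received signal powers
    i1, i2 = P2*abs(h21)**2, P1*abs(h12)**2          # received interference powers
    lg = np.log2
    return {
        "R1":      lg(1 + s1/sigma2),                                   # (12.17)
        "R2":      lg(1 + s2/sigma2),                                   # (12.18)
        "R1+R2_a": lg(1 + s1/sigma2) + lg(1 + s2/(sigma2 + i2)),        # (12.19)
        "R1+R2_b": lg(1 + s2/sigma2) + lg(1 + s1/(sigma2 + i1)),        # (12.20)
        "R1+R2_c": lg(1 + i1/sigma2 + s1/(sigma2 + i2))
                 + lg(1 + i2/sigma2 + s2/(sigma2 + i1)),                # (12.21)-(12.22)
        "2R1+R2":  lg(1 + (s1 + i1)/sigma2)
                 + lg((sigma2 + s1)/(sigma2 + i2))
                 + lg(1 + i2/sigma2 + s2/(sigma2 + i1)),                # (12.23)
        "R1+2R2":  lg(1 + (s2 + i2)/sigma2)
                 + lg((sigma2 + s2)/(sigma2 + i1))
                 + lg(1 + i1/sigma2 + s1/(sigma2 + i2)),                # (12.24)
    }

# Assumed example values with weak cross channels.
for name, val in weak_interference_outer_bounds(1.0, 1.0, 1.0, 0.3, 0.25, 0.9, 0.1).items():
    print(f"{name:8s} <= {val:.3f}")
```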


Figure 12.8 Genie-aided mixed interference channel with interference plus noise revealed to the cross receivers, interfering signal revealed at one receiver, and an additional receiver without aid of the genie.

Except for the last inequality, the remaining expressions are either equivalent to those of the weak interference channel, or can be found from the results of the weak interference channel. For instance, the first two inequalities are based on interference-free communication and hold in all cases, including the strong, weak, and mixed interference channels. The last inequality can be found by a genie-aided system with an additional receiver (acting as an additional user) at node 2, as depicted in Figure 12.8. The genie reveals the interference plus noise seen at receiver 2A to receiver 1 and the interference plus noise seen at receiver 1 to receiver 2A. Additionally, the genie reveals the interfering signal s1 to receiver 2A. Receiver 2B is not aided by the genie and receives the signal s2. The last bound can be found using arguments detailed in Reference [89].

12.3.2

Outer bounds to the capacity region of the Gaussian interference channel with multiple antennas

The general ideas used in bounding the capacity region of the single-antenna Gaussian interference channel can be extended to multiantenna systems, as done in Reference [243]. In this section we summarize the outer bounds provided in that work. Note that very recently, these bounds have been improved upon in Reference [170]. Assuming that the signals received at the antennas of users 1 and 2 are z1 ∈ C^{nr1×1} and z2 ∈ C^{nr2×1} respectively, we write the following,

z1 = √ρ1 H11 s1 + √γ1 H21 s2 + n1
z2 = √ρ2 H22 s2 + √γ2 H12 s1 + n2 .


We shall assume that n1 ∈ C^{nr1×1} and n2 ∈ C^{nr2×1} comprise i.i.d., circularly symmetric, complex, Gaussian random variables with unit variance, and the channel matrix between the jth transmit node and the kth receive node is given by Hjk ∈ C^{nrk×ntj}. Let the achievable rates on link 1 and link 2 be denoted by R1 and R2, that is, links 1 and 2 can simultaneously operate at rates R1 and R2 respectively with arbitrarily low probability of error. We can write the following two bounds, which are based on single-user communication (that is, no interference), on the rates of links 1 and 2:

R1 ≤ log2 |I + ρ1 H11 H†11|

R2 ≤ log2 |I + ρ2 H22 H†22| .

We can further write a bound on the sum capacity as follows,

R1 + R2 ≤ log2 |K1| + log2 |K2| ,   (12.30)

where the matrices K1 ∈ C^{nr1×nr1} and K2 ∈ C^{nr2×nr2} are defined as follows,

K1 = I + γ1 R21 + ρ1 H11 (T1^{-1} + γ2 H†12 H12)^{-1} H†11 ,   (12.31)

K2 = I + γ2 R12 + ρ2 H22 (T2^{-1} + γ1 H†21 H21)^{-1} H†22 ,   (12.32)

and

R11 = H11 T1 H†11 ,   (12.33)
R22 = H22 T2 H†22 ,   (12.34)
R12 = H12 T1 H†12 ,   (12.35)
R21 = H21 T2 H†21 .   (12.36)

Note here that Tj = ⟨sj s†j⟩ ∈ C^{ntj×ntj} is the transmit covariance matrix of

the jth transmitter. Hence the matrix Rjk ∈ C^{nrk×nrk} is the covariance matrix of signals received at the nrk antennas of the kth receiver which are due to the transmission of the jth transmitter. We assume here that the transmit covariance matrices Tj have been optimized. This bound can be proved by using a genie-aided network of the form depicted in Figure 12.6, as shown in Reference [243]. The genie provides the following signals to receivers 1 and 2 respectively,

v2 = √γ2 H12 s1 + n2   (12.37)

v1 = √γ1 H21 s2 + n1 .   (12.38)

That is to say, the genie provides the jth receiver with the interference caused by the jth transmitter at the kth receiver, for j ≠ k. We can write two more bounds that are simply the sum capacities of the multiple-access channels obtained by


removing receivers 2 and 1 respectively, as follows:

R1 + R2 < log2 |I + ρ1 R11 + γ1 R21|   (12.39)

R1 + R2 < log2 |I + ρ2 R22 + γ2 R12| .   (12.40)

The bound in Equation (12.39) applies when receiver 1 is able to decode the messages intended for both receivers. Writing the singular-value decomposition of the channel matrix between the jth transmitter and kth receiver as

Hjk = Ujk Σjk V†jk ,   (12.41)

the bound in Equation (12.39) applies when

γ2 V11 Σ11^{-2} V†11 − ρ1 V12 Σ12^{-2} V†12 ≥ 0 .   (12.42)

Similarly, the second bound in Equation (12.40) applies when receiver 2 is able to decode the messages intended for both receivers, which occurs if

γ1 V22 Σ22^{-2} V†22 − ρ2 V21 Σ21^{-2} V†21 ≥ 0 .   (12.43)

If we assume that a genie provides s2 to receiver 1, we have a one-sided interference channel. That is to say, receiver 1 effectively sees no interference from transmitter 2. Then, generalizing the analysis in Reference [89], the following bound is found in Reference [243]:

R1 + R2 ≤ log2 |I + ρ1 R11| + log2 |I + (I + γ2 R12)^{-1} ρ2 R22| .   (12.44)

Similarly, if the genie reveals s1 to receiver 2, we have

R1 + R2 ≤ log2 |I + ρ2 R22| + log2 |I + (I + γ1 R21)^{-1} ρ1 R11| .   (12.45)

Suppose now that receiver 1 is decomposed into two separate receivers. Assume that s2 is revealed to one of the sub-receivers at receiver 1, and that v1 is revealed to receiver 2. Then, once again generalizing the analysis of Reference [89], the following bound is found in Reference [243],

2R1 + R2 ≤ log2 |I + ρ1 R11 + γ1 R21| + log2 |I + ρ1 R11| + log2 |(I + γ2 R12)^{-1} K2| ,   (12.46)

which holds if Equation (12.42) is true. Switching the roles of links 1 and 2 (that is, receiver 2 is now decomposed into two separate receivers) yields the following,

R1 + 2R2 ≤ log2 |I + ρ2 R22 + γ2 R12| + log2 |I + ρ2 R22| + log2 |(I + γ1 R21)^{-1} K1| ,   (12.47)

which holds if Equation (12.43) is true.


Using a multiantenna version of Figure 12.8, we can write the following upper bound proved in Reference [243],

R1 + 2R2 ≤ log2 |K1| + log2 |I + ρ2 R22 + γ2 R12| + log2 |I + ρ2 H22 (P2^{-1} + γ1 H†21 H21)^{-1} H†22| .   (12.48)

Once again, switching the roles of links 1 and 2, we have

2R1 + R2 ≤ log2 |K2| + log2 |I + ρ1 R11 + γ1 R21| + log2 |I + ρ1 H11 (P1^{-1} + γ2 H†12 H12)^{-1} H†11| .   (12.49)

This collection of inequalities essentially represents the MIMO extension of the outer bounds for the SISO channel given in the previous section.
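The following sketch evaluates the genie-aided sum-rate bound (12.30) using the matrices K1 and K2 of Equations (12.31)-(12.32). The channel realizations, SNR and INR scale factors, and uniform transmit covariances are illustrative assumptions, not values from the text.

```python
# Sketch: evaluate the MIMO genie-aided sum-rate bound (12.30)-(12.32).
import numpy as np

rng = np.random.default_rng(1)
nt = nr = 2
rho1 = rho2 = 10.0        # direct-path scale factors (assumed)
gam1 = gam2 = 1.0         # cross-path scale factors (assumed)

def cplx(m, n):
    return (rng.standard_normal((m, n)) + 1j*rng.standard_normal((m, n)))/np.sqrt(2)

H11, H12, H21, H22 = (cplx(nr, nt) for _ in range(4))
T1 = T2 = np.eye(nt, dtype=complex) / nt     # uniform transmit covariances

R21 = H21 @ T2 @ H21.conj().T                # Equation (12.36)
R12 = H12 @ T1 @ H12.conj().T                # Equation (12.35)

def logdet2(A):
    return np.linalg.slogdet(A)[1] / np.log(2.0)

I = np.eye(nr, dtype=complex)
inner1 = np.linalg.inv(np.linalg.inv(T1) + gam2 * H12.conj().T @ H12)
K1 = I + gam1*R21 + rho1 * H11 @ inner1 @ H11.conj().T    # (12.31)
inner2 = np.linalg.inv(np.linalg.inv(T2) + gam1 * H21.conj().T @ H21)
K2 = I + gam2*R12 + rho2 * H22 @ inner2 @ H22.conj().T    # (12.32)

print("R1 + R2 <=", logdet2(K1) + logdet2(K2))            # (12.30)
```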

12.4

The 2 × 2 cognitive MIMO network

Cognitive radio systems are, loosely speaking, radio systems that can sense and adapt to their environment in an "intelligent" way. Various authors have used this term to mean different things, and Chapter 16 treats the topic of cognitive radio and its various definitions in more detail. In this chapter, we consider one form of cognitive radio whereby a cognitive transmitter–receiver pair, which we refer to as the secondary link, wishes to transmit simultaneously and in the same frequency band as an existing legacy link, which we refer to as the primary link. Here we assume that the primary link must be able to operate at the capacity it would achieve if the cognitive link were absent. In other words, the capacity of the primary link must not be diminished by the existence of the cognitive link. We define two different models for the 2 × 2 MIMO network, namely a network with a non-cooperative primary link and a network with a cooperative primary link. For the non-cooperative primary link model, the primary link operates as if the secondary link does not exist. Hence, the secondary link must operate in a manner such that it does not reduce the data rate of the primary link, without requiring the primary link to modify its behavior. One possible method is for the secondary link to transmit only when the primary link is not accessing the medium, or for the secondary link to transmit in a subspace that is orthogonal to that used by the primary link. In the cooperative primary link model, we assume that the primary link will alter its behavior to accommodate the secondary link, but not at the expense of its communication rate. In other words, the primary link operates in a manner that is accommodating to the secondary link but without sacrificing its data rate. More sophisticated assumptions can be made in the cooperative primary link model as well. For instance, we may allow the primary transmitter to share its data with the secondary transmitter, which can then encode its transmissions in a manner that helps the primary link maintain its maximum data rate. We shall


not consider this type of cooperation in this chapter, as it involves a high degree of overhead for the data exchange between the transmitters. Consider the two-link interference network of Figure 12.1, in which the solid arrows are signal paths and broken arrows are interference paths. Suppose that the link between transmitter 1 and receiver 1 is the primary link, and the link between transmitter 2 and receiver 2 is the secondary link. Let R1 and R2 denote the data rates on the respective links, and the matrices Hkj ∈ C^{nrj×ntk} denote the channel coefficients between the kth transmitter and jth receiver. With zj ∈ C^{nrj×1} denoting the received-signal vector at receiver j, and sk the transmit-signal vector from transmitter k, the following equations hold,

z1 = H11 s1 + H21 s2 + n1   (12.50)

z2 = H22 s2 + H12 s1 + n2 ,   (12.51)

where n1 and n2 are i.i.d. complex Gaussian noise vectors of variance σ². Let K1 ∈ C^{nt1×nt1} and K2 ∈ C^{nt2×nt2} respectively denote the covariance matrices of the vectors of transmit samples s1 and s2, with tr(Kj) ≤ P enforcing a common power constraint on each transmitter. Applying Equation (1) of Reference [92] to our network model, the maximum rate supportable on link 1 if the signal from transmitter 2 is treated as noise is given by the following bound,

R1 < log2 |I + H11 K1 H†11 (σ² I + H21 K2 H†21)^{-1}| .   (12.52)

For the maximum rate supportable on link 2, simply replace 1 with 2 and vice versa in Equation (12.52), such that

R2 < log2 |I + H22 K2 H†22 (H12 K1 H†12 + σ² I)^{-1}| .   (12.53)

We shall assume that the secondary (cognitive) transmitter and receiver know all the channel matrices Hjk, and that the number of transmit antennas at the secondary transmitter, nt2, is greater than the number of receive antennas at the primary receiver, nr1.
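A minimal numerical sketch of Equations (12.52) and (12.53) follows: each link's maximum rate when the other link's transmission is treated as colored noise. All matrices below are random illustrative draws, not values from the text.

```python
# Sketch: rates of Equations (12.52)-(12.53) with interference treated as noise.
import numpy as np

rng = np.random.default_rng(2)
nt1 = nt2 = nr1 = nr2 = 2
sigma2, P = 0.1, 1.0

def cplx(m, n):
    return (rng.standard_normal((m, n)) + 1j*rng.standard_normal((m, n)))/np.sqrt(2)

H11, H21 = cplx(nr1, nt1), cplx(nr1, nt2)   # channels into receiver 1
H22, H12 = cplx(nr2, nt2), cplx(nr2, nt1)   # channels into receiver 2
K1 = (P/nt1) * np.eye(nt1, dtype=complex)   # uniform transmit covariances
K2 = (P/nt2) * np.eye(nt2, dtype=complex)

def rate_with_interference(Hs, Ks, Hi, Ki, sigma2):
    n = Hs.shape[0]
    Q = sigma2*np.eye(n) + Hi @ Ki @ Hi.conj().T        # interference + noise
    M = np.eye(n) + Hs @ Ks @ Hs.conj().T @ np.linalg.inv(Q)
    return np.linalg.slogdet(M)[1] / np.log(2.0)

R1 = rate_with_interference(H11, K1, H21, K2, sigma2)   # (12.52)
R2 = rate_with_interference(H22, K2, H12, K1, sigma2)   # (12.53)
print(f"R1 < {R1:.3f}, R2 < {R2:.3f} b/s/Hz")
```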

12.4.1

Non-cooperative primary link

Since the primary link does not cooperate with the secondary link, we fix the transmit covariance matrix of the primary link to equal the transmit covariance matrix that maximizes the data rate on the primary link. The primary link can operate at R1m by using a scheme motivated by the singular-value decomposition of the matrix H11 as follows:

H11 = U1 Λ1 V†1 ,   (12.54)

where U1 ∈ C^{nr1×nr1} is the left singular matrix, V1 ∈ C^{nt1×nt1} is the right singular matrix, and Λ1 ∈ C^{nr1×nt1} contains the singular values of H11 on its diagonal. The primary transmitter transmits V1 Φ1^{1/2} s1, resulting in a transmit covariance matrix at transmitter 1 of K1 = V1 Φ1 V†1. The primary receiver multiplies the receive-signal vector z1 by U†1. This operation effectively produces a system with nr1 parallel channels as follows:

z̃1 = U†1 (H11 V1 Φ1^{1/2} s1 + H21 s2 + n1)   (12.55)

   = U†1 H11 V1 Φ1^{1/2} s1 + U†1 H21 s2 + U†1 n1   (12.56)

   = Λ1 Φ1^{1/2} s1 + U†1 H21 s2 + U†1 n1 ,   (12.57)

where Φ1 = diag(φ11, φ12, . . . , φ1nt1) contains the power allocations given by a water-filling algorithm, that is,

φ1i = (η − σ²/λi)^+ ,   (12.58)

where λi is the ith largest eigenvalue of the matrix H11 H†11. The notation (x)^+ means the maximum of x or zero, in other words, (x)^+ = max(0, x). The "water level" η is chosen such that

P = Σ_{i=1}^{nt1} φ1i .   (12.59)

Thus, φ1i gives the power that should be allocated to the stream transmitted along vi. Note that this scheme achieves the capacity of the MIMO channel in the absence of interference, as described in Section 8.3.2. Since the primary link is non-cooperative, one option is for the secondary link to transmit in a subspace that is orthogonal to the subspace used by the primary link. Suppose that the water-filling power allocation for the primary link allocates zero power to K modes, that is, φ1j = 0 for j > nr1 − K, for some integer K ≥ 0. Thus, K spatial modes are available for secondary-link transmissions. Note that this is the spatial analog of spectral scavenging, which is a commonly studied cognitive radio paradigm. Suppose that the secondary transmitter transmits K2^{1/2} s2 instead of s2, where the matrix K2 ∈ C^{nt2×nt2} is the covariance matrix of the signals transmitted by transmitter 2. Substituting into Equation (12.57) yields

z̃1 = Λ1 Φ1^{1/2} s1 + U†1 H21 K2^{1/2} s2 + U†1 n1 .   (12.60)

To avoid interfering with the primary link, the first nr1 − K entries of the second term on the right-hand side must equal zero, as these correspond to the parallel channels used by the primary link. Since s2 can be any vector, the first nr1 − K rows of the matrix U†1 H21 K2^{1/2} must be all zeros. Since U†1 H21 K2^{1/2} ∈ C^{nr1×nt2}, it is possible to achieve this requirement if nt2 ≥ nr1 − K. One can express this requirement in matrix form by writing a diagonal matrix D ∈ C^{nr1×nr1} whose first nr1 − K diagonal entries are unity and the remaining entries are zero. The


requirement that the first nr1 − K rows of the matrix U†1 H21 K2^{1/2} are all zero can be written as

D U†1 H21 K2^{1/2} = 0 ,   (12.61)

where 0 is a matrix whose entries are all zero. With the transmit covariance matrix K2, the maximum rate of the secondary link R2 is given by the mutual information between the transmit and received signals of the secondary link, which can be found by applying Equation (1) from Reference [92],

R2 < log2 |I + H22 K2 H†22 (σ² I + H12 K1 H†12)^{-1}| ,   (12.62)

which can be maximized with respect to K2 subject to Equation (12.61), which ensures zero interference to the primary link, and the power constraint tr(K2) ≤ P. The secondary link can perform interference cancellation if the primary link's rate R1m is smaller than the mutual information between s1 and z2. That is, if

R1m < log2 |I + H12 K1 H†12 (σ² I + H22 K2 H†22)^{-1}| ,   (12.63)

the secondary link operates as if there is no interference, and the maximum rate on the second link is given by the following bound,

R2 < log2 |I + (1/σ²) H22 K2 H†22| .   (12.64)

To maximize R2 in this case, Equation (12.64) needs to be maximized subject to Equation (12.63) which ensures that the interference from the primary link can be decoded, Equation (12.61) which ensures that there is no interference to the transmissions of the primary link, and tr(K2 ) ≤ P , which enforces a power constraint. In this subsection, we have assumed that the primary transmitter knows the channel between itself and its target receiver H11 . This information enabled the primary transmitter to spatially encode its transmissions to increase capacity. When the primary transmitter does not have channel-state information, the optimal behavior of the transmitter is to transmit independent data streams on each antenna. In this case, the primary link always uses all its available channel modes. Thus, for systems without channel-state information at the transmitter of the legacy link, or for systems with transmit channel-state information but with all channel modes used by the legacy link, the secondary transmitter must encode its signals such that they are nulled at each antenna of the primary receiver. Problem 12.4 explores such a system in more detail.
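To make the non-cooperative scheme concrete, the sketch below implements the water-filling allocation of Equations (12.58)-(12.59) for the primary link and then constructs a secondary transmit covariance confined to the subspace implied by the zero-interference condition (12.61). The dimensions and channels are hypothetical, and the bisection search for the water level η is one common implementation choice, not a method prescribed by the text.

```python
# Sketch: primary-link water-filling plus secondary-link null-space transmission.
import numpy as np

rng = np.random.default_rng(3)
nt1, nr1, nt2 = 4, 4, 4
sigma2, P = 1.0, 2.0

def cplx(m, n):
    return (rng.standard_normal((m, n)) + 1j*rng.standard_normal((m, n)))/np.sqrt(2)

H11, H21 = cplx(nr1, nt1), cplx(nr1, nt2)

# Water-filling over the eigenvalues of H11 H11^dag (Equations (12.58)-(12.59)).
lam = np.linalg.eigvalsh(H11 @ H11.conj().T)[::-1]       # descending eigenvalues
lo, hi = 0.0, P + sigma2/lam.min() + 1.0                 # bracket the water level
for _ in range(100):                                     # bisect on eta
    eta = 0.5*(lo + hi)
    phi = np.maximum(eta - sigma2/lam, 0.0)              # (12.58)
    lo, hi = (eta, hi) if phi.sum() < P else (lo, eta)   # enforce (12.59)
K_modes = int(np.sum(phi <= 1e-12))                      # unused spatial modes
print("water level:", eta, "unused modes K =", K_modes)

# Zero-interference condition (12.61): the used rows of U1^dag H21 K2^(1/2) must
# vanish, so K2^(1/2) must lie in the null space of those rows.
U1, _, _ = np.linalg.svd(H11)
A = (U1.conj().T @ H21)[: nr1 - K_modes, :]              # constrained rows
_, s, Vh = np.linalg.svd(A)
null_dim = nt2 - int(np.sum(s > 1e-10))
if null_dim > 0:
    B = Vh.conj().T[:, nt2 - null_dim:]                  # null-space basis
    K2 = (P/null_dim) * B @ B.conj().T                   # uniform power in null space
    print("residual interference:", np.linalg.norm(A @ K2))
else:
    print("no interference-free subspace available")
```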


12.4.2

Cooperative primary link

Suppose that the primary link is willing to cooperate with the secondary link and can cancel interference due to the secondary link. The secondary transmitter can then transmit two separate independent data streams. The first is a private, non-interfering stream transmitted in the unused subspace of the primary link, as described in the previous subsection. The other is a common stream at rate R2c in the subspace used by the primary link, but at a rate low enough that it can be decoded and subtracted out by the primary receiver. This second stream is referred to as a common stream as it is intended to be decoded and subtracted by both receiver 1 and receiver 2. To ensure that the common stream is decodable at the primary receiver, R2c must satisfy

R2c < log2 |I + H21 K2c H†21 (σ² I + H21 K2p H†21 + H11 K1 H†11)^{-1}| ,   (12.65)

where K2c and K2p are the transmit covariance matrices of the common and private streams of the secondary transmitter, respectively. In addition, R2c must be supportable by the secondary link in the presence of interference from the primary link and self-interference from the private stream of the secondary link, which is captured by the following inequality,

R2c < log2 |I + H22 K2c H†22 (σ² I + H22 K2p H†22 + H12 K1 H†12)^{-1}| .   (12.66)

To ensure that the private stream of the secondary link does not interfere with the primary link, the following needs to hold:

D U†1 H21 K2p^{1/2} = 0 .   (12.67)

Hence, the secondary transmitter needs to find covariance matrices K2c and K2p, as well as rates R2c and R2p, such that R2c + R2p is maximized subject to the constraints in Equations (12.65), (12.66), and (12.67) and the power constraint tr(K2c + K2p) ≤ P.
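The common-stream rate of the cooperative scheme is limited by the smaller of the bounds (12.65) and (12.66), which the following sketch evaluates. The even power split between common and private streams is an arbitrary illustrative assumption, and the private-stream nulling of Equation (12.67) is handled as in the previous subsection.

```python
# Sketch: evaluate the two common-stream constraints (12.65)-(12.66).
import numpy as np

rng = np.random.default_rng(4)
nt, nr, sigma2, P = 2, 2, 0.5, 1.0

def cplx(m, n):
    return (rng.standard_normal((m, n)) + 1j*rng.standard_normal((m, n)))/np.sqrt(2)

H11, H21, H22, H12 = (cplx(nr, nt) for _ in range(4))
K1  = (P/nt) * np.eye(nt, dtype=complex)
K2c = (P/(2*nt)) * np.eye(nt, dtype=complex)   # half the power to the common stream
K2p = (P/(2*nt)) * np.eye(nt, dtype=complex)   # half to the private stream

def logdet_ratio(Hs, Ks, noise_terms):
    Q = sigma2 * np.eye(nr, dtype=complex)
    for H, K in noise_terms:
        Q += H @ K @ H.conj().T
    M = np.eye(nr) + Hs @ Ks @ Hs.conj().T @ np.linalg.inv(Q)
    return np.linalg.slogdet(M)[1] / np.log(2.0)

b_primary   = logdet_ratio(H21, K2c, [(H21, K2p), (H11, K1)])   # (12.65)
b_secondary = logdet_ratio(H22, K2c, [(H22, K2p), (H12, K1)])   # (12.66)
print("R2c <", min(b_primary, b_secondary))
```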

Problems

12.1 Consider the 2 × 2 MIMO interference channel with Han–Kobayashi encoding as described in Section 12.2.2. For each of the possible choices of decoding orders, please write the inequalities that govern the common data rates.

12.2 Consider an extension of the Han–Kobayashi scheme for a network with three transmitter and receiver pairs. How many streams should each transmitter employ to achieve all possible combinations of partial interference cancellation? Also, qualitatively describe the achievable rate region for this type of network.


12.3 For the 2 × 2 MIMO interference channel as described in Section 12.3.2, show that receiver 1 is able to decode both s1 and s2 with arbitrarily low probability of error if (12.42) is satisfied.

12.4 Consider the 2 × 2 MIMO cognitive radio channel described in Section 12.4, but now assume that the primary transmitter does not have channel-state information and hence transmits equal power and independent data on each of its antennas. Derive an expression that the transmit covariance matrix of the secondary link (that is, K2) needs to satisfy in order not to interfere with the primary link.

12.5 Consider the 2 × 2 MIMO cognitive radio channel described in Problem 12.4, but assume that the cognitive transmitter knows the transmit signal of the primary transmitter, s1. Show how this information could be used by the cognitive link to increase its data rate compared to the previous problem.

12.6 Suppose that a 2 × 2 MIMO system has nt1 and nt2 antennas at transmitters 1 and 2, and nr1 and nr2 antennas at receivers 1 and 2, respectively. Assume that 1 < nt1 < nr2, nt2 = 1 and nr1 > 1. Find the transmit covariance matrix T1 of transmitter 1 that minimizes the interference caused at receiver 2, assuming transmitter 1 and receiver 1 have full channel-state information (that is, they know all the channel matrices in the network) and receiver 2 knows only the channel vector between transmitter 2 and itself.

12.7 Consider a 2 × 2 MIMO channel with a legacy link and a cognitive link. Under what conditions on the relative numbers of antennas at all nodes can the cognitive link operate without disrupting the legacy link? Assume that the legacy link does not change its behavior in response to the presence of the cognitive link.

12.8 Consider a legacy 2 × 2 link for which each transmitter has a single antenna and each receiver has two antennas. It is assumed that the receivers use zero-forcing to cancel the interference from their respective undesired transmitters. Assume that a cognitive link with two transmitter antennas wishes to operate in the same frequency band as this existing 2 × 2 link in a manner such that the existing links are completely unaffected, that is, neither their communication rates nor their behavior changes. Show that it is possible for the cognitive link to operate with nonzero rate by appropriately phasing transmit signals. You may make reasonable assumptions on the realizations of the various channel matrices.

13 Cellular networks

13.1

Point-to-point links and networks

The simplest wireless communication link is between a single transmitter and a single receiver. In point-to-point systems, data communication rates depend on factors such as bandwidth, signal power, noise power, acceptable bit-error rate, and spatial degrees of freedom. Many wireless systems, however, comprise multiple interacting links. The parameters and trade-offs associated with point-to-point links hold for networks as well. Additional factors play a role in networks, however. For instance, interference between links can reduce data communication rates. An exciting possibility is for nodes to cooperate and help convey data for each other, which has the potential to increase data communication rates. Table 13.1 summarizes some of the key common and differentiating features of point-to-point links versus networks. In this chapter, we analyze the performance of various multiantenna approaches in the context of cellular networks, in which signal and interference strengths are influenced by the spatial distribution of nodes and base stations. Note that we use the term cellular in a broader context than many works in the literature, which refer specifically to mobile telephone systems. Here we consider any kind of network with one-to-many (downlink) and many-to-one (uplink) topologies. For most of this chapter, except for Section 13.5.1, we shall focus on characterizing systems without out-of-cell interference, whereby we assume that there is some channel allocation mechanism with a reuse factor that results in negligible out-of-cell interference. Examples include wireless networks with access points acting like base stations and sensor networks with data-collection nodes acting like base stations. Note that our focus here is on the different receiver algorithms and the impact of the spatial distribution of nodes and base stations on the performance of such systems. We refer the interested reader to specialized texts on cellular mobile networks, such as [328] and [192], for detailed discussions of such systems.

13.2

Multiple access and broadcast channels In the absence of out-of-cell interference, the uplink and downlink of cellular networks can be viewed as the canonical multiple-access channel (MAC) and


Table 13.1 Key features of wireless networks

Point-to-point                      | Networks
------------------------------------|------------------------------------------------
Signal power, noise power           | Signal, interference and noise powers
Signal-to-noise-ratio (SNR)         | Signal-to-interference-plus-noise-ratio (SINR)
Bandwidth                           | Per-link bandwidth
Bit-error rate (BER)                | Bit-error rate
Processing latency                  | Processing and protocol latency
Local optimization                  | Network or local optimization
Simple point-to-point protocols     | Sophisticated multiple-access protocols
Point-to-point topology             | Multiple network topologies
                                    | Fairness

broadcast channel (BC) respectively. The multiple-access channel is essentially a many-to-one network, and the broadcast channel is a one-to-many network. In this section, we analyze these channels from an information-theoretic point of view by studying their capacity regions. The developments here are standard and are discussed in texts such as [68] and [314].

Capacity region of the SISO multiple-access channel

For simplicity, consider a multiple-access channel with two transmitters and one receiver, where the signals from transmitters 1 and 2 are denoted by s1 and s2 respectively, with average power constraints ⟨||s1||²⟩ ≤ P1 and ⟨||s2||²⟩ ≤ P2. Suppose that the complex baseband received signal is

z = s1 + s2 + n ,   (13.1)

where n is a complex, white Gaussian noise process with variance σ². Let R1 and R2 denote the rates of link 1, between transmitter 1 and the receiver, and link 2, between transmitter 2 and the receiver, respectively. For arbitrarily low probability of error, the following must hold,

R1 < log2 (1 + P1/σ²)   (13.2)

R2 < log2 (1 + P2/σ²) ,   (13.3)

which correspond to the capacity of each of the channels with the other turned off. To find a third bound, let's suppose that transmitter 1 and transmitter 2 can cooperate and share their power. The sum rate without cooperation must be less than or equal to that with cooperation. Then we have

R1 + R2 < log2 (1 + (P1 + P2)/σ²) ,   (13.4)


Figure 13.1 Bounds on the capacity region of the multiple-access channel. P1 and P2 are the received powers due to transmitters 1 and 2 respectively, and R1 and R2 are the data rates per channel use of transmitters 1 and 2 respectively.

where the right-hand side of the previous expression comes from the Shannon capacity (for example, see Section 5.3) of a link with a transmit power budget of P1 + P2. These bounds are shown in Figure 13.1. Any rate pair (R1, R2) that can be decoded with arbitrarily low probability of error must be inside the pentagon in Figure 13.1 in order to satisfy the bounds given above. Next, we show that all points inside the pentagon in Figure 13.1 are achievable with arbitrarily low probability of error. In other words, communication with arbitrarily low probability of error is possible at all pairs of rates (R1, R2) that are inside the pentagon in Figure 13.1. For the rest of this chapter, we shall use the term "achievable" to describe a rate at which communication with arbitrarily low probability of error is possible. Consider Figure 13.1. Points A and B are achievable when transmitters 1 and 2 respectively are off, by using Gaussian code-books at the active transmitter, since with one transmitter off, the system is reduced to an additive white Gaussian noise channel. Point C is achievable if the receiver first decodes the signal from transmitter 1, which transmits at rate R1′, while treating the signal from transmitter 2 as noise, which effectively increases the noise variance by P2. The signal from transmitter 1 can be decoded with arbitrarily low probability of error since, by treating the signal from transmitter 2 as noise, the Shannon capacity result given in Section 5.3 indicates that communication with arbitrarily low probability of error is possible at rates satisfying

R1′ < log2 (1 + P1/(P2 + σ²)) .   (13.5)

Therefore, the signal from transmitter 1 can be subtracted from the received signal with arbitrary accuracy, thereby allowing any rate R2 < log2 (1 + P2/σ²) to


be achievable. Point D is achievable using the same technique with the roles of transmitters 1 and 2 reversed. Thus we have shown that all rates at the corner points of the pentagon in Figure 13.1 are achievable. Finally, any point inside the pentagon 0ACDB is achievable by time sharing between the strategies used to achieve the corner points. Therefore, any point inside the pentagon in Figure 13.1 is achievable. For comparison, consider a time-division multiple-access (TDMA) scheme (discussed in Section 4.3.2) in which transmitter 1 uses the channel for a fraction α ≤ 1 of the time and transmitter 2 uses the channel for 1 − α of the time. The achievable rate pairs now must satisfy

R1 < α log2 (1 + P1/(α σ²))   (13.6)

R2 < (1 − α) log2 (1 + P2/((1 − α) σ²)) .   (13.7)

The factors of α and 1 − α that scale the noise power are due to the fact that P1 and P2 are long-term average power budgets of transmitters 1 and 2, which are respectively on for fractions α and 1 − α of the time. By varying α, the rate pair (R1, R2) traces out the dashed line shown in Figure 13.2. Note that it is possible to meet the maximum sum rate by selecting α = P1/(P1 + P2), which yields the following after some algebraic manipulation:

R1 + R2 < log2 (1 + (P1 + P2)/σ²) .   (13.8)

This analysis extends in a straightforward manner to multiple-access channels with K users, where the capacity region satisfies the following set of constraints,

Σ_{k∈T} Rk < log2 (1 + Σ_{k∈T} Pk/σ²)   (13.9)

for all subsets T of the integers 1, 2, . . . , K. Similar to the two-transmitter case, the sum capacity is obtained by combining the powers of the K users, which gives the following sum capacity,

Σ_{k=1}^{K} Rk < log2 (1 + Σ_{k=1}^{K} Pk/σ²) .   (13.10)
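The corner points of the pentagon and the TDMA trade-off of Equations (13.6)-(13.7) can be compared directly. In the sketch below (with assumed example powers), the TDMA sum rate touches the sum-capacity bound (13.4) exactly at α = P1/(P1 + P2).

```python
# Sketch: pentagon corner points versus TDMA for the two-user MAC.
import numpy as np

P1, P2, sigma2 = 1.0, 0.5, 0.1

# Corner points achieved by successive interference cancellation.
C = (np.log2(1 + P1/(P2 + sigma2)), np.log2(1 + P2/sigma2))
D = (np.log2(1 + P1/sigma2), np.log2(1 + P2/(P1 + sigma2)))
print("corner C:", C, " corner D:", D)

# TDMA rate pairs (13.6)-(13.7) as the time share alpha varies.
for alpha in (0.25, P1/(P1 + P2), 0.75):
    R1 = alpha * np.log2(1 + P1/(alpha*sigma2))
    R2 = (1 - alpha) * np.log2(1 + P2/((1 - alpha)*sigma2))
    print(f"alpha={alpha:.2f}: R1 + R2 = {R1 + R2:.3f}")

print("sum-capacity bound (13.4):", np.log2(1 + (P1 + P2)/sigma2))
```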

for all T , which are subsets of the integers 1, 2, . . . K. Similar to the two transmitter case, the sum capacity is obtained by combining powers of the K users, which gives the following sum capacity,   "M M  k = 1 Pk . (13.10) Rk < log2 1 + σ2 k=1

Capacity region of the broadcast channel The downlink channel from the base station to the mobile users is an example of a broadcast channel where one source transmits information to multiple users. In general, we assume that the transmitter sends a distinct message to each receiver. As in the multiple-access channel, we first consider a channel with two mobile users. Unlike for the multiple-access channel, we have to explicitly consider the

418

Cellular networks

V arying A

C D

B

Figure 13.2 Capacity region of the multiple-access channel with achievable rates using

TDMA in dashed lines.

strengths of the channels between the transmitter and each mobile, and the power allocated by the transmitter to each mobile. Suppose that the transmitter allocates powers P1 and P2 to receivers 1 and 2, with P = P1 + P2, and that the channels between the transmitter and the receivers are denoted by h1 and h2 respectively. If the transmitted signal is the superposition of the signals intended for receiver 1 and receiver 2, we can write the following expression for the signal at the jth receiver,

zj = hj s1 + hj s2 + nj ,   (13.11)

where nj is circularly symmetric, complex Gaussian noise of variance σ², that is, nj is distributed as CN(0, σ²), and ⟨||sj||²⟩ ≤ Pj. Suppose that ||h1||² < ||h2||². In this case, receiver 2 can decode s1 with arbitrarily low probability of error provided that s1 is transmitted at a rate R1 such that receiver 1 can decode s1 with arbitrarily low probability of error. Hence, receiver 2 can perform successive interference cancellation and remove the interference contribution caused by the signals intended for receiver 1. This strategy leads to the following bound on the achievable rate of link 2,

R2 < log2 (1 + P2 ||h2||²/σ²) .   (13.12)

Note that this is the single-user bound, that is, the capacity of the channel between the transmitter and receiver 2 if receiver 1 was not present.


If receiver 1 treats the signal intended for user 2 as noise, it can then achieve the following:

R1 < log2 (1 + P1 ||h1||²/(P2 ||h1||² + σ²)) b/s/Hz .   (13.13)

The sum capacity of this channel is given by

R1 + R2 < log2 (1 + P ||h2||²/σ²) .   (13.14)

The proof of this is the subject of Problem 13.2. So far, we have presented achievable rates for the two-user broadcast channel. Proving that this is indeed an upper bound on the achievable rates, and hence that this is the capacity region of the system, is quite difficult, and the reader is referred to Reference [17] for the detailed proof.

Vector multiple-access channel

The capacity region of the vector multiple-access channel (that is, the multiple-access channel with multiantenna transmitters and base station) turns out to be a relatively straightforward extension of the single-antenna case. For simplicity, consider a multiple-access channel with one receiver and two transmitters, where the transmitters each have nt antennas and the receiver has nr antennas. Let the channel between the jth transmitter and the base station be denoted by Hj. Then, by using the results for the single-user MIMO channel described in Section 8.3, we can write the following two inequalities for R1 and R2, the rates of users 1 and 2 respectively, as a function of their respective transmit covariance matrices T1, T2:

R1 < log2 |I + (1/σ²) H1 T1 H†1|   (13.15)

R2 < log2 |I + (1/σ²) H2 T2 H†2| .   (13.16)

Furthermore, we can write the following sum-rate constraint,

R1 + R2 < log2 |I + (1/σ²) H1 T1 H†1 + (1/σ²) H2 T2 H†2| ,   (13.17)

which is simply the MIMO channel capacity if both transmitters are treated as a single transmitter with their antennas pooled together. As in the single-antenna case, these rates can be achieved by interference cancellation and time sharing. The capacity region can then be found as the union of the regions described by Inequalities (13.15) to (13.17), over all positive-semidefinite matrices T1 and T2,


which respect the power constraints. Thus, the capacity region of the two-user MIMO multiple-access channel is the following:

⋃_{tr(Tj)≤P, ∀j} { (R1, R2) s.t. R1 < log2 |I + (1/σ²) H1 T1 H†1| ,
                   R2 < log2 |I + (1/σ²) H2 T2 H†2| ,
                   R1 + R2 < log2 |I + (1/σ²) H1 T1 H†1 + (1/σ²) H2 T2 H†2| } .   (13.18)

In the expression above, the inequalities in the braces are restrictions that the pair of rates must satisfy for a given set of transmit covariance matrices T1 and T2. The union is of all pairs of rates such that the inequalities are satisfied, and is taken over all covariance matrices that respect the power constraints, as the transmitters may choose any pair of covariance matrices that satisfy the power constraints. Like the single-antenna multiple-access channel, the analysis of the two-user multiple-access channel extends in a straightforward way to channels with K users, which yields the following capacity region:

⋃_{tr(Tj)≤P, ∀j} { (R1, . . . , RK) s.t. Σ_{i∈T} Ri < log2 |I + (1/σ²) Σ_{i∈T} Hi Ti H†i| ∀ T ⊆ {1, 2, . . . , K} } .   (13.19)

Vector broadcast channel

Unlike the multiple-access channel, the multiantenna broadcast channel is not a simple extension of the single-antenna case. Consider a system with nt antennas at the base station and K single-antenna receivers. Suppose that H ∈ C^{nt×K} represents the channel coefficients between the antennas of the base station and the antennas of the K receivers, and the thermal noise at the receivers is represented by n ∈ C^{K×1} ∼ CN(0, I). Note that with this notation, the sampled baseband-equivalent signals at the receivers are denoted by z ∈ C^{K×1} as follows,

z = H† s + n ,   (13.20)

where s ∈ C^{nt×1} represents the transmitted samples on each of the transmit antennas. The sum capacity of this broadcast channel was found in Reference [326] to equal

csum = sup_{D∈A} log2 |I + H D H†| ,   (13.21)

where the set A over which the supremum is taken is the set of all K × K diagonal matrices with non-negative entries that satisfy the transmit power constraint tr{D} ≤ P. Note that the supremum refers to the smallest upper bound.
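Because the objective in Equation (13.21) is concave in D, a convex solver would normally be used; the sketch below instead uses a deliberately crude random search over the diagonal power allocations, purely for illustration, with a random channel draw.

```python
# Sketch: approximate the broadcast-channel sum capacity of Equation (13.21).
import numpy as np

rng = np.random.default_rng(5)
nt, K, P = 4, 3, 1.0
H = (rng.standard_normal((nt, K)) + 1j*rng.standard_normal((nt, K)))/np.sqrt(2)

best = 0.0
for _ in range(20000):
    d = rng.random(K)
    d *= P / d.sum()                          # enforce tr(D) = P
    M = np.eye(nt) + H @ np.diag(d) @ H.conj().T
    best = max(best, np.linalg.slogdet(M)[1] / np.log(2.0))
print("approximate c_sum:", best, "b/s/Hz")
```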

where the set A over which the supremum is taken is the set of all K ×K diagonal matrices with non-negative entries that satisfy the transmit power constraint tr{D} ≤ P . Note that the supremum refers to the smallest upper bound.

13.2 Multiple access and broadcast channels

421

The corresponding capacity region, which holds for systems with multiantenna receivers as well, is more complicated to describe and was derived in Reference [341]. In the following, we shall briefly describe the capacity region and refer the reader to Reference [341] for details of its derivation. Let the received signal at the kth receiver which has nr k antennas be given by zk ∈ Cn r k ×1 as follows, zk = Hk s + nk , where Hk ∈ Cn r k ×n t is the channel matrix between the base station and the antennas of the kth receiver. The transmitted signal of the base station is s ∈ Cn t ×1 , and nk ∈ Cn r k ×1 is a vector of circularly symmetric, complex, Gaussian noise of variance σ 2 at the antennas of the kth receiver. The transmitted signal is a superposition of signals intended for each of the K receivers. If the vector sk ∈ Cn t ×1 represents the signal intended for the kth receiver, the transmitted signal vector is s=

K 

sk .

k=1

The capacity region of this channel is achieved by the dirty-paper coding described in Section 5.3.4. The dirty-paper coding for the broadcast channel is done successively. Let the kth user to be encoded be denoted by P(k). We use this notation to emphasize the fact that user k need not be the kth user to be encoded. This property is necessary since the final capacity region is given in terms of all possible encoding orders. The first user to be encoded, P(1), is assigned a codeword s1 that is a function of the data intended for that user. The kth user to be encoded, P(k), is assigned a codeword sk that is a function of the data intended for that user, with the signals intended for users P(1), P(2), . . . , P(k − 1) effectively presubtracted using the dirty-paper coding strategy. Thus, user P(K) sees an effectively interference-free signal, and user P(k) effectively sees interference only from users P(k + 1), . . . , P(K). Note that P here refers to an ordering of the integers 1, 2, . . . , K. Let the covariance matrices of the codewords be denoted by

Tk = ⟨sk s†k⟩ .   (13.22)

For simplicity, let us again consider the K = 2 case. Using dirty-paper coding, if the signal for receiver 1 is encoded first, the achievable rate bounds are

R1 < I(z1; s1) = log2 (|σ² I + H1 T1 H†1 + H1 T2 H†1| / |σ² I + H1 T2 H†1|)   (13.23)

R2 < I(z2; s2|s1) = log2 |I + (1/σ²) H2 T2 H†2| .   (13.24)

Note that the link to receiver 2 can operate at a rate as if it had perfect knowledge of the signal intended for receiver 1.
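A short sketch of the two-user dirty-paper rates (13.23)-(13.24), with receiver 1 encoded first, is given below; the channel draws and the even power split across the two users are illustrative assumptions.

```python
# Sketch: two-user dirty-paper coding rates (13.23)-(13.24).
import numpy as np

rng = np.random.default_rng(6)
nt, nr, sigma2, P = 2, 2, 0.5, 1.0

def cplx(m, n):
    return (rng.standard_normal((m, n)) + 1j*rng.standard_normal((m, n)))/np.sqrt(2)

H1, H2 = cplx(nr, nt), cplx(nr, nt)
T1 = T2 = (P/(2*nt)) * np.eye(nt, dtype=complex)   # split power across users

def logdet2(A):
    return np.linalg.slogdet(A)[1] / np.log(2.0)

sI = sigma2 * np.eye(nr, dtype=complex)
# (13.23): receiver 1 sees the signal for receiver 2 as noise.
R1 = logdet2(sI + H1 @ (T1 + T2) @ H1.conj().T) - logdet2(sI + H1 @ T2 @ H1.conj().T)
# (13.24): receiver 2's signal is effectively interference-free.
R2 = logdet2(np.eye(nr) + H2 @ T2 @ H2.conj().T / sigma2)
print(f"R1 < {R1:.3f}, R2 < {R2:.3f} b/s/Hz")
```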


Extending this to K users, where the transmissions are encoded in the order 1, 2, . . . , K, we have

Rk < I(zk; sk|s1, . . . , sk−1) = log2 (|σ² I + Hk Σ_{j=k}^{K} Tj H†k| / |σ² I + Hk Σ_{j=k+1}^{K} Tj H†k|) .   (13.25)

To find the full capacity region, we have to take the convex hull of the union over all possible encoding orderings and all possible covariance matrices. Note that the convex hull of a set of points in R^K is the smallest convex set that contains all of the points. To describe the general capacity region, we rewrite the above equation for a general encoding order P, so that the rate for the kth receiver to be encoded is bounded as follows:

RP(k) < I(zP(k); sP(k)|sP(1), . . . , sP(k−1))
      = log2 (|σ² I + HP(k) Σ_{j=k}^{K} TP(j) H†P(k)| / |σ² I + HP(k) Σ_{j=k+1}^{K} TP(j) H†P(k)|) .   (13.26)

Taking the convex hull of the union over all transmit covariances and encoding orderings, we arrive at the following achievable rate region:

Convex Hull { ⋃_{P, Tj s.t. Σj tr(Tj) ≤ P} (RP(1), RP(2), . . . , RP(K)) } .   (13.27)

This scheme is capacity achieving, as shown in Reference [341], which uses more general assumptions, allowing for different numbers of antennas at each mobile user and arbitrary noise covariance matrices. The proof is rather involved, and we refer the interested reader there for details.

13.3

Linear receivers in cellular networks with Rayleigh fading and constant transmit powers

In this section, we study the performance of some linear receiver structures in cellular networks. Linear receivers are attractive as they are computationally inexpensive compared to optimal receivers, which attempt to decode interfering signals. We consider three representative linear receivers, namely the antenna-selection receiver discussed in Section 13.3.3, the matched-filter (MF) receiver discussed in Section 9.2.1, and the linear minimum mean-squared-error (MMSE) receiver discussed in Section 9.2.3. We shall analyze the performance of a representative link in such a network with multiple antennas at the receivers, in the presence of frequency-flat Rayleigh fading. The linear MMSE receiver discussed previously is optimal in the sense of maximizing the SINR. However, it requires


Figure 13.3 Portion of a cellular network with hexagonal cells. The smallest circle containing a cell has radius RI and the largest circle contained within a cell has radius Ro.

knowledge of the channel parameters between the antennas of the desired transmitter and the receiver, as well as the covariance matrix of the interference observed at the antennas of the receiver. Any network protocol utilizing the linear MMSE receiver will thus need a mechanism to allow receivers to estimate this information rapidly. Additionally, the linear MMSE receiver requires a matrix inversion, which can be computationally expensive for low-cost systems in rapidly changing environments. Thus, while the matched-filter receiver and selection combiner have strictly worse performance than the MMSE receiver, they are attractive because of their lower complexity. We start by analyzing systems that do not use power control, where all transmitters use a fixed transmit power of P. We shall approximate the cells as circles of radius R for simplicity and assume that there is a Poisson number of transmitters distributed uniformly randomly in the circle, with mean number of points equal to ρπR². For the selection combiner and matched filter, we provide a framework for evaluating the CDF of the SINR with hexagonal cells, and we provide bounds on the CDF for the MMSE receiver. Note that the derivations in the remainder of this chapter will be used in the next chapter on ad hoc wireless networks as well, since the expressions we find can be adapted to ad hoc wireless networks.
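The network model just described is easy to simulate. The following Monte Carlo sketch draws a Poisson number of uniformly placed in-cell transmitters and estimates the SINR distribution seen by a single-antenna receiver at the cell center; the path-loss exponent, density, and power values are illustrative assumptions, and a simple power-law path loss with Rayleigh fading is assumed.

```python
# Sketch: Monte Carlo SINR for a Poisson number of in-cell interferers.
import numpy as np

rng = np.random.default_rng(7)
R, rho = 1.0, 10.0             # cell radius and transmitter density (assumed)
P, sigma2, alpha = 1.0, 0.01, 4.0
trials, sinrs = 10000, []

for _ in range(trials):
    n = rng.poisson(rho * np.pi * R**2)       # Poisson number of nodes
    if n == 0:
        continue
    r = R * np.sqrt(rng.random(n))            # uniform radii in the disk
    fades = rng.exponential(size=n)           # Rayleigh fading powers
    rx = P * fades * r**(-alpha)              # received powers
    sig, interf = rx[0], rx[1:].sum()         # node 0 is the desired user
    sinrs.append(sig / (interf + sigma2))

sinrs = np.array(sinrs)
print("P(SINR > 0 dB) ~", np.mean(sinrs > 1.0))
```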

13.3.1

Link lengths in cellular networks

In the analysis of cellular systems, it is useful to characterize the distribution of link lengths that arises from the cellular model, as it impacts the distribution of signal and interference strengths. Suppose that a given wireless node is randomly located at some point in a cellular network and establishes a link of length x with the base station that is closest in Euclidean distance to it.


For the hexagonal cell model (see Figure 13.3) with minimum base station separation d, the CDF, PDF, and kth moment of x are given by

Fx(x) = 0 ,   if x < 0
Fx(x) = 2√3 π x² / (3d²) ,   if 0 ≤ x < d/2
Fx(x) = 2√3 π x² / (3d²) − (4√3 x²/d²) cos^{-1}(d/(2x)) + (2√3/d) √(x² − d²/4) ,   if d/2 ≤ x < √3 d/3
Fx(x) = 1 ,   if x ≥ √3 d/3 ,   (13.28)

and

fx(x) = (4π/(√3 d²)) x ,   if 0 < x ≤ d/2
fx(x) = (4π/(√3 d²)) x − (8√3 x/d²) cos^{-1}(d/(2x)) ,   if d/2 < x < √3 d/3
fx(x) = 0 ,   otherwise,
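The closed-form CDF (13.28) can be checked by Monte Carlo: sample points uniformly in a hexagonal cell and compare the empirical distribution of the distance to the cell center with the expression above. The rejection-sampling construction below assumes a hexagon with two horizontal edges; the test points are arbitrary.

```python
# Sketch: Monte Carlo check of the hexagonal-cell link-length CDF (13.28).
import numpy as np

rng = np.random.default_rng(8)
d = 1.0
r_in, r_out = d/2, d/np.sqrt(3)          # inradius and circumradius

# Rejection-sample uniform points in a regular hexagon centered at the origin.
samples = []
while sum(len(s) for s in samples) < 100000:
    x = rng.uniform(-r_out, r_out, 50000)
    y = rng.uniform(-r_out, r_out, 50000)
    keep = (np.abs(y) <= r_in) & (np.abs(y) <= np.sqrt(3)*(r_out - np.abs(x)))
    samples.append(np.hypot(x[keep], y[keep]))
r = np.concatenate(samples)

def cdf(x):                               # Equation (13.28)
    if x < r_in:
        return 2*np.sqrt(3)*np.pi*x**2/(3*d**2)
    return (2*np.sqrt(3)*np.pi*x**2/(3*d**2)
            - 4*np.sqrt(3)*(x**2/d**2)*np.arccos(d/(2*x))
            + (2*np.sqrt(3)/d)*np.sqrt(x**2 - d**2/4))

for x in (0.3*d, 0.45*d, 0.55*d):
    print(f"x={x:.2f}: empirical {np.mean(r <= x):.4f}, closed form {cdf(x):.4f}")
```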