159 64 24MB
English Pages 1155 [1099] Year 2010
Handbook of Signal Processing Systems E E
Shuvra S. Bhattacharyya • Ed F. Deprettere Rainer Leupers • Jarmo Takala Editors
Handbook of Signal Processing Systems Foreword by S.Y. Kung
Editors Shuvra S. Bhattacharyya ECE Department and UMIACS University of Maryland 2311 A. V. Williams Building College Park Maryland 20742 USA [email protected] Rainer Leupers RWTH Aachen University Templergraben 55 52056 Aachen Germany [email protected]
Ed F. Deprettere Leiden Institute of Advanced Computer Science Leiden Embedded Research Center Leiden University Niels Bohrweg 1 2333 CA Leiden Netherlands [email protected] Jarmo Takala Department of Computer Systems Tampere University of Technology Korkeakoulunkatu 1 33720 Tampere Finland [email protected]
ISBN 978-1-4419-6344-4 e-ISBN 978-1-4419-6345-1 DOI 10.1007/978-1-4419-6345-1 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2010932128 © Springer Science+Business Media, LLC 2010 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
To Milu Shuvra Bhattacharyya To Deirdre Ed Deprettere To Bettina Rainer Leupers To Auli Jarmo Takala
Foreword
It gives me immense pleasure to introduce this timely handbook to the research/development communities in the field of signal processing systems (SPS). This is the first of its kind and represents state-of-the-arts coverage of research in this field. The driving force behind information technologies (IT) hinges critically upon the major advances in both component integration and system integration. The major breakthrough for the former is undoubtedly the invention of IC in the 50’s by Jack S. Kilby, the Nobel Prize Laureate in Physics 2000. In an integrated circuit, all components were made of the same semiconductor material. Beginning with the pocket calculator in 1964, there have been many increasingly complex applications followed. In fact, processing gates and memory storage on a chip have since then grown at an exponential rate, following Moore’s Law. (Moore himself admitted that Moore’s Law had turned out to be more accurate, longer lasting and deeper in impact than he ever imagined.) With greater device integration, various signal processing systems have been realized for many killer IT applications. Further breakthroughs in computer sciences and Internet technologies have also catalyzed large-scale system integration. All these have led to today’s IT revolution which has profound impacts on our lifestyle and overall prospect of humanity. (It is hard to imagine life today without mobiles or Internets!) The success of SPS requires a well-concerted integrated approach from multiple disciplines, such as device, design, and application. It is important to recognize that system integration means much more than simply squeezing components onto a chip and, more specifically, there is a symbiotic relationship between applications and technologies. Emerging applications, e.g. 3D TV, will prompt modern system requirements on performance and power consumption, thus inspiring new intellectual challenges. The SPS architecture must consider overall system performance, flexibility, and scalability, power/thermal management, hardware-software partition, and algorithm developments. With greater integration, system designs become more complex and there exists a huge gap between what can be theoretically designed and what can be practically implemented. It is critical to consider, for instance, how to deploy in concert an ever increasing number of transistors with acceptable power consumption; and how to make hardware effective for applications and yet friendly vii
viii
Foreword
to the users (easy to program). In short, major advances in SPS must arise from close collaboration between application, hardware/architecture, algorithm, CAD, and system design. The handbook has offered a comprehensive and up-to-date treatment of the driving forces behind SPS, current architectures and new design trends. It has also helped seed the SPS field with innovations in applications; architectures; programming and simulation tools; and design methods. More specifically, it has provided a solid foundation for several imminent technical areas, for instance, scalable, reusable, and reliable system architectures, energy-efficient high-performance architectures, IP deployment and integration, on-chip interconnects, memory hierarchies, and future cloud computing. Advances in these areas will have greater impact on future SPS technologies. It is only fitting for Springer to produce this timely handbook. Springer has long played a major role in academic publication on SPS, many of them have been in close cooperation with IEEE’s Signal Processing, Circuits and Systems, and Computer Societies. For nearly 20 years, I have been the Editor-in-Chief of Springer’s Journal of Signal Processing Systems, considered by many as a major forum for the SPS researchers. Nevertheless, the idea has been around for years that a singlevolume reference book would very effectively complement the journal in serving this technical community. Then, during the 2008 IEEE Workshop on Signal Processing Systems, Washington D.C., Jennifer Evans from Springer and the editorial team led by Prof. Shuvra Bhattacharyya met to brainstorm implementation of such idea. The result is this handsome volume, containing contributions from renowned pioneers and active researchers. I congratulate the authors and editors for putting together such an outstanding handbook. It is truly a timely contribution to the exciting field of SPS for this new decade and many decades to come. Princeton
S.Y. Kung
Preface
This handbook provides a reference on key topics in the design, analysis, implementation, and optimization of hardware and software for signal processing systems. Plans for the handbook originated through valuable discussions with SY Kung (Princeton University) and Jennifer Evans (Springer). Through these discussions, we became aware of the need for an integrated reference focusing on the efficient implementation of signal processing applications. We were then fortunate to be able to enlist the participation of a diverse group of leading experts to contribute chapters in a broad range of complementary topics. We hope that this handbook will serve as a useful reference to engineering practitioners, graduate students, and researchers working in the broad area of signal processing systems. It is also envisioned that selected chapters from the book can be used as core readings for seminar- or project-oriented graduate courses in signal processing systems. Given the wide range of topics covered in the book, instructors would have significant flexibility to orient such a course towards particular themes or levels of abstraction that they would like to emphasize. The handbook is organized in four parts. Part I motivates representative applications that drive and apply state-of-the art methods for design and implementation of signal processing systems; Part II discusses architectures for implementing these applications; Part III focuses on compilers and simulation tools; and Part IV describes models of computation and their associated design tools and methodologies. We are very grateful to all of the authors for their valuable contributions, and for the time and effort they have devoted to preparing the chapters. We would also like to thank Jennifer Evans and SY Kung for their encouragement of this project, which was instrumental in getting the project off the ground, and to Jennifer Maurer for her support and patience throughout the entire development process of the handbook. Shuvra Bhattacharyya, Ed Deprettere Rainer Leupers, and Jarmo Takala
ix
Contents
Part I Applications Managing Editor: Shuvra Bhattacharyya Signal Processing for Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . William S. Levine 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Brief introduction to control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Sensitivity to disturbance . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Sensitivity conservation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Signal processing in control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 simple cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Demanding cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Exemplary case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Digital Signal Processing in Home Entertainment . . . . . . . . . . . . . . . . . . . . Konstantinos Konstantinides 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 The Personal Computer and the Digital Home . . . . . . . . . . 2 The Digital Photo Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Image Capture and Compression . . . . . . . . . . . . . . . . . . . . . 2.2 Focus and exposure control . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Pre-processing of CCD data . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 White Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Demosaicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Color transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7 Post Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8 Preview display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 3 4 8 9 9 10 12 15 19 20 20 21 23 23 24 25 25 26 26 27 27 27 27 27
xi
xii
Contents
2.9 Compression and storage . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.10 Photo Post-Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.11 Storage and Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.12 Digital Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 The Digital Music Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Broadcast Radio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Audio Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 The Digital Video Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Digital Video Broadcasting . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 New Portable Digital Media . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Digital Video Recording . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Digital Video Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Digital Video Playback and Display . . . . . . . . . . . . . . . . . . 5 Sharing Digital Media in the Home . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 DLNA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Digital Media Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Digital Media Player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Copy Protection and DRM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Microsoft Windows DRM . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 To Probe Further . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27 28 28 29 29 29 30 31 31 33 33 33 35 35 35 36 36 36 37 38 40 41 41
MPEG Reconfigurable Video Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marco Mattavelli, Jörn W. Janneck and Mickaël Raulet 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Requirements and rationale of the MPEG RVC framework . . . . . . . 2.1 Limits of previous monolithic specifications . . . . . . . . . . . 2.2 Reconfigurable Video Coding specification requirements 3 Description of the standard or normative components of the framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 The toolbox library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 The C AL Actor Language . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 FU Network language for the codec configurations . . . . . 3.4 Bitstream syntax specification language BSDL . . . . . . . . . 3.5 Instantiation of the ADM . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 Case study of new and existing codec configurations . . . . 4 The procedures and tools supporting decoder implementations . . . 4.1 OpenDF supporting framework . . . . . . . . . . . . . . . . . . . . . . 4.2 CAL2HDL synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 CAL2C synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
43 44 44 46 47 48 48 49 53 55 57 57 61 61 62 64 65 66
Contents
xiii
Signal Processing for High-Speed Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Naresh Shanbhag, Andrew Singer, and Hyeon-min Bae 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 2 System Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 2.1 The Transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 2.2 The Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 2.3 Receiver Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 2.4 Noise Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 3 Signal Processing Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.1 Modulation Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.2 Adaptive Equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.3 Equalization of Nonlinear Channels . . . . . . . . . . . . . . . . . . 87 3.4 Maximum Likelihood Sequence Estimation (MLSE) . . . . 88 4 Case Study of an OC-192 EDC-based Receiver . . . . . . . . . . . . . . . . 91 4.1 MLSE Receiver Architecture . . . . . . . . . . . . . . . . . . . . . . . . 91 4.2 MLSE Equalizer Algorithm and VLSI Architecture . . . . . 92 4.3 Measured Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5 Advanced Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 5.1 FEC for Low-power Back-Plane Links . . . . . . . . . . . . . . . . 96 5.2 Coherent Detection in Optical Links . . . . . . . . . . . . . . . . . . 98 6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 7 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Video Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Yu-Han Chen and Liang-Gee Chen 1 Evolution of Video Coding Standards . . . . . . . . . . . . . . . . . . . . . . . . 103 2 Basic Components of Video Coding Systems . . . . . . . . . . . . . . . . . . 105 2.1 Color Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 2.2 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 2.3 Transform and Quantization . . . . . . . . . . . . . . . . . . . . . . . . . 110 2.4 Entropy Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 3 Emergent Video Applications and Corresponding Coding Systems 113 3.1 HDTV Applications and H.264/AVC . . . . . . . . . . . . . . . . . 113 3.2 Streaming and Surveillance Applications and Scalable Video Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 3.3 3D Video Applications and Multiview Video Coding . . . . 117 4 Conclusion and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Low-power Wireless Sensor Network Platforms . . . . . . . . . . . . . . . . . . . . . . 123 Jukka Suhonen, Mikko Kohvakka, Ville Kaseva, Timo D. Hämäläinen, and Marko Hännikäinen 1 Characteristics of Low-power WSNs . . . . . . . . . . . . . . . . . . . . . . . . . 123 1.1 Quality of Service Requirements . . . . . . . . . . . . . . . . . . . . . 125 1.2 Services for Signal Processing Application . . . . . . . . . . . . 126
xiv
Contents
2
Key Standards and Industry Specifications . . . . . . . . . . . . . . . . . . . . 127 2.1 IEEE 1451 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 2.2 WSN Communication Standards . . . . . . . . . . . . . . . . . . . . . 127 3 Software Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 3.1 Sensor Operating Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 130 3.2 Middlewares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 4 Hardware Platforms and Components . . . . . . . . . . . . . . . . . . . . . . . . 131 4.1 Communication subsystem . . . . . . . . . . . . . . . . . . . . . . . . . 131 4.2 Computing subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 4.3 Sensing subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 4.4 Power subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 4.5 Existing Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 5 Medium Access Control Features and Services . . . . . . . . . . . . . . . . . 136 5.1 MAC Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 5.2 Unsynchronized Low Duty-Cycle MAC Protocols . . . . . . 137 5.3 Synchronized Low Duty-Cycle MAC Protocols . . . . . . . . 138 5.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 6 Routing and Transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 6.1 Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 6.2 Routing Paradigms and Technologies . . . . . . . . . . . . . . . . . 144 6.3 Hybrid Transport and Routing Protocols . . . . . . . . . . . . . . 147 7 Embedded WSN Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 7.1 Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 7.2 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 8 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 8.1 Low-Energy TUTWSN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 8.2 Low-Latency TUTWSN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Signal Processing for Cryptography and Security Applications . . . . . . . . . 161 Miroslav Kneževi´c1, Lejla Batina1 , Elke De Mulder1 , Junfeng Fan1 , Benedikt Gierlichs1 , Yong Ki Lee2 , Roel Maes1 and Ingrid Verbauwhede1,2 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 2 Efficient Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 2.1 Secret-Key Algorithms and Implementations . . . . . . . . . . . 162 2.2 Public-Key Algorithms and Implementations . . . . . . . . . . 164 2.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 3 Secure Implementations: Side Channel Attacks . . . . . . . . . . . . . . . . 170 3.1 DSP Techniques Used in Side Channel Analysis . . . . . . . . 171 4 Working with Fuzzy Secrets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 4.1 Fuzzy Secrets: Properties and Applications in Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 4.2 Generating Cryptographic Keys from Fuzzy Secrets . . . . . 173 5 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Contents
xv
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 High-Energy Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Anthony Gregerson, Michael J. Schulte, and Katherine Compton 1 Introduction to High-Energy Physics . . . . . . . . . . . . . . . . . . . . . . . . . 179 2 The Compact Muon Solenoid Experiment . . . . . . . . . . . . . . . . . . . . . 182 2.1 Physics Research at the CMS Experiment . . . . . . . . . . . . . 183 2.2 Sensors and Triggering Systems . . . . . . . . . . . . . . . . . . . . . 184 2.3 The Super Large Hadron Collider Upgrade . . . . . . . . . . . . 188 3 Characteristics of High-Energy Physics Applications . . . . . . . . . . . 189 3.1 Technical Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 3.2 Characteristics of the Development Process . . . . . . . . . . . . 192 3.3 Evolving Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 3.4 Functional Verification and Testing . . . . . . . . . . . . . . . . . . . 194 3.5 Unit Modification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 4 Techniques for Managing Design Challenges . . . . . . . . . . . . . . . . . . 196 4.1 Managing Hardware Cost and Complexity . . . . . . . . . . . . . 196 4.2 Managing Variable Latency and Variable Resource Use . . 198 4.3 Scalable and Modular Designs . . . . . . . . . . . . . . . . . . . . . . . 201 4.4 Dataflow-Based Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 4.5 Language-Independent Functional Testing . . . . . . . . . . . . . 203 4.6 Efficient Design Partitioning . . . . . . . . . . . . . . . . . . . . . . . . 204 5 Example Problems from the CMS Level-1 Trigger . . . . . . . . . . . . . . 204 5.1 Particle Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 5.2 Jet Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 Medical Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 Raj Shekhar, Vivek Walimbe, and William Plishker 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 2 Medical Imaging Modalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 2.1 Radiography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 2.2 Computed Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 2.3 Nuclear Medicine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 2.4 Magnetic Resonance Imaging . . . . . . . . . . . . . . . . . . . . . . . 215 2.5 Ultrasound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 3 Medical Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 3.1 Imaging Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 4 Image Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 4.1 CT Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 4.2 PET/SPECT Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . 223 4.3 Reconstruction in MRI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 4.4 Ultrasound Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . 224 5 Image Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 6 Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
xvi
Contents
6.1 Classification of Segmentation Techniques . . . . . . . . . . . . 227 6.2 Threshold-based . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 6.3 Deformable Model-based . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 7 Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 7.1 Geometry-based . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 7.2 Intensity-based . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 7.3 Transformation Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 8 Image Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 8.1 Surface Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 8.2 Volume Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 9 Medical Image Processing Platforms . . . . . . . . . . . . . . . . . . . . . . . . . 237 9.1 Computing Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 237 9.2 Platforms For Accelerated Implementation . . . . . . . . . . . . 238 10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 11 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 Signal Processing for Audio HCI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 Dmitry N. Zotkin and Ramani Duraiswami 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 2 Microphone Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 2.1 Source Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 2.2 Beamforming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 2.3 Practical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 3 Audio Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 3.1 Head-related transfer function . . . . . . . . . . . . . . . . . . . . . . . 248 3.2 Reverberation Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 248 3.3 User Motion Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 3.4 Signal Processing Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . 250 4 Sample Application: Fast HRTF Measurement System . . . . . . . . . . 252 5 Sample Application: Auditory Scene Capture and Reproduction . . 256 6 Sample Application: Acoustic Field Visualization . . . . . . . . . . . . . . 260 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 Distributed Smart Cameras and Distributed Computer Vision . . . . . . . . . . 267 Marilyn Wolf and Jason Schlessman 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267 2 Approaches to Computer Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 3 Early Work in Distributed Smart Cameras . . . . . . . . . . . . . . . . . . . . . 270 4 Approaches to Distributed Computer Vision . . . . . . . . . . . . . . . . . . . 270 4.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 4.2 Platform Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 4.3 Mid-Level Fusion for Gesture Recognition . . . . . . . . . . . . 275 4.4 Target-Level Fusion for Tracking . . . . . . . . . . . . . . . . . . . . 275 4.5 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276 4.6 Sparse Camera Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
Contents
xvii
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 Part II Architectures Managing Editor: Jarmo Takala Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 Oscar Gustafsson and Lars Wanhammar 1 Number representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 1.1 Binary representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284 1.2 Two’s complement representation . . . . . . . . . . . . . . . . . . . . 284 1.3 Redundant representations . . . . . . . . . . . . . . . . . . . . . . . . . . 284 1.4 Shifting and increasing the wordlength . . . . . . . . . . . . . . . . 286 1.5 Negation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 1.6 Finite-wordlength effects . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 2 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290 2.1 Ripple-carry addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 2.2 Carry-lookahead addition . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 2.3 Carry-select and conditional sum addition . . . . . . . . . . . . . 295 2.4 Multi-operand addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296 3 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298 3.1 Partial product generation . . . . . . . . . . . . . . . . . . . . . . . . . . . 298 3.2 Summation structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 3.3 Vector merging adder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 3.4 Multiply-accumulate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 3.5 Multiplication by a constant . . . . . . . . . . . . . . . . . . . . . . . . . 305 3.6 Distributed arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 4 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312 4.1 Restoring and nonrestoring division . . . . . . . . . . . . . . . . . . 313 4.2 SRT division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 4.3 Speeding up division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 4.4 Square root extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5 Floating-point representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5.1 Normalized representations . . . . . . . . . . . . . . . . . . . . . . . . . 316 5.2 IEEE 754 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 5.3 Addition and subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 5.4 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.5 Quantization error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 6 Computation of elementary function . . . . . . . . . . . . . . . . . . . . . . . . . 319 6.1 CORDIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 6.2 Polynomial and piecewise polynomial approximations . . 321 6.3 Table-based methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 7 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
xviii
Contents
Application-Specific Accelerators for Communications . . . . . . . . . . . . . . . . 329 Yang Sun, Kiarash Amiri, Michael Brogioli, and Joseph R. Cavallaro 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 1.1 Coarse Grain Versus Fine Grain Accelerator Architectures331 1.2 Hardware/Software Workload Partition Criteria . . . . . . . . 333 2 Hardware Accelerators for Communications . . . . . . . . . . . . . . . . . . . 335 2.1 MIMO Channel Equalization Accelerator . . . . . . . . . . . . . 336 2.2 MIMO Detection Accelerators . . . . . . . . . . . . . . . . . . . . . . . 338 2.3 Channel Decoding Accelerators . . . . . . . . . . . . . . . . . . . . . 344 3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360 FPGA-based DSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 John McAllister 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 2 FPGA Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 2.1 FPGA Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 2.2 The Evolution of FPGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366 3 State-of-the-art FPGA Technology for DSP Applications . . . . . . . . 368 3.1 FPGA Programmable Logic Technology . . . . . . . . . . . . . . 369 3.2 FPGA DSP-datapath for DSP . . . . . . . . . . . . . . . . . . . . . . . 371 3.3 FPGA Memory Hierarchy in DSP System Implementation374 3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376 4 Core-Based Implementation of FPGA DSP Systems . . . . . . . . . . . . 377 4.1 Strategy Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 4.2 Core-based Implementation of DSP Applications from Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378 4.3 Other Core-based Development Approaches . . . . . . . . . . . 379 5 Programmable Processor-Based FPGA DSP System Design . . . . . . 380 5.1 Application Specific Microprocessor Technology for FPGA DSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381 5.2 Reconfigurable Processor Design for FPGA . . . . . . . . . . . 384 5.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386 6 System Level Design for FPGA DSP . . . . . . . . . . . . . . . . . . . . . . . . . 387 6.1 Deadalus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387 6.2 Compaan/LAURA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388 6.3 Koski . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388 6.4 Requirements for FPGA DSP System Design . . . . . . . . . . 389 7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390 General-Purpose DSP Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 Jarmo Takala 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 2 Arithmetic Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 3 Data Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Contents
xix
3.1 Fixed-Point Data Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 3.2 Floating-Point Data Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . 402 3.3 Special Function Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 4 Memory Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404 5 Address Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409 6 Program Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412 Application Specific Instruction Set DSP Processors . . . . . . . . . . . . . . . . . . 415 Dake Liu 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 1.1 ASIP Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 1.2 Difference Between ASIP and General CPU . . . . . . . . . . . 416 2 DSP ASIP Design Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 3 Architecture and Instruction Set Design . . . . . . . . . . . . . . . . . . . . . . . 420 3.1 Code Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420 3.2 Using an Available Architecture . . . . . . . . . . . . . . . . . . . . . 422 3.3 Generating a Custom Architecture . . . . . . . . . . . . . . . . . . . 425 4 Instruction Set Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 4.1 Instruction Set Specification . . . . . . . . . . . . . . . . . . . . . . . . . 428 4.2 Instructions for Functional Acceleration . . . . . . . . . . . . . . . 429 4.3 Instruction Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430 4.4 Instruction and Architecture Optimization . . . . . . . . . . . . . 431 4.5 Design Automation of an Instruction Set Architecture . . . 433 5 ASIP Instruction Set for Radio Baseband Processors . . . . . . . . . . . . 434 6 ASIP Instruction Set for Video and Image Compression . . . . . . . . . 437 7 Programming Toolchain and Benchmarking . . . . . . . . . . . . . . . . . . . 439 8 ASIP Firmware Design and Benchmarking . . . . . . . . . . . . . . . . . . . . 440 8.1 Firmware Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441 8.2 Benchmarking and Instruction Usage Profiling . . . . . . . . . 442 9 Microarchitecture Design for ASIP . . . . . . . . . . . . . . . . . . . . . . . . . . 444 10 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 Coarse-Grained Reconfigurable Array Architectures . . . . . . . . . . . . . . . . . 449 Bjorn De Sutter, Praveen Raghavan, Andy Lambrechts 1 Application Domain of Coarse-Grained Reconfigurable Arrays . . . 449 2 CGRA Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 3 CGRA Design Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453 3.1 Tight versus Loose Coupling . . . . . . . . . . . . . . . . . . . . . . . . 454 3.2 CGRA Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456 3.3 Interconnects and Register Files . . . . . . . . . . . . . . . . . . . . . 461 3.4 Computational Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . 463 3.5 Memory Hierarchies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
xx
Contents
3.6 Compiler Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 Case Study: ADRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468 4.1 Mapping Loops onto ADRES CGRAs . . . . . . . . . . . . . . . . 469 4.2 ADRES Design Space Exploration . . . . . . . . . . . . . . . . . . . 475 5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 4
Multi-core Systems on Chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485 Luigi Carro and Mateus Beck Rutzig 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485 2 Analytical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 2.1 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . 492 2.2 Energy Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496 2.3 DSP Applications on a MPSoC Organization . . . . . . . . . . 497 3 MPSoC Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500 3.1 Architecture View Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501 3.2 Organization View Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 4 Successful MPSoC Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504 4.1 OMAP Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504 4.2 Samsung S3C6410/S5PC100 . . . . . . . . . . . . . . . . . . . . . . . . 506 4.3 Other MPSoCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508 5 Open Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510 5.1 Interconnection Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . 510 5.2 MPSoC Programming Models . . . . . . . . . . . . . . . . . . . . . . . 511 6 Future Research Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513 DSP Systems using Three-Dimensional (3D) Integration Technology . . . . . 515 Tong Zhang, Yangyang Pan, and Yiran Li 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 2 Overview of 3D Integration Technology . . . . . . . . . . . . . . . . . . . . . . 517 3 3D Logic-Memory Integration Paradigm for DSP Systems: Rationale and Opportunities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518 4 3D DRAM Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521 5 Case Study I: 3D Integrated VLIW Digital Signal Processor . . . . . 523 5.1 3D DRAM Stacking in VLIW Digital Signal Processors . 523 5.2 3D DRAM L2 Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524 5.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526 6 Case Study II: 3D Integrated H.264 Video Encoder . . . . . . . . . . . . . 527 6.1 H.264 Video Encoding Basics . . . . . . . . . . . . . . . . . . . . . . . 528 6.2 3D DRAM Stacking in H.264 Video Encoder . . . . . . . . . . 530 6.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535 7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
Contents
xxi
Mixed Signal Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 Olli Vainio 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 2 General Properties of Data Converters . . . . . . . . . . . . . . . . . . . . . . . . 544 2.1 Sample and Hold Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . 545 2.2 Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547 2.3 Performance Specifications . . . . . . . . . . . . . . . . . . . . . . . . . 548 3 Analog to Digital Converters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549 3.1 Nyquist-Rate A/D Converters . . . . . . . . . . . . . . . . . . . . . . . 550 3.2 Oversampled A/D Converters . . . . . . . . . . . . . . . . . . . . . . . 553 4 Digital to Analog Converters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 4.1 Nyquist-rate D/A Converters . . . . . . . . . . . . . . . . . . . . . . . . 558 4.2 Oversampled D/A Converters . . . . . . . . . . . . . . . . . . . . . . . 560 5 Switched-Capacitor Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561 5.1 Amplifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562 5.2 Switched Capacitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562 5.3 Integrators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563 5.4 First-Order Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564 5.5 Second-Order Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565 5.6 Differential SC Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 5.7 Performance Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 6 Frequency Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 6.1 Phase-Locked Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 6.2 Fractional-N Frequency Synthesis . . . . . . . . . . . . . . . . . . . . 568 6.3 Delay-Locked Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569 6.4 Direct Digital Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 Part III Programming and Simulation Tools Managing Editor: Rainer Leupers C Compilers and Code Optimization for DSPs . . . . . . . . . . . . . . . . . . . . . . . 575 Björn Franke 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575 2 C Compilers for DSPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576 2.1 Industrial Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576 2.2 Compiler and Language Extensions . . . . . . . . . . . . . . . . . . 577 2.3 Compiler Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578 3 Code Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579 3.1 Address Code Optimizations . . . . . . . . . . . . . . . . . . . . . . . . 580 3.2 Control Flow Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . 582 3.3 Loop Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585 3.4 Memory Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592 3.5 Optimizations for Code Size . . . . . . . . . . . . . . . . . . . . . . . . 594 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
xxii
Contents
Compiling for VLIW DSPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603 Christoph W. Kessler 1 VLIW DSP architecture concepts and resource modeling . . . . . . . . 603 2 Case study: TI ’C62x DSP processor family . . . . . . . . . . . . . . . . . . . 611 3 VLIW DSP code generation overview . . . . . . . . . . . . . . . . . . . . . . . . 615 4 Instruction selection and resource allocation . . . . . . . . . . . . . . . . . . . 617 5 Cluster assignment for clustered VLIW architectures . . . . . . . . . . . 619 6 Register allocation and generalized spilling . . . . . . . . . . . . . . . . . . . . 620 7 Instruction scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622 7.1 Local instruction scheduling . . . . . . . . . . . . . . . . . . . . . . . . 623 7.2 Modulo scheduling for loops . . . . . . . . . . . . . . . . . . . . . . . . 624 7.3 Global instruction scheduling . . . . . . . . . . . . . . . . . . . . . . . 627 7.4 Generated instruction schedulers . . . . . . . . . . . . . . . . . . . . . 629 8 Integrated code generation for VLIW and clustered VLIW . . . . . . . 630 9 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634 Software Compilation Techniques for MPSoCs . . . . . . . . . . . . . . . . . . . . . . . 639 Rainer Leupers, Weihua Sheng, Jeronimo Castrillon 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640 1.1 MPSoCs and MPSoC Compilers . . . . . . . . . . . . . . . . . . . . . 640 1.2 Challenges of Building MPSoC Compilers . . . . . . . . . . . . 641 2 Foundation Elements of MPSoC Compilers . . . . . . . . . . . . . . . . . . . 642 2.1 Programming Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643 2.2 Granularity, Parallelism and Intermediate Representation 649 2.3 Platform Description for MPSoC compilers . . . . . . . . . . . . 656 2.4 Mapping and Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . 659 2.5 Code Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664 3 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675 DSP Instruction Set Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679 Florian Brandner and Nigel Horspool and Andreas Krall 1 Introduction/Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679 2 Interpretive Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681 2.1 The Classic Interpreter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682 2.2 Threaded Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683 2.3 Interpreter Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . 685 3 Compiled Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686 3.1 Basic Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686 3.2 Simulation of Pipelined Processors . . . . . . . . . . . . . . . . . . . 688 3.3 Static Binary Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . 689 4 Dynamically Compiled Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 690 4.1 Basic Simulation Principles . . . . . . . . . . . . . . . . . . . . . . . . . 690 4.2 Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 690 5 Hardware Supported Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
Contents
xxiii
5.1 Interfacing Simulators with Hardware . . . . . . . . . . . . . . . . 691 5.2 Simulation in Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . 692 6 Generation of Instruction Set Simulators . . . . . . . . . . . . . . . . . . . . . . 694 6.1 Processor Description Languages . . . . . . . . . . . . . . . . . . . . 694 6.2 Retargetable Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696 7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702 Optimization of Number Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . 707 Wonyong Sung 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707 2 Fixed-point Data Type and Arithmetic Rules . . . . . . . . . . . . . . . . . . 708 2.1 Fixed-Point Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708 2.2 Fixed-point Arithmetic Rules . . . . . . . . . . . . . . . . . . . . . . . . 710 2.3 Fixed-point Conversion Examples . . . . . . . . . . . . . . . . . . . . 711 3 Range Estimation for Integer Word-length Determination . . . . . . . . 713 3.1 L1-norm based Range Estimation . . . . . . . . . . . . . . . . . . . . 714 3.2 Simulation based Range Estimation . . . . . . . . . . . . . . . . . . 714 3.3 C++ Class based Range Estimation Utility . . . . . . . . . . . . . 715 4 Floating-point to Integer C Code Conversion . . . . . . . . . . . . . . . . . . 717 4.1 Fixed-Point Arithmetic Rules in C Programs . . . . . . . . . . . 717 4.2 Expression Conversion Using Shift Operations . . . . . . . . . 719 4.3 Integer Code Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720 4.4 Implementation Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 721 5 Word-length Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726 5.1 Finite Word-length Effects . . . . . . . . . . . . . . . . . . . . . . . . . . 726 5.2 Fixed-point Simulation using C++ gFix Library . . . . . . . 728 5.3 Word-length Optimization Method . . . . . . . . . . . . . . . . . . . 729 5.4 Optimization Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735 6 Summary and Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737 Intermediate Representations for Simulation and Implementation . . . . . . 739 Jerker Bengtsson 1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739 2 Untimed Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742 2.1 System Property Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . 742 2.2 Functions Driven by State Machines . . . . . . . . . . . . . . . . . . 746 3 Timed Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 752 3.1 Job Configuration Networks . . . . . . . . . . . . . . . . . . . . . . . . . 752 3.2 IPC Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753 3.3 Timed Configuration Graphs . . . . . . . . . . . . . . . . . . . . . . . . 758 3.4 Set of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758 3.5 Construction of Timed Configuration Graphs . . . . . . . . . . 762 4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
xxiv
Contents
Embedded C for Digital Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 769 Bryan E. Olivier 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769 2 Typical DSP Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770 3 Fixed Point Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 771 3.1 Fixed Point Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772 3.2 Fixed Point Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773 3.3 Fixed Point Conversions And Arithmetical Operations . . 774 3.4 Fixed Point Support Functions . . . . . . . . . . . . . . . . . . . . . . . 777 4 Memory Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778 5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780 5.1 FIR Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780 5.2 Sine Function in Fixed Point . . . . . . . . . . . . . . . . . . . . . . . . 781 6 Named Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782 7 Hardware I/O Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783 8 History and Future of Embedded C . . . . . . . . . . . . . . . . . . . . . . . . . . 785 9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 786 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787 Part IV Design Methods Managing Editor: Ed Deprettere Signal Flow Graphs and Data Flow Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 791 Keshab K. Parhi and Yanni Chen 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791 2 Signal Flow Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 792 2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 792 2.2 Transfer Function Derivation of SFG . . . . . . . . . . . . . . . . . 793 3 Data Flow Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796 3.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796 3.2 Synchronous Data Flow Graph . . . . . . . . . . . . . . . . . . . . . . 797 3.3 Construct an Equivalent Single-rate DFG from the Multi-rate DFG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797 3.4 Equivalent Data Flow Graphs . . . . . . . . . . . . . . . . . . . . . . . 799 4 Applications to Hardware Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802 4.1 Unfolding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803 4.2 Folding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808 5 Applications to Software Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811 5.1 Intra-iteration and Inter-iteration Precedence Constraints . 811 5.2 Definition of Critical Path and Iteration Bound . . . . . . . . . 811 5.3 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 812 6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816
Contents
xxv
Systolic Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817 Yu Hen Hu and Sun-Yuan Kung 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817 2 Systolic Array Computing Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 819 2.1 Convolution Systolic Array . . . . . . . . . . . . . . . . . . . . . . . . . 819 2.2 Linear System Solver Systolic Array . . . . . . . . . . . . . . . . . 820 2.3 Sorting Systolic Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 822 3 Formal Systolic Array Design Methodology . . . . . . . . . . . . . . . . . . . 823 3.1 Loop Representation, Regular Iterative Algorithm (RIA), and Index Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823 3.2 Localized and Single Assignment Algorithm Formulation 824 3.3 Data Dependence and Dependence Graph . . . . . . . . . . . . . 826 3.4 Mapping an Algorithm to a Systolic Array . . . . . . . . . . . . . 827 3.5 Linear Schedule and Assignment . . . . . . . . . . . . . . . . . . . . 829 4 Wavefront Array Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831 4.1 Synchronous versus Asynchronous Global On-chip Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831 4.2 Wavefront Array Processor Architecture . . . . . . . . . . . . . . 832 4.3 Mapping Algorithms to Wavefront Arrays . . . . . . . . . . . . . 832 4.4 Example: Wavefront Processing for Matrix Multiplication 833 4.5 Comparison of Wavefront Arrays against Systolic Arrays 835 5 Hardware Implementations of Systolic Array . . . . . . . . . . . . . . . . . . 835 5.1 Warp and iWARP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835 5.2 SAXPY Matrix-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837 5.3 Transputer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839 5.4 TMS 32040 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 840 6 Recent Developments and Real World Applications . . . . . . . . . . . . . 841 6.1 Block Motion Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 841 6.2 Wireless Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . 844 7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848 Decidable Signal Processing Dataflow Graphs: Synchronous and Cyclo -Static Dataflow Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 851 Soonhoi Ha and Hyunok Oh 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 851 2 SDF (Synchronous Dataflow) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 852 2.1 Static Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853 2.2 Software Synthesis from SDF graph . . . . . . . . . . . . . . . . . . 855 2.3 Static Scheduling Techniques . . . . . . . . . . . . . . . . . . . . . . . 858 3 Cyclo-Static Data Flow (CSDF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 860 3.1 Static Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861 3.2 Static Scheduling and Buffer Size Reduction . . . . . . . . . . . 862 3.3 Hierarchical Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . 863 4 Other Decidable Dataflow Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 864
xxvi
Contents
4.1 FRDF (Fractional Rate Data Flow) . . . . . . . . . . . . . . . . . . . 864 4.2 SPDF (Synchronous Piggybacked Data Flow) . . . . . . . . . . 868 4.3 SSDF (Scalable SDF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873 Mapping Decidable Signal Processing Graphs into FPGA Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875 Roger Woods 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875 1.1 Chapter breakdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 876 2 FPGA hardware platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 877 2.1 FPGA logic blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 878 2.2 FPGA DSP functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . 879 2.3 FPGA memory organization . . . . . . . . . . . . . . . . . . . . . . . . 880 2.4 FPGA design strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 881 3 Circuit architecture derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 882 3.1 Basic mapping of DSP functionality to FPGA . . . . . . . . . . 883 3.2 Retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884 3.3 Cut-set theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 885 3.4 Application to recursive structures . . . . . . . . . . . . . . . . . . . 888 4 Circuit architecture optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890 4.1 Folding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890 4.2 Unfolding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 892 4.3 Adaptive LMS filter example . . . . . . . . . . . . . . . . . . . . . . . . 892 4.4 Towards FPGA-based IP core generation . . . . . . . . . . . . . . 894 5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895 5.1 Incorporation of FPGA cores into data flow based systems896 5.2 Application of techniques for low power FPGA . . . . . . . . 896 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 897 Dynamic and Multidimensional Dataflow Graphs . . . . . . . . . . . . . . . . . . . . 899 Shuvra S. Bhattacharyya, Ed F. Deprettere, and Joachim Keinert 1 Multidimensional synchronous data flowgraphs (MDSDF) . . . . . . . 899 1.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 900 1.2 Arbitrary sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 901 1.3 A complete example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 904 1.4 Initial tokens on arcs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 905 2 Windowed synchronous/cyclo-static dataflow . . . . . . . . . . . . . . . . . . 906 2.1 Windowed synchronous/cyclo-static data flow graphs . . . 907 2.2 Balance equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 909 3 Motivation for dynamic DSP-oriented dataflow models . . . . . . . . . . 912 4 Boolean dataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 914 5 The Stream-based Function Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 916 5.1 The concern for an actor model . . . . . . . . . . . . . . . . . . . . . . 916 5.2 The stream-based function actor model . . . . . . . . . . . . . . . 917 5.3 The formal model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 918
Contents
xxvii
5.4 Communication and scheduling . . . . . . . . . . . . . . . . . . . . . . 919 5.5 Composition of SBF actors . . . . . . . . . . . . . . . . . . . . . . . . . 920 6 CAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 921 7 Parameterized dataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 923 8 Enable-invoke dataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 926 9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 928 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929 Polyhedral Process Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931 Sven Verdoolaege 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931 2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 932 3 Polyhedral Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 934 3.1 Polyhedral Sets and Relations . . . . . . . . . . . . . . . . . . . . . . . 934 3.2 Lexicographic Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 937 3.3 Polyhedral Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 937 3.4 Piecewise Quasi-Polynomials . . . . . . . . . . . . . . . . . . . . . . . 939 3.5 Polyhedral Process Networks . . . . . . . . . . . . . . . . . . . . . . . . 939 4 Polyhedral Analysis Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 940 4.1 Parametric Integer Programming . . . . . . . . . . . . . . . . . . . . . 941 4.2 Emptiness Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 942 4.3 Parametric Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 942 4.4 Computing Parametric Upper Bounds . . . . . . . . . . . . . . . . 943 4.5 Polyhedral Scanning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 944 5 Dataflow Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 946 5.1 Standard Dataflow Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 946 5.2 Reuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949 6 Channel Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 950 6.1 FIFOs and Reordering Channels . . . . . . . . . . . . . . . . . . . . . 950 6.2 Multiplicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951 6.3 Internal Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 953 7 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 954 7.1 Two Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 955 7.2 More than Two Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 956 7.3 Blocking Writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 957 7.4 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 959 8 Buffer Size Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 961 8.1 FIFOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 961 8.2 Reordering Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 962 8.3 Accuracy of the Buffer Sizes . . . . . . . . . . . . . . . . . . . . . . . . 964 9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 964 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 964
xxviii
Contents
Kahn Process Networks and a Reactive Extension . . . . . . . . . . . . . . . . . . . . 967 Marc Geilen and Twan Basten 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 968 2 Denotational Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971 3 Operational Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 973 3.1 Labeled transition systems . . . . . . . . . . . . . . . . . . . . . . . . . . 974 3.2 Operational Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 977 4 The Kahn Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 979 5 Analizability Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 981 6 Implementing Kahn Process Networks . . . . . . . . . . . . . . . . . . . . . . . 982 6.1 Implementing Atomic Processes . . . . . . . . . . . . . . . . . . . . . 982 6.2 Correctness Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 982 6.3 Run-time Scheduling and Buffer Management . . . . . . . . . 984 7 Extensions of KPN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 987 8 Reactive Process Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 990 8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 990 8.2 Design Considerations of RPN . . . . . . . . . . . . . . . . . . . . . . 994 8.3 Operational Semantics of RPN . . . . . . . . . . . . . . . . . . . . . . 996 8.4 Implementation Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 998 8.5 Analyzable Models Embedded in RPN . . . . . . . . . . . . . . . . 1000 9 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1000 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1003 Methods and Tools for Mapping Process Networks onto Multi-Processor Systems-On-Chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1007 Iuliana Bacivarov, Wolfgang Haid, Kai Huang, Lothar Thiele 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1007 2 KPN Design Flows for Multiprocessor Systems . . . . . . . . . . . . . . . . 1009 3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1011 3.1 System Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1012 3.2 System Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1013 3.3 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014 3.4 Design Space Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . 1017 4 Specification, Synthesis, Analysis, and Optimization in DOL . . . . . 1019 4.1 Distributed Operation Layer . . . . . . . . . . . . . . . . . . . . . . . . . 1019 4.2 System Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1020 4.3 System Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1022 4.4 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1026 4.5 Design Space Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . 1030 4.6 Results of the DOL Framework . . . . . . . . . . . . . . . . . . . . . . 1033 5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1036 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1037
Contents
xxix
Integrated Modeling using Finite State Machines and Dataflow Graphs . . 1041 Joachim Falk, Joachim Keinert, Christian Haubelt, Jürgen Teich, and Christian Zebelein 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1041 2 Modeling Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1042 2.1 *charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1042 2.2 The California Actor Language . . . . . . . . . . . . . . . . . . . . . . 1046 2.3 Extended Codesign Finite State Machines . . . . . . . . . . . . . 1049 2.4 SysteMoC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1053 3 Design Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057 3.1 Ptolemy II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057 3.2 The OpenDF Design Flow . . . . . . . . . . . . . . . . . . . . . . . . . . 1058 3.3 SystemCoDesigner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059 4 Integrated Finite State Machines for Multidimensional Data Flow . 1064 4.1 Array-OL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1065 4.2 Windowed Data Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1069 4.3 Exploiting Knowledge about the Model of Computation . 1073 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1074 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1077
List of Contributors
Kiarash Amiri Rice University, 6100 Main Street, Houston, TX 77005, USA, e-mail: [email protected] Iuliana Bacivarov ETH Zurich, Computer Engineering and Networks Laboratory, CH-8092 Zurich, Switzerland, e-mail: [email protected] Hyeon-min Bae KAIST, Daejeon, South Korea, e-mail: [email protected] Twan Basten Eindhoven University of Technology, and Embedded Systems Institute, Den Dolech 2, 5612AZ, Eindhoven, The Netherlands, e-mail: [email protected] Lejla Batina Katholieke Universiteit Leuven, ESAT/COSIC and IBBT, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, e-mail: lejla.batina@esat. kuleuven.be Jerker Bengtsson Centre for Research on Embedded Systems, Halmstad University, Box 823, 301 18 Halmstad, Sweden, e-mail: [email protected] Shuvra S. Bhattacharyya University of Maryland, Dept. of Electrical and Computer Engineering, and Institute for Advanced Computer Studies, College Park MD 20742, e-mail: [email protected] Florian Brandner Institut für Computersprachen, Technische Universität Wien, Argentinierstraße 8, A-1040 Wien, Austria, e-mail: [email protected] Michael Brogioli xxxi
xxxii
List of Contributors
Freescale Semiconductor Inc., 7700 W. Parmer Lane, MD: PL63, Austin, TX 78729, USA, e-mail: [email protected] Luigi Carro Federal University of Rio Grande do Sul, Bento Gonçalves 9500, Porto Alegre, RS, Brazil, e-mail: [email protected] Jeronimo Castrillon RWTH Aachen University, Software for Systems on Silicon, Templergraben 55, 52056 Aachen, Germany, e-mail: [email protected] Joseph R. Cavallaro Rice University, 6100 Main Street, Houston, TX 77005, USA, e-mail: [email protected] Liang-Gee Chen Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, No. 1 Sec. 4 Roosevelt Rd., Taipei 10617, Taiwan, R.O.C., e-mail: [email protected] Yanni Chen Marvell Semiconductor Incorporation, 5488 Marvell Lane, Santa Clara, CA 95054, USA, e-mail: [email protected] Yu-Han Chen Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, No. 1 Sec. 4 Roosevelt Rd., Taipei 10617, Taiwan, R.O.C., e-mail: [email protected] Katherine Compton University of Wisconsin, Madison, WI, e-mail: [email protected] Elke De Mulder Katholieke Universiteit Leuven, ESAT/COSIC and IBBT, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, e-mail: elke.demulder@esat. kuleuven.be Ed F. Deprettere Leiden University, Leiden Institute of Advanced Computer Science, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands, e-mail: [email protected] Bjorn De Sutter Computer Systems Lab, Ghent University, Sint-Pietersnieuwstraat 41, 9000 Gent, Belgium, and Department of Electronics and Informatics, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium, e-mail: bjorn.desutter@elis. ugent.be Ramani Duraiswami Department of Computer Science and Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park MD 20742 USA, e-mail: [email protected]
List of Contributors
xxxiii
Joachim Falk University of Erlangen-Nuremberg, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, e-mail: joachim.falk@ informatik.uni-erlangen.de Junfeng Fan Katholieke Universiteit Leuven, ESAT/COSIC and IBBT, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, e-mail: [email protected]. be Björn Franke Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, United Kingdom, e-mail: [email protected] Marc Geilen Eindhoven University of Technology, Den Dolech 2, 5612AZ, Eindhoven, The Netherlands, e-mail: [email protected] Benedikt Gierlichs Katholieke Universiteit Leuven, ESAT/COSIC and IBBT, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, e-mail: benedikt.gierlichs@ esat.kuleuven.be Anthony Gregerson University of Wisconsin, Madison, WI, e-mail: [email protected] Oscar Gustafsson Division of Electronics Systems, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, e-mail: [email protected] Soonhoi Ha Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-744, Korea, e-mail: [email protected] Wolfgang Haid ETH Zurich, Computer Engineering and Networks Laboratory, CH-8092 Zurich, Switzerland, e-mail: [email protected] Timo D. Hämäläinen Department of Computer Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland, e-mail: [email protected] Marko Hännikäinen Department of Computer Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland, e-mail: [email protected] Christian Haubelt University of Erlangen-Nuremberg, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, e-mail: christian.haubelt@ informatik.uni-erlangen.de
xxxiv
List of Contributors
Nigel Horspool Department of Computer Science, University of Victoria, Victoria, BC V8W 3P6, Canada, e-mail: [email protected] Yu Hen Hu University of Wisconsin - Madison, Dept. Electrical and Computer Engineering, 1415 Engineering Drive, Madison, WI 53706, e-mail: [email protected] Kai Huang ETH Zurich, Computer Engineering and Networks Laboratory, CH-8092 Zurich, Switzerland, e-mail: [email protected] Jörn W. Janneck Department of Automatic Control, Lund University, SE-221 00 Lund, Sweden, e-mail: [email protected] Ville Kaseva Department of Computer Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland, e-mail: [email protected] Joachim Keinert Fraunhofer Institute for Integrated Circuits IIS, Am Wolfsmantel 33, 91058 Erlangen, Germany, e-mail: [email protected] Christoph W. Kessler Dept. of Computer and Information Science, Linköping University, 58183 Linköping, Sweden, e-mail: [email protected] Miroslav Knezevic Katholieke Universiteit Leuven, ESAT/COSIC and IBBT, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, e-mail: miroslav.knezevic@esat. kuleuven.be Mikko Kohvakka Department of Computer Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland, e-mail: [email protected] Konstantinos Konstantinides, x e-mail: [email protected] Andreas Krall Institut für Computersprachen, Technische Universität Wien, Argentinierstraße 8, A-1040 Wien, Austria, e-mail: [email protected] S. Y. Kung Princeton University, e-mail: [email protected] Andy Lambrechts IMEC, Kapeldreef 75, 3001 Heverlee, Belgium, e-mail: [email protected] Yong Ki Lee University of California, Los Angeles, Electrical Engineering, 420 Westwood
List of Contributors
xxxv
Plaza, Los Angeles, CA 90095-1594, USA, e-mail: [email protected] Rainer Leupers RWTH Aachen University, Software for Systems on Silicon, Templergraben 55, 52056 Aachen, Germany, e-mail: [email protected] William S. Levine Department of ECE, University of Maryland, College Park, MD 20742, USA, e-mail: [email protected] Yiran Li Rensselaer Polytechnic Institute, 110 8th street, Troy, NY 12180, USA, e-mail: [email protected] Dake Liu Department of Electrical Engineering, Linköping University, 58183, Linköping, Sweden, e-mail: [email protected] Roel Maes Katholieke Universiteit Leuven, ESAT/COSIC and IBBT, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, e-mail: [email protected] Marco Mattavelli GR-SCI-MM, EPFL, CH-1015 Lausanne, Switzerland, e-mail: marco. [email protected] John McAllister Institute of Electronics, Communications and Information Technology (ECIT), Queen’s University Belfast, Northern Ireland Science Park, Queen’s Road, Queen’s Island, Belfast BT 9DT, UK, e-mail: [email protected] Hyunok Oh Hanyang University, Haengdangdong, Seongdonggu, Seoul 133-791, Korea, e-mail: [email protected] Bryan E. Olivier ACE Associated Compiler Experts bv., De Ruyterkade 113, 1011 AB Amsterdam, The Netherlands, e-mail: [email protected] Yangyang Pan Rensselaer Polytechnic Institute, 110 8th street, Troy, NY 12180, USA, e-mail: [email protected] Keshab K. Parhi University of Minnesota, Dept. of Elect. & Comp. Eng., Minneapolis, MN 55455, USA, e-mail: [email protected] William Plishker Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland, 22 S. Greene St., Baltimore, MD 21201, USA, and Department of
xxxvi
List of Contributors
Electrical and Computer Engineering, University of Maryland, 1423 AV Williams, College Park, MD 20742, USA, e-mail: [email protected] Praveen Raghavan IMEC, Kapeldreef 75, 3001 Heverlee, Belgium, e-mail: [email protected] Mickaël Raulet IETR/INSA Rennes, F-35708, Rennes, France, e-mail: mickael.raulet@ insa-rennes.fr Mateus Rutzig Federal University of Rio Grande do Sul, Bento Gonçalves 9500, Porto Alegre, RS, Brazil, e-mail: [email protected] Jason Schlessman Department of Electrical Engineering, Princeton University, Engineering Quad, Princeton, NJ 08544, USA, e-mail: [email protected] Michael Schulte University of Wisconsin, Madison, WI, e-mail: [email protected] Naresh Shanbhag University of Illinois at Urbana-Champaign, Urbana, e-mail: shanbhag@ illinois.edu Raj Shekhar Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland, 22 S. Greene St., Baltimore, MD 21201, USA, e-mail: rshekhar@ umm.edu Weihua Sheng RWTH Aachen University, Software for Systems on Silicon, Templergraben 55, 52056 Aachen, Germany, e-mail: [email protected] Andrew Singer University of Illinois at Urbana-Champaign, Urbana, e-mail: acsinger@ illinois.edu Jukka Suhonen Department of Computer Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland, e-mail: [email protected] Yang Sun Rice University, 6100 Main Street, Houston, TX 77005, USA, e-mail: [email protected] Wonyong Sung School of Electrical Engineering, Seoul National University, 599 Gwanangno, Gwanak-gu, Seoul, 151-744, Republic Of Korea, e-mail: [email protected] Jarmo Takala Department of Computer Systems, Tampere University of Technology, P.O. Box
List of Contributors
xxxvii
553, FI-33101 Tampere, Finland, e-mail: [email protected] Jürgen Teich University of Erlangen-Nuremberg, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, e-mail: teich@informatik. uni-erlangen.de Lothar Thiele ETH Zurich, Computer Engineering and Networks Laboratory, CH-8092 Zurich, Switzerland, e-mail: [email protected] Olli Vainio Department of Computer Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland, e-mail: [email protected] Ingrid Verbauwhede Katholieke Universiteit Leuven, ESAT/COSIC and IBBT, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, e-mail: ingrid.verbauwhede@ esat.kuleuven.be Sven Verdoolaege Katholieke Universiteit Leuven, Department of Computer Science, Celestijnenlaan 200A, B-3001 Leuven, Belgium, e-mail: [email protected] Vivek Walimbe GE Healthcare, 3000 N.Grandview Blvd., W-622, Waukesha, WI 53188, USA, e-mail: [email protected] Lars Wanhammar Division of Electronics Systems, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, e-mail: [email protected] Marilyn Wolf School of Electrical and Computer Engineering, Georgia Institute of Technology, 777 Atlantic Drive NW, Atlanta, GA 30332-0250, USA, e-mail: wolf@ece. gatech.edu Roger Woods ECIT, Queens University Belfast, Queen’s Island, Queen’s Road, Belfast, BT3 9DT, UK, e-mail: [email protected] Christian Zebelein University of Erlangen-Nuremberg, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, e-mail: christian.zebelein@ informatik.uni-erlangen.de Tong Zhang Rensselaer Polytechnic Institute, 110 8th street, Troy, NY 12180, USA, e-mail: [email protected] Dmitry N. Zotkin
xxxviii
List of Contributors
Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park MD 20742 USA, e-mail: [email protected]
Part I
Applications Managing Editor: Shuvra Bhattacharyya
Signal Processing for Control William S. Levine
Abstract Signal processing and control are closely related. In fact, many controllers can be viewed as a special kind of signal processor that converts an exogenous input signal and a feedback signal into a control signal. Because the controller exists inside of a feedback loop, it is subject to constraints and limitations that do not apply to other signal processors. A well known example is that a stable controller in series with a stable plant can, because of the feedback, result in an unstable closed-loop system. Further constraints arise because the control signal drives a physical actuator that has limited range. The complexity of the signal processing in a control system is often quite low, as is illustrated by the Proportional + Integral + Derivative (PID) controller. Model predictive control is described as an exemplar of controllers with very demanding signal processing. ABS brakes are used to illustrate the possibilities for improved controller capability created by digital signal processing. Finally, suggestions for further reading are included.
1 Introduction There is a close relationship between signal processing and control. For example, a large amount of classical feedback control theory was developed by people working for Bell Laboratories who were trying to solve problems with the amplifiers used for signal processing in the telephone system. This emphasizes that feedback, which is a central concept in control, is also a very important technique in signal processing. Conversely, the Kalman filter, which is now a critical component in many signal processing systems, was discovered by people working on control systems. In fact, the state space approach to the analysis and design of linear systems grew out of
William S. Levine Department of ECE, University of Maryland, College Park, MD 20742, USA e-mail: [email protected]
S.S. Bhattacharyya et al. (eds.), Handbook of Signal Processing Systems, DOI 10.1007/978-1-4419-6345-1_1, © Springer Science+Business Media, LLC 2010
3
4
William S. Levine
research and development related to control problems that arose in the American space program.
2 Brief introduction to control The starting point for control is a system with inputs and outputs as illustrated schematically in Fig.1. It is helpful to divide the inputs into those that are available
z
d
u
Plant
y
Fig. 1 A generic system to be controlled, called the plant. d is a vector of exogenous inputs; u is a vector of control inputs; y is a vector of measured outputs–available for feedback; and z is a vector of outputs that are not available for feedback
for control and those that come from some source over which we have no control. They are conventionally referred to as control inputs and exogenous inputs. Similarly, it is convenient to divide the outputs into those that we care about, but are not available for feedback, and those that are measured and available for feedback. A good example of this is a helicopter. Rotorcraft have four control inputs, collective pitch, variable pitch, tail rotor pitch, and thrust. The wind is an important exogenous input. Wind gusts will push the aircraft around and an important role of the controller is to mitigate this. Modern rotorcraft include a stability and control augmentation system (SCAS). This is a controller that, as its name suggests, makes it easier for the pilot to fly the aircraft. The main interest is in the position and velocity of the aircraft. In three dimensions, there are three such positions and three velocities. Without the SCAS, the pilot would control these six signals indirectly by using the four control inputs to directly control the orientation of the aircraft—these are commonly described as the roll, pitch, and yaw angles—and the rate of change of this orientation—the three angular velocities. Thus, there is a six component vector of angular orientations and angular velocities that are used by the SCAS. These are the feedback signals. There is also a six component vector of positions and velocities that are of interest but that are not used by the SCAS. While there are ways to measure these signals in a
Signal Processing for Control
5
rotorcraft, in many situations there are signals of interest that can not be measured in real time. The SCAS-controlled system is again a system to be controlled. There are two different control systems that one might employ for the combined aircraft and SCAS. One is a human pilot. The other is an autopilot. In both cases, one can identify inputs for control as well as exogenous inputs. After all, the wind still pushes the aircraft around. In both cases, there are some outputs that are available for feedback and others that are important but not fed back. It should be clear from the example that once one has a system to be controlled there is some choice over what output signals are to be used by the controller directly. That is, one can, to some extent, choose which signals are to be fed back. One possible choice is to feed back no signals. The resulting controller is referred to as an open-loop controller. Open loop control has received relatively little attention in the literature. However, it is an important option in cases where any measurements of the output signals would be impossible or very inaccurate. A good example of this until very recently was the control of washing machines. Because it was impossible to measure how clean the dishes or clothing inside the machine were, the controller simply washed for a fixed period of time. Now some such machines measure the turbidity of the waste water as an indirect measurement of how clean the wash is and use this as feedback to determine when to stop washing. There is also a possibility of feed forward control. The idea here is to measure some of the exogenous inputs and use this information to control the system. For example, one could measure some aspects, say the ph, of the input to a sewage treatment system. This input is normally uncontrollable, i.e., exogenous. Knowing the input ph obviously facilitates controlling the output ph. Note that a feed-forward controller is different from an open-loop controller in that the former is based on a measurement and the latter is not. The fact remains that most control systems utilize feedback. This is because feedback controls have several significant advantages over open-loop controls. First, the designer and user of the controlled system does not need to know nearly as much about the system as he or she would need for an open-loop control. Second, the controlled system is much less sensitive to variability of the plant and other disturbances. This is well illustrated by what is probably the oldest, most common, and simplest feedback control system in the world—the float valve level control in the toilet tank. Imagine controlling the level of water in the tank by means of an open-loop control. The user, in order to refill the tank after a flush, would have to open the valve to allow water to flow into the tank, wait just the right amount of time for the tank to fill, and then shut off the flow of water. Looking into the tank would be cheating because that would introduce feedback—the user would, in effect, measure the water level. Many errors, such as turning off the flow too late, opening the input valve too much, a large stone in the tank, or too small a prior flush, would cause the tank to overflow. Automation would not solve this. Contrast the feedback solution. Once the tank has flushed, the water level is too low. This lowers the float and opens the input valve. Water flows into the tank,
6
William S. Levine
thereby raising the water level and the float, until the float gets high enough to close the input valve. None of the open-loop disasters can occur. The closed-loop system corrects for variations in the size and shape of the tank and in the initial water level. It will even maintain the desired water level in the presence of a modest leak in the tank. Furthermore, the closed-loop system does not require any action by the user. It responds automatically to the water level. However, feedback, if improperly applied, can introduce problems. It is well known that the feedback interconnection of a perfectly stable plant with a perfectly stable controller can produce an unstable closed-loop system. It is not so well known, but true, that feedback can increase the sensitivity of the closed-loop system to disturbances. In order to explain these potential problems it is first necessary to introduce some fundamentals of feedback controller design. The first step in designing a feedback control system is to obtain as much information as possible about the plant, i.e., the system to be controlled. There are three very different situations that can result from this study of the plant. First, in many cases very little is known about the plant. This is not uncommon, especially in the process industry where the plant is often nonlinear, time varying, very complicated, and where models based on first principles are imprecise and inaccurate. Good examples of this are the plants that convert crude oil into a variety of products ranging from gasoline to plastics, those that make paper, those that process metals from smelting to rolling, those that manufacture semiconductors, sewage treatment plants, and electric power plants. Second, in some applications a good non parametric model of the plant is known. This was the situation in the telephone company in the 1920’s and 30’s. Good frequency responses were known for the linear amplifiers used in transmission lines. Today, one would say that accurate Bode plots were known. These plots are a precise and detailed mathematical model of the plant and are sufficient for a good feedback controller to be designed. They are non parametric because the description consists of experimental data alone. Third, it is often possible to obtain a detailed and precise mathematical model of the plant. For example, good mathematical descriptions of many space satellites can be derived by means of Newtonian mechanics. Simple measurements then complete the mathematical description. This is the easiest situation to describe and analyze so it is discussed in much more detail here. Nonetheless, the constraints and limitations described below apply to all feedback control systems. In the general case, where the plant is nonlinear, this description takes the form of a state-space ordinary differential equation. x(t) ˙ = f (x(t), u(t), d(t))
(1)
y(t) = g(x(t), u(t), d(t)) z(t) = h(x(t), u(t), d(t))
(2) (3)
In this description, x(t) is the n-dimensional state vector, u(t) is the m-dimensional input vector (the control input), d(t) is the exogenous input, y(t) is the p-dimensional
Signal Processing for Control
7
measured output (available for feedback), and z(t) is the vector of outputs that are not available for feedback. The dimensions of the d(t) and z(t) vectors will not be needed in this article. In the linear time-invariant (LTI) case, this becomes x(t) ˙ = Ax(t) + Bu(t) + Ed(t)
(4)
y(t) = Cx(t) + Du(t) + Fd(t) z(t) = Hx(t) + Gu(t) + Rd(t)
(5) (6)
Another description is often used in the LTI case because of its many useful features. It is equivalent to the description above and obtained by simply taking Laplace transforms of Equations (4), (5), and (6) and then doing some algebraic manipulation to eliminate X (s) from the equations. G (s) G12 (s) U(s) Y (s) = 11 (7) Z(s) G21 (s) G22 (s) D(s) The capital letters denote Laplace transforms of the corresponding lower case letters and s denotes the complex variable argument of the Laplace transform. Note that setting s = j makes this description identical to the non parametric description of case 2 on the previous page when the plant is asymptotically stable and extends the frequency domain characterization (Bode plot) of the plant to the unstable case as well. The issues of closed-loop stability and sensitivity can be easily demonstrated with a simplified version of Equation (7) above. All signals will be of interest but it will not be necessary to display this explicitly so ignore Z(s) and let all other signals be scalars. For simplicity, let G12 (s) = 1. Let G11 (s) G p (s) and let Gc (s) denote the dynamical part of the controller. Finally, suppose that the objective of the controller is to make Y (s) ≈ R(s) as closely as possible where R(s) is some exogenous input to the controller. It is generally impossible to make Y (s) = R(s) ∀s The resulting closed-loop system will be, in block diagram form
D(s) R(s)
E(s)
Gc(s)
Fig. 2 Simple feedback connection
U(s)
Gp(s)
Y(s)
8
William S. Levine
Simple algebra then leads to G p (s)Gc (s) 1 D(s) + R(s) 1 + G p(s)Gc (s) 1 + G p(s)Gc (s) Gc (s) −Gc (s) D(s) + R(s) U(s) = 1 + G p(s)Gc (s) 1 + G p(s)Gc (s) Y (s) =
(8) (9)
It is conventional in control theory to define the two transfer functions appearing in Equation (8) as follows. The transfer function in Equation (9) is also defined below. G p (s)Gc (s) 1 + G p(s)Gc (s) 1 S(s) = 1 + G p(s)Gc (s) Gc (s)Gc (s) V (s) = 1 + G p(s)Gc (s) T (s) =
(10) (11) (12)
Now that a complete description of a simple feedback control situation is available, it is straightforward to demonstrate some of the unique constraints on Gc (s), the signal processor that is used to make the closed-loop system track R(s).
2.1 Stability A simple example demonstrating that the feedback interconnection of asymptotically stable and thoroughly well-behaved systems can result in an unstable closedloop system is shown below. This example also illustrates that the usual culprit is too high an open-loop gain. Let G p (s) =
3 s2 + 4s + 3
(13)
k s+2
(14)
Gc (s) =
The closed-loop system, T (s) is, after some elementary calculations, T (s) =
3k (s + 2)(s2 + 4s + 3) + 3k
(15)
The denominator of T (s) is also the denominator of S(s) and V (s) and it has roots with positive real parts for all k > 20. Thus, the closed-loop system is unstable for all k > 20.
Signal Processing for Control
9
2.2 Sensitivity to disturbance Notice that T (s) + S(s) = 1
∀s
(16)
Since S(s) is the response to a disturbance input (for example, wind gusts acting on an aircraft) and T (s) is the response to the input that Y (s) should match as closely as possible (for example, the pilot’s commands), this appears to be an ideal situation. An excellent control design would make T (s) ≈ 1, ∀s and this would have the effect of reducing the response to the disturbance to nearly zero. There is, unfortunately, a problem hidden in Equation (16). For any real system, G p (s) goes to zero as |s| → . This, together with the fact that too large a gain for Gc (s) will generally cause instability, implies that T (s) → 0 as |s| → and this implies that S(s) → 1 as |s| → . Hence, there will be values of s at which the output of the closed-loop system will consist entirely of the disturbance signal. Remember that letting s = j in any of the transfer functions above gives the frequency response of the system. Thus, the closed-loop system is a perfect reproducer of high-frequency disturbance signals.
2.3 Sensitivity conservation Theorem: (Bode Sensitivity Integral) Suppose that G p (s)Gc (s) is rational, has no right half plane poles, and has at least 2 more poles than zeros. Suppose also that the corresponding closed-loop system T (s) is stable. Then the sensitivity S(s) satisfies 0
log|S( j )|d = 0
(17)
This is a simplified version of a theorem that can be found in [12]. The log in Equation (17) is the natural logarithm. The implications of this theorem and Equation (16) are: if you make the closedloop system follow the input signal R(s) closely over some frequency range, as is the goal of most control systems, then inevitably there will be a frequency range where the closed-loop system actually amplifies disturbance signals. One might hope that the frequency range in which the sensitivity must be greater than one in order to make up for the region of low sensitivity could be infinite. This could make the magnitude of the increased sensitivity infinitesimally small. A further result [12] proves that this is not the case. A frequency range where the sensitivity is low implies a frequency range where it is high. Thus, a feedback control system that closely tracks exogenous signals, R( j ) for some range of values of frequency (typically 0 ≤ < b ) will inevitably amplify any disturbance signals, D( j ), over some other frequency range. This discussion has been based on the assumption that both the plant and the controller are continuous-time systems. This is because most real plants are continuous-
10
William S. Levine
time systems. Thus, however one implements the controller, and it is certainly true that most controllers are now digitally implemented, the controller must act as a continuous-time controller. It is therefore subject to the limitations described above.
3 Signal processing in control An understanding of the role of signal processing in control begins with a refinement of the abstract system model in Fig. 1 to that shown in Fig. 3.
z
d
Plant
u actuators
y sensors
Fig. 3 The plant with sensors and actuators indicated
Although control theorists usually include the sensors and actuators as part of the plant, it is important to understand their presence and their role. In almost all modern control systems excepting the float valve, the sensors convert the physical signal of interest into an electronic version and the actuator takes an electronic signal as its input and converts it into a physical signal. Both sensing and actuation involve specialized signal processing. A common issue for sensors is that there is very little energy in the signal so amplification and other buffering is needed. Most actuation requires much greater power levels than are used in signal processing so some form of power amplification is needed. Generally, the power used in signal processing for control is a small fraction of the power consumed by the closed-loop system. For this reason, control designers rarely concern themselves with the power needed by the controller. Most modern control systems are implemented digitally. The part of a digital control system between the sensor output and the actuator input is precisely a digital signal processor (DSP). The fact that it is a control system impacts the DSP in at least three ways. First, it is essential that a control signal be produced on time every time. This is a hard real-time constraint. Second, the fact that the control signal eventually is input into an actuator also imposes some constraints. Actuators saturate. That is, there are limits to their output values that physically cannot be exceeded. For example, the rudder on a ship and the control surfaces on a wing can only turn so many degrees. A pump has a maximum amount it can move. A motor
Signal Processing for Control
11
has a limit on the torque it can produce. Control systems must either be designed to avoid saturating the actuator or to avoid the performance degradation that can result from saturation. A mathematical model of saturation is given in Fig. (4). Another nonlinearity that is difficult, if not impossible, to avoid is the so-called dead zone illustrated in Fig. 4. This arises, for example, in the control of movement because very small motor torques are unable to overcome stiction. Although there are many other common nonlinearities that appear in control systems, we will only mention one more, the quantization nonlinearity that appears whenever a continuous plant is digitally controlled. Thirdly, the fact that the DSP is actually inside a feedback loop imposes the constraints discussed at the end of the previous section, too much gain causes closedloop instability and too precise tracking can cause amplified sensitivity to disturbances. Output
Output
Input
Input
Fig. 4 A saturation nonlinearity is shown at the left and a dead zone nonlinearity at the right.
The main impact of the real time constraint on the DSP is that it must be designed to guarantee that it produces a control signal at every sampling instant. The main impact of the nonlinearities is from the saturation. A control signal that is too large will be truncated. This can actually cause the closed-loop system to become unstable. More commonly, it will cause the performance of the closed-loop system to degrade. The most famous example of this is known as integrator windup as discussed in the following subsection. The dead zone limits the accuracy of the control. The impact of the sensitivity and stability constraints is straightforward. The design of DSPs for control must account for these effects as well as the obvious direct effect of the controller on the plant. There is one more general issue associated with the digital implementation of controllers. It is well known that the Nyquist rate upper bounds the interval between samples when a continuous-time signal is discretized in time. For purposes of control it is also possible to sample too fast. The intuitive explanation for this is that if the samples are too close together the plant output changes very little from sample
12
William S. Levine
to sample. A feedback controller bases its actions on these changes. If the changes are small enough, noise dominates the signal and the controller performs poorly.
3.1 simple cases A generic feedback control situation is illustrated in Fig. 5. Fundamentally, the controller in the figure is a signal processor, taking the signals dc and y as inputs and producing the signal u as output. The objective of the controller is to make y(t) = dc (t) ∀t ≥ 0. This is actually impossible because all physical systems have some form of inertia. Thus, control systems are only required to come close to this objective. There are a number of precise definitions of the term “close." In the simplest cases, the requirements/specifications are that the closed-loop system be asymptotically stable and that | y(t) − dc (t) |< ∀t ≥ T ≥ 0, where T is some specified “settling time" and is some allowable error. In more demanding applications there will be specified limits on the duration and size of the transient response [14]. An interesting additional specification is often applied when the plant is approximately linear and time-invariant. In this case, an easy application of the final value theorem for Laplace transforms proves that the error in the steady-state response of the closedloop system to a step input (i.e., a signal that is 0 up to t = 0 at which time it jumps to a constant value ) goes to zero as t → provided the open-loop system (controller in series with the plant with no feedback) has a pole at the origin. Note that this leads to the requirement/specification that the open-loop system have a pole at the origin. This is achievable whereas, because of inaccuracies in sensors and actuators, zero steady-state error is not. dp
z
dc
Controller
u
Plant
y
Fig. 5 The plant and controller. Notice that the exogenous input of Fig. 5 has been split into two parts with dc denoting the exogenous input to the controller and d p that to the plant.
In many cases the signal processing required for control is straightforward and undemanding. For example, the most common controller in the world, used in applications from the toilet tank to aircraft, is the Proportional + Integral + Derivative (PID) Controller. A simple academic version of the PID Controller is illustrated in
Signal Processing for Control
13
Fig. 6. Notice that this simple PID Controller is a signal processing system with two inputs, the signal to be tracked dc and the actual measured output y and one output, the control signal u. This PID controller has three parameters that must be chosen so as to achieve satisfactory performance. The simplest version has kd and ki equal to zero. This is a proportional controller. The float valve in the toilet tank is an example. For most plants, although not the toilet tank, choosing k p too large will result in the closed-loop system being unstable. That is, y(t) will either oscillate or grow until saturation occurs somewhere in the system. It is also usually true that, until instability occurs, increasing k p will decrease the steady-state error. This captures the basic trade off in control design. Higher control gains tend to improve performance but also tend to make the closed-loop system more nearly unstable.
kp dc
y
u
Proportional Gain
kd
d/dt
Derivative Gain 1 s
ki Integral Gain
Fig. 6 A simple academic PID Controller
Changing ki to some non zero value usually improves the steady-state performance of the closed-loop system. To see this, first break the feedback loop by removing the connection carrying y in Fig. 5 and set kd = 0. Then, taking Laplace transforms, gives U(s) = (k p + ki /s)Y (s)
(18)
Assuming that the plant is linear and denoting its transfer function by G p (s) gives the open-loop transfer function, (k p s + ki Y (s) =( )G p (s) Dc (s) s
(19)
This open-loop system will have at least one pole at the origin unless G p (s) has a zero there. As outlined in the previous section, this implies in theory that the steady-state error in response to a step input should be zero. In reality, dead zone nonlinearity and both sensor and actuator nonlinearity will result in nonzero steadystate errors.
14
William S. Levine
It is relatively simple to adjust the gains k p and ki so as to achieve good performance of the closed-loop system. There are a number of companies that supply ready-made PID controllers. Once these controllers are connected to the plant, a technician or engineer “tunes" the gains to get good performance. There are a number of tuning rules in the literature, the most famous of which is due to Ziegler and Nichols. It should be noted that the Ziegler-Nichols tuning rules are now known to be too aggressive; better tuning rules are available [5]. Choosing a good value of kd is relatively difficult although the tuning rules do include values for it. Industry surveys have shown that a significant percentage of PID controllers have incorrect values for kd . When it is properly chosen, the D term improves the transient response. There are three common improvements to the simple PID controller of Fig. 6. First, it is very unwise to include a differentiator in a physical system. Differentiation amplifies high frequency noise, which is always present. Thus, the kd s term in the s . This simply adds a low pass filter PID controller should be replaced by k kknds+1 d in series with the differentiator and another parameter, kn to mitigate the problems with high frequency noise. Second, the most common input signal dc (t) is a step. Differentiating such a signal would introduce a signal into the system that is an impulse (infinite at t = 0 and 0 everywhere else). Instead, the signal input to the D-term is chosen to be y(t), not dc (t) − y(t). Notice that this change does away with the impulse and only changes the controller output at t = t0 whenever dc (t) is a step at t0 . Lastly, when the PID controller saturates the actuator (u(t) becomes too large for the actuator it feeds) a phenomenon known as integral windup often occurs. This causes the transient response of the closed-loop system to be much larger and to last much longer than would be predicted by a linear analysis. There are several ways to mitigate the effects of integral windup [5]. Nowadays the simple PID controller is almost always implemented digitally. The discretization in time and quantization of the signals has very little effect on the performance of the closed-loop system in the vast majority of cases. This is because the plant is usually very slow in comparison to the speeds of electronic signal processors. The place where digital implementation has its largest impact is on the possibility for improving the controller. The most exciting of these is to make the PID controller either adaptive or self-tuning. Both automate the tuning of the controller. The difference is that the self-tuning controller operates in two distinct modes, tuning and controlling. Often, an operator selects the mode, tuning when there is reason to believe the controller is not tuned properly and controlling when it is correctly tuned. The adaptive controller always operates in its adaptive mode, simultaneously trying to achieve good control and tuning the parameters. There is a large literature on both self tuning and adaptive control; many different techniques have been proposed. There are also about a dozen turnkey self-tuning PID controllers on the market [5]. There are many plants, even LTI single-input single-output (SISO) ones, for which the PID Controller is ineffective or inappropriate. One example is an LTI plant with a pair of lightly damped complex conjugate poles. Nonetheless, for many
Signal Processing for Control
15
such plants the controller can be easily implemented in a DSP. There are many advantages to doing this. Most are obvious but one requires a small amount of background. Most real control systems require a “safety net." That is, provisions must be made to deal with failures, faults, and other problems that may arise. For example, the float valve in the toilet tank will eventually fail to close fully. This is an inevitable consequence of valve wear and/or small amounts of dirt in the water. An overflow tube in the tank guarantees that this is a soft failure. If the water level gets too high, the excess water spills into the overflow tube. From there it goes into the toilet and eventually into the sewage line. The valve failure does not result in a flood in the house. These safety nets are extremely important in the real world, as should be obvious from this simple example. It is easy to implement them within the controller DSP. This is now commonly done. In fact, because many control applications are easily implemented in a DSP, and because such processors are so cheap, it is now true that there is often a good deal of unused capacity in the DSPs used for control. An important area of controls research is to find ways to use this extra capacity to improve the controller performance. One success story is described in the following section.
3.2 Demanding cases The signal processing necessary for control can be very demanding. One such control technique that is extensively used is commonly known as Model Predictive Control () [11] although it has a number of other names such as Receding Horizon Control, Dynamic Matrix Control, and numerous variants. The application of MPC is limited to systems with “slow" dynamics because the computations involved require too much time to be accomplished in the sampling interval for “fast" systems. The simplest version of MPC occurs when there is no exogenous input and the objective of the controller is to drive the system state to zero. This is the case described below. The starting point for MPC is a known plant which may have multiple inputs and multiple outputs (MIMO) and a precise mathematical measure of the performance of the closed-loop system. It is most convenient to use a state-space description of the plant x(t) ˙ = f (x(t), u(t))
(20)
y(t) = g(x(t), u(t))
(21)
Note that we have assumed that the system is time-invariant and that the noise can be adequately handled indirectly by designing a robust controller (a controller that is relatively insensitive to disturbances) rather than by means of a more elaborate
16
William S. Levine
controller whose design accounts explicitly for the noise. It is assumed that the state x(t) is an n-vector, the control u(t) is an m-vector, and the output y(t) is a p-vector. The performance measure is generically J(u[0,) ) =
0
l(x(t), u(t)), dt
(22)
The notation u[0,) denotes the entire signal {u(t) : 0 ≤ t < }. The control objective is to design and build a feedback controller that will minimize the performance measure. It is possible to solve this problem analytically in a few special cases. When an analytic solution is not possible one can resort to the approximate solution known as MPC. First, discretize both the plant and the performance measure with respect to time. Second, further approximate the performance measure by a finite sum. Lastly, temporarily simplify the problem by abandoning the search for a feedback control and ask only for an open-loop control. The open-loop control will depend on the initial state so assume that x(0) = x0 and this is known. The approximate problem is then to find the sequence [u(0) u(1) · · · u(N − 1] that solves the constrained nonlinear programming problem given below min
i=N
l(x(i), u(i) + lN (x(N + 1))
[u(0) u(1)···u(N)] i=0
subject to x(i + 1) = fd (x(i), u(i))
i = 0, 1, · · · , N
and x(0) = x0
(23)
(24) (25)
Notice that the solution to this problem is a discrete-time control signal, [uo (0) uo (1) · · · uo (N − 1]
(26)
(where the superscript “o" denotes optimal) that depends only on knowledge of x0 .
This will be converted into a closed-loop (and an MPC) controller in two steps. The first step is to create an idealized MPC controller that is easy to understand but impossible to build. The second step is to modify this infeasible controller by a practical, implementable MPC controller. The idealized MPC controller assumes that y(i) = x(i), ∀i. Then, at i = 0, x(0) = x0 is known. Solve the nonlinear programming problem instantaneously. Apply the control uo (0) on the time interval 0 ≤ t < , where is the discretization interval. Next, at time t = , equivalently i = 1, obtain the new value of x, i.e., x(1) = x1 . Again, instantaneously solve the nonlinear programming problem, exactly as before except using x1 as the initial condition. Again, apply only the first step of the newly computed optimal control (denote it by uo (1)) on the interval ≤ t < 2 . The idea is to compute, at each time instant, the open-loop optimal control for the full time horizon of N + 1 time steps but only implement that control for the first step. Continue to repeat this forever.
Signal Processing for Control
17
Of course, the full state is not usually available for feedback (i.e., y(i) = x(i)) and it is impossible to solve a nonlinear programming problem in zero time. The solution to both of these problems is to use an estimate of the state. Let an optimal (in some sense) estimate of x(k + 1) given all the data up to time k be denoted by x(i ˆ + 1|i). For example, assuming noiseless and full state feedback, y(k) = x(k) ∀k, and the dynamics of the system are given by x(i + 1) = fd (x(i), u(i))
i = 0, 1, · · · , N
(27)
then x(k ˆ + 1|k) = fd (x(k), u(k))
(28)
The implementable version of MPC simply replaces xi in the nonlinear programˆ + 1|i) and solves for uo (i + 1). This means that ming problem at time t = i by x(i the computation of the next control value can start at time t = i and can take up to the time (i + i) . It can take a long time to solve a complicated nonlinear programming problem. Because of this the application of MPC to real problems has been limited to relatively slow systems, i.e., systems for which is large enough to insure that the computations will be completed before the next value of the control signal is needed. As a result, there is a great deal of research at present on ways to speed up the computations involved in MPC. The time required to complete the control computations becomes even longer if it is necessary to account explicitly for noise. Consider the following relatively simple version of MPC in which the plant is linear and time-invariant except for the saturation of the actuators. Furthermore, the plant has a white Gaussian noise input and the output signal contains additive white Gaussian noise as well. The plant is then modeled by x(i + 1) = Ax(i) + Bu(i) + D (i) y(i) = Cx(i) + (i)
(29) (30)
The two noise signals are zero mean white Gaussian noises with covariance matrices E( (i) T (i)) = ∀i and E( (i) T (i)) = I ∀i where E(·) denotes expectation. The two noise signals are independent of each other. The performance measure is J(u(0,] ) = E(
1 i= T (y (i)Qy(i) + uT (i)Ru(i)) 2 i=0
(31)
In the equation above, Q and R are symmetric real matrices of appropriate dimensions. To avoid technical difficulties R is taken to be positive definite (i.e., u Ru > 0 for all u = 0) and Q positive semidefinite (i.e., y Qy ≥ 0 for all y). In the absence of saturation, i.e., if the linear model is accurate for all inputs, then the solution to the stochastic control problem of minimizing Equation (31) subject to Equations (29 &
18
William S. Levine
30) is the Linear Quadratic Gaussian () regulator [15]. It separates into two independent components. One component is the optimal feedback control uo (i) = F o x(i), where the superscript “o" denotes optimal. This control uses the actual state vector which is, of course, unavailable. The other component of the optimal control is a Kalman filter which produces the optimal estimate of x(i) given the available data at time i. Denoting the output of the filter by x(i|i), ˆ the control signal becomes ˆ uo (i) = F o x(i|i)
(32)
This is known as the certainty equivalent control because the optimal state estimate is used in place of the actual state. For this special case, all of the parameters can be computed in advance. The only computations needed in real time are the one step ahead Kalman filter /predictor and the matrix multiplication in Equation (32). If actuator saturation is important, as it is in many applications, then the optimal control is no longer linear and it is necessary to use MPC. As before, approximate the performance measure by replacing by some finite N. An exact feedback solution to this new problem would be extremely complicated. Because of the finite time horizon, it is no longer true that the optimal control is time invariant. Because of the saturation, it is no longer true that the closed-loop system is linear nor is it true that the state estimation and control are separate and independent problems. Even though this separation is not true, it is common to design the MPC controller by using a Kalman filter to estimate the state vector. Denote this estimate by x(i|i) ˆ and the one step ahead prediction by x(i|i ˆ − 1). The finite time LQ problem with dynamics given by Equations (29 & 30) and performance measure
J(u(0,N] ) = E(
1 i=N T (y (i)Qy(i) + uT (i)Ru(i)) + yT (N + 1)Qy(N + 1)) 2 i=0
is then solved open loop with initial constraint ⎧ ⎪ ⎨umax , u(i) = u(i), ⎪ ⎩ −umax ,
(33)
condition x(0) = x(i|i ˆ − 1) and with the if u(i) ≥ umax ; if |u(i)| < umax ; if u(i) ≤ −umax ;
(34)
This is a convex programming problem which can be quickly and reliably solved. As is usual in MPC, only the first term of the computed control sequence is actually implemented. The substitution of the one step prediction for the filtered estimate is done so as to provide time for the computations. There is much more to MPC than has been discussed here. An introduction to the very large literature on the subject is given in section 5. One issue that is too important to omit is that of stability. The rationale behind MPC is an heuristic notion that the performance of such a controller is likely to be very good. While good performance implicitly requires stability, it certainly does not guarantee it. Thus,
Signal Processing for Control
19
it is comforting to know that stability is theoretically guaranteed under reasonably weak assumptions for a broad range of MPC controllers [16].
3.3 Exemplary case ABS brakes are a particularly vivid example of the challenges and benefits associated with the use of DSP in control. The basic problem is simply stated. Control the automobile’s brakes so as to minimize the distance it takes to stop. The theoretical solution is also simple. Because the coefficient of sliding friction is smaller than the coefficient of rolling friction, the vehicle will stop in a shorter distance if the braking force is the largest it can be without locking the wheels. All this is well known. The difficulty is the following. The smallest braking force that locks the wheels depends on how slippery the road surface is. For example, a very small braking force will lock the wheels if they are rolling on glare ice. A much larger force can be applied without locking the wheels when they are rolling on dry pavement. Thus, the key practical problem for ABS brakes is how to determine how much force can be applied without locking the wheels. In fact, there is as yet no known way to measure this directly. It must be estimated. The control, based on this estimate, must perform at least as well as an unassisted human driver under every possible circumstance. The key practical idea of ABS brakes is to frequently change the amount of braking force to probe for the best possible value. Probing works roughly as follows. Apply a braking force and determine whether the wheels have locked. If they have, reduce the braking force to a level low enough that the wheels start to rotate again. If they have not, increase the braking force. This is repeated at a reasonably high frequency. Probing cannot stop and must take be repeated frequently because the slipperiness of the road surface can change rapidly. It is important to keep the braking force close to its optimal value all the time. Balancing the dual objectives of quickly detecting changes in the slipperiness of the surface and keeping the braking force close to its optimal value is one of the keys to successful ABS brakes. Note that this tradeoff is typical of adaptive control problems where the controller has to serve the dual objectives of providing good control and providing the information needed to improve the controller. The details of the algorithm for combining probing and control are complicated [13]. The point here is that the complicated algorithm is a life saving application of digital signal processing for control. Two other points are important. In order for ABS brakes to be possible, it was first necessary to develop a hydraulic braking system that could be electronically modulated. That is, the improved controller depends on a new actuator. Second, there is very little control theory available to assist in the design of the controller upon which ABS braking depends. There are three reasons why control theory is of little help in designing ABS brakes. First, the interactions between tires and the road surface are very complex. The dynamic distortion of the tire under loading plays a fundamental role in traction
20
William S. Levine
(see [21] for a dramatic example). While there has been research on appropriate mathematical models for the tire/road interaction, realistic models are too complex for control design and simpler models that might be useful for controller design are not realistic enough. Second, the control theory for systems that have large changes in dynamics as a result of small changes in the control—exactly what happens when a tire loses traction—is only now being developed. Such systems are one form of hybrid system. The control of hybrid systems is a very active current area of research in control theory. Third, a crucial component of ABS brakes is the electronically controllable hydraulic brake cylinder. This device is obviously needed in order to implement the controller. Control theory has very little to say about the physical devices needed to implement a controller.
4 Conclusions Nowadays most control systems are implemented digitally. This is motivated primarily by cost and convenience. It is abetted by continuing advances in sensing and actuation. More and more physical signals can be converted to electrical signals with high accuracy and low noise. More and more physical actuators convert electrical inputs into physical forces, torques, etc. Sampling and digitization of the sensor signals is also convenient, very accurate if necessary, and inexpensive. Hence, the trend is very strongly towards digital controllers. The combination of cheap sensing and digital controllers has created exciting opportunities for improved control and automation. Adding capability to a digitally implemented controller is now often cheap and easy. Nonlinearity, switching, computation, and logical decision making can all be used to improve the performance of the controller. The question of what capabilities to add is wide open.
5 Further reading There are many good undergraduate textbooks dealing with the basics of control theory. Two very popular ones are by Dorf and Bishop [1] and by Franklin, Powell, and Emami Naeini [2]. A standard undergraduate textbook for the digital aspects of control is [3]. A very modern and useful upper level undergraduate or beginning graduate textbook is by Goodwin, Graebe, and Salgado [4]. An excellent source for information about PID control is [5]. Graduate programs in control include an introductory course in linear system theory that is very similar to the ones for signal processing. The book by Kailath [7] although old is very complete and thorough. Rugh [8] is more control oriented and newer. The definitive graduate textbook on nonlinear control is [6].
Signal Processing for Control
21
For more advanced and specialized topics in control, The Control Handbook [9] is an excellent source. It was specifically designed for the purpose of providing a starting point for further study. It also contains good introductory articles about undergraduate topics and a variety of applications. There is a very large literature on MPC. Besides the previously cited [11], there are books by Maciejowski [17] and Kwon and Han [18] as well as several others. There are also many survey papers. The one by Rawlings [19] is easily available and recommended. Another very useful and often cited survey is by Qin and Badgwell [20]. This is a particularly good source for information about companies that provide MPC controllers and other important practical issues, Lastly, the Handbook of Networked and Embedded Control Systems [10] provides introductions to most of the issues of importance in both networked and digitally controlled systems.
References 1. Dorf, R.C., Bishop, R. H.: Modern Control Systems, (11th edition), Pearson Prentice-Hall, Upper Saddle River, (2008) 2. Franklin, G.F., Powell, J.D., Emami-Naeini, A.: Feedback Control of Dynamic Systems (5th edition), Prentice-Hall, Upper Saddle River, (2005) 3. Franklin, G.F., Powell, J.D., Workman, M.: Digital Control of Dynamic Systems (3rd edition), Addison-Wesley, Menlo Park (1998) 4. Goodwin, G.C., Graebe, S.F., Salgado, M.E.: Control System Design, Prentice-Hall, Upper Saddle River, (2001) 5. Astrom, K.J., Hagglund, T.: PID Controllers: Theory, Design, and Tuning (2nd edtion), International Society for Measurement and Control, Seattle, (1995) 6. Khalil, H.K. Nonlinear Systems (3rd edition), Prentice-Hall, Upper Saddle River, (2001) 7. Kailath, T. Linear Systems, Prentice-Hall, Englewood Cliffs, (1980) 8. Rugh, W.J.: Linear System Theory (2nd edition), Prentice-Hall, Upper Saddle River, (1996) 9. Levine, W.S. (Editor): The Control Handbook, CRC Press, Boca Raton (1995) 10. Hristu-Varsakelis, D., Levine, W.S.: Handbook of Networked and Embedded Control Systems, Birkhauser, Boston, (2005) 11. Camacho, E.F., Bordons, C.: Model Predictive Control, 2nd Edition, Springer, London, (2004) 12. Looze, D. P., Freudenberg, J. S.: Tradeoffs and limitations in feedback systems. The Control Handbook, pp 537-550, CRC Press, Boca Raton(1995) 13. Bosch: Driving Safety Systems (2nd edition), SAE International, Warrenton, (1999) 14. Yang, J. S., Levine, W. S.: Specification of control systems. The Control Handbook, pp 158169, CRC Press, Boca Raton (1995) 15. Lublin, L., Athans, M.: Linear quadratic regulator control. The Control Handbook, pp 635650, CRC Press, Boca Raton (1995) 16. Scokaert, P. O. M, Mayne, D. Q., Rawlings, J. B.: Suboptimal model predictive control (feasibility implies stability). IEEE Transactions on Automatic Control, 44(3) pp 648-654 (1999) 17. Maciejowski, J. M.: Predictive control with constraints. Prentice Hall, Englewood Cliffs (2002) 18. Kwon, W. H.,Han, S.: Receding Horizon Control: Model Predictive Control for State Models. Springer, London (2005) 19. Rawlings, J. B.: Tutorial overview of model predictive control. IEEE Control Systems Magazine, 20(3) pp 38-52, (2000)
22
William S. Levine
20. Qin, S. J., Badgwell, T. A.: A survey of model predictive control technology. Control Engineering Practice,11, pp 733-764 (2003) 21. Hallum, C.: The magic of the drag tire. SAE Paper 942484, Presented at SAE MSEC (1994)
Digital Signal Processing in Home Entertainment Konstantinos Konstantinides
Abstract In the last decade or so, audio and video media switched from analog to digital and so did consumer electronics. In this chapter we explore how digital signal processing has affected the creation, distribution, and consumption of digital media in the home. By using “photos”, “music”, and “video” as the three core media of home entertainment, we explore how advances in digital signal processing, such as audio and video compression schemes, have affected the various steps in the digital photo, music, and video pipelines. The emphasis in this chapter is more on demonstrating how applications in the digital home drive and apply state-of-the art methods for the design and implementation of signal processing systems, rather than describing in detail any of the underlying algorithms or architectures, which we expect to be covered in more detail in other chapters. We also explore how media can be shared in the home, and provide a short review the principles of the DLNA stack. We conclude with a discussion on digital rights management (DRM) and a short overview of the Microsoft Windows DRM.
1 Introduction Only a few years ago, the most complex electronic devices in the home were probably the radio and the television, all built using traditional analog electronic components such as transistors, resistors, and capacitors. However, in the late 1990s we witnessed a major change in home entertainment. Media began to go digital and so did consumer electronics, like cameras, music players, and TVs. Today, a typical home may have more than a half-dozen electronic devices, built using both traditional electronic components and sophisticated microprocessors or digital signal processors. In this chapter will try to showcase how digital signal processing (DSP) has affected the creation, distribution, and consumption of digital media in e-mail: [email protected]
S.S. Bhattacharyya et al. (eds.), Handbook of Signal Processing Systems, DOI 10.1007/978-1-4419-6345-1_2, © Springer Science+Business Media, LLC 2010
23
24
Konstantinos Konstantinides
the home. We will use “photos”, “music”, and “video” as the three core media of home entertainment. The emphasis in this chapter is more on demonstrating how applications in the digital home drive and apply state-of-the art methods for the design and implementation of signal processing systems, rather than describing in detail any of the underlying algorithms or architectures. We expect these details to be exciting topics of future chapters in this book. For example, while we discuss the need for efficient image and video compression in the home, there is no plan to explain the details of either the JPEG or MPEG compression standards in this chapter.
1.1 The Personal Computer and the Digital Home There are many contributing factors for the transition from analog to digital home entertainment, including advances in semiconductors, DSP theory, and communications. For example, submicron technologies allowed more powerful and cheaper processors. New developments in coding and communications led the way to new standards for the efficient coding and transmission of photos, videos, and audio. Finally, an ever-growing world-wide Web offered new alternatives for the distribution and sharing of digital content. One may debate which device in the home is the center of home entertainment — the personal computer (PC), the TV, and the set-top box, are all competing for the title — however, nobody can dispute the effects of the PC in the world of digital entertainment. Figure 1 shows the functional evolution of the home PC. In the early days, a PC was used mostly for such tasks as word processing, home finances, Web access, and playing simple PC games. Today, PCs are also the main interfaces to digital cameras and portable music players, and are being used for passive entertainment, such as listening to Internet radio and watching Internet videos. However, the integration of media server capabilities allows now a PC to also become an entertainment hub. Windows Media Player 11 (WMP11), Tiversity, and other popular media server applications allow now users to stream pictures, music, and videos, from any PC to a new generation of devices called digital media receivers (DMRs). DMRs integrate wired and wireless networking and powerful processors to play HD content in real time. Example of DMRs include connected-TVs and gaming devices, such as the Xbox360 or the PlayStation 3. Many DMRs can also operate as Extenders for Windows Media Center, which allows users to access remotely any PC with Windows Media Center. This service allows users to interact with their PCs at both the four feet level (personal computing using a keyboard and a mouse) and the ten feet level (personal entertainment using a remote control). This PC evolution has changed completely the way consumer-electronics (CE) manufacturers design new devices for the home. Instead of designing isolated islands of entertainment, they now design devices that can be interconnected and share media over the home network or the Internet. In the next few sections we will examine in more details the processing pipelines for photos, music, and videos, the Digital Living Network Alliance (DLNA) interoperability standard for content shar-
Digital Signal Processing in Home Entertainment
Productivity
Passive Entertainment
25
Entertainment Hub
Word processing Home Finances Web Access Simple PC games
Pictures Music Video
Media Streaming Interactive Entertainment Extreme PC Gaming
Past
Present
Near Future
Fig. 1 The evolution of the home PC.
ing in the home, and digital rights management issues, and how all these together influence the design and implementation of future signal processing systems.
2 The Digital Photo Pipeline If you ask people what non-living item they would try to rescue from a burning home, the most common answer will be “our photos”. Photos touch people like very view other items in a home. They provide a visual archive of a family’s past and present, memories most of us would like to preserve for the future generations as well. Figure 2 shows the photo pipelines for both an analog (film-based) and a digital camera. In the past, taking photos was a relatively simple process. As shown in Figure 2a, you took photos using a traditional film camera, you gave the film for developing, and you received a set of either prints or slides. In most cases, you would store your best pictures in a photo album and the rest in a shoe box. Unless you were a professional photographer, you had very little influence on how your printed pictures would look. Digital photography changed all that. Figure 2b shows a typical digital photo processing pipeline, and as we are going to see, digital signal processing is now part of every stage.
2.1 Image Capture and Compression Using a digital camera, digital signal processing begins with image capture. A new generation of integrated camera processors and DSP algorithms make sure that cap-
26
Konstantinos Konstantinides
Capture
Develop
Print
a) Film camera
Print Capture
Post-processing Store Organize
Digital Display b) Digital camera Fig. 2 Photo processing pipeline.
tured raw sensor data are translated to color-corrected image data that can be stored either in raw pixels or as compressed images. Inside the camera, the following image processing algorithms are usually involved:
2.2 Focus and exposure control Based on the content of the scene, camera controls determine the aperture size, the shutter speed, the automatic gain control, and the focal position of the lens.
2.3 Pre-processing of CCD data In this step, the camera processor removes noise and other artifacts to produce an accurate representation of the captured scene.
Digital Signal Processing in Home Entertainment
27
2.4 White Balance The goal of this step is to remove unwanted color casts from the image and colors that are perceived as “white” by the human visual system are rendered as white colors in the photo as well.
2.5 Demosaicing Most digital cameras have a single sensor which is overlaid with a color filter array (CFA). For example, a CFA with a Bayer pattern alternates red and green filters for odd rows with green and blue filters for odd rows. During demosaicing the camera uses interpolation algorithms to reconstruct a full-color image from the incomplete color samples.
2.6 Color transformations Color transformations transform color data from the sensor’s color space to one that betters matches the colors perceived by the human visual system, such as CIE XYZ.
2.7 Post Processing Demosaicing and color transformations may introduce color artifacts that are usually removed during this phase. In addition, a camera may also perform edge enhancement and other post-processing algorithms that improve picture quality.
2.8 Preview display The camera scales the captured image and renders it in a manner suitable to be previewed on the camera’s preview display.
2.9 Compression and storage The image is finally compressed and stored in the camera’s storage medium, such as flash memory. Compression is defined as the process of reducing the size of the digital representation of a signal and is an inherent operation in every digital camera.
28
Konstantinos Konstantinides
Consider a typical 8 Mega pixel image. Assuming 24-bits per pixel, un-compressed, such an image will take about 24 Mbytes. In contrast, a compressed version of the same picture can take less than 1.2 Mbytes in space, with no loss in perceptual quality. This difference means that if your camera has 500 MB of storage you can either store about 20 uncompressed pictures or about 400 compressed ones. In 1991, JPEG was introduced as an international standard for the coding of digital images, and remains among the most popular image-compression schemes.
2.10 Photo Post-Processing The ability to have access to the raw bits of a picture allowed users for the first time to manipulate and post-process their pictures. Even the simplest of photo processing software allows users to perform a number of image processing operations, such as: • • • • •
Crop, rotate, and resize images Automatically fix brightness and contrast Remove red-eye Adjust color and light Do special effects, or perform other sophisticated image processing algorithms
While early versions of image processing software depended heavily on user input, for example, a user had to select the eye area before doing red-eye removal, newer tools may use automatic face recognition algorithms and require minimal user interaction.
2.11 Storage and Search Digital pictures can be stored in a variety of media, including CD-ROMs, DVDs, PC hard drives, or network-attached storage (NAS) devices, both at home or on remote backup servers. These digital libraries not only provide a better alternative to shoe boxes, but also allow users to use new algorithms for the easy storage, organization, and retrieval of their photos. For example, new face recognition algorithms allow users to retrieve all photos that include “John” with a click of a button, instead of pouring through all the family albums. Similarly, photos can be automatically organized according to a variety of metadata, such as date, time, or even (GPSbased) location.
Digital Signal Processing in Home Entertainment
29
2.12 Digital Display Low-cost digital printers allow now for at home print-on-demand services, but photo display is not constrained any more by the covers of a photo album or the borders of the traditional photo frame. PCs, digital frames, and digital media receivers, like the Apple TV or the xbox360, allow users to display their photos into a variety of devices, including wall-mounted LCD monitors, digital frames, or even HDTVs. Display algorithms allow users to create custom slide shows with sophisticated transition effects that include rotations, zooming, blending and panning. For example, many display devices support now “Ken Burns effects”, a combination of display algorithms that combine slow zooming with panning, fading, and smooth transitions, similar to effects Ken Burns is using in his documentaries. All in all, the model digital photo pipeline integrates a variety of sophisticated digital image processing algorithms, from the time you take a picture all the way to the time you may display them on your HDTV.
3 The Digital Music Pipeline Music is the oldest form of home entertainment but in the last few years, thanks to digital signal processing, we have seen dramatic changes on how we acquire and consume music. In 1980, Philips and Sony introduced the first commercial audio CD and opened a new world of digital audio recordings and entertainment. Audio CDs offered clear sound, random access to audio tracks, and more than 60 minutes of continuous audio playback, so they quickly dominated the music market. But that was only the beginning. By mid 1990’s audio compression and the popular MP3 format allowed for a new generation of players and music distribution services. Figure 3 shows a simplified view of the digital music processing pipeline from a master recording to a player. As in the digital photo pipeline, digital signal processing is part of every stage.
3.1 Compression In most cases, music post-production processing begins with compression. Hardcore audiophiles will usually select a lossless audio compression scheme, which allows for the exact reconstruction of the original data. Examples of lossless audio codecs include WMA lossless, Meridian lossless (used in the DVD-Audio format), or FLAC (Free Lossless Audio Codec), to name just a few. However, in most cases, audio is being compressed using a lossy audio codec, such as MPEG-audio Layer III, also known as MP3, or the Advanced Audio Codec (AAC). Compared to the 1.4 Mbits/s bit rate from a stereo CD, compression allows for CD-quality music at about
30
Konstantinos Konstantinides
Compression
Digital audio broadcasting
Satellite or Digital Radios
Internet Downloads
Internet Audio Streaming
Portable Players
Home Digital Libraries or Servers
Digital Media Receivers
Fig. 3 Music processing pipeline.
128 kbits/s per audio channel. Music codecs achieve these levels of compression by taking into consideration both the characteristics of the audio signal and the human auditory perception system.
3.2 Broadcast Radio While AM/FM radios are still part of almost every household, they are complemented by other sources of broadcasted music to the home, including Satellite radio, HD Radio, and streamed Internet radio. Digital Audio Broadcasting (DAB), a European standard developed under the Eureka program, is being used in several countries in Europe, Canada, and Asia, and can use both MPEG-audio Layer II and AAC audio codecs. In North America, Sirius Satellite and XM Satellite Radio started offering subscription-based radio services in 2002.
Digital Signal Processing in Home Entertainment
31
In the United States, in 2002 the Federal Communications Commission (FCC) selected the “HD Radio” from iBiquity Digital Corporation as the method for digital terrestrial broadcasting for off-the-air AM or FM programming. Unlike DAB, HD Radio does not require any additional bandwidth or new channels. It uses Inband On-Channel (IBOC) technology which allows the simultaneous transmission of analog and digital broadcasts using the same bandwidth as existing AM/FM stations. HD Radio is a trademark of iBiquity Digital Corporation which develops all HD Radio transmitters and licenses HD Radio receiver technology. HD Radio is officially known as the NRSC-5 specification. Unlike DAB, NRSC-5 does not specify the audio codec. Today HD Radio uses a proprietary audio codec developed by iBiquity. In 1995, Real Networks introduced the RealAudio player and we entered the world of Internet-based radio. Today, Internet audio streaming is an integral part of any media player and allows users to access thousands of radio stations around the world.
3.3 Audio Processing Access to digital music allowed not only a new generation of music services, like the iTunes store from Apple Inc., but also created new opportunities for new tools for the storage, processing, organization, and distribution of music inside the home. Digital media servers allow us to store and organize our whole music collections like never before. Metadata in the audio files allows us to organize and view music files by genre, artist, album, and multiple other categories. In addition, new audio tools allow us to edit, mix, and create new audio tracks with the simplicity of “cut” and “paste.” But what if a track has no metadata? What if you just listened to a song and you would like to know more about it? New audio processing algorithms can use “digital signatures” in each song to automatically recognize a song while it is being played (see for example the Shazam service). Similar tools can automatically extract tempo and other music characteristics from a song or a selection of songs and automatically create a playlists with similar songs. For example, the “Genius” tool in Apple’s iTunes can automatically generate a playlist from a user’s library based on a single song selection.
4 The Digital Video Pipeline We have already discussed the importance of efficient coding in both the digital photo and music pipelines. Therefore, it will not come as a surprise that compression plays a key role in digital video entertainment as well. As early as 1988, the International Standards Organization (ISO) established the Moving Pictures Expert group (MPEG) with the mission to develop standards for the coded representation
32
Konstantinos Konstantinides
of moving pictures and associated audio. Since then, MPEG delivered a variety of video coding standards, including MPEG-1, MPEG-2, MPEG-4 (part 2), and more recently MPEG-4, part 10, or H.264. These standards opened the way to a variety of new digital media, including the VCD, the DVD, and more recently HDTV, HD-DVD and Blu-ray. However, compression is only a small part of the video processing pipeline in home entertainment. From Figure 4, the home can receive digital video content from a variety of sources, including digital video broadcasting, portable digital storage media, and portable video capture devices.
Internet Services
Capture
Post-processing
Store Organize
Off ff-the-Air Satellite Cable IPTV
Playback & Display
Digital Media: DVD Blu-ray Fig. 4 Digital video processing pipeline.
Digital Signal Processing in Home Entertainment
33
4.1 Digital Video Broadcasting As of June of 2009, in the United States, all television stations are required to broadcast only in digital format. In the next few years, similar transitions from analog to digital TV broadcasting will continue in countries all over the world. In addition to these changes, new Internet Protocol Television (IPTV) providers continue to challenge and get more market share from the traditional cable and satellite providers. Digital video broadcasting led to new developments in coding and communications and allowed a more efficient use of the communication spectrum. In addition, it influenced a new generation of set-top boxes and HDTVs with advanced video processing, personal video recording capabilities, and Internet connectivity.
4.2 New Portable Digital Media It is not a surprise that the DVD player is considered by many the most successful product in the history of consumer electronics. By providing excellent video quality, random access to video recordings, and multi-channel audio, it allowed consumers to experience movies in the home like never before. DVD movies combine MPEG2 video with Dolby Digital audio compression; however, a DVD is more than a collection of MPEG-coded files. Digital video and audio algorithms are being used in the whole DVD-creation pipeline, inside digital video cameras, in the editing room for the creation of special effects and post processing, and finally in the DVD authoring house, where bit rate-control algorithms apply rate-distortion principles to preserve the best video quality while using the minimum amount of bits. As we transition from standard definition to high definition, these challenges become even greater and the development of new digital signal processing algorithms and architectures is more important than ever. For example, the Blu-Ray specification not only allows for new coding schemes for high-definition video, but also provides advanced interactivity through Java-based programming and “BD Live.”
4.3 Digital Video Recording In 1999, Replay TV and Tivo launched the first digital video recorders or DVRs and changed forever the way people watch TV. DVRs offered far more than the traditional “time shifting” feature of a VCR. Hard disk drive-based recordings allowed numerous other features, including random access to recordings, instant replay, and commercial skip. More recently DVRs became complete entertainment centers with access to the Internet for remote access and Internet-based video streaming. In addition, USB-based ports allow users to connect personal media for instant access to their personal photos, music, and videos. Like modern Blu-ray players, DVRs inte-
34
Konstantinos Konstantinides
grate advanced digital media processors that allow for enhanced video processing and control. Figure 5 shows the simplified block diagram of a typical DVR. At the core of the DVR is a digital media processor that can perform multiple operations, including: • Interfaces with all the DVR peripherals, such as the IR controller and the hard disk drive • For digital video, it extracts a Program Stream from the input Transport Stream and either stores it in the HDD or decodes it and passes it to the output • For analog video, it passes it through to the output or compresses it and stores it in the HDD • Performs all DVR controls, communicates with the service provider to update the TV Guide, maintains the recording schedules, and performs all playback trick modes (such as, pause, fast forward, reverse play, and skip) • Performs multiple digital video and image-related operations, such as scaling and deinterlacing
IR Receiver
Front Panel Control
TV Tuners
Digital Video
Digital Media Processor
USB
Analog Video
Ethernet
Audio Codec
Security
SDRAM
HDD
Fig. 5 Block diagram of a typical DVR.
In addition to the media processor, a DVR includes at least one TV tuner, Infrared (IR) and/or RF interfaces for the remote control, a front panel control, securityrelated controls, such as Smart Card or CableCard interfaces, and audio and video output interfaces. DVRs may also include additional interfaces, including USB and Ethernet ports, so they can play media from external devices or for adding external peripherals, such as an external hard disk drive.
Digital Signal Processing in Home Entertainment
35
4.4 Digital Video Processing Comparing Figure 2 and Figure 4, we see that the digital video processing pipeline for personal camcorders is not that much different from the digital camera pipeline. After video capture using a CCD, video processing algorithms take the raw censor data, color correct them, and translate them in real-time to a compressed digital video bit stream. Alternatively, one can use any PC-based video capture card to capture and translate any analog video signal to digital. Regardless of the input, PC-based video editing tools allow us to apply a variety of post-processing imaging algorithms to the videos and create custom videos in a variety of formats. User videos can be stored either on DVDs or NAS-based media libraries. In addition, many users upload their videos into social networking sites, where videos are automatically transcoded to formats suitable for video streaming over the Internet, such as Adobe’s Flash video or H.264. Modern video servers allow for the seamless integration of video with Internet content, allowing users to search within video clips using either metadata or by automatically extracting data from the associated audio and video data.
4.5 Digital Video Playback and Display Every video eventually needs to be displayed on a screen, and modern Highdefinition monitors or TVs were never more sophisticated than now. Starting from their input ports, they need to interface with a variety of analog and digital ports; from the traditional RF input to DVI or HDMI. Sophisticated deinterlacing, motion compensation, and scaling algorithms are able to take any input, at any resolution, and convert it to the TV’s native resolution. For example, more recently, 100/120 Hz HDTVs need to apply advanced motion estimation and interpolation techniques to interpolate traditional 50/60 Hz video sequences to their higher frame rate. LEDbased LCD displays use also advanced algorithms to adjust the LED backlight according to the input video for the best possible dynamic contrast. In summary, digital signal processing is involved in every aspect of the capture, post-processing, and playback of photos, music, and videos, but that’s only the beginning. In the next section we will examine how our digital media can be shared anywhere in our home, using existing coding and communication technologies.
5 Sharing Digital Media in the Home In the previous sections we discussed how digital signal processing is associated with almost every aspect of the digital entertainment in the home. In most homes we see multiple islands of entertainment. For example, the living room may be one such island with a TV, a set-top box, a DVD player, and a gaming platform. The
36
Konstantinos Konstantinides
office may be another island with a PC and a NAS. In other rooms we may find a laptop or a second PC or even additional TVs. In addition, each family may have a variety of mobiles or smart phones, a camera, and a camcorder. Each of these islands is able to create, store, or playback content. Ideally, consumers would like to share media from any of these devices with any other device. For example, say you download a movie from the Internet on a PC but you really want to view it on the big TV in the living room, wouldn’t be nice to be able to stream that movie from the PC in the office to your TV via your home network? You definitely can, and the role of the Digital Living Network Alliance (DLNA) is to make this as easy as possible.
5.1 DLNA DLNA is a cross-industry association that was established in 2003 and includes now more than 250 companies from consumer electronics, the computer industry, and mobile device companies. The vision of DLNA is to create a wired or wireless interoperable network where digital content, such as photos, music, and videos can be shared by all home-entertainment devices. Figure 6 shows the DLNA interoperability stack for audio-visual devices. From Figure 6, DLNA uses established standards, such as IEEE 802.11 for wireless networking, HTTP for media transport, universal plug and play (UPnP) for device discovery and control, and standard media formats, such as JPEG and MPEG for digital media. DLNA defines more than 10 device classes, but for the purposes of this discussion we will only cover the two key classes: Digital media server and digital media player.
5.2 Digital Media Server A digital media server is any device that can acquire, record, store, and stream digital media to digital media players. Examples include PCs, advanced set-top boxes, music or video servers, and NAS devices.
5.3 Digital Media Player Digital media players (also known as digital media receivers) find content on digital media servers and provide playback capabilities. Figure 7 shows a typical interaction between a DLNA client and a DLNA server. The devices use HTTP for media transport and may also include a local user interface for setup and control. Examples of digital media players include connected TVs and game consoles. Today, digital media players combine wired and wireless connectivity and a powerful digi-
Digital Signal Processing in Home Entertainment
Media Formats Device Discovery, Control and Media Management
37
JPEG, LPCM, MPEG-2, MP3, MPEG-4, AAC LC, AVC/H.264 + optional formats UPnP AV 1.0 UPnP Device Architecture 1.0
Media Transport
HTTP (mandatory) RTP (optional)
Network Stack
IPv4 Protocol Suite
Network Connectivity
Wired: 803.3i, 802.3u Wireless: 802.11a/b/g Bluetooth
Fig. 6 DLNA Audio-Video Interoperability stack.
tal signal processor. However, despite the efforts of the standardization committees to establish standards for the coding of pictures, audio, and video, this did not stop the proliferation of other coding formats. For example, just for video, the coding alphabet soup for audio/video codecs includes DivX, xvid, Real Video, Windows Media Video (WMV), Adobe Flash video, and Matroska video, to name just a few. This proliferation of coding formats is a major challenge for consumer electronics manufacturers who strive to provide their customers a seamless experience. However, while it is rather easy to upload a new codec on a PC, this may not be the case for a set-top box, a DMR, or a mobile client. While manufacturers work on the next generation of multi-format players, it is not uncommon to see now PC-based media transcoders. These transcoders interact with media servers and allow for on-the-fly transcoding of any media format to the most commonly used formats, like MPEG-2 or WMV.
6 Copy Protection and DRM The digital distribution of media content opened new opportunities to consumers but also created a digital rights management nightmare for content creators and copyright holders. The Serial Copy Management System (SCMS) was one of the first schemes that attempted to control the number of second-generation digital copies
38
Konstantinos Konstantinides
Local User Interface MPEG, JPEG HTTP
Content via HTTP
UPnP-AV UPnP Device
UPnP Messages
Networking (IP/WiFi) DLNA Client
DLNA Server
Fig. 7 Typical DLNA Client-Server interaction.
one could make from a master copy. For example, while one can make a copy from an original CD, a special SCMS flag in the copy may indicate that one can’t make a copy of a copy. SCMS does not stopping you making unlimited copies from the original, but you can’t make a copy of the copy. SCMS was first introduced in digital audio tapes, but it is also available in audio CDs and other digital media. CSS (content scrambling system) is another copy protection scheme, used by almost every commercially available DVD-Video disc and controls the digital copying and playback of DVDs. DRM or digital rights management refers to technologies that control how digital media can be consumed. While, Fairplay from Apple Inc. and Windows DRM from Microsoft are the most commonly used DRM schemes, the development of alternative copy protection and DRM schemes is always an active area of research and development in digital signal processing.
6.1 Microsoft Windows DRM Microsoft provides two popular DRM kits; one for portable devices and one for networked devices. Figure 8 shows that under Windows Media DRM portable devices can acquire licenses either directly from a license server through an Internet connection, or indirectly, from a local media server (for example a user’s computer). Figure 9 shows license acquisition scenarios when two devices support the Windows Media DRM for Network devices. In this scenario, in addition to the basic DLNA stack, the digital media receiver integrates the Windows Media DRM layer
Digital Signal Processing in Home Entertainment
39
License Server
Direct License Acquisition
Internet
Direct License Acquisition
Indirect License Acquisition Fig. 8 License acquisition for portable devices.
(WMDRM) and has also the capability to play Windows Media Video (WMV) content. Using Windows Media DRM for network devices, the two devices exchange keys and data as follows: 1. The first time a device is used it must be registered and authorized by the server through UPnP. A special certificate identifies the device and contains information used to ensure secure communication. 2. During the initial registration, the server performs a proximity detection to verify that it is close enough to be considered inside the home. 3. Periodically, the server repeats the proximity detection to revalidate the device. 4. The device requests content for playback from the server. 5. If the server determines that the device is validated and has the right to play the content, it sends a response containing a new, encrypted session key, a rights policy statement specifying the security restrictions that the device must enforce, and finally the content. The content is encrypted by the session key. Each time content is requested, a new session is established. 6. The network device must parse the rights policy and determine if it can adhere to the required rights. If it can, it may render the content. As digital content is shared among multiple devices, most made by different manufacturers, an open DRM standard or DRM-free content seem to be the only ways to achieve inter-operability.
40
Konstantinos Konstantinides
Internet
Direct License Acquisition
Local User Interface WMV
MPEG, JPEG
WMDRM
Protected Content
HTTP UPnP-AV UPnP Device
UPnP Messages
Networking (IP/WiFi) DLNA Client with Windows DRM
DLNA Server Indirect License Acquisition
Fig. 9 License acquisition and data transfer for networked devices.
7 Summary and Conclusions The home entertainment ecosystem is transitioning from isolated analog devices into a network of interconnected digital devices. Driven by new developments in semiconductors, coding and digital communications, these devices can share photos, music, and videos using a home network, enhance the user experience, and continue to drive the development of new digital signal processing algorithms and architectures.
Digital Signal Processing in Home Entertainment
41
8 To Probe Further A good overview of the digital camera processing pipeline is given by [5]. An overview of image and video compression standards, covering both theory and architectures, including the JPEG [12], MPEG-1 [13], MPEG-2 [11], AAC [10], and MPEG-4 [7] standards, can be found in [9]. The H.264 standard, also known as MPEG-4, part 10, is defined in [2]. Chapter 5 and Chapter 3 of this Handbook provide additional details on video coding. The DAB specification can be found in [8]. In [3], Maxson provides a very thorough description of the NRSC-5 specification [1] for HD Radio transmission. For more information about DLNA, DLNA’s web site, www.dlna.org, provides excellent information and tutorials. A nice overview of DLNA can also be found in a White paper on UPnP and DLNA by [4]. Most of the material on Windows DRM is from an excellent description of Microsoft’s DRM by [6]. More details about Microsoft’s Extenders for Windows Media Center can be found in Microsoft’s Windows site [14].
References 1. NRSC-5-B in-band/on-channel digital radio broadcasting standard, CEA and NAB. Technical report, NRSC, April 2008. 2. H.264 series H: Audiovisual and multimedia systems, infrastracture of audiovisual services — coding of moving video, advanced video coding for generic audiovisual services. Technical report, ITU, 2007. 3. D. P. Maxson. The IBOC Handbook: understanding HD Radio technology. Focal Press, 2007. 4. Networked digital media standards, a UPnP/DLNA overview. Technical report, Allegro Software Development Corporation, October 2006. White paper. 5. R. Ramanath et al. Color image processing pipeline. IEEE Signal Processing Magazine, 22(1):34–43, January 2005. 6. J. Cohen. A general overview of Windows Media DRM 10 device technologies. Technical report, Microsoft Corporation, September 2004. 7. 14496-2, information technology - coding of audio-visual objects — part 2: Visual. Technical report, ISO/IEC, 2001. 8. ETS 300 401, radio broadcasting systems; digital audio broadcasting (DAB) to mobile, portable, and fixed receivers. Technical report, ETSI, 1997. 9. V. Bhaskaran and K. Konstantinides. Image and Video Compression Standards: Algorithms and Architectures. Kluwer Academic Publishers, second edition, 1997. 10. JTC1/SC29/WG11 N1430 MPEG-2 DIS 13818-7 (MPEG-2 advanced audio coding, AAC). Technical report, ISO/IEC, November 1996. 11. JTC1 CD 13818, generic coding of moving pictures and associated audio. Technical report, ISO/IEC, 1994. 12. JTC1 CD 10918 digital compression and coding of continuous-tone still images — part 1: Requirements and guidelines. Technical report, ISO/IEC, 1993. 13. JTC1 CD 11172, coding of moving pictures and associated audio for digital storage media up to 1.5 Mbits/s. Technical report, ISO/IEC, 1992. 14. Microsoft. Extenders for windows media center. http://www.microsoft.com/ windows/products/winfamily/mediacenter/features/extender.mspx, Cited: 4/6/2009.
MPEG Reconfigurable Video Coding Marco Mattavelli, Jörn W. Janneck and Mickaël Raulet
Abstract The current monolithic and lengthy scheme behind the standardization and the design of new video coding standards is becoming inappropriate to satisfy the dynamism and changing needs of the video coding community. Such a scheme and specification formalism do not enable designers to exploit the clear commonalities between the different codecs, neither at the level of the specification nor at the level of the implementation. Such a problem is one of the main reasons for the typical long time interval elapsing between the time a new idea is validated until it is implemented in consumer products as part of a worldwide standard. The analysis of this problem originated a new standard initiative within the ISO/IEC MPEG committee, called Reconfigurable Video Coding (RVC). The main idea is to develop a video coding standard that overcomes many shortcomings of the current standardization and specification process by updating and progressively incrementing a modular library of components. As the name implies, flexibility and reconfigurability are new attractive features of the RVC standard. The RVC framework is based on the usage of a new actor/dataflow oriented language called C AL for the specification of the standard library and the instantiation of the RVC decoder model. C AL dataflow models expose the intrinsic concurrency of the algorithms by employing the notions of actor programming and dataflow. This chapter gives an overview of the concepts and technologies building the standard RVC framework and the non standard tools supporting the RVC model from the instantiation and simulation of the C AL model to the software and/or hardware code synthesis. Marco Mattavelli Microelectronic Systems Lab, EPFL, CH-1015 Lausanne, Switzerland e-mail: [email protected] Jörn W. Janneck University of California at Berkeley, Berkeley, CA 94720, U.S.A. e-mail: [email protected] Mickaël Raulet IETR/INSA Rennes, F-35043, Rennes, France e-mail: [email protected]
S.S. Bhattacharyya et al. (eds.), Handbook of Signal Processing Systems, DOI 10.1007/978-1-4419-6345-1_3, © Springer Science+Business Media, LLC 2010
43
44
Marco Mattavelli, Jörn W. Janneck and Mickaël Raulet
1 Introduction A large number of successful MPEG (Motion Picture Expert Group) video coding standards has been developed since the first MPEG-1 standard in 1988. The standardization efforts in the field, besides having as first objective to guarantee the interoperability of compression systems, have also aimed at providing appropriate forms of specifications for wide and easy deployment. While video standards are becoming increasingly complex, and they take ever longer to be produced, this makes it difficult for standards bodies to produce timely specifications that address the need to the market at any given point in time. The structure of past standards has been one of a monolithic specification together with a fixed set of profiles that subset the functionality and capabilities of the complete standard. Similar comments apply to the reference code, which in more recent standards has become normative itself. Video devices are typically supporting a single profile of a specific standard, or a small set of profiles. They have therefore only very limited adaptivity to the video content, or to environmental factors (bandwidth availability, quality requirements). Within the ISO/IEC MPEG committee, Reconfigurable Video Coding (RVC) [12] [4] standard is intended to address the two following issues: make standards faster to produce, and permit video devices based on those standards to exhibit more flexibility with respect to the coding technology used for the video content. The key idea is to standardize a library of video coding components, instead of an entire video decoder. The standard can then evolve flexibly by incrementally extending that library, and video devices can configure themselves to support a variety of coding algorithms by composing encoders and decoders from that library of predefined coding modules. This chapter gives an overview of the concepts and technologies building the standard RVC framework and can complement and be complemented by Chapter 14, Chapter 2, and Chapter 5.
2 Requirements and rationale of the MPEG RVC framework Started in 2004, the MPEG Reconfigurable Video Coding (RVC) framework [4] is a new ISO standard currently (Fig. 1) under its final stage of standardization, aiming at providing video codec specifications at the level of library components instead of monolithic algorithms. RVC solves this problem by defining two standards: a language with which a video decoder can be described (ISO/IEC23001-4 or MPEG-B pt. 4 [10]) and a library of video coding tools employed in MPEG standards (ISO/IEC23002-4 or MPEG-C pt. 4 [11]). The new concept is to be able to specify a decoder of an existing standard or a completely new configuration that may better satisfy application-specific constraints by selecting standard components from a library of standard coding algorithms. The possibility of dynamic configuration and reconfiguration of codecs also requires new methodologies and new tools for describing the new bitstream syntaxes and the parsers of such new codecs.
MPEG Reconfigurable Video Coding
45
The essential concepts of the RVC framework (Fig. 2) are the following: • RVC-C AL [6], a dataflow language describing the Functional Unit (FU) behavior. This language defines the behavior of dataflow components called actors, which is a modular component that encapsulates its own state such that an actor can neither read nor modify the state of any other actor. The only interaction between actors is via messages (known in C AL as tokens) which flow from an output of one actor to an input of another. The behavior of an actor is defined in terms of a set of atomic actions. The execution inside an actor is purely sequential: at any point in time, only one action can be active inside an actor. An action can consume (read) tokens, modify the internal state of the actor, produce tokens, and interact with the underlying platform on which the actor is running. • FNL (Functional unit Network Language), a language describing the video codec configurations. FNL is an XML dialect that lists the FUs composing the codec, the parameterization of these FUs and the connections between the FUs. FNL
first hardware Code generator
2002
MPEG-4 SP decoder... ... in hardware
... in software
RVC-CAL specification MPEG-4 AVC decoder
2004
2003
CAL inception CLR 1.0
2005
OpenDF (SourceForge)
interpreter (Ptolemy)
2007
2006
2009
2008
first software Code generator Eclipse plugin
Orcc (SourceForge) Graphiti (SourceForge)
ISO/IEC FDIS 23001-4 ISO/IEC FDIS 23002-4
MPEG RVC
Fig. 1 C AL and RVC standard timeline.
MPEG-C
Decoder Description
FU Network Description (FNL) Bitstream Syntax Description (RVC-BSDL)
Model Instantiation: Selection of FUs and Parameter Assignment
MPEG Tool Library (RVC-CAL FUs)
Abstract Decoder Model (FNL + RVC-CAL) MPEG-B
Encoded Video Data
Decoder Implementation
MPEG Tool Library Implementation
Decoding Solution
Decoded Video Data
RVC Decoder Implementation
Fig. 2 RVC standard
2010
46
•
•
• • •
Marco Mattavelli, Jörn W. Janneck and Mickaël Raulet
allows hierarchical constructions: an FU can be defined as a composition of other FUs and described by another FND (FU Network Description). BSDL (Bitstream Syntax Description Language), a language describing the structure of the input bitstream. BSDL is a XML dialect that lists the sequence of the syntax elements with possible conditioning on the presence of the elements, according to the value of previously decoded elements. BSDL is further explained in section 3.4. A library of video coding tools, also called Functional Units (FU) covering all MPEG standards (the “MPEG Toolbox”). This library is specified and provided using RVC-C AL (a subset of the original C AL language) as specification language for each FU. An “Abstract Decoder Model” (ADM) constituting a codec configuration (described using FNL) instantiating FUs of the MPEG Toolbox. Figure 2 depicts the process of instantiating an “Abstract Decoder Model” in RVC. Tools simulating and validating the behavior of the ADM (Open DataFlow environment [1]). Tools automatically generating software and hardware descriptions of the ADM.
2.1 Limits of previous monolithic specifications MPEG has produced several video coding standards such as MPEG-1, MPEG-2, MPEG-4 Video, AVC (Advanced Video Coding) and recently SVC (Scalable Video Coding). While at the beginning MPEG-1 and MPEG-2 were only specified by textual descriptions, with the increasing complexity of algorithms, starting with the MPEG-4 set of standards, C or C++ specifications, called also reference software, have became the formal specification of the standards. However, the past monolithic specification of such standards (usually in the form of C/C++ programs) lacks flexibility and does not allow to use the combination of coding algorithms from different standards enabling to achieve specific design or performance trade-offs and thus fill, case by case, the requirements of specific applications. Indeed, not all coding tools defined in a profile@level of a specific standard are required in all application scenarios. For a given application, codecs are either not exploited at their full potential or require unnecessarily complex implementations. However, a decoder conformant to a standard has to support all of them and may results in non-efficient implementations. Moreover, such descriptions composed of non-optimized non-modular software packages have started to show many limits. If we consider that they are in practice the starting point of any implementation, system designers have to rewrite these software packages not only to try to optimize performances, but also to transform these descriptions into appropriate forms adapted to the current system design methodologies. Such monolithic specifications hide the inherent parallelism and the dataflow structure of the video coding algorithms, features that are necessary to be exploited for efficient implementations. In the meanwhile the evolution of video coding tech-
MPEG Reconfigurable Video Coding
47
nologies, leads to solutions that are increasingly complex to be designed and present significant overlap between successive versions of the standards. Why C etc. Fail? The control over low-level details, which is considered a merit of C language, typically tends to over-specify programs. Not only the algorithms themselves are specified, but also how inherently parallel computations are sequenced, how and when inputs and outputs are passed between the algorithms and, at a higher level, how computations are mapped to threads, processors and application specific hardware. In general, it is not possible to recover the original knowledge about the intrinsic properties of the algorithms by means of analysis of the software program and the opportunities for restructuring transformations on imperative sequential code are very limited compared to the parallelization potential available on multi-core platforms [3]. These are the main reasons for which C has been replaced by C AL in RVC.
2.2 Reconfigurable Video Coding specification requirements Scalable parallelism. In parallel programming, the number of things that are happening at the same time can scale in two ways: It can increase with the size of the problem or with the size of the program. Scaling a regular algorithm over larger amounts of data is a relatively well-understood problem, while building programs such that their parts execute concurrently without much interference is one of the key problems in scaling the von Neumann model. The explicit concurrency of the actor model provides a straightforward parallel composition mechanism that tends to lead to more parallelism as applications grow in size, and scheduling techniques permit scaling concurrent descriptions onto platforms with varying degrees of parallelism. Modularity and reuse. The ability to create new abstractions by building reusable entities is a key element in every programming language. For instance, objectoriented programming has made huge contributions to the construction of von Neumann programs, and the strong encapsulation of actors along with their hierarchical composability offers an analog for parallel programs. Concurrency. In contrast to procedural programming languages, where control flow is made explicit, the actor model emphasizes explicit specification of concurrency. Rallying around the pivotal and unifying von Neumann abstraction has resulted in a long and very successful collaboration between processor architects, compiler writers, and programmers. Yet, for many highly concurrent programs, portability has remained an elusive goal, often due to their sensitivity to timing. The untimedness and asynchrony of stream-based programming offers a solution to this problem. The portability of stream-based programs is underlined by the fact that programs of considerable complexity and size can be compiled to competitive hardware [14] as well as software [26], which suggests that stream-based programming might even be a solution to the old problem of flexibly co-synthesizing different mixes of hardware/software implementations from a single source.
48
Marco Mattavelli, Jörn W. Janneck and Mickaël Raulet
Encapsulation. The success of a stream programming model will in part depend on its ability to configure dynamically and to virtualize, i.e. to map to collections of computing resources too small for the entire program at once. Moving parts of a program on and off a resource requires encapsulation, i.e. a clear distinction between those pieces that belong to the parts to be moved and those that do not. The transactional execution of actors generates points of quiescence, the moments between transactions, when the actor is in a defined and known state that can be safely transferred across computing resources.
3 Description of the standard or normative components of the framework The fundamental element of the RVC framework, in the normative part, is the Decoder Description (Fig. 2) that includes two types of data: The Bitstream Syntax Description (BSD), which describes the structure of the bitstream. The BSD is written in RVC-BSDL. It is used to generate the appropriate parser to decode the corresponding input encoded data [9] [25]. The FU Network Description (FND), which describes the connections between the coding tools (i.e. FUs). It also contains the values of the parameters used for the instantiation of the different FUs composing the decoder [5] [14] [26]. The FND is written in the so called FU Network Language (FNL). The syntax parser (built from the BSD), together with the network of FUs (built from the FND), form a C AL model called the Abstract Decoder Model (ADM), which is the normative behavioral model of the decoder.
3.1 The toolbox library An interesting feature of the RVC standard that distinguishes it from traditional decoders-rigidly-specified video coding standards is that, a description of the decoder can be associated to the encoded data in various ways according to each application scenario. Figure 3 illustrates this conceptual view of RVC [20]. All the three types of decoders are within the RVC framework and constructed using the MPEG-B standardized languages. Hence, they all conform to the MPEG-B standard. A Type-1 decoder is constructed using the FUs within the MPEG Video Tool Library (VTL) only. Hence, this type of decoder conforms to both the MPEG-B and MPEG-C standards. A Type-2 decoder is constructed using FUs from the MPEG VTL as well as one or more proprietary libraries (VTL 1-n). This type of decoder conforms to the MPEG-B standard only. Finally, a Type-3 decoder is constructed using one or more proprietary VTL (VTL 1-n), without using the MPEG VTL. This type of decoder also conforms to the MPEG-B standard only. An RVC decoder (i.e. conformant to MPEG-B) is composed of coding tools described in VTLs according
MPEG Reconfigurable Video Coding
49
to the decoder description. The MPEG VTL is described by MPEG-C. Traditional programming paradigms (monolithic code) are not appropriate for supporting such types of modular framework. A new dataflow-based programming model is thus specified and introduced by MPEG RVC as specification formalism.
MPEG VTL (MPEG-C)
Decoder Descripon Coded data
Video Tools Libraries {1..N}
Decoder type-1 Decoder type-2 or Decoder type-3 or
Decoded video
MPEG-B decoder
Fig. 3 The conceptual view of RVC.
The MPEG VTL is normatively specified using RVC-C AL. An appropriate level of granularity for the components of the standard library is important, to enable an effective possibility of reconfigurations, for codecs, and an efficient reuse of components in codecs implementations. If the library is composed of too coarse modules, such modules will be too large/coarse to allow their usage in different and interesting codec configurations, whereas, if the library component granularity level is too fine, the number of modules in the library will result to be too large for an efficient and practical reconfiguration process at the codec implementation side, and may obscure the desired high-level description and modeling features of the RVC codec specifications. Most of the efforts behind the standardization of the MPEG VTL were devoted to study the best granularity trade-off level of the VTL components. However, it must be noticed that the choice of the best trade-off in terms of high-level description and module re-usability, does not really affect the potential parallelism of the algorithm that can be exploited in multi-core and FPGA implementations.
3.2 The C AL Actor Language C AL [6] is a domain-specific language that provides useful abstractions for dataflow programming with actors. For more information on dataflow methods, the reader may refer to Part IV of this handbook, which contains several chapters that go into detail on various kinds of dataflow techniques for design and implementation of signal processing systems. C AL has been used in a wide variety of applications and has been compiled to hardware and software implementations, and work on
50
Marco Mattavelli, Jörn W. Janneck and Mickaël Raulet
mixed HW/SW implementations is under way. The next section provides a brief introduction to some key elements of the language. 3.2.1 Basic Constructs The basic structure of a C AL actor is shown in the Add actor (Fig. 4), which has two input ports A and B, and one output port Out, all of type T. T may be of type int, or uint for respectively integers and unsigned integers, of type bool for booleans, or of type float for floating-point integers. Moreover C AL designers may assign a number of bits to the specific integer type depending on the variable numeric size. The actor contains one action that consumes one token on each input ports, and produces one token on the output port. An action may fire if the availability of tokens on the input ports matches the port patterns, which in this example corresponds to one token on both ports A and B.
a c t o r Add ( ) T A, T B ⇒ T Out : a c t i o n [ a ] , [ b ] ⇒ [ sum ] do sum : = a + b ; end end
Fig. 4 Basic structure of a C AL actor.
An actor may have any number of actions. The untyped Select actor (Fig. 5) reads and forwards a token from either port A or B, depending on the evaluation of guard conditions. Note that each of the actions has empty bodies. a c t o r S e l e c t ( ) S , A, B ⇒ O u t p u t : a c t i o n S : [ s e l ] , A: [ v ] ⇒ [ v ] guard s e l end action S : [ s e l ] , B: [ v ] ⇒ [ v ] guard not s e l end end
Fig. 5 Guard structure in a C AL actor.
MPEG Reconfigurable Video Coding
51
3.2.2 Priorities and State Machines An action may be labeled and it is possible to constrain the legal firing sequence by expressions over labels. In the PingPongMerge actor, reported in the Figure 6, a finite state machine schedule is used to force the action sequence to alternate between the two actions A and B. The schedule statement introduces two states s1 and s2.
a c t o r Pi ngPongM erge ( ) I n p u t 1 , I n p u t 2 ⇒ O u t p u t : A: a c t i o n I n p u t 1 : [ x ] ⇒ [ x ] end B : a c t i o n I n p u t 2 : [ x ] ⇒ [ x ] end s c h e d u l e fsm s1 : s1 (A) −−> s2 ; s2 (B ) −−> s1 ; end end
Fig. 6 FSM structure in a C AL actor.
The Route actor, in the Figure 7, forwards the token on the input port A to one of the three output ports. Upon instantiation it takes two parameters, the functions P and Q, which are used as predicates in the guard conditions. The selection of which action to fire is in this example not only determined by the availability of tokens and the guards conditions, by also depends on the priority statement. a c t o r Rout e ( P , Q) A ⇒ X, Y, Z : toX : a c t i o n [ v ] guard P ( v ) toY : a c t i o n [ v ] guard Q( v ) t oZ : a c t i o n [ v ]
⇒ X: [ v ] end ⇒ Y: [ v ] end ⇒ Z : [ v ] end
priority toX > toY > t oZ ; end end
Fig. 7 Priority structure in a C AL actor.
52
Marco Mattavelli, Jörn W. Janneck and Mickaël Raulet
3.2.3 C AL subset language for RVC For an in-depth description of the language, the reader is referred to the language report [6], for the specific subset specified and standardized by ISO in the Annex C of [10]. This subset only deals with fully typed actors and some restrictions on the C AL language constructs from [6] to have efficient hardware and software code generations without changing the expressivity of the algorithm. For instance, Figures 5, 6 and 7 are not RVC-C AL compliant and must be changed as the Figures 8, 9 and 10 where T1, T2, T are the types and only typed paramters can be passed to the actors not functions as P, Q.
a c t o r S e l e c t ( ) T1 S , T2 A, T3 B ⇒ T3 O u t p u t : a c t i o n S : [ s e l ] , A: [ v ] ⇒ [ v ] guard s e l end action S : [ s e l ] , B: [ v ] ⇒ [ v ] guard not s e l end end
Fig. 8 Guard structure in a RVC-C AL actor.
a c t o r Pi ngPongM erge ( ) T I n p u t 1 , T I n p u t 2 ⇒ T O u t p u t : A: a c t i o n I n p u t 1 : [ x ] ⇒ [ x ] end B : a c t i o n I n p u t 2 : [ x ] ⇒ [ x ] end s c h e d u l e fsm s1 : s1 (A) −−> s2 ; s2 (B ) −−> s1 ; end end
Fig. 9 FSM structure in a RVC-C AL actor.
A large selection of example actors is available at the OpenDF repository [15], among them can also be found the MPEG-4 decoder discussed below. Many other actors written in RVC-C AL will be soon available at the standard MPEG RVC tool repository once the conformance testing process will be completed.
MPEG Reconfigurable Video Coding
53
a c t o r Rout e ( ) T A ⇒ T X , T Y, T Z : f u n t i o n P ( T v _ i n )−−> T : \ \ body o f t h e f u n c t i o n P P( v_in ) end f u n t i o n Q( T v _ i n )−−> T : \ \ body o f t h e f u n c t i o n P Q( v _ i n ) end toX : a c t i o n [ v ] guard P ( v ) toY : a c t i o n [ v ] guard Q( v ) t oZ : a c t i o n [ v ]
⇒ X: [ v ] end ⇒ Y: [ v ] end ⇒ Z : [ v ] end
priority toX > toY > t oZ ; end end
Fig. 10 Priority structure in a RVC-C AL actor.
3.3 FU Network language for the codec configurations A set of C AL actors are instantiated and connected to form a C AL application, i.e. a C AL network. Figure 11 shows a simple C AL network Sum, which consists of the previously defined RVC-C AL Add actor and the delay actor shown in Figure 12. 6XP 2XW
,Q
=Y
,Q
% $
$GG
2XW
2XW
Fig. 11 A simple C AL network.
The source/language that defined the network Sum is found in Figure 13. Please, note that the network itself has input and output ports and that the instantiated entities may be either actors or other networks, which allow for a hierarchical design. Formerly, networks have been traditionally described in a textual language, which can be automatically converted to FNL and vice versa — the XML dialect
54
Marco Mattavelli, Jörn W. Janneck and Mickaël Raulet
a c t o r Z ( v ) T I n ⇒ T Out : A: a c t i o n ⇒ [ v ] end B : a c t i o n [ x ] ⇒ [ x ] end s c h e d u l e fsm s0 : s0 (A) −−> s1 ; s1 (B ) −−> s1 ; end end
Fig. 12 RVC-C AL Delay actor. network Sum ( ) I n ⇒ Out : entities add = Add ( ) ; z = Z( v =0); structure I n −−> add . A; z . Out −−> add . B ; add . Out −−> z . I n ; add . Out −− > Out ; end
Fig. 13 Textual representation of the Sum network.
standardized by ISO in Annex B of [10]. XML (Extensible Markup Language) is a flexible way to create common information formats. XML is a formal recommendation from the World Wide Web Consortium (W3C). XML is not a programming language, it is rather a set of rules that allow you to represent data in a structured manner. Since the rules are standard, the XML documents can be automatically generated and processed. Its use can be gauged from its name itself: • Markup: Is a collection of Tags • XML Tags: Identify the content of data • Extensible: User-defined tags The XML representation of the Sum network is found in Figure 14. A graphical editing framework called Graphiti editor [8] is available to create, edit, save and display a network. The XML and textual format for the network description are supported by such an editor.
MPEG Reconfigurable Video Coding
55