English Pages 413 [416] Year 2022
Information and Coding Theory in Computer Science
INFORMATION AND CODING THEORY IN COMPUTER SCIENCE
Edited by: Zoran Gacovski
ARCLER Press
www.arclerpress.com
Information and Coding Theory in Computer Science Zoran Gacovski
Arcler Press
224 Shoreacres Road
Burlington, ON L7L 2H2
Canada
www.arclerpress.com
Email: [email protected]

e-book Edition 2023
ISBN: 978-1-77469-610-1 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material sources are indicated. Copyright for individual articles remains with the authors as indicated and published under Creative Commons License. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data, and the views articulated in the chapters are those of the individual contributors, and not necessarily those of the editors or publishers. Editors or publishers are not responsible for the accuracy of the information in the published chapters or the consequences of their use. The publisher assumes no responsibility for any damage or grievance to persons or property arising out of the use of any materials, instructions, methods or thoughts in the book. The editors and the publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify.

Notice: Registered trademarks of products or corporate names are used only for explanation and identification without intent of infringement.
© 2023 Arcler Press
ISBN: 978-1-77469-446-6 (Hardcover)

Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com
DECLARATION
Some content or chapters in this book are open-access, copyright-free published research works, which are published under a Creative Commons License and are indicated with the citation. We are thankful to the publishers and authors of the content and chapters, as without them this book wouldn’t have been possible.
ABOUT THE EDITOR
Dr. Zoran Gacovski is a full professor at the Faculty of Technical Sciences, “Mother Tereza” University, Skopje, Macedonia. His teaching subjects include software engineering and intelligent systems, and his areas of research are information systems, intelligent control, machine learning, graphical models (Petri, neural and Bayesian networks), and human-computer interaction. Prof. Gacovski earned his PhD degree at the Faculty of Electrical Engineering, UKIM, Skopje. In his career he was awarded a Fulbright postdoctoral fellowship (2002) for a research stay at Rutgers University, USA. He has also earned a best-paper award at the Baltic Olympiad for Automation Control (2002), a US NSF grant for conducting specific research in the field of human-computer interaction at Rutgers University, USA (2003), and a DAAD grant for research stays at the University of Bremen, Germany (2008 and 2012). The projects in which he took an active part are: “A multimodal human-computer interaction and modelling of the user behaviour” (for Rutgers University, 2002-2003), sponsored by the US Army and Ford; “Development and implementation of algorithms for guidance, navigation and control of mobile objects” (for Military Academy – Skopje, 1999-2002); and “Analytical and non-analytical intelligent systems for deciding and control of uncertain complex processes” (for the Macedonian Ministry of Science, 1995-1998). He is the author of 3 books (including the international edition “Mobile Robots”), 20 journal papers, and over 40 conference papers, and he is also a reviewer/editor for IEEE journals and conferences.
TABLE OF CONTENTS

List of Contributors
List of Abbreviations
Preface

Section 1: Information Theory Methods and Approaches

Chapter 1 Information Theory of Cognitive Radio System
    Introduction
    Cognitive Radio Network Paradigms
    Interference-Mitigating Cognitive Behavior: The Cognitive Radio Channel
    Interference Avoiding Channel
    Collaborative Cognitive Channel
    Comparisons
    References

Chapter 2 Information Theory and Entropies for Quantized Optical Waves in Complex Time-Varying Media
    Introduction
    Quantum Optical Waves in Time-Varying Media
    Information Measures for Thermalized Quantum Optical Fields
    Husimi Uncertainties and Uncertainty Relations
    Entropies and Entropic Uncertainty Relations
    Application to a Special System
    Summary and Conclusion
    References

Chapter 3 Some Inequalities in Information Theory Using Tsallis Entropy
    Abstract
    Introduction
    Formulation of the Problem
    Mean Codeword Length and its Bounds
    Conclusion
    References

Chapter 4 The Computational Theory of Intelligence: Information Entropy
    Abstract
    Introduction
    Entropy
    Intelligence: Definition and Assumptions
    Global Effects
    Applications
    Related Works
    Conclusions
    References

Section 2: Block and Stream Coding

Chapter 5 Block-Split Array Coding Algorithm for Long-Stream Data Compression
    Abstract
    Introduction
    Problems of Long-Stream Data Compression for Sensors
    CZ-Array Coding
    Analyses of CZ-Array Algorithm
    Experiment Results
    Conclusions
    Acknowledgments
    References

Chapter 6 Bit-Error Aware Lossless Image Compression with 2D-Layer-Block Coding
    Abstract
    Introduction
    Related Work on Lossless Compression
    Our Proposed Method
    Experiments
    Conclusions
    Acknowledgments
    References

Chapter 7 Beam Pattern Scanning (BPS) versus Space-Time Block Coding (STBC) and Space-Time Trellis Coding (STTC)
    Abstract
    Introduction
    Introduction of STBC, STTC and BPS Techniques
    BPS versus STBC, STTC
    Simulations
    Conclusions
    References

Chapter 8 Partial Feedback Based Orthogonal Space-Time Block Coding With Flexible Feedback Bits
    Abstract
    Introduction
    Proposed Code Construction and System Model
    Linear Decoder at the Receiver
    Feedback Bits Selection and Properties
    Simulation Results
    Conclusions
    References

Chapter 9 Rateless Space-Time Block Codes for 5G Wireless Communication Systems
    Abstract
    Introduction
    Concept of Rateless Codes
    Rateless Coding and Hybrid Automatic Retransmission Query
    Rateless Codes’ Literature Review
    Rateless Codes Applications
    Motivation to Rateless Space-Time Coding
    Rateless Space-Time Block Code for Massive MIMO Systems
    Conclusion
    References

Section 3: Lossless Data Compression

Chapter 10 Lossless Image Compression Technique Using Combination Methods
    Abstract
    Introduction
    Literature Review
    The Proposed Method
    Conclusions
    Future Work
    References

Chapter 11 New Results in Perceptually Lossless Compression of Hyperspectral Images
    Abstract
    Introduction
    Data and Approach
    Experimental Results
    Conclusion
    Acknowledgements
    References

Chapter 12 Lossless Compression of Digital Mammography Using Base Switching Method
    Abstract
    Introduction
    Base-Switching Algorithm
    Proposed Method
    Results
    Conclusions
    References

Chapter 13 Lossless Image Compression Based on Multiple-Tables Arithmetic Coding
    Abstract
    Introduction
    The MTAC Method
    Experiments
    Conclusions
    References

Section 4: Information and Shannon Entropy

Chapter 14 Entropy - A Universal Concept in Sciences
    Abstract
    Introduction
    Entropy as a Qualificator of the Configurational Order
    The Concept of Entropy in Thermodynamics and Statistical Physics
    The Shannon-Like Entropies
    Conclusions
    Appendix
    Notes
    References

Chapter 15 Shannon Entropy: Axiomatic Characterization and Application
    Introduction
    Shannon Entropy: Axiomatic Characterization
    Total Shannon Entropy and Entropy of Continuous Distribution
    Application: Differential Entropy and Entropy in Classical Statistics
    Conclusion
    References

Chapter 16 Shannon Entropy in Distributed Scientific Calculations on Mobiles Ad-Hoc Networks (MANETs)
    Abstract
    Introduction
    Measuring the Problem
    Simulation
    Conclusion
    References

Chapter 17 Advancing Shannon Entropy for Measuring Diversity in Systems
    Abstract
    Introduction
    Renormalizing Probability: Case-Based Entropy and the Distribution of Diversity
    Case-Based Entropy of a Continuous Random Variable
    Results
    Using 𝐶𝑐 to Compare and Contrast Systems
    Conclusion
    Acknowledgments
    References

Index
LIST OF CONTRIBUTORS

F. G. Awan
University of Engineering and Technology Lahore, 54890 Pakistan

N. M. Sheikh
University of Engineering and Technology Lahore, 54890 Pakistan

M. F. Hanif
University of the Punjab Quaid-e-Azam Campus 54590, Lahore Pakistan

Jeong Ryeol Choi
Department of Radiologic Technology, Daegu Health College, Yeongsong-ro 15, Bukgu, Daegu 702-722, Republic of Korea

Litegebe Wondie
Department of Mathematics, College of Natural and Computational Science, University of Gondar, Gondar, Ethiopia

Satish Kumar
Department of Mathematics, College of Natural and Computational Science, University of Gondar, Gondar, Ethiopia

Daniel Kovach
Kovach Technologies, San Jose, CA, USA

Qin Jiancheng
School of Electronic and Information Engineering, South China University of Technology, Guangdong, China

Lu Yiqin
School of Electronic and Information Engineering, South China University of Technology, Guangdong, China

Zhong Yu
Zhaoqing Branch, China Telecom Co., Ltd., Guangdong, China
School of Software, South China University of Technology, Guangdong, China

Jungan Chen
Department of Electronic and Computer Science, Zhejiang Wanli University, Ningbo, China

Jean Jiang
College of Technology, Purdue University Northwest, Indiana, USA

Xinnian Guo
Department of Electronic Information Engineering, Huaiyin Institute of Technology, Huaian, China

Lizhe Tan
Department of Electrical and Computer Engineering, Purdue University Northwest, Indiana, USA

Peh Keong The
Department of Electrical and Computer Engineering, Michigan Technology University, Houghton, Michigan, USA

Seyed (Reza) Zekavat
Department of Electrical and Computer Engineering, Michigan Technology University, Houghton, Michigan, USA

Lei Wang
School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China

Zhigang Chen
School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China

Ali Alqahtani
College of Applied Engineering, King Saud University, Riyadh, Saudi Arabia

A. Alarabeyyat
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied University, Salt, Jordan

S. Al-Hashemi
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied University, Salt, Jordan

T. Khdour
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied University, Salt, Jordan

M. Hjouj Btoush
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied University, Salt, Jordan

S. Bani-Ahmad
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied University, Salt, Jordan

R. Al-Hashemi
The Computer Information Systems Department, College of Information Technology, Al-Hussein Bin Talal University, Ma’an, Jordan

Chiman Kwan
Applied Research LLC, Rockville, Maryland, USA

Jude Larkin
Applied Research LLC, Rockville, Maryland, USA

Ravi kumar Mulemajalu
Department of IS&E., KVGCE, Sullia, Karnataka, India

Shivaprakash Koliwad
Department of E&C., MCE, Hassan, Karnataka, India

Rung-Ching Chen
Department of Information Management, Chaoyang University of Technology, No. 168, Jifong E. Rd., Wufong Township Taichung County 41349, Taiwan

Pei-Yan Pai
Department of Computer Science, National Tsing-Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 300, Taiwan

Yung-Kuan Chan
Department of Management Information Systems, National Chung Hsing University, 250 Kuo-kuang Road, Taichung 402, Taiwan

Chin-Chen Chang
Department of Computer Science, National Tsing-Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 300, Taiwan
Department of Information Engineering and Computer Science, Feng Chia University, No. 100 Wenhwa Rd., Seatwen, Taichung 407, Taiwan

Vladimír Majerník
Mathematical Institute, Slovak Academy of Sciences, Bratislava, Slovakia

C. G. Chakrabarti
Department of Applied Mathematics, University of Calcutta, India

Indranil Chakrabarty
Department of Mathematics, Heritage Institute of Technology, Chowbaga Road, Anandapur, India

Pablo José Iuliano
LINTI, Facultad de Informatica UNLP, La Plata, Argentina

Luís Marrone
LINTI, Facultad de Informatica UNLP, La Plata, Argentina

R. Rajaram
Department of Mathematical Sciences, Kent State University, Kent, OH, USA

B. Castellani
Department of Sociology, Kent State University, 3300 Lake Rd. West, Ashtabula, OH, USA

A. N. Wilson
School of Social and Health Sciences, Abertay University, Dundee DD1 1HG, UK
LIST OF ABBREVIATIONS

5G         Fifth-Generation
ACK        Acknowledgments
AMC        Adaptive Modulation And Coding
AR         Augmented Reality
ARM        Advanced RISC Machines
ARQ        Automatic Repeat reQuest
ASIC       Application-Specific Integrated Circuit
AVIRIS     Airborne Visible Infrared Imaging Spectrometer
AWGN       Additive White Gaussian Noise
BAB        Block Array Builder
BB-ANS     Bits Back With Asymmetric Numeral Systems
BBM        Bialynicki-Birula and Mycielski
BCH        Bose, Chaudhuri and Hocquenghem
BEC        Backward Error Correction
BECH       Binary Erasure Channel
BER        Bit Error Rate
BPS        Beam Pattern Scanning
BS         Base Station
BS         Base switching
BSC        Binary Symmetrical Channel
BST        Base-Switching Transformation
BWT        Burrows-Wheeler Transform
CAD        Computer Aided Diagnosis
CDF        Cumulative Distribution Function
CI         Computational Intelligence
CNN        Convolutional Neural Network
CoBALP+    Context-Based Adaptive Linear Prediction
CR         Cognitive Radio
CR         Compression Ratio
CSI        Channel State Information
DCT        Discrete Cosine Transform
DMT        Diversity Multiplexing Tradeoff
DST        Dempster-Shafer Theory of Evidence
EDP        Edge-Directed Prediction
EGC        Equal Gain Combining
EURs       Entropic Uncertainty Relations
FBB        Fixed Block Based
FCC        Federal Communications Commission
FEC        Forward Error Correction
FLIF       Free Lossless Image Format
FPGA       Field-Programmable Gate Array
GAI        General Artificial Intelligence
GAP        Gradient Adjusted Predictor
GED        Gradient Edge Detection
GP         Gel’fand Pinsker
HD         High-Definition
HIS        Hyperspectral Images
HPBW       Half Power Beam Width
HUR        Heisenberg Uncertainty Relation
HVS        Human Visual Systems
IC-DMS     Interference Channel with Degraded Message Sets
IoT        Internet of Things
IR         Industrial Revolution
KLT        Karhunen-Loeve Transform
LDGM       Low-Density Generator Matrix
LDPC       Low-Density Parity Check Code
LS         Least-Square
LT         Luby Transform
LZW        Lempel-Ziv-Welch
MAC        Multiple Access Channel
MAI        Multiple Access Interference
MANETs     Mobile Ad-Hoc Networks
MANIAC     Meta-Adaptive Near-Zero Integer Arithmetic Coding
MBMS       Multimedia Broadcast Multicast System
MC-CDMA    Multi-Carrier Code Division Multiple Access
MED        Median Edge Detector
MGF        Moment Generating Function
MIAS       Mammography Image Analysis
MIMO       Multiple Input Multiple Output
ML         Maximum Likelihood
MLB        Matching Link Builder
MLP        Multi-Level Progressive
MMSE       Minimum Mean Square Error
MRC        Maximal Ratio Combining
MTAC       Multiple-Tables Arithmetic Coding
MTF        Move-To-Front
MUI        Multiuser Interference
OFDM       Orthogonal Frequency Division Multiplexing
OSI        Open Systems Interconnection
OSTBC      Orthogonal Space-Time Block Coding
PC         Pilot Contamination
PCA        Principal Component Analysis
PNG        Portable Network Graphics
PPM        Prediction By Partial Matching
PPMM       Partial Precision Matching Method
PSNR       Peak-Signal-To-Noise Ratio
PU         Primary User
QoS        Quality of Service
RAM        Random Access Memory
RCs        Rateless Codes
RestNet    Residual Neural Network
RF         Radio Frequency
RLE        Run-Length Encoding
RLS        Recursive Least Squares
RNN        Recurrent Neural Network
ROSIS      Reflective Optics System Imaging Spectrometer
RSTBC      Rateless Space-Time Block Code
SB         Split Band
SDMA       Spatial Division Multiple Access
SER        Symbol-Error-Rate
SINR       Signal-To-Interference-And-Noise Ratio
SISO       Single-Input Single-Output
SNR        Signal-To-Noise Ratio
STBC       Space-Time Block Coding
STC        Space-Time Coding
STTC       Space-Time Trellis Coding
SVD        Singular Value Decomposition
TMBA       Two Modules Based Algorithm
UWB        Ultrawideband
V-BLAST    Vertical Bell Labs Layered Spacetime
VBSS       Variable Block Size Segmentation
VM         Virtual Memory
VSI        Visual Saliency-based Index
WBAN       Wearable Body Area Network
WSN        Wireless Sensor Network
XML        Extensible Markup Language
PREFACE
Coding theory is a field that studies codes, their properties, and their suitability for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transfer, and data storage. Codes are studied in a variety of scientific disciplines, such as information theory, electrical engineering, mathematics, linguistics, and computer science, in order to design efficient and reliable data transmission methods. This usually involves removing redundancy and detecting and correcting errors in the transmitted data. There are four types of coding:
• Data compression (or source coding)
• Error control (or channel coding)
• Cryptographic coding
• Line coding
Data compression tries to remove redundant data from a source in order to transfer it as efficiently as possible. For example, Zip data compression makes data files smaller for purposes such as reducing Internet traffic. Data compression and error correction can be studied in combination.

Error correction adds extra bits to make data transmission more robust to interference that occurs on the transmission channel. The average user may not be aware of the many types of applications that use error correction. A typical music CD uses the Reed-Solomon code to correct the impact of scratches and dust. In this application, the transmission channel is the CD itself. Mobile phones also use coding techniques to correct for attenuation and high-frequency transmission noise. Data modems, telephone transmissions, and NASA’s Deep Space Network all use channel coding techniques, such as turbo codes and LDPC codes, to transmit bits.

This edition covers different topics from information theory methods and approaches, block and stream coding, lossless data compression, and information and Shannon entropy.

Section 1 focuses on information theory methods and approaches, describing information theory of cognitive radio system, information theory and entropies for quantized optical waves in complex time-varying media, some inequalities in information theory using Tsallis entropy, and computational theory of intelligence: information entropy.

Section 2 focuses on block and stream coding, describing block-split array coding algorithm for long-stream data compression, bit-error aware lossless image compression with 2D-layer-block coding, beam pattern scanning (BPS) versus space-time block coding (STBC) and space-time trellis coding (STTC), partial feedback based orthogonal space-time block coding with flexible feedback bits, and rateless space-time block codes for 5G wireless communication systems.

Section 3 focuses on lossless data compression, describing lossless image compression technique using combination methods, new results in perceptually lossless compression of hyperspectral images, lossless compression of digital mammography using base switching method, and lossless image compression based on multiple-tables arithmetic coding.

Section 4 focuses on information and Shannon entropy, describing entropy as a universal concept in sciences, Shannon entropy: axiomatic characterization and application, Shannon entropy in distributed scientific calculations on mobiles ad-hoc networks (MANETs), and advancing Shannon entropy for measuring diversity in systems.
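The contrast drawn above between source coding (removing redundancy) and channel coding (adding redundancy) can be illustrated with a minimal Python sketch. It is not taken from any chapter of this book: the helper names `repetition_encode` and `repetition_decode` are hypothetical, and the 3-fold repetition code is chosen only because it is the simplest error-correcting code, not because CDs or modems use it. The compression half relies on the standard-library `zlib` module.

```python
import zlib


def repetition_encode(bits, n=3):
    """Channel coding sketch: repeat every bit n times (adds redundancy)."""
    return [b for b in bits for _ in range(n)]


def repetition_decode(coded, n=3):
    """Majority vote over each group of n received bits."""
    return [1 if sum(coded[i:i + n]) > n // 2 else 0
            for i in range(0, len(coded), n)]


# Source coding: a highly redundant byte string shrinks under compression,
# and decompression restores it exactly (lossless).
text = b"abab" * 256
packed = zlib.compress(text)
restored = zlib.decompress(packed)

# Channel coding: one flipped bit per group is corrected by majority vote.
message = [1, 0, 1, 1, 0]
sent = repetition_encode(message)
received = sent.copy()
received[4] ^= 1  # simulate a single bit error on the channel
decoded = repetition_decode(received)
```

Here `packed` is shorter than `text` because the input is repetitive, while `sent` is three times longer than `message`; that extra length is the price paid for being able to correct the flipped bit.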
SECTION 1: INFORMATION THEORY METHODS AND APPROACHES
Chapter 1
INFORMATION THEORY OF COGNITIVE RADIO SYSTEM

F. G. Awan¹, N. M. Sheikh¹, and M. F. Hanif²
¹ University of Engineering and Technology Lahore, 54890 Pakistan
² University of the Punjab Quaid-e-Azam Campus 54590, Lahore Pakistan
INTRODUCTION
Cognitive radio (CR) carries bright prospects for very efficient utilization of spectrum in the future. Since cognitive radio is still in its early days, many of its theoretical limits are yet to be explored. In particular, determining its maximum information rates for the most general case is still an open problem. To date, many cognitive channel models have been presented, and either achievable or maximum bit rates have been evaluated for each of them. This chapter summarizes the existing results, compares the different channel models, and draws useful conclusions.

Citation: F. G. Awan, N. M. Sheikh and M. F. Hanif, “Information Theory of Cognitive Radio System”, Cognitive Radio Systems, 2009. DOI: 10.5772/7843.
Copyright: © 2009 The Author(s) and IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The scarcity of the radio frequency (RF) spectrum, along with its severe underutilization as reported by government bodies such as the Federal Communications Commission (FCC) in the USA and Ofcom in the UK [1], [2], has triggered immense research activity on the concept of CR all over the world. Of the many facets that need to be dealt with, the information theoretic modeling of CR is of core importance, as it helps predict the fundamental limit of its maximum reliable data transmission. The information theoretic model proposed in [3] represents the real world scenario that the CR will have to encounter in the presence of primary user (PU) devices. The authors of [3] characterize the CR system as an interference channel with degraded message sets (IC-DMS), since the spectrum sensing nature of the CR may enable its transmitter (TX) to know the PU’s message, provided the PU is in close proximity to the CR. Using a combination of rate-splitting [4] and Gel’fand-Pinsker (GP) [5] coding, [3] gives an achievable rate region of the so-called CR channel or IC-DMS. Further, in [3] time sharing is performed between the two extreme cases in which the CR dedicates either zero power (“highly polite”) or full power (“highly rude”) to its own message. A complete review of information theoretic studies can be found in [6] and [7].
This chapter then discusses outer bounds to the individual rates and the conditions under which these bounds become tight for the symmetric Gaussian CR channel in the low interference gain regime. The CR transmitter is assumed to use dirty paper coding while deriving the outer bounds. The capacity of the CR channel in the low interference scenario is known when the CR employs the “polite” approach by devoting some portion of its power to transmitting the PU’s message; this approach, however, does not guarantee any quality of service for the CR users.
We then focus on the scenario in which the CR takes the “rude” approach, i.e., does not relay the PU’s message and tries to maximize its own rates only. It will be shown that when both the CR and the PU operate in the low interference gain regime, treating interference as additive noise at the PU receiver and doing dirty paper coding at the CR is nearly optimal.
COGNITIVE RADIO NETWORK PARADIGMS
Since its introduction in [8], the definition of cognitive radio has evolved over the years. Consequently, different interpretations of cognitive radio and different visions for its future exist today. In this section we describe a few communication models that have been proposed for cognitive radio. We
broadly classify them into underlay, overlay and interweave models, which are described in turn below.
Underlay Paradigm
The underlay paradigm encompasses techniques that allow communication by the cognitive radio assuming it has knowledge of the interference caused by its transmitter to the receivers of all noncognitive users [7]. In this setting the cognitive radio is often called a secondary user, which cannot significantly interfere with the communication of existing (typically licensed) users, who are referred to as primary users. Specifically, the underlay paradigm mandates that concurrent noncognitive and cognitive transmissions may occur only if the interference generated by the cognitive devices at the noncognitive receivers is below some acceptable threshold. The interference constraint for the noncognitive users may be met by using multiple antennas to guide the cognitive signals away from the noncognitive receivers, or by using a wide bandwidth over which the cognitive signal can be spread below the noise floor and then despread at the cognitive receiver. The latter technique is the basis of both spread spectrum and ultrawideband (UWB) communications. The interference caused by a cognitive transmitter to a noncognitive receiver can be approximated via reciprocity if the cognitive transmitter can overhear a transmission from the noncognitive receiver’s location. Alternatively, the cognitive transmitter can be very conservative in its output power to ensure that its signal remains below the prescribed interference threshold. Since the interference constraints in underlay systems are typically quite restrictive, this limits the cognitive users to short range communications. While the underlay paradigm is most common in the licensed spectrum (e.g. UWB underlays many licensed spectral bands), it can also be used in unlicensed bands to provide different classes of service to different users.
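The interference constraint described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the cognitive transmitter knows (or has estimated via reciprocity) a linear power gain to each noncognitive receiver; the function name `max_underlay_power` and all parameter names are hypothetical and do not come from the cited works.

```python
def max_underlay_power(interference_threshold_w, gains_to_primary_rx, device_limit_w):
    """Largest cognitive transmit power (watts) that keeps the interference
    at every noncognitive receiver at or below the threshold.

    Assumption (not from the source): received interference = gain * transmit power,
    with linear (not dB) power gains.
    """
    if not gains_to_primary_rx:
        # No noncognitive receiver to protect: only the device limit applies.
        return device_limit_w
    worst_gain = max(gains_to_primary_rx)  # strongest path gives the tightest constraint
    if worst_gain <= 0:
        return device_limit_w
    return min(device_limit_w, interference_threshold_w / worst_gain)


# Example: threshold of 1 nW, two primary receivers, 100 mW device limit.
power = max_underlay_power(1e-9, [1e-6, 5e-7], 0.1)
```

With these illustrative numbers the interference constraint, not the device limit, sets the transmit power, which is consistent with the observation that underlay operation tends to be restricted to short ranges.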
Overlay Paradigm
The enabling premise for overlay systems is that the cognitive transmitter has knowledge of the noncognitive users’ codebooks and their messages as well [7]. The codebook information could be obtained, for example, if the noncognitive users follow a uniform standard for communication based on a publicized codebook. Alternatively, they could broadcast their codebooks periodically. A noncognitive user message might be obtained by decoding
the message at the cognitive receiver. However, the overlay model assumes the noncognitive message is known at the cognitive transmitter when the noncognitive user begins its transmission. While this is impractical for an initial transmission, the assumption holds for a message retransmission where the cognitive user hears the first transmission and decodes it, while the intended receiver cannot decode the initial transmission due to fading or interference. Alternatively, the noncognitive user may send its message to the cognitive user (assumed to be close by) prior to its transmission. Knowledge of a noncognitive user’s message and/or codebook can be exploited in a variety of ways to either cancel or mitigate the interference seen at the cognitive and noncognitive receivers. On the one hand, this information can be used to completely cancel the interference due to the noncognitive signals at the cognitive receiver by sophisticated techniques like dirty paper coding [9]. On the other hand, the cognitive users can utilize this knowledge and assign part of their power for their own communication and the remainder of the power to assist (relay) the noncognitive transmissions. By careful choice of the power split, the increase in the noncognitive user’s signal-to-noise power ratio (SNR) due to the assistance from cognitive relaying can be exactly offset by the decrease in the noncognitive user’s SNR due to the interference caused by the remainder of the cognitive user’s transmit power used for its own communication. This guarantees that the noncognitive user’s rate remains unchanged while the cognitive user allocates part of its power for its own transmissions. Note that the overlay paradigm can be applied to either licensed or unlicensed band communications. In licensed bands, cognitive users would be allowed to share the band with the licensed users since they would not interfere with, and might even improve, their communications.
In unlicensed bands cognitive users would enable a higher spectral efficiency by exploiting message and codebook knowledge to reduce interference.
Interweave Paradigm
The “interweave” paradigm is based on the idea of opportunistic communication, and was the original motivation for cognitive radio [10]. The idea came about after studies conducted by the FCC [8] and industry [2] showed that a major part of the spectrum is not utilized most of the time. In other words, there exist temporary space-time frequency voids, referred to as spectrum holes, that are not in constant use in both the licensed and unlicensed bands.
These gaps change with time and geographic location, and can be exploited by cognitive users for their communication. Thus, the utilization of spectrum is improved by opportunistic frequency reuse over the spectrum holes. The interweave technique requires knowledge of the activity information of the noncognitive (licensed or unlicensed) users in the spectrum. One could also consider that all the users in a given band are cognitive, but existing users become primary users, and new users become secondary users that cannot interfere with communications already taking place between existing users. To summarize, an interweave cognitive radio is an intelligent wireless communication system that periodically monitors the radio spectrum, intelligently detects occupancy in the different parts of the spectrum and then opportunistically communicates over spectrum holes with minimal interference to the active users. For a fascinating motivation and discussion of the signal processing challenges faced in interweave cognitive radio, we refer the reader to [11]. Table 1 [12] summarizes the differences between the underlay, overlay and interweave cognitive radio approaches. While underlay and overlay techniques permit concurrent cognitive and noncognitive communication, avoiding simultaneous transmissions with noncognitive or existing users is the main goal in the interweave technique. We also point out that the cognitive radio approaches require different amounts of side information: underlay systems require knowledge of the interference caused by the cognitive transmitter to the noncognitive receiver(s), interweave systems require considerable side information about the noncognitive or existing user activity (which can be obtained from robust primary user sensing) and overlay systems require a large amount of side information (non-causal knowledge of the noncognitive user’s codebook and possibly its message). 
Apart from device level power limits, the cognitive user’s transmit power in the underlay and interweave approaches is decided by the interference constraint and range of sensing, respectively. While underlay, overlay and interweave are three distinct approaches to cognitive radio, hybrid schemes can also be constructed that combine the advantages of different approaches. For example, the overlay and interweave approaches are combined in [7]. Before launching into capacity results for these three cognitive radio networks, we will first review capacity results for the interference channel. Since cognitive radio networks are based on the notion of minimal interference, the interference channel provides a fundamental building block to the capacity as well as encoding and decoding strategies for these networks.
Table 1. Comparison of underlay, overlay and interweave cognitive radio techniques.
INTERFERENCE-MITIGATING COGNITIVE BEHAVIOR: THE COGNITIVE RADIO CHANNEL
This discussion has been taken from Natasha’s paper. This consideration is the simplest possible scenario in which a cognitive radio could be employed. Assume there exists a primary transmitter and receiver pair (S1-R1), as well as the cognitive secondary transmitter and receiver pair (S2-R2). As shown in Fig. 1, there are three possibilities for transmitter cooperation in these two point-to-point channels. We have chosen to focus on transmitter cooperation because such cooperation is often more insightful and general than receiver-side cooperation [12, 13]. Thus assume that each receiver decodes independently. Transmitter cooperation in this figure is denoted by a directed double line. These three channels are simple examples of the cognitive decomposition of wireless networks seen in [14]. The three possible types of transmitter cooperation in this simplified scenario are:
• Competitive behavior: The two transmitters transmit independent messages. There is no cooperation in sending the messages, and thus the two users compete for the channel. This is the same channel as the two-sender, two-receiver interference channel [14, 15].
• Cognitive behavior: Asymmetric cooperation is possible between the transmitters. This asymmetric cooperation is a result of S2 knowing S1’s message, but not vice versa. As a first step, we idealize the concept of message knowledge: whenever the cognitive node S2 is able to hear and decode the message of the primary node S1, we assume it has full a priori knowledge. This is called the genie assumption, as these messages could have been given to the appropriate transmitters by a genie. The one way double arrow indicates that S2 knows S1’s message but not vice versa. This is the simplest form of asymmetric non-causal cooperation at the transmitters. The term cognitive behavior is used to emphasize the need for S2 to be a “smart” device capable of altering its transmission strategy according to the message of the primary user. We can motivate considering asymmetric side information in practice in three ways:
    • Depending on the device capabilities, as well as the geometry and channel gains between the various nodes, certain cognitive nodes may be able to hear and/or obtain the messages to be transmitted by other nodes. These messages would need to be obtained in real time, and could exploit the geometric gains between cooperating transmitters relative to receivers in, for example, a two-phase protocol [3].
    • In an Automatic Repeat reQuest (ARQ) system, a cognitive transmitter, under suitable channel conditions (if it has a better channel to the primary transmitting node than the primary receiver), could decode the primary user’s transmitted message during an initial transmission attempt. In the event that the primary receiver was not able to correctly decode the message, and it must be retransmitted, the cognitive user would already have the to-be-transmitted message, or asymmetric side information, at no extra cost (in terms of overhead in obtaining the message).
    • The authors in [16] consider a network of wireless sensors in which a sensor S2 has a better sensing capability than another sensor S1 and thus is able to sense two events, while S1 is only able to sense one. Thus, when they wish to transmit, they must do so under an asymmetric side-information assumption: sensor S2 has two messages, and the other has just one.
• Cooperative behavior: The two transmitters know each other’s messages (two way double arrows) and can thus fully and symmetrically cooperate in their transmission. The channel pictured in Fig. 1(c) may be thought of as a two-antenna sender, two single-antenna receivers broadcast channel [17].
Many of the classical, well known information theoretic channels fall into the categories of competitive and cooperative behavior. For more details, we
refer the interested reader to the cognitive network decomposition theorem of [13] and [18]. We now turn to the much less studied behavior which spans, and in a sense interpolates between, the symmetric cooperative and competitive behaviors. We call this behavior asymmetric cognitive behavior. In this section we will consider one example of cognitive behavior: a two-sender, two-receiver (with two independent messages) interference channel with asymmetric and a priori message knowledge at one of the transmitters, as shown in Fig. 1(b). Certain asymmetric (in transmitter cooperation) channels have been considered in the literature: for example, in [19] the capacity region of a multiple access channel with asymmetric cooperation between the two transmitters is computed. The authors in [20] consider a channel which could involve asymmetric transmitter cooperation, and explore the conditions under which the capacity of this channel coincides with the capacity of the channel in which both messages are decoded at both receivers. In [18, 21] the authors introduced the cognitive radio channel, which captures the most basic form of asymmetric transmitter cooperation for the interference channel. We now study the information theoretic limits of interference channels with asymmetric transmitter cooperation, or cognitive radio channels.
Figure 1. (a) Competitive behavior, the interference channel. The transmitters may not cooperate. (b) Cognitive behavior, the cognitive radio channel. Asymmetric transmitter cooperation. (c) Cooperative behavior, the two antenna broadcast channel. The transmitters, but not the receivers, may fully and symmetrically cooperate.
The channel is thus expressed via the following pair of equations:
\[ Y_p = X_p + a X_s + Z_p \qquad (1) \]
\[ Y_s = b X_p + X_s + Z_s \qquad (2) \]
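As a small illustration of (1) and (2), the following Python sketch computes one use of the channel. It assumes real-valued symbols and zero-mean Gaussian noise with a configurable standard deviation; the noise variance is not stated at this point in the text, so the default of 1.0 is an assumption, and the function name `cognitive_channel` is hypothetical.

```python
import random


def cognitive_channel(x_p, x_s, a, b, noise_std=1.0, rng=None):
    """One use of the channel in (1)-(2).

    x_p, x_s : symbols sent by the primary and the cognitive (secondary) transmitter
    a        : gain from the cognitive transmitter to the primary receiver
    b        : gain from the primary transmitter to the cognitive receiver
    noise_std: standard deviation of Z_p and Z_s (assumed, not given in the text)
    """
    rng = rng or random.Random()
    z_p = rng.gauss(0.0, noise_std)
    z_s = rng.gauss(0.0, noise_std)
    y_p = x_p + a * x_s + z_p      # equation (1)
    y_s = b * x_p + x_s + z_s      # equation (2)
    return y_p, y_s


# Noise-free check of the cross gains: a scales x_s at the primary receiver,
# b scales x_p at the cognitive receiver.
y_p, y_s = cognitive_channel(1.0, 2.0, a=0.5, b=0.25, noise_std=0.0)
```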
While deriving the channel capacity, an assumption of low interference gain has been made. The low interference regime corresponds to the scenario where the cognitive user is assumed to be nearer to its own base station than to that of the primary user, which is normally the case. When applied to the interference channel in standard form, this situation corresponds to a ≤ 1. At the same time, the two devices are assumed to work in an environment where the co-existence conditions hold, ensuring that the cognitive radio generates no interference with the primary user in its vicinity and that the primary receiver is a single user decoder. With all these assumptions, the channel is named the cognitive (1, a, b, 1) channel. The capacity R_s (in bps) of the cognitive radio under these conditions is expressed in closed form as:
(3)
where α ∈ [0, 1], and its value is determined using the following argument. To ensure that the primary user remains unaware of the presence of the cognitive device and communicates at the rate it would achieve in the absence of the cognitive device, the maximum achievable rate of the primary system is found to be:
(4)
Now for 0 < a < 1, by the Intermediate Value Theorem, this quadratic equation always has a unique root α in [0, 1]:
(5)
It is to be noted that the above capacity expressions hold for any b ∈ R. For detailed proofs the reader is referred to [8]. A few important points are worth mentioning here. Since the cognitive radio knows both m_p (the message to be transmitted by the primary user) and m_s (the message to be transmitted by the cognitive device), it generates its codeword X_s^n such that it also incorporates the codeword X_p^n to be generated by the primary user. By doing so, the cognitive device can implement dirty paper coding, which helps it mitigate the interference caused by the primary user at the cognitive receiver. Thus the cognitive device performs superposition coding as follows:
\[ X_s^n = \hat{X}_s^n + \sqrt{\frac{\alpha P_s}{P_p}}\, X_p^n \qquad (6) \]
where P_p and P_s denote the transmit powers of the primary and the secondary user, and \hat{X}_s^n encodes m_s and is generated by performing dirty paper coding, treating \left(b + \sqrt{\alpha P_s / P_p}\right) X_p^n as known interference that will affect the secondary receiver. It is evident from (6) that the secondary device uses part of its power, αP_s, to relay the primary user’s message to the primary receiver. This relaying of the message by the secondary user results in an elevated SNR at the primary receiver. At the same time, the secondary user’s own message, transmitted with power (1 - α)P_s, reaches the primary receiver as interference and balances the increase in SNR; as a result the primary device remains completely oblivious to the presence of the cognitive user. This approach has been named the selfless approach in [23].
In [8], results corresponding to the high interference regime a > 1 have also been presented as ancillary conclusions. Such results are of limited significance, as they describe high interference caused by the secondary user, which is an event with low probability of occurrence. Similar results have also been obtained in [3], [16] and [15]. The results in [15] correspond to the selfish approach of [23] and thus represent an upper bound on the information rates of the cognitive radio, but with interference, because in this case the cognitive user does not spend any of its power in relaying the message to the primary receiver. Similarly, the authors in [16] have shown that, in the Gaussian noise case, their capacity region is explicitly equal to that of [8] and, numerically, to that of [23]. Very recently, [24] has extended the results of [8] to the case of the Gaussian multiple access channel (MAC) with n cognitive users. [24] simply uses the result for a general MAC, i.e., the achievable rate for n users is the sum of the achievable rates of the individual users. Using this, together with the result of [8], it determines the achievable rate region of a Gaussian MAC cognitive channel.
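The power split described above can be checked numerically. The Python sketch below is only an illustration under stated assumptions: unit noise variance at both receivers, real-valued signalling with the rate convention (1/2) log2(1 + SNR), and the closed-form power fraction α* reported by Jovicic and Viswanath [8] for a ≤ 1. None of these conventions are confirmed by the equations of this chapter, and the function names are hypothetical.

```python
import math


def alpha_star(p_p, p_s, a):
    """Fraction of the cognitive power spent relaying the primary message.

    Assumed closed form (from [8], unit noise variance, 0 < a <= 1):
        sqrt(alpha*) = sqrt(P_p) * (sqrt(1 + a^2 P_s (1 + P_p)) - 1)
                       / (a * sqrt(P_s) * (1 + P_p))
    """
    if not 0 < a <= 1:
        raise ValueError("low interference regime requires 0 < a <= 1")
    num = math.sqrt(p_p) * (math.sqrt(1 + a * a * p_s * (1 + p_p)) - 1)
    den = a * math.sqrt(p_s) * (1 + p_p)
    return (num / den) ** 2


def primary_sinr(p_p, p_s, a, alpha):
    """SINR at the primary receiver when the cognitive user relays with power
    alpha*P_s (coherently with X_p) and sends its own message with (1-alpha)*P_s."""
    signal = (math.sqrt(p_p) + a * math.sqrt(alpha * p_s)) ** 2
    interference = a * a * (1 - alpha) * p_s
    return signal / (1 + interference)


def cognitive_rate(p_s, alpha):
    """Rate of the cognitive user after dirty paper coding removes the primary
    signal (assumed convention: real channel, bits per channel use)."""
    return 0.5 * math.log2(1 + (1 - alpha) * p_s)


alpha = alpha_star(10.0, 10.0, 0.5)
sinr = primary_sinr(10.0, 10.0, 0.5, alpha)   # equals P_p, so the primary rate is unchanged
rate = cognitive_rate(10.0, alpha)
```

The check that `primary_sinr` returns P_p at α* is the numerical counterpart of the statement that the relaying gain exactly offsets the interference, leaving the primary user oblivious to the cognitive user.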
Figure 2. The Gaussian interference channel in its standard form.
INTERFERENCE AVOIDING CHANNEL
This channel model, as devised in [22] and [6], works on the principle of opportunistic communication, i.e., the secondary communication takes place only when the licensed user is found to be idle and a spectrum hole is detected. Thus this model conforms to the basic requirement that the cognitive device not interfere with the licensed users. The secondary sender SS and receiver RS are each assumed to sit at the center of a circular sensing region of radius Rr. The distance between them is d. They are further supposed to be communicating in the presence of primary users A, B and C, as shown in Fig. 3.
Figure 3. The scenario for interference avoiding channel.
The cognitive sender SS can detect a spectral hole when both A and B are inactive whereas the secondary receiver RS determines this when it finds both B and C to be not involved in a communication scenario. Since
secondary transmitter and receiver do not have complete knowledge of the primary users’ activity in each other’s sensing regions, the spectral activity in their respective regions corresponds to the notion of being distributed. Similarly, the primary user activity sensed by the secondary transmitter-receiver pair continues to change with time, i.e., different primary users become active and inactive at different times. Thus, the spectral activity is also assumed to be dynamic. To incorporate both these features, the conceptual model of Fig. 3 is reduced to the two switch mathematical model shown in Fig. 4.
Figure 4. The two switch model.
The two switches ss, sr are treated as binary random variables with ss, sr ∈ {0, 1}. A value of ss = 1 (or sr = 1) means that there are no primary users in the sensing region of the secondary transmitter (or receiver); a value of 0 means that at least one primary user is present in that region. The input X is related to the output Y via the following equation:
(7)
where N is additive white Gaussian noise (AWGN) at the secondary receiver. Since sr is either 1 or 0 as mentioned above, multiplying by it as done in (7) simply shows whether the secondary receiver has detected the primary device or not. If ss is known only to the transmitter and sr is available only to the receiver, the situation corresponds to communication with partial side information. Determination of capacity with partial side information involves a maximization over the input distribution that is difficult to solve. This is not done; instead, tight upper and lower bounds on the capacity are obtained for this communication channel. A capacity upper bound is determined by assuming that the receiver knows both ss and sr, whereas the transmitter knows only ss. The expression of capacity Css* of the secondary user for this case is [23]:
\[ C_{ss}^{*} = e^{-\lambda\left[2R_r^2\left(\pi - \arccos\frac{d}{2R_r}\right) + dR_r\sqrt{1 - \frac{d^2}{4R_r^2}}\right]} \log\left(1 + P e^{\lambda\pi R_r^2}\right) \qquad (8) \]
where P is the secondary transmitter power constraint. For the capacity lower bound, [23] uses the results of [25]. For this a genie argument is used. It should be noted that use of the genie concept represents the notion that either the sender or the receiver is provided with some information noncausally. To determine the capacity lower bound, the genie is supposed to provide some side information to the receiver. If the genie provides some information, it must have an associated entropy rate. The results in [25] suggest that the improvement in capacity due to this genie information cannot exceed the entropy rate of the genie information itself. Using this argument, and the fact that the genie provides information to the receiver every T channel uses, it is easy to establish that the capacity lower bound approaches the upper bound even for very highly dynamic environments. It is assumed that the location of primary users in the system follows a Poisson point process with a density of λ nodes per unit area, and that the primary user detection at the secondary transmitter and receiver is perfect. The capacity expression in (8), as given in [23], is evaluated to be:
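\[ C_{ss}^{*} = e^{-\lambda\left[2R_r^2\left(\pi - \arccos\frac{d}{2R_r}\right) + dR_r\sqrt{1 - \frac{d^2}{4R_r^2}}\right]} \log\left(1 + P e^{\lambda\pi R_r^2}\right) \qquad (9) \]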
where, again, P is the secondary power constraint.
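The capacity upper bound can be evaluated numerically with a short Python sketch. It assumes the bound has the form "probability that both sensing regions are empty, times log(1 + P divided by the probability that the transmitter's region is empty)", with primary users drawn from a Poisson point process of density λ, unit noise variance, and a base-2 logarithm; the log base and noise normalization are assumptions, and the function names are hypothetical.

```python
import math


def union_area(r, d):
    """Area covered by two circles of radius r whose centers are d apart."""
    if d >= 2 * r:
        return 2 * math.pi * r * r          # disjoint sensing regions
    return (2 * r * r * (math.pi - math.acos(d / (2 * r)))
            + d * r * math.sqrt(1 - d * d / (4 * r * r)))


def capacity_upper_bound(p, lam, r, d):
    """Upper bound on secondary capacity in bits per channel use (assumed log base 2).

    p   : secondary transmit power constraint (noise variance assumed to be 1)
    lam : density of primary users per unit area (Poisson point process)
    r   : sensing radius of the secondary transmitter and receiver
    d   : distance between the secondary transmitter and receiver
    """
    prob_both_clear = math.exp(-lam * union_area(r, d))
    prob_tx_clear = math.exp(-lam * math.pi * r * r)
    # The transmitter is silent when its switch is open, so it can boost its
    # power by 1 / prob_tx_clear while meeting the average power constraint.
    return prob_both_clear * math.log2(1 + p / prob_tx_clear)


bound = capacity_upper_bound(p=3.0, lam=0.1, r=1.0, d=0.5)
```

With λ = 0 the expression reduces to the ordinary AWGN value log2(1 + P), and increasing d enlarges the combined sensing area, which lowers the probability that both switches are closed.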
COLLABORATIVE COGNITIVE CHANNEL
In this channel, the cognitive user is modeled as a relay, with no information of its own, working between the primary transmitter and receiver. Capacity limits for collaborative communications have recently been explored in [26], which suggests sufficient conditions on the geometry and the signal path loss of the transmitting entities for which performance close to the genie bound can be achieved [Natasha’s tutorial]. Consider three nodes, source (s), relay (r) and destination (d), as illustrated in Fig. 5.
Figure 5. The collaborative communication channel incorporating the source s, relay r and the destination d node.
The relay node is assumed to work in half duplex mode, meaning that it cannot receive and transmit data simultaneously. Thus the system works in two phases, i.e., the listening phase and the collaborative phase. During the first phase the relay node receives data from the source node for the first n1 transmissions, while in the collaborative phase the relay transmits to the destination for the remaining n - n1 transmissions, where n is the number of channel uses in which the source node wishes to transmit one of 2^{nR} messages. Taking the channels as AWGN, considering X and U as column vectors representing the transmissions from the source and relay nodes respectively, and denoting by Y and Z the signals received at the relay and destination respectively, the listening phase is described via the following equations:
\[ Z = H_s X + N_z \qquad (10) \]
\[ Y = H_r X + N_y \qquad (11) \]
where N_z and N_y represent complex AWGN with variance 1/2, H_s is the fading matrix between the source and destination nodes and H_r is the fading matrix between the source and relay nodes. In the collaborating phase:
\[ Z = H_c \begin{bmatrix} X \\ U \end{bmatrix} + N_z \qquad (12) \]
where H_c is the channel matrix that contains H_s as a submatrix. It is well known that a Multiple Input Multiple Output (MIMO) system with Gaussian codebook and rate R bits/channel use can reliably communicate over any channel with transfer matrix H such that
\[ R < \log_2 \det\left(I + H H^{\dagger}\right) = C(H) \]
where I denotes the identity matrix and H† represents the conjugate transpose of H. Before providing an explicit formula for rates, a little explanation is in order here.
During the first phase, the relay listens for n1 channel uses and, since it knows Hr, it can decode the message provided that nR ≤ n1C(Hr). On the other hand, the destination node receives information at the rate of C(Hs) bits/channel use during the first phase and at the rate of C(Hc) bits/channel use during the second phase. Thus it may reliably decode the message provided that nR ≤ n1C(Hs) + (n - n1)C(Hc). Taking the limit n → ∞, the ratio n1/n tends to a fraction f such that the code of rate R for the set of channels (Hr, Hc) satisfies:
\[ R \le f\, C(H_r) \qquad (13) \]
\[ R \le f\, C(H_s) + (1 - f)\, C(H_c) \qquad (14) \]
where f ∈ [0, 1] is the fraction of channel uses spent in the listening phase.
Similarly, [23] presents a corollary which suggests that if the cognitive user has no message of its own, it can aid the primary system because it knows the primary’s message, resulting in an improvement of the primary user’s data rates.
New outer bounds to the individual rates and the conditions under which these bounds become tight for the symmetric Gaussian cognitive radio (CR) channel in the low interference gain regime are presented in [27]. The CR transmitter is assumed to use dirty paper coding while deriving the outer bounds. The capacity of the CR channel in the low interference scenario is known when the CR employs the “polite” approach by devoting some portion of its power to transmitting the primary user’s (PU’s) message. However, this approach does not guarantee any quality of service for the CR users. Hence, the focus is on the scenario in which the CR takes the “rude” approach, does not relay the PU’s message and tries to maximize its own rates only. It is shown that when both the CR and the PU operate in the low interference gain regime, treating interference as additive noise at the PU receiver and doing dirty paper coding at the CR is nearly optimal.
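The two decoding conditions for the half duplex relay, nR ≤ n1C(Hr) at the relay and nR ≤ n1C(Hs) + (n - n1)C(Hc) at the destination, can be explored with a small Python sketch. It takes the three capacities as plain numbers and searches a grid of listening fractions f = n1/n for the largest rate satisfying both conditions. The grid search and the function name `best_relay_rate` are illustrative choices, not the method of [26].

```python
def best_relay_rate(c_r, c_s, c_c, steps=1000):
    """Largest R with R <= f*c_r and R <= f*c_s + (1 - f)*c_c over a grid of f.

    c_r : C(Hr), source-to-relay rate in bits per channel use
    c_s : C(Hs), source-to-destination rate in the listening phase
    c_c : C(Hc), rate to the destination in the collaborative phase
    Returns (rate, f) for the best grid point.
    """
    best_rate, best_f = 0.0, 0.0
    for k in range(steps + 1):
        f = k / steps
        relay_limit = f * c_r                      # relay must decode while listening
        dest_limit = f * c_s + (1 - f) * c_c       # destination combines both phases
        rate = min(relay_limit, dest_limit)
        if rate > best_rate:
            best_rate, best_f = rate, f
    return best_rate, best_f


# Example: a strong source-relay link lets the relay stop listening early.
rate, f = best_relay_rate(c_r=4.0, c_s=1.0, c_c=2.0)
```

In this example the two limits cross at f = 0.4, giving a rate of 1.6 bits per channel use, compared with 1.0 for the direct link alone; a stronger source-relay link shortens the listening phase.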
COMPARISONS
The channel model presented in the first section uses complex coding techniques to mitigate channel interference, which naturally results in higher throughput than that of the channel model of the second section. But there is one serious constraint in the first channel model: the information throughput of the cognitive user is highly dependent upon the distance between the primary transmitter and the cognitive transmitter. If this distance is large, the secondary transmitter spends considerable time in obtaining the primary
user’s message. After obtaining and decoding the message, the cognitive device dirty paper codes it and sends it to its receiver. This message transfer from the primary to the cognitive transmitter lowers the number of bits transmitted per second by the cognitive radio and hence reduces its data rates. The capacity of the two switch model is independent of the distance between the two transmitters, as the secondary transmitter refrains from sending data when it finds the primary user busy. Thus the benefit of the channel interference knowledge in the first channel model quickly disappears as the distance between the primary and secondary transmitters increases. Accurate estimation of the primary system’s message and of the interference channel coefficient, needed in the first channel model for dirty paper coding purposes, is itself a problem. Inaccurate estimation of these parameters will decrease the rates of the cognitive radio because the dirty paper code, based on the knowledge of the primary user’s message and the channel interference coefficient, will not be able to completely mitigate the primary user’s interference. At the same time, determination of the channel interference coefficient requires a handshaking protocol to be devised, which is a serious overhead and may result in poor performance.
On the other hand, the interference avoiding channel cannot overcome the hidden terminal problem. This problem naturally arises as the cognitive user would not be able to detect the presence of distant primary devices. The degraded signals from the primary users due to multipath fading and shadowing effects would further aggravate this problem. This requires the secondary systems to be equipped with extremely sensitive detectors, but very sensitive detectors have prohibitively long sensing times. Thus a protocol needs to be devised by virtue of which the sensed information is shared between the cognitive devices.
Finally, the role of the CR as a relay in the third channel model restricts the real usefulness of the concept of dynamic spectrum utilization. Nevertheless, limited yet significant gains in the data rates of the existing licensed user system can be obtained by restricting a CR device to relaying the PU’s message only.
REFERENCES
1. Federal Communications Commission Spectrum Policy Task Force, “Report of the Spectrum Efficiency Working Group,” Technical Report 02-135, Nov. 2002.
2. Shared Spectrum Company, “Comprehensive Spectrum occupancy measurements over six different locations,” Aug. 2005.
3. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive radio channels,” IEEE Transactions on Information Theory, vol. 52, no. 5, pp. 1813-1827, May 2006.
4. T. Han and K. Kobayashi, “A new achievable rate region for the interference channel,” IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 49-60, 1981.
5. S. I. Gel’fand and M. S. Pinsker, “Coding for channel with random parameters,” Problems of Control and Information Theory, vol. 9, no. 1, pp. 19-31, 1980.
6. F. G. Awan and M. F. Hanif, “A Unified View of Information-Theoretic Aspects of Cognitive Radio,” in Proc. International Conference on Information Technology: New Generations, pp. 327-331, April 2008.
7. S. Srinivasa and S. A. Jafar, “The throughput potential of cognitive radio - a theoretical perspective,” IEEE Communications Magazine, vol. 45, no. 5, pp. 73-79, 2007.
8. A. Jovicic and P. Viswanath, “Cognitive radio: An information theoretic perspective,” 2006 IEEE International Symposium on Information Theory, July 2006.
9. M. H. M. Costa, “Writing on dirty paper,” IEEE Transactions on Information Theory, vol. 29, no. 3, pp. 439-441, May 1983.
10. Joseph Mitola, “Cognitive Radio: An Integrated Agent Architecture for Software Defined Radio,” PhD Dissertation, KTH, Stockholm, Sweden, December 2000.
11. Paul J. Kolodzy, “Cognitive Radio Fundamentals,” SDR Forum, Singapore, April 2005.
12. C. T. K. Ng and A. Goldsmith, “Capacity gain from transmitter and receiver cooperation,” in Proc. IEEE Int. Symp. Inf. Theory, Sept. 2005.
13. N. Devroye, P. Mitran, and V. Tarokh, “Cognitive decomposition of wireless networks,” in Proceedings of CROWNCOM, Mar. 2006.
20
Information and Coding Theory in Computer Science
14. Carleial, “Interference channels,” IEEE Trans. Inf. Theory, vol. IT-24, no. 1, pp. 60-70, Jan. 1978. 15. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive radio channels,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 18131827, May 2006. 16. W. Wu, S. Vishwanath, and A. Arapostathis, “On the capacity of the interference channel with degraded message sets,” IEEE Trans. Inf. Theory, June 2006. 17. H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of the Gaussian MIMO broadcast channel,” IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 3936-3964, Sept. 2006. 18. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive networks,” in 2005 IEEE International Symposium on Information Theory, Sept. 2005. 19. E. C. van der Meulen, “Three-terminal communication channels,” Adv. Appl. Prob., vol. 3, pp. 120-154, 1971. 20. I.Maric, R. Yates, and G. Kramer, “The strong interference channel with unidirectional cooperation,” in Information Theory and Applications ITA Inaugural Workshop, Feb. 2006. 21. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive radio channels,” in 39th Annual Conf. on Information Sciences and Systems (CISS), Mar. 2005. 22. S. Jafar and S. Srinivasa, “Capacity limits of cognitive radio with distributed dynamic spectral activity,” in Proc. of ICC, June 2006. 23. S. Srinivasa and S. A. Jafar, “The throughput potential of cognitive radio: A theoretical perspective,” in Fortieth Asilomar Conference on Signals, Systems and Computers, 2006., Oct. 2006. 24. P. Cheng, G. Yu, Z. Zhang, H.-H. Chen, and P. Qiu, “On the achievable rate region of gaussian cognitive multiple access channel,” IEEE Communications Letters, vol. 11, no.5, pp. 384-386, May. 2007. 25. S. A. Jafar, “Capacity with causal and non-causal side information-a unified view,” IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5468-5474, Dec. 2006. 26. P. Mitran, H. Ochiai, and V. 
Tarokh, “Space-time diversity enhancements using collaborative communication,” IEEE Transactions on Information Theory, vol. 51, no.6, pp. 2041-2057, June 2005.
Information Theory of Cognitive Radio System
21
27. F. G. Awan, N. M. Sheikh and F. H. Muhammad, “Outer Bounds for the Symmetric Gaussian Cognitive Radio Channel with DPC Encoded Cognitive Transmitter,” in Proc. The World Congress on Engineering 2009 (WCE 2009) by the International Association of Engineers (IAENG), London, UK, July 2009.
Chapter 2
INFORMATION THEORY AND ENTROPIES FOR QUANTIZED OPTICAL WAVES IN COMPLEX TIME-VARYING MEDIA
Jeong Ryeol Choi Department of Radiologic Technology, Daegu Health College, Yeongsong-ro 15, Bukgu, Daegu 702-722, Republic of Korea
INTRODUCTION
An important physical intuition that led to the Copenhagen interpretation of quantum mechanics is the Heisenberg uncertainty relation (HUR), which is a consequence of the noncommutativity between two conjugate observables. Our ability to observe is intrinsically limited by the HUR, which quantifies the amount of inevitable and uncontrollable disturbance on measurements (Ozawa, 2004).
Citation: Jeong Ryeol Choi, “Information Theory and Entropies for Quantized Optical Waves in Complex Time-Varying Media”, Open Systems, Entanglement and Quantum Optics, 2013, http://dx.doi.org/10.5772/56529. Copyright: © 2013 The Author(s) and IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Though the HUR is one of the most fundamental results of quantum mechanics as a whole, some drawbacks concerning its quantitative formulation have been reported. Being the expectation value of the commutator between two arbitrary noncommuting operators, the bound of the HUR is not a fixed lower bound but varies depending on the quantum state (Deutsch, 1983). Moreover, in some cases, the ordinary measure of uncertainty, i.e., the variance of canonical variables, based on the Heisenberg-type formulation is divergent (Abe et al., 2002). These shortcomings are highly nontrivial issues in the context of information sciences. Therefore, the theory of informational entropy has been proposed as an alternative, optimal measure of uncertainty. The adequacy of the entropic uncertainty relations (EURs) as an uncertainty measure is owing to the fact that they regard only the probabilities of the different outcomes of a measurement, whereas the HUR regards the variances of the measured values themselves (Werner, 2004). According to Khinchin's axioms (Ash, 1990) for the requirements of common information measures, information measures should depend exclusively on a probability distribution (Pennini & Plastino, 2007). Thanks to active research and technological progress associated with quantum information theory (Nielsen & Chuang, 2000; Choi et al., 2011), the entropic uncertainty band has now become a new concept in quantum physics. The information theory proposed by Shannon (Shannon, 1948a; Shannon, 1948b) is important not only as a source of information-theoretic uncertainty measures in quantum physics but also in other areas such as signal and/or image processing. The essential unity of the overall statistical information for a system can be demonstrated from the Shannon information, enabling us to know how information can be quantified with absolute precision. Another good measure of uncertainty or randomness is the Fisher information (Fisher, 1925), which appears as the basic ingredient in bounding entropy production. The Fisher information is a measure of accuracy in statistical theory and is useful for estimating the ultimate limits of quantum measurements. Recently, quantum information theory, along with fundamental quantum optics, has aroused great interest due to its potential applicability in three subfields: quantum computation, quantum communication, and quantum cryptography. Information theory has contributed to the development of modern quantum computation (Nielsen & Chuang, 2000) and has become a cornerstone of quantum mechanics. A remarkable ability of
quantum computers is that they can carry out certain computational tasks exponentially faster than classical computers by utilizing entanglement and the superposition principle. Stimulated by these recent trends, this chapter is devoted to the study of information theory for optical waves in complex time-varying media, with emphasis on the quantal information measures and informational entropies. Information-theoretic uncertainty relations and the information measures of Shannon and Fisher will be treated. The EUR of the system will also be treated, quantifying its physically allowed minimum value using the invariant operator theory established by Lewis and Riesenfeld (Lewis, 1967; Lewis & Riesenfeld, 1969). Invariant operator theory is crucial for studying the quantum properties of complicated time-varying systems, since it, in general, gives exact quantum solutions for a system described by a time-dependent Hamiltonian as long as its classical counterpart solutions are known.
QUANTUM OPTICAL WAVES IN TIME-VARYING MEDIA
Let us consider optical waves propagating through a linear medium that has time-dependent electromagnetic parameters. The electromagnetic properties of the medium are in principle determined by three electromagnetic parameters: electric permittivity ϵ, magnetic permeability µ, and conductivity σ. If one or more of these parameters vary with time, the medium is designated as a time-varying one. The Coulomb gauge will be taken for convenience, under the assumption that the medium has no net charge distribution. Then the scalar potential vanishes and, consequently, the vector potential is the only potential that needs to be considered when we develop the quantum theory of electromagnetic wave phenomena. In this regard, the quantum properties of optical waves in time-varying media are described in detail in Refs. (Choi & Yeon, 2005; Choi, 2012; Choi et al, 2012), and they will be briefly surveyed in this section as a preliminary step for the study of information theory. According to the method of separation of variables, it is favorable to put the vector potential in the form
$$\mathbf{A}(\mathbf{r},t) = \sum_l \mathbf{u}_l(\mathbf{r})\,q_l(t). \tag{1}$$
Then, considering the fact that the fields and current density obey the relations D = ϵ(t)E, B = µ(t)H, and J = σ(t)E in linear media, we derive
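the equation of motion for ql from Maxwell's equations as (Choi, 2012; Choi, 2010a; Pedrosa & Rosas, 2009)
$$\ddot{q}_l + \frac{\sigma(t)+\dot{\epsilon}(t)}{\epsilon(t)}\,\dot{q}_l + \omega_l^2(t)\,q_l = 0. \tag{2}$$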
Here, the angular frequency (natural frequency) is given by ωl(t) = c(t)kl, where c(t) is the speed of light in the medium and kl (= |kl|) is the wave number. Because the electromagnetic parameters vary with time, c(t) is time dependent, i.e., $c(t) = 1/\sqrt{\mu(t)\epsilon(t)}$. However, kl is constant since it is not affected by the time variation of the parameters. The form of the mode function ul(r) depends on the geometrical boundary condition of the medium (Choi & Yeon, 2005). For example, it is given by $\mathbf{u}_{l\nu}(\mathbf{r}) = V^{-1/2}\hat{\varepsilon}_{l\nu}\exp(\pm i\mathbf{k}_l\cdot\mathbf{r})$ (ν = 1, 2) for fields propagating under the periodic boundary condition, where V is the volume of the space and $\hat{\varepsilon}_{l\nu}$ is a unit vector in the direction of polarization designated by ν. From Hamilton's equations of motion, $\dot{q}_l = \partial H_l/\partial p_l$ and $\dot{p}_l = -\partial H_l/\partial q_l$, the classical Hamiltonian that gives Eq. (2) can be easily established. Then, by converting the canonical variables ql and pl into quantum operators $\hat{q}_l$ and $\hat{p}_l$ in the resultant classical Hamiltonian, we have the quantum Hamiltonian (Choi et al., 2012)
(3)
where b(t) is an arbitrary time function, and
(4)
(5)
The complete Hamiltonian is obtained by summing all individual Hamiltonians: $\hat{H} = \sum_l \hat{H}_l$. From now on, let us treat the wave of a particular mode and drop the subscript l for convenience. It is well known that quantum problems of optical waves in nonstationary media are described in terms of classical solutions of the system. Some researchers use real classical solutions (Choi, 2012; Pedrosa & Rosas, 2009) and others use imaginary solutions (Angelow & Trifonov, 2010; Malkin et al., 1970). In this chapter, real solutions of the classical
equation of motion for q will be considered. Since Eq. (2) is a second-order differential equation, there are two linearly independent classical solutions. Let us denote them as s1(t) and s2(t), respectively. Then, we can define a Wronskian of the form
(6)
This will be used later, considering only the case Ω > 0 for convenience. When we study the quantum problem of a system that is described by a time-dependent Hamiltonian such as Eq. (3), it is very convenient to introduce an invariant operator of the system. This idea (the invariant operator method) was first devised by Lewis and Riesenfeld (Lewis, 1967; Lewis & Riesenfeld, 1969) for a time-dependent harmonic oscillator, as mentioned in the introduction, and has now become one of the powerful tools for investigating the quantum characteristics of time-dependent Hamiltonian systems. By solving the Liouville-von Neumann equation of the form
$$\frac{d\hat{I}}{dt} = \frac{\partial\hat{I}}{\partial t} + \frac{1}{i\hbar}[\hat{I},\hat{H}] = 0, \tag{7}$$
we obtain the invariant operator of the system as (Choi, 2004)
$$\hat{I} = \hbar\Omega\left(\hat{a}^{\dagger}\hat{a} + \tfrac{1}{2}\right), \tag{8}$$
where Ω is chosen to be positive from Eq. (6), and $\hat{a}$ and $\hat{a}^{\dagger}$ are annihilation and creation operators, respectively, that are given by
(9)
(10)
with
(11)
Since the system is somewhat complicated, let us develop our theory with b(t) = 0 from now on. Then, Eq. (5) just reduces to . Since Eq. (8) is very similar in form to the familiar Hamiltonian of the simple harmonic oscillator, we can solve its eigenvalue equation via the well-known
conventional method. The zero-point eigenstate ϕ0(q, t) is obtained from $\hat{a}\,\phi_0(q,t) = 0$. Once ϕ0(q, t) is obtained, the nth eigenstates are also derived by acting with $\hat{a}^{\dagger}$ on ϕ0(q, t) n times. Hence we finally have (Choi, 2012)
(12) where
and Hn are Hermite polynomials.
According to the Lewis-Riesenfeld invariant theory, the wave functions that satisfy the Schrödinger equation are given in terms of ϕn(q, t):
$$\psi_n(q,t) = \phi_n(q,t)\exp[i\theta_n(t)], \tag{13}$$
where θn(t) are time-dependent phases of the wave functions. By substituting Eq. (13) with Eq. (3) into the Schrödinger equation, we derive the phases to be θn(t) = −(n + 1/2)η(t), where (Choi, 2012)
(14)
The probability densities in both q and p spaces are given by the squared moduli of the wave functions, i.e., $\rho_n(q) = |\psi_n(q,t)|^2$ and $\tilde{\rho}_n(p) = |\tilde{\psi}_n(p,t)|^2$, respectively. From Eq. (13) and its Fourier component, we see that
(15)
where
(16)
(17) The wave functions and the probability densities derived here will be used in subsequent sections in order to develop the information theory of the system.
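As a numerical aside on the two classical solutions s1(t) and s2(t) used above, the following is a minimal sketch, not the chapter's own calculation. It integrates an equation of the form of Eq. (2) with a fourth-order Runge-Kutta scheme and checks Abel's identity: for q'' + γ(t)q' + ω²(t)q = 0, the weighted Wronskian exp[∫γ dt](s1 s2' − s1' s2) is constant in time. The damping coefficient `gamma`, the frequency profile `omega_sq`, and all function names are hypothetical choices made for illustration, and the exact normalization of Ω in Eq. (6) is not reproduced here.

```python
import math

def gamma(t):
    # Hypothetical damping coefficient, standing in for
    # [sigma(t) + d(epsilon)/dt] / epsilon(t); constant for simplicity.
    return 0.1

def omega_sq(t):
    # Hypothetical time-dependent squared natural frequency.
    return 25.0 * (1.0 + 0.1 * math.cos(t))

def rhs(t, q, v):
    # q'' + gamma(t) q' + omega^2(t) q = 0 as a first-order system.
    return v, -gamma(t) * v - omega_sq(t) * q

def rk4_step(t, q, v, dt):
    # One classical fourth-order Runge-Kutta step.
    k1q, k1v = rhs(t, q, v)
    k2q, k2v = rhs(t + dt / 2, q + dt / 2 * k1q, v + dt / 2 * k1v)
    k3q, k3v = rhs(t + dt / 2, q + dt / 2 * k2q, v + dt / 2 * k2v)
    k4q, k4v = rhs(t + dt, q + dt * k3q, v + dt * k3v)
    q_new = q + dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
    v_new = v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return q_new, v_new

def weighted_wronskian(t_end=5.0, dt=1e-3):
    """Return exp(Lambda) * (s1 s2' - s1' s2) at t = 0 and at t = t_end."""
    s1, v1 = 1.0, 0.0   # first independent solution
    s2, v2 = 0.0, 1.0   # second independent solution
    start = s1 * v2 - v1 * s2
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        s1, v1 = rk4_step(t, s1, v1, dt)
        s2, v2 = rk4_step(t, s2, v2, dt)
        t += dt
    lam = gamma(0.0) * t  # Lambda(t) = integral of gamma; exact for constant gamma
    return start, math.exp(lam) * (s1 * v2 - v1 * s2)

w_start, w_end = weighted_wronskian()
print(w_start, w_end)
```

The two printed values should agree to within the integration error, which illustrates why a suitably weighted Wronskian of s1 and s2 can serve as a time-independent constant of the classical system.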
INFORMATION MEASURES FOR THERMALIZED QUANTUM OPTICAL FIELDS
Information about a physical system can be obtained from the statistical analysis of the results of a measurement performed on it. There are two important information measures. One is the Shannon information and the other is the Fisher information. The Shannon information is also called the Wehrl entropy in some of the literature (Wehrl, 1979; Pennini & Plastino, 2004) and is suitable for measuring uncertainties relevant to both quantum and thermal effects, whereas the quantum effect is overlooked in the concept of ordinary entropy. The Fisher information, which is also well known in the field of information theory, provides the extreme physical information through a potential variational principle. To treat these information measures, we start from the establishment of the density operator for the electromagnetic field equilibrated with its environment of temperature T. The density operator of the system obeys the Liouville-von Neumann equation (Choi et al., 2011)
$$\frac{\partial\hat{\rho}}{\partial t} + \frac{1}{i\hbar}[\hat{\rho},\hat{H}] = 0. \tag{18}$$
Considering the fact that the invariant operator given in Eq. (8) is also established via the Liouville-von Neumann equation, we can easily construct the density operator as a function of the invariant operator. This implies that the Hamiltonian in the density operator of the simple harmonic oscillator should be replaced by a function of the invariant operator, $y(0)\hat{I}$, where y(0) is inserted for the purpose of dimensional consideration. Thus we have the density operator in the form
$$\hat{\rho} = \frac{1}{Z}\exp\left[-\beta\hbar W\left(\hat{a}^{\dagger}\hat{a} + \tfrac{1}{2}\right)\right], \tag{19}$$
where W = y(0)Ω, Z is the partition function, β = 1/(kbT), and kb is Boltzmann's constant. In the Fock state representation, the above equation can be expanded as
$$\hat{\rho} = \frac{1}{Z}\sum_{n=0}^{\infty}|\phi_n\rangle\, e^{-\beta\hbar W(n+1/2)}\,\langle\phi_n|, \tag{20}$$
while the partition function becomes
$$Z = \sum_{n=0}^{\infty} e^{-\beta\hbar W(n+1/2)} = \frac{1}{2\sinh(\beta\hbar W/2)}. \tag{21}$$
If we consider that the coherent state is the most classical-like quantum state, a semiclassical distribution function associated with the coherent state may be useful for the description of information measures. As is well known, the coherent state |α⟩ is obtained by solving the eigenvalue equation of $\hat{a}$:
$$\hat{a}|\alpha\rangle = \alpha|\alpha\rangle. \tag{22}$$
Now we introduce the semiclassical distribution function related to the density operator via the equation (Anderson & Halliwell, 1993)
$$\mu(\alpha) = \langle\alpha|\hat{\rho}|\alpha\rangle. \tag{23}$$
This is sometimes referred to as the Husimi distribution function (Husimi, 1940) and appears frequently in studies relevant to the Wigner distribution function for thermalized quantum systems. The Wigner distribution function is regarded as a quasi-distribution function because some parts of it are not positive but negative. In spite of the ambiguity in interpreting this negative value as a distribution function, the Wigner distribution function meets all requirements of both quantum and statistical mechanics, i.e., it gives correct expectation values of quantum mechanical observables. In fact, the Husimi distribution function can also be constructed from the Wigner distribution function through a mathematical procedure known as “Gaussian smearing” (Anderson & Halliwell, 1993). Since this smearing washes out the negative part, the negativity problem is resolved by Husimi's work. It is interesting to note, however, that a new drawback arises in that case: the probabilities of different samplings of q and p, relevant to the Husimi distribution function, cannot be represented by mutually exclusive states, due to the lack of orthogonality of the coherent states used, for example, in Eq. (23) (Anderson & Halliwell, 1993; Nasiri, 2005). This weak point is, however, almost fully negligible in the actual study of the system, allowing us to use the Husimi distribution function as a powerful means in the realm of semiclassical statistical physics. Notice that the coherent state can be rewritten in the form
$$|\alpha\rangle = \hat{D}(\alpha)|\phi_0\rangle, \tag{24}$$
where $\hat{D}(\alpha)$ is the displacement operator of the form $\hat{D}(\alpha) = \exp(\alpha\hat{a}^{\dagger} - \alpha^{*}\hat{a})$. A little algebra leads to
$$|\alpha\rangle = e^{-|\alpha|^2/2}\sum_{n=0}^{\infty}\frac{\alpha^n}{\sqrt{n!}}\,|\phi_n\rangle. \tag{25}$$
Hence, the coherent state is expanded in terms of Fock state wave functions. Now using Eqs. (20) and (25), we can evaluate Eq. (23) to be
$$\mu(\alpha) = \left(1 - e^{-\beta\hbar W}\right)\exp\left[-\left(1 - e^{-\beta\hbar W}\right)|\alpha|^2\right]. \tag{26}$$
Here, we used a well-known relation in photon statistics, which is
$$|\langle\phi_n|\alpha\rangle|^2 = e^{-|\alpha|^2}\frac{|\alpha|^{2n}}{n!}. \tag{27}$$
As you can see, the Husimi distribution function is strongly related to the coherent state, and it provides the concepts necessary for establishing both the Shannon and the Fisher information. If we consider Eqs. (9) and (22), α (with b(t) = 0) can be written as
(28)
Hence there are innumerable α-samples that correspond to different pairs (q, p), which need to be examined for measurement. A natural measure of uncertainty in information theory is the Shannon information, as mentioned earlier. The Shannon information is defined as (Anderson & Halliwell, 1993)
$$I_S = -\int\frac{d^2\alpha}{\pi}\,\mu(\alpha)\ln\mu(\alpha), \tag{29}$$
where d²α = dRe(α) dIm(α). With the use of Eq. (26), we easily derive it:
$$I_S = 1 - \ln\left(1 - e^{-\beta\hbar W}\right). \tag{30}$$
This is independent of time and, in the limiting case of fields propagating in time-independent media that have no conductivity, W becomes the natural frequency of light, so that this formula corresponds to that of the simple harmonic oscillator (Pennini & Plastino, 2004). It approaches IS ≃ 1 + ln[kbT/(ħW)] at sufficiently high temperature, where the thermal fluctuation dominates and, consequently, the quantum fluctuation can be neglected. On the other hand, even as T decreases toward absolute zero, the Shannon information never falls below unity (IS ≥ 1). This condition is known as the Lieb-Wehrl condition because it was conjectured by Wehrl (Wehrl, 1979) and proved by Lieb (Lieb, 1978). From this we can see that IS has a lower bound that is connected with pure quantum effects. Therefore, while the usual entropy is suitable as a measure of uncertainty originating only from thermal fluctuation, IS serves as a more universal uncertainty measure covering both thermal and quantum regimes (Anderson & Halliwell, 1993). Other potent measures of information are the Fisher information measures, which enable us to assess intrinsic accuracy in statistical estimation theory. Let us consider a system described by the stochastic variable α = α(x) with a physical parameter x. When we describe a measurement of α in order to infer x from the measurement, it is useful to introduce the coherent-state-related Fisher information that is expressed in the form
(31)
In fact, there are many different scenarios of this information depending on the choice of x. For a more general definition of the Fisher information, see Ref. (Pennini & Plastino, 2004). If we take the Husimi distribution function µ(α) as the distribution and x = β, Fisher's information measure can be written as (Pennini & Plastino, 2004)
$$I_{F,\beta} = \int\frac{d^2\alpha}{\pi}\,\mu(\alpha)\left[\frac{\partial\ln\mu(\alpha)}{\partial\beta}\right]^2. \tag{32}$$
Since β is the parameter to be estimated here, IF,β reflects the change of µ(α) according to the variation of temperature. A straightforward calculation yields
$$I_{F,\beta} = \left(\frac{\hbar W}{e^{\beta\hbar W} - 1}\right)^2. \tag{33}$$
This is independent of time and, of course, agrees, in the limit of the simple harmonic oscillator, with the well-known formula of Pennini and Plastino (Pennini & Plastino, 2004).
Hence, the change of the electromagnetic parameters with time does not affect the value of IF,β. IF,β reduces to zero at absolute zero temperature (T → 0), in agreement with the third law of thermodynamics (Pennini & Plastino, 2007). Another typical form of the Fisher information worth considering is the one obtained with the choice of the Husimi distribution function µ(α) and x = {q, p} (Pennini et al, 1998):
(34)
where σqq,α and σpp,α are the variances of q and p in the Glauber coherent state, respectively. Notice that σqq,α and σpp,α are inserted here in order to account for the weights of the two independent terms in Eq. (34). As you can see, this information is jointly determined by means of the canonical variables q and p. To evaluate this, we need
(35)
(36)
It may be favorable to represent $\hat{q}$ and $\hat{p}$ in terms of $\hat{a}$ and $\hat{a}^{\dagger}$ at this stage. They are easily obtained from the inverse representation of Eqs. (9) and (10) to be
(37)
(38)
Thus with the use of these, Eqs. (35) and (36) become
(39)
(40)
A little evaluation after substituting these quantities into Eq. (34) leads to
(41)
Notice that this varies with time. When the time dependence of all electromagnetic parameters vanishes and σ → 0, this reduces to the simple harmonic oscillator limit, where the natural frequency ω is constant, which exactly agrees with the result of Pennini and Plastino (Pennini & Plastino, 2004).
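The thermal results of this section can be cross-checked numerically. The sketch below is an illustration under stated assumptions rather than the chapter's own computation: it assumes the thermal Husimi function µ(α) = (1 − e^(−βħW)) exp[−(1 − e^(−βħW))|α|²], which follows from a thermal density operator of the form exp[−βħW(n + 1/2)]/Z combined with Poissonian coherent-state weights. Under that assumption the Shannon and Fisher measures have the closed forms IS = 1 − ln(1 − e^(−βħW)) and IF,β = [ħW/(e^(βħW) − 1)]². The function names, the variable x = βħW, and the integration grid are hypothetical choices.

```python
import math

def husimi_closed(r, x):
    """Assumed thermal Husimi function at |alpha|^2 = r, with x = beta*hbar*W."""
    c = 1.0 - math.exp(-x)
    return c * math.exp(-c * r)

def husimi_fock_sum(r, x, n_max=80):
    """Same quantity from a truncated Fock-state sum with Poisson weights."""
    z = 1.0 / (2.0 * math.sinh(x / 2.0))      # partition function
    total = 0.0
    poisson = math.exp(-r)                     # e^{-r} r^n / n! at n = 0
    for n in range(n_max + 1):
        total += math.exp(-x * (n + 0.5)) * poisson
        poisson *= r / (n + 1)
    return total / z

def integrate(f, upper, n=20000):
    """Trapezoidal rule on [0, upper]."""
    h = upper / n
    s = 0.5 * (f(0.0) + f(upper))
    for i in range(1, n):
        s += f(i * h)
    return s * h

def shannon(x):
    # For a radially symmetric integrand, d^2(alpha)/pi reduces to dr, r = |alpha|^2.
    c = 1.0 - math.exp(-x)
    def integrand(r):
        m = husimi_closed(r, x)
        return -m * math.log(m)
    return integrate(integrand, 50.0 / c)

def fisher_beta(x, hbar_w=1.0, d_beta=1e-5):
    # Score d(ln mu)/d(beta) by central finite difference, then average its square.
    beta = x / hbar_w
    c = 1.0 - math.exp(-x)
    def integrand(r):
        up = math.log(husimi_closed(r, (beta + d_beta) * hbar_w))
        dn = math.log(husimi_closed(r, (beta - d_beta) * hbar_w))
        score = (up - dn) / (2.0 * d_beta)
        return husimi_closed(r, x) * score ** 2
    return integrate(integrand, 50.0 / c)

x = 1.0  # beta * hbar * W
print(husimi_fock_sum(2.0, x), husimi_closed(2.0, x))
print(shannon(x), 1.0 - math.log(1.0 - math.exp(-x)))
print(fisher_beta(x), (1.0 / (math.exp(x) - 1.0)) ** 2)
```

Each printed pair compares a numerical estimate with the corresponding closed form. Neither quantity depends on time in this setting, because W enters only through the product βħW.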
HUSIMI UNCERTAINTIES AND UNCERTAINTY RELATIONS
The uncertainty principle is one of the intrinsic features of quantum mechanics that distinguish it from classical mechanics. Aside from the conventional procedure for obtaining an uncertainty relation, it may be instructive to compute a somewhat different uncertainty relation for optical waves through a complete mathematical description of the Husimi distribution function. Bearing this in mind, let us see the uncertainties of the canonical variables, associated with information measures, and their corresponding uncertainty relation. The definitions of uncertainties suitable for this purpose are
(42)
(43)
(44)
where $\langle\hat{O}\rangle_\mu$, with $\hat{O}$ an arbitrary operator, is the expectation value relevant to the Husimi distribution function and can be evaluated from
(45)
While the expectation values of $\hat{q}^l$ and $\hat{p}^l$ vanish for l = 1, the rigorous algebra for higher orders gives
(46)
(47)
(48)
where
(49)
Thus we readily have
(50)
(51)
Like other types of uncertainties in physics, σµ,qq and σµ,pp are reciprocally related, i.e., if one of them becomes large the other becomes small, and there is nothing whatever one can do about it. We can represent the uncertainty product σµ and the generalized uncertainty product Σµ in the form
(52)
(53)
Through the use of Eqs. (50) and (51), we get
(54)
(55)
Notice that σµ varies with time, while Σµ does not and has a simpler form. The relationship between σµ and the usual thermal uncertainty product σ obtained using the method of thermofield dynamics (Choi, 2010b; Leplae et al., 1974) is given by σµ = r(β)σ where .
ENTROPIES AND ENTROPIC UNCERTAINTY RELATIONS
The HUR is employed in many statistical and physical analyses of optical data measured in experiments. It is a mathematical outcome of the nonlocal Fourier analysis (Bohr, 1928), and we can simply represent it by multiplying the standard deviations of q and p together. According to the Heisenberg uncertainty principle, simultaneous prediction of both q and p from measurements with a precision beyond certain limits imposed by quantum mechanics is impossible. It is plausible to use the HUR as a measure of the
spread when the distribution curve involves only a single hump, as in the case of a Gaussian. However, the HUR is known to be inadequate when the distribution of the statistical data is somewhat complicated or reveals two or more humps (Bialynicki-Birula, 1984; Majernik & Richterek, 1997). For this reason, the EUR was suggested as an alternative to the HUR by Bialynicki-Birula and Mycielski (Bialynicki-Birula & Mycielski, 1975). To study the EUR, we start from the entropies of q and p associated with Shannon's information theory:
$$S(\rho_n) = -\int\rho_n(q)\ln\rho_n(q)\,dq, \tag{56}$$
$$S(\tilde{\rho}_n) = -\int\tilde{\rho}_n(p)\ln\tilde{\rho}_n(p)\,dp. \tag{57}$$
By executing some algebra after inserting Eqs. (15) and (16) into the above equations, we get
(58)
(59)
where E(Hn) are the entropies of Hermite polynomials of the form (Dehesa et al, 2001)
(60)
By adding Eqs. (58) and (59) together,
$$U_E = S(\rho_n) + S(\tilde{\rho}_n), \tag{61}$$
we obtain the alternative uncertainty relation, the so-called EUR, such that
(62)
The EUR is always larger than or at least equal to a minimum value given by the BBM (Bialynicki-Birula and Mycielski) inequality: UE ≥ 1 + ln π ≃ 2.14473 (Haldar & Chakrabarti, 2012). Of course, Eq. (62) also satisfies this inequality. The BBM inequality gives a lower bound of the uncertainty relation, and the equality holds for the simple harmonic oscillation of fields with n = 0. The EUR with evolution in time, as well as the information entropy itself, is a potent tool for demonstrating the effects of the time dependence of electromagnetic parameters on the evolution of the system and, consequently, it deserves special interest. The general form of the EUR can also be extended not only to other pairs of observables, such as photon number and phase, but also to higher-dimensional systems, even up to infinite dimensions.
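The BBM bound can be illustrated numerically in the simple harmonic oscillator limit. The sketch below is a hedged example, not the time-dependent result of Eq. (62): it assumes a stationary oscillator with ħ = 1 and unit frequency, for which the q-space and p-space densities of the nth Fock state share the same functional form, so the entropic sum is twice the q-space entropy. The function names and grid parameters are hypothetical.

```python
import math

def hermite(n, y):
    """Physicists' Hermite polynomial H_n(y) via the three-term recurrence."""
    h_prev, h = 1.0, 2.0 * y
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * y * h - 2.0 * k * h_prev
    return h

def density(n, y):
    """|phi_n(y)|^2 for the unit oscillator (hbar = 1, unit frequency)."""
    norm = (2.0 ** n) * math.factorial(n) * math.sqrt(math.pi)
    return hermite(n, y) ** 2 * math.exp(-y * y) / norm

def entropy(n, half_width=12.0, steps=24000):
    """Trapezoidal estimate of -integral(rho ln rho) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        rho = density(n, -half_width + i * h)
        term = -rho * math.log(rho) if rho > 0.0 else 0.0  # rho ln rho -> 0 at zeros
        total += term if 0 < i < steps else 0.5 * term
    return total * h

def entropic_sum(n):
    # The p-space density has the same functional form in these units,
    # so U_E = S_q + S_p = 2 * S_q.
    return 2.0 * entropy(n)

bbm_bound = 1.0 + math.log(math.pi)
for n in range(3):
    print(n, entropic_sum(n), bbm_bound)
```

Rescaling the oscillator width by a factor a adds ln a to the q-space entropy and subtracts ln a from the p-space entropy, so the sum computed here does not depend on that choice. In this sketch the n = 0 state saturates the bound, while the excited states lie above it.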
APPLICATION TO A SPECIAL SYSTEM
Applying the theory developed in the previous sections to a particular system may provide a better understanding of information theory for the system. Let us see the case that  and
(63)
where µ0 [= µ(0)], h, and ω0 are real constants and |h| ≪ 1. Then, the classical solutions of Eq. (2) are given by
(64)
(65)
where s0 is a real constant, Ceν and Seν are the Mathieu functions of the cosine-elliptic and sine-elliptic types, respectively, and . Figure 1 shows the information measure IF,{q,p} for this system, plotted as a function of time. Whereas IS and IF,β do not vary with time, IF,{q,p} oscillates as time goes by.
Figure 1. The time evolution of IF,{q,p}. The values of (k, h) used here are (1, 0.1) for solid red line, (3, 0.1) for long dashed blue line, and (3, 0.2) for short dashed green line. Other parameters are taken to be ϵ0 = 1, µ0 = 1, β = 1, ħ = 1, ω0 = 5, and s0 = 1.
In the case of h → 0, the natural frequency in Eq. (2) becomes constant and W → ω. Then, Eqs. (64) and (65) become s1 = s0 cos ωt and s2 = s0 sin ωt, respectively. We can confirm in this situation that our principal results, Eqs. (30), (33), (41), (54), and (62), reduce to those of the wave described by the simple harmonic oscillator, as expected.
SUMMARY AND CONCLUSION
Information theories of optical waves traveling through arbitrary time-varying media have been studied on the basis of invariant operator theory. The time-dependent Hamiltonian that gives the classical equation of motion for the time function q(t) of the vector potential was constructed. The quadratic invariant operator was obtained from the Liouville-von Neumann equation given in Eq. (7), and it was used as a basic tool for developing the information theory of the system. The eigenstates ϕn(q, t) of the invariant operator were identified using the annihilation and creation operators. From these eigenstates, we can obtain the Schrödinger solutions, i.e., the wave functions ψn(q, t), since ψn(q, t) is given simply in terms of ϕn(q, t).
Figure 2. The Uncertainty product σµ (thick solid red line) together with σµ,qq (long dashed blue line) and σµ,pp (short dashed green line). The values of (k, h) used here are (1, 0.1) for (a) and (3, 0.1) for (b). Other parameters are taken to be ϵ0 = 1, µ0 = 1, β = 1, ħ = 1, ω0 = 5, and s0 = 1.
The semiclassical distribution function is the expectation value of the density operator in the coherent state, which is the most classical-like quantum state. From Eq. (30), we see that the Shannon information does not vary with time. However, Eq. (41) shows that the Fisher information IF,{q,p} varies with time. It is known that the localization of the density is determined in accordance with the Fisher information (Romera et al, 2005). For this reason, the Fisher measure is regarded as a local measure, while the Shannon information is a global information measure of the spreading of the density. Local information measures vary depending on various derivatives of the probability density, whereas global information measures follow Khinchin's axioms for information theory (Pennini & Plastino, 2007; Plastino & Casas, 2011).
Figure 3. The EUR UE (thick solid red line) together with S(ρn) (long dashed blue line) and S(ρ̃n) (short dashed green line). The values of (k, h) used here are (1, 0.1) for (a), (3, 0.1) for (b), and (3, 0.2) for (c). Other parameters are taken to be ϵ0 = 1, µ0 = 1, ħ = 1, ω0 = 5, n = 0, and s0 = 1.
Two kinds of uncertainty products relevant to the Husimi distribution function are considered: one is the usual uncertainty product σµ and the other
is the more generalized product Σµ defined in Eq. (53). While σµ varies as time goes by, Σµ is constant, and both have particular relations with those in the standard thermal state. Fock state representations of the Shannon entropies in q- and p-spaces are derived and given in Eqs. (58) and (59), respectively. The EUR, which is an alternative uncertainty relation, is obtained by adding these two entropies. The EUR is more advantageous than the HUR in the context of information theory. Information theory is not only important in the modern technology of quantum computing, cryptography, and communication; its scope now extends to a wide range of emerging fields that require rigorous data analysis, such as neural systems and the human brain. Further development of the theoretical and physical background for analyzing statistical data obtained from measurements, beyond the standard formulation, is necessary in order to promote the advance of such relevant sciences and technologies.
REFERENCES
1. Abe, S.; Martinez, S.; Pennini, F. & Plastino, A. (2002). The EUR for power-law wave packets. Phys. Lett. A, Vol. 295, Nos. 2-3, pp. 74-77.
2. Anderson, A. & Halliwell, J. J. (1993). Information-theoretic measure of uncertainty due to quantum and thermal fluctuations. Phys. Rev. D, Vol. 48, No. 6, pp. 2753-2765.
3. Angelow, A. K. & Trifonov, D. A. (2010). Dynamical invariants and Robertson-Schrödinger correlated states of electromagnetic field in nonstationary linear media. arXiv:quant-ph/1009.5593v1.
4. Ash, R. B. (1990). Information Theory. Dover Publications, New York, USA.
5. Bialynicki-Birula, I. (1984). Entropic uncertainty relations in quantum mechanics. In: L. Accardi and W. von Waldenfels (Editors), Quantum Probability and Applications, Lecture Notes in Mathematics 1136, Springer, Berlin.
6. Bialynicki-Birula, I. & Mycielski, J. (1975). Uncertainty relations for information entropy in wave mechanics. Commun. Math. Phys., Vol. 44, No. 2, pp. 129-132.
7. Bohr, N. (1928). Como Lectures. In: J. Kalckar (Editor), Niels Bohr Collected Works, Vol. 6, North-Holland Publishing, Amsterdam, 1985.
8. Choi, J. R. (2004). Coherent states of general time-dependent harmonic oscillator. Pramana-J. Phys., Vol. 62, No. 1, pp. 13-29.
9. Choi, J. R. (2010a). Interpreting quantum states of electromagnetic field in time-dependent linear media. Phys. Rev. A, Vol. 82, No. 5, pp. 055803(1-4).
10. Choi, J. R. (2010b). Thermofield dynamics for two-dimensional dissipative mesoscopic circuit coupled to a power source. J. Phys. Soc. Japan, Vol. 79, No. 4, pp. 044402(1-6).
11. Choi, J. R. (2012). Nonclassical properties of superpositions of coherent and squeezed states for electromagnetic fields in time-varying media. In: S. Lyagushyn (Editor), Quantum Optics and Laser Experiments, pp. 25-48, InTech, Rijeka.
12. Choi, J. R.; Kim, M.-S.; Kim, D.; Maamache, M.; Menouar, S. & Nahm, I. H. (2011). Information theories for time-dependent harmonic oscillator. Ann. Phys. (N.Y.), Vol. 326, No. 6, pp. 1381-1393.
13. Choi, J. R. & Yeon, K. H. (2005). Quantum properties of light in linear media with time-dependent parameters by Lewis-Riesenfeld invariant operator method. Int. J. Mod. Phys. B, Vol. 19, No. 14, pp. 2213-2224. 14. Choi, J. R.; Yeon, K. H.; Nahm, I. H.; Kweon, M. J.; Maamache, M. & Menouar, S. (2012). Wigner distribution function and nonclassical properties of schrodinger cat states for electromagnetic fields in timevarying media. In: N. Mebarki, J. Mimouni, N. Belaloui, and K. Ait Moussa (Editors), The 8th International Conference on Progress in Theoretical Physics, AIP Conf. Proc. Vol. 1444, pp. 227-232, American Institute of Physics, New York. 15. Dehesa, J. S.; Martinez-Finkelshtein, A. & Sanchez-Ruiz, J. (2001). Quantum information entropies and orthogonal polynomials. J. Comput. Appl. Math., Vol. 133, Nos. 1-2, pp. 23-46. 16. Deutsch, D. (1983). Uncertainty in quantum measurements. Phys. Rev. Lett., Vol. 50, No. 9, pp. 631-633. 17. Fisher, R. A. (1925). Theory of statistical estimation. Proc. Cambridge Philos. Soc., Vol. 22, No. 5, pp. 700-725. 18. Haldar, S. K. & Chakrabarti, B. (2012). Dynamical features of quantum information entropy of bosonic cloud in the tight trap. arXiv:cond-mat. quant-gas/1111.6732v5. 19. Husimi, K. (1940). Some formal properties of the density matrix. Proc. Phys. Math. Soc. Jpn., Vol. 22, No. 4, pp. 264-314. 20. Leplae, L.; Umezawa, H. & Mancini, F. (1974). Derivation and application of the boson method in superconductivity. Phys. Rep., Vol. 10, No. 4, pp. 151-272. 21. Lewis, H. R. Jr. (1967). Classical and quantum systems with timedependent harmonic-oscillator-type Hamiltonians. Phys. Rev. Lett., Vol. 18, No. 13, pp. 510-512. 22. Lewis, H. R. Jr. & Riesenfeld W. B. (1969). An exact quantum theory of the time-dependent harmonic oscillator and of a charged particle in a time-dependent electromagnetic field. J. Math. Phys., Vol. 10, No. 8, pp. 1458-1473. 23. Lieb, E. H. (1978). Proof of an entropy conjecture of Wehrl. Commun. Math. 
Phys., Vol. 62, No. 1, pp. 35-41. 24. Majernik, V. & Richterek, L. (1997). Entropic uncertainty relations. Eur. J. Phys., Vol. 18, No. 2, pp. 79-89.
25. Malkin, I. A.; Man’ko, V. I. & Trifonov, D. A. (1970). Coherent states and transition probabilities in a time-dependent electromagnetic field. Phys. Rev. D, Vol. 2, No. 8, pp. 1371-1384. 26. Nasiri, S. (2005). Distribution functions in light of the uncertainty principle. Iranian J. Sci. & Tech. A, Vol. 29, No. A2, pp. 259-265. 27. Nielsen, M. A. & Chuang, I. L. (2000). Quantum Computation and Quantum Information. Cambridge University Press, Cambridge, England. 28. Ozawa, M. (2004). Uncertainty relations for noise and disturbance in generalized quantum measurements. Ann. Phys., Vol. 311, No. 2, pp. 350-416. 29. Pedrosa, I. A. & Rosas, A. (2009). Electromagnetic field quantization in time-dependent linear media. Phys. Rev. Lett., Vol. 103, No. 1, pp. 010402(1-4). 30. Pennini, F. & Plastino, A. (2004). Heisenberg-Fisher thermal uncertainty measure. Phys. Rev. E, Vol. 69, No. 5, pp. 057101(1-4). 31. Pennini, F. & Plastino, A. (2007). Localization estimation and global vs. local information measures. Phys. Lett. A, Vol. 365, No. 4, pp. 263267. 32. Pennini, F.; Plastino, A. R. & Plastino, A. (1998) Renyi entropies and Fisher informations as measures of nonextensivity in a Tsallis setting. Physica A, Vol. 258, Nos. 3-4, pp. 446-457. 33. Plastino, A & Casas, A. (2011). New microscopic connections of thermodynamics. In: M. Tadashi (Editor), Thermodynamics, pp. 3-23, InTech, Rijeka. 34. Romera, E.; Sanchez-Moreno, P. & Dehesa, J. S. (2005). The Fisher information of single-particle systems with a central potential. Chem. Phys. Lett., Vol. 414, No. 4-6, pp. 468-472. 35. Shannon, C. E. (1948a). A mathematical theory of communication. Bell Syst. Tech., Vol. 27, No. 3, pp. 379-423. 36. Shannon, C. E. (1948b). A mathematical theory of communication II. Bell Syst. Tech., Vol. 27, No. 4, pp. 623-656. 37. Wehrl, A. (1979). On the relation between classical and quantummechanical entropy. Rep. Math. Phys., Vol. 16, No. 3, pp. 353-358. 38. Werner, R. F. (2004). 
The uncertainty relation for joint measurement of position and momentum. Quantum Inform. Comput., Vol. 4, Nos. 6-7, pp. 546-562.
Chapter 3
SOME INEQUALITIES IN INFORMATION THEORY USING TSALLIS ENTROPY
Litegebe Wondie and Satish Kumar Department of Mathematics, College of Natural and Computational Science, University of Gondar, Gondar, Ethiopia
ABSTRACT
We present a relation between Tsallis's entropy and the generalized Kerridge inaccuracy, called the generalized Shannon inequality, which is a well-known generalization in information theory, and then give its application in coding theory. The objective of the paper is to establish a noiseless coding theorem for the proposed mean code length in terms of a generalized information measure of order ξ.
Citation: Litegebe Wondie, Satish Kumar, “Some Inequalities in Information Theory Using Tsallis Entropy”, International Journal of Mathematics and Mathematical Sciences, vol. 2018, Article ID 2861612, 4 pages, 2018. https://doi.org/10.1155/2018/2861612. Copyright: © 2018 by Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
INTRODUCTION Throughout the paper N denotes the set of the natural numbers and for n ∈ N we set
(1)
where n = 2, 3, 4, . . . denote the set of all n-components complete and incomplete discrete probability distributions. For nonadditive measure of inaccuracy, denoted by
we define a as
If
then
(2)
reduces to nonadditive entropy.
(3)
Entropy (3) was first characterized by Havrda and Charvát [1]. Later on, Daróczy [2] and Behara and Nath [3] studied this entropy. Vajda [4] also characterized this entropy for finite discrete generalized probability distributions. Sharma and Mittal [5] generalized this measure, which is known as the entropy of order α and type β. Pereira and Gur Dial [6] and Gur Dial [7] also studied the Sharma and Mittal entropy for a generalization of the Shannon inequality and gave its applications in coding theory. Kumar and Choudhary [8] also gave its application in coding theory. Recently, Wondie and Kumar [9] gave a joint representation of Renyi's and Tsallis entropy. Tsallis [10] gave its applications in physics for to Shannon [11] entropy
and
reduces
(4) Inequality (6) has been generalized in the case of Renyi’s entropy.
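As a numerical illustration of the nonadditive entropy discussed above, the following is a minimal sketch. It assumes the standard Havrda-Charvát/Tsallis form (1 − Σ a_k^ξ)/(ξ − 1) for a complete distribution, which may differ from the chapter's (3) by a normalizing constant; the function name `tsallis_entropy` is hypothetical.

```python
import math

def tsallis_entropy(a, xi):
    """Assumed standard form (1 - sum_k a_k**xi) / (xi - 1) for xi != 1.

    At xi == 1 the limit is the Shannon entropy (natural logarithm).
    `a` is assumed to be a complete distribution (entries sum to 1).
    """
    if abs(xi - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in a if p > 0)
    return (1.0 - sum(p ** xi for p in a if p > 0)) / (xi - 1.0)

A = [0.5, 0.25, 0.25]
for xi in (0.5, 0.999, 1.0, 1.001, 2.0):
    # Values near xi = 1 should approach the Shannon entropy of A.
    print(xi, tsallis_entropy(A, xi))
```

The loop shows the order-ξ value approaching the Shannon value as ξ → 1, which is the limiting behavior the text refers to.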
FORMULATION OF THE PROBLEM
For and then an important property of Kerridge's inaccuracy [12] is that
(5)
Equality holds if and only if A = B. In other words, Shannon's entropy is the minimum value of Kerridge's inaccuracy. If then (5) is no longer necessarily true. Also, the corresponding inequality (6) is not necessarily true even for generalized probability distributions. Hence, it is natural to ask the following question: “For generalized probability distributions, what are the quantities the minimum values of which are ?” We give below an answer to the above question separately for by dividing the discussion into two parts (i) and (ii) Also we shall assume that because the problem is trivial for n = 1.
Case 1. Let . If then as remarked earlier (5) is true. For it can be easily seen by using Jensen's inequality that (5) is true if , equality in (5) holding if and only if
(7)
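The property that Shannon's entropy is the minimum value of Kerridge's inaccuracy can be checked numerically. The sketch below assumes the usual definitions for complete distributions, inaccuracy −Σ a_k log b_k and entropy −Σ a_k log a_k with natural logarithms; the function names are hypothetical.

```python
import math

def shannon_entropy(a):
    # -sum_k a_k log a_k, skipping zero-probability entries
    return -sum(p * math.log(p) for p in a if p > 0)

def kerridge_inaccuracy(a, b):
    # -sum_k a_k log b_k; assumes b_k > 0 wherever a_k > 0
    return -sum(p * math.log(q) for p, q in zip(a, b) if p > 0)

A = [0.5, 0.3, 0.2]
B = [0.4, 0.4, 0.2]
# Inaccuracy of B with respect to A is never below the entropy of A.
print(kerridge_inaccuracy(A, B) >= shannon_entropy(A))
# With B = A the two quantities coincide.
print(abs(kerridge_inaccuracy(A, A) - shannon_entropy(A)) < 1e-12)
```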
Case 2. Let
Since (6) is not necessarily true, we need an inequality (8)
such that
and equality holds if and only if
Since
by reverse Hölder inequality, that is, if and
are positive real numbers, then
(9) Let
and
Putting these values into (9), we get
(10) where we used (8), too. This implies however that (11) Or
(12)
Using (12) and the fact that ξ > 1, we get (6).
Particular Case. If ξ = 1, then (6) becomes
(13)
which is Kerridge's inaccuracy [12].
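The reverse Hölder inequality invoked in Case 2 can be spot-checked numerically. The sketch below uses the textbook statement, which may be parametrized differently from the chapter's (9): for 0 < p < 1, q = p/(p − 1) < 0 and positive reals x_k, y_k, one has Σ x_k y_k ≥ (Σ x_k^p)^{1/p} (Σ y_k^q)^{1/q}. All names are hypothetical.

```python
import random

def reverse_holder_sides(x, y, p):
    """Return (sum x_k*y_k, (sum x_k**p)**(1/p) * (sum y_k**q)**(1/q))
    with q = p / (p - 1); requires 0 < p < 1 and positive x_k, y_k."""
    q = p / (p - 1.0)
    lhs = sum(u * v for u, v in zip(x, y))
    rhs = (sum(u ** p for u in x) ** (1.0 / p)
           * sum(v ** q for v in y) ** (1.0 / q))
    return lhs, rhs

def check(trials=1000, seed=0):
    # Random spot-check; this is evidence, not a proof.
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 5)
        x = [rng.uniform(0.1, 10.0) for _ in range(n)]
        y = [rng.uniform(0.1, 10.0) for _ in range(n)]
        p = rng.uniform(0.1, 0.9)
        lhs, rhs = reverse_holder_sides(x, y, p)
        if lhs < rhs * (1.0 - 1e-9):
            return False
    return True

print(check())
```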
MEAN CODEWORD LENGTH AND ITS BOUNDS
We will now give an application of inequality (6) in coding theory for
(14)
Let a finite set of n input symbols
(15)
be encoded using an alphabet of D symbols; then it has been shown by Feinstein [13] that there is a uniquely decipherable code with lengths $l_1, l_2, \ldots, l_n$ if and only if the Kraft inequality holds; that is,
$$\sum_{k=1}^{n} D^{-l_k} \le 1, \tag{16}$$
where D is the size of the code alphabet. Furthermore, if
$$L = \sum_{k=1}^{n} a_k l_k \tag{17}$$
is the average codeword length, then for a code satisfying (16), the inequality
$$L \ge H(A) \tag{18}$$
is also fulfilled, and equality holds if and only if
$$l_k = -\log_D a_k, \quad k = 1, \ldots, n, \tag{19}$$
and by suitably encoding long sequences of input symbols into words, the average length can be made arbitrarily close to H(A) (see Feinstein [13]). This is Shannon's noiseless coding theorem. By considering Renyi's entropy (see, e.g., [14]), a coding theorem analogous to the above noiseless coding theorem has been established by Campbell [15], and the authors obtained bounds for it in terms of
(20)
Kieffer [16] defined a class of rules and showed that Hξ(A) is the best decision rule for deciding which of the two sources can be coded with expected cost of sequences of length N when N → ∞, where the cost of encoding a sequence is assumed to be a function of length only. Further, in Jelinek [17] it is shown that coding with respect to Campbell's mean length is useful in minimizing the problem of buffer overflow, which occurs when the source symbol is produced at a fixed rate and the code words are stored temporarily in a finite buffer. Concerning Campbell's mean length the reader can consult [15]. It may be seen that the mean codeword length (17) had been generalized parametrically by Campbell [15] and its bounds had been studied in terms of generalized measures of entropies. Here we give another generalization of (17) and study its bounds in terms of the generalized entropy of order ξ. Generalized coding theorems obtained by considering different information measures under the condition of unique decipherability were investigated by several authors; see, for instance, the papers [6, 13, 18]. An investigation is carried out concerning discrete memoryless sources possessing an additional parameter ξ, which seems to be significant in problems of storage and transmission (see [9, 16–18]). In this section we study a coding theorem by considering a new information measure depending on a parameter. Our motivation is, among others, that this quantity generalizes some information measures already existing in the literature, such as the Arndt [19] entropy, which is used in physics.
Definition 1. Let be arbitrarily fixed, then the mean length corresponding to the generalized information measure is given by the formula
where integers so that
and
(21)
are positive
(22)
Since (22) reduces to the Kraft inequality when ξ = 1, it is called the generalized Kraft inequality, and codes obtained under this generalized inequality are called personal codes.
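For the classical case ξ = 1, the Kraft inequality and the average codeword length are easy to evaluate directly. The following is a minimal sketch using the standard forms Σ D^(−l_k) ≤ 1 and L = Σ a_k l_k; the function names and the example code are illustrative assumptions, not taken from the chapter.

```python
def kraft_sum(lengths, D=2):
    # Left-hand side of the classical Kraft inequality: sum_k D**(-l_k)
    return sum(D ** (-l) for l in lengths)

def mean_length(probs, lengths):
    # Average codeword length: sum_k a_k * l_k
    return sum(p * l for p, l in zip(probs, lengths))

# Example binary prefix code: 0, 10, 110, 111
lengths = [1, 2, 3, 3]
probs = [0.5, 0.25, 0.125, 0.125]

print(kraft_sum(lengths))            # 1.0, so the inequality holds with equality
print(mean_length(probs, lengths))   # 1.75
print(kraft_sum([1, 1, 2]))          # 1.25 > 1, no uniquely decipherable binary code
```

In this example the probabilities are exact powers of 1/2, so the lengths equal −log₂ a_k and the average length coincides with the Shannon entropy in bits.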
Theorem 2. Let be arbitrarily fixed. Then there exist code length
so that (23)
holds under condition (22) and equality holds if and only if
(24) where
and
are given by (3) and (21), respectively.
Proof. First of all we shall prove the lower bound of
.
By reverse Hölder inequality, that is, if
and
are positive real numbers then
(25) Let
and
Putting these values into (25), we get
(26) where we used (22), too. This implies however that
(27)
For ξ >1, (27) becomes
(28) using (28) and the fact that ξ >1, we get (29) From (24) and after simplification, we get
(30) This implies
(31) which gives
. Then equality sign holds in (29).
Now we will prove inequality (23) for upper bound of We choose the codeword lengths
.
in such a way that
(32) is fulfilled for all From the left inequality of (32), we have
(33) multiplying both sides by and then taking sum over k, we get the generalized inequality (22). So there exists a generalized code with code lengths
Since ξ >1, then (32) can be written as
(34) Multiplying (34) throughout by from k = 1 to n, we obtain inequality
and then summing up
(35) Since
for ξ >1, we get from (35) inequality (23).
Particular Case. For ξ → 1, (23) becomes
(36) which is the Shannon [11] classical noiseless coding theorem (37) where H(A) and L are given by (4) and (17), respectively.
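The classical statement recovered in the limit ξ → 1 can be illustrated numerically. The sketch below assumes the standard form of Shannon's noiseless coding theorem, H(A) ≤ L < H(A) + 1, with the Shannon code lengths l_k = ⌈−log_D a_k⌉; the function names and the example distribution are hypothetical.

```python
import math

def shannon_lengths(probs, D=2):
    # Smallest integers l_k with D**(-l_k) <= a_k
    return [math.ceil(-math.log(p) / math.log(D)) for p in probs]

def entropy(probs, D=2):
    # Shannon entropy with logarithms to base D
    return -sum(p * math.log(p) / math.log(D) for p in probs if p > 0)

probs = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_lengths(probs)
H = entropy(probs)
L = sum(p * l for p, l in zip(probs, lengths))
kraft = sum(2 ** (-l) for l in lengths)

print(lengths)          # [2, 2, 3, 4]
print(kraft <= 1)       # these lengths satisfy the Kraft inequality
print(H <= L < H + 1)   # classical bounds on the average length
```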
CONCLUSION
In this paper we prove a generalization of Shannon's inequality for the case of entropy of order ξ with the help of the Hölder inequality. A noiseless coding theorem is proved. Considering Theorem 2, we remark that the optimal code length depends on ξ, in contrast with the optimal code lengths of Shannon, which do not depend on a parameter. However, it is possible to prove a coding theorem with respect to (3) such that the optimal code lengths are identical to those of Shannon.
REFERENCES
1. J. Havrda and F. S. Charvát, “Quantification method of classification processes. Concept of structural α-entropy,” Kybernetika, vol. 3, pp. 30–35, 1967.
2. Z. Daróczy, “Generalized information functions,” Information and Control, vol. 16, no. 1, pp. 36–51, 1970.
3. M. Behara and P. Nath, “Additive and non-additive entropies of finite measurable partitions,” in Probability and Information Theory II, pp. 102–138, Springer-Verlag, 1970.
4. I. Vajda, “Axioms for α-entropy of a generalized probability scheme,” Kybernetika, vol. 4, pp. 105–112, 1968.
5. B. D. Sharma and D. P. Mittal, “New nonadditive measures of entropy for discrete probability distributions,” Journal of Mathematical Sciences, vol. 10, pp. 28–40, 1975.
6. R. Pereira and Gur Dial, “Pseudogeneralization of Shannon inequality for Mittal’s entropy and its application in coding theory,” Kybernetika, vol. 20, no. 1, pp. 73–77, 1984.
7. Gur Dial, “On a coding theorems connected with entropy of order α and type β,” Information Sciences, vol. 30, no. 1, pp. 55–65, 1983.
8. S. Kumar and A. Choudhary, “Some coding theorems on generalized Havrda-Charvat and Tsallis’s entropy,” Tamkang Journal of Mathematics, vol. 43, no. 3, pp. 437–444, 2012.
9. L. Wondie and S. Kumar, “A joint representation of Renyi’s and Tsalli’s entropy with application in coding theory,” International Journal of Mathematics and Mathematical Sciences, vol. 2017, Article ID 2683293, 5 pages, 2017.
10. C. Tsallis, “Possible generalization of Boltzmann-Gibbs statistics,” Journal of Statistical Physics, vol. 52, no. 1-2, pp. 479–487, 1988.
11. C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.
12. D. F. Kerridge, “Inaccuracy and inference,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 23, pp. 184–194, 1961.
13. A. Feinstein, Foundations of Information Theory, McGraw-Hill, New York, NY, USA, 1956.
14. A. Rényi, “On measures of entropy and information,” in Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, pp. 547–561, University of California Press, 1961.
15. L. L. Campbell, “A coding theorem and Rényi’s entropy,” Information and Control, vol. 8, no. 4, pp. 423–429, 1965.
16. J. C. Kieffer, “Variable-length source coding with a cost depending only on the code word length,” Information and Control, vol. 41, no. 2, pp. 136–146, 1979.
17. F. Jelinek, “Buffer overflow in variable length coding of fixed rate sources,” IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 490–501, 1968.
18. G. Longo, “A noiseless coding theorem for sources having utilities,” SIAM Journal on Applied Mathematics, vol. 30, no. 4, pp. 739–748, 1976.
19. C. Arndt, Information Measure-Information and Its Description in Science and Engineering, Springer, Berlin, Germany, 2001.
Chapter 4
THE COMPUTATIONAL THEORY OF INTELLIGENCE: INFORMATION ENTROPY
Daniel Kovach Kovach Technologies, San Jose, CA, USA
ABSTRACT
This paper presents an information-theoretic approach to the concept of intelligence in the computational sense. We introduce a probabilistic framework from which computational intelligence is shown to be an entropy minimizing process at the local level. Using this new scheme, we develop a simple data driven clustering example and discuss its applications.
Keywords: Machine Learning, Artificial Intelligence, Entropy, Computer Science, Intelligence
Citation: Kovach, D. (2014), “The Computational Theory of Intelligence: Information Entropy”. International Journal of Modern Nonlinear Theory and Application, 3, 182-190. doi: 10.4236/ijmnta.2014.34020.
Copyright: © 2014 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0.
INTRODUCTION
This paper introduces a computational approach to the study of intelligence that the researcher has developed over many years. This approach takes into account data from Psychology, Neurology, Artificial Intelligence, Machine Learning, and Mathematics. Central to this framework is the premise that the goal of any intelligent agent is to reduce the randomness in its environment in some meaningful way. Formal definitions, in the context of this paper, for terms like “intelligence”, “environment”, and “agent” will follow. The approach draws from multidisciplinary research and has many applications. We will utilize the construct in discussions at the end of the paper. Other applications will follow in future works. Implementations of this framework can apply to many fields of study including General Artificial Intelligence (GAI), Machine Learning, Optimization, Information Gathering, Clustering, and Big Data, and extend outside of the applied mathematics and computer science realm to areas including Sociology, Psychology, Neurology, and even Philosophy.
Definitions
One cannot begin a discussion about the philosophy of artificial intelligence without a definition of the word “intelligence” in the first place. With the panoply of definitions available, it is understandable that there may be some disagreement, but typically each school of thought shares a common thread. The following are three different definitions of intelligence from respectable sources:
• “The aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment.” [1].
• “A process that entails a set of skills of problem solving enabling the individual to resolve genuine problems or difficulties that he or she encounters and, when appropriate, to create an effective product and must also entail the potential for finding or creating problems and thereby providing the foundation for the acquisition of new knowledge.” [2].
• “Goal-directed adaptive behavior.” [3].
Vernon’s hierarchical model of intelligence from the 1950s [1] and Hawkins’ On Intelligence from 2004 [4] are some other great resources on this topic. Consider the following working definition of this paper, with regard to information theory and computation: Computational Intelligence (CI) is an information processing algorithm that
1. Records data or events into some type of store, or memory.
2. Draws from the events recorded in memory to make assertions, or predictions, about future events.
3. Uses the disparity between the predicted events and the new incoming events to update the memory structure in step 1 such that the predictions of step 2 are optimized.
The mapping in step 3 is called learning, and is endemic to CI. Any entity that is facilitating the CI process we will refer to as an agent, in particular when the connotation is that the entity is autonomous. The surrounding infrastructure that encapsulates the agent together with the ensuing events is called the environment.
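The three steps of the working definition can be sketched as a small loop. This is only an illustration under simplifying assumptions: the class `FrequencyAgent`, its methods, and the choice of a frequency table as the memory are all hypothetical, and the update rule is plain counting rather than an optimized learner.

```python
from collections import Counter, defaultdict

class FrequencyAgent:
    """Toy agent: memory is a table of which event followed which."""

    def __init__(self):
        self.memory = defaultdict(Counter)   # step 1: the store

    def predict(self, context):
        # Step 2: predict the event seen most often after `context`.
        counts = self.memory[context]
        return counts.most_common(1)[0][0] if counts else None

    def update(self, context, actual):
        # Step 3 (simplified): record what actually followed `context`.
        self.memory[context][actual] += 1

def run(agent, events):
    """Feed events one by one and count correct predictions."""
    hits, prev = 0, None
    for event in events:
        guess = agent.predict(prev)
        hits += (guess == event)
        agent.update(prev, event)
        prev = event
    return hits

print(run(FrequencyAgent(), "abcabcabcabc"))  # 8: all misses occur in the first cycle
```

On a periodic stream the agent mispredicts only until every transition has been seen once, which is a very small instance of predictions improving as memory is updated.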
Structure
The paper is organized as follows. In Section 2 we provide a brief summary of the concept of information entropy as it is used for our purposes. In Section 3, we provide a mathematical framework for intelligence and discuss its relation to entropy. Section 4 discusses the global ramifications of local entropy minimization. In Section 5 we present a simple application of the framework to data analytics, which is available for free download. Sections 6 and 7 discuss relevant related research and future work.
ENTROPY
A key concept of information theory is that of entropy, which amounts to the uncertainty in a given random variable [5]. It is essentially a measure of unpredictability (among other interpretations). The concept of entropy is a much deeper principle of nature that penetrates to the deepest core of physical reality and is central to physics and cosmological models [6]-[8].
Mathematical Representation
Although terms like Shannon entropy are pervasive in the field of information theory, it will be insightful to review the formulation in our context. To arrive at the definition of entropy, we must first recall what is meant by information content. The information content of a random variable, denoted I[X], is given by
$$I[X] = -\log \mathbb{P}[X] \tag{1}$$
where ℙ[X] is the probability of X. The entropy of X, denoted 𝔼[X], is then defined as the expectation value of the information content,
$$\mathbb{E}[X] = \langle I[X] \rangle \tag{2}$$
Expanding using the definition of the expectation value, we have
$$\mathbb{E}[X] = -\sum_{x} \mathbb{P}[x] \log \mathbb{P}[x] \tag{3}$$
where the sum runs over the possible outcomes x of X.
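The definitions above translate directly into a few lines of code. The sketch below estimates probabilities from observed frequencies and uses base-2 logarithms, so results are in bits; both choices, and the function names, are assumptions made for illustration.

```python
import math
from collections import Counter

def information_content(p):
    # Information content of an outcome with probability p, in bits
    return -math.log2(p)

def shannon_entropy(samples):
    """Expectation of the information content over the empirical
    distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return sum((c / n) * information_content(c / n) for c in counts.values())

print(shannon_entropy("aabb"))  # 1.0 bit: two equally likely outcomes
print(shannon_entropy("aaaa"))  # 0.0 bits: a certain outcome carries no uncertainty
```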
Relationship of Shannon Entropy to Thermodynamics
The concept of entropy is deeply rooted at the heart of physical reality. It is a central concept in thermodynamics, governing everything from chemical reactions to engines and refrigerators. The relationship of entropy as it is known in information theory, however, is not mapped so straightforwardly to its use in thermodynamics. In statistical thermodynamics, the entropy S of a system is given by
$$S = -k_b \sum_i p_i \ln p_i \tag{4}$$
where $p_i$ denotes the probability of each microstate, or configuration of the system, and $k_b$ is the Boltzmann constant, which serves to map the value of the summation to physical reality in terms of quantity and units. The connection between the thermodynamic and information theoretic versions of entropy relates to the information needed to detail the exact state of the system, specifically, the amount of further Shannon information needed to define the microscopic state of the system that remains ambiguous when given its macroscopic definition in terms of the variables of Classical Thermodynamics. The Boltzmann constant serves as a constant of proportionality.
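As a short worked case of this proportionality, assume the standard Gibbs form $S = -k_b \sum_i p_i \ln p_i$ and a system with W equally likely microstates (W and $H_{\text{bits}}$ are symbols introduced here for illustration). The first relation gives Boltzmann's formula, and the second expresses the thermodynamic entropy through the Shannon entropy measured in bits:

```latex
p_i = \frac{1}{W}
\;\Longrightarrow\;
S = -k_b \sum_{i=1}^{W} \frac{1}{W} \ln \frac{1}{W} = k_b \ln W,
\qquad
S = (k_b \ln 2)\, H_{\text{bits}},
\quad
H_{\text{bits}} = -\sum_i p_i \log_2 p_i .
```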
Renyi Entropy
We can extend the logic of the beginning of this section to a more general formulation called the Renyi entropy of order α, where α ≥ 0 and α ≠ 1, defined as
$$\mathbb{H}_\alpha[X] = \frac{1}{1-\alpha} \log \left( \sum_{x} \mathbb{P}[x]^{\alpha} \right) \tag{5}$$
Under this convention we can apply the concept of entropy more generally to extend the utility of the concept to a variety of applications. It is important to note that this formulation approaches the Shannon entropy (3) in the limit as α → 1. Although the discussions of this paper were inspired by Shannon entropy, we wish to present a much more general definition and a bolder proposition.
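The behavior of the Renyi entropy near α = 1 can be checked numerically. The sketch below assumes the standard definition log(Σ p^α)/(1 − α) with natural logarithms and treats α = 1 as the Shannon limit; the function name is hypothetical.

```python
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha >= 0 (natural log).
    alpha == 1 is handled as the Shannon limit."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if abs(alpha - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)

P = [0.5, 0.25, 0.25]
for alpha in (0.0, 0.5, 0.9999, 1.0, 1.0001, 2.0):
    # The values are non-increasing in alpha and continuous through alpha = 1.
    print(alpha, renyi_entropy(P, alpha))
```

At α = 0 the value is the logarithm of the number of outcomes with nonzero probability, which is the largest value the family attains for a given distribution.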
INTELLIGENCE: DEFINITION AND ASSUMPTIONS
Let 𝕀 : S → O. The function 𝕀 represents the intelligence process, a member of ℐ, the set of all such functions. It maps input from set S to output in O. First, let
(6)
reflect the fact that 𝕀 is mapping one element from S to one element in O, each tagged by the identifier i ∈ ℕ, which is bounded by the cardinality of the input set. The cardinalities of these two sets need not match, nor does the mapping 𝕀 need to be bijective, or even surjective. This is an iterative process, as denoted by the index t. Finally, let Ot represent the collection of . Over time, the mapping should converge to the intended element, oi ∈ O, as is reflected in notation by
(7)
Introduce the function
(8)
which in practice is usually some type of error or fitness measuring function. Assuming that 𝕃t is continuous and differentiable, let the updating mechanism at some particular t for 𝕀 be
(9)
In other words, 𝕀 iteratively updates itself with respect to the gradient of some function, 𝕃. Additionally, 𝕃 must satisfy the following partial differential equation
(10)
where the function d is some measure of the distance between O and Ot, assuming such a function exists, and ρ is called the learning rate. A generalization of this process to abstract topological spaces where such a distance function is a commodity is an open question. Finally, for this to qualify as an intelligence process, we must have
(11)
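The idea of a mapping that updates itself along the gradient of an error function, with a learning rate ρ, can be sketched in its simplest form. This is not a reproduction of the chapter's update equations: the one-parameter linear map, the squared-error loss, and all names below are assumptions chosen to keep the example small.

```python
def train(pairs, rho=0.05, steps=200):
    """Fit the map s -> w * s to (s, o) pairs by gradient descent.

    Loss (assumed): mean squared distance between outputs and targets.
    Update (assumed): w <- w - rho * dLoss/dw.
    """
    w = 0.0
    losses = []
    for _ in range(steps):
        losses.append(sum((w * s - o) ** 2 for s, o in pairs) / len(pairs))
        grad = sum(2 * (w * s - o) * s for s, o in pairs) / len(pairs)
        w -= rho * grad
    return w, losses

# Targets generated by the "intended" mapping o = 3 * s
pairs = [(s, 3.0 * s) for s in (1.0, 2.0, 3.0)]
w, losses = train(pairs)
print(w)                       # close to 3.0
print(losses[0], losses[-1])   # the error shrinks over the iterations
```

With this learning rate each step shrinks the parameter error by a constant factor, so the outputs converge to the intended targets as the iteration index grows.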
Unsupervised and Supervised Learning Some consideration should be given to the sets S and O. If O = P(S) where P(S) is the power set of S, then we will say that the mapping 𝕀 is an unsupervised mapping. Otherwise, the mapping is supervised. The ramifications of this distinction are as follows. In supervised learning, the agent is given two distinct sets and trained to form a mapping between them explicitly. With unsupervised learning, the agent is tasked with learning subtle relationships in a single data set or, put more succinctly, to develop the mapping between S and its power set discussed above [9] [10] .
Overtraining
Further, we should note that just because we have some function 𝕀 : S → O satisfying the definitions and assumptions of this section does not mean that this mapping is necessarily meaningful. After all, we could make a completely arbitrary but consistent mapping via the prescription above, and although this would satisfy all the definitions and assumptions, it would be complete memorization on the part of the agent. But this, in fact, is exactly the definition of overtraining, a common pitfall in the training stage of machine learning, which one must be diligent to avoid.
Entropy Minimization One final part of the framework remains, and that is to show that entropy is minimized, as was stated at the beginning of this section. To show that, we consider 𝕀 as a probabilistic mapping, with (12) indicating the probability that 𝕀 maps si ∈ S to some particular oj ∈ O. From this, we can calculate the entropy in the mapping from S to O, at each iteration t. If the projection [si] has N possible outcomes, then the Shannon entropy of each si ∈ S is given by (13)
The total entropy is simply the sum of 𝔼t[si] over i. Let , then for the purposes of standardization across varying cardinalities, it may be insightful to speak of the normalized entropy 𝔼t[S],
(14)
As t → ∞, the mapping from each to its corresponding oj converges, and we have
(15) Therefore (16) Further, using the definition for Renyi entropy in 5 for each t and i (17) To show that the Renyi entropy is also minimized, we can use an identity involving the p-norm
(18)
and show that the log function is maximized as t → ∞ for α > 1, and minimized for . The case α → 1 was shown above with the definition of Shannon entropy. To continue, note that
(19)
since the summation is taken over all possible states oj ∈ O. But from (15), we have
(20)
for finite t and thus the log function is minimized only as t → ∞. To show that the Renyi entropy is also minimized for , we repeat the above logic but note that, with the sign reversal of , the quantity is minimized as t → ∞.
Finally, we can take a normalized sum over all i to obtain the total Renyi entropy of . By this definition, then the total entropy is minimized along with its components.
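The claim that entropy falls as a probabilistic mapping converges on a single output can be illustrated with a toy model. In the sketch below the mapping probabilities are a softmax over fixed scores, and the sharpness parameter stands in for the iteration index t; this parametrization, the scores, and the function names are all assumptions, not the chapter's construction.

```python
import math

def softmax(scores, beta):
    # Probabilities proportional to exp(beta * score); larger beta = sharper
    m = max(scores)
    weights = [math.exp(beta * (s - m)) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, alpha):
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

scores = [2.0, 1.0, 0.5, 0.0]   # hypothetical preferences over four outputs
ts = list(range(0, 20, 2))      # stand-in for the iteration index
shannon_t = [shannon(softmax(scores, t)) for t in ts]
renyi_t = [renyi(softmax(scores, t), 2.0) for t in ts]

for t, h, r in zip(ts, shannon_t, renyi_t):
    print(t, round(h, 6), round(r, 6))
```

Both sequences start at log 4 (a uniform mapping) and decrease toward zero as the probability mass concentrates on one output.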
Entropic Self Organization
In Section 3 we talked about the definitions of intelligence via the mapping 𝕀 : S → O. Here, we seek to apply the entropy minimization concept to P(S) itself, rather than a mapping. Explicitly, σ ⊂ P(S), where
(21)
and for every s ∈ S, there is a unique element of σ that contains s. That is, every element of S has one and only one element of σ containing it. The term entropic self-organization refers to finding the Σ ⊂ P(S) such that ℍα[σ] is minimized over all σ satisfying (21)
(22)
GLOBAL EFFECTS
In nature, whenever a system is taken from a state of higher entropy to a state of lower entropy, there is always some amount of energy involved in this transition, and an increase in the entropy of the rest of the environment greater than or equal to that of the entropy loss [6]. In other words, consider a system S composed of two subsystems, s1 and s2, then
(23)
Now, consider that system in equilibrium at times t1 and t2, t1 > t2, and denote the state at each S1 and S2, respectively. Then due to the second law of thermodynamics,
(24)
Therefore
(25)
Now, suppose one of the subsystems, say, s1, decreases in entropy by some amount Δs between t1 and t2, i.e. . Then to preserve the inequality, the entropy of the rest of the system must be such that
(26)
So the entropy of the rest of the system has to increase by an amount greater than or equal to the loss of entropy in s1. This will require some amount of energy, ΔE. Observe that all we have done thus far is follow the natural consequences of the Second Law of Thermodynamics with respect to our considerations with intelligence. While the second law of thermodynamics has been verified time and again in virtually all areas of physics, few have extended it as a more general principle in the context of information theory. In fact, we will conclude this section with a postulate about intelligence: Computational intelligence is a process that locally minimizes and globally maximizes Renyi entropy.
It should be stressed that although the above is necessary for intelligence, it is not sufficient to justify an algorithm or process as being intelligent.
APPLICATIONS
Here, we apply the discussions of this paper to practical examples. First, we consider a simple example of unsupervised learning: a clustering algorithm based on Shannon entropy minimization. Next, we look at some simple behavior of an intelligent agent as it acts to maximize global entropy in its environment.
Clustering by Entropy Minimization
Consider a data set consisting of a number of elements organized into rows. In the experiment that follows, we consider 300 samples, each a vector from ℝ³. In this simple proof of concept we will group the data into like neighborhoods by minimizing the entropy across all elements at each respective index in the data set. This is a data driven example, so essentially we use a genetic algorithm to perturb the juxtaposition of members of each neighborhood until the global entropy reaches a minimum (entropic self organization), while at the same time avoiding trivial cases such as a neighborhood with only one element. We leverage the Python framework heavily for this example, which is freely available for many operating systems at [11]. Please note that this is a simple prototype, a proof of concept used to exemplify the material in this paper. It is not optimized for latency or memory utilization, and it has not been performance tested against other algorithms in its comparative class, although dramatic improvements could be easily achieved by integrating the information content of the elements into the algorithm. Specifically, we would move elements with high information content to clusters where that element would otherwise have low information content. Furthermore, observe that for further efficacy, a preprocessing layer may be beneficial, especially with topological data sets. Nevertheless, applications of this concept applied to clustering on small and large scales will be discussed in a future work.
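The general idea can be sketched in a few dozen lines. The code below is not the author's implementation: it swaps the genetic algorithm for plain hill climbing, uses an assumed cost (the size-weighted sum, over clusters and coordinates, of the Shannon entropy of binned values), and runs on a small synthetic data set. All names and parameters are hypothetical.

```python
import math
import random
from collections import Counter

def bin_index(value, lo, hi, bins):
    # Map a value in [lo, hi] to one of `bins` equal-width bins
    if hi == lo:
        return 0
    return min(bins - 1, int((value - lo) / (hi - lo) * bins))

def cost(binned, labels, k):
    """Assumed objective: for each cluster and coordinate, the Shannon
    entropy of the binned values, weighted by cluster size."""
    total = 0.0
    dims = len(binned[0])
    for c in range(k):
        members = [row for row, lab in zip(binned, labels) if lab == c]
        n = len(members)
        for d in range(dims):
            counts = Counter(row[d] for row in members)
            total -= n * sum((m / n) * math.log(m / n) for m in counts.values())
    return total

def cluster(data, k=2, bins=8, iters=3000, min_size=2, seed=0):
    rng = random.Random(seed)
    dims = len(data[0])
    los = [min(r[d] for r in data) for d in range(dims)]
    his = [max(r[d] for r in data) for d in range(dims)]
    binned = [tuple(bin_index(r[d], los[d], his[d], bins) for d in range(dims))
              for r in data]
    labels = [i % k for i in range(len(data))]   # arbitrary starting assignment
    sizes = Counter(labels)
    start = best = cost(binned, labels, k)
    for _ in range(iters):
        i = rng.randrange(len(data))
        old, new = labels[i], rng.randrange(k)
        # Skip no-op moves and moves that would shrink a cluster below min_size
        if new == old or sizes[old] <= min_size:
            continue
        labels[i] = new
        trial = cost(binned, labels, k)
        if trial < best:                          # keep only improving moves
            best = trial
            sizes[old] -= 1
            sizes[new] += 1
        else:
            labels[i] = old
    return labels, start, best

gen = random.Random(1)
data = [[gen.gauss(c, 0.3) for _ in range(3)] for c in (0.0, 3.0) for _ in range(30)]
labels, start, best = cluster(data)
print(start, best)   # the accepted moves never increase the cost
```

Because only improving moves are accepted, the search can stall in a local minimum; a population-based search such as the genetic algorithm mentioned above is one way to reduce that risk.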
We can visualize the progression of the algorithm and the final results in Figure 1. For simplicity, only the first two (non-noise) dimensions are plotted. The clustering algorithm reached an 8.3% error rate in 10,000 iterations, with an average simulation time of 480.1 seconds. Observe that although there are a few “blemishes” in the final clustering results, with a proper choice of parameters, including the maximum number of computational epochs, the clustering algorithm will eventually succeed with 100% accuracy. Figure 2 shows the results of the clustering algorithm applied to a data set containing four additional fields of pseudo-randomly generated noise, each in the interval [−1,1]. This trial was slower than the last but had about the same classification accuracy: a 6.0% error rate in 10,000 iterations, with an average simulation time of 1013.1 seconds.
Entropy Maximization
In our next set of examples, consider a virtual agent confined to move about a “terrain” represented by a three-dimensional surface. Each of the two surfaces is plotted in Figure 3, and they are defined by the following functions, respectively: (27) and (28)
Figure 1. Entropic clustering algorithm results over time.
We will confine x and y such that and note that the range of each respective surface is z ∈ [0,1]. The algorithm proceeds as follows. First, the agent is initialized with a starting position, p0 = (x0, y0). The algorithm then updates the coordinates of the agent’s position by incrementing or decrementing them by some small value, ε = (εx, εy). As the agent meanders about the surface, data are collected on its position along the z-axis. If we partition the range of each surface into equally spaced intervals, we can form a histogram H of the agent’s positional information. From H we can construct a discrete probability function, ℙH, and thus calculate the Rényi entropy. The agent can then use feedback from the entropy determined using H to calculate an appropriate ε with which it updates its position, and the cycle continues. The overall goal is to maximize its entropy, or to time out after a predetermined number of iterations. In this particular simulation, the agent is initialized using a “random walk”, in which ε is chosen at random. Next, it is updated using feedback from the entropy function.
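The histogram and entropy step of this loop can be sketched as follows. This is a minimal illustration, assuming z ∈ [0, 1], equal-width bins, and a Rényi order of α = 2 by default; the function names `z_histogram` and `renyi_entropy` are hypothetical, and the feedback rule that turns the entropy into a new ε is not shown because the text does not specify it.

```python
import math

def z_histogram(z_values, bins=10):
    """Count how often the agent's height z in [0, 1] falls into each equal-width bin."""
    counts = [0] * bins
    for z in z_values:
        counts[min(int(z * bins), bins - 1)] += 1  # z == 1.0 goes into the last bin
    return counts

def renyi_entropy(counts, alpha=2.0):
    """Renyi entropy (bits) of order alpha for the distribution given by counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if alpha == 1.0:  # limiting case: Shannon entropy
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)

stuck = z_histogram([0.05] * 40, bins=4)                 # agent never leaves one bin
spread = z_histogram([0.1, 0.3, 0.6, 0.9] * 10, bins=4)  # agent visits every bin equally
```

A histogram concentrated in one bin has zero entropy, while a uniform histogram over four bins reaches the maximum of log2(4) bits, which is the state the agent is driven toward.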
From this simple set of rules, we see an emergent desire for parsimony with respect to position on the surface, even in the less probable partitions of z, as z → 1. As our simulation continues to run, ℙH tends to a uniform distribution. Figure 3 depicts a random walk on surface 1 and surface 2, respectively, where the top and bottom right panels show surface traversal using an entropic search algorithm.
RELATED WORKS
Although there are many approaches to intelligence from the angle of cognitive science, few have been proposed from the computational side. Recently, however, some great work in this area has been under way.
Figure 2. Entropic clustering algorithm results over time.
Figure 3. Surfaces for hill climbing agent simulation.
Many sources claim to have computational theories of intelligence, but for the most part these “theories” merely describe certain aspects of intelligence [12] [13]. For example, Meyer in [13] suggests that performance on multiple tasks depends on adaptive executive control, but makes no claim about the emergence of such characteristics. Others discuss how data is aggregated. This type of analysis is especially relevant in computer vision and image recognition [12]. The efforts in this paper seek to introduce a much broader theory of the emergence of autonomous goal-directed behavior. Similar efforts are currently under way. Inspired by physics and cosmology, Wissner-Gross asserts that autonomous agents act to maximize the entropy in their environment [14]. Specifically, he proposes a path integral formulation from which he derives a gradient that can be analogized as a causal force propelling a system along a gradient of maximum entropy over time. Using this idea, he created a startup called Entropica [15] that applies this principle in ingenious ways in a variety of applications, ranging from teaching a robot to walk upright to maximizing profit potential in the stock market. Essentially, Wissner-Gross started with a global principle and worked backwards. What we did in this paper was arrive at a similar result from a different perspective, namely entropy minimization.
CONCLUSIONS
The purpose of this paper was to lay the groundwork for a generalization of the concept of intelligence in the computational sense. We discussed how entropy minimization can be utilized to facilitate the intelligence process, and how the disparities between the agent’s prediction and the reality of
the training set can be used to optimize the agent’s performance. We also showed how such a concept could be used to produce a meaningful, albeit simplified, practical demonstration. Future work includes applying the principles of this paper to data analysis, specifically in the presence of noise or sparse data. We will discuss some of these applications in the next paper. Further future work includes discussing the underlying principles under which data can be collected hierarchically, discussing how computational processes can implement the discussions in this paper to evolve and work together to form processes of greater complexity, and discussing the relevance of these contributions to abstract concepts like consciousness and self-awareness. In the following paper we will examine how information can aggregate to form more complicated structures and the roles these structures can play. More concepts, examples, and applications will follow in future works.
Daniel Kovach
REFERENCES
1. Wechsler, D. and Matarazzo, J.D. (1972) Wechsler’s Measurement and Appraisal of Adult Intelligence. Oxford UP, New York.
2. Gardner, H. (1993) Frames of Mind: The Theory of Multiple Intelligences. Basic, New York.
3. Sternberg, R.J. (1982) Handbook of Human Intelligence. Cambridge UP, Cambridge, Cambridgeshire.
4. Hawkins, J. and Blakeslee, S. (2004) On Intelligence. Times, New York.
5. Ihara, S. (1993) Information Theory for Continuous Systems. World Scientific, Singapore.
6. Schroeder, D.V. (2000) An Introduction to Thermal Physics. Addison Wesley, San Francisco.
7. Penrose, R. (2011) Cycles of Time: An Extraordinary New View of the Universe. Alfred A. Knopf, New York.
8. Hawking, S.W. (1998) A Brief History of Time. Bantam, New York.
9. Jones, M.T. (2008) Artificial Intelligence: A Systems Approach. Infinity Science, Hingham.
10. Russell, S.J. and Norvig, P. (2003) Artificial Intelligence: A Modern Approach. Prentice Hall/Pearson Education, Upper Saddle River.
11. (2013) Download Python. N.p., n.d. Web. 17 August 2013. http://www.python.org/getit
12. Marr, D. and Poggio, T. (1979) A Computational Theory of Human Stereo Vision. Proceedings of the Royal Society B: Biological Sciences, 204, 301-328. http://dx.doi.org/10.1098/rspb.1979.0029
13. Meyer, D.E. and Kieras, D.E. (1997) A Computational Theory of Executive Cognitive Processes and Multiple-Task Performance: Part I. Basic Mechanisms. Psychological Review, 104, 3-65. http://dx.doi.org/10.1037/0033-295X.104.1.3
14. Wissner-Gross, A. and Freer, C. (2013) Causal Entropic Forces. Physical Review Letters, 110, Article ID: 168702. http://dx.doi.org/10.1103/PhysRevLett.110.168702
15. (2013) Entropica. N.p., n.d. Web. 17 August 2013. http://www.entropica.com
SECTION 2: BLOCK AND STREAM CODING
Chapter 5
BLOCK-SPLIT ARRAY CODING ALGORITHM FOR LONG-STREAM DATA COMPRESSION
Qin Jiancheng,1 Lu Yiqin,1 and Zhong Yu2,3
1 School of Electronic and Information Engineering, South China University of Technology, Guangdong, China
2 Zhaoqing Branch, China Telecom Co., Ltd., Guangdong, China
3 School of Software, South China University of Technology, Guangdong, China
Citation: Qin Jiancheng, Lu Yiqin, Zhong Yu, “Block-Split Array Coding Algorithm for Long-Stream Data Compression”, Journal of Sensors, vol. 2020, Article ID 5726527, 22 pages, 2020. https://doi.org/10.1155/2020/5726527.
Copyright: © 2020 by Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ABSTRACT
With the advent of IR (Industrial Revolution) 4.0, the spread of sensors in the IoT (Internet of Things) may generate massive data, which will challenge the limited sensor storage and network bandwidth. Hence, the study of big data compression is valuable in the field of sensors. One problem is how to compress long-stream data efficiently with the finite memory of a sensor. To maintain performance, traditional compression techniques have to treat the data streams on a small, inadequate scale, which reduces the compression ratio. To solve this problem, this paper proposes a block-split coding algorithm named “CZ-Array algorithm” and implements it in the shareware named “ComZip.” CZ-Array can use a relatively small data window to cover a configurable large scale, which benefits the compression ratio. It is fast, with time complexity O(N), and fits big data compression. The experimental results indicate that ComZip with CZ-Array can obtain a better compression ratio than gzip, lz4, bzip2, and p7zip in multiple-stream data compression, and that its speed is competitive among these general data compression programs. Besides, CZ-Array is concise and fits the parallel hardware implementation of sensors.
INTRODUCTION
With the advent of IR (Industrial Revolution) 4.0 and the subsequent rapid expansion of the IoT (Internet of Things), large numbers of sensors are available in various fields and generate massive data. The soul of IR 4.0 and the IoT, with their intelligent decision and control, relies on these valuable data. But the spreading sensors’ data also bring problems to smart systems, especially in the WSN (wireless sensor network) with its precious bandwidth. Due to limited storage capacity and network bandwidth, GBs or TBs of data in the IoT pose an enormous challenge to the sensors.
Data compression is a desirable way to reduce storage usage and speed up network transportation. In practice, stream data are widely used to support large data volumes that exceed the maximum storage of a sensor. For example, a sensor with an HD (high-definition) camera can treat its video data as a long stream despite its small memory. And in most cases, many sensors in the same zone may generate similar data, which can be gathered and compressed as a stream and then transmitted to the back-end cloud platform.
This paper focuses on compression for sensors, which has strict demands for low computing consumption, fast encoding, and energy saving. These special demands rule out most compression algorithms as unfit. We pay attention to lossless compression because it is general. Even a lossy compression system usually contains an entropy encoder as its terminal unit, which depends on lossless compression. For example, in the image
compression, the DCT (discrete cosine transform) algorithm [1] needs a lossless compressor. Some lossy compression algorithms can avoid the entropy encoder, such as the SVD (singular value decomposition) algorithm [2], but they often consume more computation resources and energy than a lossless compressor.
One problem is the finite memory of each sensor when facing long-stream data. The sensors have to compress GBs of data or more, while a sensor has only MBs of RAM (random access memory) or less. In most traditional compression algorithms, the compression ratio depends on the size of the data window, which is limited by the capacity of the RAM. To maintain performance, traditional techniques have to treat the data streams on a small, inadequate scale, which reduces the compression ratio. For example, a 2 MB data window cannot see stream data beyond 2 MB at a time; thus, the compression algorithm cannot merge the data inside and outside this window, even if they are similar and compressible. The window scale restricts the compression ratio. Unfortunately, due to the limited hardware performance and energy consumption of the sensors, it is difficult to enlarge the data window for long-stream data compression.
Moreover, multiple data streams are common in the IoT, such as dual-camera video data, and these streams may have redundant data that ought to be compressed. But since the small data window can see only a small part of one stream, how can it see more streams so that they can be merged?
In our previous papers, we designed and upgraded a compression format named “CZ format,” which can support a data window of up to 1 TB (or larger) [3, 4], and implemented it in our compression software named “ComZip.” But the sensor’s RAM still limits the data window size. Using flash memory to extend the data window is not a good option, because the compression speed drops noticeably.
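The influence of the window size can be observed with a small experiment. The sketch below is not part of ComZip; it uses DEFLATE from the Python standard library `zlib` module, whose `wbits` parameter sets the window to 2**wbits bytes, and a synthetic input (an assumption chosen for illustration) in which the only redundancy is a repeat located 8 KB back.

```python
import random
import zlib

def deflate_size(data, wbits):
    """Compressed size of data using a DEFLATE window of 2**wbits bytes."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return len(compressor.compress(data) + compressor.flush())

rng = random.Random(0)
chunk = bytes(rng.getrandbits(8) for _ in range(8192))  # 8 KB of incompressible data
data = chunk + chunk                                    # the repeat lies 8 KB behind

small_window = deflate_size(data, 10)  # 1 KB window: the first copy is out of sight
large_window = deflate_size(data, 15)  # 32 KB window: the first copy is still visible
```

With the 1 KB window the second copy cannot be matched against the first, so the output stays close to the input size; with the 32 KB window the second copy is encoded almost entirely as back-references.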
To solve the problems of long-stream data compression with limited data window size, this paper proposes a block-split coding algorithm named “CZ-Array algorithm” and implements it in ComZip. The CZ-Array algorithm has the following features:
• It splits the data stream into blocks and rebuilds them with the time complexity O(N) so that the data window can cover a larger scale to fit big data compression
• It builds a set of matching links to speed up the LZ77 compression algorithm [5], reducing the time complexity from O(N²) to O(N)
• It is concise for hardware design in the LZ77 encoder pipeline, so that the sensors may use parallel hardware to accelerate LZ compression
We ran experiments on both the x86/64 and ARM (Advanced RISC Machines) platforms to compare the data compression efficiency of ComZip with CZ-Array, gzip, lz4, bzip2, and p7zip. The experimental results indicate that ComZip with CZ-Array obtains the best compression ratio among these general data compression programs in multiple-stream data compression, and its speed is also competitive. Besides, the algorithm analysis indicates that CZ-Array is concise and fits the parallel hardware implementation of sensors.
To enable further experiments, we provide 2 versions of ComZip on the website: one for Ubuntu Linux (x86/64 platform) and one for Raspbian (ARM platform). Other researchers may download them from http://www.28x28.com:81/doc/cz_array.html.
The remainder of this paper is structured as follows: Section 2 describes the problems of long-stream data compression for sensors. Section 3 introduces the CZ-Array coding algorithm. Section 4 analyzes the complexities of the CZ-Array algorithm. The experimental results are given in Section 5. The conclusions are given in Section 6.
PROBLEMS OF LONG-STREAM DATA COMPRESSION FOR SENSORS
Video/audio or other special sensors in IoT can generate long-stream data, but the bottlenecks of data transportation, storage, and computation in the networks of sensors need to be eliminated. Data compression meets this requirement. Figure 1 shows a typical scene in a network with both lightweight and heavy sensors, where long-stream data are generated.
Figure 1. Network of lightweight and heavy sensors.
This network has many lightweight nodes that sense the situation and generate long-stream data. Since they have limited energy, storage capacity, and computing resources, they can neither keep the data locally nor perform strong compression. Meanwhile, a few heavy nodes in the network can gather and compress the data to reduce the long-distance bandwidth, and then transport them to the back-end cloud platform. The cloud platform has plenty of resources to store, decompress, and analyze the data.
In our previous papers, we discussed big data compression and encryption in the heavy nodes of a network of sensors [4, 6], but if the heavy nodes compress long-stream data, we still have problems with the finite memory, energy consumption, speed, and compression ratio. This paper focuses on the following problems:
• Limited by the data window size, how can we cover a larger scale to see more data streams for a better compression ratio?
• Keeping the compression ratio roughly unchanged, how can we improve the speed of the compression algorithm?
In our previous papers [3] and [4], we showed that a larger data window can gain a better compression ratio. In this paper, we still use the same definition of the compression ratio as follows:
$$R = \frac{D - D_{zip}}{D} \qquad (1)$$
Dzip and D are the volumes of the compressed and original data, respectively. If the original data are not compressed, R = 0. If the compressed data are larger than the original data, R < 0. In all cases, R < 1.
The compression algorithms can merge similar data fragments within a data window. Thus, a larger data window can see more data and has more merging opportunities. But the data window size is limited by the capacity of RAM. Although in Figure 1 the cloud platforms have plenty of RAM for large data windows, a typical heavy-node sensor has only MBs of RAM at present. Using VM (virtual memory) to enlarge the data window is not good enough, because flash memory is much slower than RAM. Moreover, a heavy node may gather data streams from different lightweight nodes, and the streams may have similar data fragments. But how can a data window see more than one stream to get more merging opportunities?
As for the second problem, the compression speed is important for the sensors, especially the heavy nodes. GBs of stream data have to be compressed in time, while the sensors’ computing resources are finite. Cutting down the data window size to accelerate the computations is not a good approach, because the compression ratio will sink noticeably, which runs into the first problem again.
To solve these problems, we need to review the main related works on sensors and stream data compression. In [3] and [4], we discussed that current mathematical models and methods of lossless compression can be divided into 3 classes (lb denotes the base-2 logarithm):
• Compression based on probabilities and statistics: typical algorithms in this class are Huffman and arithmetic coding [7]. The data window size can hardly influence the speed of this class of compression. To maintain the statistical data for compression, the time complexity of Huffman coding is O(lb M) and that of traditional arithmetic coding is O(M), where M is the number of coding symbols, such as 256 characters and the index code-words in some LZ algorithms. O(M) is not fast enough, but current arithmetic coding algorithms have been optimized and reached O(lb M), including that in ComZip [3].
• Compression based on dictionary indexes: typical algorithms in this class are the LZ series [5], such as LZ77/LZ78/LZSS. To achieve string matching, the time complexities of traditional LZ77, LZ78, and LZSS coding are O(N²), O(N lb N), and O(N lb N), respectively, where N is the data window size. In ComZip, we optimize LZ77 and reach O(N) with the CZ-Array algorithm. This paper focuses on it.
• Compression based on the order and repetition of symbols: typical algorithms in this class are BWT (Burrows-Wheeler transform) [8], MTF (move-to-front) [9], and RLE (run-length encoding). The time complexity of traditional BWT coding is O(N² lb N), which is too slow. But current BWT algorithms have been optimized and reached O(N), including the CZ-BWT algorithm in ComZip [6].
Moreover, we use a new method to split the data stream into blocks and rebuild them with the time complexity O(N). This is neither BWT nor MTF/RLE, but another way to improve the compression ratio by accommodating the data window scale. This paper focuses on it as a part of the CZ-Array algorithm. MTF and RLE are succinct and easy to implement, but their compression ratios are uncompetitive in the current compression field.
To achieve a better compression ratio and performance, current popular compression software commonly combines different compression models and methods. Table 1 lists the features of popular compression software and ComZip.
Table 1. Features of compression software.
| Software | Format | Basic algorithms | Maximum data window (supported) | Maximum data window (current) | Shortages |
|---|---|---|---|---|---|
| WinZip | Deflate | LZSS & Huffman | 512 KB (LZSS) | 512 KB | Small data window; low compression ratio; weak big data support |
| WinRAR | RAR | LZSS & Huffman | 4 MB (LZSS) | 4 MB | Small data window; low compression ratio; weak big data support |
| PPMd | | PPM | - | - | Good compression ratio for text data only; weak big data support |
| gzip | Deflate | LZ77 & Huffman | 32 KB (LZ77) | 32 KB | Small data window; low compression ratio; weak big data support |
| lz4 | lz4 | LZ77 | 64 KB (LZ77) | 64 KB | Small data window; low compression ratio; limited big data support |
| bzip2 | bz2 | BWT & Huffman | 900 KB (BWT) | 900 KB | Small BWT block; low compression ratio; weak big data support |
| 7-zip/p7zip | LZMA | LZSS & arithmetic | 4 GB (LZSS) | 1.5 GB | Separated data windows for multithreads; limited big data support |
| ComZip | cz | BWT & LZ77 & arithmetic | 1 TB or more (LZ77) | 512 GB | Need larger data window for higher compression ratio |
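The compression ratio R of Eq. (1) gives a common yardstick for the software in Table 1. The sketch below assumes the form R = (D − Dzip)/D, which is consistent with the properties stated for Eq. (1), and applies it to three codecs available in the Python standard library (DEFLATE via `zlib`, bzip2 via `bz2`, LZMA via `lzma`). The sample inputs and function names are illustrative assumptions, not the paper's test data.

```python
import bz2
import lzma
import os
import zlib

def compression_ratio(original_size, compressed_size):
    """R = (D - Dzip) / D: 0 means no gain, negative means expansion, always below 1."""
    return (original_size - compressed_size) / original_size

def ratios(data):
    """Compression ratio R of three standard-library codecs on the same data."""
    return {
        "deflate": compression_ratio(len(data), len(zlib.compress(data, 9))),
        "bzip2": compression_ratio(len(data), len(bz2.compress(data, 9))),
        "lzma": compression_ratio(len(data), len(lzma.compress(data))),
    }

repetitive = ratios(b"sensor reading 42\n" * 2000)  # highly redundant input
random_like = ratios(os.urandom(4096))              # essentially incompressible input
```

Highly redundant input yields R close to 1 for every codec, while incompressible input yields a slightly negative R because each format adds header overhead.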
To focus on the balance of compression ratio and speed, this paper ignores some existing methods, such as MTF, RLE, PPMd [10], and PAQ, whose compression ratio or speed is too low. In this paper, the term “data window” refers to different data structures in each compression algorithm. If a piece of software combines multiple data windows, we take its bottleneck window as the focus. Table 2 shows the data windows in typical compression algorithms.
Table 2. Data windows in different algorithms.
| Algorithm | Data window | Example of window size |
|---|---|---|
| LZ77 | Sliding window | 32 KB (gzip) |
| LZSS | Sliding window binary tree | 1 GB (p7zip) |
| LZ78/LZW | Dictionary trie | 4 KB (GIF) |
| BWT | BWT block | 900 KB (bzip2) |
| Huffman, Arithmetic, PPMd | Statistic table | 64 KB (order-1 context); 16 MB (order-2 context) |
To enlarge the data window and improve performance, researchers have tried various approaches. Each compression class has its own algorithm optimizations. As mentioned above, ComZip also keeps up with the optimal algorithms of the 3 classes, but we cannot focus on too many points at once. In [6], we compared the time complexities of traditional BWT, BWT with SAIS [11], BWT with GSACA [12], and CZ-BWT, which represent current BWT studies.
The study of compression with AI (artificial intelligence) is a promising direction. Current research mostly concerns special fields, such as image compression by neural networks [13–15]. Yet it is still a problem for AI to achieve general-field lossless compression efficiently, especially in sensors with limited computing resources. This paper discusses general compression, for which some algorithms are more practical than AI. We may continue the study of AI algorithms in our future work.
The performance of compression is important for sensors, so studies of parallel computing and hardware acceleration such as ASIC (application-specific integrated circuit) and FPGA (field-programmable gate array) [16–18] are valuable. GPU (graphics processing unit) acceleration is a research hotspot [19–21]. But as we mentioned in [4], the problem is that the parallel threads split the data window into smaller slices and thus reduce the compression ratio. This paper is concerned precisely with the data window size and scale.
In [6], we considered the concision of CZ-BWT for hardware design, and this paper also considers the hardware implementation of CZ-Array. To enlarge the data window scale, CZ-Array follows the proposition
of RAID (redundant arrays of independent disks) [22], which was previously used for storage capacity, performance, and fault tolerance. To improve the compression speed, CZ-Array draws on lz4 [17], one of the fastest current LZ algorithms based on LZ77. Both RAID and lz4 are concise, but RAID has not been used for compression before, and lz4 has small data windows and a low compression ratio.
We are arranging our work in data coding research. Table 3 shows the relationships within our work. We use the same platform: the ComZip software system. ComZip is a complex platform with unrevealed parts for research on various coding problems, and it is still developing. To keep the description in each paper clear, we can only focus on a few points at a time. So each paper shows different details, and we call them Focus A, B, C, and D.
Table 3. Relationship of data coding research on ComZip.
| Paper | Similar base | Different focus |
|---|---|---|
| [3] | ComZip platform (shown framework) | A. Compression format for multi-level coding (including its framework) |
| [4] | ComZip platform (shown parallel pipeline) | B. Combined compression & encryption coding (including its parallel pipeline) |
| [6] | ComZip platform (shown BWT filter) | C. Fast BWT coding (including its matching link builder); hardware design |
| This paper | ComZip platform (shown CZ-Array units) | D. Array coding (including block array builder & matching link builder); hardware design (comparing the similar structure with [6]) |
Figure 2 shows these focuses within the ComZip platform. The figures in each paper show ComZip somewhat differently because different problems are considered. Typically, this paper focuses on array coding, so the green arrows in Figure 2 point to the ComZip details that are invisible in the figures of [3, 4, 6]. Besides, this paper compares the similar structure in Focus C and D: the Matching Link Builder (MLB). Although the MLBs in CZ-BWT [6] and CZ-Array have different functions, we cover them with a similar structure, which can simplify the hardware design and save cost.
Figure 2. Focuses on research of ComZip.
CZ-ARRAY CODING
Concepts of CZ-Array
The compression software ComZip uses the parallel pipeline named “CZ pipeline.” We introduced the framework of the CZ encoding pipeline in [4, 6]; the reverse framework is the CZ decoding pipeline. Figure 3 shows the same encoding framework, with the difference that the CZ-Array units are embedded, including
• The BAB (block array builder), which can mix multiple data streams into a united array stream and enlarge the data window scale
• The MLB (matching link builder), which can improve the speed of LZ77 and BWT encoding
Figure 3. The framework of CZ encoding pipeline with CZ-Array.
CZ-Array combines the following methods to cover a larger data window scale and speed up the compression:
• CZ-Array uses the BAB to mix multiple data streams into a united array stream so that the multiple streams can be seen in the same data window
As shown in Figure 4, no matter how long a data stream is, the compression unit can only see the part in the data window. A traditional compression algorithm in Figure 4(a) has to treat the streams serially, which means the streams in the queue are outside the window scale and invisible. A parallel compression algorithm in Figure 4(b) can treat multiple streams to accelerate compression, but the window has to be divided into parts, and different parts cannot see each other; thus, the window scale shrinks. The window scale in Figure 4(c) does not shrink, while CZ-Array treats multiple streams in the same window. This case implies a better compression ratio because similar parts of different streams have a chance to be seen and matched. Such a chance does not arise in Figure 4(b).
Mixing multiple streams into a united stream by the BAB is fast. We took the hint from RAID [22] and implemented the BAB in the field of lossless compression. In the BAB, streams are split into blocks, and the blocks are rearranged in a different sequence and then transmitted into the data window as a new stream.
• CZ-Array uses the MLB to make a BWT or LZ77 encoding pipeline so that the encoding speed can be optimized
The MLB can create matching links before BWT or LZ77 encoding. As shown in Figure 5, the matching links enable fast location without byte-by-byte searching in the data stream. Figure 5(a) shows the basic matching links, which are used in CZ-BWT. In [6], we call them “bucket sorting links.” Figure 5(b) shows the endless matching links, which are used in LZ77 as shown in Figure 5(c). From Figure 5(c), we can see that the reversed string “ZYXCBA…” is the current target to be encoded. k is the basic match length. The LZ77 encoder may follow the matching link to find the maximum match length and its location, which is much faster than simply searching the data window. In the rest of this section, we will introduce the algorithms for making and following the matching links.
• Both BAB and MLB are concise for hardware design so that the sensors can gain better compression performance through FPGA/ASIC acceleration
The CZ-Array algorithm is made up of several subalgorithms, such as the block array building algorithm and the matching link building algorithm. These 2 algorithms are concise and fit for hardware design. We will provide their primary hardware design in Section 4.
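The idea of following matching links instead of scanning the whole window can be sketched with a generic hash-chain scheme. This is an illustration of the general technique only, not the MLB of ComZip: the function names, the dictionary keyed by the first k bytes, and the forward (non-reversed) string layout are all assumptions made for the example, with k = 3 as in Figure 5.

```python
def build_links(data, k=3):
    """links[i] is the previous position sharing the same k bytes as position i, or -1."""
    last_seen = {}
    links = [-1] * len(data)
    for i in range(len(data) - k + 1):
        key = data[i:i + k]
        links[i] = last_seen.get(key, -1)
        last_seen[key] = i
    return links

def longest_match(data, links, pos):
    """Follow the link chain from pos and return (match_position, match_length)."""
    best_pos, best_len = -1, 0
    candidate = links[pos]
    while candidate != -1:
        length = 0
        while pos + length < len(data) and data[candidate + length] == data[pos + length]:
            length += 1
        if length > best_len:  # strictly longer, so the nearest candidate wins ties
            best_pos, best_len = candidate, length
        candidate = links[candidate]
    return best_pos, best_len

sample = b"abcXabcdYabcdZ"
sample_links = build_links(sample)
match = longest_match(sample, sample_links, 9)  # position of the second "abcd"
```

Only positions that already share the first k bytes are visited, so the encoder skips every window position that cannot start a match of at least the basic length.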
Figure 4. Data window scale for compression.
Figure 5. Matching link for compression (k = 3).
BAB Coding
The BAB benefits from the hint of RAID, although they are different in detail. Figure 6 shows the simplest scenes of RAID-0 and BAB coding. In
Figure 6(a), we suppose the serial file is written to the empty RAID-0. This file data stream is split into blocks and then written into different disks in the RAID. In Figure 6(b), the blocks are read from these disks and then rearranged into a serial file.
Figure 6. RAID-0 and BAB coding (n = 4).
In Figure 6(c), we assume all data streams are endless or have the same length. They are split into blocks, organized as a block array, and then arranged into a united array stream in the BAB. In Figure 6(d), we see the reversed process: a united array stream is restored into multiple data streams. As RAID-5 can use n + 1 blocks (n data blocks and 1 parity-check block) to obtain redundant reliability, the block array can also use m + 1 blocks (m data blocks and 1 error-correction-code block) for the information security. m and n are different. For example, m = 1000 and n = 4. If the compressed data stream is changed, the error-correction-code block will be found unfit
for the data blocks. But this paper just cares about the compression ratio and speed, so we focus on the simplest block array, like RAID-0.
In Figure 6, the RAID and BAB coding algorithms can easily determine the proper order of data blocks because the cases are simple and the block sizes are the same. But in practice, the data streams are more complex, such as multiple streams with different lengths, a stream that ends suddenly, and a stream that newly starts. Thus, the BAB coding algorithm has to handle these situations. In the BAB encoding process, we define n as the number of rows of the data block array; B as the size of a block; A as the number of original data streams (to be encoded); Stream[i] as the data stream with ID i (i = 0, 1, ⋯, A − 1); and Stream[i].length as the length of Stream[i] (i = 0, 1, ⋯, A − 1).
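The simplest scene of Figure 6(c) and 6(d), where n = A and all streams have the same length, can be sketched as a round-robin interleaving of blocks. This is a minimal sketch under those assumptions; the function names are hypothetical, and the decoder is assumed to know n and B.

```python
def bab_encode_simple(streams, B):
    """Interleave equal-length streams block by block into one united array stream."""
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("this simple sketch assumes streams of equal length")
    united = bytearray()
    for offset in range(0, length, B):
        for stream in streams:  # one block from every row per round
            united.extend(stream[offset:offset + B])
    return bytes(united)

def bab_decode_simple(united, n, B):
    """Restore the n original streams from the united array stream."""
    length = len(united) // n
    outputs = [bytearray() for _ in range(n)]
    pos = 0
    for offset in range(0, length, B):
        size = min(B, length - offset)  # the last block of each row may be shorter
        for out in outputs:
            out.extend(united[pos:pos + size])
            pos += size
    return [bytes(out) for out in outputs]

rows = [b"AAAAaaaa", b"BBBBbbbb", b"CCCCcccc", b"DDDDdddd"]
united_stream = bab_encode_simple(rows, B=4)
```

After encoding, blocks from different streams sit next to each other, so a data window of a given size covers a slice of every stream instead of a long run of a single one.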
Figure 6 shows the simplest situation, n = A. In practice, the data window size is limited, so n cannot be too large; otherwise, the data in the window may be overdispersed, which decreases the compression ratio. Hence, n < A is usual, and each stream (to be encoded/decoded) has two statuses: active and inactive. A stream under BAB coding (i.e., in the block array rows) is active, while a stream in the queue (outside of the rows) is inactive. As shown in Figure 7, we divide the situations into two cases to simplify the algorithm design:
• Case 1: Stream[i].length and A are already known before encoding. For example, ComZip generates an XML (Extensible Markup Language) document at the beginning of the compression. Each file is regarded as a data stream, and the XML document stores the files’ catalog. As shown in Figure 7(a), this case indicates that all the streams are ready and none is absent during the encoding process, so the algorithm can use a fixed amount n to encode the streams. When an active stream ends, the next inactive stream in the queue is connected to its end. It is a seamless connection within the same block. In the decompression process, the streams can be separated by Stream[i].length.
• Case 2: Stream[i].length and A are unknown before encoding. For example, the sensors send video streams occasionally. If a sensor stops the current stream and then continues, we regard the continued stream as a new one. As shown in Figure 7(b), this case indicates that the real-time stream amount A is dynamic, and each length is unknown until Stream[i] ends. So the algorithm has a maximum amount Nx and maintains the dynamic amount n. When an active stream ends, n is decreased by 1. When n < Nx, the algorithm tries to get a new inactive stream from the queue and add it to the empty array row. If it succeeds, n is increased by 1.
Figure 7. BAB encoding algorithm design (n = 4).
The different cases in Figure 7 lead to different block structures. A block in Figure 7(a) contains pure data, while a block in Figure 7(b) has the structure shown in Table 4, in which a block header precedes the payload data. We design this block header to provide information for the BAB decoding algorithm so that it can separate the streams properly.
Table 4. Block structure for Case 2.
Type          Content              Length
Block header  Structure version    2 B
Block header  Payload length (B)   3 B
Block header  Stream ending tag    1 b
Block header  Reserved tags        7 b
Payload       Payload data         Fixed length (e.g., 1 MB)
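The header fields of Table 4 add up to 6 bytes (2 B + 3 B + 1 b + 7 b). The following minimal sketch shows one way such a header could be packed and parsed. The byte order (big-endian), the position of the stream ending tag (the most significant bit of the last byte), and the function names are assumptions made for illustration; the source does not specify them.

```python
import struct

HEADER_SIZE = 6  # 2 B version + 3 B payload length + 1 B of tag bits (Table 4)


def pack_header(version, payload_len, stream_ends):
    """Pack a Case 2 block header (field layout is an assumption)."""
    if not 0 <= version < 1 << 16:
        raise ValueError("structure version must fit in 2 bytes")
    if not 0 <= payload_len < 1 << 24:
        raise ValueError("payload length must fit in 3 bytes")
    # Assumed: ending tag in the top bit, the 7 reserved bits left as zero.
    flags = 0x80 if stream_ends else 0x00
    return struct.pack(">H", version) + payload_len.to_bytes(3, "big") + bytes([flags])


def unpack_header(header):
    """Return (version, payload_len, stream_ends) from a 6-byte header."""
    if len(header) != HEADER_SIZE:
        raise ValueError("header must be exactly 6 bytes")
    (version,) = struct.unpack(">H", header[:2])
    payload_len = int.from_bytes(header[2:5], "big")
    stream_ends = bool(header[5] & 0x80)
    return version, payload_len, stream_ends
```

A 3-byte length field can describe payloads of up to 2^24 − 1 bytes, which is enough for the 1 MB example block size.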
To focus on the core problems of compression ratio and speed, this paper omits the implementation of Case 2 and discusses the algorithms for Case 1 only. Algorithms 1 and 2 show the BAB encoding and decoding for Case 1.
Algorithm 1. BAB Encoding for Case 1.
Algorithm 2. BAB Decoding for Case 1.
To make the algorithm description clear, the implementation details are omitted. For example, since each stream length is known in Case 1, Algorithms 1 and 2 simply take the length control for granted: if the current stream ends, the input/output procedure gets only the actual data length, even if the fixed block length B is requested.
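The Case 1 behaviour described above (a fixed number of rows n, fixed block size B, and a queued stream joining seamlessly inside the block where an active stream ends) can be sketched as follows. This is an illustrative sketch and not a transcription of Algorithms 1 and 2; the helper name _schedule and the round-robin row order are assumptions. Because all lengths are known, the encoder and the decoder can replay the same schedule.

```python
from collections import deque


def _schedule(lengths, n, B):
    """Yield (stream_id, byte_count) pieces in the order blocks are emitted."""
    queue = deque(range(len(lengths)))   # inactive streams, in ID order
    remaining = list(lengths)
    rows = [None] * n                    # active stream ID per array row
    while True:
        emitted = False
        for r in range(n):               # one block per row, round-robin
            room = B
            while room > 0:
                sid = rows[r]
                if sid is None or remaining[sid] == 0:
                    if not queue:        # nothing left to attach to this row
                        rows[r] = None
                        break
                    rows[r] = queue.popleft()   # seamless join in same block
                    continue
                take = min(room, remaining[sid])
                remaining[sid] -= take
                room -= take
                emitted = True
                yield sid, take
        if not emitted:
            return


def bab_encode(streams, n, B):
    """Merge several byte streams into one united array stream."""
    offsets = [0] * len(streams)
    out = bytearray()
    for sid, take in _schedule([len(s) for s in streams], n, B):
        out += streams[sid][offsets[sid]:offsets[sid] + take]
        offsets[sid] += take
    return bytes(out)


def bab_decode(united, lengths, n, B):
    """Split the united array stream back into the original streams."""
    outs = [bytearray() for _ in lengths]
    pos = 0
    for sid, take in _schedule(lengths, n, B):
        outs[sid] += united[pos:pos + take]
        pos += take
    return [bytes(o) for o in outs]
```

For example, with n = 2 and B = 4, the streams b"AAAAAA", b"BB", b"CCCCC" and b"D" are emitted as the blocks AAAA, BBCC, AAD and CCC; the decoder needs only the four lengths, n and B to undo this.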
MLB and LZ77 Encoding The MLBs can accelerate BWT and LZ77 encoding. We have introduced CZ-BWT encoding in [6], which actually includes the MLB algorithm in its phase 1. This paper just focuses on the other MLB algorithm and LZ77 encoding with matching links. As shown in Figure 5(b), the MLB can use a bucket array to make endless matching links, such as the “ACO” link and “XYZ” link, and then provide them to the LZ77 encoder. Figure 5(c) shows the example of string matching in the LZ77 encoder. Following the “XYZ” link, we can see 2 matching points: 4 bytes matching and 5 bytes matching. More details about these algorithms are shown in Figures 8(a) and 8(b). There are 2 phases in our LZ77 encoding: (Figure 8(a)) building the matching links in the MLB and (Figure 8(b)) each time following a matching link to find the maximum matching length in LZ77 encoder. So, the outputting length/index code-words come from these maximum matching lengths and the corresponding matching point positions. When a matching point cannot be found, a single character code-word is outputted.
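The two phases described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: a Python dictionary stands in for the bucket array, the whole input is treated as one window, and the token format ('lit', byte) / ('match', distance, length) is hypothetical.

```python
def lz77_encode(data, k=3, window=1 << 16, min_len=3):
    """Two-phase LZ77 sketch: build matching links, then follow them."""
    # Phase 1: link[i] is the distance back to the previous position whose
    # first k bytes equal data[i:i+k]; 0 means "no earlier matching point".
    head = {}                       # k-byte key -> most recent position
    link = [0] * len(data)
    for i in range(len(data) - k + 1):
        key = data[i:i + k]
        prev = head.get(key)
        link[i] = i - prev if prev is not None else 0
        head[key] = i

    # Phase 2: at each position, walk the link and keep the longest match.
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        j = i
        while link[j] != 0:
            j -= link[j]
            if i - j > window:      # matching point is outside the window
                break
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the byte string from the tokens produced by lz77_encode."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):  # byte-wise copy allows overlapping matches
                out.append(out[-dist])
    return bytes(out)
```

Following a link visits only positions that already share k bytes with the current position, which is why the search avoids scanning the whole window.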
Figure 8. LZ77 string match with matching link (k = 3).
In Figure 8(a), the data stream and its matching links can be endless. But in practice, the data window is limited by the RAM of the sensor. While the data stream passes through, the data window slides in the stream buffer, and the parts of data out of the buffer will lose information so that we cannot follow the matching links beyond the data window. Cutting the links is slow, but we may simply distinguish whether the link pointer has exceeded the window bound. To locate the data window sliding on the stream buffer, we define the “straight” string s in the window as follows: (2) where buf[0...M − 1] is the stream buffer (M = 2N), pos is the current position of the data window on the stream buffer, s[i...i + 2] is a 24b integer (i = 0, 1, ⋯, N − 1,), and buf[i...i + 2] is also a 24b integer (i = 0, 1, ⋯, M − 3).
In [6], we have built the matching links for CZ-BWT, which are also called “bucket sorting links.” Now, we build the matching links for LZ77 on buf. Figure 8 shows the example of the “XYZ” link, and we see 3 matching points in it. The bucket array has 256³ link headers, and all links are endless. We define the structure of the links and their headers as follows:
(3)
(4)
k is the fixed match length of each matching point in the links. In the example of Figure 5, k = 3 for “ACO” and “XYZ”. i is the plain position in Equations (3) and (4). We can see that link[i] stores the distance (i.e., relative position) of the next matching point. Be aware that if the next matching point is out of the window bound, link[i] stores a useless value. The algorithm can distinguish this condition: Let the value null = M, if i − link[i] …

… Dmax, where Dmax = D(fo, ha) (Dmax is a function of the carrier frequency fo and antenna height ha). Considering fo = 900 MHz and BS height ha > 25 m, Dmax ≈ 1000 m (see [15]). Assuming a cell of radius D ≈ 500 m, and by referring to [15], we find that a = 2 for cells 1, 2 and 6, whereas a = 1 for cells 3, 4 and 5. Thus, in the simulations we ignore the interference from cells 1, 2 and 6 and only consider inter-cell interference from cells 3, 4 and 5, with a small loss in accuracy. With the model introduced in (20), the received STBC/MC-CDMA signal corresponds to
Beam Pattern Scanning (BPS) versus Space-Time Block Coding ...
(23)
where bc,k[i] and bc,k[i+1], i ∈ {0, 2, 4, …}, are consecutive information bits of the kth user in the cth cell for STBC. The remaining terms in (23) are the Rayleigh fade amplitudes due to antenna 0 and antenna 1 in the nth sub-carrier in the cth cell, their respective phases, and the Hadamard-Walsh spreading code for the kth user and nth sub-carrier; ψc is the long code of the nth sub-carrier in the cth cell, 1/(Rc)^a characterizes the long-term path loss, and n(t) is additive white Gaussian noise (AWGN).
Figure 8(a) represents network capacity simulation results generated considering MRC across time components in BPS and across space components in STBC (see [3] and [4]) and EGC across frequency components in both BPS and STBC. It is observed that a higher network capacity is achievable with BPS/MC-CDMA. For example, at a probability of error of 10⁻², BPS/MC-CDMA offers up to two-fold higher capacity. It is also observed that STBC/MC-CDMA offers better performance than the traditional MC-CDMA without diversity when the number of users in the cell is less than 80. However, as the number of users in the cell increases beyond 80, the performance of STBC/MC-CDMA becomes even worse than that of traditional MC-CDMA (i.e., MC-CDMA with an antenna array but without diversity benefits). This is because the STBC scheme discussed in this paper (see [1]) is designed to utilize MRC. It has been shown that MRC is the optimal combining scheme when there is only one user, while in a multiple access environment, MRC enhances the multiple access interference (MAI) and therefore degrades the performance of the system [16].
Figure 8. Capacity performance (a) STBC and BPS, and (b) STTC and BPS.
Considering STTC-QPSK, with assumption (d), the STTC-QPSK/MC-CDMA received signal corresponds to
(24)
Here s0,k and s1,k are the kth user's information bits transmitted from antenna 0 and antenna 1, respectively. The remaining terms in (24) are the Rayleigh fade amplitudes due to antenna 0 and antenna 1 in the nth sub-carrier, their respective phases, and the Hadamard-Walsh spreading code for the kth user and nth sub-carrier; n(t) is additive white Gaussian noise (AWGN).
Network capacity simulations for STTC-QPSK are generated assuming EGC across time components (in BPS), space components (in STTC-QPSK) and frequency components for BPS and STTC-QPSK [Figure 8(b)]. Figure 8(b) represents STTC versus BPS-QPSK simulation results. This figure shows that BPS-QPSK is superior to both STTC-QPSK and QPSK without diversity. In this simulation, BPS-QPSK leads to significantly better capacity due to the time diversity induced by beam-pattern movement and the frequency diversity inherent in MC-CDMA. The results also show that QPSK performance is superior to that of STTC-QPSK. This agrees with the FER simulation results in Figure 5(b), where QPSK is better than STTC-QPSK at low SNRs (e.g., at SNR = 10 dB). This is because STTC-QPSK is designed under the assumption of sufficiently high SNR values; thus, it is less efficient than QPSK at low SNRs [17]. (The capacity curve for higher SNR values may lead to better STTC-QPSK performance compared to QPSK; however, STTC-QPSK shows lower performance than BPS-QPSK for all SNRs.) Thus, it is observed that a higher network capacity is achievable via BPS/MC-CDMA. It is also worth mentioning that STTC-QPSK performance can be significantly improved via interference suppression/cancellation techniques at the cost of system complexity, as discussed in [19–21]. In this paper, we conducted the comparison without adding complexity to the STTC scheme through interference suppression algorithms.
Simulations confirm that BPS offers superior network capacity compared to STC schemes; however, there are two issues associated with the BPS scheme: 1) the diversity achievable via BPS changes with distance: the greater the distance of the mobile from the BS, the higher the diversity and network capacity [10]. It
is notable that, in general, the average number of users located in constant-width annuli (with the BS at the center) increases as the distance from the BS increases; and 2) BPS works only in urban areas (or in rich scattering environments); but, because a high network capacity is only required in urban areas, this is not a critical issue. Moreover, BPS can also be merged with STC techniques, e.g., via the structure shown in Figure 4. In this case, the traditional antenna arrays are replaced with time-varying weight vector antenna arrays to direct and move the antenna pattern. Another approach for merging BPS with STBC is introduced in [18]. Nevertheless, it is worth mentioning that the BPS scheme achieves its probability-of-error performance and network capacity benefits with relatively low complexity. This makes BPS a prominent scheme for future wireless generations with smart antennas. However, the spectrum efficiency of BPS is about 5% less than that of STC, which is a minimal disadvantage compared to the benefits created by the BPS technique.
CONCLUSIONS A comparison was performed between STBC, STTC-QPSK and BPS transmit diversity techniques in terms of network capacity, BER/FER performance, spectrum efficiency, complexity and antenna dimensions. BER performance and network capacity simulations were generated for the BPS, STBC, and STTC schemes. This comparison shows that the BPS transmit diversity scheme is superior to both the STBC and STTC-QPSK schemes: a) the BS physical antenna dimensions of BPS are much smaller than those of STC techniques, and b) the BER/FER performance and network capacity of BPS are much higher than those of STC schemes. The complexity of the BPS system is minimal because the complexity is mainly located at the BS, and the receiver complexity is low because all the diversity components enter the receiver serially in time. In terms of spectrum efficiency, both STC schemes outperform the BPS scheme by a very small percentage (e.g., in the order of 5%). The BPS scheme introduces a small bandwidth expansion due to the movement in the beam pattern, which eventually results in a lower throughput per bandwidth.
REFERENCES
1. S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, Vol. 16, No. 8, pp. 1451–1458, 1998.
2. V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Transactions on Information Theory, Vol. 45, No. 5, pp. 1456–1467, July 1999.
3. V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: Performance criterion and code construction,” IEEE Transactions on Information Theory, Vol. 44, pp. 744–765, March 1998.
4. V. Tarokh, A. F. Naguib, N. Seshadri, and A. Calderbank, “Space-time codes for high data rate wireless communications: Performance criteria in the presence of channel estimation errors, mobility, and multiple paths,” IEEE Transactions on Communications, Vol. 47, No. 2, February 1999.
5. A. F. Naguib, V. Tarokh, N. Seshadri, and A. R. Calderbank, “A space-time coding modem for high-data-rate wireless communications,” IEEE Journal on Selected Areas in Communications, Vol. 16, No. 8, October 1998.
6. N. Seshadri and J. H. Winters, “Two signaling schemes for improving the error performance of frequency division-duplex transmission system using transmitter antenna diversity,” International Journal Wireless Information Networks, Vol. 1, No. 1, pp. 49–60, January 1994.
7. J. H. Winters, “The diversity gain of transmit diversity in wireless systems with Rayleigh fading,” in Proceedings of the 1994 ICC/SUPERCOMM, New Orleans, Vol. 2, pp. 1121–1125, May 1994.
8. R. W. Heath, S. Sandhu, and A. J. Paulraj, “Space-time block coding versus space-time trellis codes,” Proceedings of IEEE International Conference on Communications, Helsinki, Finland, June 11–14, 2001.
9. V. Tarokh, A. Naguib, N. Seshadri, and A. R. Calderbank, “Combined array processing and space-time coding,” IEEE Transactions on Information Theory, Vol. 45, No. 4, pp. 1121–1128, May 1999.
10. S. A. Zekavat and C. R. Nassar, “Antenna arrays with oscillating beam patterns: Characterization of transmit diversity using semi-elliptic coverage geometric-based stochastic channel modeling,” IEEE Transactions on Communications, Vol. 50, No. 10, pp. 1549–1556, October 2002.
11. S. A. Zekavat, C. R. Nassar, and S. Shattil, “Oscillating beam adaptive antennas and multi-carrier systems: Achieving transmit diversity, frequency diversity and directionality,” IEEE Transactions on Vehicular Technology, Vol. 51, No. 5, pp. 1030–1039, September 2002.
12. S. A. Zekavat and C. R. Nassar, “Achieving high capacity wireless by merging multi-carrier CDMA systems and oscillating-beam smart antenna arrays,” IEEE Transactions on Vehicular Technology, Vol. 52, No. 4, pp. 772–778, July 2003.
13. P. K. Teh and S. A. Zekavat, “A merger of OFDM and antenna array beam pattern scanning (BPS): Achieving directionality and transmit diversity,” accepted in IEEE 37th Asilomar Conference on Signals, Systems and Computers, November 9–12, 2003.
14. J. W. C. Jakes, Microwave Mobile Communications, New York, Wiley, 1974.
15. A. J. Rustako, N. Amitay, G. J. Owens, and R. S. Roman, “Radio propagation at microwave frequencies for line-of-sight microcellular mobile and personal communications,” IEEE Transactions on Vehicular Technology, Vol. 40, No. 1, pp. 203–210, February 1991.
16. J. M. Auffray and J. F. Helard, “Performance of multicarrier CDMA technique combined with space-time block coding over Rayleigh channel,” IEEE 7th International Symposium on Spread-Spectrum Technology, Vol. 2, pp. 348–352, September 2–5, 2002.
17. A. G. Amat, M. Navarro, and A. Tarable, “Space-time trellis codes for correlated channels,” IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, Germany, December 14–17, 2003.
18. S. A. Zekavat and P. K. Teh, “Beam-pattern-scanning dynamic-time block coding,” Proceedings of Wireless Networking Symposium, The University of Texas at Austin, October 22–24, 2003.
19. B. Lu and X. D. Wang, “Iterative receivers for multiuser space-time coding systems,” IEEE Journal on Selected Areas in Communications, Vol. 18, No. 11, pp. 2322–2335, November 2000.
20. E. Biglieri, A. Nordio, and G. Taricco, “Suboptimum receiver interfaces and space-time codes,” IEEE Transactions on Signal Processing, Vol. 51, No. 11, pp. 2720–2728, November 2003.
21. H. B. Li and J. Li, “Differential and coherent decorrelating multiuser receivers for space-time-coded CDMA systems,” IEEE Transactions on Signal Processing, Vol. 50, No. 10, pp. 2529–2537, October 2002.
Chapter 8
PARTIAL FEEDBACK BASED ORTHOGONAL SPACE-TIME BLOCK CODING WITH FLEXIBLE FEEDBACK BITS
Lei Wang and Zhigang Chen
School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China.
ABSTRACT The conventional orthogonal space-time block code (OSTBC) with limited feedback has a fixed number of p−1 feedback bits for the specific nTp transmit antennas. A new partial feedback based OSTBC which provides flexible feedback bits is proposed in this paper. The proposed scheme inherits the simple decoder and the full diversity of OSTBC and, moreover, preserves full data rate. Simulation results show that for nTp transmit antennas, the proposed scheme has performance similar to the conventional one when using p−1 feedback bits, and better performance with more feedback bits.
Keywords: MIMO, Transmit Diversity, Space-time Block Coding, Partial Feedback
Citation: Wang, L. and Chen, Z. (2013), “Partial Feedback Based Orthogonal Space-Time Block Coding With Flexible Feedback Bits”. Communications and Network, 5, 127-131. doi: 10.4236/cn.2013.53B2024.
Copyright: © 2013 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0
INTRODUCTION Orthogonal space-time block coding (OSTBC) is a simple and effective transmission paradigm for MIMO systems, because it achieves full diversity with low complexity [1]. One of the most effective OSTBC schemes is the Alamouti code [2] for two transmit antennas, which has been adopted as the open-loop transmit diversity scheme by current 3GPP standards. However, the Alamouti code is the only rate-one OSTBC scheme [3]. With a higher number of transmit antennas, the OSTBC for complex constellations suffers a rate loss. To address this drawback, open-loop solutions have been presented, such as the quasi-OSTBC (QOSTBC) [4] with rate one for four transmit antennas, and other STBC schemes [5,6] with full rate and full diversity. Alternatively, closed-loop solutions have also been designed to improve the performance of OSTBC by exploiting limited channel information feedback at the transmitter. In this paper, we focus on the closed-loop scheme. Based on the group-coherent code, the p−1 bits feedback based OSTBC for nTp transmit antennas has been constructed in [7], and generalized to an arbitrary number of receive antennas in [8]. The partial feedback based schemes in [7,8] exhibit a higher diversity order while preserving low decoding complexity. However, these schemes for nTp transmit antennas require a fixed number of p−1 feedback bits. That is to say, for such a scheme, improving the performance by increasing the feedback bits implies that the number of transmit antennas nTp must be increased at the same time. Therefore, the scheme is inflexible in trading off performance against feedback overhead. In this paper, by multiplying each signal to be transmitted from each antenna by a well-designed feedback vector, we propose a novel partial feedback based OSTBC scheme with flexible feedback bits. In this scheme, the OSTBC can be directly extended to more than two antennas. Importantly, we can show that the proposed scheme preserves the simple decoding structure of OSTBC, full diversity and full data rate.
Notations: Throughout this paper, (⋅)T and (⋅)H represent “transpose” and “Hermitian”, respectively. Re(a) denotes the real part of a complex number a, and .
PROPOSED CODE CONSTRUCTION AND SYSTEM MODEL Consider a MIMO system with nTp transmit and nR receive antennas. Assume we have an OSTBC for nT transmit antennas, in which cm is the T × 1 signal to be transmitted from the mth antenna (m = 1, …, nT). Then a code to be transmitted from nTp antennas, where p ≥ 2 is an integer, may be constructed as
(1)
where the 1 × nTp feedback vector for the mth antenna is defined through the Kronecker product ⊗ of the mth row of the nT × nT identity matrix and the 1 × p vector bl, which is given by
(2)
For the feedback vector at the mth antenna, the index l selects one of all Q^(p−1) possible feedback vectors. With the transmission of the T × nTp code matrix, the T × nR received signal can be written as
(3)
where H is the nTp × nR channel matrix, and N is the T × nR complex Gaussian noise matrix. The entries of H and N are independent samples of a zero-mean complex Gaussian random variable with variance 1 and nTp/ρ, respectively, where ρ is the average signal-to-noise ratio (SNR) at each receive antenna.
LINEAR DECODER AT THE RECEIVER The received signal at the ith receive antenna can be rewritten as
(4)
where the nT × nTp matrix bl is composed of nT feedback vectors and can be expressed in a stacked form. We divide the nTp × 1 channel vector hi into nT segments in the following way
(5)
where each segment has dimension p × 1. Then the equivalent channel vector in (4) has the form of
(6)
For convenience, we will use the Alamouti code as the basic OSTBC matrix in the rest of this paper, and the results can be directly extended to other OSTBCs. For the received signal in (4), after performing the conjugate operation on the second entry of yi, the received signal yi can be equivalently expressed as
(7)
where the equivalent channel matrix corresponds to the entries of the equivalent channel vector and their conjugates, and x = [s1 s2]T holds the pair of symbols in the Alamouti code. Denoting the kth entry of the equivalent channel vector accordingly, and according to the linearity of the OSTBC [9], the equivalent channel matrix has the form
(8)
where the matrices Ck and Dk specifying the Alamouti code are defined in [9]. Since matched filtering is the first step in the detection process, left-multiplying the received signal in (7) by the Hermitian of the equivalent channel matrix will yield
(9)
Due to the properties of Ck and Dk for the Alamouti code, we get
(10)
where the scalar factor denotes the equivalent channel gain for receive antenna i. It is clear that the resulting matrix is diagonal; therefore, the simple decoder of OSTBC can be directly applied to (7), and thus s1 and s2 can be decoded independently.
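The decoupling argument can be checked numerically for the Alamouti code. The sketch below takes two equivalent channel scalars h1 and h2 as given (in the proposed scheme they would come from (6)); the function names and the noiseless setting are assumptions for illustration. After conjugating the second received sample, the matched filter output equals the channel gain |h1|² + |h2|² times each symbol, with no cross term.

```python
def alamouti_receive(h1, h2, s1, s2):
    """Noiseless Alamouti reception over two symbol periods.

    Period 1 sends (s1, s2); period 2 sends (-conj(s2), conj(s1)).
    The second sample is conjugated, as in the step leading to (7).
    """
    y1 = h1 * s1 + h2 * s2
    y2 = -h1 * s2.conjugate() + h2 * s1.conjugate()
    return [y1, y2.conjugate()]


def alamouti_decode(h1, h2, y):
    """Matched filtering followed by scaling with the channel gain."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    # Rows of the Hermitian of [[h1, h2], [conj(h2), -conj(h1)]] applied to y.
    z1 = h1.conjugate() * y[0] + h2 * y[1]
    z2 = h2.conjugate() * y[0] - h1 * y[1]
    return [z1 / gain, z2 / gain]


def off_diagonal(h1, h2):
    """Cross term of the Gram matrix of the equivalent channel (should be 0)."""
    return h1.conjugate() * h2 - h2 * h1.conjugate()
```

Because the cross term vanishes, each symbol can be sliced on its own, which is the "simple decoder" property referred to above.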
FEEDBACK BITS SELECTION AND PROPERTIES In this section, we will discuss the feedback bits selection criterion and the key properties of the proposed scheme.
Feedback Bits Selection At the ith receive antenna, the equivalent channel gain can be expressed in the following quadratic form
(11)
For all the nR receive antennas, the total channel gain is then given by
(12)
It is clear that, in order to improve the system performance, we must feed back the specific l, with (p−1)log Q bits, that provides the largest γl. The entries of Al are defined with b0 = 0 preset. Then the quadratic form in (11) can be represented as
(13)
where gik(n) denotes the nth element in gik. Substituting (13) in (11) and (12) leads to
(14)
Thus, the (p−1)log Q feedback bits will be selected as
(15)
In this way, we can choose the optimal feedback vector bl and further construct the feedback vector for the mth transmit antenna.
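The selection rule amounts to picking, among all Q^(p−1) candidates, the feedback vector with the largest total channel gain. The sketch below does this by exhaustive search. Since the expressions for (2) and (11) to (15) are not legible in this text, two things are assumed here and may differ from the paper: the vector bl is taken as [1, e^(j2πq1/Q), …, e^(j2πq(p−1)/Q)], and the gain is taken as the sum over receive antennas and channel segments of |bl · segment|². All function names are hypothetical.

```python
import cmath
from itertools import product


def feedback_vector(q_indices, Q):
    """Assumed form of b_l: a leading 1 followed by p-1 Q-ary phase terms."""
    return [1.0 + 0j] + [cmath.exp(2j * cmath.pi * q / Q) for q in q_indices]


def total_gain(b, h_rows, p):
    """Assumed gain: sum of |b . segment|^2 over antennas and segments."""
    gain = 0.0
    for h in h_rows:                    # one length-nT*p vector per receive antenna
        for m in range(0, len(h), p):   # one p-element segment per OSTBC antenna
            g = sum(bj * hj for bj, hj in zip(b, h[m:m + p]))
            gain += abs(g) ** 2
    return gain


def select_feedback(h_rows, p, Q):
    """Exhaustive search over all Q**(p-1) candidate feedback vectors."""
    best = max(product(range(Q), repeat=p - 1),
               key=lambda idx: total_gain(feedback_vector(idx, Q), h_rows, p))
    return best, total_gain(feedback_vector(best, Q), h_rows, p)
```

With the channel [1, j, 1, j] (nT = 2, p = 2, one receive antenna), a 1-bit search (Q = 2) reaches a gain of 4, while a 2-bit search (Q = 4) finds the phase that aligns both segments and reaches 8, illustrating why more feedback bits can help.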
Diversity Analysis The key property of the proposed partial feedback based OSTBC scheme is proved in the following.
Property 1: The partial feedback based OSTBC in (1) can achieve full diversity.
Proof: For simplicity, we denote L = Q^(p−1). Selecting the optimal lopt will provide the largest channel gain, which can be lower bounded by
(16)
For the summed matrix, it is clear that its diagonal elements are equal to L, and its non-diagonal elements have the form of
(17)
which is reduced to
(18)
Therefore, the result can be substituted into (16), which yields
(19)
Since the lower bound of the channel gain provides full diversity of nTpnR, the proposed scheme can certainly guarantee full diversity.
Configuration of Flexible Feedback Bits Furthermore, the proposed scheme has flexible feedback bits. For a specific p, the scheme has (p−1)log Q feedback bits. However, for a number of bits not equal to (p−1)log Q, we can rewrite the vector bl in (2) with a separate constellation size Qj for each of its p−1 phase terms; thus the number of feedback bits is the sum of log2 Qj over j = 1, …, p−1. For example, for nT = 2 and p = 4, the number of feedback bits is 3 and 6 in the case of Q = 2 and Q = 4, respectively. If we set Q1 = Q2 = 2 and Q3 = 4 in bl, then the number of feedback bits is 4, and if we set Q1 = 2 and Q2 = Q3 = 4 in bl, then the number of feedback bits is 5, and so on.
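The bit counts in the example can be reproduced with a one-line calculation. In this small sketch the helper name feedback_bits is hypothetical and each Qj is assumed to be a power of two, which is the case in all the examples given.

```python
def feedback_bits(q_sizes):
    """Total feedback bits: the sum of log2(Q_j) over the p-1 phase terms."""
    bits = 0
    for q in q_sizes:
        if q < 1 or q & (q - 1):
            raise ValueError("each Q_j is assumed to be a power of two")
        bits += q.bit_length() - 1   # exact log2 for powers of two
    return bits
```

For p = 4, the settings (2, 2, 2), (2, 2, 4), (2, 4, 4) and (4, 4, 4) give 3, 4, 5 and 6 bits, so every value between the two uniform configurations is reachable.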
BER Analysis Assuming the power of each symbol in x = [s1 s2]T is normalized to unity for i = 1, 2, we can obtain the form of the average SNR per bit. Furthermore, assuming QPSK modulation and maximum likelihood (ML) decoding are used in the considered system, the conditional BER is given by
(20)
By using (16), the upper bound of the conditional BER can be formulated as
(21)
Using the technique of the moment generating function (MGF) [10], the average BER can be expressed as
(22)
in terms of the MGF of η. The average BER can be further expressed as
(23)
Using the result of (5A.4) in [10], this definite integral has the closed form of
(24)
SIMULATION RESULTS In all simulations, we consider QPSK symbols in the Alamouti code and a single receive antenna (nR = 1), where the channels are assumed to be independent and identically distributed (i.i.d.) quasi-static Rayleigh flat-fading channels. In Figure 1, we plot the bit error rate (BER) performance of the generalized partial feedback based OSTBC scheme in [7,8] (“GPF” for short) and the proposed flexible feedback bits scheme (“FFB” for short) with nTp = 4 transmit antennas. For this case p = 2, and the GPF scheme can only feed back 1 bit, whereas the proposed scheme can feed back more bits to improve the system performance. For comparison, in Figure 1 we also give the BER curves of the complex orthogonal code for four transmit antennas [11], and the numerical results of the upper bound in (24) of the proposed scheme. Figure 1 shows that with 1 bit of feedback, the GPF and FFB schemes have close performance, whereas the FFB scheme has better performance with more feedback bits. Both schemes perform better than the complex orthogonal code.
Figure 1. BER performance of the two schemes with nTp = 4 transmit antennas.
In Figure 2, the BER performance of the two schemes with nTp = 8 transmit antennas is depicted. For this case p = 4, and the GPF scheme can only feed back 3 bits, whereas the proposed FFB scheme can feed back more bits. We can observe that with the same 3 feedback bits, the two schemes have very similar performance, and with more feedback bits, the proposed FFB scheme can further improve the performance. In the simulations of these two schemes, an exhaustive search over all possible feedback vectors is used.
Figure 2. BER performance of the two schemes with nTp = 8 transmit antennas.
CONCLUSIONS In this paper, we proposed a partial feedback based OSTBC scheme with flexible feedback bits. The new scheme inherits the OSTBC properties of achieving full diversity, preserving low decoding complexity, and has full rate. Moreover, compared with the conventional partial feedback based OSTBC schemes, the new scheme can support flexible feedback bits and can improve the system performance with more feedback bits.
REFERENCES
1. V. Tarokh, H. Jafarkhani and A. R. Calderbank, “Space-time Block Codes from Orthogonal Designs,” IEEE Transactions on Information Theory, Vol. 45, No. 5, 1999, pp. 1456-1467. doi:10.1109/18.771146
2. S. M. Alamouti, “A Simple Transmitter Diversity Scheme for Wireless Communications,” IEEE Journal on Selected Areas in Communications, Vol. 16, No. 8, 1998, pp. 1451-1458. doi:10.1109/49.730453
3. S. Sandhu and A. J. Paulraj, “Space-time Block Codes: A Capacity Perspective,” IEEE Communications Letters, Vol. 4, No. 12, 2000, pp. 384-386. doi:10.1109/4234.898716
4. H. Jafarkhani, “A Quasi-orthogonal Space-time Block Code,” IEEE Transactions on Communications, Vol. 49, No. 1, 2001, pp. 1-4. doi:10.1109/26.898239
5. W. Su and X. G. Xia, “Signal Constellations for Quasi-orthogonal Space-time Block Codes with Full Diversity,” IEEE Transactions on Information Theory, Vol. 50, 2004, pp. 2331-2347. doi:10.1109/TIT.2004.834740
6. X. L. Ma and G. B. Giannakis, “Full-diversity Full-rate Complex-field Space-time Coding,” IEEE Transactions on Signal Processing, Vol. 51, No. 11, 2003, pp. 2917-2930. doi:10.1109/TSP.2003.818206
7. J. Akhtar and D. Gesbert, “Extending Orthogonal Block Codes with Partial Feedback,” IEEE Transactions on Wireless Communications, Vol. 3, No. 6, 2004, pp. 1959-1962. doi:10.1109/TWC.2004.837469
8. A. Sezgin, G. Altay and A. Paulraj, “Generalized Partial Feedback Based Orthogonal Space-time Block Coding,” IEEE Transactions on Wireless Communications, Vol. 8, No. 6, 2009, pp. 2771-2775. doi:10.1109/TWC.2009.080352
9. B. Hassibi and B. M. Hochwald, “High-rate Codes That Are Linear in Space and Time,” IEEE Transactions on Information Theory, Vol. 48, No. 7, 2002, pp. 1804-1824. doi:10.1109/TIT.2002.1013127
10. M. K. Simon and M. S. Alouini, “Digital Communication over Fading Channels,” John Wiley & Sons Inc., 2000.
11. G. Ganesan and P. Stoica, “Space-time Block Codes: A Maximum SNR Approach,” IEEE Transactions on Information Theory, Vol. 47, No. 4, 2001, pp. 1650-1656. doi:10.1109/18.923754
Chapter 9
RATELESS SPACE-TIME BLOCK CODES FOR 5G WIRELESS COMMUNICATION SYSTEMS
Ali Alqahtani
College of Applied Engineering, King Saud University, Riyadh, Saudi Arabia
ABSTRACT This chapter presents a rateless space-time block code (RSTBC) for massive multiple-input multiple-output (MIMO) wireless communication systems. We discuss the principles of rateless coding compared to fixed-rate channel codes. A literature review of rateless codes (RCs) is also given. Furthermore, the chapter illustrates the basis of RSTBC deployments in massive MIMO transmissions over lossy wireless channels. In such channels, data may be lost or may not be decodable at the receiver end due to a variety of factors such as channel losses or pilot contamination. Massive MIMO is a breakthrough wireless transmission technique proposed for future wireless standards due to its spectrum and energy efficiencies. We show that RSTBC guarantees the reliability of the system in such highly lossy channels. Moreover, pilot contamination (PC) constitutes a particularly significant impairment in reciprocity-based multi-cell systems. PC results from the non-orthogonality of the pilot sequences in different cells. In this chapter, RSTBC is also employed in the downlink transmission of a multi-cell massive MIMO system to mitigate the effects of signal-to-interference-and-noise ratio (SINR) degradation resulting from PC. We conclude that RSTBC can effectively mitigate such interference. Hence, RSTBC is a strong candidate for the upcoming 5G wireless communication systems.
Keywords: massive MIMO, rateless codes, STBC, pilot contamination, 5G
Citation: Ali Alqahtani, “Rateless Space-Time Block Codes for 5G Wireless Communication Systems”, Intech Open - The Fifth Generation (5G) of Wireless Communication, 2018, DOI: 10.5772/intechopen.74561.
Copyright: © 2018 the Author(s) and IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
INTRODUCTION
In practice, data transmitted over a channel are usually affected by noise, interference, and fading. Several channel models have been introduced, such as the additive white Gaussian noise (AWGN) channel, the binary symmetric channel (BSC), the binary erasure channel (BECH), the wireless fading channel, and the lossy (or erasure) channel. In each of them, an error (or loss) control technique is required to reduce the errors (or losses) caused by the channel impairments [1]. This technique is called channel coding, which is a main part of digital communication theory. A historical perspective on channel coding is given in [2]. Generally speaking, channel coding, characterized by a code rate, adds controlled redundancy to the data in order to detect and/or correct errors and, hence, achieve reliable delivery of digital data over unreliable communication channels. Error correction may generally be realized by two different error control techniques: forward error correction (FEC) and backward error correction (BEC). The former removes the need for data retransmission, while the latter is widely known as automatic repeat request (or sometimes automatic retransmission query) (ARQ). For large amounts of data, a large number of errors will occur, which makes it difficult for FEC to work reasonably. Under such conditions, the ARQ technique requires more retransmissions, which causes significant growth in power consumption. Both techniques thus incur additional overhead, in the form of data retransmission and of redundancy added to the original data, and they cannot correctly decode the source data when the packet loss rate is high [3]. Therefore, it is of significant interest to design a simple channel code with a flexible code rate and capacity-approaching behavior to achieve robust and reliable transmission over universal lossy channels. Rateless codes constitute such a class of schemes. We describe the concept of rateless coding in the next section.
CONCEPT OF RATELESS CODES
Rateless codes (RCs) are channel codes that encode data into incrementally redundant codewords for transmission over channels with a variable packet error rate. The term “rateless” means that the code does not fix its code rate before transmission. Rather, the rate can only be determined after the transmitted data have been correctly recovered. In the available literature, a rateless code is typically referred to by associated terms such as “variable-rate,” “rate-compatible,” “adaptive-rate,” or “incremental redundancy” scheme [4]. The rate of a rateless code can be considered from two perspectives: the instantaneous rate and the effective rate. The instantaneous rate is the ratio of the number of information bits to the total number of bits transmitted at a specific instant. The effective rate, on the other hand, is the rate realized at the specific point when the codeword has been successfully received [5]. The counterpart of rateless coding is fixed-rate coding, which is well known in the literature on channel codes. The relationship between rateless and fixed-rate channel codes can be seen as the correspondence between continuous and discrete signals, or as the construction of a video clip from video frames. In this analogy, the fixed-rate code corresponds to the discrete signal or to the video frame, while the rateless code is viewed as the continuous signal or the video clip [5]. Basically, rateless codes are proposed to solve the problem of data packet losses. They can continuously generate a potentially unlimited stream of encoded data until an acknowledgment is received from the receiver declaring successful decoding. The basic concept of rateless codes is illustrated in Figure 1 [1]. In the figure, a total of kc packets, obtained by fragmenting the source data, are encoded by the transmitter into a large number nc of encoded packets. Due to the lossy channel, several encoded packets are lost during the transmission, and finally, only rc encoded packets are collected by the receiver. The decoding process on the received packets should be able to recover all kc original packets.
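The two notions of rate and the packet counts kc and rc described above can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the channel is modeled as independent packet erasures with probability `loss_rate`, and the function names and the numbers used (100 source packets, 105 collected packets, 25% loss) are hypothetical choices for illustration, not values from the chapter.

```python
import random


def instantaneous_rate(k_c: int, sent: int) -> float:
    """Ratio of information packets to packets transmitted so far."""
    return k_c / sent


def send_until_collected(k_c: int, r_c: int, loss_rate: float,
                         rng: random.Random) -> int:
    """Count the transmissions needed until the receiver holds r_c packets.

    Each encoded packet is erased independently with probability loss_rate
    (an illustrative channel model, not taken from the chapter).
    """
    if r_c < k_c:
        raise ValueError("the receiver needs at least k_c packets")
    sent = collected = 0
    while collected < r_c:
        sent += 1
        if rng.random() >= loss_rate:  # the packet survived the channel
            collected += 1
    return sent


rng = random.Random(0)
n_sent = send_until_collected(k_c=100, r_c=105, loss_rate=0.25, rng=rng)
# The effective rate is the instantaneous rate at the moment of success.
effective_rate = instantaneous_rate(100, n_sent)
print(n_sent, round(effective_rate, 3))
```

The effective rate is known only after the loop ends, which is the sense in which the code rate is not fixed before transmission.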
Figure 1. General encoding and decoding processes of rateless codes [1].
To illustrate the importance of rateless codes, let us assume that we have a fixed-rate code Cfixed of code rate Rfixed designed to achieve a performance close to the channel capacity target Ctarget at a specific signal-to-noise ratio (SNR), ϕfixed. Channel fluctuations, however, impose two limitations on the fixed-rate code [1]. First, if the actual SNR at the receiver is greater than ϕfixed, then the code becomes an inefficient channel code, because it incorporates more redundancy than the actual channel conditions require. Second, if the actual SNR becomes lower than ϕfixed, then the channel will be in outage, because the fixed-rate code Cfixed no longer provides sufficient redundancy for the actual channel conditions. In contrast to a fixed-rate code, a rateless code has a flexible code rate that follows the channel quality, which is time varying in nature. Another benefit of RCs is that they potentially do not require channel state information (CSI) at the transmitter. This property is of particular importance in the design of codes for wireless channels. In particular, RCs can be employed in multi-cell cellular systems when channel estimation errors severely degrade the performance.
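The two limitations of a fixed-rate code can be illustrated numerically. The sketch below assumes the Shannon formula log2(1 + SNR) as an idealized stand-in for the rate a channel supports; this model, the function names, and the example values are assumptions made for illustration and do not come from the chapter.

```python
import math


def capacity(snr: float) -> float:
    """Shannon capacity in bits per channel use for a linear SNR (assumed model)."""
    return math.log2(1.0 + snr)


def fixed_rate_margin(r_fixed: float, snr: float) -> float:
    """Capacity minus the fixed code rate.

    A positive value means redundancy is wasted (inefficient code);
    a negative value means the channel is in outage.
    """
    return capacity(snr) - r_fixed


def rateless_channel_uses(message_bits: float, snr: float) -> int:
    """Idealized number of channel uses a rateless code keeps transmitting for."""
    return math.ceil(message_bits / capacity(snr))


for snr in (1.0, 3.0, 15.0):
    print(snr, fixed_rate_margin(2.0, snr), rateless_channel_uses(100, snr))
```

With a fixed rate of 2 bits per channel use, the margin is negative at SNR 1 (outage) and positive at SNR 15 (wasted redundancy), whereas the rateless transmission simply lasts longer or shorter.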
RATELESS CODING AND HYBRID AUTOMATIC RETRANSMISSION QUERY
Hybrid ARQ (HARQ) and rateless codes are analogous in that both transmit additional symbols until the received information is successfully decoded. They do, however, have some differences. HARQ refers to a special transmission mechanism that combines the conventional ARQ principle with error control coding. The three basic ARQ protocols are stop-and-wait ARQ, go-back-N ARQ, and selective repeat ARQ. All three ARQ protocols use the sliding window protocol to inform the transmitter which data frames or symbols should be retransmitted. Figure 2 illustrates the ARQ schemes: stop-and-wait ARQ (half duplex), continuous ARQ with pullback (full duplex), and continuous ARQ with selective repeat (full duplex). In each of them, time advances from left to right [6].
Figure 2. Automatic repeat request (ARQ) [6]. (a) Stop and wait ARQ (half duplex); (b) continuous ARQ with pullback (full duplex); (c) continuous ARQ with selective repeat (full duplex).
These protocols reside in the data link or transport layers of the open systems interconnection (OSI) model. This is one difference between the proposed RSTBC and HARQ, since RSTBC is employed in the physical layer. Comparing rateless codes to HARQ, we summarize the following points:
• RC is often viewed in the literature as a form of continuous incremental redundancy HARQ [5].
• HARQ is not capable of working over the entire SNR range, and therefore, it must be combined with some form of adaptive modulation and coding (AMC). RC, on the other hand, can entirely eliminate AMC and work over a wide range of SNR [7].
• From the point of view of redundancy, HARQ has more redundancy, since every packet transmission requires an acknowledgment (ACK) or a negative acknowledgment (NACK) in return to indicate successful or unsuccessful decoding, respectively. In contrast, only a single-bit acknowledgment is needed for the transmission of a message with RC [8].
• When the number of receivers is large, ARQ acknowledgments may cause significant delays and bandwidth consumption. Consequently, using ARQ for wireless broadcast is not scalable [9]. It was seen in [8] that RC is capable of outperforming ARQ completely at low SNRs in broadcast communication. However, the two behave the same in point-to-point communication as well as in high-SNR broadcast communication.
• RC and the basic ARQ differ in code construction. RCs can generate different redundant blocks, while ARQ merely retransmits the same block [8]. Different receivers often encounter distinct and independent errors. In such cases, a retransmitted data packet is useful only to a specific user and has no value for the others. Hence, it is highly undesirable to resend the respective erroneous data frames or symbols to each user.
• Physical-layer RCs are useful since the decoder can exploit useful information from packets that are dropped by ARQ protocols in higher layers [7].
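The difference in code construction noted in the comparison above can be sketched for a broadcast setting. The following toy model rests on simplifying assumptions: erasures are independent per receiver and per slot, the ARQ transmitter repeats a packet until every receiver holds it, and the rateless code is idealized so that any k received packets suffice. The function names and parameters are hypothetical, and feedback cost is not modeled.

```python
import random


def erasure_table(slots: int, receivers: int, loss_rate: float, seed: int) -> list:
    """lost[t][r] is True when the packet sent in slot t is lost at receiver r."""
    rng = random.Random(seed)
    return [[rng.random() < loss_rate for _ in range(receivers)]
            for _ in range(slots)]


def arq_broadcast_slots(k: int, lost: list) -> int:
    """Repeat each source packet until every receiver holds it, then move on."""
    receivers = len(lost[0])
    slot = 0
    for _ in range(k):
        missing = set(range(receivers))
        while missing:
            # A repeat only helps the receivers that are still missing the packet.
            missing = {r for r in missing if lost[slot][r]}
            slot += 1
    return slot


def rateless_broadcast_slots(k: int, lost: list) -> int:
    """Each slot carries a fresh coded packet; any k received packets suffice."""
    got = [0] * len(lost[0])
    for slot, row in enumerate(lost, start=1):
        for r, is_lost in enumerate(row):
            if not is_lost:
                got[r] += 1
        if min(got) >= k:
            return slot
    raise RuntimeError("erasure table too short for this experiment")


lost = erasure_table(slots=5000, receivers=20, loss_rate=0.25, seed=1)
print(arq_broadcast_slots(50, lost), rateless_broadcast_slots(50, lost))
```

Because both schemes see the same erasure pattern, the idealized rateless transmission never needs more slots than the repeat-based one: every slot is useful to every receiver that hears it.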
RATELESS CODES’ LITERATURE REVIEW
In the past decade, rateless codes have attracted considerable attention in both the communication and the information theory research communities, which has led to a strong theory behind these codes, mostly for erasure channels. Most of the available works in the rateless codes literature are extensions of fountain codes over erasure channels [10]. The name “fountain” comes from the analogy to a water supply capable of giving an unlimited number of water drops. For this reason, rateless codes are also referred to as fountain codes. They were initially developed to achieve efficient transmission in erasure channels, to which the initial work on rateless codes was mainly limited, with multimedia video streaming as the primary application [10]. The first practical class of rateless codes is the Luby Transform (LT) code, which was originally intended for recovering packets that are lost (erased) during transmission over computer networks. The fundamentals of LT codes are introduced in [11]; the code is rateless since the number of encoded packets that can be generated from the original packets is potentially limitless. Figure 3 illustrates the block diagram of the LT encoder. The original packets can be recovered from a slightly larger number of encoded packets. Although the encoding process of LT is quite simple, LT requires a properly designed degree distribution (Soliton distribution-based), which significantly affects the performance of the code.
Figure 3. Block diagram of the LT encoder [5].
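To make the role of the degree distribution concrete, the sketch below encodes packets by XOR-ing a randomly chosen set of source packets and decodes them by peeling degree-one packets. It is a minimal illustration under stated assumptions: it uses the ideal Soliton distribution for brevity (practical LT codes use a robust variant), packets are modeled as integers, and all function names are hypothetical rather than taken from [11].

```python
import random


def ideal_soliton(k: int) -> list:
    """Ideal Soliton probabilities p[d] for degrees d = 1..k (p[0] is unused)."""
    p = [0.0, 1.0 / k]
    p += [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    return p


def lt_encode_packet(source: list, rng: random.Random) -> tuple:
    """Draw a degree, pick that many source packets, and XOR them together."""
    k = len(source)
    degree = rng.choices(range(1, k + 1), weights=ideal_soliton(k)[1:])[0]
    neighbours = set(rng.sample(range(k), degree))
    value = 0
    for i in neighbours:
        value ^= source[i]
    return neighbours, value


def lt_peel_decode(k: int, packets: list) -> list:
    """Peeling decoder; entries stay None where decoding could not proceed."""
    decoded = [None] * k
    pending = [[set(n), v] for n, v in packets]
    progress = True
    while progress:
        progress = False
        for item in pending:
            # Remove every already-decoded source packet from this packet.
            for i in [i for i in item[0] if decoded[i] is not None]:
                item[0].discard(i)
                item[1] ^= decoded[i]
            if len(item[0]) == 1:  # a degree-one packet reveals a source packet
                decoded[item[0].pop()] = item[1]
                progress = True
    return decoded


source = [5, 9, 12]
received = [({0}, 5), ({0, 1}, 5 ^ 9), ({1, 2}, 9 ^ 12)]
print(lt_peel_decode(3, received))
```

The hand-made `received` list stands in for encoded packets that survived the channel; in an actual run they would come from repeated calls to `lt_encode_packet`, and the decoder would need somewhat more than k of them.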
Afterward, LT code was extended to the well-known Raptor code [12] by appending a weak LT encoder with an outer pre-code such as the irregular low-density parity check code (LDPC). Figure 4 depicts the general block diagram of Raptor code.
Figure 4. Block diagram of the Raptor code encoder [5].
The decoding algorithm of the Raptor code depends on the decoder of the LT code and on the pre-code used. Compared with the LT code, the Raptor code requires less overhead. Its disadvantages are that the lower bound on the total overhead depends on the outer code and that the decoding algorithm is slightly more complicated to implement because of the multiple decoding processes. Online codes [13] also belong to the family of fountain rateless codes and work on two layers of packet processing (inner and outer). However, in contrast to the LT and Raptor codes, online codes have higher encoding and decoding complexity as a function of the block length. The overall design of the online code is shown in Figure 5. LT and Raptor codes were originally intended for transmission over the binary erasure channel, such as the Internet channel, where the transmitted packets are erased by routers along the path.
Figure 5. Online code encoding and decoding design [13].
On the other hand, some works have studied their performance on noisy channels such as BSC and AWGN channels [14]. Although it was demonstrated that Raptor codes perform better than LT codes on a wide variety of noisy channels, both schemes exhibit high error floors in such channels. The rateless codes discussed so far have a fixed degree distribution, which degrades their performance when they are employed over noisy channels.
Motivated by this result, a reconfigurable rateless code was introduced in [5], which can adaptively modify its encoding/decoding algorithms by adjusting its degree distribution incrementally according to the channel conditions. Such a code is not systematic and remains fixed if no new knowledge of the channel condition is obtained from feedback. By dropping this assumption, as in [15], a significant overhead reduction can be achieved while still maintaining the same encoding and decoding complexities. In addition, the effective code rate of [5] is actually determined by the decoder, not the encoder. From another perspective, the use of rateless codes in the physical layer can be beneficial since the decoder can exploit useful information even from packets that cannot be correctly decoded and are therefore ignored by higher layers [7]. A construction of physical-layer Raptor codes based on protographs was proposed in [16]. Other works on rateless coding over the AWGN channel were provided in [17, 18]. For wireless channels, the rateless code paradigm appears in many works. In [19], a rateless coding scheme based on the traditional Raptor code is introduced for a single-input single-output (SISO) system over fading channels. A similar approach is presented in [20] by the same authors for relay networks. The authors of [21] present one of the latest works on rateless coding over wireless channels. They tackle the high error floor problem arising from the low-density generator matrix (LDGM)-like encoding scheme in conventional rateless codes. While there are significant works on rateless codes for AWGN channels, few works exist on rateless codes for MIMO systems. Rateless codes for MIMO channels were introduced in [22], where two rateless code constructions were developed. The first one was based on simple layering over an AWGN channel. The second construction used a diagonal layering structure over a particular time-varying scalar channel.
However, the latter merely concatenates a rateless code (outer code) using dithered repetition with the vertical Bell Labs layered space-time (V-BLAST) code (inner code) [23]. Apart from the digital fountain codes discussed so far, performance limits and code construction of block-wise rateless coding for conventional MIMO fading channels are studied in [24]. The authors used the diversity multiplexing tradeoff (DMT) as a performance metric [25]. They also demonstrated that the design principle of rateless codes follows that of the approximately universal codes [26] over MIMO channels. In addition, simple rateless codes that are DMT optimal for a SISO channel have also been examined. However, [24] considered the whole MIMO channel as parallel sub-channels, in which each sub-channel is a MIMO channel. Furthermore, the code construction of the symbols within each redundant block is not discussed. Hence, more investigation of other performance metrics for the scheme proposed in [24] under different channel scenarios is required. In [27], a cognitive radio network employs rateless coding along with queuing theory to maximize the capacity of the secondary user while meeting the primary users’ delay requirement. Furthermore, [28] presents a novel framework of opportunistic beamforming that employs rateless codes in the multiuser MIMO downlink to provide faster wireless services with high quality of service (QoS).
RATELESS CODES APPLICATIONS
There are various applications of rateless codes:
• Video streaming over the Internet and packet-based wireless networks: The application of rateless codes to video streaming was initially proposed for the multimedia broadcast multicast system (MBMS) standard of the 3GPP [29, 30].
• Broadcasting: Broadcasting has been extensively used in wireless networks to distribute information of general interest, for example, safety warning messages, emergency information, and weather information, to a large number of users [31, 32]. Rateless coding has been utilized in the 3GPP’s Release 6 multimedia broadcast/multicast service (MBMS) [33].
• Wearable wireless networks: A wearable body area network (WBAN) is an emerging technology developed for wearable monitoring applications. Wireless sensor networks are usually considered one of the technological foundations of ambient intelligence. Agile, low-cost, ultra-low power networks of sensors can collect a massive amount of valuable information from the surrounding environment [34, 35]. Wireless sensor network (WSN) technologies are considered one of the key research areas in computer science and the health-care application industries for improving the quality of life. A block-based scheme of rateless channel erasure coding was proposed in [36] to reduce the impact of wireless channel errors on augmented reality (AR) video streams, while also reducing energy consumption.
MOTIVATION TO RATELESS SPACE-TIME CODING
According to the literature survey, there is not yet enough research on rateless space-time codes (STCs), even for regular MIMO systems. Only a few works on rateless STCs are available, such as [37, 38]. In [37], a rateless coding scheme was introduced for the AWGN channel, using layering, repetition, and random dithering. The authors also extended their work to multiple-input single-output (MISO) Gaussian channels, where the MISO channel is converted to parallel AWGN channels. In [38], the performance of a MIMO radio link in a high-mobility communication system is improved by a rate-varying STC. Rateless coding can be extended to space-time block codes (STBCs), where the coding process is performed block-wise over time and space. The main advantage of STBCs is that they can provide full diversity gain with relatively simple encoding and decoding schemes. Unlike the conventional fixed-rate STBC, a rateless STBC is designed such that the code rate is not fixed before transmission. Instead, it depends on the instantaneous channel conditions. Incorporating RSTBC in massive MIMO systems is reasonable and very attractive, since rateless coding is based on generating a massive number of encoded blocks, and the massive MIMO technique uses a large number of antenna elements. Motivated by this fact, this chapter develops a new approach to fill the gap between rateless STCs and massive MIMO systems by exploiting the significant degrees of freedom available in massive MIMO systems for rateless coding. The contribution of RSTBC is to convert lossy massive MIMO channels into lossless ones and to provide reliable, robust transmission when very large MIMO dimensions are used.
RATELESS SPACE-TIME BLOCK CODE FOR MASSIVE MIMO SYSTEMS
Massive MIMO wireless communication systems have been targeted for deployment in the fifth-generation (5G) cellular standards to fundamentally enhance wireless capacity and communication reliability [39]. In massive MIMO systems, a large number of antennas, possibly hundreds or even thousands, work together to deliver big data to the end users. Beyond the significant enhancement in capacity and/or link quality already offered by MIMO systems and space-time codes (STCs) [40, 41], it has been shown recently that massive MIMO can improve the performance of MIMO systems dramatically. This has prompted a lot of research on massive MIMO systems lately.
In this section, we illustrate the mechanism of the rateless space-time block code (RSTBC) in a massive MIMO system, as we have addressed in [42, 43, 44, 45]. Figure 6 gives a simplified view of the encoding and decoding processes, where a part of the encoded packets (or blocks) cannot be received due to channel losses. Because slightly more encoded packets are available than original packets, the receiver can recover the original packets from the minimum possible number of transmitted encoded packets that it has actually received. The number of blocks required for recovery depends on the loss rate of the channel. During the transmission, the receiver of a specific user measures the mutual information between the transmitter and the receiver after it receives each block and compares it to a threshold.
Figure 6. Encoding and decoding processes of RSTBC in a massive MIMO system.
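The receiver-side rule described above (measure the mutual information after each block, compare it to a threshold, and feed back once it is exceeded) can be sketched as a simple loop. The function name, the list of measured values, and the `max_blocks` cap, which stands in for a time limit such as the channel coherence time, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional


def first_sufficient_block(measured_info: Iterable[float], threshold: float,
                           max_blocks: int) -> Optional[int]:
    """Return the 1-based index of the block after which feedback is sent.

    measured_info yields the mutual information measured after each received
    block. The receiver keeps waiting while the measurement does not exceed
    the threshold, and gives up (returns None) after max_blocks blocks.
    """
    for l, m_l in enumerate(measured_info, start=1):
        if l > max_blocks:
            break
        if m_l > threshold:
            return l  # send the single stop-feedback to the transmitter
    return None


print(first_sufficient_block([1.0, 2.5, 4.0, 6.5], threshold=4.0, max_blocks=10))
```

The number of blocks consumed is an output of this loop rather than a design parameter, which is what makes the scheme rateless.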
Namely, it is desired to decode a message of total mutual information $M$. Assume that the packets required to deliver the message correctly are $X_1, X_2, \ldots, X_L$, where $X_l \in \mathbb{C}^{N_t \times T}$ is the codeword matrix transmitted during the $l$th block, $N_t$ is the number of transmit antenna elements at the base station (BS), $T$ is the number of time slots, and $L$ is the number of blocks required at the receiver to recover the transmitted message. Let $m_l$ denote the mutual information measured after receiving the $l$th codeword block $X_l$. If $m_l \le M$, the receiver waits for further blocks; if $m_l > M$, the receiver sends a simple feedback to the transmitter to stop transmitting the remaining encoded packets and remove them from the BS buffer. This process continues until the receiver accumulates enough blocks ($L$) to recover the message or until the allowed time exceeds the channel coherence time. The decoding process is conducted sequentially, first using $X_1$, then $X_2$ if $X_1$ is not sufficient, and so forth. Once the check-sum condition is satisfied, the received blocks are linearly combined at the receiver to decode the whole underlying message. It should be noted that the code is described as “rateless” because the number of blocks ($L$) required to recover the message is not fixed before transmission; rather, it depends on the channel state. The code is extended ratelessly in the dimensions of time (number of channel uses) and space (number of functional antennas), and it belongs to the class of block codes. Therefore, it is called a rateless space-time block code (RSTBC). Each RSTBC matrix is constructed by the following random process:

$$X_l = D_l \odot X, \qquad l = 1, 2, \ldots, L \tag{1}$$

where $\odot$ denotes the element-wise multiplication operation (Hadamard product); $X$ is the $N_t \times T$ complex data matrix to be transmitted; and $D_l$ is the $l$th $N_t \times T$ binary matrix, generated randomly, whose entries are 0 or 1 with probabilities $P_0$ and $P_1$, respectively. For each $l$, a new $D_l$ is constructed with different positions of zeros. This means that $D_1 \neq D_2 \neq \cdots \neq D_L$ and, consequently, $X_1 \neq X_2 \neq \cdots \neq X_L$. Such a method is considered rateless coding in the sense that the encoder can generate a potentially very large number of blocks on the fly. A power constraint is imposed on each $X_l$ such that the average power does not exceed $N_t$. Now, we consider a downlink massive multiuser MIMO (MU-MIMO) system in which RSTBC is applied, as shown in Figure 7.
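The random construction of the RSTBC blocks described above, in which each block is the data matrix masked element-wise by a fresh random binary matrix, can be sketched as a generator. This is an illustration under assumptions: the mask is taken to have the same dimensions as the data matrix, entries are drawn independently with probability `p1` of being 1, the power constraint is not enforced, and all function names are hypothetical.

```python
import random
from itertools import islice


def random_mask(n_t: int, t: int, p1: float, rng: random.Random) -> list:
    """Binary n_t x t matrix whose entries are 1 with probability p1, else 0."""
    return [[1 if rng.random() < p1 else 0 for _ in range(t)] for _ in range(n_t)]


def rstbc_block(x: list, mask: list) -> list:
    """Element-wise (Hadamard) product of the mask and the data matrix."""
    return [[d * s for d, s in zip(mask_row, x_row)]
            for mask_row, x_row in zip(mask, x)]


def rstbc_blocks(x: list, p1: float, seed: int):
    """Yield masked blocks on the fly, with a new random mask for each block."""
    rng = random.Random(seed)
    n_t, t = len(x), len(x[0])
    while True:
        yield rstbc_block(x, random_mask(n_t, t, p1, rng))


# Toy 2 x 2 data matrix of QPSK-like symbols (illustrative values only).
x = [[1 + 1j, 1 - 1j], [-1 + 1j, -1 - 1j]]
blocks = list(islice(rstbc_blocks(x, p1=0.75, seed=0), 4))
print(len(blocks))
```

A generator fits the rateless idea because the encoder does not need to know in advance how many blocks the receiver will require; the caller simply stops drawing blocks when feedback arrives.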
Figure 7. RSTBC code for massive MU-MIMO system (BS-to-users scenario).
In this system, a BS transmitter (BSTx), equipped with a large number of antennas, communicates simultaneously with $K$ independent users on the same time-frequency resources, where each user device has $N_r$ receive antennas. The overall channel matrix $H \in \mathbb{C}^{K N_r \times N_t}$ can be written as

$$H = \left[ H_1^T, H_2^T, \ldots, H_K^T \right]^T \tag{2}$$

where $H_k \in \mathbb{C}^{N_r \times N_t}$ is the channel matrix corresponding to the $k$th user. To eliminate the effects of multiuser interference (MUI) at the receiving users, a precoding technique is applied at the BSTx with, for example, a zero-forcing (ZF) precoding matrix, which is calculated as

$$W = \beta H^H \left( H H^H \right)^{-1} \tag{3}$$

where $\beta$ is a normalization factor. In this system, channel reciprocity is exploited to estimate the downlink channels via uplink training, since the resulting overhead grows linearly with the number of users rather than with the number of BS antennas [46]. For a single-cell MU-MIMO system, the received signal at the $k$th user at time instant $t$ is given by Eq. (4) in terms of the following quantities: the average SNR per user, $E_x/N_0$ ($E_x$ is the symbol energy, and $N_0$ is the noise power at the receiver); $L$, the maximum number of required RSTBC blocks at the user; the signal transmitted by the $n$th antenna; the channel coefficient from the $n$th transmit antenna to the $k$th user; the $(n, l)$th element of the matrix $D_l$; and $w_k$, the noise at the $k$th user receiver. It has been demonstrated in [42, 43, 44, 45] that RSTBC is able to compensate for data losses; the reader is referred to these references for more details. Some sample simulation results follow. The averaged symbol-error-rate (SER) performance when RSTBC is applied for $N_t = 100$ with QPSK is shown in Figure 8, where the loss rate is assumed to be 25%.
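Returning to the zero-forcing precoder mentioned in the system model, the sketch below computes a precoding matrix in the standard right pseudo-inverse form, beta times H^H (H H^H)^-1, and checks that it removes multiuser interference. The formula is the textbook ZF form and is an assumption here, as are the helper names, the use of plain Python lists instead of a numerical library, and the toy channel with two single-antenna users and three BS antennas.

```python
def conj_transpose(a: list) -> list:
    """Hermitian (conjugate) transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def matmul(a: list, b: list) -> list:
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(m: list) -> list:
    """Gauss-Jordan inverse with partial pivoting (intended for small matrices)."""
    n = len(m)
    aug = [list(map(complex, row)) + [complex(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def zf_precoder(h: list, beta: float = 1.0) -> list:
    """Zero-forcing precoder beta * H^H (H H^H)^-1 for a K x N_t channel."""
    hh = conj_transpose(h)
    return [[beta * v for v in row] for row in matmul(hh, inverse(matmul(h, hh)))]


# Toy channel: 2 single-antenna users (rows), 3 BS antennas (columns).
h = [[1, 1j, 0], [0, 1, 1]]
effective = matmul(h, zf_precoder(h))  # approximately the 2 x 2 identity
print([[round(abs(v), 6) for v in row] for row in effective])
```

The product of the channel and the precoder being (a scaled) identity is what it means for each user to see only its own stream; the normalization factor would in practice be chosen to meet the transmit power constraint.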
Figure 8. SER curves for massive MU-MIMO system with 25%-rate loss and Nt = 100, K = 10 users, with QPSK, when RSTBC is applied.
It is inferred from Figure 8 that, for small values of L, the averaged SER approaches a fixed level at high SNR because RSTBC, with the current number of blocks, is no longer able to compensate for further losses. Therefore, L must be increased to achieve further improvement until the effects of the losses are eliminated. For instance, with L = 32, the flooring in the SER curves has vanished due to the diversity gain achieved by RSTBC (the slopes of the SER curves increase), so that the effect of the losses is considerably reduced. Thus, the potential of RSTBC to combat losses in massive MU-MIMO systems has been shown. Furthermore, Figure 9 shows the cumulative distribution function (CDF) of the averaged downlink SINR (in dB) in the target cell, from both simulation and analysis, for a multi-cell massive MU-MIMO system with Nt = 100, K = 10 users, QPSK, and pilot reuse factor = 3/7, when RSTBC is applied with L = 4, 8, 16, 32 and a lossy channel with a 25% loss rate is assumed. Notably, RSTBC helps the system alleviate the effects of pilot contamination by increasing the downlink SINR. The simulation and analytical results match well. It can also be seen that the improvement in SINR is a linear function of the number of RSTBC blocks L. The simulation parameters are listed in Table 1.
Figure 9. CDF simulation and analytical results’ comparisons of SINR for multi-cell massive MU-MIMO system with Nt = 100, K = 10, QPSK, pilot reuse factor = 3/7, RSTBC with L = 4, 8, 16, 32, and 25% loss rate. Analytical curves are plotted using Eq. (21) in [43].

Table 1. Simulation parameters for massive MU-MIMO system.
Parameter                            Value
Cell radius                          500 m
Reference distance from the BS       100 m
Path loss exponent                   3.8
Carrier frequency                    28 GHz
Shadow fading standard deviation     8 dB
CONCLUSION
In this chapter, we have considered the rateless space-time block code (RSTBC) for massive MIMO wireless communication systems. Unlike fixed-rate codes, RSTBC adapts the amount of redundancy over time and space for transmitting a message based on the instantaneous channel conditions. RSTBC can be used to protect data transmission in lossy systems and still guarantee the reliability of the system when transmitting big data. It is concluded that, using RSTBC with very large MIMO dimensions, it is possible to recover the original data from a certain amount of encoded data even when the losses are high. Moreover, RSTBC can be employed at the BS in a multi-cell massive MIMO system to mitigate the downlink inter-cell interference (resulting from pilot contamination) by improving the downlink SINR. These results make RSTBC a strong candidate for the upcoming 5G wireless communication systems.
REFERENCES
1. Abdullah A, Abbasi M, Fisal N. Review of rateless-network-coding based packet protection in wireless sensor networks. Mobile Information Systems. 2015;2015:1-13
2. Liew T, Hanzo L. Space–time codes and concatenated channel codes for wireless communications. Proceedings of the IEEE. 2002;90(2):187-219
3. Huang J-W, Yang K-C, Hsieh H-Y, Wang J-S. Transmission control for fast recovery of rateless codes. International Journal of Advanced Computer Science and Applications (IJACSA). 2013;4(3):26-30
4. Bonello N, Yang Y, Aissa S, Hanzo L. Myths and realities of rateless coding. IEEE Communications Magazine. 2011;49(8):143-151
5. Bonello N, Zhang R, Chen S, Hanzo L. Reconfigurable rateless codes. In: IEEE 69th Vehicular Technology Conference, 2009, VTC Spring 2009; IEEE. 2009. pp. 1-5
6. Sklar B. Digital Communications: Fundamentals and Applications. USA: Prentice Hall; 2001
7. Mehran F, Nikitopoulos K, Xiao P, Chen Q. Rateless wireless systems: Gains, approaches, and challenges. In: 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP). IEEE; 2015. pp. 751-755
8. Wang X, Chen W, Cao Z. ARQ versus rateless coding: from a point of view of redundancy. In: 2012 IEEE International Conference on Communications (ICC); IEEE. 2012. pp. 3931-3935
9. Wang P. Finite length analysis of rateless codes and their application in wireless networks [PhD dissertation]. University of Sydney; 2015
10. Byers JW, Luby M, Mitzenmacher M, Rege A. A digital fountain approach to reliable distribution of bulk data. ACM SIGCOMM Computer Communication Review. 1998;28(4):56-67
11. Luby M. LT codes. In: The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings. 2002. pp. 271-280
12. Shokrollahi A. Raptor codes. IEEE Transactions on Information Theory. 2006;52(6):2551-2567
13. Maymounkov P. Online codes. Technical report. New York University; 2002
14. Palanki R, Yedidia JS. Rateless codes on noisy channels. In: International Symposium on Information Theory, 2004. ISIT 2004. Proceedings; June 2004; p. 38
15. Chong KFE, Kurniawan E, Sun S, Yen K. Fountain codes with varying probability distributions. In: 2010 6th International Symposium on Turbo Codes Iterative Information Processing; Sept 2010. pp. 176-180
16. Kuo S-H, Lee H-C, Ueng Y-L, Lin M-C. A construction of physical layer systematic Raptor codes based on protographs. IEEE Communications Letters. 2015;19(9):1476-1479
17. Chen S, Zhang Z, Zhu L, Wu K, Chen X. Accumulate rateless codes and their performances over additive white Gaussian noise channel. IET Communications. March 2013;7(4):372-381
18. Erez U, Trott MD, Wornell GW. Rateless coding for Gaussian channels. IEEE Transactions on Information Theory. Feb 2012;58(2):530-547
19. Castura J, Mao Y. Rateless coding over fading channels. IEEE Communications Letters. Jan 2006;10(1):46-48
20. Castura J, Mao Y. Rateless coding and relay networks. IEEE Signal Processing Magazine. Sept 2007;24(5):27-35
21. Tian S, Li Y, Shirvanimoghaddam M, Vucetic B. A physical-layer rateless code for wireless channels. IEEE Transactions on Communications. June 2013;61(6):2117-2127
22. Shanechi MM, Erez U, Wornell GW. Rateless codes for MIMO channels. In: IEEE GLOBECOM 2008-2008 IEEE Global Telecommunications Conference; Nov 2008. pp. 1-5
23. Wolniansky PW, Foschini GJ, Golden G, Valenzuela RA. V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel. In: 1998 URSI International Symposium on Signals, Systems, and Electronics, 1998. ISSSE 98; IEEE. 1998. pp. 295-300
24. Fan Y, Lai L, Erkip E, Poor HV. Rateless coding for MIMO fading channels: performance limits and code construction. IEEE Transactions on Wireless Communications. 2010;9(4):1288-1292
25. Zheng L, Tse DNC. Diversity and multiplexing: A fundamental tradeoff in multiple-antenna channels. IEEE Transactions on Information Theory. 2003;49(5):1073-1096
26. Tavildar S, Viswanath P. Approximately universal codes over slow-fading channels. IEEE Transactions on Information Theory. 2006;52(7):3233-3258
27. Chen Y, Huang H, Zhang Z, Qiu P, Lau VK. Cooperative spectrum access for cognitive radio network employing rateless code. In: ICC Workshops-2008 IEEE International Conference on Communications Workshops; IEEE. 2008. pp. 326-331
28. Chen X, Zhang Z, Chen S, Wang C. Adaptive mode selection for multiuser MIMO downlink employing rateless codes with QoS provisioning. IEEE Transactions on Wireless Communications. 2012;11(2):790-799
29. Afzal J, Stockhammer T, Gasiba T, Xu W. System design options for video broadcasting over wireless networks. In: Proceedings of IEEE CCNC, vol. 54. Citeseer; 2006. p. 92
30. Afzal J, Stockhammer T, Gasiba T, Xu W. Video streaming over MBMS: A system design approach. Journal of Multimedia. 2006;1(5):25-35
31. Molisch AF. Wireless Communications, vol. 2. New York, USA: John Wiley & Sons; 2011
32. Labiod H. Wireless ad hoc and Sensor Networks. Vol. 6. New York, USA: John Wiley & Sons; 2010
33. Hartung F, Horn U, Huschke J, Kampmann M, Lohmar T, Lundevall M. Delivery of broadcast services in 3G networks. IEEE Transactions on Broadcasting. 2007;53(1):188-199
34. Culler D, Estrin D, Srivastava M. Guest editors’ introduction: Overview of sensor networks. IEEE Computer Society. Aug 2004;37(8):41-49
35. Zhao F, Guibas LJ. Wireless Sensor Networks: An Information Processing Approach. San Francisco, USA: Elsevier Science & Technology; 2004
36. Razavi R, Fleury M, Ghanbari M. Rateless coding on a wearable wireless network for augmented reality and biosensors. In: 2008 IEEE 19th International Symposium on Personal, Indoor and Mobile Radio Communications; IEEE. 2008. pp. 1-4
37. Erez U, Wornell G, Trott MD. Rateless space–time coding. In: Proceedings. International Symposium on Information Theory, 2005. ISIT 2005; IEEE. 2005, pp. 1937-1941
38. Wang C, Zhang Z. Performance analysis of a rate varying space–time coding scheme. In: 2013 International Workshop on High Mobility Wireless Communications (HMWC). IEEE. 2013. pp. 151-156
39. Larsson E, Edfors O, Tufvesson F, Marzetta T. Massive MIMO for next generation wireless systems. Communications Magazine, IEEE. 2014;52(2):186-195
40. Tarokh V, Seshadri N, Calderbank AR. Space–time codes for high data rate wireless communication: Performance criterion and code construction. IEEE Transactions on Information Theory. 1998;44(2):744-765
41. Alamouti SM. A simple transmit diversity technique for wireless communications. IEEE Journal on Selected Areas in Communications. 1998;16(8):1451-1458
42. Alqahtani AH, Sulyman AI, Alsanie A. Rateless space time block code for massive MIMO systems. International Journal of Antennas and Propagation. 2014;2014:1-10
43. Alqahtani AH, Sulyman AI, Alsanie A. Rateless space time block code for mitigating pilot contamination effects in multicell massive MIMO system with lossy links. IET Communications Journal. 2016;10(16):2252-2259
44. Alqahtani AH, Sulyman AI, Alsanie A. Rateless space time block code for antenna failure in massive MU-MIMO systems. IEEE Wireless Communications and Networking Conference (WCNC); Doha, Qatar; April 2016. pp. 1-6
45. Alqahtani AH, Sulyman AI, Alsanie A. Loss-tolerant large-scale MU-MIMO system with rateless space time block code. In: 22nd Asia-Pacific Conference on Communications (APCC); Yogyakarta, Indonesia; August 2016. pp. 342-347
46. Marzetta TL. How much training is required for multiuser MIMO? In: Fortieth Asilomar Conference on Signals, Systems and Computers, 2006. ACSSC’06; IEEE; 2006. pp. 359-363
SECTION 3: LOSSLESS DATA COMPRESSION
Chapter 10
LOSSLESS IMAGE COMPRESSION TECHNIQUE USING COMBINATION METHODS
A. Alarabeyyat1, S. Al-Hashemi1, T. Khdour1, M. Hjouj Btoush1, S. Bani-Ahmad1, and R. Al-Hashemi2
1 Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied University, Salt, Jordan
2 The Computer Information Systems Department, College of Information Technology, Al-Hussein Bin Talal University, Ma’an, Jordan
ABSTRACT
The development of multimedia and digital imaging has led to a large quantity of data being required to represent modern imagery. This requires large disk space for storage and a long time for transmission over computer networks, both of which are relatively expensive. These factors demonstrate the need for image compression. Image compression addresses the problem of reducing the amount of space required to represent a digital image, yielding a compact representation and thereby reducing the storage and transmission time requirements. The key idea is to remove redundancy in the image data to reduce its size without affecting its essential information. This paper is concerned with lossless image compression. Our proposed approach is a mix of a number of existing techniques and works as follows: first, we apply the well-known Lempel-Ziv-Welch (LZW) algorithm to the image in hand. The output of the first step is forwarded to the second step, where the Bose, Chaudhuri and Hocquenghem (BCH) error detection and correction algorithm is used. To improve the compression ratio, the proposed approach applies the BCH algorithm repeatedly until “inflation” is detected. The experimental results show that the proposed algorithm can achieve an excellent compression ratio without losing data when compared to the standard compression algorithms.
Keywords: Image Compression, LZW, BCH
Citation: A. Alarabeyyat, S. Al-Hashemi, T. Khdour, M. Hjouj Btoush, S. Bani-Ahmad, R. Al-Hashemi and S. Bani-Ahmad, “Lossless Image Compression Technique Using Combination Methods,” Journal of Software Engineering and Applications, Vol. 5, No. 10, 2012, pp. 752-763. doi: 10.4236/jsea.2012.510088.
Copyright: © 2012 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0
INTRODUCTION
Image applications are widely used, driven by recent advances in technology and breakthroughs in the price and performance of hardware and firmware. This leads to an enormous increase in the storage space and transmission time required for images, which emphasizes the need for efficient and effective image compression techniques. In this paper we provide a method that is capable of compressing images without degrading their quality. This is achieved by minimizing the number of bits required to represent each pixel, which in turn reduces the amount of memory required to store images and allows them to be transmitted in less time.
Image compression techniques fall into two categories: lossless and lossy. The choice between the two depends on the application and on the degree of compression required [1,2]. Lossless image compression is used in critical applications, as it allows the exact original image to be reconstructed from the compressed one without any loss of image data. Lossy image compression, on the other hand, loses some data, so repeatedly compressing and decompressing an image results in poor image quality. An advantage of the lossy technique is that it allows a higher compression ratio than the lossless one [3,4].
Compression is achieved by removing one or more of the three basic data redundancies:
• Coding redundancy, which is present when less than optimal code words are used;
• Interpixel redundancy, which results from correlations between the pixels of an image;
• Psychovisual redundancy, which is due to data that are ignored by the human visual system [5].
Image compression is therefore a solution for many imaging applications that require a vast amount of data to represent images, such as document imaging management systems, facsimile transmission, image archiving, remote sensing, medical imaging, entertainment, HDTV, broadcasting, education and video teleconferencing [6].
One major difficulty facing lossless image compression is how to protect the quality of the image so that the decompressed image is identical to the original one. In this paper we are concerned with lossless image compression based on the LZW and BCH algorithms, which compresses different types of image formats. The proposed method repeats the compression three times in order to increase the compression ratio.
The steps of our approach are as follows. First, we perform a preprocessing step to convert the image in hand into binary. Next, we apply the LZW algorithm to compress the image. In this step, the codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes from 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each step, the current sequence is extended by the next input character; when the extended sequence is not in the dictionary, the code for the sequence (without that character) is emitted, and a new code (for the sequence with that character) is added to the dictionary [7]. Finally, we use the BCH algorithm to increase the image compression ratio. An error correction method is used in this step, in which the normal data and the first parity data are stored in a memory cell array; together they form the BCH-encoded data. A second parity data set is generated from the stored normal data. To check for errors, the first parity data are compared with the second parity data, as in [8,9].
Notice that we repeat the BCH compression until the required level of compression is achieved. Decompression is performed in reverse order and produces an image identical to the original one.
LITERATURE REVIEW
A large number of data compression algorithms have been developed and used over the years. Some are of general use, i.e., they can be used to compress files of different types (e.g., text files, image files, video files). Others were developed to compress a particular type of file efficiently. Below we review some of the literature in this field.
In [10], the authors present lossless image compression with four modular components: pixel sequence, prediction, error modeling, and coding. They use two methods that clearly separate the four modular components, called the Multi-Level Progressive Method (MLP) and the Partial Precision Matching Method (PPMM). Both involve linear prediction, modeling of prediction errors by estimating the variance of a Laplace distribution (symmetric exponential), and arithmetic coding applied to pre-computed distributions [10].
In [11], a composite modeling method (a hybrid compression algorithm for binary images) is used to reduce the amount of data coded by arithmetic coding: uniform areas are coded with less computation, and arithmetic coding is applied to the remaining areas. Each image block is classified into one of three categories (all-white, all-black, and mixed), and the image is processed 16 rows at a time in two stages, global and local [11].
In [12], the authors propose an algorithm that applies a reversible transformation to its input, evaluated on the fourteen commonly used files of the Calgary Compression Corpus. It does not process its input sequentially, but instead processes a block of text as a single unit to form a new block that contains the same characters but is easier to compress by simple compression algorithms, because characters are grouped together based on their contexts. The technique uses the context on only one side of each character, so that the probability of finding a character close to another instance of the same character is increased substantially. The transformation does not itself compress the data, but reorders it to make it easy to compress with simple algorithms such as move-to-front coding in combination with Huffman or arithmetic coding [12].
In [13], the authors present a lossless grayscale image compression method, TMW, which is based on the use of linear predictors and implicit segmentation. The compression process is split into an analysis step and a coding step. In the analysis step, a set of linear predictors and other parameters suitable for the image is calculated in a way that minimizes the length of the encoded image; this parameter set is included in the compressed file and subsequently used in the coding step. The chosen parameter set therefore has to be considered part of the encoded image and has to be stored or transmitted alongside the result of the coding stage [13].
In [14], the authors propose a lossless compression scheme for binary images that consists of a novel encoding algorithm and uses a new edge tracing algorithm. The scheme consists of two major steps. The first step encodes the binary image data into characteristic vector information of the objects in the image, obtained with a modified edge tracing method, instead of directly encoding the whole image data as existing approaches do. The second step compresses the encoded image using Huffman and Lempel-Ziv-Welch (LZW) coding [14].
In [15], the author presents an algorithm for lossless binary image compression that consists of two modules, called the Two Modules Based Algorithm (TMBA): the first module performs direct redundancy exploitation and the second performs improved arithmetic coding [15].
In [16], a two-dimensional dictionary-based lossless image compression scheme for grayscale images is introduced. The proposed scheme reduces correlation in the image data by finding two-dimensional blocks of pixels that are approximately matched throughout the data and replacing them with short codewords.
In [16], the two-dimensional Lempel-Ziv image compression scheme (denoted GS-2D-LZ) is proposed. This scheme is designed to take advantage of the two-dimensional correlations in the image data. It relies on three different compression strategies, namely two-dimensional block matching, prediction, and statistical encoding.
In [17], the authors present a lossless image compression method based on the Multiple-Table Arithmetic Coding (MTAC) method for encoding a gray-level image f. The method first classifies the data and then encodes each cluster of data using a distinct code table. The MTAC method employs a median edge detector (MED) to reduce the entropy rate of f. Because the gray levels of two adjacent pixels in an image are usually similar, a base-switching transformation is then used to reduce the spatial redundancy of the image. Because the gray levels of some pixels are more common than those of others, arithmetic encoding is finally applied to reduce the coding redundancy of the image [17].
In [18], the authors propose a lossless method of image compression and decompression that uses a simple coding technique, Huffman coding. A software algorithm has been developed and implemented in MATLAB to compress and decompress a given image using Huffman coding. The work is concerned with compressing images by reducing the number of bits per pixel required to represent them, and with decreasing the transmission time of images. The image is reconstructed by decoding it using the Huffman codes [18].
The work in [19] uses an adaptive bit-level text compression scheme based on Hamming code data compression (HCDC). The scheme consists of six steps that are repeated to increase the compression rate. The overall compression ratio is found by multiplying the compression ratios of the individual loops, and the scheme is referred to as HCDC (K), where K is the number of repetitions [19].
In [20], the authors present a lossless image compression method based on BCH combined with the Huffman algorithm [20].
THE PROPOSED METHOD
The objective of the proposed method is to design an efficient and effective lossless image compression scheme. The method is based on the LZW algorithm and on the BCH algorithm, an error-correcting technique, in order to improve the compression ratio of the image compared to the other compression techniques in the literature review. The methodology and the architecture of the proposed method are explained in detail below.
The proposed method is a lossless image compression scheme that is applied to all types of images. It is based on the LZW algorithm, which reduces repeated values in the image, and on BCH codes, which detect and correct errors. The BCH algorithm works by adding extra bits, called parity bits, whose role is to verify the correctness of the original message sent to the receiver; the system in this paper benefits from this feature. BCH converts a message block of size k into a codeword of length n by adding parity bits. The proposed method is shown below in Figure 1.
Figure 1. Proposed image compression approach.
Lempel-Ziv-Welch (LZW)
The compression system improves the compression of the image through the LZW algorithm. First, the input image is converted to gray scale and then converted from decimal to binary so that it is in a suitable form for compression. The algorithm builds a data dictionary (also called a translation table or string table) of data occurring in the uncompressed data stream. Patterns of data are identified in the data stream and matched against entries in the dictionary. If a pattern is not present in the dictionary, a code phrase is created based on the data content of that pattern and stored in the dictionary. The phrase is then written to the compressed output stream. When a reoccurrence of a pattern is identified in the data, the phrase of the pattern already stored in the dictionary is written to the output.
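The dictionary-building behaviour described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names `lzw_encode` and `lzw_decode` and the `max_codes` parameter are hypothetical, and the sketch assumes byte-oriented input with codes 0 to 255 reserved for single bytes and at most 4096 dictionary entries, as stated in the Introduction.

```python
def lzw_encode(data: bytes, max_codes: int = 4096) -> list[int]:
    """Encode bytes into a list of LZW codes (illustrative sketch)."""
    # Codes 0..255 stand for the 256 single-byte sequences.
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate  # keep extending the current match
        else:
            codes.append(dictionary[current])  # emit code of the known prefix
            if len(dictionary) < max_codes:
                dictionary[candidate] = len(dictionary)  # learn the new sequence
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decode(codes: list[int], max_codes: int = 4096) -> bytes:
    """Rebuild the original bytes by regenerating the same dictionary."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry that is being defined right now.
            entry = previous + previous[:1]
        out.append(entry)
        if len(dictionary) < max_codes:
            dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


codes = lzw_encode(b"ABABABA")
restored = lzw_decode(codes)
```

In this sketch the seven input bytes are represented by four codes, because the repeated patterns are replaced by dictionary entries created during encoding.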
Bose, Chaudhuri and Hocquenghem (BCH)
The binary input image is first divided into blocks of 7 bits each. Only 7 bits are needed to represent the value in each byte (128 values in total), while the eighth bit (the most significant bit) represents the sign of the number and does not affect the total value of the block. Each block is converted to a Galois field element so that it is accepted as input to the BCH decoder. Each block is decoded using the BCH decoder and is then checked to determine whether it is a valid codeword. The BCH decoder converts a valid block to 4 bits. The proposed method adds 1 to an extra file, called the map, as an indicator of a valid codeword; otherwise, if the block is not a codeword, it remains 7 bits long and 0 is added to the same file. The map file is used as the key for image decompression, in order to distinguish between compressed blocks and uncompressed ones (codeword or not). After the image is compressed, the map file is compressed by RLE to decrease its size and is then attached to the header of the image.
This step is iterated three times. We stopped at three repetitions of the BCH decoding after experimentation: decoding more times increases the time needed for compression, and the map file grows with each BCH pass, which increases the size of the image and opposes the objective of this paper, which is to reduce the image size. Below is an example of the compressed image:
Example
The following example illustrates the compression stage of the proposed system on a segment of an image. The system first converts the decimal values into binary, compresses them by LZW, and then divides the result into blocks of 7 bits.
A = Original
The block is compressed by the LZW algorithm and the output is:
The output is then converted to binary and divided into blocks of 7 bits each:
After dividing the image into blocks of 7 bits, the system applies the BCH code, checking whether each block is a codeword by matching it against the 16 standard codewords of the BCH code. The first iteration finds four codewords. Each of these blocks is compressed by the BCH algorithm, which converts it to a block of 4 bits.
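The codeword test on 7-bit blocks can be sketched in Python as follows. This is a hedged illustration rather than the paper's code: it assumes the binary (7,4) BCH code with generator polynomial x^3 + x + 1 in systematic form (4 message bits followed by 3 parity bits), whereas the paper relies on MATLAB's `bchdec` and `bchenc`. The names `bch74_encode` and `bch_pass` are hypothetical.

```python
GENERATOR = 0b1011  # x^3 + x + 1, assumed generator of the (7,4) BCH code


def bch74_encode(message: int) -> int:
    """Systematically encode a 4-bit message into a 7-bit codeword."""
    remainder = message << 3
    # Polynomial division over GF(2): clear bits 6..3, leaving the parity bits.
    for shift in range(3, -1, -1):
        if remainder & (1 << (shift + 3)):
            remainder ^= GENERATOR << shift
    return (message << 3) | remainder


def bch_pass(bits: str) -> tuple[str, str]:
    """One compression pass over a bit string; returns (payload, map bits)."""
    payload, flags = [], []
    usable = len(bits) - len(bits) % 7
    for start in range(0, usable, 7):
        block = bits[start:start + 7]
        message = int(block, 2) >> 3
        if bch74_encode(message) == int(block, 2):
            payload.append(block[:4])  # valid codeword: keep only 4 message bits
            flags.append("1")
        else:
            payload.append(block)      # not a codeword: keep all 7 bits
            flags.append("0")
    payload.append(bits[usable:])      # leftover bits are copied unchanged
    return "".join(payload), "".join(flags)


valid = [v for v in range(128) if bch74_encode(v >> 3) == v]
payload, map_bits = bch_pass("0001011" + "1111111" + "0000001" + "01")
```

Under these assumptions, 16 of the 128 possible 7-bit blocks are codewords. Each codeword block saves 3 bits while every block costs one map bit, so, before the map is compressed by RLE, a pass only shrinks the data when more than one third of the blocks are codewords.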
When the BCH algorithm is applied, the file Map 1 is initialized. If a block is a codeword, 1 is added to the file; if it is a non-codeword, 0 is added. In this example Map 1 is:
Map 1 = 0 1 0 0 1 0 0 0 1 0 1
This operation is repeated three times. The files (Map 3, Map 2, and Map 1) are compressed by RLE before being attached to the header of the image to gain a higher compression ratio.
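A minimal Python sketch of run-length encoding applied to the map is given here for illustration. The paper does not specify its RLE format, so the list of (symbol, run length) pairs and the names `rle_encode` and `rle_decode` are assumptions.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (symbol, length) pairs back into the original bit string."""
    return "".join(symbol * count for symbol, count in runs)


map1 = "01001000101"  # Map 1 from the example above
runs = rle_encode(map1)
restored = rle_decode(runs)
```

For the Map 1 of the example, the runs are short, so the benefit of RLE depends on how long the runs of 0s and 1s in the map actually are.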
Compression Algorithm Steps
The proposed method compresses the original image by implementing a number of steps. Figure 2 represents the flowchart of the proposed method. The algorithm steps are:
Input: image (f)
Output: compressed file
Begin
  Initialize parameters
  SET round to zero
  READ image (f)
  Convert (f) to gray scale
  SET A = ( ) // set empty value to matrix A
  A = image (f)
  Bn = convert matrix A into binary
  Initialize matrices Map 1, Map 2, Map 3 to store the codeword indicator bits
  Out1 = compress matrix by LZW algorithm function norm2lzw (Bn)
  Convert the matrix compressed by LZW into binary
  SET N = 7, k = 4
  WHILE (there is a codeword) and (round ≤ 3)
    xxx = the size of (Out1)
    remd = matrix size mod N
    div = matrix size / N
    FOR i = 1 to xxx - remd step N
      FOR R = i to i + (N - 1)
        divide the image into blocks of size 7, saved into parameter msg = Out1 [R]
      END FOR R
      c2 = convert (msg) to Galois field
      origin = c2
      d2 = decode by BCH decoder (bchdec (c2, N, k))
      c2 = encode by BCH encoder for test (bchenc (d2, N, k))
      IF (c2 == origin) THEN // original message parameter
        INCREMENT the parameter test (the number of codewords found) by 1
        add the compressed block d2 to the matrix CmprsImg
        add 1 to the map[round] matrix
      ELSE
        add the original block (origin) to the matrix CmprsImg
        add 0 to the map[round] matrix
      ENDIF
    END FOR i
    Pad and add remd bits to the matrix CmprsImg and encode it
    Final map file = map[round] // to reuse map file in the iteration
    FOR stp = 1 to 3
      Compress map by RLE encoder and put in parameter map_RLE [stp] = RLE (map [stp])
    END FOR stp
    INCREMENT round by 1
  ENDWHILE
END
Figure 2. Algorithm 1. Encoding algorithm.
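The repeated BCH stage of the encoding algorithm can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses the systematic (7,4) code with generator x^3 + x + 1, represents data as strings of "0" and "1", stops when a round finds no codeword or after `max_rounds` rounds, and omits the LZW stage and the RLE compression of the maps. The names `bch74_codeword`, `CODEWORDS` and `compress_rounds` are hypothetical.

```python
def bch74_codeword(message: int) -> str:
    """Return the 7-bit systematic codeword of a 4-bit message as a bit string."""
    remainder = message << 3
    for shift in range(3, -1, -1):
        if remainder & (1 << (shift + 3)):
            remainder ^= 0b1011 << shift  # generator x^3 + x + 1 (assumed)
    return format((message << 3) | remainder, "07b")


# The 16 valid codewords that every 7-bit block is matched against.
CODEWORDS = {bch74_codeword(m) for m in range(16)}


def compress_rounds(bits: str, max_rounds: int = 3) -> tuple[str, list[str]]:
    """Apply up to max_rounds BCH passes; returns (payload, one map per round)."""
    maps = []
    for _ in range(max_rounds):
        usable = len(bits) - len(bits) % 7
        blocks = [bits[i:i + 7] for i in range(0, usable, 7)]
        flags = "".join("1" if block in CODEWORDS else "0" for block in blocks)
        if "1" not in flags:
            break  # no codeword found, another round cannot shrink the data
        shrunk = [b[:4] if f == "1" else b for b, f in zip(blocks, flags)]
        bits = "".join(shrunk) + bits[usable:]  # leftover bits stay unchanged
        maps.append(flags)
    return bits, maps


payload, maps = compress_rounds("0001011" * 2)
zeros_payload, zeros_maps = compress_rounds("0" * 49)
```

In the first call the second round finds no codeword, so only one map is produced; in the second call all three rounds find codewords and three maps are kept, one per round.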
Decompression
Decompression reverses the steps of the compression stage to reconstruct the image. First, the system decompresses the attached map file with the RLE decoder, because its values indicate which blocks of the compressed image are codewords to be decompressed. If the value in the map file is 1, the system reads a 4-bit block from the compressed image, which means it is a codeword, and decompresses it with the BCH encoder. If the value is 0, it reads a 7-bit block from the compressed image, which means that it is not a codeword. This operation is repeated three times; after that, the output from BCH is decompressed using the LZW algorithm. The example below explains these steps.
Example
Read the map file after decompressing it by the RLE algorithm.
In positions 4 and 7 of the map file the value is 1, which means that the system reads 4 bits from the compressed image for each of these positions. Each such block is a codeword, and encoding it with BCH reconstructs the matching 7-bit block from the 16 valid BCH codewords. The remaining values in the file are 0, which means a non-codeword, so the system reads 7 bits from the compressed image. The compressed image is:
The decompression procedure shown in Figure 3 is implemented to recover the original image from the compressed image, and it is performed as follows:
Input: compressed image, attached file mapi
Output: original image
Begin
  Initialize parameters
  SET P = ( ) // set empty value to matrix P
  SET j = 1
  SET n = 7
  SET k = 4
  SET round = 3 // number of iterations
  Rle_matrix = RLE decoder (mapi)
  WHILE round > 0
    FOR i = 0 to length of (Rle_matrix)
      IF Rle_matrix [i] = 1 THEN // block is a codeword: encode by BCH
        read the k-bit block CmprsImg [j .. j + (k - 1)]
        c2 = bchenc (block, n, k) // encode the compressed block by BCH encoder
        add c2 to matrix P
        INCREMENT j by 4
      ELSE // block is not compressed, so read it as it is
        add the uncompressed block CmprsImg [j .. j + (n - 1)] to matrix P
        INCREMENT j by 7
      ENDIF
    ENDFOR i
    DECREMENT parameter round by 1
  ENDWHILE
  LZW_dec = decompress matrix P by LZW
  Image post-processing
  Original_image = bin2dec (LZW_dec) // convert from binary to decimal to reconstruct original image
END
The above steps explain the implementation of the compression and decompression stages of the proposed method, which combines the LZW and BCH algorithms; this final design was reached after extensive testing.
Figure 3. Algorithm 2. Proposed decoding algorithm.
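The map-driven expansion can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the authors' code: it uses the systematic (7,4) code with generator x^3 + x + 1, takes the maps as already RLE-decoded bit strings (one per compression round), and undoes the rounds in reverse order. The LZW decoding that follows is omitted, and the names `bch74_codeword` and `expand_rounds` are hypothetical.

```python
def bch74_codeword(message: int) -> str:
    """Return the 7-bit systematic codeword of a 4-bit message as a bit string."""
    remainder = message << 3
    for shift in range(3, -1, -1):
        if remainder & (1 << (shift + 3)):
            remainder ^= 0b1011 << shift  # generator x^3 + x + 1 (assumed)
    return format((message << 3) | remainder, "07b")


def expand_rounds(bits: str, maps: list[str]) -> str:
    """Undo the BCH rounds, last round first, using one map per round."""
    for flags in reversed(maps):
        out, pos = [], 0
        for flag in flags:
            if flag == "1":
                # Compressed block: read 4 bits and re-encode to 7 bits.
                out.append(bch74_codeword(int(bits[pos:pos + 4], 2)))
                pos += 4
            else:
                # Uncompressed block: copy the 7 bits as they are.
                out.append(bits[pos:pos + 7])
                pos += 7
        out.append(bits[pos:])  # leftover bits were stored unchanged
        bits = "".join(out)
    return bits


single = expand_rounds("00011111000000101", ["110"])
multi = expand_rounds("0" * 10, ["1111111", "1111", "11"])
```

Each map is consumed once, so the total number of bits read in a round is 4 for every 1 in the map plus 7 for every 0, followed by the unprocessed tail.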
The next section shows the results obtained with this method on the MATLAB platform. The compression ratio is calculated using the equation:
$$C_r = \frac{\text{original size}}{\text{compressed size}}$$
The same dataset and the same image sizes are used to compare the proposed method with LZW, RLE, and Huffman. The methods are then compared by the number of bits needed to represent each pixel, according to the equation below:
Or by using the following equation:
where (q) is the number of bits representing each pixel in the uncompressed image, (S0) is the size of the original data, and (Sc) is the size of the compressed data.
The method is also compared with the standard compression techniques, and finally the test value (the number of codewords found in the image) is compared with the original size of the image in bits.
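The two evaluation measures can be sketched in Python as follows. The compression ratio follows the definition given above. The bits-per-pixel formula is an assumption made for illustration, since the equations are not reproduced in the text: it is taken here as q multiplied by Sc and divided by S0, which is equivalent to q divided by the compression ratio. The function names and the sample sizes are hypothetical.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Cr = original size / compressed size."""
    return original_size / compressed_size


def bits_per_pixel(q: int, original_size: int, compressed_size: int) -> float:
    """Assumed form: q * Sc / S0, i.e. q divided by the compression ratio."""
    return q * compressed_size / original_size


# Hypothetical sizes for an 8-bit image: S0 = 1000 bytes, Sc = 625 bytes.
cr = compression_ratio(1000, 625)
bpp = bits_per_pixel(8, 1000, 625)
```

With these hypothetical sizes, a compression ratio of 1.6 corresponds to 5 bits per pixel for an image that originally used 8 bits per pixel.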
Results and Discussion
In order to evaluate the compression performance of the proposed method, we compared it with other standard lossless image compression schemes from the literature review. The first comparison is based on the compression ratio, and the second is based on bits per pixel.
Lossless image compression lets images occupy less space. In lossless compression, no data are lost during the process, which means that the quality of the image is preserved; the decompression process restores the original image without losing essential data. The test images used during compression are stored in GIF, PNG, JPG, and TIFF formats, all of which are compressed automatically by the proposed method.
Hardware used: PC with Intel® Core™ i3 CPU, 200 GB hard disk, and 2.00 GB RAM. Software: Windows 7 Ultimate operating system and MATLAB Version 7.5.0.342 (R2007b).
This section analyzes and discusses the results obtained by applying the LZW and BCH algorithms discussed above to the set of images. The proposed system uses a set of images commonly used in image processing (airplane, baboon, F-18, Lena, peppers, etc.) as a test set. The proposed method has been tested on different image sizes. The simulation results are compared with RLE, Huffman and LZW. The results show that the proposed method has a higher compression ratio than the standard compression algorithms mentioned above. The results based on the compression ratio are shown in Table 1, and Table 2 shows the compression based on bits per pixel.
Table 1. Comparison with typical compression methods based on compression ratio, which divides the original image size by the size of the compressed image.
Table 2. Compression results of images in bit/pixel.
The above results show that compression by the proposed system is the best compared to the results of compressing the image with the RLE, LZW or Huffman algorithm. Figure 4 illustrates the comparison, based on compression ratio, between the proposed algorithm (BCH and LZW) and the standard image compression algorithms (RLE, Huffman and LZW), which are distinguished by color. Figure 5 shows the size of the original image compared with the image after compression by the standard image compression algorithms and by the proposed method. Table 2 shows the results of the compression based on bits per pixel for the proposed method and the standard compression algorithms.
Figure 4. Comparing the proposed method with (RLE, LZW and Huffman) based on compression ratio.
Figure 5. Comparing the proposed method with (RLE, LZW and Huffman) based on image size.
Figure 6 explains the result of the above Table 2.
Figure 6. Comparing the proposed method with (RLE, LZW and Huffman) based on bit per pixel.
Discussion
In this section we show the efficiency of the proposed system, which was implemented in MATLAB. To demonstrate the compression performance of the proposed method, we compared it with several representative lossless image compression techniques on the set of ISO test images made available to the proposer; these images are listed in the first column of every table.

Table 1 lists the compression ratio for each test image, calculated as the size of the original image divided by the size of the image after compression. The second column of the table gives the compression ratio obtained with the RLE algorithm, the third and fourth columns give the ratios obtained with the LZW and Huffman algorithms respectively, and the last column gives the ratio achieved by the proposed method. The average compression ratios over all test images are 1.2017 for RLE, 1.4808 for LZW, 1.195782 for Huffman, and 1.676091 for the proposed combination of BCH and LZW. The proposed method therefore achieves the best average compression ratio, which means that the image size is reduced more by the combined LZW and BCH method than by the standard lossless compression algorithms. Figure 2 illustrates that the proposed method has a higher compression ratio than RLE, LZW and Huffman. Figure 3 displays the original image size and the size of the image after compression by RLE, LZW, Huffman and the proposed method; the proposed method yields the smallest image size, which achieves the goal of this paper: to reduce the storage needed for the image and, consequently, the transmission time.

The second comparison is based on bits per pixel, shown in Table 2. The goal of image compression is to reduce the size as much as possible while maintaining the image quality. Smaller files take less space to store, so it is better to need fewer bits to represent each pixel. The table covers the same image set and shows that the proposed method needs fewer bits per pixel than the other standard image compression methods: the average bits per pixel over all test images are 6.904287, 5.656522, 6.774273 and 5.01157 for RLE, LZW, Huffman and the proposed method respectively.
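The two measures used in this comparison can be sketched in a few lines of Python. This is only an illustration of the definitions given above (original size divided by compressed size, and compressed bits divided by the number of pixels); the function names and the 512 × 512 example image are hypothetical and are not taken from the paper's test set.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Size of the original image divided by the size after compression."""
    return original_bytes / compressed_bytes


def bits_per_pixel(compressed_bytes: int, width: int, height: int) -> float:
    """Average number of compressed bits spent on each pixel."""
    return 8.0 * compressed_bytes / (width * height)


# Hypothetical 512 x 512 image stored with 8 bits per pixel.
original_size = 512 * 512          # bytes
compressed_size = 163840           # bytes, made-up value for illustration

ratio = compression_ratio(original_size, compressed_size)
bpp = bits_per_pixel(compressed_size, 512, 512)
print(f"compression ratio = {ratio:.3f}, bits per pixel = {bpp:.3f}")
```

A higher compression ratio and a lower bits-per-pixel value both indicate a smaller compressed file.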
CONCLUSIONS
This paper was motivated by the desire to improve the effectiveness of lossless image compression by improving the BCH and LZW methods. We provided an overview of various existing lossless image compression techniques and coding standards. We have proposed a highly efficient algorithm that is implemented using the BCH coding approach. The proposed method combines the advantages of the BCH algorithm with those of the LZW algorithm, which is known for its simplicity and speed. The ultimate goal is to give a relatively good compression ratio while keeping the time and space complexity to a minimum. The experiments were carried out on a dataset of 20 test images. The results were evaluated using compression ratio and bits per pixel. The experimental results show that the proposed algorithm improves the compression of images compared with the RLE, Huffman and LZW algorithms; the average compression ratio of the proposed method is 1.636383, which is better than that of the standard lossless image compression algorithms.
FUTURE WORK
In this paper, we developed a method for improving image compression based on BCH and LZW. For future work, we suggest using BCH with another compression method in a way that allows the compression to be repeated more than three times, investigating how to provide a high compression ratio for given images, and finding an algorithm that decreases the file (map). The experimental dataset in this paper was somewhat limited, so applying the developed methods to a larger dataset could be a subject for future research. Finally, extending the work to video compression is also very interesting. Video data is basically a three-dimensional array of color pixels that contains spatial and temporal redundancy. Similarities can thus be encoded by registering differences within a frame (spatial) and/or between frames (temporal), where a frame is the set of all pixels that correspond to a single moment in time. Basically, a frame is the same as a still picture. Spatial encoding in video compression takes advantage of the fact that the human eye is unable to distinguish small differences in color as easily as it can perceive changes in brightness, so that very similar areas of color can be “averaged out” in a similar way to JPEG images. With temporal compression, only the changes from one frame to the next are encoded, as often a large number of the pixels will be the same across a series of frames.
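The idea of temporal compression described above can be illustrated with a minimal sketch: keep the first frame as is and store only the difference between each frame and its predecessor. This is a simplified illustration, not a real video codec (no motion compensation or entropy coding), and the function names and the tiny 4 × 4 frames are hypothetical.

```python
import numpy as np


def temporal_deltas(frames):
    """Keep frame 0 unchanged and replace every later frame by its
    difference from the previous frame."""
    frames = np.asarray(frames, dtype=np.int16)  # signed type so differences can be negative
    deltas = np.empty_like(frames)
    deltas[0] = frames[0]
    deltas[1:] = frames[1:] - frames[:-1]
    return deltas


def reconstruct(deltas):
    """Invert temporal_deltas by accumulating the differences over time."""
    return np.cumsum(deltas, axis=0, dtype=np.int16)


# Three hypothetical 4 x 4 grayscale frames with a mostly static scene.
frames = np.full((3, 4, 4), 100, dtype=np.uint8)
frames[1, 0, 0] = 120
frames[2, 0, 0] = 120
frames[2, 3, 3] = 90

deltas = temporal_deltas(frames)
print("non-zero differences after the first frame:", np.count_nonzero(deltas[1:]))
print("lossless reconstruction:", np.array_equal(reconstruct(deltas), frames))
```

Because most entries of the difference frames are zero when the scene is static, they are much easier to compress than the frames themselves.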
Chapter 11
NEW RESULTS IN PERCEPTUALLY LOSSLESS COMPRESSION OF HYPERSPECTRAL IMAGES
Chiman Kwan and Jude Larkin Applied Research LLC, Rockville, Maryland, USA
ABSTRACT
Hyperspectral images (HSI) have hundreds of bands, which impose a heavy burden on data storage and transmission bandwidth. Quite a few compression techniques have been explored for HSI in the past decades. One high-performing technique is the combination of principal component analysis (PCA) and JPEG-2000 (J2K). However, since several new compression codecs have been developed after J2K in the past 15 years, it is worthwhile to revisit this research area and investigate whether there are better techniques for HSI compression. In this paper, we present some new results in HSI compression. We aim at perceptually lossless compression of HSI. Perceptually lossless means that the decompressed HSI data cube has a performance metric near 40 dB in terms of peak signal-to-noise ratio (PSNR) or human visual system (HVS) based metrics. The key idea is to compare several combinations of PCA and video/image codecs. Three representative HSI data cubes were used in our studies. Four video/image codecs, including J2K, X264, X265, and Daala, have been investigated, and four performance metrics were used in our comparative studies. Moreover, some alternative techniques such as video, split band, and PCA only approaches were also compared. It was observed that the combination of PCA and X264 yielded the best performance in terms of compression performance and computational complexity. In some cases, the PCA + X264 combination achieved more than 3 dB higher performance than the PCA + J2K combination.

Keywords: Hyperspectral Images (HSI), Compression, Perceptually Lossless, Principal Component Analysis (PCA), Human Visual System (HVS), PSNR, SSIM, JPEG-2000, X264, X265, Daala

Citation: Kwan, C. and Larkin, J. (2019), “New Results in Perceptually Lossless Compression of Hyperspectral Images”. Journal of Signal and Information Processing, 10, 96-124. doi: 10.4236/jsip.2019.103007.
Copyright: © 2019 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0
INTRODUCTION
Hyperspectral images (HSI) have found a wide range of applications, including remote chemical monitoring [1], target detection [2], anomaly and change detection [3] [4] [5], etc. Due to the presence of hundreds of bands in HSI, however, a heavy burden is placed on data storage and transmission bandwidth. For many practical applications, it is unnecessary to compress data losslessly because lossless compression can achieve a compression factor of only two to three. Instead, it is more practical to apply perceptually lossless compression [6] [7] [8] [9]. A simple rule of thumb is that if the peak signal-to-noise ratio (PSNR) or a human visual system (HVS) inspired metric is above 40 dB, then the decompressed image is considered “near perceptually lossless” [10]. In several recent papers, we have applied perceptually lossless compression to maritime images [10], sonar images [10], and Mastcam images [11] [12] [13].

In the past few decades, several alternative techniques for compressing HSI have been proposed. In [14], a tensor approach was proposed to compress the HSI. In [15], a missing data approach was presented to compress HSI. Another simple and straightforward approach is to apply PCA directly to HSI. For instance, in [3], the authors used 10 PCA-compressed bands for anomaly detection. There are also some conventional, simple, and somewhat naïve approaches to compressing HSI. One idea, known as split band (SB), is to split the hundreds of HSI bands into groups of 3-band images and then compress each 3-band image separately. Another idea, known as the video approach (Video), is to treat the 3-band images as video frames and compress the frames as a video. The SB and Video approaches have been used for multispectral images [13] and were observed to achieve reasonable performance.

One powerful approach to HSI compression is the combination of PCA and J2K [16]. The idea is to first apply PCA to decorrelate the hundreds of bands and then apply a J2K codec to compress the few PCA bands. In the compression literature, there have been many new developments after J2K [17] in the past 15 years. X264 [18], a fast implementation of the H264 standard, has been widely used in YouTube and many other social media platforms. X265 [19], a fast implementation of H265, is a new codec that will succeed X264. Moreover, a free video codec known as Daala emerged recently [20]. In light of these new codecs, it is worthwhile to revisit the HSI compression problem.

In this paper, we summarize our study in this area. Our aim is to achieve perceptually lossless compression of HSI at 100 to 1 compression. The key idea is to compare several combinations of PCA and video/image codecs. Three representative HSI data cubes, including the Pavia and AVIRIS datasets, were used in our studies. Four video/image codecs, including J2K, X264, X265, and Daala, have been investigated, and four performance metrics were used in our comparative studies. Moreover, some alternative techniques such as video, split band, and PCA only approaches were also compared. It was observed that the combination of PCA and X264 yielded the best performance in terms of compression performance (rate-distortion curves) and computational complexity. In the Pavia data case, the PCA + X264 combination achieved more than 3 dB higher performance than the PCA + J2K combination. Most importantly, our investigations showed that the PCA + X264 combination can achieve more than 40 dB of PSNR at 100 to 1 compression. This means that perceptually lossless compression of HSI is achievable even at 100 to 1 compression.

The key contributions are as follows. First, we revisited the hyperspectral image compression problem and extensively compared several approaches: PCA only, Video approach, Split Band approach, and a two-step approach.
Second, for the two-step approach, we compared four variants: PCA + J2K, PCA + X264, PCA + X265, and PCA + Daala. We observed that the two-step approach is better than the PCA only, Video, and Split Band approaches, as perceptually lossless compression can be achieved at a 100 to 1 ratio. Third, within the two-step approach, our experiments showed that the PCA + X264 combination is better than the other variants in terms of performance and computational complexity. To the best of our knowledge, we have not seen such a study in the literature.

Our paper is organized as follows. Section 2 summarizes the HSI data, the technical approach, the various algorithms, and the performance metrics. In Section 3, we focus on the experimental results, including the PCA only results, video approach, split band approach, and two-step approach (PCA + video codecs). Four performance metrics were used to compare different algorithms. Finally, some concluding remarks are included in Section 4.
DATA AND APPROACH
Data
We have used several representative HSI data sets in this paper. The Pavia and AVIRIS image cubes were collected using airborne sensors, and the Air Force (AF) image was collected on the ground. The numbers of bands in the three data sets vary from one hundred to more than two hundred.

Image 1: Pavia [21]
The first image we tested was the Pavia data, a 610 × 340 × 103 image cube. The image was taken with a Reflective Optics System Imaging Spectrometer (ROSIS) sensor during a flight over northern Italy. Figure 1 shows the RGB bands of the Pavia image cube.

Image 2: AF image
The second image was the image cube used in [3]. It consists of 124 bands and has a height of 267 pixels and a width of 342 pixels. The RGB image of this data set is shown in Figure 2.

Image 3: AVIRIS
The third image was taken from NASA’s Airborne Visible Infrared Imaging Spectrometer (AVIRIS). There are 213 bands with wavelengths from 380 nm to 2500 nm. The image size is 300 × 300 × 213. Figure 3 shows the RGB image of the data cube.
Figure 1. RGB image of the Pavia image cube.
Figure 2. RGB image of the AF image cube.
Figure 3. RGB image of the AVIRIS image cube.
Compression Approaches
Here, we first present the various work flows of several representative compression approaches for HSI. We then include some background materials for several video/image codecs in the literature. We will also mention two conventional performance metrics and two other metrics motivated by human visual systems (HVS).
PCA Only
PCA is also known as the Karhunen-Loève transform (KLT). Compared with the discrete cosine transform (DCT) and the wavelet transform (WT), PCA is optimal because it is data-dependent, whereas the DCT and WT are independent of the input data. The work flow is shown in Figure 4. After some preprocessing steps, PCA compresses the raw HSI data cube (N bands) into a pre-defined number of bands (r bands), and those r bands are saved or transmitted. At the receiving end, an inverse PCA is performed to reconstruct the HSI image cube.
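The forward and inverse PCA steps described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, the covariance is estimated over all pixels, and the mean spectrum and the r basis vectors are assumed to be sent along with the r PCA bands so that the receiver can invert the transform.

```python
import numpy as np


def pca_compress(cube, r):
    """Project an (H, W, N) cube onto its r leading principal components.

    Returns the (H, W, r) PCA bands, the (N, r) basis and the (N,) mean spectrum.
    """
    h, w, n = cube.shape
    pixels = cube.reshape(-1, n).astype(np.float64)   # one spectrum per row
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    cov = centered.T @ centered / (pixels.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :r]                   # r directions of largest variance
    scores = centered @ basis
    return scores.reshape(h, w, r), basis, mean


def pca_reconstruct(scores, basis, mean):
    """Inverse PCA: map the r PCA bands back to N spectral bands."""
    h, w, r = scores.shape
    pixels = scores.reshape(-1, r) @ basis.T + mean
    return pixels.reshape(h, w, basis.shape[0])


# Synthetic 8 x 10 cube with 20 bands that are mixtures of 3 spectra,
# so 3 principal components are enough to represent it.
rng = np.random.default_rng(0)
cube = rng.random((8, 10, 3)) @ rng.random((3, 20))
bands, basis, mean = pca_compress(cube, r=3)
restored = pca_reconstruct(bands, basis, mean)
print(bands.shape, np.allclose(restored, cube))
```

For real HSI data the reconstruction is only approximate, and the error shrinks as r grows.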
Split Band (SB) Approach
This idea is very simple. The HSI bands are divided into groups of 3-band images. Each 3-band image is then compressed as a still image with an image codec. This approach has been observed to work well for multispectral (MS) image cubes [13] where there are only nine bands. The work flow is shown in Figure 5.
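The grouping step of the SB approach can be sketched as below. The text does not say how a band count that is not a multiple of three is handled; repeating the last band to fill the final group is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

import numpy as np


def split_into_three_band_groups(cube):
    """Split an (H, W, N) cube into ceil(N / 3) images of shape (H, W, 3)."""
    h, w, n = cube.shape
    n_groups = math.ceil(n / 3)
    pad = n_groups * 3 - n
    if pad:
        # Assumption: fill the last group by repeating the final band.
        filler = np.repeat(cube[:, :, -1:], pad, axis=2)
        cube = np.concatenate([cube, filler], axis=2)
    return [cube[:, :, 3 * i:3 * i + 3] for i in range(n_groups)]


# A small cube with 103 bands, the band count of the Pavia data.
groups = split_into_three_band_groups(np.zeros((4, 6, 103)))
print(len(groups), groups[0].shape)
```

Each returned 3-band image can then be passed to a still-image codec (SB approach) or treated as one frame of a video (Video approach).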
Video Approach
This approach is similar to the SB approach. Here, the 3-band images are treated as video frames and a video codec is then applied. Details can be found in Figure 5. We include some details for some of the blocks.
Pre-processing
The preprocessing has a few components. First, it is important to ensure that the input image dimensions are even numbers because some codecs may crash if the image size has odd dimensions. Second, the input image is normalized to double precision with values between 0 and 1. Third, the different bands are saved in TIFF format. Fourth, all the bands are written into YUV444 and Y4M formats.
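The first two preprocessing steps can be sketched as follows. The text does not state how odd dimensions are made even or how the normalization is computed; cropping one row or column and global min-max scaling are assumptions made for this illustration, and the function name is hypothetical. Writing TIFF, YUV444, and Y4M files is omitted.

```python
import numpy as np


def preprocess(cube):
    """Return a double-precision cube with even height/width and values in [0, 1]."""
    h, w, _ = cube.shape
    # Assumption: drop the last row/column when a dimension is odd.
    cube = cube[: h - h % 2, : w - w % 2, :].astype(np.float64)
    lo, hi = cube.min(), cube.max()
    if hi > lo:
        # Assumption: one global min-max scaling for the whole cube.
        return (cube - lo) / (hi - lo)
    return np.zeros_like(cube)


# A small cube with an odd height, mimicking the 267-pixel-high AF image.
raw = np.arange(5 * 4 * 2).reshape(5, 4, 2)
clean = preprocess(raw)
print(clean.shape, clean.dtype, clean.min(), clean.max())
```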
Codecs
Different codecs have different requirements. For J2K, we used Matlab’s video writer to create a J2K format with certain quality parameters. We then used Matlab’s data reader to decode the compressed data and retrieve the individual frames. For X264 and X265, the videos are encoded using the respective encoders with certain quality parameters. The video decoding was done within FFMPEG. For Daala, we directly used Daala’s functions for encoding and decoding.
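As a rough illustration of how the X264 and X265 encoders might be driven through FFmpeg at a fixed quantization parameter, the sketch below only builds the command lines and does not run them. The exact options used by the authors are not given in the text; the file names, the choice of a constant-QP mode, and the 4:4:4 pixel format are assumptions.

```python
def x264_command(src, dst, qp):
    """Build an ffmpeg command that encodes src with libx264 at constant QP."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libx264", "-qp", str(qp),
            "-pix_fmt", "yuv444p", dst]


def x265_command(src, dst, qp):
    """Build an ffmpeg command that encodes src with libx265 at constant QP."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libx265", "-x265-params", f"qp={qp}",
            "-pix_fmt", "yuv444p", dst]


# Hypothetical input and output file names.
print(" ".join(x264_command("bands.y4m", "bands_x264.mkv", 20)))
print(" ".join(x265_command("bands.y4m", "bands_x265.mkv", 20)))
```

Each list could be passed to `subprocess.run` on a machine where FFmpeg is built with both encoders; a larger QP gives a smaller file and lower quality.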
Performance Evaluation
In the evaluation part, each frame is reconstructed and compared to the original input band. Four performance metrics have been used.
Figure 4. PCA only compression work flow.
Figure 5. Work flow for SB and Video approaches.
Two-step Approach: PCA + Video
The two-step approach has been used in [13] [16] before. In [13], X265 was observed to perform better in the second step. In [16], the second step was a J2K codec. However, the study in [13] was for MS images rather than HSI. The work flow for the two-step approach is summarized in Figure 6. In the second step, we propose to treat the PCA bands as a video.
Brief Review of Relevant Compression Algorithms
Instead of reinventing the wheel, we will use image codecs available in the market, objectively evaluate different codecs, and eventually recommend the best codec to our customer. With the above in mind, we include a brief overview of some representative codecs.
DCT based algorithms
• JPEG [22]: JPEG is the very first image compression standard. The video counterparts are the MPEG-1 and MPEG-2 standards.
• JPEG-XR [23]: It was developed by Microsoft. The performance is comparable to JPEG-2000. It is mainly used for still image compression.
• VP8 and VP9 [24] [25]: These video compression algorithms are owned by Google. The performance is somewhat close to X264. We did not include VP8 and VP9 in our study because they are not as popular as X264 and X265.
• X264 [18]: X264 is the current state-of-the-art in video compression. YouTube uses X264. It has good still image compression.
• X265 [19]: This is the next-generation video codec and has excellent still image compression and video compression. However, the computational complexity is much higher than that of X264. In general, X265 has the same basic structure as previous standards.
Figure 6. Two-step approach to HSI compression.
Several studies concluded that X265 yields the same quality as X264, but with only half of the bitrate. It should be noted that X264 and X265 are optimized versions of H264 and H265, respectively.
Daala [20]
Recently, there has been a parallel activity at the Xiph.Org Foundation, which implements a compression codec called Daala [20]. It is based on DCT. There are pre- and post-filters to increase energy compaction and remove block artifacts. Daala borrows ideas from [26]. The block-coding framework in Daala is illustrated in Figure 7. In this study, we compared Daala with X264, X265, and J2K in our experiments.
Wavelet-based Algorithms
J2K is a wavelet [17] [27] [28] [29] based compression standard. It has better performance than JPEG. However, J2K requires the use of the whole image for coding and hence is not suitable for real-time applications. In addition, motion-J2K for video compression is not popular in the market.
Performance Metrics
In almost all compression systems, researchers have used peak signal-to-noise ratio (PSNR) or structural similarity (SSIM) to evaluate the compression algorithms. Given a fixed compression ratio, algorithms that yield higher PSNR or SSIM are regarded as better algorithms. However, PSNR and SSIM do not correlate well with human perception. Recently, a group of researchers investigated a number of different performance metrics [30]. Extensive experiments were performed to investigate the correlation between human perception and various performance metrics. According to the results in [30], two performance metrics, HVS and HVSm, correlate well with human perception. One image example shown in Figure 8 demonstrates that HVS and HVSm have high correlation with human subjective evaluation results. In the past, we have used HVS and HVSm in several applications [11] [12] [13].
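For reference, PSNR can be computed as in the sketch below, which uses the standard definition 10 log10(peak² / MSE). The paper does not spell out its exact formula, so this is an assumption; the default peak value of 1.0 corresponds to images normalized to the range 0 to 1, and the function name is hypothetical. SSIM, HVS, and HVSm are more involved and are not reproduced here.

```python
import numpy as np


def psnr(original, reconstructed, peak=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of the same shape."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


# A uniform error of 0.01 on a [0, 1] image gives an MSE of 1e-4.
band = np.zeros((4, 4))
print(f"{psnr(band, band + 0.01):.2f} dB")
```

Under this definition, the 40 dB threshold used in this chapter corresponds to a root-mean-square error of about 1% of the peak value.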
EXPERIMENTAL RESULTS
Here, we briefly describe the experimental settings. In the PCA only approach, a program was written for PCA. The inputs are one hyperspectral image and the number of principal components to be used in the compression. The outputs are the PCA bands. The performance metrics are generated by comparing the original hyperspectral image with the inverse-PCA outputs. In the Video only approach, we used ffmpeg to call X264 and X265. For Daala, we used the latest open-source code on Daala’s website. For J2K, we used the built-in J2K function in Matlab.
Figure 7. Daala codec for block-based image coding systems.
Figure 8. Comparison of SSIM and PSNR-HVS-M (HVSm). HVSm has better correlation with human perception [30] .
In each codec, there is a quantization or quality parameter (qp) that controls the compression ratio. We chose around 50 qp values in our experiments in order to generate smooth performance curves. In the two-step approach, PCA is applied first, followed by the video codecs.
Experiment 1: Pavia Data
PCA Only
Here, we applied PCA directly to compress the 103 bands to 3, 6, and 9 bands, which we denote as PCA3, PCA6, and PCA9, respectively. From Figure 9, one can see that PCA3 achieved 33 times compression with 44.75 dB of PSNR. The other metrics are also high. Similarly, PCA6 and PCA9 also attained high values in the performance metrics. This means that PCA alone can achieve reasonable compression performance. However, if our goal is to achieve 100 to 1 compression with higher than 40 dB of PSNR, then the PCA only approach may be insufficient.
Video Approach
As mentioned earlier, the video approach treats the HSI data cube as a video where each frame takes 3 bands out of the data cube. There are 35 frames in total in the video for the Pavia data. We then applied four video codecs (J2K, X264, X265, and Daala) to the video. Four performance metrics were generated as shown in Figure 10. If one compares the metrics in Figure 9 and Figure 10, one can see that the Video approach is slightly better than the PCA only approach. For instance, at 0.03 compression ratio, PCA3 yielded 38.2 dB and the Video approach yielded more than 40 dB in terms of PSNR. X265 performed better than the others at compression ratios less than 0.1.
Split Band (SB) Approach
Here, SB approach means that every 3 bands in the hyperspectral image cube are treated as a separate image. We then applied four image codecs to each 3-band image. The averaged metrics from all 3-band images were computed. Figure 11 summarizes the performance metrics. J2K has better scores in three out of four metrics. Comparing the Video and SB approaches as shown in Figure 10 and Figure 11, one can see that the Video approach is slightly better. For example, at a compression ratio of 0.05, J2K has 43 dB (HVSm) using the SB approach and X265 has 52 dB (HVSm) using the Video approach.
Two-step approach
The two-step approach first compresses the HSI cube by using PCA to a number of bands (3, 6, 9, etc.). The second step applies a video codec to compress the PCA bands. We have five case studies below.
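To see how the two steps share the compression burden, the following sketch estimates how strongly the video codec must compress the PCA bands to reach a given overall ratio. It assumes that the compression ratio is expressed as compressed size divided by original size (as in the ratios such as 0.01 quoted in this section), that the overall ratio is the product of the two stages, and that the overhead of storing the PCA basis and mean is negligible. These are simplifying assumptions, not statements from the paper, and the function names are hypothetical.

```python
def pca_band_ratio(total_bands, kept_bands):
    """Fraction of the data that remains after keeping only the PCA bands."""
    return kept_bands / total_bands


def required_codec_ratio(target_overall, total_bands, kept_bands):
    """Ratio the second-step codec must reach so that the product of both
    stages equals the overall target ratio."""
    return target_overall / pca_band_ratio(total_bands, kept_bands)


# Pavia cube: 103 bands reduced to 6 PCA bands, overall target of 100 to 1.
print(f"PCA step alone:  {pca_band_ratio(103, 6):.4f}")
print(f"codec must reach: {required_codec_ratio(0.01, 103, 6):.4f}")
```

Under these assumptions, keeping more PCA bands leaves the codec a smaller bit budget per band at the same overall ratio, which is one possible reason why adding bands does not always improve the metrics.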
Figure 9. Performance of PCA only: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
Figure 10. Performance of video approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
Figure 11. Performance of SB approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
PCA3 + Video Codec
Figure 12 summarizes the two-step approach (PCA3 + Video). It can be seen that, at 0.01 compression ratio, the two-step approach can get above 40 dB of PSNR. The other metrics are also high. Daala has better visual performance (HVS and HVSm) than the others. We can also notice that the PCA3 + Video approach can attain much higher compression than the PCA only, SB, and Video approaches. That is, the two-step approach can reach more than 100 times compression with close to 40 dB of HVSm, whereas the SB and Video approaches cannot achieve 100 to 1 compression with the same performance metrics (40 dB).
PCA6 + Video
Figure 13 summarizes the PCA6 + Video results. At 0.01 compression ratio, the PCA6 + Video approach appears to be slightly better than PCA3 + Video. X264 is better than the others in three out of four metrics. In particular, at 0.01 compression, X264 has 45 dB in terms of HVSm. This value is very high and can be considered perceptually lossless.
PCA9 + Video Approach
From Figure 14, it is clear that PCA9 + Video is slightly worse than the PCA6 + Video case. For example, at 0.01 compression ratio, PCA9 + Video has 44 dB in terms of PSNR whereas PCA6 + Video has 45 dB of PSNR. X264 is better than the other codecs.
Figure 12. Performance of two-step (PCA3 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
Figure 13. Performance of two-step (PCA6 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
Figure 14. Performance of two-step (PCA9 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
PCA12 + Video
As shown in Figure 15, the performance of PCA12 + Video is somewhat similar to PCA9 + Video.
Figure 15. Performance of two-step (PCA12 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
PCA15 + Video
As shown in Figure 16, the performance of PCA15 + Video is somewhat similar to PCA12 + Video.
Figure 16. Performance of two-step (PCA15 + video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
Comparison of Different Combinations of the Two-step Approach
The performance comparison of different combinations of the two-step approach is summarized in Table 1. First, we observe that PCA3 and PCA6 have better performance than PCA9 to PCA15. Second, PCA6 has better performance with X264 and Daala. Third, for PCA6, we observed that X264 is 3.16 dB better than J2K in terms of PSNR. In terms of HVSm, X264 is 4.2 dB better than J2K. This is quite significant. For PCA6, Daala has slightly better performance than X264 and X265. However, we noticed that Daala took more computation time than X264. Hence, for practical applications, X264 may be a better choice for HSI compression.
Table 1. Performance comparison of different combinations of two-step approach. Bold numbers indicate the best performing method for each column.
Experiment 2: AF Image Cube
PCA only Approach
Here, we applied PCA directly to compress the 124 bands to 3, 6, and 9 bands, which we denote as PCA3, PCA6, and PCA9, respectively. From Figure 17, one can see that PCA3 achieved 42 times compression with 40.7 dB of PSNR. The other metrics are also high. Similarly, PCA6 and PCA9 also attained high performance, but at lower compression. This means PCA alone can achieve reasonable compression. However, PCA only is not enough to achieve 100 to 1 compression.
Figure 17. Performance of PCA only: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.
Video Approach
Comparing the performance of the video approach (Figure 18) with the PCA only approach (Figure 17), one can immediately notice that the Video approach allows higher compression to be achieved. For instance, at 0.01 compression ratio, X265 achieved about 38 dB in PSNR. X265 performs well for small ratios (high compression).
Figure 18. Performance of Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.
SB Approach
Comparing the SB approach in Figure 19 with the Video approach in Figure 18, the Video approach is better. For instance, if one looks at the PSNR values at 0.05 compression ratio, one can see that the X265 codec in the Video approach has a value of 44 dB whereas the best codec in the SB approach (J2K) has a value of 41.5 dB.
Figure 19. Performance of SB approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.
Two-step Approach
Here, PCA is combined with the Video approach. That is, PCA is first applied to the 124 bands to obtain 3, 6, 9, 12, and 15 bands. After that, a video codec is applied to further compress the PCA bands.
PCA3 + Video
From Figure 20, we can see that PCA3 + Video can achieve 0.01 compression ratio with more than 40 dB of PSNR. Hence, the performance is better than the earlier approaches (PCA only, Video, and SB). Daala has better performance in terms of HVS and HVSm.
Figure 20. Performance of PCA3 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.
PCA6 + Video
From Figure 21 and Figure 20, we can see that PCA6 + Video is better than PCA3 + Video. For example, at 0.01 compression ratio, Daala has 44 dB (HVSm) for PCA6 + Video whereas Daala only has 34.75 dB for PCA3 + Video. X264 has better metrics in PSNR and SSIM, but Daala has better performance in terms of HVS and HVSm.
Figure 21. Performance of PCA6 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.
PCA9 + Video
Comparing Figure 21 and Figure 22, PCA9 + Video is worse than PCA6 + Video. For instance, at 0.01 compression ratio, PCA9 + Video has 42 dB (PSNR) and PCA6 + Video has slightly over 44 dB of PSNR. Daala has better scores in HVS and HVSm, but X264 has higher values in PSNR and SSIM.
Figure 22. Performance of PCA9 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.
PCA12 + Video
As shown in Figure 23, the performance of PCA12 + Video is worse than some of the earlier combinations. For example, Daala's HVSm value is 40 dBs at 0.01 compression ratio, which is lower than for PCA6 + Video (Figure 21) and PCA9 + Video (Figure 22). PCA12 + Video is better than PCA3 + Video (Figure 20).
Figure 23. Performance of PCA12 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.
PCA15 + Video
From Figure 24, we can see that PCA15 + Video is similar to PCA3 + Video (Figure 20), but worse than the other PCA + Video combinations (Figure 21, Figure 22, Figure 23).
Figure 24. Performance of PCA15 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.
Comparison of Different Combinations of the Two-step Approach
From Table 2, we have the following observations. First, the PCA6 + Video combination has the best performance for each codec. Second, X264 has the best performance in PSNR whereas Daala has the best performance in HVS and HVSm. Third, X264 is 1.45 dBs better than J2K in the PCA6 case. As mentioned earlier, X264 is faster to run than Daala. Hence, X264 may be more suitable in practical applications.
Table 2. Performance comparison of different combinations of two-step approach. Bold numbers indicate the best performing method for each column.
Experiment 3: AVIRIS Image Cube
PCA Only Approach
Here, we applied PCA directly to compress the 213 bands to 3, 6, and 9 bands, which we denote as PCA3, PCA6, and PCA9, respectively. From Figure 25, one can see that PCA3 achieved 72 times of compression with 39.7 dBs of PSNR. The other metrics are also high. Similarly, PCA6 and PCA9 also attained high performance, but lower compression ratios. This means PCA alone can achieve reasonable compression.
Figure 25. Performance of PCA only: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
Video Approach
Here, the 213 bands are divided into groups of 3 bands. As a result, there are 71 groups, which are then treated as 71 frames in a video. After that, different video codecs are applied. The performance metrics are shown in Figure 26. Compared with the PCA only approach, the Video approach is slightly inferior. For instance, PCA6 has a PSNR of 44 dBs at a compression ratio of 0.028, whereas the Video approach has about 42.5 dBs at the same ratio.
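The grouping step is simple index arithmetic: 213 bands taken 3 at a time give 213 / 3 = 71 frames. The sketch below illustrates this with band indices only; the helper name `group_bands` is hypothetical, and how each 3-band group is mapped onto the colour planes of a video frame is left open because the text does not specify it.

```python
def group_bands(num_bands, bands_per_frame=3):
    """Return lists of band indices, one list per 3-band video frame."""
    if num_bands % bands_per_frame != 0:
        raise ValueError("band count must be a multiple of the frame depth")
    return [list(range(start, start + bands_per_frame))
            for start in range(0, num_bands, bands_per_frame)]

frames = group_bands(213)
print(len(frames))             # 71
print(frames[0], frames[-1])   # [0, 1, 2] [210, 211, 212]
```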
Figure 26. Performance of Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
SB Approach
Here, the 71 groups of 3-band images are compressed separately. The results shown in Figure 27 are worse than those of the Video approach. This is understandable, as the correlations between frames are not taken into account in the SB approach.
Figure 27. Performance of SB approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
Two-step Approach
We have the following five case studies based on the number of PCA bands coming out of the first step.
PCA3 + Video
From Figure 28, the performance metrics appear to flatten out after a compression ratio of 0.005. The maximum PSNR value is below 40 dBs. Other metrics are also not very high. Compared with the PCA only, Video, and SB approaches, PCA3 + Video does not show any advantage.
Figure 28. Performance of PCA3 + Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
PCA6 + Video
From Figure 29, we can see that PCA6 + Video has much better performance than PCA3 + Video as well as the PCA only, Video, and SB approaches. At 0.01 compression ratio, the PSNR values reached more than 42 dBs. Other metrics also performed well. Daala has higher scores in HVS and HVSm. X265 is slightly better in PSNR and SSIM.
Figure 29. Performance of PCA6 + Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
PCA9 + Video
Comparing Figure 29 and Figure 30, we can see that PCA9 + Video has better metrics than PCA6 + Video. For instance, at 0.01 compression ratio, PCA9 + Video achieves 40 dBs of HVSm (Daala), whereas PCA6 + Video achieves 38.5 dBs.
Figure 30. Performance of PCA9 + Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
PCA12 + Video
Comparing Figure 30 and Figure 31, we can see that PCA12 + Video is slightly worse than PCA9 + Video.
Figure 31. Performance of PCA12 + Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
PCA15 + Video
Comparing Figure 31 and Figure 32, it can be seen that PCA15 + Video is slightly worse than PCA12 + Video.
Figure 32. Performance of PCA15 + Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
Comparison of Different Combinations in the Two-step Approach
Table 3 summarizes the performance metrics of different combinations of the two-step approach. First, PCA6 performed better than the other PCA + Video combinations. Second, X265 performed better than the other codecs. However, since X265 requires a lot of computational power and X264 is only slightly inferior to X265, it is better to use the PCA6 + X264 combination. Third, the HVSm values for Daala and X265 are somewhat strange, and we could not find an explanation for them even though we repeated the experiments several times. This behavior only happened in the AVIRIS data. Fourth, we noticed that X264 is 0.25 dBs better than J2K in PSNR.
Table 3. Performance comparison of different combinations of two-step approach at 0.01 compression ratio for the AVIRIS data. Bold numbers indicate the best performing method for each column.
CONCLUSION
In this paper, we summarize some new results for HSI compression. The key idea is to revisit a two-step approach to HSI data compression. The first step adopts PCA to compress the HSI data spectrally. That is, the number of bands is greatly reduced to a few bands via PCA. The second step applies the latest video/image codecs to further compress the few PCA bands. Four well-known codecs (J2K, X264, X265, and Daala) were used in the second step. Three HSI data sets with widely varying numbers of bands were used in our studies. Four performance metrics were utilized in our experiments. We have several key observations. First, we observed that compressing the HSI to six bands has the best overall performance in all three HSI data sets. This is different from the observation in [16], where more PCA bands were included in the J2K step. Second, the X264 codec gave the best performance in terms of compression performance and computational complexity. Third, the PCA6 + X264 combination can be 3 dBs better than the PCA6 + J2K combination in the Pavia data at 100 to 1 compression, and this is quite significant. Fourth, even at 100 to 1 compression, the PCA6 + X264 combination can attain better than 40 dBs in PSNR for all three data sets. This means the compression performance is perceptually lossless at 100 to 1 compression.
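For readers who want to check a result against the 40 dBs level quoted above, PSNR can be computed as in the sketch below. It uses the standard definition 10·log10(peak² / MSE); the peak value of 255 and the flat lists of samples are assumptions made for illustration, since the authors' own metric code is not given.

```python
import math

def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(reference) != len(decoded):
        raise ValueError("inputs must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 8-sample signal and a slightly distorted reconstruction.
reference = [100, 120, 130, 90, 60, 200, 210, 180]
decoded = [101, 119, 131, 92, 59, 198, 210, 181]
value = psnr(reference, decoded)
print(round(value, 2))
```

With these sample values the mean squared error is 13/8 = 1.625, so the result lies a little above 46 dB, which is on the high side of the 40 dBs level.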
ACKNOWLEDGEMENTS
This research was supported by NASA Jet Propulsion Laboratory under contract # 80NSSC17C0035. The views, opinions and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of NASA or the U.S. Government.
REFERENCES
1. Ayhan, B., Kwan, C. and Jensen, J.O. (2019) Remote Vapor Detection and Classification Using Hyperspectral Images. Proceedings SPIE, Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XX, Vol. 11010, 110100U. https://doi.org/10.1117/12.2518500
2. Zhou, J., Kwan, C. and Ayhan, B. (2017) Improved Target Detection for Hyperspectral Images Using Hybrid In-Scene Calibration. Journal of Applied Remote Sensing, 11, Article ID: 035010. https://doi.org/10.1117/1.JRS.11.035010
3. Zhou, J., Kwan, C., Ayhan, B. and Eismann, M. (2016) A Novel Cluster Kernel RX Algorithm for Anomaly and Change Detection Using Hyperspectral Images. IEEE Transactions on Geoscience and Remote Sensing, 54, 6497-6504. https://doi.org/10.1109/TGRS.2016.2585495
4. Zhou, J., Kwan, C. and Budavari, B. (2016) Hyperspectral Image Super-Resolution: A Hybrid Color Mapping Approach. Journal of Applied Remote Sensing, 10, Article ID: 035024. https://doi.org/10.1117/1.JRS.10.035024
5. Qu, Y., Wang, W., Guo, R., Ayhan, B., Kwan, C., Vance, S. and Qi, H. (2018) Hyperspectral Anomaly Detection through Spectral Unmixing and Dictionary Based Low Rank Decomposition. IEEE Transactions on Geoscience and Remote Sensing, 56, 4391-4405. https://doi.org/10.1109/TGRS.2018.2818159
6. Wu, H.R., Reibman, A., Lin, W., Pereira, F. and Hemami, S. (2013) Perceptual Visual Signal Compression and Transmission. Proceedings of the IEEE, 101, 2025-2043. https://doi.org/10.1109/JPROC.2013.2262911
7. Wu, D., Tan, D.M., Baird, M., DeCampo, J., White, C. and Wu, H.R. (2006) Perceptually Lossless Coding of Medical Images. IEEE Transactions on Medical Imaging, 25, 335-344. https://doi.org/10.1109/TMI.2006.870483
8. Oh, H., Bilgin, A. and Marcellin, M.W. (2013) Visually Lossless Encoding for JPEG 2000. IEEE Transactions on Image Processing, 22, 189-201. https://doi.org/10.1109/TIP.2012.2215616
9. Tan, D.M. and Wu, D. (2016) Perceptually Lossless and Perceptually Enhanced Image Compression System & Method. U.S. Patent 9,516,315,6.
10. Kwan, C., Larkin, J., Budavari, B., Chou, B., Shang, E. and Tran, T.D. (2019) A Comparison of Compression Codecs for Maritime and Sonar Images in Bandwidth Constrained Applications. Computers, 8, 32. https://doi.org/10.3390/computers8020032
11. Kwan, C., Larkin, J., Budavari, B. and Chou, B. (2019) Compression Algorithm Selection for Multispectral Mastcam Images. Signal & Image Processing: An International Journal, 10, 1-14. https://doi.org/10.5121/sipij.2019.10101
12. Kwan, C. and Larkin, J. (2018) Perceptually Lossless Compression for Mastcam Images. IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference, New York, 8-10 November 2018, 559-565. https://doi.org/10.1109/UEMCON.2018.8796824
13. Kwan, C., Larkin, J. and Chou, B. (2019) Perceptually Lossless Compression of Mastcam Images with Error Recovery. Proceedings SPIE, Signal Processing, Sensor/Information Fusion, and Target Recognition XXVIII, Vol. 11018. https://doi.org/10.1117/12.2518482
14. Li, N. and Li, B. (2010) Tensor Completion for On-Board Compression of Hyperspectral Images. IEEE International Conference on Image Processing, Hong Kong, 517-520. https://doi.org/10.1109/ICIP.2010.5651225
15. Zhou, J., Kwan, C. and Ayhan, B. (2012) A High Performance Missing Pixel Reconstruction Algorithm for Hyperspectral Images. 2nd International Conference on Applied and Theoretical Information Systems Research, Taipei, 27-29 December 2012.
16. Du, Q. and Fowler, J.E. (2007) Hyperspectral Image Compression Using JPEG2000 and Principal Component Analysis. IEEE Geoscience and Remote Sensing Letters, 4, 201-205. https://doi.org/10.1109/LGRS.2006.888109
17. JPEG-2000. https://en.wikipedia.org/wiki/JPEG_2000
18. X264. http://www.videolan.org/developers/x264.html
19. X265. https://www.videolan.org/developers/x265.html
20. Daala. https://xiph.org/daala/
21. http://lesun.weebly.com/hyperspectral-data-set.html
22. JPEG. https://en.wikipedia.org/wiki/JPEG
23. JPEG-XR. https://en.wikipedia.org/wiki/JPEG_XR
24. VP8. https://en.wikipedia.org/wiki/VP8
25. VP9. https://en.wikipedia.org/wiki/VP9
26. Tran, T.D., Liang, J. and Tu, C. (2003) Lapped Transform via Time-Domain Pre- and Post-Filtering. IEEE Transactions on Signal Processing, 51, 1557-1571. https://doi.org/10.1109/TSP.2003.811222
27. Kwan, C., Li, B., Xu, R., Tran, T. and Nguyen, T. (2001) Very Low-Bit-Rate Video Compression Using Wavelets. Wavelet Applications VIII, Proceedings SPIE, Vol. 4391, 176-180. https://doi.org/10.1117/12.421197
28. Kwan, C., Li, B., Xu, R., Tran, T. and Nguyen, T. (2001) SAR Image Compression Using Wavelets. Wavelet Applications VIII, Proceedings SPIE, Vol. 4391, 349-357. https://doi.org/10.1117/12.421215
29. Kwan, C., Li, B., Xu, R., Li, X., Tran, T. and Nguyen, T. (2006) A Complete Image Compression Codec Based on Overlapped Block Transform with Post-Processing. EURASIP Journal of Applied Signal Processing, 2006, Article ID: 010968. https://doi.org/10.1155/ASP/2006/10968
30. Ponomarenko, N., Silvestri, F., Egiazarian, K., Carli, M., Astola, J. and Lukin, V. (2007) On Between-Coefficient Contrast Masking of DCT Basis Functions. Proceedings of the 3rd International Workshop on Video Processing and Quality Metrics for Consumer Electronics, Scottsdale, Arizona.
Chapter 12
LOSSLESS COMPRESSION OF DIGITAL MAMMOGRAPHY USING BASE SWITCHING METHOD
Ravi kumar Mulemajalu1 and Shivaprakash Koliwad2
1 Department of IS&E., KVGCE, Sullia, Karnataka, India
2 Department of E&C., MCE, Hassan, Karnataka, India
ABSTRACT
Mammography is a specific type of imaging that uses a low-dose x-ray system to examine breasts. It is an efficient means of early detection of breast cancer. Archiving and retaining these data for at least three years is expensive and difficult, and requires sophisticated data compression techniques. We propose a lossless compression method that makes use of the smoothness property of the images. In the first step, de-correlation of the given image is done using two efficient predictors. The two residue images are partitioned into non-overlapping sub-images of size 4x4. At every instant one of the sub-images is selected and sent for coding. The sub-images with all zero pixels are identified using a one-bit code. The remaining sub-images are coded using the base switching method. Special techniques are used to save the overhead information. Experimental results indicate an average compression ratio of 6.44 for the selected database.
Keywords: Lossless Compression, Mammography image, Prediction, Storage Space
Citation: Mulemajalu, R. and Koliwad, S. (2009), “Lossless Compression of digital mammography using base switching method”. Journal of Biomedical Science and Engineering, 2, 336-344. doi: 10.4236/jbise.2009.25049.
Copyright: © 2019 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0
INTRODUCTION
Breast cancer is the most frequent cancer in women worldwide, with 1.05 million new cases every year, and represents over 20% of all malignancies among females. In India, 80,000 women were affected by breast cancer in 2002. In the US alone, more than 40,000 women died of breast cancer in 2002. 98% of women survive breast cancer if the tumor is smaller than 2 cm [1]. One of the effective methods of early diagnosis of this type of cancer is non-palpable, non-invasive mammography. Through mammogram analysis radiologists have a detection rate of 76% to 94%, which is considerably higher than the 57% to 70% detection rate for a clinical breast examination [2].
Mammography is a low-dose x-ray technique to acquire an image of the breast. A digital image format is required in computer aided diagnosis (CAD) schemes to assist the radiologists in the detection of radiological features that could point to different pathologies. However, the usefulness of the CAD technique mainly depends on two parameters: the spatial and grey level resolutions. They must provide a diagnostic accuracy in digital images equivalent to that of conventional films. Both pixel size and pixel depth are factors that critically affect the visibility of small low contrast objects or signals, which often are relevant information for diagnosis [3]. Therefore, digital image recording systems for medical imaging must provide high spatial resolution and high contrast sensitivity. Due to this, mammography images commonly have a spatial resolution of 1024x1024, 2048x2048 or 4096x4096 and use 16, 12 or 8 bits/pixel. Figure 1 shows a mammography image of size 1024x1024 which uses 8 bits/pixel.
Figure 1. A Mammography image of size 1024x1024 which uses 8 bits/pixel.
Nevertheless, this requirement retards the implementation of digital technologies because of the increase in processing and transmission time, storage capacity and cost that good digital image quality implies. A typical mammogram digitized at a resolution of 4000x5000 pixels with 50-µm spot size and 12 bits per pixel results in approximately 40 MB of digital data when each pixel is stored in two bytes. Processing or transmission time of such digital images could be quite long. An efficient data compression scheme to reduce the digital data is needed. The goal of image compression techniques is to represent an image with as few bits as possible in such a way that the original image can be reconstructed from this representation without error, or with minimum error or distortion. Image compression techniques are classified into two categories, namely lossy and lossless methods. Lossy compression methods cannot achieve exact recovery of the original image, but achieve significant compression ratios. Lossless compression techniques, as their name implies, involve no loss of information: the original data can be recovered exactly from the compressed data. In medical applications, lossless coding methods are required since loss of any information is usually unacceptable [4]. Performance of lossless compression techniques can be measured in terms of their compression ratio, the bits per pixel required in the compressed image and the time for encoding and decoding. On the other hand, since lossy compression techniques discard some information, their performance measures include the mean square error and peak signal to noise ratio (PSNR) in addition to the measures used for lossless compression.
Lossless image compression systems typically function in two stages [5]. In the first stage, two-dimensional spatial redundancies are removed by using an image model, which can range from the relatively simple causal prediction used in the JPEG-LS [6,7] standard to a more complicated multiscale segmentation based scheme. In the second stage, the two-dimensional de-correlated residual obtained from the first stage, along with any parameters used to generate the residual, is coded with a one-dimensional entropy coder such as the Huffman or the Arithmetic coder. Existing lossless image compression algorithms can be broadly classified into two kinds: those based on prediction and those that are transform based. The predictive coding system consists of a predictor that, at each pixel of the input image, generates the anticipated value of that pixel based on some of the past pixels; the prediction error is then entropy coded. Various local, global and adaptive methods can be used to generate the prediction. In most cases, however, the prediction is formed by a linear combination of some previous pixels. The variance of the prediction error is much smaller than the variance of the gray levels in the original image. Moreover, the first order estimate of the entropy of the error image is much smaller than the corresponding estimate for the original image. Thus a higher compression ratio can be achieved by entropy coding the error image. The past pixels used in the prediction are collectively referred to as a context. The popular JPEG-LS standard uses the prediction based coding technique [8]. Transform based algorithms, on the other hand, are often used to produce a hierarchical or multi-resolution representation of an image and work in the frequency domain.
The popular JPEG-2000 standard uses the transform based coding technique [9]. Several techniques have been proposed for the lossless compression of digital mammography. A. Neekabadi et al. [10] use chronological sifting of prediction errors (CSPE) and code the errors using arithmetic coding. For the 50 MIAS (Mammography Image Analysis Society) images, CSPE gives a better average compression ratio than JPEG-LS and JPEG-2000. Xiaoli Li et al. [11] use grammar codes, in which the original image is first transformed into a context free grammar from which the original data sequence can be fully reconstructed by performing parallel and recursive substitutions; an arithmetic coding algorithm is then used to compress the context free grammar.
The compression ratio achieved is promising, but the method involves more complicated processing and a large computation time. The Delaunay triangulation method [12] is another approach. It uses a geometric predictor based on irregular sampling and the Delaunay triangulation. The difference between the original and the predicted image is calculated and coded using the JPEG-LS approach. The method offers a lower bit rate than JPEG-LS, JPEG-lossless, JPEG2000 and PNG. A limitation is the slow execution time. Lossless JPEG2000 and JPEG-LS are considered the best methods for mammography images. Lossless JPEG 2000 methods are preferred due to their wide variety of features, but suffer from a slightly longer encoding and decoding time [13]. Recently, there have been a few instances of using segmentation for lossless compression. Shen and Rangayyan [14] proposed a simple region growing scheme which generates an adaptive scanning pattern. A difference image is then computed and coded using the JBIG compression scheme. A higher compression ratio is possible with such a scheme for high resolution medical images, but the application of the same scheme to normal images did not result in significant performance improvement. Another scheme reported in the literature involves using variable block size segmentation (VBSS) to obtain a context sensitive encoding of wavelet coefficients, the residual being coded using a Huffman or Arithmetic coder [15,16]. The performance of the method is comparable to that of the lossless JPEG standard. Marwan Y. et al. [17] proposed fixed block based (FBB) lossless compression methods for digital mammography. The algorithm codes blocks of pixels within the image that contain the same intensity value, thus reducing the size of the image substantially while encoding the image at the same time. The FBB method alone gives a small compression ratio, but when used in conjunction with LZW it provides a better compression ratio.
We propose a method based on Base switching (BS). Trees-Juen Chuang et al. [18] have used the Base-switching method to compress general images. References [19] and [20] have also used the same concept for the compression of digital images. The algorithm segments the image into non-overlapping fixed blocks of size n × n and codes the pixels of the blocks based on the amount of smoothness. In the proposed work we have optimized the original BS method for the compression of mammography images. Specific characteristics of mammography images are well suited for the proposed method. These characteristics include a low number of edges and vast smooth regions. The organization of the paper is as follows. Section 2 describes the basic Base Switching (BS) method. The proposed algorithm is given in Section 3. Experimental results and conclusion are given in Sections 4 and 5 respectively.
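The introduction's claim that a prediction residual has a much smaller first-order entropy than the original image can be illustrated numerically. The sketch below is only an illustration: the synthetic ramp image and the left-neighbour predictor are assumptions chosen for brevity, not the predictors proposed in this chapter.

```python
import math
from collections import Counter

def first_order_entropy(values):
    """Empirical first-order entropy, in bits per sample, of a sequence of integers."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Synthetic smooth image: gray value rises by 2 per column and by 1 per row.
width, height = 64, 64
image = [[2 * x + y for x in range(width)] for y in range(height)]

# Left-neighbour prediction; the first pixel of each row is predicted as 0.
residual = [[row[x] - (row[x - 1] if x > 0 else 0) for x in range(width)]
            for row in image]

h_image = first_order_entropy([p for row in image for p in row])
h_residual = first_order_entropy([e for row in residual for e in row])
print(h_image, h_residual)
```

Almost every residual equals 2, so the residual entropy is a small fraction of a bit per pixel, while the image itself needs several bits per pixel; an entropy coder applied to the residual therefore has far less to encode.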
BASE-SWITCHING ALGORITHM
The BS method divides the original image (gray-level data) into non-overlapping sub-images of size n × n. Given an n × n sub-image A, whose N gray values are $g_0, g_1, \ldots, g_{N-1}$, define the “minimum” m, the “base” b and the “modified sub-image” $A_I$, whose N gray values are $g'_0, g'_1, \ldots, g'_{N-1}$, by
$$m = \min_{0 \le i \le N-1} g_i \tag{1}$$
$$b = \max_{0 \le i \le N-1} g_i - m + 1 \tag{2}$$
$$A_I = A - m \times I \tag{3}$$
Also,
$$g'_i = g_i - m, \quad i = 0, 1, \ldots, N-1 \tag{4}$$
where N = n × n and each of the elements of I is 1. The value of ‘b’ is related to the smoothness of the sub-image, where smoothness is measured as the difference between the maximum and minimum pixel values in the sub-image. The number of bits required to code each of the gray values $g'_i$ is
$$B = \lceil \log_2 b \rceil \tag{5}$$
Then, the total number of bits required for the whole sub-image is
$$Z_A = N \times B \tag{6}$$
For example, for the sub-image of Figure 2, n = 4, N = 16, m = 95 and b = 9. The modified sub-image of Figure 3 is obtained by subtracting 95 from every gray value of A.
Figure 2. A sub-image A with n = 4, N = 16, m = 95, b = 9 & B = 4.
For the sub-image in Figure 3, since B = 4, ZA =64 bits.
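The quantities in this example can be checked with a few lines of code. The sketch below assumes that b is the gray-level range plus one (max − min + 1) and that B is log2(b) rounded up, which reproduces the values quoted in the text; the 4x4 block is hypothetical, chosen only to give m = 95 and b = 9, and is not the block of Figure 2.

```python
def bs_parameters(block):
    """Return (m, b, B, Z_A, modified) for a flat list of gray values."""
    m = min(block)                        # minimum gray value
    b = max(block) - m + 1                # base: number of distinct levels spanned
    modified = [g - m for g in block]     # modified sub-image, values in 0..b-1
    bits_per_value = (b - 1).bit_length() # equals ceil(log2(b)) for b >= 1
    return m, b, bits_per_value, len(block) * bits_per_value, modified

# Hypothetical 4x4 sub-image with minimum 95 and maximum 103.
block = [95, 96, 97, 99,
         96, 97, 103, 98,
         95, 98, 100, 97,
         96, 99, 101, 102]

m, b, B, z_a, modified = bs_parameters(block)
print(m, b, B, z_a)   # 95 9 4 64
```

Using `(b - 1).bit_length()` instead of a floating-point logarithm keeps the bit count exact for every base, including b = 1, where no bits are needed.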
Figure 3. Modified sub-image AI.
In order to reconstruct A, the values of B and m should be known. Therefore, the encoded bit stream consists of m, B and AI coded using B bits per value. In the computation of B, if b is not an integer power of 2, log2(b) is rounded up to the next higher integer. Thus, in such cases, more bits are used than absolutely required. The BS method uses the following concept to exploit this redundancy. It is found that
$$0 \le g'_i \le b - 1, \quad i = 0, 1, \ldots, N-1 \tag{7}$$
The image $A_I = (g'_0, g'_1, \ldots, g'_{N-1})$ can therefore be treated as an N-digit number $(g'_{N-1} \ldots g'_1 g'_0)_b$ in the base-b number system. An integer-valued function f can be defined such that f(AI, b) is the decimal integer equivalent to the base-b number:
$$f(A_I, b) = \sum_{i=0}^{N-1} g'_i \, b^i \tag{8}$$
$$= g'_0 + g'_1 b + \cdots + g'_{N-1} b^{N-1} \tag{9}$$
Then, the number of bits required to store the integer f(AI, b) is
$$Z_B = \lceil \log_2 b^N \rceil \tag{10}$$
Reconstruction of AI is done by switching the binary (base 2) number to a base-b number. Therefore, reconstruction of A needs the values of m and b. The format of representation of a sub-image is as shown below.
For the example of Figure 3, b = 9 and therefore ZB = 51 bits. It is easy to prove that ZB ≤ ZA always holds. We know that the maximum value of f(AI, b) is $b^N - 1$. The total number of bits required to represent f in binary is
$$Z_B = \lceil \log_2 b^N \rceil = \lceil N \log_2 b \rceil \tag{11}$$
Always,
$$\lceil N \log_2 b \rceil \le N \lceil \log_2 b \rceil \tag{12}$$
This verifies that
$$Z_B \le Z_A \tag{13}$$
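The base-switching step itself amounts to reading the modified sub-image as the digits of one base-b integer and storing that integer in binary. The sketch below is one possible realization: the digit order (value i weighted by b to the power i), the function names and the 4x4 block of modified values are all assumptions made for illustration.

```python
def bs_encode(modified, b):
    """Pack base-b digits into a single integer; digit i carries weight b**i."""
    value = 0
    for digit in reversed(modified):      # Horner's rule, highest weight first
        value = value * b + digit
    return value

def bs_decode(value, b, count):
    """Recover `count` base-b digits from the packed integer."""
    digits = []
    for _ in range(count):
        value, digit = divmod(value, b)
        digits.append(digit)
    return digits

def bs_bits(b, count):
    """Bits needed to store any integer in the range 0 .. b**count - 1."""
    return (b ** count - 1).bit_length()

# Hypothetical modified sub-image with base b = 9 (values 0..8), N = 16.
modified = [0, 1, 2, 4,
            1, 2, 8, 3,
            0, 3, 5, 2,
            1, 4, 6, 7]
b = 9

packed = bs_encode(modified, b)
z_b = bs_bits(b, len(modified))
z_a = len(modified) * (b - 1).bit_length()
print(z_b, z_a)   # 51 64
```

For b = 9 and N = 16 the packed integer needs 51 bits, compared with 64 bits when each value is stored separately in 4 bits, which matches the figures in the text; decoding with `bs_decode(packed, b, 16)` returns the original 16 values.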
Formats Used for Encoding
The original BS algorithm uses a block size of 3x3 for segmentation. Three formats are used by the original algorithm for encoding the sub-images.
Format 1
If b ∈ {1, 2, …, 11}, then the coding format is
This format is economical when b < 2^3.4.
Format 2
If b ∈ {12, 13, …, 128}, then the coding format is
Here P(min,max) is a pair of two 3 bit numbers indicating the positions of the minimum and maximum values. If b > 11, writing the positions of the minimum and maximum values is more economical than coding them.
Format 3
If b ∈ {129, 130, …, 256}, then the coding format is
Here, c stands for the category bit. If c is 0, then the block is encoded using Formats 1 or 2; otherwise Format 3 is used.
Hierarchical Use of BS Technique
The encoded result of Subsection 2.1 can be compressed further in a hierarchical manner. We can imagine that there is a so-called “base-image”, whose gray values are b0, b1, b2, …, b255; then, since it is a kind of image (except that each value is a base value of a sub-image rather than a gray value of a pixel), we can use the same BS technique to compress these base values. The details are omitted. Besides b, the minimal value m of each block can also be grouped and compressed similarly. We can repeat the same procedure to encode the b and m values further.
PROPOSED METHOD
In the proposed method, we made the following modifications to the basic BS method.
• Prediction
• Increasing the block size from 3x3 to 4x4
• All-zero block removal
• Coding the minimum value and base value
Prediction
After reviewing the BS method, it is found that the number of bits required for a sub-image is decided by the value of the base ‘b’. If ‘b’ is reduced, the number of bits required for a sub-image is also reduced. In the proposed method, prediction is used to reduce the value of ‘b’ significantly. A predictor generates at each pixel of the input image the anticipated value of that pixel based on some of the past inputs. The output of the predictor is then rounded to the nearest integer, denoted $\hat{x}_n$, and used to form the difference or prediction error
$$e_n = x_n - \hat{x}_n \tag{14}$$
This prediction error is coded by the proposed entropy coder. The decoder reconstructs $e_n$ from the received code words and performs the reverse operation
$$x_n = e_n + \hat{x}_n \tag{15}$$
The quality of the prediction for each pixel directly affects how efficiently the prediction error can be encoded. The better the prediction, the less information must be carried by the prediction error signal. This, in turn, leads to fewer bits. One way to improve the prediction used for each pixel is to make a number of predictions and then choose the one which comes closest to the actual value [21]. This method, also called switched prediction, has the major disadvantage that the choice of prediction for each pixel must be sent as overhead information. The proposed prediction scheme uses two predictors, and one of them is chosen for every block of pixels of size 4x4. Thus, the choice of prediction is made only once for the entire 4x4 block. This reduces the amount of overhead. The two predictions are given in Eqs. (16) and (17), where Pr1 is the popular MED predictor used in the JPEG-LS standard and Pr2 is the one used by [5] for the compression of mammography images. For all the pixels of a block of size 4 × 4, one of the two predictions is chosen depending on the smoothness property of the predicted blocks. Here, smoothness is measured as the difference between the maximum and minimum pixel values in the block. The predictor that gives the lowest difference value is selected for that block. The advantage here is that the overhead required for each block is only one bit.
(16)
(17)
Here, A, B, C, D, E and F are the neighbors of pixel involved in prediction as depicted in Figure 4.
Figure 4. Neighbors of pixel involved in prediction.
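The block-wise choice between two predictors can be sketched as follows. Pr1 is written as the standard JPEG-LS median edge detector (MED) in terms of the left, upper and upper-left neighbours; how these map onto the labels A to F of Figure 4 is an assumption. The second predictor, `mean_predict`, is a hypothetical stand-in and is not the Pr2 of the chapter. Treating out-of-image neighbours as 0 and resolving ties in favour of the first predictor are likewise assumptions.

```python
def med_predict(left, above, upper_left):
    """Median edge detector (MED) prediction as used in JPEG-LS."""
    if upper_left >= max(left, above):
        return min(left, above)
    if upper_left <= min(left, above):
        return max(left, above)
    return left + above - upper_left

def mean_predict(left, above, upper_left):
    """Hypothetical second predictor: integer mean of the left and upper neighbours."""
    return (left + above) // 2

def residual_image(image, predictor):
    """Prediction error at every pixel; neighbours outside the image count as 0."""
    def px(y, x):
        return image[y][x] if y >= 0 and x >= 0 else 0
    return [[image[y][x] - predictor(px(y, x - 1), px(y - 1, x), px(y - 1, x - 1))
             for x in range(len(image[0]))] for y in range(len(image))]

def block_range(res, top, left, size):
    """Smoothness measure: maximum minus minimum residual inside one block."""
    values = [res[y][x] for y in range(top, top + size)
              for x in range(left, left + size)]
    return max(values) - min(values)

def choose_predictors(image, size=4):
    """Return one selection bit per block: 0 for the first predictor, 1 for the second."""
    r1 = residual_image(image, med_predict)
    r2 = residual_image(image, mean_predict)
    choice = {}
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            d1 = block_range(r1, top, left, size)
            d2 = block_range(r2, top, left, size)
            choice[(top // size, left // size)] = 0 if d1 <= d2 else 1  # tie -> first (assumption)
    return choice

# Hypothetical 8x8 image whose sides are multiples of the block size.
image = [[3 * x + 2 * y for x in range(8)] for y in range(8)]
selection = choose_predictors(image)
print(selection)
```

Both predictors look only at pixels that precede the current one in raster order, so a decoder that knows the selection bit of a block can rebuild the same predictions and add back the residuals.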
The proposed switched prediction is described in the equation form in 18 where d1 and d2 are the differences between maximum and minimum values for the two blocks obtained using the two predictors Pr1 and Pr2 respectively. (18) Figure 5 illustrates the prediction technique. It shows two error images which are obtained by using two predictors Pr1 and Pr2 respectively. The BS algorithm divides them into 4x4 sub-images and computes the difference between maximum and minimum pixel values for all the four sub-images. For the first sub-image, difference ‘d1’ is 6 and ‘d2’ is 8, where d1 and d2 are the differences of the sub-images corresponding to the predictors Pr1 and Pr2 respectively. Now, since d1 0) is positive constant by Axiom 1. The generalized entropy (2.5) then reduces to the form (2.15) or
Shannon Entropy: Axiomatic Characterization and Application
(2.16) where constants (B − A) and C have been omitted without changing the character of the entropy function. This proves the theorem.
TOTAL SHANNON ENTROPY AND ENTROPY OF CONTINUOUS DISTRIBUTION

The definition (2.4) of entropy can be generalized straightforwardly to define the entropy of a discrete random variable.

Definition 3.1. Let X ∈ ℝ denote a discrete random variable which takes on the values x1, x2, ..., xn with probabilities p1, p2, ..., pn, respectively. The entropy H(X) of X is then defined by the expression [4]

$H(X) = -k \sum_{i=1}^{n} p_i \log p_i$ (3.1)

Let us now generalize the above definition to take account of an additional uncertainty due to the observer himself, irrespective of the definition of random experiment. Let X denote a discrete random variable which takes the values x1, x2, ..., xn with probabilities p1, p2, ..., pn. We decompose the practical observation of X into two stages. First, we assume that X ∈ L(xi) with probability pi, where L(xi) denotes the ith interval of the set {L(x1), L(x2), ..., L(xn)} of intervals indexed by xi. The Shannon entropy of this experiment is H(X). Second, given that X is known to be in the ith interval, we determine its exact position in L(xi) and we assume that the entropy of this experiment is U(xi). Then the global entropy associated with the random variable X is given by

$H_T(X) = H(X) + \sum_{i=1}^{n} p_i U(x_i)$ (3.2)

Let hi denote the length of the ith interval L(xi) (i = 1, 2, ..., n), and define

$U(x_i) = k \log h_i$ (3.3)

We have then

$H_T(X) = -k \sum_{i=1}^{n} p_i \log \frac{p_i}{h_i}$ (3.4)
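The two-stage construction of the total entropy can be checked numerically. The sketch below assumes the forms H(X) = −k Σ pi log pi, U(xi) = k log hi and HT(X) = −k Σ pi log(pi/hi), which is how the relations (3.1) to (3.4) are read here; the function names and the sample values are illustrative only.

```python
import math


def shannon_entropy(p, k=1.0):
    """H(X) = -k * sum_i p_i log p_i (natural logarithm; zero terms skipped)."""
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0)


def total_entropy(p, h, k=1.0):
    """Closed form: H_T(X) = -k * sum_i p_i log(p_i / h_i)."""
    return -k * sum(pi * math.log(pi / hi) for pi, hi in zip(p, h) if pi > 0)


def total_entropy_two_stage(p, h, k=1.0):
    """Two-stage form: H(X) + sum_i p_i U(x_i), with U(x_i) = k log h_i."""
    return shannon_entropy(p, k) + sum(pi * k * math.log(hi) for pi, hi in zip(p, h))


# Illustrative values: three classes with unequal interval lengths.
p = [0.5, 0.25, 0.25]
h = [1.0, 2.0, 0.5]
difference = total_entropy(p, h) - total_entropy_two_stage(p, h)  # zero up to rounding
```

When every interval has unit length, the second-stage term vanishes and the total entropy coincides with the Shannon entropy.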
The expression HT(X) given by (3.4) will be referred to as the total entropy of the random variable X. The above derivation is physical. In fact, what we have used is merely a randomization of the individual event X = xi (i = 1, 2, ..., n) to account for the additional uncertainty due to the observer himself, irrespective of the definition of random experiment [4]. We will derive the expression (3.4) axiomatically as a generalization of Theorem 2.1.

Theorem 3.2. Let the generalized entropy (2.3) satisfy, in addition to Axioms 1 to 4 of Theorem 2.1, the boundary conditions (3.5) to take account of the postobservational uncertainty, where hi is the length of the ith class L(xi) (or width of the observational value xi). Then the entropy function reduces to the form of the total entropy (3.4).

Proof. The procedure is the same as that of Theorem 2.1 up to the relation (2.13): (3.6) Integrating (3.6) with respect to pj and using the boundary condition (3.5), we have (3.7) so that the generalized entropy (2.3) reduces to the form (3.8) where we have taken A = −k < 0 for the same unit of measurement of entropy and the negative sign to take account of Axiom 1. The constants appearing in (3.8) have been neglected without any loss of characteristic properties. The expression (3.8) is the required expression of total entropy obtained earlier.

Let us now see how to obtain the entropy of a continuous probability distribution as a limiting value of the total entropy HT(X) defined above. For this, let us first define the differential entropy HC(X) of a continuous random variable X.

Definition 3.3. The differential entropy HC(X) of a continuous random variable with probability density f(x) is defined by [2]

$H_C(X) = -k \int_R f(x) \log f(x)\,dx$ (3.9)
where R is the support set of the random variable X. We divide the range of X into bins of length (or width) h. Let us assume that the density f(x) is continuous within the bins. Then, by the mean-value theorem, there exists a value xi within each bin such that

$f(x_i)\,h = \int_{ih}^{(i+1)h} f(x)\,dx$ (3.10)

We define the quantized or discrete probability distribution (p1, p2, ..., pn) by

$p_i = \int_{ih}^{(i+1)h} f(x)\,dx = f(x_i)\,h$ (3.11)

so that we have then

$\sum_{i} p_i = \sum_{i} f(x_i)\,h = \int_R f(x)\,dx = 1$ (3.12)

The total entropy HT(X) defined for hi = h (i = 1, 2, ..., n),

$H_T(X) = -k \sum_{i} p_i \log \frac{p_i}{h}$ (3.13)

then reduces to the form

$H_T(X) = -k \sum_{i} h f(x_i) \log f(x_i)$ (3.14)

Let h → 0; then, by the definition of the Riemann integral, we have HT(X) → HC(X) as h → 0, that is,

$\lim_{h \to 0} H_T(X) = -k \int_R f(x) \log f(x)\,dx = H_C(X)$ (3.15)

Thus we have the following theorem.

Theorem 3.4. The total entropy HT(X) defined by (3.13) approaches the differential entropy HC(X) in the limiting case when the length of each bin tends to zero.
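The limit stated in Theorem 3.4 can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the original derivation: it takes a standard normal density, k = 1 and natural logarithms, reads the total entropy for equal bins as −Σ pi log(pi/h), and truncates the range to [−10, 10]. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def quantized_total_entropy(h, lo=-10.0, hi=10.0):
    """Total entropy -sum_i p_i log(p_i / h) for bins of width h (k = 1, nats)."""
    n_bins = int(round((hi - lo) / h))
    total = 0.0
    for i in range(n_bins):
        a = lo + i * h
        p = normal_cdf(a + h) - normal_cdf(a)  # probability mass of the bin
        if p > 0.0:
            total -= p * math.log(p / h)
    return total


# Differential entropy of N(0, 1) in nats, for comparison.
exact = 0.5 * math.log(2.0 * math.pi * math.e)

errors = {h: abs(quantized_total_entropy(h) - exact) for h in (1.0, 0.5, 0.1)}
```

The entries of `errors` shrink as h decreases. The plain quantized entropy −Σ pi log pi, by contrast, differs from these values by −log h and therefore grows without bound as h → 0.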
APPLICATION: DIFFERENTIAL ENTROPY AND ENTROPY IN CLASSICAL STATISTICS

The above analysis leads to an important relation connecting quantized entropy and differential entropy. From (3.13) and (3.15), we see that
$-k \sum_{i} p_i \log p_i \;\to\; -k \int_R f(x) \log\,[h f(x)]\,dx$ (4.1)

showing that when h → 0, that is, when the length h of the bins is very small, the quantized entropy given by the left-hand side of (4.1) approaches not the differential entropy HC(X) defined in (3.9) but the form given by the right-hand side of (4.1), which we call the modified differential entropy. This relation has important physical significance in statistical mechanics. As an application of this relation, we now find the expression of classical entropy as a limiting case of quantized entropy.

Let us consider an isolated system with configuration space volume V and a fixed number of particles N, which is constrained to the energy shell R = (E, E + ∆E). We consider the energy shell rather than just the energy surface because the Heisenberg uncertainty principle tells us that we can never determine the energy E exactly. We can make ∆E as small as we like. Let f(XN) be the probability density of microstates defined on the phase space Γ = {XN = (q1, q2, ..., q2N; p1, p2, ..., p2N)}. The normalization condition is (4.2) where (4.3) Following (4.1), we define the entropy of the system as

(4.4) The constant C appearing in (4.4) is to be determined later on. The probability density for statistical equilibrium, determined by maximizing the entropy (4.4) subject to the condition (4.2), leads to
(4.5)
where H(XN) is the Hamiltonian of the system and Ω(E,V,N) is the volume of the energy shell (E, E + ∆E) [10]. Putting (4.5) in (4.4), we obtain the entropy of the system as [10] (4.6) The constant C has the same unit as Ω(E,V,N) and cannot be determined classically. However, it can be determined from quantum mechanics. Then we have CN = (h)3N for distinguishable particles and CN = N!(h)3N for indistinguishable particles. From the Heisenberg uncertainty principle, we know that if h is the volume of a single state in phase space, then Ω(E,V,N)/(h)3N is the total number of microstates in the energy shell (E, E + ∆E). The expression (4.6) then becomes identical to the Boltzmann entropy. With this interpretation of the constant CN, the correct expression of classical entropy is given by [6, 10]
(4.7)
The classical entropy that follows as a limiting case of the von Neumann entropy is given by [14]
(4.8)
This is, however, different from the one given by (4.7) and it does not lead to the form of Boltzmann entropy (4.6).
CONCLUSION

The literature on the axiomatic derivation of Shannon entropy is vast [1, 8]. The present approach is, however, different. It is based mainly on the postulates of additivity and concavity of the entropy function. These are, in fact, variant forms of the additivity and nondecreasing character of entropy in thermodynamics. The concept of additivity is dormant in many axiomatic derivations of Shannon entropy. It plays a vital role in the foundation of Shannon information theory [15]. Nonadditive entropies like Rényi entropy and Tsallis entropy need a different formulation and lead to different
physical phenomena [11, 13]. In the present paper, we have also provided a new axiomatic derivation of Shannon total entropy which in the limiting case reduces to the expression of modified differential entropy (4.1). The modified differential entropy together with quantum uncertainty relation provides a mathematically strong approach to the derivation of the expression of classical entropy.
REFERENCES
1. J. Aczél and Z. Daróczy, On Measures of Information and Their Characterizations, Mathematics in Science and Engineering, vol. 115, Academic Press, New York, 1975.
2. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications, John Wiley & Sons, New York, 1991.
3. E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev. (2) 106 (1957), 620–630.
4. G. Jumarie, Relative Information. Theories and Applications, Springer Series in Synergetics, vol. 47, Springer, Berlin, 1990.
5. J. N. Kapur, Measures of Information and Their Applications, John Wiley & Sons, New York, 1994.
6. L. D. Landau and E. M. Lifshitz, Statistical Physics, Pergamon Press, Oxford, 1969.
7. V. Majernik, Elementary Theory of Organization, Palacky University Press, Olomouc, 2001.
8. A. Mathai and R. N. Rathie, Information Theory and Statistics, Wiley Eastern, New Delhi, 1974.
9. D. Morales, L. Pardo, and I. Vajda, Uncertainty of discrete stochastic systems: general theory and statistical inference, IEEE Trans. Syst., Man, Cybern. A 26 (1996), no. 6, 681–697.
10. L. E. Reichl, A Modern Course in Statistical Physics, University of Texas Press, Texas, 1980.
11. A. Rényi, Probability Theory, North-Holland Publishing, Amsterdam, 1970.
12. C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, The University of Illinois Press, Illinois, 1949.
13. C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J. Statist. Phys. 52 (1988), no. 1-2, 479–487.
14. A. Wehrl, On the relation between classical and quantum-mechanical entropy, Rep. Math. Phys. 16 (1979), no. 3, 353–358.
15. T. Yamano, A possible extension of Shannon’s information theory, Entropy 3 (2001), no. 4, 280–292.
Chapter 16
SHANNON ENTROPY IN DISTRIBUTED SCIENTIFIC CALCULATIONS ON MOBILES AD-HOC NETWORKS (MANETS)
Pablo José Iuliano and Luís Marrone LINTI, Facultad de Informatica UNLP, La Plata, Argentina
ABSTRACT

This paper addresses the problem of giving a formal metric to estimate uncertainty at the moment of starting a distributed scientific calculation on clients working over mobile ad-hoc networks (MANETs). Measuring the uncertainty related to the successful completion of a distributed computation on the aforementioned network infrastructure is based on the Dempster-Shafer Theory of Evidence (DST). Shannon Entropy will be the formal mechanism by which the conflict in the scenarios proposed in this paper will be estimated. This paper will begin with a description of the procedure by which connectivity probability is to be obtained and will continue by presenting the mobility model most appropriate for the performed simulations. Finally, simulations will be performed to calculate the Shannon Entropy, after which the corresponding conclusions will be presented.

Keywords: MANETs, Shannon, Uncertain, Simulation, Distributed

Citation: Iuliano, P. and Marrone, L. (2013), “Shannon Entropy in Distributed Scientific Calculations on Mobiles Ad-Hoc Networks (MANETs)”. Communications and Network, 5, 414-420. doi: 10.4236/cn.2013.53B2076.

Copyright: © 2013 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0
INTRODUCTION

Mobile computing has been established as the de facto standard for Web access, owing to users preferring it to other connection alternatives. Mobile ad-hoc networks, or MANETs, are currently the focus of attention in mobile computing, as they are the most flexible and adaptable network technology in existence today [1]. These qualities are particularly desirable in the development of applications meant for this kind of infrastructure; a number of American government projects, such as the military investment in resources for the development of this technology, bear witness to this fact.

As previously mentioned, ad-hoc mobile networks are the most flexible and adaptable communication architecture currently in existence. These wireless networks consist of interconnected autonomous nodes. They are self-organized and self-generated, which eliminates the need for a centralized infrastructure. The use of this type of network as a new alternative for the implementation of distributed computing systems is closely related to the capability to begin a calculation, assign parts and collect results once computation is finished. Due to the intrinsic nature of this kind of network, there is no certainty that all the stages involved in this kind of calculation can be completed, which makes estimating the uncertainty in these scenarios a vital capability.
MEASURING THE PROBLEM

The movement patterns of the autonomous nodes, and consequently their interaction, will have a significant impact on the success or failure in collecting the results of a distributed computation. In order to incorporate the notion of connectivity among the nodes, a development will now be presented that shows a formalization of the Connectivity Probability among all the nodes that make up a MANET, that is, the probability that there is a path between one node and any of the rest.
Afterwards, we will take on the task of characterizing the mobility of the nodes, particularly their median speed and direction, the range of their communication signal and the size of the surface on which they circulate. Finally, another section will detail how to estimate Shannon Entropy.
Defining Connectivity Probability

Let D be a bounded domain in the Euclidean plane R2 = {x, y} containing n nodes. At the initial time t = 0, the nodes are located at arbitrary positions and are moving. Let ri = {xi, yi} be the radius vector of node i. We assume that each node has a communication range r: if the distance between two nodes is greater than r, then they cannot establish communication. Nodes can transmit information using multihop connections. Therefore, we can define a network as connected if each pair of nodes has a path between them. Connectivity Probability quantifies the likelihood of obtaining a connected network from a set of nodes. Clearly, in scenarios where nodes maintain fixed positions, the connectivity will depend on node density and connection range. Typically, simulations of static scenarios that attempt to determine the connection probability of a number of nodes located randomly in the simulation area introduce a random variable that equals 1 when the network is connected and 0 otherwise. The average of this variable over the number of trials then gives the Connectivity Probability [2]. For nodes with mobility, time interval divisions are introduced and defined thus: (1) where denotes a time interval during which the network is connected (unconnected). The following function can then be introduced:
(2) time intervals can be considered to be randomly distributed, whereby the previously presented function turns into a stochastic process. Consequently, in dynamical environments, Connectivity Probability is defined as follows:
(3) where is the expected value, as long as it exists. It can be seen that P+ is time-dependent: P+ = P+(t). For stationary stochastic processes P+ = const. If the stationary process is ergodic, then (3) can be substituted by: (4) This equality is equivalent to: (5) where and the mes function are used to measure the total length of the interval. The problem of whether the network is connected is thus reduced to determining the existence and estimation of the expected value (3), and if the mobility model is stationary and ergodic, (5) can be used to estimate connectivity [2].
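For the static case described above, the indicator-variable estimate can be sketched as follows. This is a minimal illustration under stated assumptions: nodes are placed uniformly at random in a square of side `side`, two nodes are linked when their distance is at most r, and connectivity is checked with a breadth-first search. The function names and parameters are hypothetical and do not come from [2].

```python
import math
import random
from collections import deque


def is_connected(positions, r):
    """True if every pair of nodes is joined by a multihop path of links of length <= r."""
    n = len(positions)
    if n == 0:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if j not in seen and math.dist(positions[i], positions[j]) <= r:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


def static_connectivity_probability(n_nodes, r, side, trials, seed=0):
    """Average of the 0/1 connectivity indicator over independent random placements."""
    rng = random.Random(seed)
    connected_trials = 0
    for _ in range(trials):
        positions = [(rng.uniform(0.0, side), rng.uniform(0.0, side))
                     for _ in range(n_nodes)]
        if is_connected(positions, r):
            connected_trials += 1
    return connected_trials / trials
```

A call such as `static_connectivity_probability(20, 25.0, 100.0, 1000)` returns the fraction of placements that produced a connected network; the estimate depends on node density and range, as noted above.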
Dynamical Systems and Stochastic Processes

In a homogeneous network system, where capacity and properties are equal across all nodes, it can reasonably be assumed that the system can be described by a single system of differential equations, both for a single node and for all of them. If some form of randomness is introduced into node movement, a stochastic differential process will be needed. If, moreover, the stochastic process is considered to be stationary, a system of autonomous differential equations can be used, where the right side of the equation does not explicitly depend on time and where nodes differ only in their initial conditions [2].

In dynamical systems theory, a phase flow is defined as a group of changes along the trajectory during a time interval. Dynamical systems are generated by phase flows and can be described by differential equations as follows: (6) where Π is the phase space, x is a set of coordinates in Π (usually position and speed) and the dot denotes differentiation with respect to time. Let n be a number of nodes and its phase coordinates, then these coordinates satisfy the following differential equation: (7)
Thus the dynamics of the n nodes is completely defined by dynamical system (7), which is the direct product of the n copies of the original dynamical system (6). Its phase space $\hat{\Pi} = \Pi \times \cdots \times \Pi = \Pi^n$ is a direct product of the n copies of the initial phase space, and its phase coordinates are the set of coordinates of the individual nodes. If system (6) has an invariant measure µ in Π, system (7) will also have an invariant measure in and the direct product will be . In the connectivity problem, phase space can be divided into two domains D and thus: when , all the nodes out of the existing n can communicate with each other. And when , some nodes cannot be reached by some others. Following the approach from dynamical systems, the connectivity probability can be estimated as a time interval when . Estimating the connectivity measure can be significantly simplified if dynamical system (7) is ergodic in . By definition, a system is ergodic if the measure of every invariant subdomain of the phase space equals either zero or the measure of the entire space. Let be a measurable and integrable function in , for all the solutions of ergodic system (6) there is:

(8) where (9) is the measure of the entire phase space [2]. Let f be the characteristic function of a measurable domain D:
(10) Since f is limited and D is measurable, is integrable. In this case, the left side of (12) is equivalent to the time interval 0 ≤ t ≤ T when resides in the domain D. Thus the Connectivity Probability of an ad-hoc mobile network will be equivalent to the right side of (12):
(11) This approximation can be interpreted in terms of the theory of stochastic processes in phase space . The probability for a system in measurable domain D is determined by formula (11). Let f(x) be the characteristic function of domain D and x(t) the solution of system (6). Thus the function f(x(t)) can be interpreted as a stochastic process. Let E[f(t)] be the expected value of the function f(t) at time t. If the right side of Equation (6) is not time-dependent, then the stochastic process is stationary. In particular, this means that E[f(t)] does not depend on t. If the system is also ergodic, the expected value can be calculated using formula (11): (12) Therefore, the problem of calculating expected value (3) is reduced to a geometric problem in which we must determine the volume of the domains in a phase space if the process is ergodic [2].
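The reduction from a time average to a ratio of measures can be illustrated on a toy ergodic system. The sketch below is not the MANET model: it uses the rotation x → (x + α) mod 1 on the unit interval with irrational α, which is ergodic with respect to length, and takes the domain D = [a, b). Under these assumptions the fraction of time the orbit spends in D should approach mes D divided by the measure of the whole space, here b − a. The function name and parameter values are hypothetical.

```python
import math


def time_fraction_in_domain(alpha, a, b, steps, x0=0.0):
    """Fraction of steps the orbit x -> (x + alpha) mod 1 spends in [a, b)."""
    x = x0
    hits = 0
    for _ in range(steps):
        if a <= x < b:  # characteristic function of D evaluated along the orbit
            hits += 1
        x = (x + alpha) % 1.0
    return hits / steps


alpha = math.sqrt(2.0) - 1.0  # irrational rotation angle
fraction = time_fraction_in_domain(alpha, 0.0, 0.3, 100_000)
# The measure ratio for D = [0, 0.3) in a phase space of length 1 is 0.3.
```

For the network, the same argument replaces a long-run simulation of connectivity by a comparison of volumes in phase space, provided the mobility model is stationary and ergodic.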
Shannon Entropy: Measuring Uncertainty

Uncertainty, in particular the amount of conflict in the system, will be measured using the Dempster-Shafer Theory of Evidence (DST). Functions for estimating the conflict in a system using a probability distribution must fulfill certain axiomatic requirements [3]. Let fc be the estimator of the amount of conflict and p = 〈p1, p2, … , pn〉 the probability distribution; fc must fulfill:

• Expansibility: adding a 0 component to the probability distribution does not modify the value of the uncertainty measure.
• Symmetry: the calculated uncertainty does not vary under permutation of the arguments.
• Continuity: function fc is continuous for all p = 〈p1, p2, … , pn〉.
• Subadditivity: the uncertainty of the joint probability distribution is less than or equal to the sum of the uncertainties of the marginal distributions.
• Additivity: for any pair of marginal probability distributions that are non-interactive, the uncertainty of the associated joint distribution must be equal to the sum of the uncertainties of the marginal distributions.
• Monotonicity: uncertainty must increase if the number of elements increases.
• Branching: Let p = 〈p1, p2, … , pn〉 over X = {x1, x2, … , xn}. If two partitions are generated from and , then fc= (p1, p2, … , pn).
• Normalization: to ensure uncertainty can be measured in bits, it is required that: (13)

Shannon Entropy will be the formal mechanism by which the conflict will be estimated in this document. This measure of uncertainty stems from a probability distribution obtained from observing the results of an experiment or any other research mechanism. Probability distribution p has the form p = 〈p(x) ∣ x ∈ X〉, where X is the domain of discourse. Additionally, a function of the probability of occurrence is defined, called anticipatory uncertainty, which must be a continuous, monotonically decreasing mapping, and be additive and normalized. It follows that the anticipatory uncertainty of a result x is −log2 p(x). Thus, Shannon Entropy, which provides the expected value of the anticipatory uncertainties for each element of the domain of discourse [3], takes the following form: (14) The normalized version of (14) takes the following form: (15) and is the one used to calculate uncertainty in the simulations performed.
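The entropy computation described above can be sketched in a few lines. The sketch assumes that the Shannon Entropy is the expected value of −log2 p(x) over the domain of discourse, and that the normalized version divides by log2 of the number of elements so that the result lies in [0, 1]; the second point is an assumption about the lost formula (15), and the function names are hypothetical.

```python
import math


def shannon_entropy_bits(p):
    """Expected anticipatory uncertainty: -sum_x p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)


def normalized_entropy(p):
    """Entropy divided by its maximum log2(n); assumed form of the normalized measure."""
    n = len(p)
    if n < 2:
        return 0.0
    return shannon_entropy_bits(p) / math.log2(n)


def connectivity_uncertainty(p_connected):
    """Entropy of the two-outcome distribution (connected, not connected)."""
    return normalized_entropy([p_connected, 1.0 - p_connected])
```

With two outcomes the normalizing constant is log2 2 = 1, so the normalized and plain entropies coincide; the uncertainty is largest when the Connectivity Probability is 0.5 and vanishes when it is 0 or 1.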
SIMULATION

Next, we present an adjustment to the previously obtained theoretical results, in order to reach a simulation method that is consistent with them. A description of the scenarios posed and the results obtained will follow.
Adjustment of mes and mes D

It is considered that the area where the computational model proposed for distributed calculations on ad-hoc mobile networks operates will be small: the work surface will be comparable to that of a university campus, governmental building or office [4]. This results in being the total simulation surface and mes D the area where the nodes are in positions that keep the network connected. However, calculating mes and mes D as previously proposed is an extremely complicated and laborious task [2]. For this reason, an alternative method is presented to determine first the Connectivity Probability and later the uncertainty involved.

With the goal of validating the scenario put forth in previous sections, the simulation will take place using a modified version of the Monte-Carlo method, where the nodes will initially be located in random positions in such a way that they form a connected network. Their positions will be updated in each instance of the simulation, in accordance with the specifications of the RWMM model [5], and afterwards the network connectivity will be verified. Thus, with the calculation framed within the aforementioned simulation process, the Connectivity Probability will be obtained by means of the M/N quotient, where N is the total number of simulations, taken to be sufficiently large, and M is the number of simulations in which the network was connected. Thus, mes is N and mes D is M [2].
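The M/N procedure described above can be sketched as follows. This is one possible reading, not the authors' simulator: each position update is counted as one simulation instance, the initial placement is resampled until it is connected, nodes that would leave the square area reverse direction and are clamped back inside, and the default speed of 3.9 m/s and time step of 1 s are illustrative. All names are hypothetical.

```python
import math
import random


def connected(pos, r):
    """Depth-first search over links of length <= r; True if all nodes are reached."""
    n = len(pos)
    if n == 0:
        return True
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and math.dist(pos[i], pos[j]) <= r:
                seen.add(j)
                stack.append(j)
    return len(seen) == n


def move(pos, speed, dt, side, rng):
    """Move every node by speed * dt in a fresh random direction."""
    new_pos = []
    for x, y in pos:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        dx, dy = math.cos(theta) * speed * dt, math.sin(theta) * speed * dt
        nx, ny = x + dx, y + dy
        if not (0.0 <= nx <= side and 0.0 <= ny <= side):
            # Edge reached: turn 180 degrees and keep the node inside the area.
            nx = min(max(x - dx, 0.0), side)
            ny = min(max(y - dy, 0.0), side)
        new_pos.append((nx, ny))
    return new_pos


def estimate_connectivity(n_nodes, r, side, n_steps, speed=3.9, dt=1.0, seed=0):
    """Return M / N: the fraction of the N simulated instances with a connected network."""
    rng = random.Random(seed)
    while True:  # may loop for a long time if r is too small for the node count
        pos = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n_nodes)]
        if connected(pos, r):
            break
    m = 0
    for _ in range(n_steps):
        pos = move(pos, speed, dt, side, rng)
        if connected(pos, r):
            m += 1
    return m / n_steps
```

The returned quotient plays the role of the Connectivity Probability, from which the uncertainty of the scenario can then be computed.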
Results of the Simulation

Different ad-hoc mobile network topologies will have different Connectivity Probability values, and, therefore, the Shannon Entropy will vary. In RWMM [5], at each simulation stage the nodes select a random direction in (0, 2π] in which to move, and the speed at which they move is the expected value for speeds uniformly distributed between 1 m/s and 10 m/s (the rates at which we move on foot), which equals 3.90 m/s. When a node reaches the edge of the simulation area, it rotates 180 degrees and is placed again within the area, after which the process continues.
All the simulations begin with a connected network topology and a fixed connectivity radius within which a node can be connected to another. Then, stage after stage, the following operations will take place:

• For each node nodei of the ad-hoc mobile network, a direction diri is randomly chosen between (0, 2π], and its position posi = (xi, yi) is updated in accordance with newposi = posi + (diri × V × dt), where newposi is the new position of nodei.
• Once all the node positions have been updated, for each node nodei in position posi it is verified whether it can establish a connection with another node nodej located within its connection radius. This verification is performed by calculating the Euclidean distance Disti,j between the two nodes and then checking whether Disti,j