Multimedia Sensor Networks [1 ed.] 9811601062, 9789811601064


Table of Contents:
Preface
Contents
1 Introduction to Multimedia Sensor Networks
1.1 Basic Concepts
1.2 Conceptual Architecture
1.2.1 Sensing Layer
1.2.2 Transmission Layer
1.2.3 Processing Layer
1.3 Main Research Topics of Multimedia Sensor Networks
2 Directional Sensing Models and Coverage Control
2.1 Introduction
2.2 Directional Sensor Networks
2.2.1 Motivation
2.2.2 Coverage Problem with Directional Sensing
2.2.2.1 Directional Sensing Model
2.2.2.2 Coverage Probability with Directional Sensing Model
2.2.3 Coverage Enhancing with Rotatable Directional Sensing Model
2.2.3.1 Rotatable Directional Sensing Model
2.2.3.2 The Problem of Area Coverage Enhancing
2.2.4 Potential Field Based Coverage-Enhancing Method
2.2.4.1 Sensing Centroid
2.2.4.2 Potential Field Force
2.2.4.3 Control Laws
2.2.5 Simulation Results
2.2.5.1 Case Study
2.2.5.2 Performance Evaluation
2.3 Three Dimensional Directional Sensor Networks
2.3.1 Motivation
2.3.2 The 3D Directional Sensing Model
2.3.3 Area Coverage-Enhancing Method
2.3.3.1 Problem Formulation
2.3.3.2 Virtual Force Analysis Based Coverage-Enhancing
2.3.3.3 Coverage Optimization Approach
2.3.4 Case Study and Performance Evaluations
2.4 Directional K-Coverage for Target Recognition
2.4.1 Motivation
2.4.2 Collaborative Face Orientation Detection
2.4.3 Problem Description
2.4.3.1 Effective Sensing in Video Surveillance
2.4.3.2 Directional K-Coverage (DKC) Problem
2.4.4 Analysis of Directional K-Coverage
2.4.5 Experimental Results
2.5 L-Coverage for Target Localization
2.5.1 Motivation
2.5.2 Localization-Oriented Sensing Model
2.5.3 Bayesian Estimation Based L-Coverage
2.5.3.1 L-Coverage Concept
2.5.3.2 L-Coverage Illustrations
2.5.4 L-Coverage Probability in Randomly Deployed Camera Sensor Networks
2.5.5 Simulation Experiments
2.6 Exposure-Path Prevention for Intrusion Detection in Multimedia Sensor Networks
2.6.1 Motivation
2.6.2 System Models and Problem Formulation
2.6.2.1 Sensors Deploying Model
2.6.2.2 Continuum Percolation Model-Based Problem Formulation
2.6.3 Bond Percolation Model for Coverage
2.6.4 Critical Density for Exposure Path
2.6.4.1 Critical Density of Omnidirectional Sensors
2.6.4.2 Critical Density of Directional Sensors
2.6.5 Dependence Among Neighboring Edges
2.6.6 Simulation Evaluations
2.6.6.1 Omnidirectional Sensor Networks
2.6.6.2 Directional Sensor Networks
References
3 Data Fusion Based Transmission in Multimedia Sensor Networks
3.1 Introduction
3.2 Adaptive Data Fusion for Energy Efficient Routing in Multimedia Sensor Networks
3.2.1 Motivation
3.2.2 Measurement of Image Fusion Cost
3.2.2.1 Measurement Model for Data Aggregation
3.2.2.2 Image Fusion
3.2.3 System Model and Problem Formulation
3.2.3.1 Network Model
3.2.3.2 Problem Formulation
3.2.4 Minimum Fusion Steiner Tree
3.2.4.1 MFST Algorithm
3.2.4.2 3-D Binary Tree Structure
3.2.5 Design and Analysis of AFST
3.2.5.1 Binary Fusion Steiner Tree (BFST)
3.2.5.2 Adaptive Fusion Steiner Tree (AFST)
3.2.6 Experimental Study
3.2.6.1 Simulation Environment
3.2.6.2 Impact of Correlation Coefficient
3.2.6.3 Impact of Unit Fusion Cost
3.3 Physarum Optimization: A Biology-Inspired Algorithm for the Steiner Tree Problem in Networks
3.3.1 Motivation
3.3.2 Biology-Inspired Optimization and Physarum Computing
3.3.3 Problem Formulation and Physarum Model
3.3.3.1 Steiner Tree Problem
3.3.3.2 Mathematical Model for Physarum
3.3.4 Physarum Optimization for Steiner Tree Problem
3.3.4.1 Initial Pressures of Vertices
3.3.4.2 Main Process of Physarum Optimization
3.3.4.3 Convergence of Physarum Optimization
3.3.4.4 Algorithms of Physarum Optimization
3.4 A Trust-Based Framework for Fault-Tolerant Data Aggregation in Multimedia Sensor Networks
3.4.1 Motivation
3.4.2 System Model
3.4.2.1 Multi-Layer Trustworthy Aggregation Architecture
3.4.2.2 Source Model
3.4.2.3 Trust Model
3.4.3 Trust-Based Framework for Fault-Tolerant Data Aggregation
3.4.3.1 Self Data Trust Opinion of Sensor Node
3.4.3.2 Peer Node Trust Opinion
3.4.3.3 Trust Transfer and Peer Data Trust Opinion
3.4.3.4 Trust Combination and Self Data Trust Opinion of Aggregator
3.4.3.5 Trust-Based and Fault-Tolerant Data Aggregation Algorithm
3.4.4 Experimental and Simulation Studies
3.4.4.1 Continuous Audio Stream
3.4.4.2 Discrete Data
References
4 In-Network Processing for Multimedia Sensor Networks
4.1 Introduction
4.2 Correlation Based Image Processing in Multimedia Sensor Networks
4.2.1 Motivation
4.2.2 Sensing Correlation
4.2.3 Image Processing Based on Correlation
4.2.3.1 Allocating the Sensing Task
4.2.3.2 Image Capturing
4.2.3.3 Image Delivering
4.2.3.4 Image Fusion
4.2.4 Experimental Results
4.3 Dynamic Node Collaboration for Mobile Target Tracking in Multimedia Sensor Networks
4.3.1 Motivation
4.3.2 Related Works
4.3.3 System Models and Description
4.3.3.1 Motion Model of the Target
4.3.3.2 Sensing Model of Camera Sensors
4.3.3.3 Target Tracking by Sequential Monte Carlo Method
4.3.3.4 The Dynamic Node Collaboration Scheme
4.3.4 Election of the Cluster Heads
4.3.5 Selection of the Cluster Members
4.3.5.1 Utility Function
4.3.5.2 Cost Function
4.3.5.3 The Cluster Members Selection Algorithm
4.3.6 Simulation Results
4.4 Distributed Target Classification in Multimedia Sensor Networks
4.4.1 Motivation
4.4.2 Related Works
4.4.3 Procedure of Target Classification in Multimedia Sensor Networks
4.4.3.1 Target Detection
4.4.3.2 Feature Extraction
4.4.3.3 Classification
4.4.4 Binary Classification Tree Based Framework
4.4.4.1 Generation of the Binary Classification Tree
4.4.4.2 Division of the Binary Classification Tree
4.4.4.3 Selection of Multimedia Sensor Nodes
4.4.5 Case Study and Simulations
4.5 Decomposition-Fusion: A Cooperative Computing Mode for Multimedia Sensor Networks
4.5.1 Motivation
4.5.2 Typical Paradigms of Transmission-Processing for MSNs
4.5.3 Decomposition-Fusion Cooperative Computing Framework
4.5.3.1 Task Decomposition
4.5.3.2 Target Detection
4.5.3.3 Selection of Candidates
4.5.3.4 Selection of Cooperators
4.5.3.5 Interim Results Fusion
References
5 Multimedia Sensor Network Supported IoT Service
5.1 Introduction
5.2 Searching in IoT
5.2.1 Motivation
5.2.2 Concept of IoT Search
5.2.3 Characters of Searching in IoT
5.2.4 Challenges of Searching in IoT
5.2.5 The Progressive Search Paradigm
5.2.5.1 Coarse-to-Fine Search Strategy
5.2.5.2 Near-to-Distant Search Strategy
5.2.5.3 Low-to-High Permission Search Strategy
5.2.6 Progressive IoT Search in the Multimedia Sensors Based Urban Sensing Network
5.3 PROVID: Progressive and Multi-modal Vehicle Re-identification for Large-Scale Urban Surveillance
5.3.1 Motivation
5.3.2 Related Work
5.3.3 Overview of the PROVID Framework
5.3.4 Vehicle Filtering by Appearance
5.3.4.1 Multi-level Vehicle Representation
5.3.4.2 The Null-Space-Based FACT Model
5.3.5 License Plate Verification Based on Siamese Neural Network
5.3.6 Spatiotemporal Relation-Based Vehicle Re-ranking
5.3.7 Applications
5.3.7.1 Application I: Suspect Vehicle Search
5.3.7.2 Application II: Cross-Camera Vehicle Tracking
5.3.8 Experiments
5.3.8.1 Dataset
5.3.8.2 Experimental Settings
5.3.8.3 Evaluation of Appearance-Based Vehicle Re-Id
5.3.8.4 Evaluation of Plate Verification
5.3.8.5 Evaluation of Progressive Vehicle Re-Id
5.3.8.6 Time Cost of the PROVID Framework
References
6 Prospect of Future Research
6.1 Human-Like Perception
6.2 Intelligent Networking and Transmission
6.3 Intelligent Services

Advances in Computer Science and Technology

Huadong Ma Liang Liu Hong Luo

Multimedia Sensor Networks

Advances in Computer Science and Technology

Series Editors:
Hujun Bao, Hangzhou, China
Xilin Chen, Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China
Xiaotie Deng, Shanghai Jiao Tong University, Shanghai, China
Dengguo Feng, Chinese Academy of Sciences, Institute of Software, Beijing, China
Minyi Guo, Shanghai Jiao Tong University, Shanghai, China
Shi-Min Hu, Tsinghua University, Beijing, China
Zhi Jin, Peking University, Beijing, China
Xinbing Wang, School of Electronic Information, Shanghai Jiao Tong University, Shanghai, China
Nong Xiao, National University of Defense Technology, Changsha, China
Ge Yu, Northeastern University, Shenyang, China
Hongbin Zha, Beijing, China
Jian Zhang, Chinese Academy of Sciences, Beijing Jiaotong University, Beijing, China
Zhi-Hua Zhou, Nanjing, Jiangsu, China

The Advances in Computer Science and Technology series publishes state-of-the-art results contributed by China Computer Federation (CCF) members or authors associated with or invited by CCF. The series aims to efficiently disseminate advanced research and practical applications to the international research community. The topical scope of Advances in Computer Science and Technology spans the entire spectrum of computer science and information technology, ranging from foundational topics in the theory of computing to information and communications science and technology and a broad variety of interdisciplinary application fields. The series mainly includes monographs, professional books, graduate textbooks, and edited volumes. All volumes are authored or edited by established experts in their fields and published according to rigorous peer review, based on the editors' preview and selection and adequate refereeing by independent experts. Each volume provides a reasonably self-contained account of a topic, as well as a survey of the literature on the topic. The intended audience includes graduate students, researchers, professionals, and industrial practitioners.

More information about this series at http://www.springer.com/series/13197

Huadong Ma • Liang Liu • Hong Luo

Multimedia Sensor Networks

Huadong Ma School of Computer Science Beijing University of Posts and Telecommunications Beijing, China

Liang Liu School of Computer Science Beijing University of Posts and Telecommunications Beijing, China

Hong Luo School of Computer Science Beijing University of Posts and Telecommunications Beijing, China

ISSN 2198-2686  ISSN 2198-2694 (electronic)
Advances in Computer Science and Technology
ISBN 978-981-16-0106-4  ISBN 978-981-16-0107-1 (eBook)
https://doi.org/10.1007/978-981-16-0107-1

© Springer Nature Singapore Pte Ltd. 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

The Internet of Things (IoT) is one of the most important infrastructures for social progress, economic development, and scientific innovation. As a key technology for the IoT to thoroughly sense the physical world, the wireless sensor network was listed first among the 10 emerging technologies that would change the world by MIT Technology Review in 2003, and it also plays a fundamental role in the development of national strategic emerging industries. For a long time, sensor networks mainly acquired and processed scalar data, which seriously restricted the in-depth development and application of the IoT. Since more than 90% of the information acquired by human beings comes from vision and audition, it is necessary to capture multimedia information to fully and accurately understand the physical world. Therefore, the evolution to multimedia sensor networks is the development direction of sensor networks. However, with diverse multimedia sensing modes, large amounts of data, strong heterogeneity, and complex computation, the theories and methods of traditional sensor networks are no longer effective for multimedia sensor networks. How to depict the characteristics of multimedia perception, reveal the transmission mechanisms of multimedia sensor networks, and build an efficient in-network multimedia computing model, so as to break through the bottlenecks of node perception, transmission, and computing resources and achieve optimized deployment, efficient and reliable transmission, and timely information processing in multimedia sensor networks, are challenging issues that have long occupied the international academic community.

We began work in the field of multimedia sensor networks in 2004. With continuous funding from the National 973 Project and the National Natural Science Foundation of China (NSFC), we have made a systematic and in-depth exploration of the theories and technologies of multimedia sensor networks. The research achievements in multiple aspects, including directional sensing, fused transmission, in-network computing, and multimedia-supported IoT services, provide theoretical support for solving the bottleneck problems in multimedia sensor networks and the IoT. This book is a summary of our scientific research work in this field, and is also an attempt to systematically summarize the basic theories and key technologies of


multimedia sensor networks. Following the main line of sensing, transmission, processing, and service of multimedia information, this book systematically introduces the basic concepts, fundamental principles, key technologies, and typical applications of multimedia sensor networks. The book is divided into six chapters. Specifically, the first chapter briefly introduces the background, research status, main research contents, and typical application fields of multimedia sensor networks. The second chapter mainly discusses multimedia sensing models and coverage control issues, including 2D and 3D directional sensor network models, as well as directional K-coverage, localization-oriented coverage, and exposure-path prevention methods. The third chapter discusses the transmission problem of multimedia sensor networks, involving the energy-efficient adaptive fusion transmission method, the biologically inspired routing optimization algorithm, and the trust-based fault-tolerant data aggregation framework. The fourth chapter discusses information processing issues in multimedia sensor networks, including correlation-based video information processing, dynamic node collaboration for mobile target tracking, distributed target classification, and the in-network collaborative computing architecture. The fifth chapter mainly introduces the IoT services supported by multimedia sensor networks, including the IoT service mode, progressive search, and its application in vehicle re-identification. Finally, the sixth chapter looks forward to the research frontiers and development prospects of future sensing networks, especially intelligent sensing networks and big sensory data processing.

We have completed this book with the support and enthusiastic help of many institutes and colleagues. First of all, the research work involved in this book has been continuously supported by the NSFC programs and the National 973 Project. In particular, we are fortunate to undertake the key project of NSFC “The Design Theory and Key Technologies of Wireless Multimedia Sensor Network” (60833009) in 2007, the Distinguished Young Scholars of NSFC “Multimedia Computing in the Network Environment” (60925010) in 2009, and the National 973 Project “Basic Research on Internet of Things Architecture” (2011CB302700) in 2010. At present, we are also in charge of the Innovation Research Group Project of NSFC “Basic Theory and Key Technology of Internet of Things” (61921003) and the Project of International Cooperation and Exchanges NSFC “Intelligent Sensing Network and Its Application Research in Smart City” (61720106007), which enable us to conduct systematic and long-term research on multimedia sensor networks. We would like to express our gratitude to the National Natural Science Foundation of China and the National 973 Project for their support.

In addition to the three authors, Professor Yan Sun, Associate Professor Dong Zhao, and Associate Professor Huiyuan Fu from our laboratory (Beijing Key Laboratory of Intelligent Communication Software and Multimedia), as well as Dr. Wu Liu, Dr. Dan Tao, Dr. Xinchen Liu, and others who worked or studied in the laboratory, participated in the research of some chapters. Besides, some Ph.D. and master's students also participated in the collection of reference materials. We would like to express our heartfelt thanks to them for their hard work.

We would also like to thank Professor Jianzhong Li from Harbin Institute of Technology, Professor Yunhao Liu from Tsinghua University, Professor Xiangyang


Li from the University of Science and Technology of China, Professor Xinbing Wang from Shanghai Jiao Tong University, Professor Guihai Chen from Nanjing University, Professor Yonghe Liu from the University of Texas at Arlington, Professor Xi Zhang from Texas A&M University, Professor Sajal K. Das from Missouri University of Science and Technology, and other domestic and foreign scholars in the IoT field. Many academic exchanges and cooperative research with them have promoted the continuous formation of innovative achievements and deepened our understanding of sensor networks and the IoT. Many of these contents appear in some chapters of this book.

Thanks to the editors at Springer. We have been invited to write a book in the field of IoT several times in recent years. However, due to time constraints and some other reasons, the book was not finished until the winter of 2020. On the occasion of the publication of this book, we would like to express our heartfelt thanks to them for their sincere help. We also thank our families for their long-term support. While writing this book and undertaking the related research work, many other colleagues and friends provided help in different ways. It is difficult to list them all here; we ask for their understanding and thank them for their kind support.

In the information society of the twenty-first century, emerging information technologies such as the IoT, cloud computing, big data, and artificial intelligence are developing rapidly. In the field of IoT, innovative technologies and disruptive applications are constantly emerging. The authors have tried to follow the pace of the times and understand the technological progress of the IoT from the perspective of multimedia sensor networks. However, limited by the authors' knowledge and expression skills, the limited space of this book can hardly reflect the full picture of this topic. There may be some mistakes and flaws in this book; we welcome readers' criticism and corrections.

Beijing, China
November 2020

Huadong Ma Liang Liu Hong Luo


Chapter 1

Introduction to Multimedia Sensor Networks

Advances in embedded microprocessors, low-power analog and digital electronics, and radio communications have enabled the development of small, low-priced microsensors that have made Wireless Sensor Networks (WSNs) one of the most promising technologies of the past decade. In most cases, a wireless sensor network consists of a large number of irreplaceable, battery-powered microsensors. In general, the microsensors sense and gather data from the surrounding environment and transmit them to the sink node for more complex processing. WSNs can integrate the logical information world with the actual physical world and profoundly change the interaction between human and nature. Sensor-based applications span a wide range of areas, including scientific research, military, disaster relief and rescue, health care, and industrial, environmental, and household monitoring.

The Internet of Things (IoT) is one of the most important infrastructures for social progress, economic development, and scientific innovation. As a key technology for the IoT to thoroughly sense the physical world, the sensor network was listed first among the 10 emerging technologies that will change the world by MIT Technology Review in 2003, and it is also the basis for the development of national strategic emerging industries.

Most deployed wireless sensor networks measure scalar physical phenomena, such as temperature, pressure, humidity, or the location of objects. In general, such sensor networks are designed for data-only, delay-tolerant applications with low bandwidth demands. However, as environmental monitoring has become increasingly complex, the simple data obtained by traditional wireless sensor networks cannot meet the requirements of comprehensive monitoring. On the other hand, the availability of Complementary Metal-Oxide-Semiconductor (CMOS) cameras and small microphones makes possible the development of Multimedia Sensor Networks (MSNs) capable of gathering multimedia information from the surrounding environment. As an example, consider camera sensors, where available products range from expensive pan-tilt-zoom cameras to high-resolution digital cameras, and from inexpensive webcams and “cell-phone-class” cameras to even cheaper, tiny cameras.

The advent of MSNs has opened a new vision for traditional WSNs by enhancing their existing capabilities and targeted applications, such as Intelligent Transportation Systems (ITS) and video surveillance for public safety. The rise of MSNs has attracted much attention in the academic community. Some scholars have carried out exploratory research on multimedia sensor networks, and many well-known academic journals and conference series have published a number of important research results. Since 2003, ACM has organized an international workshop on video surveillance and sensor networks, and multimedia sensor networks have since become an important topic at ACM/IEEE conferences in the computer networking and multimedia areas. There are many scholars active in multimedia sensor network research from famous institutes such as the University of California, Carnegie Mellon University, the University of Massachusetts, Tsinghua University, Beijing University of Posts and Telecommunications, the Chinese Academy of Sciences, and so on.

1.1 Basic Concepts

The integration of low-power wireless networking technologies with inexpensive hardware such as CMOS cameras and microphones is now enabling the development of distributed, networked systems that we refer to as multimedia sensor networks. Multimedia sensor networks are composed of sensor devices equipped with audio and visual information collection modules; they can retrieve, store, and process multimedia data (e.g., audio, images, video streams) in real time, correlate and fuse multimedia data originating from heterogeneous sources, and wirelessly transmit the collected data to the desired destinations, so as to realize comprehensive and effective environmental monitoring.

As illustrated in Fig. 1.1, a typical multimedia sensor network usually consists of multimedia sensor nodes, a sink node, and a control center. Multimedia sensor nodes are scattered densely and randomly in a geographical area of interest. The environmental data collected by multimedia sensor nodes are transmitted via multiple hops to the sink node, and finally reach the control center through the Internet or communication satellites. Users can configure and manage the multimedia sensor network, release monitoring tasks, and collect monitoring data via the control center.


Fig. 1.1 The illustration of multimedia sensor networks

• Multimedia sensor nodes: they integrate an audio and visual information collection module, a data processing module, and a communication module, and are used to measure physical phenomena, such as temperature, pressure, humidity, audio, and video, with the help of various built-in sensors. Because they carry energy-limited battery power supplies, the processing and storage capacities of multimedia sensor nodes are relatively weak. They focus on collecting, forwarding, and processing (e.g., fusing, compressing) rich environmental information, including audio, images, and video streams.
• Sink node: it can be either a multimedia sensor node with stronger storage and computing capabilities, or a special gateway device that merely has communication capability. The sink node is responsible for translating high-level user queries into network-specific directives and returning the filtered portions of the multimedia stream back to the user. Multiple sinks may be required in a large or heterogeneous network.
• Control center: it is responsible for querying or collecting the monitoring information of the multimedia sensor network, monitoring the information released by the network, and providing a friendly interactive interface for users to observe, analyze, mine, and make decisions on the monitoring information.

The evolution of sensor networks from scalar data sensing to multimedia information sensing has brought about a revolution in the intensive IoT sensing paradigm. As a kind of wireless sensor network, multimedia sensor networks share the common characteristics of WSNs, but they also have significant individual characteristics, which are embodied in the following three aspects:

• Enhanced network capacity. Due to the introduction of a large amount of audio, image, video stream, and other media, the capacities of both the individual multimedia sensor nodes and the overall multimedia sensor network, including sensing, processing, storage, transmission, and energy supply, have been significantly enhanced. For example, the processing capacity of a multimedia sensor node increases from the 6 MHz of the Mica series up to tens or even hundreds of MHz, and the storage capacity increases from the KB scale to the MB or GB scale. To better meet the requirements of multimedia transmission, the network bandwidth resource also increases from the Kbps level to the Mbps level accordingly.


• Rich sensing media. Multiple types of data, such as scalar data, text, audio, images, video streams, and control signals, coexist in a multimedia sensor network. In addition, the format of the media is diverse, including both scalar information and streaming media information. These media serve a given monitoring task, in order to achieve a more comprehensive and accurate picture of the monitored scenario.
• Complex processing tasks. In-network computing ability is improved from simple data statistics to complex content understanding. Because a traditional wireless sensor network collects only scalar data, its processing is simple, mainly consisting of addition, subtraction, multiplication, division, summing, and averaging operations. It is hard for users to form a comprehensive understanding of the monitored environment from these numerical results alone. Conversely, the multimedia data collected by multimedia sensor networks carry rich information in complex formats. Accordingly, compression, recognition, and fusion processing can be performed in order to satisfy the demands of diverse IoT applications.

Moreover, the service capacity of a multimedia sensor network, which consists of multiple sensor nodes and is characterized by strong heterogeneity, large volumes of data, and complicated computing, is greater than the simple sum of several single sensor networks. Multiple types of sensing data can effectively sense the physical world from different perspectives. In fact, we can obtain a more comprehensive and effective awareness of a scene by fusing multiple types of data.

1.2 Conceptual Architecture

In 2003, Holman et al. first employed video sensor networks for coast monitoring, using a kind of centralized single-layer network architecture. The video sensor nodes in that network had almost no collaboration: each independently completed data acquisition and processing and attached directly to the sink node. In this architecture, the sink node became the bottleneck of the whole system. Obviously, such an architecture can hardly adapt to growing network scales and the pressure of processing huge amounts of multimedia data. Hence, a multi-layer distributed network architecture became a more reasonable solution. Basically, the layered architectures of MSNs illustrated in Fig. 1.2 can be broadly classified into three categories depending on the nature of the target application.

• Single-tier homogeneous architecture. The single-tier flat architecture, as shown in Fig. 1.2a, is composed of homogeneous sensor nodes with the same sensing, computation, and communication capabilities, including the use of the same video sensors. The sensor nodes are used for basic multimedia information extraction from the surrounding environment. The multimedia information is wirelessly transmitted hop by hop from the source nodes to the sink node. This architecture offers benefits such as distributed processing and, thanks to the homogeneous nature of the sensor nodes, easy deployment and management. Its disadvantages are a lack of resource sharing and poor network scalability and flexibility.

Fig. 1.2 Layered architecture of multimedia sensor networks. (a) Single-Layer. (b) Multi-Layer. (c) Mixed

• Multi-tier heterogeneous architecture. Heterogeneous sensor nodes in an MSN can be divided into two or more layers according to their different resources or capabilities. For example, in the two-layer structure shown in Fig. 1.2b, the sensor nodes are divided into cluster heads and member nodes. A sensor node in a cluster gathers scalar as well as multimedia information and sends it to the cluster head, which acts as the central processing unit for that cluster (with more resources and computational power than the other cluster nodes). The processed information is then wirelessly transmitted to the control center via the gateway. The advantage of this architecture is that it can address a wide range of application scenarios, from simple scalar applications to multimedia information processing.
• Mixed architecture. The mixed architecture, shown in Fig. 1.2c, combines the single-tier homogeneous and multi-tier heterogeneous architectures mentioned above. The sensor nodes can communicate with the sink node either through the cluster heads or via multi-hop communication among themselves. Its advantage is good flexibility, which makes it suitable for a variety of tasks. Its disadvantage is a relatively complex network structure with high management and maintenance costs.

1.2.1 Sensing Layer

The coverage issue is a fundamental problem for multimedia sensor networks.


There is an extensive body of studies on the coverage problem in traditional omni-directional sensor networks. However, coverage optimization in multimedia sensor networks has only recently attracted the attention of the research community. Different from conventional sensing models, in which the sensing region is a circular area centered at the sensor, a directional sensor has a limited angle of view and thus cannot sense the whole circular area. Directional sensor nodes may have several working directions and may adjust their sensing directions during operation. In principle, the directional sensor nodes in MSNs are more accurately characterized by a 3D sensing model. However, due to the high complexity in design and analysis imposed by the 3D sensing model, most existing work focuses on the simplified 2D sensing model and the associated coverage-control methods.

Obtaining data from the environment is the main function of wireless sensor applications. Each application has different goals and collects different types of data. The coverage problem has several subcategories, such as area coverage, target coverage, path coverage, barrier coverage, and functional coverage, each of which requires different solution strategies.

Area coverage aims to achieve the maximum sensing region with a finite number of multimedia sensor nodes; the area-coverage enhancement problem is called the Maximum Directional Area Coverage (MDAC) problem in multimedia sensor networks. Some of the published papers, especially early ones, use the ratio of the covered area to the overall deployment region as a metric for the quality of coverage. A grid-based coverage approach has been used to simulate area coverage problems for MSNs: each vertex on the grid represents a point in the monitored area, and the grid resolution determines the level of detail at which the area is simulated. Several solutions have been proposed to enlarge the covered area while minimizing occlusion and overlap.

To cover only the targets of interest instead of the whole area, researchers have defined target-based coverage problems. This line of work emphasizes how to cover the maximum number of targets. In target coverage, each target is monitored continuously by at least one sensor. However, some MSN applications may require at least k sensors for each target in order to increase the reliability of the network; the k-coverage problem has been formulated based on this requirement. A considerable number of studies have focused on maximizing the number of covered stationary targets with the minimum number of sensors.

Path coverage refers to the worst- and best-case coverage, which aims at measuring the probability that a target travels across an area, or an event happens, without and with being detected, respectively. In MSNs, the majority of existing work on worst- and best-case coverage is based on the computational geometry of Voronoi diagrams and Delaunay triangulation.

Barrier coverage aims to detect intruders as they cross a border or penetrate a protected region. Generally, two kinds of barrier coverage are involved: strong barrier coverage needs to detect intruders moving along arbitrary paths, while weak barrier coverage only needs to detect intruders moving along congruent crossing paths. In addition, k-barrier coverage is used to detect an object that penetrates the protected region. In this case, the sensor network must detect each penetrating object with at least k distinct sensors before it crosses the barrier of wireless sensors, while minimizing the number of sensors that form such a barrier.
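To make the directional sector sensing model and the grid-based area-coverage metric above concrete, the following Python sketch estimates the coverage ratio of randomly deployed directional sensors. This is a minimal illustration under stated assumptions, not the book's algorithm: the sensing radius, field-of-view half-angle, grid resolution, and deployment parameters are all illustrative.

```python
import math
import random

def covers(sensor, point, r, alpha):
    """Sector sensing model: a point is covered iff it lies within
    distance r of the sensor and within +/- alpha of the sensor's
    working direction (a field of view of 2*alpha)."""
    x, y, theta = sensor               # position and working direction
    dx, dy = point[0] - x, point[1] - y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True                    # the sensor covers its own location
    if dist > r:
        return False
    # smallest angular offset between the working direction and the point
    diff = abs((math.atan2(dy, dx) - theta + math.pi) % (2 * math.pi) - math.pi)
    return diff <= alpha

def coverage_ratio(sensors, side, r, alpha, resolution=100):
    """Grid-based estimate: fraction of grid points covered by >= 1 sensor."""
    step = side / resolution
    covered = 0
    for i in range(resolution):
        for j in range(resolution):
            p = ((i + 0.5) * step, (j + 0.5) * step)
            if any(covers(s, p, r, alpha) for s in sensors):
                covered += 1
    return covered / resolution ** 2

# Example: 50 sensors with r = 60 and a 90-degree FoV in a 500 x 500 region
random.seed(1)
sensors = [(random.uniform(0, 500), random.uniform(0, 500),
            random.uniform(0, 2 * math.pi)) for _ in range(50)]
print(coverage_ratio(sensors, side=500, r=60, alpha=math.pi / 4))
```

Refining the grid (a larger `resolution`) trades computation time for a more faithful estimate, which is exactly the resolution/detail trade-off noted above.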


Functional coverage focuses on particular performance metrics. One sub-topic of the connected coverage problem has drawn considerable attention, namely how to find the geometrically optimal deployment pattern that achieves full coverage and a certain degree of connectivity. In addition, scheduling sensor nodes is a common way of prolonging network lifetime in MSNs. The main goal of sensor scheduling algorithms is to shut off redundant multimedia sensor nodes and activate them only when necessary.
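The node-scheduling idea above can be illustrated with a simple greedy heuristic that keeps active only as many sensors as needed to preserve coverage and lets the rest sleep. This greedy set-cover sketch is our own illustrative assumption; the book does not prescribe this particular algorithm.

```python
def greedy_schedule(coverage_sets):
    """coverage_sets: {sensor_id: set of grid points it covers}.
    Returns the ids of sensors to keep active; the remaining sensors
    are redundant and can sleep until the task or topology changes."""
    uncovered = set().union(*coverage_sets.values())
    active = []
    while uncovered:
        # pick the sensor covering the most currently uncovered points
        best = max(coverage_sets, key=lambda s: len(coverage_sets[s] & uncovered))
        gain = coverage_sets[best] & uncovered
        if not gain:
            break                      # remaining points are not coverable
        active.append(best)
        uncovered -= gain
    return active

# Toy example with 4 sensors covering overlapping point sets
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {2, 3}}
print(greedy_schedule(sets))           # [0, 2]: sensors 1 and 3 may sleep
```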

1.2.2 Transmission Layer

In MSNs, the transmission layer is responsible for transporting the data that source nodes collect from the surrounding environment to the sink. Reliable data transmission is the main objective of transmission protocol design, in addition to supporting the high data rates or congestion control features that applications require. When the multimedia data is streaming by nature, it is critically important that the sink receives the data in the order in which the source sends it; otherwise, packet-level reordering must be performed in the transmission layer. Since multimedia transmission requires high data rates by nature, congestion may occur at any point of the network, especially at the sink node. Thus, a need arises for power-efficient congestion control in order to avoid data collisions.

UDP is usually preferred over TCP in typical multimedia applications, as timeliness is of greater concern than reliability (a minimal transport sketch follows at the end of this subsection). Selected features of existing Internet standards, such as the Real-time Transport Protocol (RTP), may be adopted in the context of MSNs. TCP is a connection-oriented protocol that provides strict end-to-end reliability, which is a good feature from the perspective of reliability but not from that of energy efficiency, because TCP demands packet acknowledgements for every data packet transmission, retransmits data packets in case of failure, and incurs the overhead of link establishment. Clearly, no single transmission-layer solution can address the diverse concerns of MSNs; multihop transmission, fusion transmission, reliable transmission, trustable transmission, and opportunistic transmission should all be involved.

Different from computer networks, in MSNs the network-layer supporting functions and protocols play a critical role in conserving energy, since each node has a limited energy budget. The multimedia nature of the data imposes strict constraints on the design of these routing functions and protocols in order to meet the tight application-specific QoS and reliability requirements. Routing in MSNs is very challenging due to several characteristics. As the application demands on MSNs escalate day by day, researchers are concentrating on delivering application-level QoS and striving to map these requirements to network-layer metrics, such as latency, jitter, energy efficiency, reliability, packet loss, and throughput. As a result, many routing techniques applicable to MSNs have been proposed.


According to the current research trend, the routing protocols are classified mainly based on (1) the type and number of QoS constraints they consider, (2) the type of data they handle (still vs. streaming data), (3) the type of data delivery model they use (query driven, event driven), (4) the classes of algorithms they adopt (genetic algorithms, supervised learning, clustered-control algorithms, etc.), and (5) the hole-bypassing approaches they use.
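As a minimal illustration of the transport-layer trade-off discussed earlier in this subsection, the sketch below sends frames over UDP with an application-level sequence number and timestamp, so the sink can restore the source order with a reorder buffer and simply skip lost packets, instead of paying TCP's acknowledgement and retransmission overhead. The 12-byte header layout, port number, and buffering policy are illustrative assumptions, not an MSN protocol from the text.

```python
import socket
import struct
import time

HEADER = struct.Struct("!IQ")   # 4-byte sequence number, 8-byte usec timestamp

def send_frames(frames, addr=("127.0.0.1", 9000)):
    """Sender: prefix each frame with (seq, timestamp) and fire over UDP.
    Timeliness matters more than reliability, so no acknowledgements."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq, frame in enumerate(frames):
        header = HEADER.pack(seq, int(time.time() * 1e6))
        sock.sendto(header + frame, addr)
    sock.close()

def receive_frames(expected, port=9000):
    """Sink: collect packets into a reorder buffer keyed by sequence
    number, then return them in source order, skipping lost packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    sock.settimeout(1.0)
    buffer = {}
    for _ in range(expected):
        try:
            data, _ = sock.recvfrom(65535)
        except socket.timeout:
            break                           # give up on late/lost packets
        seq, ts = HEADER.unpack(data[:HEADER.size])
        buffer[seq] = data[HEADER.size:]
    sock.close()
    return [buffer[s] for s in sorted(buffer)]   # ordered for playback
```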

1.2.3 Processing Layer

Energy-efficient processing methods are of paramount importance in WSNs due to the massive volume of collected data and the limited resources, such as energy, communication bandwidth, and computing capacity. Consequently, in order to support multimedia applications, MSNs are required to process multimedia data efficiently. By employing the different processing capabilities of sensor nodes, in-network processing can perform multimedia compression coding, feature extraction and target recognition, and redundant information fusion. The processing results, small in data volume but rich in information, are then uploaded step by step. This reduces the sensor nodes' transmission load, saves limited network energy, and thus extends the network's lifetime. Moreover, it improves the processing and response speed of the network as well as the monitoring performance.

In multimedia applications, data compression is a primary means of attaining efficient transmission and delivery of multimedia data at minimum bandwidth and power costs. Because WSNs are usually densely deployed over the monitored spatial region, a reasonable assumption is that the data flows generated by nearby cluster heads present a certain degree of spatial correlation and redundancy. Obviously, the efficiency of WSNs can be improved by removing the spatial correlation and redundancy of multimedia data as early as possible on the communication path toward the final users. This requirement can be fulfilled by resorting to in-network distributed data compression and coding techniques. More recent and attractive alternatives for in-network distributed data compression and coding are compressive sensing (CS) and joint distributed source coding (JDSC) methods. CS is an emerging research field in multimedia data acquisition and processing that bypasses the major drawbacks of classical data compression and coding methods. JDSC methods aim to optimize source coding without requiring message passing among the encoders; thus, they can maximize both the power and bandwidth efficiency of the underlying transport network.

Filtering removes the less interesting or less valuable data from the original sensing data, so as to obtain a small amount of data with valuable information. The study of filtering mainly focuses on foreground/background separation and target tracking. Foreground/background separation identifies and separates foreground objects from the background, so that the large amount of media data outside the foreground objects of interest can be discarded. Target tracking in video sequences enables effective monitoring, identification, and tracking.


To reduce the amount of network data transmission, feature extraction can be used to pick out the moving targets in a video sequence, and unknown image information can be inferred from known information.

Data fusion also plays a very important role. In an MSN with a high coverage rate, the sensing data collected by neighboring sensor nodes (especially audio and video streaming media with large data volumes) are redundant; fusing strongly correlated media information can effectively reduce the network transmission load and thus save network resources. In addition, due to cost and volume limitations, the sensing accuracy of an individual sensor node is generally low. Hence, it is difficult to ensure the accuracy of the information with only a few scattered sensor nodes. Usually, data fusion is a form of lossy compression: it omits some details or reduces data quality, and thus reduces the amount of data to be stored or transmitted. This requires that data fusion weigh the trade-off between network energy consumption and fusion quality. Compared with simple scalar environmental data, the fusion of audio, images, and video streams is of greater research significance. Data fusion can be divided into single-class data fusion and multi-class data fusion.
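As a minimal sketch of the foreground/background separation idea described above, the following code maintains a running-average background model and flags pixels that deviate from it; only the flagged foreground would then need to be transmitted. The update rate, threshold, and synthetic frames are illustrative assumptions.

```python
import numpy as np

def update_background(background, frame, rate=0.05):
    """Exponential running average: slowly absorb scene changes
    into the background model."""
    return (1 - rate) * background + rate * frame

def foreground_mask(background, frame, threshold=25):
    """Pixels far from the background model are flagged as foreground;
    the rest of the frame can be shielded from transmission."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Toy usage with synthetic 8-bit grayscale frames
rng = np.random.default_rng(0)
background = rng.integers(0, 256, (120, 160)).astype(np.float64)
frame = background.copy()
frame[40:60, 50:80] += 80              # a synthetic moving object
mask = foreground_mask(background, frame)
background = update_background(background, frame)
print(mask.sum(), "foreground pixels")  # 600: only the object region
```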

1.3 Main Research Topics of Multimedia Sensor Networks

The main research topics of multimedia sensor networks lie in the following aspects:

(1) Architecture of multimedia sensor networks. The MSN architecture is directly related to the performance of the entire network, and thus affects the availability of the network. The objective is to design an architecture in which the available resources are efficiently utilized and fairly distributed throughout the network, which is scalable enough to handle the growing size of the network, and which extends the energy lifetime of the nodes; this requires considering deployment, energy consumption, scalability, flexibility, security, and fault tolerance in general.

(2) Deployment and coverage control. To better perform monitoring tasks, the deployed multimedia sensor nodes need to effectively cover the monitored region or target. Many types of multimedia sensor nodes (e.g., homogeneous or heterogeneous ones) coexist in the network, which makes research on multimedia sensor node deployment and coverage control all the more meaningful. Coverage answers questions about the quality of service (surveillance) that can be provided by a particular MSN. According to the nodes' different sensing models, it is critical to investigate multimedia sensor node deployment strategies and scheduling mechanisms, so as to guarantee the coverage integrity and communication connectivity of MSNs. Typical deployment and coverage issues include the sensing model, K-coverage, L-coverage, barrier coverage, and exposure-path prevention.


(3) Fusion transmission of multimedia. In MSNs, the bandwidth requirement for multimedia data communication is orders of magnitude higher than that of existing WSNs. For example, a scalar WSN architecture built on motes such as the traditional TelosB or MicaZ supports the ZigBee/802.15.4 radio standard with a data rate of up to 250 kbps. In MSNs this bandwidth cannot fulfill the purpose, so the ZigBee/802.15.4 radio standard is not suitable for multimedia communication; communication technologies such as WiFi, 4G, and 5G are involved to meet the MSN bandwidth requirements. Moreover, data fusion based transmission can reduce data redundancy and hence curtail the network load, but the fusion process itself may introduce significant energy consumption for multimedia sensor networks with audio-visual data and/or security requirements. Thus, it is necessary to study energy-efficient fusion-driven routing protocols that optimize over both the communication cost and the fusion cost. For MSNs deployed in noisy and unattended environments, it is necessary to protect the accuracy of the gathered multimedia information, so we need to jointly consider data aggregation, information trust, and fault tolerance to enhance the correctness and trustworthiness of the collected information.

(4) In-network information processing. In MSNs, communicating multimedia information requires large bandwidth, and the communication cost would be enormous if the information were communicated unprocessed. These issues give rise to the need for in-network processing. Multimedia sensor nodes have high computational capability, so the effective use of application-layer multimedia processing and data fusion algorithms may not only help reduce the high bandwidth demand but also lower the communication cost. In-network information processing spans all nodes from the end devices and the edge to the server. The general processing functions include encoding/decoding, target detection/tracking/localization, and object classification and identification. Typical application scenarios are correlation based video fusion and mobile target tracking in video sensor networks, and distributed target classification and identification in multimedia sensor networks.

(5) Typical applications and IoT services. The availability of low-cost CMOS cameras, audio sensors, and low-power computation and communication modules has led MSNs to target various new applications that significantly enhance the capability of existing WSNs. Some of the key applications are broadly classified into surveillance, environment monitoring, traffic monitoring, personal and health care, habitat monitoring, target tracking, etc. The IoT services based on multimedia sensor networks contain three different service patterns: (1) the information publish service, which utilizes ubiquitous sensors to sense the states of objects in the physical world and is driven by data; (2) the sensing-controlling service, also called a Cyber-Physical System, which is driven by events; and (3) the IoT search service, which is driven by users. There are many issues to investigate for specific applications under the above three service patterns.

Chapter 2

Directional Sensing Models and Coverage Control

2.1 Introduction

"How well do the sensors observe the physical space?" is a fundamental problem in many applications of wireless sensor networks (Akyildiz and Su 2002; Chong and Kumar 2003; Gehrke and Madden 2004; Heinzelman et al. 2004; Sinopoli et al. 2003). Coverage, which in general answers the question of what quality of observation a particular wireless sensor network can provide, has attracted tremendous research interest. The most commonly used sensing model for the coverage problem is the disk model, which assumes that the sensing region of a given sensor is a disk centered around it. A point within the disk sensing region is said to be covered. Meguerdichian et al. (2001) first applied computational geometry and Voronoi diagrams to study coverage for wireless sensor networks. Shakkottai et al. (2003) discussed the necessary and sufficient conditions for grid-deployed sensor networks to completely cover a unit square region.

On the other hand, sensors with directional sensing ability also have wide applications. As discussed in the work of Tao et al. (2005) and Ma and Liu (2005a), the potential applications of camera sensors, which have field-of-view (FoV) limitations, span a wide spectrum from commercial to law enforcement, from civil to military. The characteristic of directional sensing fundamentally affects the deployment of sensor nodes, the capture of information, and the scheduling strategies. Departing from the conventional disk sensing models, we first presented the concept of directional sensor networks and employed a directional sector sensing model in our work (Ma and Liu 2005b, 2007). Similar works can be found in Hörster and Lienhart (2006b) and Rama et al. (2006). Rama et al. (2006) addressed the problem of how to select the optimal number of camera sensors and determine their placement in a given monitored area for multimedia surveillance systems. Hörster and Lienhart (2006a) focused on the placement of camera sensors with respect to maximizing coverage or achieving coverage at a certain resolution. In this chapter, we analyze deployment strategies for satisfying given coverage probability requirements with directional sensing models, study the issue of coverage enhancement by directional sensors with tunable orientations, and propose a potential field based coverage-enhancing algorithm to improve the area coverage.

Intuitively, directional sensor networks are more accurately characterized by a 3D sensing model. However, due to the high complexity in design and analysis imposed by the 3D sensing model, most existing works focus on the simplified 2D sensing model and coverage-control methods, as described above. In this chapter, we also analyze the 3D directional sensing ability of sensors and present a rotatable 3D sensing model to specify the actual target-detecting scene. To maximize the area coverage during target detection through directional sensor networks, we develop a virtual force analysis method as a 3D directional sensor adjustment strategy to enhance the area coverage after an initial random deployment of 3D viewing orientations.

In most existing works on coverage for wireless sensor networks, a point is said to be covered if its position is in the sensing area of a sensor node. This kind of coverage is only considered as a measure of quality for target/event detection. In fact, coverage can be subject to a wide range of interpretations. For example, most face recognition techniques employed in surveillance systems rely on the assumption of a frontal view of the human face. In order to detect the precise face orientation through cooperation among multiple cameras, we need to guarantee that each point in the monitored region is covered by more than one sensor node. Another typical example is target localization, an important function for many applications of multimedia sensor networks. It is therefore necessary to investigate the coverage problem from the perspectives of target recognition and localization. In this chapter, we present the concepts of Directional K-Coverage (DKC) and Localization-oriented coverage (L-coverage) for multimedia sensor networks. They differ from the coverage problem in conventional sensor networks due to the directionality of the sensing model and the sensing requirements of target recognition/localization. Mathematical models are further developed to describe the relation among the number of randomly deployed multimedia sensor nodes, the parameters of the sensing models, and the DKC/L-coverage rates.

Many applications of sensor networks, ranging from military to civilian, require that the sensors detect intrusions in the region of interest. The coverage of intrusion paths plays an important role in detecting intrusions. As defined in most of the sensor network literature, a region is said to be covered if every point in the region is within the sensing radius of a sensor node. However, if the goal of deploying sensors is to detect moving objects and phenomena, the traditional full coverage model may be unnecessary: coverage for intrusion detection only needs to ensure that no moving object or phenomenon can go through the field of interest without being detected, while full coverage implies that every point in the sensor-deployed region is covered. Toward this end, this chapter also discusses partial coverage by applying percolation theory to solve the exposure path problem for multimedia sensor networks. We introduce a bond-percolation based scheme to derive the critical densities for both omnidirectional sensor networks and directional sensor networks under random sensor deployment, where sensors are deployed according to a 2-dimensional Poisson process.

2.2 Directional Sensor Networks

2.2.1 Motivation

In this section, we first analyze the directional sensing ability of sensors and define the directional sensing model. Then, we derive the conditions for satisfying a given coverage requirement in randomly deployed directional sensor networks. In omnidirectional sensor networks, the coverage is mainly determined by the distribution of sensor nodes. In contrast, the coverage in directional sensor networks is determined by both the distribution and the directions of the sensor nodes. We also consider that the sensor direction can be rotatable; taking the camera sensor as an example, pan-tilt control enables a camera sensor to adjust its FoV (field of view). The directionality of the sensors therefore raises a new problem: how to enhance the coverage by adjusting the directions of the sensors in a directional sensor network. We further define a rotatable directional sensing model and design a potential field based method, the area coverage-enhancing (ACE) algorithm, to maximize the coverage by adjusting the sensors' orientations under the random deployment strategy.

2.2.2 Coverage Problem with Directional Sensing

2.2.2.1 Directional Sensing Model

Different from conventional sensing models, where an omnidirectional sensing area is centered on the sensor node, we employ a directional sensing model. An analogy can be found in the concept of field of view in cameras (Forsyth and Ponce 2002). We consider a 2D model where the sensing area of a sensor s is a sector denoted by the 4-tuple $(L, r, \vec{V}, \alpha)$. Here L is the location of the sensor node, r is the sensing radius, $\vec{V}$ is the center line of sight of the camera's field of view, which will be termed the sensing direction, and α is the offset angle of the field of view on both sides of $\vec{V}$. Figure 2.1 illustrates the directional sensing model. Note that the conventional omnidirectional sensing model is a special case of the new model with α = π.

Fig. 2.1 Directional sensing model

A point $L_1$ is said to be covered by directional sensor node s if and only if the following conditions are met:

(1) $d(L, L_1) \le r$, where $d(L, L_1)$ is the Euclidean distance between the location L of sensor s and the point $L_1$;
(2) the angle between $\overrightarrow{LL_1}$ and $\vec{V}$ is within $[-\alpha, \alpha]$.

A simpler way to judge whether the point $L_1$ is covered by s is: if $\|\overrightarrow{LL_1}\| \le r$ and $\overrightarrow{LL_1} \cdot \vec{V} \ge \|\overrightarrow{LL_1}\| \cos\alpha$, then $L_1$ is covered, and otherwise not. An area A is covered by sensor s if and only if every point $L_1 \in A$ is covered by s.
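To make the test concrete, here is a minimal Python sketch (our own illustration; the function and variable names are ours, not part of the original text) of the dot-product coverage criterion above.

```python
import math

def is_covered(L, L1, V, r, alpha):
    """Check whether point L1 is covered by a directional sensor at L.

    L, L1 : (x, y) tuples; V : unit vector of the sensing direction;
    r : sensing radius; alpha : half of the field-of-view angle (radians).
    """
    dx, dy = L1[0] - L[0], L1[1] - L[1]
    dist = math.hypot(dx, dy)          # ||LL1||
    if dist > r:                       # condition (1): within the sensing radius
        return False
    if dist == 0:                      # the sensor location itself is covered
        return True
    # condition (2): LL1 . V >= ||LL1|| cos(alpha)
    return dx * V[0] + dy * V[1] >= dist * math.cos(alpha)

# Example: a sensor at the origin looking along the x-axis with a 45-degree
# half-angle covers (1, 0.5) but not (0, 2).
print(is_covered((0, 0), (1, 0.5), (1, 0), 2.0, math.pi / 4))  # True
print(is_covered((0, 0), (0, 2.0), (1, 0), 2.0, math.pi / 4))  # False
```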

2.2.2.2 Coverage Probability with Directional Sensing Model

Deriving the critical density to achieve region coverage under random sensor deployment is a fundamentally important problem. Generally, sensor nodes can be deployed in three different ways: regular deployment, planned deployment, or random deployment (Tilak et al. 2002). In regular deployment, sensors are placed in a regular geometric topology. An example of regular deployment is the grid-based approach, where nodes are located on the intersection points of a grid. Planned deployment can be exemplified by the security sensor systems used in museums: the most valuable exhibits are equipped with more sensors to maximize the coverage of the monitoring scheme. An important problem for planned deployment is to minimize the number of sensors required to cover the sensing area. In this regard, the widely studied Art Gallery problem investigates the number of observers necessary to cover an art gallery such that every point is monitored by at least one observer. It has been shown that this problem can be solved optimally in a 2D plane but is NP-hard when extended to 3D space. In many situations, deterministic deployment is neither feasible nor practical. The deployment policy is often to cover the sensor field with sensors randomly distributed in the environment. The random distribution scheme can be uniform, Gaussian, Poisson, or any other distribution, depending on the applications.
In those cases, the redundancy and density of sensor deployment are the problems to focus on. For sensor networks formed by random deployment, for example, sensors dropped by an airplane, it is difficult, if not impossible, to guarantee 100% coverage of the monitored area even if the node density is very high. Our focus in investigating the coverage problem in directional sensor networks is therefore set to be a probability guarantee.

Assume that the area of the monitored region is S, and that no two sensors are located at exactly the same position with the same sensing region. Notice that a directional sensor with offset angle α covers a sensing area of $\alpha r^2$. Assume that the sensors are randomly deployed in the monitored region and the locations of sensors obey a uniform distribution. After N directional sensors are deployed, the probability that a given point in the targeted region is covered is

$$p = 1 - \left(1 - \frac{\alpha r^2}{S}\right)^N. \qquad (2.1)$$

Notice that for omnidirectional sensors with α = π, the coverage probability for deploying N sensors is simply

$$p = 1 - \left(1 - \frac{\pi r^2}{S}\right)^N. \qquad (2.2)$$

Naturally, if the coverage probability of the targeted region is required to be at least p, the number of deployed directional sensors should be

$$N \ge \frac{\ln(1-p)}{\ln(S - \alpha r^2) - \ln S}. \qquad (2.3)$$

Again, for omnidirectional sensors with α = π, the number of sensors required for a given coverage probability p should be

$$N \ge \frac{\ln(1-p)}{\ln(S - \pi r^2) - \ln S}. \qquad (2.4)$$

If the number of deployed sensors is fixed, we can instead adjust the sensors' sensing range to satisfy the coverage requirement. Assume that the number of sensors is N; the relationship between p and r follows directly from Equation (2.1). Conversely, if the coverage must satisfy a given value p, we can adjust the sensing radius to achieve the goal; solving Equation (2.1) for r gives

$$r = \sqrt{\frac{S}{\alpha}\left(1 - (1-p)^{1/N}\right)}. \qquad (2.5)$$
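As a quick numerical illustration of Equations (2.3) and (2.5) (our own sketch; the parameter values below are arbitrary examples, not taken from the book):

```python
import math

def required_sensors(p, S, r, alpha):
    """Minimum N from Eq. (2.3) for target coverage probability p."""
    return math.ceil(math.log(1 - p) /
                     (math.log(S - alpha * r**2) - math.log(S)))

def required_radius(p, S, N, alpha):
    """Sensing radius r from Eq. (2.5) when N sensors are deployed."""
    return math.sqrt(S / alpha * (1 - (1 - p) ** (1 / N)))

S = 500 * 500                 # monitored area in m^2
alpha = math.radians(45)      # offset (half) angle of the field of view
print(required_sensors(0.9, S, r=60, alpha=alpha))   # N needed for 90% coverage
print(required_radius(0.9, S, N=200, alpha=alpha))   # r needed with 200 sensors
```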

Fig. 2.2 The effect of the sensor number

Fig. 2.3 The effect of the sensing radius

Next, we conduct a series of simulations to verify our theoretical analysis. We first consider the effect of the number of sensors on the coverage probability. Figure 2.2 shows that the larger the sensor number N, the higher the coverage probability p; in other words, the coverage probability increases with the number of nodes. Figure 2.3 shows the relationship between the coverage probability p and the sensing radius r: the larger the sensing radius, the higher the coverage probability p. We also evaluated the relationship between the coverage probability and the offset angle. For 1000 sensors, Fig. 2.4 shows the coverage probability achieved by different offset angles. Moreover, for the same region, if a directional sensor network is to achieve the same coverage probability as N omnidirectional sensors, the required number of deployed directional sensors M is evaluated. Figure 2.5 shows the relationship between the offset angle α and the factor f = M/N.

Fig. 2.4 The effect of the offset angle

Fig. 2.5 The number factor for different offset angles
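These analytical curves can also be checked empirically. The following Monte Carlo sketch (our own addition; it reuses the hypothetical is_covered helper from above) estimates the point-coverage probability of a random directional deployment and compares it against Equation (2.1). Because Equation (2.1) ignores boundary effects, the empirical estimate typically falls slightly below the prediction.

```python
import math, random

def estimate_coverage(N, S_side, r, alpha, samples=10000):
    """Monte Carlo estimate of the coverage probability of N randomly
    deployed directional sensors in an S_side x S_side region."""
    sensors = []
    for _ in range(N):
        loc = (random.uniform(0, S_side), random.uniform(0, S_side))
        phi = random.uniform(0, 2 * math.pi)        # random sensing direction
        sensors.append((loc, (math.cos(phi), math.sin(phi))))
    hit = 0
    for _ in range(samples):
        p = (random.uniform(0, S_side), random.uniform(0, S_side))
        if any(is_covered(L, p, V, r, alpha) for (L, V) in sensors):
            hit += 1
    return hit / samples

alpha = math.radians(45)
print(estimate_coverage(N=1000, S_side=500, r=20, alpha=alpha))
print(1 - (1 - alpha * 20**2 / 500**2) ** 1000)     # Eq. (2.1) prediction
```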

2.2.3 Coverage Enhancing with Rotatable Directional Sensing Model

2.2.3.1 Rotatable Directional Sensing Model

In the directional sensing model, the sensing area is a sector with a fixed angle of view. Next, we further present a rotatable directional sensing model in which the sensing directions of sensors are rotatable. As shown in Fig. 2.6, by a simple geometric abstraction, the rotatable directional sensing model can be viewed as a rotatable sector in a 2-dimensional plane.

Definition 2.1 The rotatable directional sensing model. At time t, the sensing area of one sensor is a sector denoted by the 4-tuple $(P, R, \vec{V}(t), \alpha)$, where P is the location of the sensor in the 2-dimensional plane; R is the maximum sensing radius; $\vec{V}(t) \triangleq (V_x(t), V_y(t))$ is the unit vector that cuts the sensing sector into halves, termed the sensing direction; α is the offset angle of the field of view on both sides of $\vec{V}(t)$, and 2α denotes the maximum sensing angle achieved by the directional sensor.

Fig. 2.6 Rotatable directional sensing model

From Fig. 2.6, we see that the rotatable directional sensing model is a Boolean model. At time t, a point $P_1$ is said to be covered by the directional sensor at P iff (1) $d(P, P_1) \le R$, where $d(P, P_1)$ is the Euclidean distance between the points P and $P_1$; and (2) the angle between $\overrightarrow{PP_1}$ and $\vec{V}(t)$ is within $[-\alpha, \alpha]$. An area S is covered by a directional sensor iff $\forall P_1 \in S$, $P_1$ is covered by the sensor.

2.2.3.2 The Problem of Area Coverage Enhancing

We formulate the area coverage-enhancing problem for directional sensor networks as follows.

Definition 2.2 Area Coverage Enhancing Problem. Let n directional sensor nodes, denoted by $N_1, N_2, \dots, N_n$, be randomly deployed in S, and let $\vec{V}_1(t_0), \vec{V}_2(t_0), \dots, \vec{V}_n(t_0)$ be the initial orientation vectors of $N_1, N_2, \dots, N_n$, respectively. Let $A_c$ be the initial covered area. The area coverage enhancing problem is to find a value of $(\vec{V}_1(t), \vec{V}_2(t), \dots, \vec{V}_n(t))$ that maximizes $A_c' - A_c$, where $A_c'$ is the covered area corresponding to the adjusted sensor orientations.

Obviously, the area coverage enhancing problem is a complex optimization problem. Even if we limit the value range of each sensor orientation from $[0, 2\pi]$ to $\{0, \frac{2\pi}{m}, \frac{2 \cdot 2\pi}{m}, \dots, 2\pi\}$, where m is a natural number, this problem is still a complex combinatorial optimization problem. It is difficult to find the optimal $(\vec{V}_1(t), \vec{V}_2(t), \dots, \vec{V}_n(t))$ in reasonable time. Therefore, we design a heuristic method to obtain an approximately optimal $(\vec{V}_1(t), \vec{V}_2(t), \dots, \vec{V}_n(t))$ in the next subsection.

2.2.4 Potential Field Based Coverage-Enhancing Method

In the sensor network literature (Zou et al. 2004; Li et al. 2006; Howard et al. 2002; Poduri and Sukhatme 2004; Gui and Mohapatra 2004), the potential field is a popular and effective approach for optimizing node distribution in mobile sensor networks: each sensor is treated as a virtual particle that is repelled by the others, and the combination of repulsive forces moves the sensors to new locations that improve the area coverage. However, these area coverage-enhancing schemes cannot be applied to our ACE problem, for the following main reasons:

• In the traditional schemes, the sensing model is omnidirectional, i.e., the sensor is at the center of its sensing disk. This implies that the overlaps of covered areas are minimal when the sensors are evenly distributed in the deployment region. In our ACE problem, the sensing area of a directional sensor is a sector, and the sensor sits at the vertex rather than the center of that sector. Obviously, an even distribution of sensors does not imply an even distribution of the sensing areas.

• In the traditional schemes, sensors' locations are not fixed and the movements of sensors follow a translation model. In contrast, our ACE problem assumes that sensors' locations are fixed after deployment while the sensing orientations are rotatable, i.e., the movements of sensors follow a rotation model.

Therefore, to make the ACE problem tractable, we first propose the notion of a sensing centroid, and then map the ACE problem onto the uniform distribution problem of sensing centroids, which can be solved by defining appropriate rules for the potential field force and the control law. Our method can reduce coverage overlaps and coverage holes, and thus maximize the area coverage of the directional sensor network.

2.2.4.1 Sensing Centroid

In physics, the centroid is the geometric center of an object's shape or its physical barycenter. Because the sensing model proposed in Section 2.2.2 is a Boolean one, the centroid of a sensing sector is the geometric center of this sector. Then, for a given directional sensor, the corresponding sensing centroid can be defined as follows:

Definition 2.3 Sensing centroid of the directional sensor. Given a directional sensor $s_i$, the corresponding sensing area is a sector characterized by $(L_i, \vec{V}_i, R, \alpha)$. The sensing centroid, denoted by $c_i$, of $s_i$ is defined as the geometric centroid of the sector $(L_i, \vec{V}_i, R, \alpha)$, i.e.,

$$c_i \triangleq \left(\frac{2R\sin\alpha\cos\theta_i}{3\alpha} + x_i,\ \frac{2R\sin\alpha\sin\theta_i}{3\alpha} + y_i\right), \qquad (2.6)$$

where $\theta_i$ is the angle of $\vec{V}_i$.

Fig. 2.7 Sensing centroid

As shown in Fig. 2.7, the circular motion of the centroid represents the rotation of the sensing orientation. In this way, if the sensing centroids are evenly distributed, then the overlaps of the sector sensing areas are at a low level.
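For concreteness, a small Python helper of our own (the names are ours, not the book's) evaluates Equation (2.6); the centroid lies at distance 2R sin α / (3α) from the sensor's vertex along the sensing direction.

```python
import math

def sensing_centroid(x, y, theta, R, alpha):
    """Geometric centroid of the sensing sector (Eq. 2.6).

    (x, y): sensor location; theta: angle of the sensing direction V_i;
    R: sensing radius; alpha: half of the field-of-view angle (radians).
    """
    d = 2 * R * math.sin(alpha) / (3 * alpha)   # distance from vertex to centroid
    return (x + d * math.cos(theta), y + d * math.sin(theta))

# A sensor at (0, 0) facing along the x-axis with R = 60 and alpha = 45 deg:
print(sensing_centroid(0, 0, 0.0, 60, math.radians(45)))  # about (36.0, 0.0)
```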

2.2.4.2 Potential Field Force

As shown in Fig. 2.8, let $v_i$ and $v_j$ be two arbitrary directional sensors, and let $d_{ij}$ denote the Euclidean distance between them. It is easy to see that the two sensing sectors cannot overlap when $d_{ij} > 2R$. On the other hand, if $d_{ij} \le 2R$, one sensing sector can overlap the other for some orientations of $v_i$ and $v_j$. We therefore define neighboring nodes as follows:

Fig. 2.8 The repulsive force between two sensing centroids


Definition 2.4 Neighboring nodes. Two arbitrary directional sensors $v_i$ and $v_j$ are called neighboring nodes if the Euclidean distance between them is less than or equal to twice the sensing radius.

Let $c_i$ and $c_j$ be the sensing centroids of $v_i$ and $v_j$, respectively. We assume that if $v_i$ and $v_j$ are neighboring sensors, there exists a repulsive force between $c_i$ and $c_j$. Let $\vec{F}_{ij}$ be the repulsive force exerted on $c_i$, given as a (magnitude, direction) pair by

$$\vec{F}_{ij} = \begin{cases} \left(\dfrac{\beta}{D_{ij}^{\gamma}},\ \overrightarrow{c_j c_i}\right), & \text{if } d_{ij} \le 2R; \\ \vec{0}, & \text{otherwise,} \end{cases} \qquad (2.7)$$

where $D_{ij}$ is the Euclidean distance between $c_i$ and $c_j$, β and γ are two parameters that determine the magnitude of $\vec{F}_{ij}$, and $\overrightarrow{c_j c_i}$ gives the orientation of $\vec{F}_{ij}$.

Definition 2.5 Neighboring node set. For a given sensor $v_i$, its neighboring node set is the set of all sensors whose distances to $v_i$ are less than or equal to 2R. Let $\Psi_i$ be the neighboring node set of $v_i$; then $\Psi_i \triangleq \{v_j \mid d_{ij} \le 2R, i \ne j\}$.

The total force, denoted by $\vec{F}_i$, on $c_i$ can be expressed as

$$\vec{F}_i = \begin{cases} \sum_{v_j \in \Psi_i} \vec{F}_{ij}, & \text{if } \Psi_i \ne \emptyset; \\ \vec{0}, & \text{otherwise.} \end{cases} \qquad (2.8)$$

As shown in Fig. 2.9, consider four directional sensors $v_1, v_2, v_3, v_4$ with centroids $c_1, c_2, c_3, c_4$, respectively. The sensor $v_1$ has two neighboring nodes, $v_3$ and $v_4$, which implies that two forces, $\vec{F}_{13}$ and $\vec{F}_{14}$, act on $c_1$. Because $d_{12} > 2R$, there is no repulsive force between $c_2$ and $c_1$. Hence, the total force on $c_1$ is $\vec{F}_1 = \vec{F}_{13} + \vec{F}_{14}$.
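The sketch below (our own illustration; the parameter defaults and helper names are chosen by us) computes the total virtual force on a sensing centroid according to Equations (2.7) and (2.8).

```python
import math

def total_force(i, sensors, centroids, R, beta=1.0, gamma=2.0):
    """Total repulsive force on centroid i (Eqs. 2.7-2.8).

    sensors   : list of sensor locations (x, y)
    centroids : list of sensing centroids (x, y), e.g. from sensing_centroid()
    """
    fx, fy = 0.0, 0.0
    xi, yi = sensors[i]
    for j, (xj, yj) in enumerate(sensors):
        if j == i:
            continue
        d_ij = math.hypot(xi - xj, yi - yj)
        if d_ij > 2 * R:                      # not neighboring nodes: no force
            continue
        cxi, cyi = centroids[i]
        cxj, cyj = centroids[j]
        D_ij = math.hypot(cxi - cxj, cyi - cyj)
        if D_ij == 0:                         # coincident centroids: skip
            continue
        mag = beta / (D_ij ** gamma)          # magnitude beta / D_ij^gamma
        # unit vector from c_j toward c_i gives the repulsive direction
        fx += mag * (cxi - cxj) / D_ij
        fy += mag * (cyi - cyj) / D_ij
    return (fx, fy)
```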

2.2.4.3 Control Laws

Our goal is to build a virtual physical system that can be mapped onto a real directional sensor network. In this virtual physical system, the virtual particles (the sensing centroids) move under the defined potential field. We therefore need to define control laws that relate the motions of the sensing centroids to the forces. On the other hand, the sensing centroids are not "free particles"; they have both kinematic and dynamic constraints.

Fig. 2.9 An example of potential field forces with four sensing centroids

(a) The kinematic constraints. The rotation of a sensor's sensing orientation behaves as the circular motion of its sensing centroid. Hence, the moving trajectory of each sensing centroid is not arbitrary but strictly follows the circular motion around its corresponding sensor. The moving direction of a centroid is decided by the projection of its total force $\vec{F}_T$ onto the circle tangent.

(b) The dynamic constraints. During the motion process, the virtual force on a given centroid changes frequently. It is impractical for sensors to exchange their locations and sensing orientations with neighboring nodes continuously; we assume that this information is exchanged once per interval Δt. To simplify the problem, we regulate each centroid to move along its corresponding circle by a fixed, small rotation angle Δθ. By this fine-tuning method, we can gradually approach the optimal centroid distribution, for the following reasons:

• The virtual force on a given centroid changes frequently, so it is hard to accurately express the relationship between the virtual force and the motion of the centroid with a simple function.

• At each interval Δt, the centroid moves by a fixed, small rotation angle, which conveniently relieves the computational burden on the individual sensor.

When $\vec{F}_T = \vec{0}$, the centroid will be static. Strictly speaking, the centroid will never come to a complete stop; rather, its potential field force will approach zero. As shown in Fig. 2.10, assume that $\vec{F}_T = \vec{0}$ at the point O. Because the centroid moves with a fixed rotation angle Δθ, it may land on points of the arcs AO or OB rather than exactly on O. In this case, the centroid will oscillate around the point O, where $\vec{F}_T = \vec{0}$. To reach a static equilibrium quickly, once $\|\vec{F}_T\| \le \varepsilon$ we regard the centroid as having reached a stable state, and it need not move any more.

Fig. 2.10 The vibration of the sensing centroid

After a series of intervals, the network approaches static equilibrium once all the centroids in the network have become static. In this way, we can obtain good area coverage through a uniform distribution of centroids in the directional sensor network, thus maximizing the detection rate of targets appearing in the given region.
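Putting the control law together, the following Python fragment (our own sketch; it builds on the hypothetical sensing_centroid and total_force helpers above) performs one fine-tuning step for every sensor: rotate each sensing direction by a fixed Δθ toward the tangential component of the centroid force, or leave it alone once the force magnitude drops below ε.

```python
import math

def ace_step(sensors, thetas, R, alpha, d_theta=math.radians(5), eps=1e-3):
    """One adjustment interval of the potential-field control law.

    thetas[i] is the current sensing-direction angle of sensor i; the
    centroid of sensor i moves on a circle around sensors[i], so rotating
    the direction by +/- d_theta moves the centroid along that circle.
    """
    centroids = [sensing_centroid(x, y, t, R, alpha)
                 for (x, y), t in zip(sensors, thetas)]
    new_thetas = []
    for i, theta in enumerate(thetas):
        fx, fy = total_force(i, sensors, centroids, R)
        if math.hypot(fx, fy) <= eps:          # stable: stop adjusting
            new_thetas.append(theta)
            continue
        # Tangent direction of the centroid circle at angle theta:
        tx, ty = -math.sin(theta), math.cos(theta)
        # Rotate toward the sign of the force's tangential projection.
        step = d_theta if fx * tx + fy * ty > 0 else -d_theta
        new_thetas.append((theta + step) % (2 * math.pi))
    return new_thetas
```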

2.2.5 Simulation Results

To empirically evaluate our ACE algorithm, we implemented a stand-alone VC++ simulation platform, Senetest 2.0. Without loss of generality, our examples and discussions focus on a square region.

2.2.5.1 Case Study

We use a case (S = 500 × 500 m², N = 105, R = 60 m, and α = 45°) to illustrate the effectiveness of our ACE algorithm. We record the simulation results of the ACE algorithm at different stages, as illustrated in Fig. 2.11. The initial area coverage rate is only 65.74% (see Fig. 2.11a), while about a 16% improvement is achieved after 30 adjustments, with the enhanced coverage rate approaching 81.45%. It is obvious that the monitored region is covered more uniformly by the directional sensors. Figure 2.12 shows the improvement in area coverage during the execution of the ACE algorithm. The area coverage increases with the number of intervals, approximately following an exponential relationship. When the number of intervals exceeds 30, the area coverage probability fluctuates around 81.20%; at this point we consider the area coverage performance near-optimal.

Fig. 2.11 The process of area coverage enhancement by using the ACE algorithm

Fig. 2.12 Area coverage improvement of the ACE algorithm

2.2.5.2 Performance Evaluation

Through a set of simulation results, we discuss the effect of three parameters on the performance of the ACE algorithm: the node scale (N), the sensing radius (R), and the sensing offset angle (α). For each parameter, we also analyze the performance of the ACE algorithm and compare it with that of the centralized algorithm proposed in our prior work (Tao et al. 2006).

We fix S = 500 × 500 m², R = 40 m, α = 45° and run the simulation on five different node scales, with 20 runs per scenario. From the curve in Fig. 2.13a, we can conclude that when R and α are fixed, the smaller the node scale, the lower the initial area coverage probability. As N increases, the area coverage enhancement Δp continuously increases at first. For example, the ACE algorithm provides about 6.89% area coverage improvement in a 100-node network, while about 14.40% improvement is achieved in a 200-node network. Beyond a point, however, the area coverage improvement decreases as N grows further. This is because the growth of N raises the initial area coverage probability (e.g., beyond 60%).

Fig. 2.13 The effect of three key parameters on the performance of the ACE algorithm. (a) The effect of node scale (R = 40 m, α = 45°, Δθ = 5°). (b) The effect of sensing radius (N = 100, α = 45°, Δθ = 5°). (c) The effect of sensing offset angle (N = 100, R = 40 m, Δθ = 5°)

The direct result is a rapid decrease in the probability of generating coverage holes among multiple neighboring sensors, which without doubt weakens the gain of the ACE algorithm. In addition, some nodes in the boundary area cause area coverage loss, which cannot be ignored.

Furthermore, the effects of the sensing radius (R) and the sensing offset angle (α) on the ACE algorithm are similar to that of the node scale. When the node scale is fixed, the smaller the sensing radius or the sensing offset angle, the lower the probability of generating overlapping sensing regions in the network. In this case, the area coverage improvement is relatively limited (merely 2%–4%) using either the ACE algorithm or the centralized algorithm. As the sensing radius or the sensing offset angle increases, the area coverage improvement achieved by the ACE algorithm continuously increases; for instance, in a 100-node network, the maximal 15.91% area coverage improvement is achieved when R = 70 m and α = 45°. Beyond that, further increases of R or α cause the performance gain of the ACE algorithm to decrease, as illustrated in Fig. 2.13b–c. Compared with the ACE algorithm, the centralized one is not sensitive to changes in the three key parameters. Under the same simulation conditions, the performance of the ACE algorithm is obviously superior to that of the centralized one.

2.3 Three Dimensional Directional Sensor Networks

2.3.1 Motivation

For some application scenarios of multimedia sensor networks, a 2D directional sensing model cannot accurately characterize the coverage problem. In this section, we first analyze the 3D directional sensing ability of sensors and present a rotatable 3D sensing model to specify the actual target-detecting scene (Ma et al. 2009). In particular, we address the issue of coverage enhancement by sensors with tunable 3D orientations. To maximize the area coverage during target detection through directional sensor networks, we develop a virtual force analysis method as a 3D directional sensor adjustment strategy to enhance the area coverage after an initial random deployment of 3D viewing orientations. Then, we apply the Simulated Annealing (SA) algorithm to search for the global optimum. Since our approach uses centralized coverage control, the computation is executed only on the powerful sink node. Thus, each 3D directional sensor only needs to report its initial position and orientation, and to receive its adjusted sensing orientation. As a result, our algorithm does not impose the high communication and computation overhead required by other existing schemes.


2.3.2 The 3D Directional Sensing Model

Different from the previous 2D directional sensing model, which is viewed as a sector in a two-dimensional plane, the 3D sensing model focuses on two distinct features of a PTZ (Pan Tilt Zoom) camera: (1) the sensor is located at a fixed 3D point, and its sensing direction is 3D-rotatable around its location; (2) the coverage area of the sensor is constrained by the field of view, and takes the form of the projected quadrilateral area in the monitored scene plane. Thus, we define a 3D directional sensing model as follows.

Definition 2.6 The 3D directional sensing model is denoted by the 5-tuple $(P, \vec{D}, A, \alpha, \beta)$, where P is the location (x, y, z) of a directional sensor in 3D space; $\vec{D}$ is the sensing orientation of the directional sensor at time t. Unless otherwise specified, $\vec{D} = (d_x(t), d_y(t), d_z(t))$ is of unit length, where $d_x(t)$, $d_y(t)$, and $d_z(t)$ are the components along the X-axis, Y-axis, and Z-axis, respectively; A is the maximal value of the tilt angle γ (0 ≤ γ ≤ A); α and β are the horizontal and vertical offset angles of the field of view around $\vec{D}$.

Figure 2.14 illustrates our 3D sensing model. This model is a Boolean model. At time t, a point $P_1$ in the monitored scene is said to be covered by a 3D directional sensor if and only if $P_1$ is located in the projected quadrilateral area of the sensor's viewing space. $R = z \tan(A + \beta)$ is called the radius of the sensor's acting area. The intersection point C between the scene plane (z = 0) and the viewing central line with direction $\vec{D}$ is called the centroid of the sensing area, which is determined as follows:

$$(x_c, y_c) = \left(-\frac{d_x}{d_z}z + x,\ -\frac{d_y}{d_z}z + y\right). \qquad (2.9)$$

Fig. 2.14 The 3D directional sensing model
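A small sketch of our own (names ours, not the book's) computes the acting-area radius and the centroid projection of Equation (2.9); for the radius to be finite, the example chooses angles with A + β < 90°.

```python
import math

def scene_centroid(x, y, z, d, A, beta):
    """Centroid C of the coverage area on the scene plane z = 0 (Eq. 2.9)
    and the acting-area radius R = z * tan(A + beta).

    (x, y, z): sensor location; d = (dx, dy, dz): unit sensing direction
    (dz must be negative so the camera looks down toward the plane);
    A: maximal tilt angle; beta: vertical offset angle (radians).
    """
    dx, dy, dz = d
    xc = -dx / dz * z + x
    yc = -dy / dz * z + y
    R = z * math.tan(A + beta)
    return (xc, yc), R

# A camera 10 m high looking almost straight down, tilted slightly along +x:
d = (0.196, 0.0, -0.981)              # roughly unit length
print(scene_centroid(0, 0, 10, d, math.radians(40), math.radians(30)))
# -> centroid near (2.0, 0.0), acting radius about 27.5 m
```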


2.3.3 Area Coverage-Enhancing Method

2.3.3.1 Problem Formulation

To make the coverage enhancement problem for 3D directional sensor networks tractable, we first make the following assumptions.

A1. All randomly deployed directional sensors are homogeneous. Specifically, all sensors are 3D-rotatable, and their sensing parameters (i.e., A, α, β) are the same. However, their coverage areas, i.e., the projected quadrilateral areas, are heterogeneous.
A2. The location of each sensor is fixed after the initial deployment, and its initial 3D sensing orientation is randomly set. Each sensor is able to control the change of its sensing orientation.
A3. Each sensor knows its exact location information. All sensors can communicate with each other, which is necessary for transmitting new orientations to the nodes.
A4. The monitored scene of the network is simplified as a 2D plane (i.e., z = 0).

Then, we can formulate the coverage-enhancing problem for 3D directional sensor networks as follows.

Problem 1 For a group of sensors $(N_1, N_2, \dots, N_M)$ with initial random deployments, how do we find a group of 3D sensing orientation settings $(\vec{D}_1, \vec{D}_2, \dots, \vec{D}_M)$ that maximizes the coverage rate of the monitored region?

2.3.3.2 Virtual Force Analysis Based Coverage-Enhancing

We first need to extend the area coverage-enhancing solution from 2D to 3D directional sensor networks, which requires a new way to convert the coverage problem. As noted above, the coverage area of a 3D directional sensor is a quadrilateral centered at the intersection point between the viewing central line and the scene plane. This point functions as the centroid of the sensor's coverage area; in fact, the coverage area changes with the movement of this centroid. Thus, a uniform distribution of centroid points approximately represents a uniform distribution of the quadrilateral coverage areas. According to potential field theory, each centroid can be treated as a virtual charged particle, and simple force laws are defined to govern the repulsive interactions among multiple neighboring centroid points. Let $C_i$ denote the coverage area centroid of the sensor $N_i$. We define the virtual force and detail its computation as follows.

Definition 2.7 A 3D directional sensor $N_j$ is considered a neighboring node of the node $N_i$ if and only if the 2D Euclidean distance between the XY coordinates of their locations in the monitored plane is less than $R_i + R_j$, where $R_i$ and $R_j$ are the acting-area radii of $N_i$ and $N_j$, respectively.

The neighboring node set of the sensor $N_i$ is denoted as

$$\Psi_i = \{N_j \mid \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} < R_i + R_j,\ i \ne j\}.$$

The force exerted on the centroid $C_i$ by another centroid $C_j$, denoted by $\vec{F}_{ij}$ and given as a (magnitude, direction) pair, is defined by

$$\vec{F}_{ij} = \begin{cases} \left(\dfrac{k_R}{d_{ij}},\ \vec{\rho}_{ij}\right), & \text{if } N_j \in \Psi_i; \\ \vec{0}, & \text{otherwise,} \end{cases} \qquad (2.10)$$

where $k_R$ is a measure of the repulsive force, $d_{ij}$ is the Euclidean distance between $C_i$ and $C_j$, and the magnitude of $\vec{F}_{ij}$ is inversely proportional to $d_{ij}$. The direction of $\vec{F}_{ij}$ is determined by $\vec{\rho}_{ij}$, the unit vector oriented from $C_j$ to $C_i$. Notice that a repulsive force exists between the coverage area centroids of $N_i$ and $N_j$ only when they are neighboring nodes; otherwise, there is no force. The total repulsive force on the coverage area centroid $C_i$ can be expressed as $\vec{F}_i = \sum_{N_j \in \Psi_i} \vec{F}_{ij}$. For example (see Fig. 2.15a), consider four sensors $N_1, N_2, N_3, N_4$ with coverage area centroids $C_1, C_2, C_3, C_4$, respectively. For $C_1$, its corresponding sensor $N_1$ has two neighboring nodes, $N_3$ and $N_4$; hence, the force on $C_1$ is $\vec{F}_1 = \vec{F}_{13} + \vec{F}_{14}$.

The virtual physical system must be mapped onto a real physical system composed of real 3D directional sensors by defining control laws (Howard et al. 2002). In fact, the coverage area centroids of real sensors are not "free particles", because they have both kinematic and dynamic constraints.

Fig. 2.15 (a) An example of four centroids with virtual forces. (b) The adjustment of centroid

The kinematic constraints can be largely ignored if sensors have holonomic drive mechanisms (Howard et al. 2002). Here, the rotation of a sensor's 3D sensing orientation behaves as the motion of its coverage area centroid. Although the corresponding sensor is freely rotatable around its fixed location, the moving trajectory of each centroid is limited to the circular area with radius $r = z \tan(A)$. The moving direction of a centroid is decided by the direction of its total force, or by the projection of the force onto the circle tangent once the force points outside the circular bound of the acting area. Dynamic constraints involve the relationship between force and motion. During the motion process, the virtual force on a given centroid changes frequently. To simplify the problem, we regulate each sensor to rotate along the force direction of its coverage area centroid by a fixed, small rotation angle θ at each adjustment, as shown in Fig. 2.15b. By this fine-tuning method, we can gradually approach a stable centroid distribution. The reasons are as follows: (1) the dynamic change of the virtual force makes it hard to accurately express the relationship between the virtual force and the motion of the centroid; (2) adjusting by a fixed angle is convenient and reduces the computation cost.

When $\|\vec{F}_i\| = 0$, the centroid $C_i$ will be static. Strictly speaking, the centroid will never come to a complete stop. To reach a static equilibrium quickly, if $\|\vec{F}_i\| \le \varepsilon$, where ε is a predetermined threshold, we say that the centroid $C_i$ has reached a stable state and does not need to move any more. The above scheme for adjusting the sensing direction of a directional sensor is shown in Algorithm 1 below.

Algorithm 1: NewDirection($N_i$: a sensor; θ: angle)

Input: $\vec{D}_i$ – initial sensing direction of sensor $N_i$; θ – rotation angle of the sensor
1: Initialization: $\vec{F}_i := \vec{0}$; // initialize the force
2: Computation:
3: Find the coverage area centroid $C_i$ for $N_i$;
4: Find the neighboring node set $\Psi_i$ and compute $\vec{F}_i$;
5: if $\|\vec{F}_i\| \ge \varepsilon$ then
6:   Decide the motion of $C_i$ by the force $\vec{F}_i$;
7:   Find the new direction $\vec{D}_i$ by rotating by the angle θ;
8: else
9:   Keep $\vec{D}_i$ unchanged;
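As a rough illustration of one adjustment (our own sketch with our own helper names; for simplicity it uses a fixed centroid displacement in the plane rather than the fixed rotation angle of the original algorithm), the centroid is nudged along the force direction, clamped to the acting circle of radius z·tan(A), and the new direction is recovered from the new centroid.

```python
import math

def new_direction(sensor, centroid, force, A, step=0.05, eps=1e-3):
    """One adjustment of a 3D sensor's direction (sketch of Algorithm 1).

    sensor   : (x, y, z) location; centroid : (xc, yc) current centroid;
    force    : (fx, fy) total virtual force on the centroid in the plane;
    A        : maximal tilt angle; step : centroid displacement per round.
    Returns the new unit sensing direction, or None if already stable.
    """
    x, y, z = sensor
    fx, fy = force
    norm = math.hypot(fx, fy)
    if norm <= eps:
        return None                           # stable: keep the old direction
    # Move the centroid a small step along the force direction.
    xc = centroid[0] + step * fx / norm
    yc = centroid[1] + step * fy / norm
    # Clamp the centroid inside the circle of radius r = z * tan(A)
    # around the sensor's footprint (the kinematic constraint).
    r_max = z * math.tan(A)
    ux, uy = xc - x, yc - y
    d = math.hypot(ux, uy)
    if d > r_max:
        xc, yc = x + ux * r_max / d, y + uy * r_max / d
    # Recover the unit direction pointing from the sensor to (xc, yc, 0).
    dx, dy, dz = xc - x, yc - y, -z
    n = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (dx / n, dy / n, dz / n)
```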

The virtual-force-analysis based coverage-enhancing iteration procedure (denoted the VFA-ACE algorithm) stops when the number of adjustments exceeds a threshold or no significant coverage rate improvement is achieved. After a series of adjustments, the network approaches a stable equilibrium in which all centroid points are static. In this way, we can gradually enhance the area coverage of 3D directional sensor networks, but we cannot ensure that the optimal coverage solution is reached.

Algorithm 2: SA-ACE /* SA-based Area Coverage Enhancing */

Parameters: t – temperature (angle) step size; $T_{bound}$ – lower bound on the temperature (angle)
1: Initialization:
2: T := θ; // the initial temperature is the initial adjusting angle
3: Config := InitConfig($\vec{D}_1, \vec{D}_2, \dots, \vec{D}_M$); // initial direction vector
4: E := −CoverageRate(Config); // the energy is the negated coverage rate
5: while true do
6:   // Find a new direction configuration using Algorithm 1:
7:   for i = 1 to M do
8:     Call NewDirection($N_i$, T); // finds $C_i$ and $\Psi_i$, then rotates
9:   $E_{new}$ := −CoverageRate(NewConfig); // energy of the new configuration
10:  // If we reach a direction configuration with lower energy, accept it.
11:  if $E_{new} < E$ then
12:    Accept the new configuration; E := $E_{new}$;
13:  // If it has higher energy, accept it with a given probability.
14:  else if random(0, 1) < exp((E − $E_{new}$)/T) then
15:    Accept the new configuration; E := $E_{new}$;
16:  else
17:    Reject the new configuration;
18:  // Lower the annealing temperature.
19:  T := T − t;
20:  // Finish at the lower bound of the annealing temperature.
21:  if T < $T_{bound}$ then
22:    break;

2.3.3.3 Coverage Optimization Approach

The area coverage optimization is a constrained optimization problem, and the global area coverage optimization is clearly NP-hard. Because we cannot find an analytic relationship between the optimization objective and the tunable parameters, we resort to a heuristic optimization technique: we select the Simulated Annealing (SA) algorithm to optimize the area coverage enhancement for 3D directional sensor networks.


SA is a global optimization method that searches for an optimal solution in the candidate solution space. Its basic idea is as follows: from the current state, select a random successor state. If it has a lower energy than the current state, accept the transition, i.e., use the successor state as the current state; otherwise, accept the transition randomly with a given probability. The energy function encodes the optimization objective and is defined here as the negation of the coverage rate of the sensor network, so minimizing the energy function amounts to maximizing the coverage rate. The annealing schedule has an important impact on optimization quality. It is controlled by the temperature step size, i.e., the angle step size, by which the temperature drops at each iteration of the SA algorithm. Intuitively, the slower the temperature cools down, the better the optimization quality, but the longer the algorithm runs; hence there is a trade-off between optimization quality and the time spent in the optimization procedure. The SA-based area coverage enhancing (SA-ACE) algorithm is shown in Algorithm 2.
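The annealing skeleton below is our own minimal Python sketch of this idea, assuming a coverage_rate(config) function and a perturb(config, T) step (both hypothetical stand-ins for Algorithm 1); it is not the book's implementation.

```python
import math, random

def sa_ace(config, coverage_rate, perturb, T0=5.0, t_step=0.1, T_bound=0.1):
    """Minimal simulated-annealing loop with energy = -coverage_rate."""
    T = T0
    E = -coverage_rate(config)
    while T >= T_bound:
        candidate = perturb(config, T)        # e.g., rotate every sensor by T
        E_new = -coverage_rate(candidate)
        # Accept better configurations, and worse ones with probability
        # exp((E - E_new) / T)  (the Metropolis criterion).
        if E_new < E or random.random() < math.exp((E - E_new) / T):
            config, E = candidate, E_new
        T -= t_step                           # cool the temperature
    return config
```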

2.3.4 Case Study and Performance Evaluations

We have implemented a 3D simulation platform, 3DSenetest 1.0. Our cases and discussions focus on the square region S = 300 × 300 m², with camera parameters α = 60° and β = 45°, while the height z is taken randomly from 5 to 15. We use a case (M = 150, A = 50°, θ = 5°, and t = 0.1) to illustrate the effectiveness of our algorithm. The simulation results of the SA-ACE algorithm at different stages are illustrated in Fig. 2.16. The initial area coverage rate is only 30.46% (see Fig. 2.16a), and the improved coverage rates are 66.03%, 80.69%, and 85.95% after 15, 30, and 50 adjustments (see Fig. 2.16b–d), respectively. Thus, the monitored region can be covered more uniformly by the 3D directional sensors. Figure 2.17 shows the improvement in area coverage during the execution of both the VFA-ACE algorithm and the SA-ACE algorithm. The area coverage increases as the number of adjustments increases, and the SA-ACE algorithm clearly achieves a faster convergence speed.

We evaluate the performance of the SA-ACE algorithm with respect to three key parameters: the maximal tilt angle (A), the angle step size (t), and the initial adjustment angle (θ).

(1) The Effect of A: When M, θ, and t are fixed, the smaller the maximal tilt angle of a sensor, the lower the probability of overlapping coverage areas in the network. In this case, the increase in the area coverage rate achieved by the SA-ACE algorithm is relatively limited (approximately a 17.17%–17.21% improvement in the final coverage rate when A is at most 38°). As the maximal tilt angle increases, the coverage rate improvement of the SA-ACE algorithm continuously increases. This is expected, since a higher A enlarges the maximum coverage area of each sensor. For instance, in a 150-node network, a 100% final coverage rate is achieved when A = 52°, as shown in Fig. 2.18.

Fig. 2.16 The process of area coverage enhancement by using the SA-ACE algorithm. (a) p0 = 30.46%. (b) p15 = 66.03%. (c) p30 = 80.69%. (d) p50 = 85.95%

Fig. 2.17 Coverage improvement (SA-ACE: θ = 5°, t = 0.1; VFA-ACE: θ = 1°)

Fig. 2.18 The effect of maximal tilt angle (M = 150, θ = 5°, t = 0.1)

Fig. 2.19 The effect of angle step size (M = 150, A = 50°, θ = 5°)

(2) The Effect of t and θ: When M, A, and θ are fixed, the larger the angle step size, the lower the final coverage rate (see Fig. 2.19). This is because a large angle step size decreases the number of adjustments and makes it difficult to converge to the annealed optimal coverage rate. However, too small an angle step size results in excessive adjustment times and costs; thus, there is a trade-off between the angle step size and the expected coverage improvement. The initial adjustment angle θ describes the initial rotation degree of the centroid, used in the first iteration of the SA algorithm. When M, A, and t are fixed, the curve in Fig. 2.20 shows that the greater the initial adjustment angle, the greater the improved area coverage rate.

Fig. 2.20 The effect of initial adjustment angle (M = 150, A = 50°, t = 0.1)

2.4 Directional K-Coverage for Target Recognition

2.4.1 Motivation

Video surveillance mainly involves monitoring humans using visual information obtained by cameras from the face of the object. For the majority of surveillance applications, capturing the frontal face view and discriminating the face orientation of the object (i.e., an intruder) are of significant importance. Many existing face orientation detection algorithms are based on analyzing a frame captured by a single camera in which both eyes and/or the mouth are observable (Hongo et al. 1997; Kapoor and Picard 2002). In other words, they suppose images to be taken from a generally frontal view. However, a single camera can hardly capture enough frontal view information. Moreover, in view of the limitations of eye and mouth detection techniques, a single camera can easily incur large errors. Hence, researchers have paid much attention to face orientation detection using cooperation among multiple cameras in camera sensor networks. Chang and Aghajan (2006) proposed a collaborative technique for face analysis in camera sensor networks with the dual objective of detecting the camera view closest to a frontal view of the subject and estimating the face orientation angles in all camera views based on an additional fusion of local angle estimates. Once the camera with the view closest to the frontal face view is identified, further exchange of the face orientation angles estimated by all cameras allows a collaborative refinement of the estimates according to their associated confidence levels.

From the description above, we can infer that in order to detect the precise face orientation by cooperation among multiple cameras, we should guarantee that each point in the monitored region is covered by more than one camera. That is, the coverage problem in camera sensor networks is a K-coverage one, where K is greater than 1. A wireless sensor network K-covers its deployment region if every point in its deployment region is within the coverage ranges of at least K sensors. Wan and Yi (2006) studied how the probability of a deployment region being K-covered by randomly deployed sensors changes with the sensing radius or the number of sensors. Liu et al. (2007) proposed a mathematical model that can compute the number of sensors needed to achieve an expected coverage rate, provided the ratio of the sensing radius to the monitored region is known. However, the existing works cannot be applied to camera sensor networks directly. Different from conventional sensor networks, the K-coverage issue in camera sensor networks is characterized by two distinct features: (1) unlike an isotropic sensor, a camera has a field of view and thus cannot sense the whole circular area; (2) for a given point in the monitored region, not all the cameras covering this point can effectively sense an object at it. In most surveillance applications, the dorsal view is generally useless for visual analysis. These differences call for novel approaches to coverage in camera sensor networks.

In this section, we define K-coverage in camera sensor networks as Directional K-Coverage (DKC), in view of the directionality of the sensing model and the requirement of effective sensing in camera sensor networks. We propose a mathematical model to describe the relation among the number of randomly deployed cameras, the range of effective sensing angles, and the DKC rate. With it, we can evaluate, for a given region, how many cameras need to be randomly deployed to perform face orientation detection.

2.4.2 Collaborative Face Orientation Detection

The approach proposed in the work of Chang and Aghajan (2006) is based on a networked camera setting, in which the cameras collaborate on detecting the best frontal view by exchanging soft information that each one extracts from its view of the person's face. In addition to solving the detection problem of determining the best frontal face view, the proposed approach also formulates an estimation problem in which the orientation angles of all the face images are found by further exchange of local estimates among multiple cameras.

A typical scenario of collaborative face orientation detection is illustrated in Fig. 2.21. We place three cameras at different points of view in front of a given face. Each camera finds candidates for the eyes and/or face with primitive image processing. This includes a simple segmentation scheme, a skin color detection method with geometric constraints, and a function of chrominance and weighted positioning on the face candidate to detect eye candidates. These techniques are designed to be of low computational complexity, making them suitable for in-node processing. Due to their rather simple design, the individual schemes running at each camera node may not produce a unique candidate for the face and eye features sought in the body mask. However, the fact that each camera employs multiple methods to detect the feature candidates, and that the camera nodes exchange their soft information with each other, allows the network to enhance its detection accuracy as well as its confidence level, and to produce accurate results describing the orientation of each facial view. The probability of the candidates for eyes and/or face being correct is measured by a goodness figure. Hence, we can infer that multiple cameras within a proper range of angles can capture a goodness figure and perform collaborative face orientation detection.

Fig. 2.21 A scenario of face orientation detection using three cameras

2.4.3 Problem Description In this section, we will define directional K-coverage issue. Based on the directional sensing model of camera proposed in our prior work (Ma and Liu 2007), we propose the concept of effective sensing.

2.4.3.1

Effective Sensing in Video Surveillance

In video surveillance applications, although a camera can cover a certain point, but the pictures captured by this camera cannot always meet surveillance requirements. During the process of face orientation detection using multiple cameras, only in a frontal or near-frontal face view, the camera can locate face and/or eyes. Hence, a camera which capturing frontal or near-frontal face views can perform effective sense. −−−→ As shown in Fig. 2.22, we define β = arg Lt LC (0 ≤ β ≤ 2π ), where LC is the location of camera C and Lt is the location of the target’s centroid. As − → stated earlier, the orientation of an object is represented by a random unit vector Vf  − → and the orientation angle of a face γ = arg Vf is distributed uniformly in the range [0, 2π ). A camera can capture the frontal part of a face having centroid at Lt

38

2 Directional Sensing Models and Coverage Control

Fig. 2.22 Geometry relation between camera and face orientation

whenever the orientation angle of the face γ ∈ S_C, where S_C is expressed as:

$S_C = \{\partial \mid \beta - \theta \le \partial \le \beta + \theta\},$

where S_C represents the set of all orientation angles of an object having centroid at L_t for which C can capture the frontal part of the object. The value of θ determines the range of effective angles and is influenced by the precision of the image recognition technology. Therefore, camera C capturing the frontal part of a face meets two conditions: first, the object location L_t is covered by the FOV (field-of-view) of C; second, the orientation angle of the face is within the range S_C.

Definition 2.8 (Effective sensing) A camera C is said to effectively sense a face F at L_t if and only if the following two conditions are met:

(1) $\|\overrightarrow{L_C L_t}\| \le r$ and $\overrightarrow{L_C L_t} \cdot \vec{V}_C \ge \|\overrightarrow{L_C L_t}\| \cos\alpha$;
(2) γ ∈ S_C, where S_C = {∂ | β − θ ≤ ∂ ≤ β + θ}.

Definition 2.9 (Effective K-coverage) A point L_t is said to be effectively K-covered if and only if a face at L_t with any orientation can be effectively sensed by at least K cameras.
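To make Definition 2.8 concrete, the following is a minimal Python sketch of the effective-sensing test. The tuple layout, function names, and the angle-wrapping helper are our own illustrative choices, not part of the original model.

```python
import math

def ang_diff(a, b):
    """Smallest absolute difference between two angles (radians)."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def effectively_senses(cam, target, gamma, r, alpha, theta):
    """Definition 2.8: does a camera at (cx, cy) with optical-axis angle
    cam_dir effectively sense a face at target whose orientation is gamma?"""
    (cx, cy, cam_dir), (tx, ty) = cam, target
    dx, dy = tx - cx, ty - cy
    # Condition (1): the target lies inside the camera's sensing sector.
    if math.hypot(dx, dy) > r or ang_diff(math.atan2(dy, dx), cam_dir) > alpha:
        return False
    # Condition (2): gamma lies in [beta - theta, beta + theta], where beta
    # is the angle of the vector from the target to the camera.
    beta = math.atan2(cy - ty, cx - tx)
    return ang_diff(gamma, beta) <= theta
```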

2.4.3.2 Directional K-Coverage (DKC) Problem

In traditional sensor networks, K-coverage is defined as each point in the monitored region being covered by at least K sensors. However, during collaborative face orientation detection, each point in the monitored region needs to be effectively covered by at least K cameras (assuming each face orientation detection requires at least K participating cameras).


The K-coverage issue in camera sensor networks is characterized by two features: (1) different from conventional sensing models, the directional sensing model has a finite angle of view and thus cannot sense the whole circular area; (2) for a given point in the monitored region, not all cameras covering this point can effectively sense an object at it; only the cameras within a certain angular range of the frontal view can effectively sense the object. We define the K-coverage in camera sensor networks as Directional K-Coverage (DKC). This issue can be described as follows:

Directional K-Coverage (DKC) Problem. Given N homogeneous cameras randomly deployed in the monitored region A, where the coverage area of each camera i (1 ≤ i ≤ N) is A_i, find the minimal N such that the ratio of the effectively K-covered area to the area ‖A‖ is not less than a required value.

2.4.4 Analysis of Directional K-Coverage

To make the problem tractable, we first make the following assumptions while analyzing the DKC problem:

A1. All cameras are homogeneous; specifically, all cameras have the same sensing radius r and offset angle α;
A2. All cameras' locations are fixed, and no two cameras are located at exactly the same coordinates in the 2-dimensional plane;
A3. Each camera knows its exact location and its sensing direction;
A4. There is no boundary effect.

For camera sensor networks formed by random deployment with a uniform density, it is difficult, if not impossible, to guarantee 100% directional K-coverage of the monitored area even if the node density is very high. We therefore investigate the coverage problem in camera sensor networks in terms of probabilistic guarantees. Note that, throughout the rest of the section, unless otherwise mentioned, coverage refers to effective coverage in camera sensor networks. For each point L ∈ A, the probability that a camera C_i (1 ≤ i ≤ N), whose location is a random variable with uniform density, covers point L is ‖A_i‖/‖A‖. According to assumption A1, the probability that each camera covers point L is αr²/‖A‖. Assume that there exists a face F at point L whose orientation γ is uniformly distributed in the range [0, 2π); the probability that γ ∈ S_C is θ/π. Hence, the probability of L being effectively covered by C_i is the product of the above two probabilities:

$p_C = \frac{\alpha r^2}{\|A\|} \cdot \frac{\theta}{\pi}. \qquad (2.11)$


Fig. 2.23 The relation among N, j and p_j where αr²/‖A‖ = 0.06, θ = 60°

Because the boundary effect is neglected, the value of αr²/‖A‖ should be small; otherwise, the error in P will be rather large. The probability of L being covered by exactly j cameras simultaneously is:

$p_j = \binom{N}{j} p_C^{\,j} (1 - p_C)^{N-j}. \qquad (2.12)$

Here, setting αr²/‖A‖ = 0.06 and θ = 60°, we get p_C = 0.02. According to Eq. (2.12), the relation among N, j, and p_j is depicted in Fig. 2.23: the x-axis denotes the number of cameras N; the y-axis denotes the value of j (j = 0 means no camera covers a given point); and the z-axis denotes the value of p_j. We can clearly see that when the number of cameras is within [0, 100], large values of j (> 4) drive p_j to zero. To examine the relation between j and p_j, we set three different values N = 20, 50, 100; the corresponding curves are shown in Fig. 2.24. When N = 20, the maximum of p_0 is 66.76%, and p_j decreases as j increases. When N = 50, p_0 and p_1 both approach 37% or so. When N = 100, p_0 is merely 13.26%, and the maxima of p_1 and p_2 are about 27%. We define j(N) as the value of j at which p_j is maximal when the number of cameras is N, that is, j(N) = arg max_j p_j. We can thus conclude that the greater N is, the greater j(N) becomes.
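The numbers quoted above can be reproduced in a few lines; this is a minimal Python sketch of Eq. (2.12) and of j(N), with p_C = 0.02 as in the text (the function names are our own).

```python
from math import comb

def p_j(N, j, p_c=0.02):
    """Probability that a point is covered by exactly j of N cameras, Eq. (2.12)."""
    return comb(N, j) * p_c**j * (1 - p_c)**(N - j)

def j_of_N(N, p_c=0.02):
    """j(N) = argmax_j p_j for a given camera count N."""
    return max(range(N + 1), key=lambda j: p_j(N, j, p_c))

for N in (20, 50, 100):
    # N=20 gives p_0 = 0.6676; N=100 gives p_0 = 0.1326 and p_1 = 0.2707
    print(N, j_of_N(N), [round(p_j(N, j), 4) for j in range(5)])
```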


According to Eq. (2.12), the probability that a point in the monitored region is covered by at least K cameras is:

$P = p_K + p_{K+1} + \dots + p_N = \sum_{m=K}^{N} \binom{N}{m} p_C^{\,m} (1 - p_C)^{N-m}. \qquad (2.13)$

Compared to N, the value of K is very small. Therefore, Eq. (2.13) can be transformed as follows:

$P = 1 - \sum_{m=0}^{K-1} \binom{N}{m} p_C^{\,m} (1 - p_C)^{N-m}. \qquad (2.14)$

From Fig. 2.24 we can see that when j > 4, the value of p_j is rather small. We therefore only discuss the cases K = 1, 2, 3; the case K = 1 is the single-coverage issue. For K = 1, 2, 3, respectively, we obtain the following three equations:

$P_1 = 1 - (1 - p_C)^N \qquad (2.15)$

$P_2 = 1 - (1 - p_C)^N - N p_C (1 - p_C)^{N-1} \qquad (2.16)$

$P_3 = 1 - (1 - p_C)^N - N p_C (1 - p_C)^{N-1} - 0.5\,N(N-1)\,p_C^2 (1 - p_C)^{N-2} \qquad (2.17)$

Fig. 2.24 The relation between j and p_j where N = 20, 50 and 100

Fig. 2.25 The relation between the DKC rate P and the number of cameras N, where αr²/‖A‖ = 0.06, θ = 60°, and K = 1, 2, 3. The marked points N1, N2, and N3 indicate the camera numbers at which each curve reaches a 60% DKC rate

The above equations represent the relation between the DKC rate and the number of cameras N; the corresponding curves are shown in Fig. 2.25. If the required DKC rate is at least 60%, we evaluate the camera numbers to be randomly deployed as N1 = 46 (K = 1), N2 = 101 (K = 2), and N3 = 155 (K = 3), respectively.
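The camera numbers N1, N2, and N3 can be recovered numerically from Eq. (2.14); below is a minimal sketch using a simple linear search (the function names are our own).

```python
from math import comb

def dkc_rate(N, K, p_c=0.02):
    """DKC rate P for N cameras and coverage degree K, Eq. (2.14)."""
    return 1 - sum(comb(N, m) * p_c**m * (1 - p_c)**(N - m) for m in range(K))

def min_cameras(K, required=0.6, p_c=0.02):
    """Smallest N whose DKC rate reaches the required value."""
    N = K
    while dkc_rate(N, K, p_c) < required:
        N += 1
    return N

print([min_cameras(K) for K in (1, 2, 3)])  # -> [46, 101, 155]
```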

2.4.5 Experimental Results

To empirically evaluate the mathematical model, we conduct a series of simulations on the developed Senetest 2.0 platform. The simulation parameters are summarized in Table 2.1. To compare experimental results with theoretical values, we run the simulations for three different K (K = 1, 2, 3). Setting the angle θ = 60°, we get the coverage probability p_C = 0.02 according to the default values of ‖A‖, r, and α in Table 2.1

Table 2.1 Parameter settings

Parameter          Default                  Variation
Area A             500 pixel × 500 pixel    500 pixel × 500 pixel
Coverage rate P    1                        0–1
Camera number N    200                      20–200
Camera radius r    120 pixel                120 pixel
Offset angle α     60°                      60°
Offset angle θ     60°                      60°
K                  3                        1–3


and Eq. (2.11). For each K, we obtain different N at intervals of 20 within [20, 200]. Simulations are run 50 times for each N value. The average K-coverage rates are represented by the corresponding curves in Fig. 2.26a–c. Here, we measure the error rate of the coverage rate, denoted δ, by comparing the experimental value Ps of the DKC probability with the theoretical value Pt. From Fig. 2.26, we can see that the experimental results are slightly less than the theoretical values. This is because of the boundary effect (i.e., cameras in the boundary area cause coverage loss). In Fig. 2.26a, when the camera number is about 100, the error rate of coverage reaches its maximum (δ = 10.38%); when the coverage rate is high (>85%), the boundary effect is weakened (δ = 4.61%) due to the increased camera number. When K = 2, 3, the error rate of coverage also increases with the camera number. When the camera number is 200, δ is 11.22% and 13.16% in Fig. 2.26b and c, respectively. As a result, we can conclude that increasing K slightly increases the maximal error rate of coverage. For K = 2, 3, once the expected coverage rate is greater than 70%, the error rate approaches 10% or so. Thus, when we evaluate the camera number that needs to be deployed, we should take this error into account.

The value of θ is influenced by the precision of the image recognition technology; that is, the greater the angle between frontal and lateral views that face detection algorithms can handle, the greater θ becomes. Here, we set θ ∈ [30°, 90°], camera number N = 150, K = 3, and the other parameters as in Table 2.1. According to Eqs. (2.11) and (2.17), we can obtain the relation between P and θ, expressed as the function P = f(θ). Meanwhile, we take different θ at intervals of 10° within [30°, 90°]. Simulations are run 50 times for each θ value. The average coverage rates are depicted in Fig. 2.27. From the function curve, we find that P increases with θ, while its growth rate slowly decreases. All simulation results closely match the theoretical analysis. The increase of the error rate (between simulation results and theoretical values) is mainly caused by the boundary effect. When θ = 80°, the error rate approaches its maximum, δ = 15.79%.

2.5 L-Coverage for Target Localization

2.5.1 Motivation

In this section, we investigate the coverage problem from the perspective of target localization for multimedia sensor networks. We formulate the problem of L-coverage in the framework of parameter estimation. Camera sensors cooperate to make an estimate of the target's location. In the deployment region of a given camera sensor network, if the location of a point can be reliably estimated, then this point can be claimed to be L-covered. Wang et al. (2005, 2007) also formulated the coverage problem in the framework of parameter estimation, and they proposed the notion of information coverage based on BLUE (Best Linear Unbiased Estimator).


Fig. 2.26 Comparison between experimental results and theoretical values of the K-coverage rate, where K = 1, 2, 3 and A = 500 pixel × 500 pixel, r = 120 pixel, α = 60°, θ = 60°. Panels (a), (b), and (c) plot the coverage probability (%) against the camera number (20–200) for K = 1, 2, and 3, respectively, each showing experimental results against theoretical results

Compared to information coverage, our proposed L-coverage in camera sensor networks has two main differences:

• Information coverage measures the quality of detecting the existence of a target/event, while L-coverage measures the quality of target localization.

Fig. 2.27 The relation between θ and the DKC rate, where N = 150, K = 3, A = 500 pixel × 500 pixel, r = 120 pixel, α = 60°. The plot shows the experimental coverage rates (%) against the fitted curve f(θ) for θ from 30° to 90°

• The sensing model of information coverage assumes that the parameters of a target/event decay linearly with distance. The sensing model of our L-coverage is based on the perspective projection model and the camera noise model, which are nonlinear.

Hence, we proposed a localization-oriented sensing model for camera sensors by taking the perspective projection model and the camera noise model into account (Liu et al. 2011). Based on this sensing model, we define the notion of L-coverage by using Bayesian estimation theory. Furthermore, we analyze the relationship between the L-coverage probability and the density of camera sensors under random deployment, and derive the density requirement of camera sensors for a given L-coverage probability. For the same deployment region of a camera sensor network, we compare the L-coverage probability to the classic K-coverage probability, which measures detection quality. Our derived results can be used in both the initial deployment phase and the dynamic reconfiguration phase after camera sensors have been deployed, as detailed below:

• In the initial deployment phase: Using our results on L-coverage, an appropriate number of camera sensors can be derived, in order to guarantee that most points in the deployment region are L-covered.
• In the dynamic reconfiguration phase: For a given deployment of camera sensor networks, our results on L-coverage can measure the quality of localization for the deployment region, and then suggest future deployment or reconfiguration schemes for improving the overall quality of target localization.

2.5.2 Localization-Oriented Sensing Model

As shown in Fig. 2.28a, given a number of randomly deployed camera sensors, our goal is to locate a target as accurately as possible in a ground plane. In order to achieve this goal, we first study how to use camera sensors to locate a target, and



Fig. 2.28 (a) Schematic of target localization in a camera sensor network. There are several camera sensors deployed in a surveillance region, and a target is in the center of this region. (b) The image captured by the camera sensor which is indicated in the dotted ellipse. The distance, X, from the vertical centerline of the target blob to the centerline of image is the observation measurement by this camera sensor for target localization

then build a localization-oriented sensing model for camera sensors. For simplicity, we make the following assumptions:

A1. All camera sensors follow the same sensing model. We assume that the camera sensors are modeled by the perspective projection, and all camera sensors have the same shape of FOV (field-of-view) region. Additionally, all noises are Gaussian with zero mean.

A2. The camera sensors can observe a moving target synchronously (Wang et al. 2003). If the target moves with a limited speed, the synchronization can be readily implemented by using the methods proposed in the work of Elson et al. (2002).

A3. The message functions and transmissions of messages introduce no information loss (Wang et al. 2007). In other words, the quantization/modulation/encoding for measurements and the transmission channels are lossless.


Fig. 2.29 Perspective projection model

Table 2.2 Parameters of perspective projection

Parameters      Descriptions
T(x_t, y_t)     Location of target in ground plane
L_i(x_i, y_i)   Location of camera sensor c_i in ground plane
X(L_t)          Horizontal shift of the target's image from the center of the image plane
θ_i             Rotatable angle of camera sensor c_i around the x-axis, i.e., the orientation of c_i
F               Focal length of camera sensor

When a camera sensor captures an image frame, it can employ background subtraction¹ (Piccardi 2004; Kim et al. 2005) to identify the moving target. As shown in Fig. 2.28b, the area of an image frame where there is a significant difference between the observed and estimated images determines the location of a moving object in the image plane. The area containing the change is further processed to find the horizontal shift, denoted by X, of the target's image from the center of the image plane. In our localization scheme, X is the observation measurement of the camera sensor, and only X is communicated to the central processor (sink node). Let T(x_t, y_t) be the location of a target. For a given camera sensor c_i, we can get the theoretical horizontal shift, denoted by X̄_i, of the target image by using the perspective projection model. As shown in Fig. 2.29, the relationship between X̄_i and T = T(x_t, y_t) is

$\bar{X}_i = F \cdot \tan\left(\theta_i - \arctan\frac{y_t - y_i}{x_t - x_i}\right). \qquad (2.18)$

The descriptions of parameters used in Eq. (2.18) are summarized in Table 2.2.
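As an illustration of Eq. (2.18), a minimal Python sketch follows. Using atan2 rather than a plain arctangent is our own choice to keep quadrant handling correct; the default focal length (in mm) is only illustrative.

```python
import math

def horizontal_shift(cam, target, F=9.45):
    """Theoretical horizontal shift on the image plane, Eq. (2.18).
    cam = (x_i, y_i, theta_i); target = (x_t, y_t); F is the focal length."""
    x_i, y_i, theta_i = cam
    x_t, y_t = target
    return F * math.tan(theta_i - math.atan2(y_t - y_i, x_t - x_i))
```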

¹ Background subtraction is a commonly used technique for segmenting out objects of interest in a scene for applications such as video surveillance.


When the distance between the target location T and the camera sensor location L_i becomes too large, background subtraction cannot segment out the objects of interest. This implies that the camera sensor c_i cannot detect the target at T. Let r be the maximum detecting distance. Because r ≫ F, we employ a sector model to describe the sensing region of a camera sensor. Here, we use D_i to denote the sensing region of c_i; if a point belongs to D_i, then the point can be detected by c_i. As illustrated in Sect. 2.1, the sector model can be denoted by a 4-tuple $(L_i, r, \vec{V}_i, \alpha)$, where $\vec{V}_i$ is the unit vector which evenly splits the sensing sector into two halves, determining the sensing direction,² and α is the offset angle of the field of view on both sides of $\vec{V}_i$. The point T is said to be covered if and only if (1) |L_i − T| ≤ r and (2) the angle between $\overrightarrow{L_i T}$ and $\vec{V}_i$ is within [−α, α], where |L_i − T| is the Euclidean distance between L_i and T. From Eq. (2.18), we can obtain

$\frac{X_i}{F} = \tan\left(\theta_i - \arctan\frac{y_t - y_i}{x_t - x_i}\right). \qquad (2.19)$

If there is only one measurement, i.e., only one camera sensor can detect the target, then the values of x_t and y_t cannot be uniquely determined, because there are two unknowns in the single equation (2.19). Thus, we need at least two measurements to uniquely determine the location of the target. Similar to the K-coverage in the literature, the target is K-covered in camera sensor networks if it falls in at least K sector-based sensing regions. Theoretically, from Eq. (2.19) it appears that the L-coverage problem is equal to the K-coverage problem with K = 2. However, this statement is not true, because the projection model used in Eq. (2.18) is just an ideal model: the measurement X_i of camera sensor c_i can be corrupted by additive noises in practice. This implies that two or even more measurements cannot guarantee that the target is located accurately when the noises are large (thus, K-coverage with K = 2 is not L-coverage). These noises mainly come from two aspects: the sensing model of camera sensors and the processing of background subtraction. Referring to Ercan et al. (2006), we assume that the measurement error variance, denoted by σ_i², for camera sensor c_i is of the following form:

$\sigma_i^2 = \zeta d_i^2 + \sigma_p^2 + \sigma_s^2. \qquad (2.20)$

In Eq. (2.20), di is the distance from the camera sensor ci to the target. Making camera noise variance dependent on distance can efficiently model the weak perspective projection while allowing the usage of projective model in Eq. (2.18). Our noise model takes the errors in the calibration of camera sensors into account. Errors in the location of camera sensor ci are contained in σp2 , and errors in the 2θ i

− → is the directional angle of Vi .


sensor’s orientation are measured in ζ . Moreover, the accuracy of background subtraction method and posture/motion of the target also cause errors, and these errors are reflected in σs2 . Therefore, we adopt the Gaussian error model to represent the relationship between the measurement Xi of camera sensor ci , and the location T of the target. Then, the conditional probability density function of the random measurement variable Xi given T is determined as follows: 

(Xi − Xi )2 f (Xi | T ) = √ exp − 2σi2 2π σi 1

 .

(2.21)
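Equations (2.20) and (2.21) translate directly into code. The sketch below is illustrative; the default noise parameters are those used later in the illustrations (Table 2.3), and the function names are our own.

```python
import math

def sigma2(d, zeta=5e-8, sigma_p=0.1, sigma_s=0.1):
    """Measurement error variance at camera-target distance d, Eq. (2.20)."""
    return zeta * d**2 + sigma_p**2 + sigma_s**2

def likelihood(x_meas, x_bar, var):
    """Gaussian likelihood f(X_i | T) of Eq. (2.21)."""
    return math.exp(-(x_meas - x_bar)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```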

2.5.3 Bayesian Estimation Based L-Coverage

2.5.3.1 L-Coverage Concept

Consider a deployment field S and a set of N geographically distributed camera sensors. Let T ∈ S be the location of a target. If T can be detected by k (0 ≤ k ≤ N) camera sensors simultaneously, then we call these k camera sensors {c_1, c_2, …, c_k} a detecting set, denoted by C_k, of the camera sensors for T. This implies that k measurements are available. Assume that the prior probability distribution of T obeys the uniform distribution on S. Then, for an arbitrary point t(x, y) in the two-dimensional space, the probability density function of T is

$f(t) = \begin{cases} \dfrac{1}{\|S\|}, & t \in S; \\ 0, & t \notin S, \end{cases} \qquad (2.22)$

where ‖S‖ denotes the area of S. According to Eq. (2.21), we have the measurement expression

$X_i = \bar{X}_i + e_i, \quad \forall i \in \{1, 2, \dots, k\},$

where e_i is the additive noise of X_i, and e_i follows the normal distribution N(0, σ_i). Let X ≜ (X_1, X_2, …, X_k) be an arbitrary point in the k-dimensional space of (X_1, X_2, …, X_k). Because the measurements (X_1, X_2, …, X_k) are independent, from Eq. (2.21) we obtain

$f(X \mid t) = \prod_{i=1}^{k} f(X_i \mid t) = \begin{cases} \prod_{i=1}^{k} \dfrac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\dfrac{(X_i - \bar{X}_i)^2}{2\sigma_i^2}\right), & \text{if the } k \text{ cameras detect } t; \\ 0, & \text{otherwise.} \end{cases} \qquad (2.23)$


According to Eqs. (2.22), (2.23), and the Bayes formula, we get:

$f(t \mid X) = \frac{f(X \mid t) f(t)}{\iint_S f(X \mid t) f(t)\,dx\,dy} = \frac{f(X \mid t)}{\iint_S f(X \mid t)\,dx\,dy}. \qquad (2.24)$

Let $\hat{T}_k \triangleq (\hat{x}, \hat{y})$ and $\Delta\hat{T}_k \triangleq |\hat{T}_k - T|$ denote the estimate and the estimation error for a given measurement (X_1, X_2, …, X_k), respectively, where $|\hat{T}_k - T|$ is the Euclidean distance between $\hat{T}_k$ and T, i.e., $\Delta\hat{T}_k = |\hat{T}_k - T| = \sqrt{(\hat{x} - x_t)^2 + (\hat{y} - y_t)^2}$. The mean square error (MSE) is a common measure of estimator quality. A well-known Bayesian estimator can be applied to estimate $\hat{T}_k$ and to achieve the minimum MSE. Then, we have the following lemma.

Lemma 2.1 The minimum-MSE estimator specified by Eq. (2.24) is determined by

$\hat{T}_k = (\hat{x}, \hat{y}) = \left( \frac{\iint_S x f(X \mid t)\,dx\,dy}{\iint_S f(X \mid t)\,dx\,dy},\; \frac{\iint_S y f(X \mid t)\,dx\,dy}{\iint_S f(X \mid t)\,dx\,dy} \right). \qquad (2.25)$

Proof The MSE of $\hat{T}_k$ is

$\mathrm{MSE}(\hat{T}_k) = E\big[(x_t - \hat{x})^2 + (y_t - \hat{y})^2\big] = \iint_S \big[(\hat{x} - x_t)^2 + (\hat{y} - y_t)^2\big] f(t \mid X)\,dx\,dy.$

To minimize MSE($\hat{T}_k$), we take its partial derivatives over $\hat{x}$ and $\hat{y}$, respectively:

$\frac{\partial}{\partial \hat{x}} \iint_S \big[(\hat{x} - x_t)^2 + (\hat{y} - y_t)^2\big] f(t \mid X)\,dx\,dy = -2 \iint_S (x_t - \hat{x}) f(t \mid X)\,dx\,dy,$

$\frac{\partial}{\partial \hat{y}} \iint_S \big[(\hat{x} - x_t)^2 + (\hat{y} - y_t)^2\big] f(t \mid X)\,dx\,dy = -2 \iint_S (y_t - \hat{y}) f(t \mid X)\,dx\,dy.$

Setting the above partial derivatives to zero and solving them, respectively, we get

$\hat{x} = \iint_S x f(t \mid X)\,dx\,dy, \qquad \hat{y} = \iint_S y f(t \mid X)\,dx\,dy. \qquad (2.26)$

Then, substituting Eq. (2.24) into Eq. (2.26), we obtain Eq. (2.25). □
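Since Eq. (2.25) has no closed form under the nonlinear projection model, a grid approximation is a natural numerical route. The sketch below is our own illustration, not the authors' implementation: the square search region, grid resolution, and the omission of the sector-detection indicator are simplifying assumptions.

```python
import numpy as np

def mmse_estimate(cams, measurements, region=5000.0, n=200, F=9.45,
                  zeta=5e-8, sp=0.1, ss=0.1):
    """Grid approximation of the minimum-MSE estimate of Eq. (2.25).
    cams: list of (x_i, y_i, theta_i); measurements: list of X_i (mm)."""
    xs = np.linspace(0.0, region, n)
    gx, gy = np.meshgrid(xs, xs)            # candidate target locations t(x, y)
    log_like = np.zeros_like(gx)
    for (xi, yi, th), X in zip(cams, measurements):
        xbar = F * np.tan(th - np.arctan2(gy - yi, gx - xi))        # Eq. (2.18)
        var = zeta * ((gx - xi)**2 + (gy - yi)**2) + sp**2 + ss**2  # Eq. (2.20)
        log_like += -(X - xbar)**2 / (2 * var) - 0.5 * np.log(var)  # Eq. (2.23)
    w = np.exp(log_like - log_like.max())   # posterior is proportional to f(X | t)
    w /= w.sum()
    return float((gx * w).sum()), float((gy * w).sum())  # posterior mean, Eq. (2.25)
```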


We use the mean of $\Delta\hat{T}_k$, denoted by δ_k, to measure how well the point T is located by C_k, which is determined by

$\delta_k \triangleq E[\Delta\hat{T}_k] = \int_{\Omega} \Delta\hat{T}_k \, f(X \mid t)\,dX, \qquad (2.27)$

where $\Delta\hat{T}_k = |\hat{T}_k - T|$, Ω is the k-dimensional real-number space of (X_1, X_2, …, X_k), and the conditional pdf f(X|t) is a function of r, α, and the other parameters. The smaller δ_k is, the more accurate the estimate. We assume that the accuracy of localization satisfies the requirement if δ_k is not larger than a predefined threshold ε, i.e., δ_k ≤ ε. Then, we can define the notion of L-coverage as follows:

Definition 2.10 (Localization-oriented coverage, also called L-coverage) A point is said to be L-covered if there exist k camera sensors to estimate the location of this point such that the mean estimation error satisfies δ_k ≤ ε, where 0 < k ≤ N.

Remarks on Definition 2.10: The value of ε is determined by the localization application. On the other hand, ε is generally relevant to the sensing radius r. For example, if r = 100 and ε = 100, the localization accuracy is relatively low; if r = 1000 and ε = 100, the localization accuracy is relatively high. We therefore also define the ratio variable a ≜ ε/r to measure the requirement of localization accuracy. Next, we use an example to further illustrate the proposed L-coverage model.

2.5.3.2 L-Coverage Illustrations

As shown in Fig. 2.30, we deploy 10 camera sensors in a rectangular region. The values of the related parameters are listed in Table 2.3. In Fig. 2.30, c_i(x_i, y_i, θ_i) denotes the location and orientation of camera sensor c_i. There are two cases in which a camera sensor cannot detect the target at T: (1) T is out of the camera sensor's AOV (angle of view), see camera sensor c_4 in Fig. 2.30; or (2) the distance between T and the camera sensor exceeds r, see c_5 in Fig. 2.30. Let u_i denote the horizontal pixel coordinate of the target for camera sensor c_i. We use three camera sensors, c_1, c_2, and c_3, to make measurements; the corresponding horizontal pixel coordinates of the target are u_1 = 140, u_2 = 1055, and u_3 = 990, respectively, as shown in Fig. 2.30. We first need to transform these pixel coordinates u_i of the horizontal shifts into the real-world coordinates X_i. Because the resolution of these camera sensors is 1280 × 960 and the size of the Charge Coupled Device (CCD) is 8.8 mm × 6.6 mm, the transformation formula is:

$X_i = \left( u_i - \frac{1280}{2} \right) \times \frac{8.8}{1280}. \qquad (2.28)$
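A quick check of Eq. (2.28) reproduces the measurements used in the cases below (the function name is ours):

```python
def pixel_to_ccd(u, width_px=1280, ccd_mm=8.8):
    """Convert a horizontal pixel coordinate to a CCD offset in mm, Eq. (2.28)."""
    return (u - width_px / 2) * ccd_mm / width_px

print([pixel_to_ccd(u) for u in (140, 1055, 990)])
# -> [-3.4375, 2.8531..., 2.40625]
```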


Fig. 2.30 A scene for illustration of L-coverage. There are 10 randomly deployed camera sensors and a target T in a surveillance region. We use three camera sensors, c_1, c_2, and c_3, to capture the images of the target. The c_i(x_i, y_i, θ_i), i = 1, 2, 3 denote the location and orientation of camera sensor c_i, and u_i, i = 1, 2, 3 denote the horizontal pixel coordinates of the target

Table 2.3 The values of related parameters. We use a Sony DSC-F717 as the camera sensor in the experiments

Parameters      Values
F               9.45 mm
CCD             8.8 × 6.6 mm
Angle of view   100°
r               4000 mm
T(x_t, y_t)     (1950 mm, 650 mm)
ζ               5 × 10⁻⁸
σ_p             0.1
σ_s             0.1
ε               1000 mm

Next, we determine whether T is L-covered or not by using three different detecting sets of camera sensors.

Case I: C_1 = {c_1}. We first use one measurement to estimate the location of the target. According to Eqs. (2.18) and (2.20), X̄_1(T) = −3.15 and σ_1 = 0.48, respectively. Substituting u_1 = 140 into Eq. (2.28), we get X_1 = −3.4375. According to Eq. (2.24), we get the probability distribution function f(t | −3.4375). When X_1 = −3.4375, using Eq. (2.25) the corresponding minimum-MSE estimate is T̂_1 = (4774.89, 1460.8). According to f(X_1 | (1950, 650)) and Eq. (2.27), we get δ_1 = 2953 mm. Because δ_1 exceeds the threshold ε = 1000 mm, T cannot be L-covered by the camera sensor c_1.

Case II: C_2 = {c_1, c_2}. The detecting set C_2 = {c_1, c_2} implies that we locate the target by combining the measurements of c_1 and c_2. From Eqs. (2.18) and (2.20), we get σ_1 = 0.48, σ_2 = 0.48, X̄_1(T) = −3.15, and X̄_2(T) = 3.15. Substituting u_1 = 140 and u_2 = 1055 into Eq. (2.28), the measurements of c_1 and c_2 are −3.4375 and 2.8531, respectively. The corresponding estimate is T̂_2 = (3584, 753). According to f(X_1, X_2 | (1950, 650)) and Eq. (2.27), we get δ_2 = 1607 mm. In this case, δ_2 also exceeds the threshold 1000 mm, and thus T cannot be L-covered by {c_1, c_2}.

Case III: C_2 = {c_1, c_3}. In this case, we locate the target by combining the measurements of c_1 and c_3. From Eqs. (2.18) and (2.20), σ_1 = 0.48, σ_3 = 0.75, X̄_1(T) = −3.15, and X̄_3(T) = 1.89. Substituting u_1 = 140 and u_3 = 990 into Eq. (2.28), X_1 = −3.4375 and X_3 = 2.4063. The corresponding estimate is T̂_2 = (2246.95, 820.089). According to f(X_1, X_3 | (1950, 650)) and Eq. (2.27), we get δ_2 = 261 mm < 1000 mm. This implies that T can be L-covered by {c_1, c_3}.

2.5.4 L-Coverage Probability in Randomly Deployed Camera Sensor Networks

We consider the random deployment where camera sensors are randomly scattered within a vast two-dimensional geographical region, and their locations are uniformly and independently distributed in the region. Under this deployment strategy, the locations of camera sensors can be modeled by a two-dimensional stationary Poisson point process with intensity λ. This indicates that the number, N(S′), of camera sensors in any sub-region S′ follows a Poisson distribution with parameter λ‖S′‖, where ‖S′‖ is the area of S′. Let k be a positive integer; the probability that N(S′) is equal to k is then given by

$\Pr\{N(S') = k\} = \frac{(\lambda \|S'\|)^k}{k!} e^{-\lambda \|S'\|}. \qquad (2.29)$
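A deployment drawn from this model can be sampled as follows; the region shape, intensity, and seed in this sketch are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def deploy_cameras(lam, side):
    """Sample a camera network from a 2-D Poisson point process of intensity
    lam over a side x side square; orientations are uniform on [0, 2*pi]."""
    n = rng.poisson(lam * side * side)           # N(S') ~ Poisson(lam * area)
    xy = rng.uniform(0.0, side, size=(n, 2))     # uniform, independent locations
    theta = rng.uniform(0.0, 2 * np.pi, size=n)  # random sensing orientations
    return np.column_stack([xy, theta])
```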

Moreover, we assume that the orientation of each camera sensor is a random variable with the uniform distribution on [0, 2π], i.e., θ ∼ U(0, 2π). Let L(T) be the indicator function indicating whether a point T is L-covered or not, i.e.,

$L(T) = \begin{cases} 1, & \text{if } T \text{ is L-covered}; \\ 0, & \text{if } T \text{ is not L-covered}, \end{cases}$


and let A_L be the L-covered area in the deployment region S. Then, we have

$A_L = \int_S L(T)\,dT.$

We can define the L-coverage probability as follows:

Definition 2.11 (L-coverage probability, denoted by P_L) In a deployment field S, the ratio between the mean L-covered area and the area of S is said to be the L-coverage probability of S, i.e.,

$P_L \triangleq \frac{E[A_L]}{\|S\|}, \qquad 0 \le P_L \le 1.$

Remarks on Definition 2.11: By using Fubini's theorem (Thomas and Finney 1996) and exchanging the order of integral and expectation (Wang et al. 2007), we can get the expected value of A_L as follows:

$E[A_L] = \int_S E[L(T)]\,dT = \int_S \Pr\{L(T) = 1\}\,dT = \|S\| \Pr\{L(T) = 1\}, \qquad (2.30)$

where Pr{L(T) = 1} is constant for all T ∈ S. According to Eq. (2.30) and Definition 2.11, the L-coverage probability is equal to the probability that T is L-covered, i.e.,

$P_L = \Pr\{L(T) = 1\}. \qquad (2.31)$

The L-coverage probability P_L = 1 implies that S is completely L-covered. However, because the deployment of camera sensors follows the Poisson point process, it is difficult to guarantee P_L = 1 for a finite density λ. In this section, we mainly focus on the relationship between the L-coverage probability and the density of camera sensors. From Definition 2.10, a point T ∈ S being L-covered by k camera sensors implies that (1) there exist k camera sensors which can detect T; and (2) the corresponding δ_k of these k camera sensors is not larger than the predefined threshold ε. Let N_T be the number of camera sensors which can detect T. According to Eq. (2.31), we have

$P_L = \sum_{k=1}^{\infty} \Pr\{N_T = k\} \Pr\{\delta_k \le \varepsilon\}, \qquad (2.32)$

where Pr{N_T = k} is derived by the following lemma.

Lemma 2.2 If camera sensors are modeled by a two-dimensional stationary Poisson point process with intensity λ, then the probability that there are k camera


sensors which can detect T is given by

$\Pr\{N_T = k\} = \frac{(\lambda \alpha r^2)^k}{k!} e^{-\lambda \alpha r^2}. \qquad (2.33)$

Proof From the sensing model of camera sensors, it is easy to see that if a camera sensor can detect T, then its location must be in the disk, denoted by R, centered at T with radius r. On the other hand, not all camera sensors in R can detect T, because of their orientations. Assume that there are n camera sensors in R; from Eq. (2.29), we have

$\Pr\{N_R = n\} = \frac{(\lambda \pi r^2)^n}{n!} e^{-\lambda \pi r^2},$

where N_R is the number of camera sensors in R. The probability that a camera sensor in R can detect T is α/π. Then, the probability that k (k ≤ n) of these n camera sensors can detect the target T is

$\Pr\{N_T = k \mid N_R = n\} = \binom{n}{k} \left(\frac{\alpha}{\pi}\right)^k \left(1 - \frac{\alpha}{\pi}\right)^{n-k}.$

Then,

$\Pr\{N_T = k\} = \sum_{n=k}^{\infty} \Pr\{N_R = n\} \Pr\{N_T = k \mid N_R = n\} = \sum_{n=k}^{\infty} \frac{(\lambda \pi r^2)^n}{n!} e^{-\lambda \pi r^2} \binom{n}{k} \left(\frac{\alpha}{\pi}\right)^k \left(1 - \frac{\alpha}{\pi}\right)^{n-k} = \frac{(\lambda \alpha r^2)^k}{k!} e^{-\lambda \alpha r^2},$

which is Eq. (2.33). □

From Lemma 2.2, we can readily get the expression of the K-coverage probability for camera sensor networks. In the literature, a point is K-covered if it is covered by at least K sensors. Then, in camera sensor networks, the K-coverage probability, denoted by P_K, is

$P_K \triangleq \Pr\{N_T \ge K\} = \sum_{i=K}^{\infty} \frac{(\lambda \alpha r^2)^i}{i!} e^{-\lambda \alpha r^2} = 1 - \sum_{i=0}^{K-1} \frac{(\lambda \alpha r^2)^i}{i!} e^{-\lambda \alpha r^2}. \qquad (2.34)$
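Equations (2.33) and (2.34) are straightforward to evaluate; a minimal sketch (the function names are ours):

```python
import math

def pr_nt(k, lam, alpha, r):
    """Pr{N_T = k}: exactly k cameras detect a given point, Eq. (2.33)."""
    mu = lam * alpha * r**2
    return mu**k / math.factorial(k) * math.exp(-mu)

def p_k(K, lam, alpha, r):
    """K-coverage probability P_K = Pr{N_T >= K}, Eq. (2.34)."""
    return 1 - sum(pr_nt(i, lam, alpha, r) for i in range(K))
```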


Fig. 2.31 Relationship between Pr{δ1 ≤ ε} and a with different r. We set ζ = 5 × 10−8 , σp = 0.1, and σs = 0.1

However, according to Eq. (2.27), it is difficult to derive a closed-form analytical expression for Pr{δ_k ≤ ε}. We therefore study Pr{δ_k ≤ ε} by using Monte Carlo simulations. Let T(0, 0) be the location of a target. We randomly deploy one camera sensor c in the disk centered at T with radius r. Let L_c be the location of c, and γ_c be the orientation of $\overrightarrow{L_c T}$. In order to detect T, the orientation θ_c of camera sensor c satisfies the uniform distribution on [γ_c − α, γ_c + α]. According to Eq. (2.27), we can get the corresponding δ_1. Assuming that ζ, σ_p, and σ_s are fixed, the above process is repeated 1000 times to obtain 1000 δ_1's. Define a ≜ ε/r and let it vary from 0 to 0.25. For each value of a, we count the total number, denoted by N_{L,1}, of δ_1's that are not larger than the corresponding ε. Then, Pr{δ_1 ≤ ε} is approximated by the ratio of N_{L,1} over 1000. Figure 2.31 plots the statistical results of the 1000 δ_1's. From Fig. 2.31, we can observe that about 80% of the δ_1's are larger than r/10. For most applications of camera sensor networks, δ_1 is much larger than ordinary requirements. On the other hand, Eq. (2.19) shows that it is impossible to derive a unique (x_t, y_t) by using just one measurement. These observations imply that a point covered by only one camera sensor generally cannot be L-covered. Therefore, we have the following property:

Property 2.1 A point which is detected by only one camera sensor is not L-covered.

Next, we randomly deploy two camera sensors, c_1 and c_2, in the disk centered at T with radius r. Their orientations, θ_1 and θ_2, satisfy the uniform distributions on [γ_1 − α, γ_1 + α] and [γ_2 − α, γ_2 + α], where γ_1 and γ_2 are the orientations of $\overrightarrow{L_1 T}$ and $\overrightarrow{L_2 T}$, respectively. Then, we can get the corresponding δ_2 according to Eq. (2.27). We repeat the above process 1000 times to obtain 1000 δ_2's. Let a vary from 0 to 0.25 in steps of 0.025.
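The sampling step behind these Monte Carlo experiments can be sketched as follows; estimating δ_k itself would reuse a numerical estimator such as the grid sketch after Lemma 2.1. The helper below is our own illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_detecting_camera(r, alpha):
    """Sample one camera that detects a target at the origin: location uniform
    in the disk of radius r, orientation uniform on [gamma_c - alpha, gamma_c + alpha]."""
    rho = r * np.sqrt(rng.uniform())   # sqrt gives a uniform density over the disk
    phi = rng.uniform(0, 2 * np.pi)
    x, y = rho * np.cos(phi), rho * np.sin(phi)
    gamma_c = np.arctan2(-y, -x)       # direction from the camera toward the target
    theta_c = rng.uniform(gamma_c - alpha, gamma_c + alpha)
    return x, y, theta_c
```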


Fig. 2.32 Relationship between Pr{δ2 ≤ ε} and a with different r. We set ζ = 5 × 10−8 , σp = 0.1, and σs = 0.1

For each value of a, we count the total number, denoted by N_{L,2}, of δ_2's that are not larger than the corresponding ε. Then, Pr{δ_2 ≤ ε} is approximated by the ratio of N_{L,2} over 1000. Figure 2.32 plots the statistical results of the 1000 δ_2's. As shown in Fig. 2.32, when r = 4000, about 80% of the δ_2's are smaller than r/10. This implies that if the requirement of localization accuracy is not very strict, then the probability that a point is L-covered by two camera sensors, i.e., Pr{δ_2 ≤ ε}, is high. Furthermore, we can also draw the following observations from Fig. 2.32:

• When a is at the lower end, Pr{δ_2 ≤ ε} increases quickly as a increases; when a is at the higher end, Pr{δ_2 ≤ ε} increases slowly as a increases.
• For a fixed a, Pr{δ_2 ≤ ε} decreases as r increases.

When k ≥ 3, it is complicated to calculate δ_k according to Eq. (2.27), because the dimension of Ω is large. Pr{δ_k ≤ ε} has the property that it increases as k increases, i.e., Pr{δ_k ≤ ε} < Pr{δ_{k+1} ≤ ε}, because the more camera sensors used for estimation, the lower the estimation error. Thus, we have Pr{δ_2 ≤ ε} < Pr{δ_k ≤ ε} ≤ 1 for k > 2. Then, according to Eqs. (2.32) and (2.34), we can obtain

$\Pr\{\delta_2 \le \varepsilon\} P_2 < P_L < \Pr\{\delta_2 \le \varepsilon\} \Pr\{N_T = 2\} + P_3, \qquad (2.35)$

where

$P_2 = 1 - e^{-\lambda\alpha r^2} - \lambda\alpha r^2 e^{-\lambda\alpha r^2}, \qquad P_3 = 1 - e^{-\lambda\alpha r^2} - \lambda\alpha r^2 e^{-\lambda\alpha r^2} - \frac{(\lambda\alpha r^2)^2}{2} e^{-\lambda\alpha r^2}.$
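The bounds in Eq. (2.35) are cheap to evaluate for any λ; a minimal sketch, where p_d2 stands for Pr{δ_2 ≤ ε} (the function name is ours):

```python
import math

def pl_bounds(lam, alpha, r, p_d2):
    """Lower and upper bounds on the L-coverage probability, Eq. (2.35)."""
    mu = lam * alpha * r**2
    p2 = 1 - math.exp(-mu) - mu * math.exp(-mu)   # 2-coverage probability
    pr_nt2 = mu**2 / 2 * math.exp(-mu)            # Pr{N_T = 2}
    p3 = p2 - pr_nt2                              # 3-coverage probability
    return p_d2 * p2, p_d2 * pr_nt2 + p3
```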

Figure 2.33 plots Pr{δ_2 ≤ ε}P_2 and Pr{δ_2 ≤ ε}Pr{N_T = 2} + P_3 against λ for two different values of Pr{δ_2 ≤ ε}, namely Pr{δ_2 ≤ ε} = 0.8 and 0.5. From Fig. 2.33, Pr{δ_2 ≤ ε}Pr{N_T = 2} + P_3 approaches 1 and Pr{δ_2 ≤ ε}P_2 approaches Pr{δ_2 ≤ ε} = 0.8 or 0.5 as λ goes to infinity. This implies that


Fig. 2.33 The curves of Pr{δ2 ≤ ε}P2 and Pr{δ2 ≤ ε}Pr{NT = 2} + P3 when Pr{δ2 ≤ ε} = 0.8 and 0.5, respectively

(1) the difference between Pr{δ_2 ≤ ε}P_2 and Pr{δ_2 ≤ ε}Pr{N_T = 2} + P_3, denoted by ΔP, approaches 1 − Pr{δ_2 ≤ ε} as λ goes to infinity; and (2) ΔP decreases as Pr{δ_2 ≤ ε} increases. On the other hand, we have

$\lim_{\lambda \to \infty} \frac{\Pr\{\delta_2 \le \varepsilon\} P_2}{P_L} = \Pr\{\delta_2 \le \varepsilon\} \quad \text{and} \quad \lim_{\lambda \to \infty} \frac{\Pr\{\delta_2 \le \varepsilon\} \Pr\{N_T = 2\} + P_3}{P_L} = 1.$

Therefore, when Pr{δ_2 ≤ ε} approaches 1, we can use Pr{δ_2 ≤ ε}Pr{N_T = 2} + P_3 as the approximation of P_L, i.e., P_L ≈ Pr{δ_2 ≤ ε}Pr{N_T = 2} + P_3. In this section, we assume that if Pr{δ_2 ≤ ε} > 0.8, then P_L ≈ Pr{δ_2 ≤ ε}Pr{N_T = 2} + P_3. However, when ε is at the lower end, Pr{δ_2 ≤ ε} may be smaller than 0.8, and then the difference between P_L and Pr{δ_2 ≤ ε}Pr{N_T = 2} + P_3 cannot be neglected. Let Pr{δ_2 ≤ ε} = ϕ(a) be a function of a, where a = ε/r. As shown in Fig. 2.32, we can get the plot of ϕ(a) by using Monte Carlo simulations. Because Pr{δ_2 ≤ ε} increases monotonically as a increases, we can define a threshold value for a, denoted by a_t, as follows: a_t ≜ inf{a | ϕ(a) ≥ 0.8}. If ε < a_t r, i.e., a < a_t, then Pr{δ_2 ≤ ε} < 0.8. Define

$r' \triangleq \frac{\varepsilon}{a_t}. \qquad (2.36)$


Substituting r = r′ into Eqs. (2.27), (2.33), and (2.34), we can get the corresponding δ_2, Pr{N_T = 2}, and P_K with the sensing radius r′, denoted by δ′_2, Pr′{N_T = 2}, and P′_K, respectively. Let ϕ′(a) be the function which expresses the relationship between Pr{δ′_2 ≤ ε} and a. Because Pr{δ_2 ≤ ε} decreases as r increases, we have ϕ′(a_t) > 0.8. This implies that P_L ≈ Pr{δ′_2 ≤ ε}Pr′{N_T = 2} + P′_3. We can also use Monte Carlo simulations to get the curve of ϕ′(a). However, in order to simplify the computation, we use P′_2 ≈ Pr′{N_T = 2}Pr{δ′_2 ≤ ε} + P′_3, because Pr{δ′_2 ≤ ε} > 0.8. Therefore, we can derive the approximated expression for P_L as follows:

$P_L \approx \begin{cases} 1 - e^{-\lambda\alpha r^2} - \lambda\alpha r^2 e^{-\lambda\alpha r^2} - \left(1 - \varphi\!\left(\dfrac{\varepsilon}{r}\right)\right)\dfrac{(\lambda\alpha r^2)^2}{2} e^{-\lambda\alpha r^2}, & \text{if } \varepsilon > a_t r; \\[2ex] 1 - e^{-\lambda\alpha(\varepsilon/a_t)^2} - \lambda\alpha\left(\dfrac{\varepsilon}{a_t}\right)^2 e^{-\lambda\alpha(\varepsilon/a_t)^2}, & \text{otherwise.} \end{cases} \qquad (2.37)$

From Eq. (2.37), we can obtain the corresponding density of camera sensors for a given L-coverage probability.
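Equation (2.37) can be evaluated once the simulated curve ϕ(a) is available; in the sketch below ϕ is passed in as a callable, which is our own design choice.

```python
import math

def approx_pl(lam, alpha, r, eps, a_t, phi):
    """Approximate L-coverage probability, Eq. (2.37); phi(a) is the Monte
    Carlo curve for Pr{delta_2 <= eps} as a function of a = eps / r."""
    if eps > a_t * r:
        mu = lam * alpha * r**2
        return (1 - math.exp(-mu) - mu * math.exp(-mu)
                - (1 - phi(eps / r)) * mu**2 / 2 * math.exp(-mu))
    mu = lam * alpha * (eps / a_t)**2   # shrink the radius to r' = eps / a_t
    return 1 - math.exp(-mu) - mu * math.exp(-mu)
```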

2.5.5 Simulation Experiments

We verify our derived model and analytical results on the L-coverage in camera sensor networks through simulation experiments. In order to perform empirical evaluations of the L-coverage probability, we have built a simulation platform using VC++. The fixed parameters of the simulation platform are as follows: S = 500 × 500, α = π/6, ζ = 5 × 10⁻⁸, σ_p = 0.1, and σ_s = 0.1. In each simulation run, we randomly scatter a number of camera sensors according to a 2-dimensional Poisson process with mean λ × 250,000 within S. The number of camera sensors, N, varies from 0 to 1000 in steps of 100. A grid of 500 × 500 vertices is created for S. From Property 2.1, when a point is covered by only one camera sensor, this point cannot be L-covered. This implies that the region covered by only one camera sensor is also vacant for L-coverage. We can then divide these 250,000 vertices into three categories: (1) the vertices covered by 0 or 1 camera sensor, (2) the vertices covered by 2 camera sensors, and (3) the vertices covered by at least 3 camera sensors. Assume that there are n vertices covered by 0 or 1 camera sensor. We repeat the above process 100 times to obtain the mean of n, E[n], for each value of λ. Then, the simulation result of


Fig. 2.34 Comparisons between the simulation results and the analytical results of Pr{NT < 2}, Pr{NT = 2}, and Pr{NT > 2}, where r = 40

Pr{N_T < 2} is computed as the ratio between E[n] and 250,000. Using the same method, we can also get the simulation results of Pr{N_T = 2} and Pr{N_T > 2}. Figure 2.34 plots the simulation results and analytical results of Pr{N_T < 2}, Pr{N_T = 2}, and Pr{N_T > 2} against λ. As shown in Fig. 2.34, the analytical results of Pr{N_T < 2} are always slightly larger than the simulation results, and the analytical results of Pr{N_T > 2} are slightly smaller than the simulation results. Pr{N_T = 2} increases at first and then decreases as λ increases; its analytical results are slightly larger than the simulation results during the increasing phase, and slightly smaller during the decreasing phase. These observations imply that:

• For Pr{N_T < 2}, Pr{N_T = 2}, and Pr{N_T > 2}, the simulation results are close to the analytical results;
• In order to obtain a given coverage probability, the simulation results for λ are slightly smaller than the analytical results.

For each value of λ, we randomly generate 20 different topologies of camera sensors. In each topology, we compute the corresponding δ_2 for all the vertices covered by 2 camera sensors. The threshold ε takes the values 0, 1, …, 10. Assume that there exist n vertices which satisfy δ_2 ≤ ε. Then, we can get the ratio between n and the number of vertices covered by 2 cameras. Repeating this process 20 times yields the mean of this ratio, i.e., the simulation result of Pr{δ_2 ≤ ε}. Table 2.4 summarizes the simulation results and analytical results of Pr{δ_2 ≤ ε}; the simulation results are very close to the corresponding analytical results.

Setting ε = 4, we get the simulation results of P_L by substituting the simulation results of Pr{N_T = 2}, Pr{N_T > 2}, and Pr{δ_2 ≤ ε} into Eq. (2.31). As shown in Fig. 2.35, the simulation results of P_L are slightly larger than the analytical results. As λ increases, the difference between the simulation result and the corresponding analytical result first increases, and then decreases. On the other hand, for each value of λ, we compare the simulation results of P_1 and P_2 to the simulation result


Table 2.4 Comparison between simulation results (columns 2 through 11, one per value of λ) and analytical results (last column) of Pr{δ2 ≤ ε}

ε     λ=.0004 .0008  .0012  .0016  .0020  .0024  .0028  .0032  .0036  .0040  Analytical
100   0.0500  0.0522 0.0426 0.0571 0.0652 0.0500 0.0424 0.0447 0.0468 0.0349 0.0611
200   0.3400  0.2410 0.1782 0.2368 0.2301 0.2421 0.2263 0.2481 0.1959 0.1968 0.2659
300   0.6600  0.5663 0.5798 0.5666 0.5471 0.6184 0.5758 0.6328 0.5205 0.5873 0.6092
400   0.8900  0.8032 0.8032 0.7949 0.8225 0.8395 0.8121 0.8685 0.8041 0.8222 0.8302
500   0.9400  0.8795 0.8936 0.8858 0.8986 0.9132 0.9051 0.9330 0.9035 0.8984 0.8958
600   0.9500  0.9116 0.9335 0.9302 0.9420 0.9395 0.9414 0.9578 0.9240 0.9365 0.9416
700   0.9800  0.9357 0.9574 0.9577 0.9583 0.9632 0.9677 0.9752 0.9561 0.9460 0.9641
800   0.9800  0.9558 0.9734 0.9725 0.9692 0.9816 0.9758 0.9851 0.9649 0.9587 0.9757
900   0.9900  0.9719 0.9814 0.9831 0.9873 0.9895 0.9818 0.9950 0.9766 0.9714 0.9793
1000  1.0000  0.9839 0.9920 0.9873 0.9946 0.9974 0.9859 0.9975 0.9825 0.9873 0.9865

Fig. 2.35 Comparisons among 1-coverage probability (P1 ), 2-coverage probability (P2 ), and L-coverage probability (PL )

of P_L, respectively. From Fig. 2.35, we can observe that P_L is much smaller than P_1 (P_1, i.e., the 1-coverage probability, is the ordinary coverage probability for detection applications). Because Pr{δ_2 ≤ 4} is about 80%, P_L is slightly smaller than P_2. We can obtain the simulation results of Pr{δ_2 ≤ ε}P_2 and Pr{N_T = 2}Pr{δ_2 ≤ ε} + P_3 by using the simulation results of Pr{δ_2 ≤ ε}, Pr{N_T = 2}, P_2, and P_3. Even though the difference, ΔP, between Pr{δ_2 ≤ ε}P_2 and Pr{N_T = 2}Pr{δ_2 ≤ ε} + P_3 increases as λ increases, ΔP is always small when ε is at the higher end (Fig. 2.36a). Therefore, P_L approximates Pr{N_T = 2}Pr{δ_2 ≤ ε} + P_3. Figure 2.36a also shows that ΔP decreases as ε increases. If ε is small, i.e., the application has a strict restriction on localization accuracy, then Pr{δ_2 ≤ ε} is small, and ΔP is large (Fig. 2.36b). According to Eq. (2.36), we use r′ = 20 as the sensing radius of camera sensors, and then get the corresponding P′_2. As shown in Fig. 2.36b, the difference between P′_2 and Pr{δ_2 ≤ ε}P_2 is small, i.e., P_L can be approximated by P′_2. We notice that the simulated Pr{δ_2 ≤ ε}P_2 is not always less than P′_2, which is because of the limited number of simulations.


Fig. 2.36 (a) The differences among Pr{δ_2 ≤ 6}P_2, Pr{N_T = 2}Pr{δ_2 ≤ 6} + P_3, Pr{δ_2 ≤ 4}P_2, and Pr{N_T = 2}Pr{δ_2 ≤ 4} + P_3; (b) the differences among Pr{δ_2 ≤ 2}P_2, Pr{N_T = 2}Pr{δ_2 ≤ 2} + P_3, and P′_2. We set r = 40 and r′ = 20

2.6 Exposure-Path Prevention for Intrusion Detection in Multimedia Sensor Networks

2.6.1 Motivation

This section focuses on the following scenario. Given a network of sensors, we need to determine whether an intruder can traverse the deployment field from one side to the opposite side such that the sensors do not have coverage of the traversed path. This is the exposure path problem, and it corresponds to the sensor network's worst-case coverage. The exposure path measures the ability to move through the sensed field without being discovered. Therefore, we consider the coverage problem in intrusion detection applications from a different point of view, i.e., how to guarantee there is no exposure path in a sensor-deployed region. In the literature, some existing works focus on the minimal exposure path problem, which seeks a path between two given points such that the total exposure acquired from the sensors by traversing the path is minimized. Meguerdichian et al. (2001) were the first to identify the importance of computational geometry and Voronoi diagrams in sensor network coverage. Megerian et al. (2001, 2002) developed algorithms for exposure calculations, specifically for finding minimal exposure paths by using a multi-resolution technique and Dijkstra's or Floyd-Warshall's shortest path algorithms. In the work of Veltri et al. (2003), the authors used variational calculus to derive the closed-form solution for minimal exposure in the presence of a single sensor, and proposed a grid-based approximation algorithm for finding the minimal exposure path in the multiple-sensor case. Djidjev (2007, 2010) designed an approximation algorithm for the minimum exposure


path problem with guaranteed performance characteristics, and described a framework for a faster implementation of the algorithm. Ferrari and Foderaro (2010) presented an artificial-potential approach for planning the minimum-exposure paths of multiple vehicles in a dynamic environment containing multiple mobile sensors and multiple fixed obstacles; moreover, the approach can handle heterogeneous sensor models and meet multiple objectives. Wang et al. (2010) proposed a fuzzy-theory-based exposure path algorithm to find the worst and best fuzzy information exposure paths in wireless sensor networks. The fuzzy information exposure of a point is defined as a function of the distance between sensors and objects: the higher the fuzzy information exposure, the higher the confidence level that some information of an object is exposed and the more likely the object is monitored.

The above works mainly focus on how to find the minimal exposure path in deployed wireless sensor networks. In fact, for many applications, network designers want to know the relationship between exposure paths and the size of the wireless sensor network, because this result helps determine how many sensor nodes should be deployed for an application. Moreover, all the aforementioned works are based on the omnidirectional sensing model. These methods cannot be applied to directional sensor networks directly, because the directional sensing model has different characteristics. We therefore need to study the exposure path problem in omnidirectional and directional sensor networks, respectively.

On the other hand, sensors may be spread in an arbitrary pattern. As shown in Fig. 2.37a, if the density of sensors is lower than the critical threshold, it is easy to find exposure paths in the sensor deployment region. In contrast, if the density is higher than the critical threshold, with high probability the deployment region does not have any exposure path, as illustrated in Fig. 2.37b and c. However, if the density is much higher than the critical threshold, see Fig. 2.37c, it causes vast redundancy of sensor nodes, resulting in high implementation complexity and cost. Consequently, the critical density threshold is the optimal density.

Fig. 2.37 Illustration of the relationship between exposure paths and sensor density. We study the exposure path problem in both omnidirectional and directional sensor networks in this section, and here take the omnidirectional sensor network as an example


To solve the exposure path problem, we used percolation theory (Kesten 1982; Grimmett 1999) to derive the optimal density, so as to guarantee that the probability of an exposure path existing converges to 0 while minimizing the density of sensors. As a mathematical theory, the percolation technique was introduced in the work of Broadbent and Hammersley (1957) as a stochastic way of modeling the flow of a fluid or gas through a porous medium of small channels which may or may not let the gas or fluid pass. It is one of the simplest models exhibiting a phase transition.³ Percolation is attractive because it reveals the important relationships between probabilistic and algebraic/topological properties of graphs. Percolation theory has been used in the past to study the connectivity of wireless networks (Penrose 1997; Gupta and Kumar 1999; Glauche et al. 2003). Most existing percolation-based schemes for wireless networks apply continuum percolation theory, which, however, suffers from loose lower and upper bounds on the critical density. To overcome this problem and make percolation theory applicable to the exposure path problem, we proposed a bond-percolation-theory-based scheme by mapping the exposure path problem onto a bond percolation model. Using this model, we derived the critical densities for both omnidirectional sensor networks and directional sensor networks under random sensor deployment, where sensors are deployed according to a 2-dimensional Poisson process (Liu et al. 2013).

2.6.2 System Models and Problem Formulation

2.6.2.1 Sensors Deploying Model

We consider random sensor node deployment where sensors are randomly scattered within a vast 2-dimensional geographical region, and the locations of these sensors are uniformly and independently distributed in the region. Such random deployment can be the result of certain sensor deployment strategies; for example, sensors can be airdropped or launched via artillery in battlefields or unfriendly environments. Under this assumption, the sensor locations can be modeled by a stationary two-dimensional Poisson point process with an intensity λ. This indicates that the number, N(S′), of sensors in any sub-region S′ follows a Poisson distribution with parameter λ‖S′‖, where ‖S′‖ is the area of S′. Let k be a positive integer; the probability that N(S′) is equal to k is then given by

$\Pr\{N(S') = k\} = \frac{(\lambda \|S'\|)^k}{k!} e^{-\lambda \|S'\|}. \qquad (2.38)$

³ Phase transition is defined as the transformation of a system from one phase to another, whose distinguishing characteristic is an abrupt change in one or more physical properties.


In directional sensor networks, let θ denote the angle of the sensing direction vector $\vec{V}_s$. We assume that θ of every sensor is a random variable with the uniform distribution on [0, 2π], i.e., θ ∼ U(0, 2π).

2.6.2.2 Continuum Percolation Model-Based Problem Formulation

Let R² be the 2-dimensional Euclidean plane. The sensor nodes of a homogeneous Poisson process P in R², with an intensity λ > 0, can be characterized by a collection of random points. Let S be the 2-dimensional region covered by a sensor. In omnidirectional sensor networks, the sensed region S is a disk (in the ordinary Euclidean metric) with radius r. A Boolean model driven by a Poisson process P with density λ and radius r is denoted by (P, r, λ), and we call it the Poisson Boolean disk model in this section. On the other hand, in directional sensor networks, S becomes a sector with radius r and central angle α. We call (P, r, α, λ) a Poisson Boolean sector model. As shown in Fig. 2.38, the deployment space is partitioned into two regions: the occupied (covered) region, which is the region covered by at least one disk, and the vacant region, which is the complement of the occupied (covered) region. Both the occupied and vacant regions consist of a number of different connected components; the connected components in the occupied and vacant regions are called occupied components and vacant components, respectively. The occupied component which has non-empty intersection with the origin of R² is denoted by W. In the case of vacancy, all definitions are similar, but using the symbol V instead of W.

Definition 2.12 (Exposure Path) A continuous curve p from one side of the deployment region to the opposite side is said to be an exposure path if p belongs to a vacant component.

Fig. 2.38 A realization of a Boolean disk model. The shaded area is the occupied region, while the vacant region is represented by the unshaded area


In percolation theory, we are mainly interested in unbounded occupied and vacant components. Depending on the density of the underlying Poisson process, the Boolean model is either subcritical (the expected number of Poisson points in the occupied component is finite almost surely) or supercritical (the occupied component contains an infinite number of Poisson points with a strictly positive probability). To formulate this phase transition, we define

$d(W, \lambda) \triangleq \sup\{|xy| : x, y \in W\}, \qquad d(V, \lambda) \triangleq \sup\{|xy| : x, y \in V\}$

to be the sizes of W and V, respectively, where |xy| is the distance between x and y in R². Then, we define the critical densities, denoted by λ_c and λ*_c, as follows:

$\lambda_c \triangleq \inf\{\lambda : \Pr\{d(W, \lambda) = \infty\} > 0\}, \qquad \lambda_c^* \triangleq \sup\{\lambda : \Pr\{d(V, \lambda) = \infty\} > 0\}.$

Existing work on continuum percolation (Meester and Roy 1996) has proved that λ*_c = λ_c for (P, r, λ). From the above definitions, it is easy to get the following property:

Property 2.2 If λ ≤ λ_c, there exist exposure paths in the sensor-deployed region a.s.

We then formulate the problem of exposure-path prevention in omnidirectional and directional sensor networks as follows: for both the Poisson Boolean disk model (P, r, λ) and the Poisson Boolean sector model (P, r, α, λ), how do we calculate the critical density λ_c = inf{λ : Pr{d(W, λ) = ∞} > 0}? Theorem 4 of Hall (1985) and Theorem 3.10 of Meester and Roy (1996) proved that for a Poisson Boolean model (P, 1, λ) on R², the critical density satisfies:

$0.174 < \lambda_c < 0.843. \qquad (2.39)$

2.6.3 Bond Percolation Model for Coverage

We model the exposure path problem by a 2-dimensional lattice. The set containing n vertexes, denoted by V_e = {v_1, v_2, …, v_n}, forms a √n × √n lattice on the unit square region, as shown in Fig. 2.39. To ease presentation, we use √n to


Fig. 2.39 The unit square region with a virtual lattice

Fig. 2.40 (a) The edge ei,j is L-closed and U-open. (b) A case of L-coverage lattice. (c) A case of U-coverage lattice

approximately represent ⌈√n⌉ for simplicity. Since our derivations are robust to approximation errors, this approximation does not affect our final results. Let e_{i,j} denote the edge between vertex v_i and vertex v_j, where i, j ∈ [1, n]. We call any two vertexes connected by a common edge neighboring vertexes. This implies that the distance between any two neighboring vertexes is equal to 1/√n, because the √n × √n lattice is on the unit square region. We utilize two different rules to characterize whether an arbitrary edge e_{i,j} is open or closed, which yield the lower bound and upper bound of the critical sensor density, respectively.

Definition 2.13 (L-closed/L-open Edge) If at least one point on edge e_{i,j} is covered by the sensor network, then e_{i,j} is called an L-closed edge. Otherwise, if no point on e_{i,j} is covered, then e_{i,j} is called an L-open edge.

Definition 2.14 (U-closed/U-open Edge) If all points on edge e_{i,j} are covered by the sensor network, then e_{i,j} is called a U-closed edge. Otherwise, if at least one point on e_{i,j} is not covered, then e_{i,j} is called a U-open edge.

Remarks on Definitions 2.13 and 2.14: Fig. 2.40a illustrates the difference between Definitions 2.13 and 2.14. The edge e_{i,j} in Fig. 2.40a is partially covered by


two sensors. From Definition 2.13, ei,j is L-closed. On the other hand, ei,j is also U-open according to Definition 2.14. This indicates that the U-closed condition is stricter than the L-closed condition in terms of coverage. We further define two indicator functions to determine whether an edge is open or closed. For L-closed/L-open edges:

$$L(e_{i,j}) = \begin{cases} 0, & \text{if no point on } e_{i,j} \text{ is covered;}\\ 1, & \text{if at least one point on } e_{i,j} \text{ is covered.} \end{cases}$$

For U-closed/U-open edges:

$$U(e_{i,j}) = \begin{cases} 0, & \text{if at least one point on } e_{i,j} \text{ is not covered;}\\ 1, & \text{if all points on } e_{i,j} \text{ are covered.} \end{cases}$$

Define pl ≜ Pr{L(ei,j) = 1} and pu ≜ Pr{U(ei,j) = 1}. Then, we give two related definitions for the bond percolation model for coverage as follows:

Definition 2.15 L-Coverage Lattice and U-Coverage Lattice. Let Z² be the 2-dimensional lattice with vertex set Ve and edge set E. Let vi and vj be an arbitrary pair of neighboring vertexes in Ve. Set the distance between two neighboring vertexes to be 1/√n. If the edge ei,j is L-closed when at least one point on ei,j is covered, and is L-open when no point on ei,j is covered, we call Z² an L-coverage lattice. If ei,j is U-closed when all points on ei,j are covered, and is U-open when at least one point on ei,j is not covered, Z² is said to be a U-coverage lattice.

Definition 2.16 L-Open/U-Open Path and L-Closed/U-Closed Path. A path in Z² is a sequence of edges e1,2, e2,3, · · · , ei,i+1, · · · , such that for all i ≥ 1, vi and vi+1 are neighboring vertexes in Z². A path is called an L-open/U-open path if all the edges ei,i+1 in this path are L-open/U-open. A path is called an L-closed/U-closed path if all the edges ei,i+1 in this path are L-closed/U-closed.

Remarks on Definitions 2.15 and 2.16: Fig. 2.40b and c illustrate an L-coverage lattice case and a U-coverage lattice case, respectively, for the same deployment of sensors. There are more closed edges in Fig. 2.40b than in Fig. 2.40c. This implies that we can use the L-coverage lattice


to derive the lower bound of the critical density for sensor networks, and use the U-coverage lattice to derive the upper bound of the critical density (the prefixes "L-" and "U-" in the above definitions are chosen precisely to distinguish these two scenarios). The percolation models based on the L-coverage lattice and the U-coverage lattice are both bond percolation models. Obviously, an L-open path extending from one side to the opposite side of the L-coverage lattice is an exposure path. From Definition 2.16, we further obtain the following property.

Property 2.3 If there exists a U-closed path in the U-coverage lattice from one side to the opposite side, there is no exposure path.

Remarks on Property 2.3: If a path is U-closed in the U-coverage lattice, all points on this path are covered. Since the path extends from one side to the opposite side, it lies in an unbounded occupied component a.s. Referring to the definition of λc, the sensor density here exceeds λc. Because λc = λ*c, the sensor density is also larger than λ*c. Thus, Pr{d(V) = ∞} = 0 and there is no exposure path.
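To make the two edge rules concrete, the following Python sketch (our own illustration, not the book's code; the helper names are hypothetical, and the finite point sampling approximates "all points on an edge") classifies a lattice edge under Definitions 2.13 and 2.14 for omnidirectional disk sensors:

```python
import math

def covered(point, sensors, r):
    """True if some omnidirectional sensor of radius r covers the point."""
    return any(math.dist(point, s) <= r for s in sensors)

def edge_points(p, q, samples=10):
    """Evenly spaced sample points on the edge from vertex p to vertex q."""
    return [(p[0] + (q[0] - p[0]) * t / samples,
             p[1] + (q[1] - p[1]) * t / samples) for t in range(samples + 1)]

def L_indicator(p, q, sensors, r):
    """L(e_{i,j}) = 1 iff at least one point on the edge is covered."""
    return int(any(covered(pt, sensors, r) for pt in edge_points(p, q)))

def U_indicator(p, q, sensors, r):
    """U(e_{i,j}) = 1 iff all points on the edge are covered."""
    return int(all(covered(pt, sensors, r) for pt in edge_points(p, q)))
```

By construction, U_indicator(·) = 1 implies L_indicator(·) = 1, mirroring the remark that U-closed is the stricter condition.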

2.6.4 Critical Density for Exposure Path

The most important property of a percolation model is that it exhibits a phase transition (Kesten 1982; Grimmett 1999). For the probability p that an arbitrary edge in the lattice is closed in a general sense, there exists a threshold value pc ∈ [0, 1] such that the global behavior of the system is significantly different in the two regions: for all p > pc, there exists a closed path extending from one side of the system to the other, whereas for all p < pc, no such closed path exists. As proved by Harry Kesten (1980), the critical probability of bond percolation on the square lattice equals 1/2. It took about two decades from the first numerical estimates in 1960 for square bond percolation, over non-rigorous arguments that pc = 0.5 exactly, to its mathematical proof. According to Property 2.3, it is easy to get the following property.

Property 2.4 If pl < 0.5, then Pr{there exists an exposure path} > 0; if pu > 0.5, then Pr{there exists an exposure path} = 0.

Remarks on Property 2.4: In bond percolation, the probabilities that the edges are open or closed are independent. However, in our coverage percolation model, pl (pu) of a given edge depends on the neighboring edges. Strictly speaking,


the coverage percolation proposed in this section is not a bond percolation. On the other hand, pl (pu) of an edge is independent of most edges in the lattice. Therefore, we can approximate the coverage percolation by the bond percolation model. Define

$$\lambda_l \triangleq \sup\{\lambda : p_l \le 0.5\}, \qquad \lambda_u \triangleq \inf\{\lambda : p_u \ge 0.5\}.$$

Then, we derive the expressions of λl and λu in both omnidirectional and directional sensor networks as follows.

2.6.4.1 Critical Density of Omnidirectional Sensors

As shown in Fig. 2.41, ei,j is an arbitrary edge of the L-coverage lattice. Let tn be a point on ei,j. We define a new operator ⊎ as follows:

$$R_i \uplus R_j \triangleq \bigcup_{\forall t_n \in e_{i,j}} R_n,$$

where Rn is the disk centered at tn with radius r. Figure 2.41 illustrates that no point on ei,j is covered if and only if there is no sensor in Ri ⊎ Rj. Then, we have the following lemma.

Lemma 2.3 In omnidirectional sensor networks,

$$\lambda_l = \frac{-\log 0.5}{\pi r^2 + \frac{2r}{\sqrt{n}}}. \tag{2.40}$$

Fig. 2.41 Covered region division of ei,j


Proof From Eq. (2.38),

$$\Pr\{N(R_i \uplus R_j) = 0\} = \exp\left(-\lambda \left\| R_i \uplus R_j \right\|\right) = \exp\left(-\lambda\left(\pi r^2 + \frac{2r}{\sqrt{n}}\right)\right).$$

Then, we have

$$p_l = 1 - \Pr\{L(e_{i,j}) = 0\} = 1 - \exp\left(-\lambda\left(\pi r^2 + \frac{2r}{\sqrt{n}}\right)\right). \tag{2.41}$$

Equation (2.41) shows that pl increases monotonically as λ increases. From λl = sup{λ : pl ≤ 0.5},

$$\exp\left(-\lambda_l\left(\pi r^2 + \frac{2r}{\sqrt{n}}\right)\right) = 0.5.$$

Thus, we can get Eq. (2.40). □

However, it is difficult to derive an explicit expression for the probability that all points on ei,j are covered. We therefore need to find an approximation of pu.

Lemma 2.4 In omnidirectional sensor networks, if ei,j is an arbitrary edge of the U-coverage lattice, then we have

$$p_u > 1 - \exp(-\lambda A), \tag{2.42}$$

where

$$A = 4\int_{\frac{1}{2\sqrt{n}} - r}^{0} \sqrt{r^2 - \left(x - \frac{1}{2\sqrt{n}}\right)^2}\, dx. \tag{2.43}$$

Proof Clearly, the probability that all points on ei,j are covered by one sensor is less than the probability that all points on ei,j are covered by the sensor network, i.e., pu > Pr{all points on ei,j are covered by one sensor}. From Fig. 2.41, we can see that all points on ei,j are covered by one sensor if and only if there exists a sensor in Ri ⋒ Rj, where Ri ⋒ Rj ≜ ⋂_{∀tn ∈ ei,j} Rn. From Eq. (2.38),

$$\Pr\{N(R_i \Cap R_j) > 0\} = 1 - \exp\left(-\lambda \left\| R_i \Cap R_j \right\|\right) < p_u.$$

Let A be the area of Ri ⋒ Rj. It is easy to obtain Eq. (2.43) according to Fig. 2.41. □


Remarks on Lemma 2.4: We choose the probability that all points on ei,j are covered by one sensor (denoted by po) as the approximation of pu. In the rest of this section, we assume that λu = inf{λ : po ≥ 0.5} and

$$U(e_{i,j}) = \begin{cases} 1, & \text{if } e_{i,j} \text{ is covered by one sensor;}\\ 0, & \text{otherwise.} \end{cases}$$

The following theorem gives tighter bounds for λc in omnidirectional sensor networks.

Theorem 2.1 In omnidirectional sensor networks, we have

$$\frac{-\log 0.5}{\pi r^2 + \frac{2r}{\sqrt{n}}} < \lambda_c < \frac{-\log 0.5}{4\int_{\frac{1}{2\sqrt{n}} - r}^{0} \sqrt{r^2 - \left(x - \frac{1}{2\sqrt{n}}\right)^2}\, dx}. \tag{2.44}$$

Proof According to Property 2.4, pl < 0.5 is a sufficient condition for Pr{d(V) = ∞} > 0. From the definitions of λ*c and λl, it is easy to show that λl < λ*c. From Eq. (2.42), pu > 0.5 if 1 − exp(−λA) > 0.5. This implies that 1 − exp(−λA) > 0.5 is also a sufficient condition for Pr{d(W) = ∞} > 0. From the definition of λc, we have −(log 0.5)/A > λc. Because λc = λ*c, we can get Eq. (2.44) according to Eqs. (2.40), (2.42), and (2.43). □

Remarks on Theorem 2.1: Plugging r = 1 and 1/√n = 0.7 into Eq. (2.44), we have 0.1526 < λc < 0.3914 for the Poisson Boolean disk model (P, 1, λ). Our upper bound is much smaller than the upper bound in Eq. (2.39), implying that Theorem 2.1 offers tighter bounds on λc for wireless sensor networks. Please note that this is one case for (P, 1, λ), with 1/√n = 0.7. The values of 1/√n and r can affect the accuracy of the bounds in Eq. (2.44). Because the dependence between neighboring edges is strengthened when 1/√n is much smaller than r, we let 1/√n take a relatively large value. In the next section, we study the effects of r and 1/√n on this dependence quantitatively.
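As a quick numerical check of Theorem 2.1 (our own sketch, not the book's code), the following evaluates both bounds of Eq. (2.44) with a simple midpoint rule for the integral in Eq. (2.43); for r = 1 and 1/√n = 0.7 it reproduces the 0.1526 < λc < 0.3914 figures quoted in the remarks:

```python
import math

def theorem_2_1_bounds(r, d, steps=200_000):
    """Evaluate the bounds of Eq. (2.44); d stands for 1/sqrt(n)."""
    lower = -math.log(0.5) / (math.pi * r ** 2 + 2 * r * d)
    # A from Eq. (2.43): 4 * integral over [d/2 - r, 0] of sqrt(r^2 - (x - d/2)^2) dx
    a = d / 2 - r
    h = (0.0 - a) / steps
    A = 4 * h * sum(math.sqrt(max(r * r - (a + (k + 0.5) * h - d / 2) ** 2, 0.0))
                    for k in range(steps))
    upper = -math.log(0.5) / A
    return lower, upper

print(theorem_2_1_bounds(r=1.0, d=0.7))  # ~(0.1526, 0.3914)
```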

2.6.4.2 Critical Density of Directional Sensors

As shown in Figs. 2.41 and 2.42a–d, suppose there is a directional sensor at an arbitrary point N(x, y) in Ri ⊎ Rj. The disk centered at N with radius r intersects the X axis at D1(x − √(r² − y²), 0) and D2(x + √(r² − y²), 0). Let d1 = x − √(r² − y²) and d2 = x + √(r² − y²). Let a ∧ b denote the angle between vectors a and b. Then, we have the following lemma.

Fig. 2.42 The relationships among D1, D2, vi and vj

Lemma 2.5 In directional sensor networks, if ei,j is an arbitrary edge of the L-coverage lattice, then we have

$$\lambda_l = \frac{\log 0.5}{\int_{R_i \uplus R_j} (p_n - 1)\, d\sigma}, \tag{2.45}$$

where dσ ≜ dx dy is the differential of area, and

$$p_n = \begin{cases}
\dfrac{2\pi - \alpha - \overrightarrow{Nv_i} \wedge \overrightarrow{ND_2}}{2\pi}, & \text{if } d_1 < -\frac{1}{2\sqrt{n}} \text{ and } -\frac{1}{2\sqrt{n}} < d_2 < \frac{1}{2\sqrt{n}};\\[4pt]
\dfrac{2\pi - \alpha - \overrightarrow{ND_1} \wedge \overrightarrow{ND_2}}{2\pi}, & \text{if } -\frac{1}{2\sqrt{n}} < d_1 < d_2 < \frac{1}{2\sqrt{n}};\\[4pt]
\dfrac{2\pi - \alpha - \overrightarrow{Nv_i} \wedge \overrightarrow{Nv_j}}{2\pi}, & \text{if } d_1 < -\frac{1}{2\sqrt{n}} \text{ and } d_2 > \frac{1}{2\sqrt{n}};\\[4pt]
\dfrac{2\pi - \alpha - \overrightarrow{ND_1} \wedge \overrightarrow{Nv_j}}{2\pi}, & \text{if } -\frac{1}{2\sqrt{n}} < d_1 < \frac{1}{2\sqrt{n}} \text{ and } d_2 > \frac{1}{2\sqrt{n}}.
\end{cases} \tag{2.46}$$

Proof Let pn denote the probability that a directional sensor at N(x, y) cannot cover any point on ei,j. The sensor can be described as (N, r, V⃗s, α). As shown in Fig. 2.42a, when x − √(r² − y²) < −1/(2√n) and −1/(2√n) < x + √(r² − y²) < 1/(2√n), the node at N cannot cover any point on ei,j if and only if the angle of V⃗s is smaller than the angle of N⃗vi minus α/2, and is larger than the angle of N⃗D2 plus α/2. Then,

$$p_n = \frac{2\pi - \alpha - \overrightarrow{Nv_i} \wedge \overrightarrow{ND_2}}{2\pi}.$$

Similarly, if −1/(2√n) < x − √(r² − y²) < x + √(r² − y²) < 1/(2√n) (see Fig. 2.42b), then

$$p_n = \frac{2\pi - \alpha - \overrightarrow{ND_1} \wedge \overrightarrow{ND_2}}{2\pi};$$

if x − √(r² − y²) < −1/(2√n) and x + √(r² − y²) > 1/(2√n) (see Fig. 2.42c), then

$$p_n = \frac{2\pi - \alpha - \overrightarrow{Nv_i} \wedge \overrightarrow{Nv_j}}{2\pi};$$

and if −1/(2√n) < x − √(r² − y²) < 1/(2√n) and x + √(r² − y²) > 1/(2√n) (see Fig. 2.42d), then

$$p_n = \frac{2\pi - \alpha - \overrightarrow{ND_1} \wedge \overrightarrow{Nv_j}}{2\pi}.$$

Let Ri ⊎ Rj be divided into n sufficiently small regions S1, S2, · · · , Sn, let dσ ≜ dx dy be the differential of area equal to the area of Sm, 1 ≤ m ≤ n, as n → +∞, and let Pm be the probability that there is no sensor in Sm that can cover ei,j. We get

$$\Pr\{L(e_{i,j}) = 0\} = \prod_{m=1}^{n} P_m = \prod_{m=1}^{n}\left[\sum_{k=0}^{\infty} \frac{(\lambda\, d\sigma\, p_n)^k}{k!}\, e^{-\lambda\, d\sigma}\right] = \prod_{m=1}^{n} \exp\big(\lambda\, d\sigma\, (p_n - 1)\big) = \exp\left(\lambda \sum_{m=1}^{n} d\sigma\, (p_n - 1)\right). \tag{2.47}$$

When n → +∞,

$$p_l = 1 - \exp\left(\lambda \int_{R_i \uplus R_j} (p_n - 1)\, d\sigma\right). \tag{2.48}$$


From Eq. (2.48), it is easy to show that pl increases monotonically as λ increases. Solving

$$1 - \exp\left(\lambda_l \int_{R_i \uplus R_j} (p_n - 1)\, d\sigma\right) = 0.5,$$

we obtain Eqs. (2.45) and (2.46). □

Lemma 2.6 In directional sensor networks, if ei,j is an arbitrary edge of the U-coverage lattice, then we have

$$p_u > 1 - \exp\left(\lambda \int_{R_i \Cap R_j} (p_n - 1)\, d\sigma\right), \tag{2.49}$$

where

$$p_n = \begin{cases} 1, & \text{if } \overrightarrow{Nv_i} \wedge \overrightarrow{Nv_j} > \alpha;\\[4pt] \dfrac{2\pi - \alpha + \overrightarrow{Nv_i} \wedge \overrightarrow{Nv_j}}{2\pi}, & \text{otherwise.} \end{cases} \tag{2.50}$$

Proof From Figs. 2.41 and 2.43, we can show that all points on ei,j are covered by one sensor if and only if (i) there exists a sensor in Ri ⋒ Rj; and (ii) V⃗s ∧ N⃗vi < α/2 and V⃗s ∧ N⃗vj < α/2. Let pn denote the probability that a directional sensor in Ri ⋒ Rj cannot cover all points on ei,j. If N⃗vi ∧ N⃗vj > α, then pn = 1. Otherwise,

$$p_n = \frac{2\pi - \alpha + \overrightarrow{Nv_i} \wedge \overrightarrow{Nv_j}}{2\pi}.$$

Fig. 2.43 The relationships among N, vi and vj


Let Ri ⋒ Rj be divided into n sufficiently small regions, and let dσ be the area of each small region. Then, according to Eq. (2.47), we have

$$\Pr\{\text{all points on } e_{i,j} \text{ are covered by one sensor}\} = 1 - \exp\left(\lambda \int_{R_i \Cap R_j} (p_n - 1)\, d\sigma\right)$$

when n → +∞. Because pu > Pr{all points on ei,j are covered by one sensor}, we obtain Eq. (2.49). □

Similar to omnidirectional sensor networks, we again choose the probability po that all points on ei,j are covered by one sensor as the approximation of pu. Then, the following theorem gives the bounds for λc in directional sensor networks.

Theorem 2.2 In directional sensor networks, we have

$$\frac{\log 0.5}{\int_{R_i \uplus R_j} (p_n - 1)\, d\sigma} < \lambda_c < \frac{\log 0.5}{\int_{R_i \Cap R_j} (p_n - 1)\, d\sigma}, \tag{2.51}$$

where pn is given by Eq. (2.46) in the lower bound and by Eq. (2.50) in the upper bound.

Proof The proof of Theorem 2.2 is similar to the proof of Theorem 2.1. □
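The two integrals in Theorem 2.2 have no convenient closed form in general, but they can be estimated numerically. The following Monte Carlo sketch is entirely our own (function names are hypothetical, and the printed values are illustrative estimates of Eq. (2.51), not the book's tabulated results): it samples the bounding box of Ri ⊎ Rj, evaluates pn − 1 from Eqs. (2.46) and (2.50) via the angle subtended at N by the relevant sub-segment, and averages.

```python
import math, random

def subtended_angle(nx, ny, ax, bx):
    """Angle at N = (nx, ny) subtended by the sub-segment [(ax, 0), (bx, 0)]."""
    t1 = math.atan2(-ny, ax - nx)
    t2 = math.atan2(-ny, bx - nx)
    d = abs(t1 - t2)
    return min(d, 2 * math.pi - d)

def theorem_2_2_bounds(r, h, alpha, samples=500_000, seed=1):
    """h = 1/(2*sqrt(n)); the edge runs from v_i = (-h, 0) to v_j = (h, 0)."""
    rng = random.Random(seed)
    box_area = (2 * h + 2 * r) * (2 * r)        # bounding box of R_i ⊎ R_j
    acc_l = acc_u = 0.0
    for _ in range(samples):
        x = rng.uniform(-h - r, h + r)
        y = rng.uniform(-r, r)
        s2 = r * r - y * y
        if s2 <= 0.0:
            continue
        s = math.sqrt(s2)
        a, b = max(x - s, -h), min(x + s, h)    # part of the edge within range
        if a <= b:                              # N lies in R_i ⊎ R_j
            # Eq. (2.46) in all four cases: p_n - 1 = -(alpha + angle)/(2*pi)
            acc_l += (alpha + subtended_angle(x, y, a, b)) / (2 * math.pi)
        if math.hypot(x + h, y) <= r and math.hypot(x - h, y) <= r:
            # N lies in R_i ⋒ R_j; Eq. (2.50): p_n - 1 = -(alpha - dth)/(2*pi)
            dth = subtended_angle(x, y, -h, h)
            if dth < alpha:
                acc_u += (alpha - dth) / (2 * math.pi)
    int_l = -acc_l / samples * box_area         # ~ integral over R_i ⊎ R_j
    int_u = -acc_u / samples * box_area         # ~ integral over R_i ⋒ R_j
    return math.log(0.5) / int_l, math.log(0.5) / int_u

print(theorem_2_2_bounds(r=20.0, h=5.0, alpha=math.pi / 3))
```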

2.6.5 Dependence Among Neighboring Edges

Obviously, pl (pu) of ei,j depends on the pl's (pu's) of the neighboring edges. In this section we study how to quantitatively measure the dependence between e1,2 and e2,3 in terms of pl and pu, and reveal the relationships among r, 1/√n, and this dependence. The mutual information is a quantity that measures the mutual dependence between the two variables X ≜ L(e1,2) and Y ≜ L(e2,3). From the definition of mutual information, we have

$$I(X, Y) = \sum_{x \in \{0,1\}} \sum_{y \in \{0,1\}} P_{XY}(x, y) \log\left(\frac{P_{XY}(x, y)}{P_X(x)\, P_Y(y)}\right), \tag{2.52}$$

where PXY(x, y) = Pr{L(e1,2) = x, L(e2,3) = y}, PX(x) = Pr{L(e1,2) = x}, and PY(y) = Pr{L(e2,3) = y}.


Fig. 2.44 Region division of two neighboring edges

In the case of I(U(e1,2), U(e2,3)), all definitions are similar, but using P*XY(x, y), P*X(x), and P*Y(y) instead of PXY(x, y), PX(x), and PY(y), respectively. Next, we study I(X, Y) for omnidirectional sensor networks. To ease presentation, we set d = 1/√n. From Eq. (2.52),

$$P_X(0) = P_Y(0) = e^{-\lambda(\pi r^2 + 2dr)}, \qquad P_X(1) = P_Y(1) = 1 - e^{-\lambda(\pi r^2 + 2dr)}.$$

As shown in Fig. 2.44, L(e1,2) = 0 and L(e2,3) = 0 mean that no point on e1,2 or e2,3 is covered. Then,

$$P_{XY}(0, 0) = \Pr\{N(R_1 \uplus R_3) = 0\} = e^{-\lambda(\pi r^2 + 4dr)}.$$

Furthermore, we have

$$P_{XY}(1, 0) = P_{XY}(0, 1) = P_X(0) - P_{XY}(0, 0) = e^{-\lambda(\pi r^2 + 2dr)} - e^{-\lambda(\pi r^2 + 4dr)},$$

$$P_{XY}(1, 1) = 1 - P_{XY}(0, 1) - P_{XY}(1, 0) - P_{XY}(0, 0) = 1 + e^{-\lambda(\pi r^2 + 4dr)} - 2e^{-\lambda(\pi r^2 + 2dr)}.$$

Figure 2.45 plots the curves of I(L(e1,2), L(e2,3)) and I(U(e1,2), U(e2,3)) when r = 20 and 1/√n = 5 or 10. From Fig. 2.45, we obtain the following observations:

• I(U(e1,2), U(e2,3)) < I(L(e1,2), L(e2,3)) when λ is small; I(U(e1,2), U(e2,3)) > I(L(e1,2), L(e2,3)) when λ becomes large enough.
• Both I(U(e1,2), U(e2,3)) and I(L(e1,2), L(e2,3)) increase as 1/√n decreases.

When r = 20 and 1/√n = 10, λl = 0.000418 and λu = 0.000803. The corresponding values are I(L(e1,2), L(e2,3)) = 0.2634 and I(U(e1,2), U(e2,3)) = 0.123,


Fig. 2.45 Comparison between I(L(e1,2), L(e2,3)) and I(U(e1,2), U(e2,3)) with different 1/√n in omnidirectional sensor networks (r = 20)

respectively. This implies that the dependence between neighboring edges is weak, and thus our coverage percolation can be considered a good approximation of bond percolation. Similarly, we can derive I(X, Y) for directional sensor networks, and the results also show that the dependence between neighboring edges is weak.
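For readers who wish to reproduce these numbers, here is a minimal numerical check (our sketch) of Eq. (2.52) for the L-coverage case, using the closed-form probabilities just derived, with the natural logarithm and d = 1/√n:

```python
import math

def mutual_information_L(lam, r, d):
    p0 = math.exp(-lam * (math.pi * r ** 2 + 2 * d * r))   # P_X(0) = P_Y(0)
    p00 = math.exp(-lam * (math.pi * r ** 2 + 4 * d * r))  # P_XY(0,0)
    p10 = p0 - p00                                         # P_XY(1,0) = P_XY(0,1)
    p11 = 1 + p00 - 2 * p0                                 # P_XY(1,1)
    px = {0: p0, 1: 1 - p0}
    pxy = {(0, 0): p00, (0, 1): p10, (1, 0): p10, (1, 1): p11}
    return sum(p * math.log(p / (px[x] * px[y]))
               for (x, y), p in pxy.items() if p > 0)

# r = 20, 1/sqrt(n) = 10, lambda = lambda_l = 0.000418 gives roughly 0.26,
# consistent with the I(L(e_{1,2}), L(e_{2,3})) = 0.2634 reported above.
print(mutual_information_L(0.000418, 20.0, 10.0))
```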

2.6.6 Simulation Evaluations

We verify our model and analytical results on the exposure-path problem in both omnidirectional and directional sensor networks through simulations. As shown in Fig. 2.46a, we set the deployment region to be a 500 × 500 square, and the interval between neighboring vertexes to 1/√n = 10. All sensor deployments in the simulations follow a stationary Poisson point process. According to the covered regions of the sensors,

Fig. 2.46 Simulation platform. (a) Deployment of sensor nodes. (b) L-coverage lattice. (c) U-coverage lattice


we can build two kinds of coverage lattices: the L-coverage lattice (see Fig. 2.46b) and the U-coverage lattice (see Fig. 2.46c). In this section, we use a hat (e.g., λ̂) to denote the simulation result for a variable (e.g., λ). In the following, we use this simulation platform (sketched in code below) to obtain the experimental critical densities for omnidirectional and directional sensor networks, respectively.
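The compact and unoptimized sketch below is our own mirror of this platform for the omnidirectional case (helper names are hypothetical, and edges are tested at finitely many sample points): it draws a Poisson deployment, classifies lattice edges per Definition 2.13, and runs a breadth-first search for an L-open path crossing from the left side to the right side, i.e., an exposure path.

```python
import math, random
from collections import deque

SIZE, SPACING, R = 500, 10, 20        # region side, 1/sqrt(n), sensing radius
COLS = SIZE // SPACING                # vertices are indexed 0..COLS per side

def poisson(mean, rng):
    """Knuth's method; adequate for the moderate means used here."""
    l, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def deploy(lam, rng):
    n = poisson(lam * SIZE * SIZE, rng)
    return [(rng.uniform(0, SIZE), rng.uniform(0, SIZE)) for _ in range(n)]

def l_open(p, q, sensors, samples=5):
    """L-open: no sampled point on the edge is covered by any sensor."""
    for t in range(samples + 1):
        x = p[0] + (q[0] - p[0]) * t / samples
        y = p[1] + (q[1] - p[1]) * t / samples
        if any((x - sx) ** 2 + (y - sy) ** 2 <= R * R for sx, sy in sensors):
            return False
    return True

def has_exposure_path(lam, rng):
    sensors = deploy(lam, rng)
    vertex = lambda i, j: (i * SPACING, j * SPACING)
    seen = {(0, j) for j in range(COLS + 1)}   # start from the whole left side
    queue = deque(seen)
    while queue:
        i, j = queue.popleft()
        if i == COLS:
            return True                        # reached the right side
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if (0 <= ni <= COLS and 0 <= nj <= COLS and (ni, nj) not in seen
                    and l_open(vertex(i, j), vertex(ni, nj), sensors)):
                seen.add((ni, nj))
                queue.append((ni, nj))
    return False

rng = random.Random(7)
trials = 10
pn_hat = sum(not has_exposure_path(0.0007, rng) for _ in range(trials)) / trials
print("estimated P_N at lambda = 0.0007:", pn_hat)
```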

2.6.6.1 Omnidirectional Sensor Networks

Let r = 20. The number of sensors, Ns, varies from 100 to 280 in steps of 20. This implies that the density of sensor nodes, λ, varies from 0.0004 to 0.00128 in steps of 0.00008. For each value of λ, we generate 50 different (P, r, λ)'s randomly. If there is no exposure path, we call it a non-exposure case. The number of non-exposure cases divided by 50 is the probability that there is no exposure path, which is denoted by P̂N. For each (P, r, λ), we calculate the ratio of the number of closed edges to the number of all edges. We thus get 50 ratios for each different λ. The mean of these 50 ratios is the probability that an arbitrary edge ei,j in the lattice is closed. Figure 2.47 shows the relationship between λ and the probability that ei,j is closed. For the L-coverage lattice and the U-coverage lattice, we use Definitions 2.13 and 2.14 to determine whether ei,j is closed, respectively. We then get the analytical relationship plots for pl and po as shown in Fig. 2.47. The analytical pl and po are close to the simulated p̂l and p̂o, respectively. Moreover, we can obtain p̂u by simulations. Figure 2.47 indicates that p̂u is slightly larger than po. For each (P, r, λ), we get three different P̂N's corresponding to the continuum percolation, the L-coverage lattice, and the U-coverage lattice, denoted by P̂N,c, P̂N,l, and P̂N,u, respectively. From Fig. 2.47, λ̂l = 0.000410 and λ̂u = 0.000750. The corresponding P̂N,l and P̂N,u in Fig. 2.48 are 0.048 and 0.024, respectively. According to Eq. (2.44), we get λu = 0.000803 and λl = 0.000418. Compared to the analytical results for λu and λl, the simulation results are somewhat smaller.

Fig. 2.47 Relationship between λ and the probability that ei,j is closed in omnidirectional sensor networks


Fig. 2.48 Relationship between PN and λ in omnidirectional sensor networks

Fig. 2.49 The differences among PN,c , PN,l , and PN,u with different r’s. (a) r = 15. (b) r = 25

There are three main reasons for the difference between the analytical results and the simulation results:

1. the difference between pl and p̂l, and the difference between pu and p̂u;
2. the dependence among the pl's (pu's) of neighboring edges;
3. the boundary effect of the sensor deployment region.

Substituting r = 20 and 1/√n = 10 into Eq. (2.44), we obtain 0.000418 < λc < 0.000803. As shown in Fig. 2.48, the simulation result is λ̂c = 0.000680, i.e., the simulation result for λc is consistent with the analytical bounds given by Theorem 2.1. Using a similar simulation process, we also get the corresponding curves of P̂N,c, P̂N,l, and P̂N,u for different r's, as shown in Fig. 2.49. When r = 15, λ̂l = 0.000680, λ̂u = 0.001560, and λ̂c = 0.001117. When r = 25, λ̂l = 0.000241, λ̂u = 0.000473, and λ̂c = 0.000462. These results imply that:

1. The difference between λ̂l and λ̂u decreases as r increases.
2. λ̂c gets closer to λ̂u as r increases.


The direct reason is that the critical density decreases as r increases for a given Poisson Boolean disk model (P, r, λ). Another important reason is that po gets closer to pu as r increases, i.e., the upper bound of λc gets tighter as r increases.

2.6.6.2 Directional Sensor Networks

Let r = 20 and α = π/3. The number of sensors, Ns, varies from 250 to 800 in steps of 50. This implies that the density of sensor nodes, λ, varies from 0.001 to 0.0032 in steps of 0.0002. For each value of λ, we generate 50 different (P, r, α, λ)'s randomly. Figure 2.50 shows the relationship between λ and the probability that ei,j is closed. We can get the analytical relationship plots of pl and po according to Eqs. (2.48) and (2.49). From Fig. 2.50, pl is very close to p̂l, and p̂o is larger than po by about 3%. We can also obtain the probability that all points on ei,j are covered, p̂u, by simulations. p̂u is larger than po, and their difference increases slowly as λ increases. Compared to omnidirectional sensor networks, the difference between p̂u and po is larger.

By using a simulation process similar to that in Sect. 2.6.6.1, we can get the corresponding λ̂c, λ̂l, and λ̂u in directional sensor networks. Let r = 20 and α = π/3. As shown in Fig. 2.51a, λ̂c is about 0.001200. When p̂l = 0.5, the corresponding λ̂l = 0.000920 and P̂N,l = 0.38. When p̂u = 0.5, the corresponding λ̂u = 0.003120 and P̂N,u = 0.61. From Eq. (2.51), the analytical results for λl and λu are 0.000912 and 0.003205, respectively. The simulation results for λl and λu are very close to the analytical results. In order to show the effect of α, we also get the corresponding curves of P̂N,c, P̂N,l, and P̂N,u for different α's. From Fig. 2.51a–c, we have the following observations:

1. The difference between λ̂l and λ̂u decreases as α increases.
2. When α is small, the difference between λ̂c and λ̂l is much less than the difference between λ̂c and λ̂u, but λ̂c gets closer to λ̂u as α increases.

Fig. 2.50 Relationship between λ and the probability that ei,j is closed in directional sensor networks


Fig. 2.51 The differences among PN,c , PN,l , and PN,u with different α’s

There are also two main reasons for this. First, from the directional sensing model, the critical density decreases as α increases for a given Poisson Boolean sector model (P, r, α, λ). Second, po gets closer to pu as α increases, i.e., the upper bound of λc gets tighter as α increases.

References

Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A survey on sensor networks. IEEE Commun. Mag. 40(8), 102–114 (2002)
Broadbent, S.R., Hammersley, J.M.: Percolation processes, I and II. Math. Proc. Camb. Philos. Soc. 53(3), 629–641 (1957)
Chang, C., Aghajanet, H.: Collaborative face orientation detection in wireless image sensor networks. In: Proceedings of ACM SenSys Workshop on Distributed Smart Cameras (2006)
Chong, C., Kumar, S.: Sensor networks: evolution, opportunities, and challenges. In: Proceedings of the IEEE (2003)
Djidjev, H.: Efficient computation of minimum exposure paths in a sensor network field. In: Proceedings of International Conference on Distributed Computing in Sensor Systems (2007)
Djidjev, H.: Approximation algorithms for computing minimum exposure paths in a sensor field. ACM Trans. Sensor Netw. 7(3), 1–25 (2010)
Elson, J., Girod, L., Estrin, D.: Fine-grained network time synchronization using reference broadcasts. ACM SIGOPS Oper. Syst. Rev. 36(SI), 147–163 (2002)
Ercan, A.O., Yang, D.B., El Gamal, A., Guibas, L.J.: Optimal placement and selection of camera network nodes for target localization. In: Proceedings of International Conference on Distributed Computing in Sensor Systems (2006)
Ferrari, S., Foderaro, G.: A potential field approach to finding minimum-exposure paths in wireless sensor networks. In: Proceedings of IEEE International Conference on Robotics and Automation (2010)
Forsyth, D., Ponce, J.: Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, Englewood Cliffs (2002)
Gehrke, J., Madden, S.: Query processing in sensor networks. IEEE Pervasive Comput. 3(1), 233–244 (2004)
Glauche, I., Krause, W., Sollacher, R., Greiner, M.: Continuum percolation of wireless ad hoc communication networks. Phys. A: Stat. Mech. Appl. 325(3–4), 577–600 (2003)
Grimmett, G.: Percolation, 2nd edn. Springer, Berlin (1999)
Gui, C., Mohapatra, P.: Power conservation and quality of surveillance in target tracking sensor networks. In: Proceedings of ACM International Conference on Mobile Computing and Networking (2004)
Gupta, P., Kumar, P.R.: Critical power for asymptotic connectivity in wireless networks. In: Stochastic Analysis, Control, Optimization and Applications, pp. 547–566. Birkhäuser, Boston (1999)
Hall, P.: On continuum percolation. Ann. Probab. 13(4), 1250–1266 (1985)
Heinzelman, W., Murphy, A., Carvalho, H., Perillo, M.: Middleware to support sensor network applications. IEEE Netw. 18(1), 6–14 (2004)
Hongo, H., Murata, A., Yamamoto, K.: Consumer products user interface using face and eye orientation. In: Proceedings of IEEE International Symposium on Consumer Electronics (1997)
Hörster, E., Lienhart, R.: On the optimal placement of multiple visual sensors. In: Proceedings of ACM International Workshop on Video Surveillance and Sensor Networks (2006a)
Hörster, E., Lienhart, R.: Approximating optimal visual sensor placement. In: Proceedings of IEEE International Conference on Multimedia and Expo (2006b)
Howard, A., Matric, M.J., Sukhatme, G.S.: Mobile sensor network deployment using potential field: a distributed scalable solution to the area coverage problem. Distrib. Auton. Robot. Syst. 5, 299–308 (2002)
Kapoor, A., Picard, R.: Real-time, fully automatic upper facial feature tracking. In: Proceedings of IEEE International Conference on Automatic Face Gesture Recognition (2002)
Kesten, H.: The critical probability of bond percolation on the square lattice equals 1/2. Commun. Math. Phys. 74(1), 41–59 (1980)
Kesten, H.: Percolation Theory for Mathematicians, vol. 194. Birkhäuser, Boston (1982)
Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.S.: Real-time foreground-background segmentation using codebook model. Real-Time Imag. 11(3), 172–185 (2005)
Li, S.J., Xu, C.F., Wu, Z.H., Pan, Y.H.: Optimal deployment and protection strategy in sensor network for target tracking. Acta Electron. Sin. 34(1), 71–76 (2006)
Liu, M., Cao, J., Zheng, Y., Chen, L., Xie, L.: Analysis for multi-coverage problem in wireless sensor networks. J. Softw. 18(1), 127–136 (2007)
Liu, L., Zhang, X., Ma, H.: Localization-oriented coverage in wireless camera sensor networks. IEEE Trans. Wirel. Commun. 10(2), 484–494 (2011)
Liu, L., Zhang, X., Ma, H.: Percolation theory-based exposure-path prevention for wireless sensor networks coverage in internet of things. IEEE Sensors J. 13(10), 3625–3636 (2013)
Ma, H., Liu, Y.: Correlation based video processing in video sensor networks. In: Proceedings of IEEE International Conference on Wireless Networks, Communications and Mobile Computing (2005a)
Ma, H., Liu, Y.: On coverage problems of directional sensor networks. In: Proceedings of International Conference on Mobile Ad-Hoc and Sensor Networks (2005b)
Ma, H., Liu, Y.: Some problems of directional sensor networks. Int. J. Sensor Netw. 2(1–2), 44–52 (2007)
Ma, H., Zhang, X., Ming, A.: A coverage-enhancing method for 3D directional sensor networks. In: Proceedings of IEEE INFOCOM (2009)
Meester, R., Roy, R.: Continuum Percolation, vol. 119. Cambridge University Press, Cambridge (1996)
Megerian, S., Koushanfar, F., Qu, G., Potkonjak, M.: Exposure in wireless sensor networks. In: Proceedings of ACM MobiCom (2001)
Megerian, S., Koushanfar, F., Qu, G., Veltri, G., Potkonjak, M.: Exposure in wireless sensor networks: theory and practical solutions. Wirel. Netw. 8(5), 443–454 (2002)
Meguerdichian, S., Koushanfar, F., Potkonjak, M., Srivastava, M.: Coverage problems in wireless ad-hoc sensor network. In: Proceedings of IEEE INFOCOM (2001)
Penrose, M.D.: The longest edge of the random minimal spanning tree. Ann. Appl. Probab. 7, 340–361 (1997)
Piccardi, M.: Background subtraction techniques: a review. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics (2004)
Poduri, S., Sukhatme, G.S.: Constrained coverage for mobile sensor networks. In: Proceedings of IEEE International Conference on Robotics & Automation (2004)
Rama, S., Ramakrishnan, K.R., Atrey, P.K., Singh, V.K., Kankanhalli, M.S.: A design methodology for selection and placement of sensors in multimedia surveillance system. In: Proceedings of ACM International Workshop on Video Surveillance and Sensor Networks (2006)
Shakkottai, S., Srikant, R., Shroff, N.B.: Unreliable sensor grids: coverage, connectivity and diameter. In: Proceedings of IEEE INFOCOM (2003)
Sinopoli, B., Sharp, C., Schenato, L., Schaffert, S., Sastry, S.: Distributed control applications within sensor networks. In: Proceedings of the IEEE (2003)
Tao, D., Ma, H., Liu, Y.: Energy-efficient cooperative image processing in video sensor network. In: Proceedings of Pacific-Rim Conference on Multimedia. Lecture Notes in Computer Science (2005)
Tao, D., Ma, H., Liu, L.: Coverage-enhancing algorithm for directional sensor networks. In: Proceedings of International Conference on Mobile Ad-Hoc and Sensor Networks (2006)
Thomas, G.B., Finney, R.L.: Calculus and Analytic Geometry, 8th edn. Addison-Wesley, Reading (1996)
Tian, D., Georganas, N.D.: A coverage-preserving node scheduling scheme for large wireless sensor networks. In: Proceedings of ACM International Workshop on Wireless Sensor Networks and Applications (2002)
Tilak, S., et al.: Infrastructure tradeoffs for sensor networks. In: Proceedings of ACM International Workshop on Wireless Sensor Networks and Applications (2002)
Veltri, G., Huang, Q., Qu, G., Potkonjak, M.: Minimal and maximal exposure path algorithms for wireless embedded sensor networks. In: Proceedings of International Conference on Embedded Networked Sensor Systems (2003)
Wan, P., Yi, C.: Coverage by randomly deployed wireless sensor networks. IEEE Trans. Inf. Theory 52(6), 2658–2669 (2006)
Wang, Q., Chen, W., Zheng, R., Lee, K., Sha, L.: Acoustic target tracking using tiny wireless sensor devices. In: Proceedings of Information Processing in Sensor Networks (2003)
Wang, B., Wang, W., Srinivasan, V., Chua, K.C.: Information coverage for wireless sensor networks. IEEE Commun. Lett. 9(11), 967–969 (2005)
Wang, B., Chua, K.C., Srinivasan, V., Wang, W.: Information coverage in randomly deployed wireless sensor networks. IEEE Trans. Wirel. Commun. 6(8), 2994–3004 (2007)
Wang, R., Gao, Y., Wan, W., Mao, S.: Fuzzy information exposure paths analysis in wireless sensor networks. In: Proceedings of IEEE International Conference on Audio, Language and Image Processing (2010)
Zou, Y., et al.: Sensor deployment and target localization in distributed sensor networks. ACM Trans. Embed. Comput. Syst. 3(1), 61–91 (2004)

Chapter 3

Data Fusion Based Transmission in Multimedia Sensor Networks

3.1 Introduction

In the literature on wireless sensor networks, extensive research has been devoted to energy-efficient routing algorithms for data gathering (Heinzelman et al. 1999, 2000; Ahmed et al. 2003; Krishnamachari et al. 2002; Scaglione and Servetto 2002; Pattem et al. 2004; Zhang and Cao 2004a,b; Intanagonwiwat et al. 2002, 2003; Goel and Estrin 2003; Cristescu et al. 2004; Rickenbach and Wattenhofer 2004; Yu et al. 2004; Lindsey and Raghavendra 2002). While some of these approaches assume statistically independent information and have developed shortest-path-tree based routing strategies (Heinzelman et al. 1999; Ahmed et al. 2003), others have considered the more realistic case of correlated data gathering (Heinzelman et al. 2000; Krishnamachari et al. 2002; Scaglione and Servetto 2002; Pattem et al. 2004; Zhang and Cao 2004a; Intanagonwiwat et al. 2002, 2003; Goel and Estrin 2003; Cristescu et al. 2004; Rickenbach and Wattenhofer 2004; Yu et al. 2004; Lindsey and Raghavendra 2002). By exploiting data correlation and employing in-network processing, redundancy among sensed data can be curtailed and hence the network load can be reduced (Krishnamachari et al. 2002). The objective of sensor routing algorithms is then to jointly explore the data structure and the network topology to provide the optimal strategy for data gathering with as little energy as possible.

Regardless of the techniques employed, existing strategies miss one key dimension in the optimization space for routing correlated data, namely the data aggregation cost. Indeed, the cost of data aggregation may not be negligible for certain applications. For example, sensor networks monitoring field temperature may use simple average, max, or min functions, which are essentially of insignificant cost. However, multimedia sensor networks may require complex operations for data


fusion.¹ The energy consumption of a beamforming algorithm for acoustic signal fusion has been shown to be on the same order as that of data transmission (Wang et al. 2001). Moreover, encryption and decryption at intermediate nodes significantly increase the fusion cost in a hop-by-hop secure network, since the computational cost is on the scale of nJ per bit (Carman et al. 2000). In our own experimental study, we show that aggregation processes such as image fusion cost tens of nJ per bit, which is on the same order as the communication cost reported in the literature (Heinzelman et al. 2000; Wang et al. 2001). This means that while in-network data fusion can reduce data redundancy and hence curtail network load, the fusion process itself may introduce significant energy consumption for multimedia sensor networks with audio-visual data and/or security requirements. Therefore, fusion-driven routing protocols for multimedia sensor networks cannot optimize over communication cost only – fusion cost must also be accounted for. How to balance the aggregation cost and the transmission cost is the first challenge when designing an en-route fusion algorithm for gathering correlated data in multimedia sensor networks. The best adaptive fusion strategy should not only optimize information routes, but also embed the decisions as to when and where fusion shall be performed, in order to minimize the total network energy consumption. In this chapter, we present a routing algorithm, called Adaptive Fusion Steiner Tree (AFST) (Luo et al. 2006), for energy-efficient data gathering. Not only does AFST jointly optimize over the costs for both data transmission and fusion, but AFST also evaluates the benefit and cost of data fusion along information routes and adaptively adjusts whether fusion shall be performed at a particular node.

The data fusion routing can be modeled as the Steiner tree problem. The Steiner tree problem in graphs is well known to be NP-hard (Karp 1972), and therefore it is nearly impossible to find polynomial-time algorithms for solving it exactly. In 1968, the first Steiner tree approximation algorithm, the minimum spanning tree heuristic (Gilbert and Pollak 1968), was proposed; its approximation ratio² is 2. After that, many polynomial-time approximation algorithms for the Steiner tree problem in graphs were proposed (Zelikovsky 1993, 1996; Berman and Ramaiyer 1994; Karpinski and Zelikovsky 1997; Hougardy and Prömel 1999). In 2000, Robins and Zelikovsky incorporated the loss of a Steiner tree into the relative greedy algorithm and achieved the current best approximation ratio of 1.550 (Robins and Zelikovsky 2000). Another approach to the Steiner tree problem is to use bio-inspired optimization algorithms, such as genetic algorithms or viral systems (Voss and Gutenschwager 1999; Gendreau et al. 1999; Cortes et al. 2008). However, their algorithmic complexity is still high.

¹ In this chapter we consider "aggregation" and "fusion" interchangeable, denoting the data reduction process on intermediate sensor nodes.
² The quality of an approximation algorithm A is usually measured by its performance ratio R_A. For the Steiner tree problem, the performance ratio is defined as the maximum ratio between the solution returned by A and the optimum, i.e., R_A := sup{A(I)/OPT(I) : all instances I}.


Hence, it is still a big challenge to design practical optimization algorithms that provide low complexity and high parallelism for solving the Steiner tree problem. This chapter exploits a cellular computing model of the slime mold Physarum polycephalum to solve the Steiner tree problem. Inspired by the path-finding and network-formation capability of physarum, we developed an optimization algorithm, named physarum optimization, with low complexity and high parallelism (Liu et al. 2013). Complexity analysis shows that our proposed algorithm achieves good performance with low complexity. Moreover, the core mechanism of our physarum optimization may also provide a useful starting point for developing practical distributed algorithms for network design.

Data aggregation in multimedia sensor networks is beneficial not only for reducing the volume of transmitted data, thus saving energy effectively (Michaelides and Panayiotou 2009), but also for enhancing the accuracy of the gathered data. Users of applications are often concerned with whether the aggregated results are trustworthy, i.e., whether they reflect the real situation of the physical environment. Therefore, for aggregation-based information collection or event detection systems, it is important not only to gather comprehensive data, but also to reduce the impact of faulty and fake data, thus providing trusted and fault-tolerant data aggregation. At the same time, the trustworthiness of aggregated results should be reported to the users for decision making (Bao et al. 2012). In other words, providing reliable data with measurable trust is a key issue in the design of MSNs in order to improve the quality of information (QoI). In this chapter, we jointly consider data aggregation, information trust, and fault tolerance to enhance the correctness and trustworthiness of collected information. Based on the multi-layer aggregation architecture of MSNs, we designed a trust-based framework for data aggregation with fault tolerance, with the goal of reducing the impact of erroneous data and providing measurable trustworthiness for aggregated results (Sun et al. 2012). By extracting statistical characteristics from different sources and extending Josang's trust model, we propose how to compute the self data trust opinion, the peer node trust opinion, and the peer data trust opinion. According to the trust transfer and trust combination rules designed in our framework, we derive the trust opinion of the sink node on the final aggregated result. In particular, this framework can evaluate both discrete data and continuous media streams in MSNs through a uniform mechanism.

3.2 Adaptive Data Fusion for Energy Efficient Routing in Multimedia Sensor Networks

3.2.1 Motivation

The objective of fusion-driven routing algorithms is to jointly explore the data structure and network topology to provide the optimal strategy for data gathering


with as little energy as possible. Although most fusion-driven routing algorithms (Goel and Estrin 2003; Cristescu et al. 2004; Rickenbach and Wattenhofer 2004) explore data correlation to fully benefit from the information reduction resulting from data fusion, they neglect the data fusion cost, which may be as high as the transmission cost. Different from the transmission cost, which depends on the output of the fusion function, the fusion cost is mainly determined by the inputs of the fusion function. Therefore, in addition to the transmission cost, the fusion cost can significantly affect routing decisions when data aggregation is involved.

We first presented a randomized algorithm termed Minimum Fusion Steiner Tree (MFST) that jointly optimizes over both the fusion and transmission costs to minimize overall energy consumption (Luo et al. 2006). MFST is proved to achieve a routing tree that exhibits a (5/4) log(n + 1) approximation ratio to the optimal solution, where n denotes the number of source nodes. While MFST has been shown to outperform other routing algorithms, including the Shortest Path Tree (SPT), the Minimum Spanning Tree (MST), and the Shallow Light Tree (SLT), in various system settings, it assumes that aggregation is performed at the intersection nodes whenever data streams encounter each other. However, as we shall show below, such a strategy may introduce unnecessary energy consumption. Specifically, performing fusion at certain nodes may be less efficient than simply relaying the data directly.

Fig. 3.1 Illustration of fusion benefit, or disadvantage, in sensor networks

Figure 3.1 depicts a sensor network where sensor nodes are deployed on a grid and the sensed information of the source nodes is to be routed to sink t. Arrow lines form the aggregation tree in which nodes u and v initially aggregate the data of areas A and B, respectively. As the sink is far away, u and v further aggregate their data at v and then send one fused data stream to the sink. Assume each hop has identical unit transmission cost c0, the fusion cost is linear in the total amount of incoming data, and the unit fusion cost is q0. Let w(u) and w(v) respectively denote the amount of data at u and v before the aggregation between them. The amount of resultant aggregated data at v can be expressed as (w(u) + w(v))(1 − σuv), where σuv represents


the data reduction ratio owing to aggregation. In this scenario, if v performs data fusion, the total energy consumption of the route from v to t, assuming there are L hops in between, is Lc0(w(u) + w(v))(1 − σuv) + q0(w(u) + w(v)). On the contrary, if v does not perform data fusion, the total energy consumption of the same route is simply the total relaying cost, Lc0(w(u) + w(v)). To minimize the total energy consumption of the network, v should not perform data fusion as long as σuv < q0/(Lc0). This simple example reveals that to minimize the total network energy consumption, the decision at an individual node has to be based on the data reduction ratio due to aggregation, its related cost, and its effect on the communication costs at the succeeding nodes. Although the criterion can be easily obtained for this simple example, a sensor network confronting various aggregation/communication costs and data/topology structures will undoubtedly face dramatically more difficult fusion decisions.

To overcome this challenge, we propose the Adaptive Fusion Steiner Tree (AFST), a routing scheme that not only optimizes over both transmission and fusion costs, but also adaptively adjusts its fusion decisions for sensor nodes. By evaluating whether fusion is beneficial to the network based on the fusion/transmission costs and the network/data structures, AFST dynamically assigns fusion decisions to routing nodes during the route construction process. Analytically, we prove that AFST outperforms MFST. Through an extensive set of simulations, we demonstrate that AFST provides significant energy savings over MFST (up to 70%) and other routing algorithms under a wide range of system setups. By adapting both the routing tree and the fusion decisions to various network conditions, including fusion cost, transmission cost, and data structure, AFST provides a routing algorithm suitable for a broad range of applications. In particular, we prove that the routing tree resulting from AFST consists of two parts: a lower part where aggregation is always performed, and an upper part where no aggregation occurs. This result can be readily applied to designing clustering algorithms in sensor networks: based on where fusion stops, the network can be partitioned into clusters where data aggregation is confined to be within the clusters only.
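Returning to the example of Fig. 3.1, the fuse-or-relay decision is easy to state in code. A tiny sketch (ours; the parameter names are hypothetical) comparing the two route costs:

```python
def should_fuse(w_u, w_v, sigma_uv, q0, c0, L):
    """Fuse at v only if fusing beats plain relaying over the L remaining hops,
    which is equivalent to sigma_uv > q0 / (L * c0)."""
    cost_fuse = L * c0 * (w_u + w_v) * (1 - sigma_uv) + q0 * (w_u + w_v)
    cost_relay = L * c0 * (w_u + w_v)
    return cost_fuse < cost_relay

print(should_fuse(100, 100, 0.3, q0=1.0, c0=1.0, L=2))   # False: fusion wastes energy
print(should_fuse(100, 100, 0.3, q0=1.0, c0=1.0, L=10))  # True: a long haul favors fusion
```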

3.2.2 Measurement of Image Fusion Cost

We have performed a series of experiments to investigate the aggregation cost in multimedia sensor networks. Our platform is Sim-panalyzer (http://www.eecs.umich.edu), an extension of SimpleScalar (Austin et al. 2002) for power analysis. SimpleScalar is a well-established architectural simulator that provides cycle-level tool sets for detailed processor-architecture study. As its extension, Sim-panalyzer provides analytical power simulation models for the ARM CPU and the Alpha CPU. In our experiments, we choose the StrongARM SA-1110 as our embedded core, as it is commonly used for high-performance sensor node designs in academic research and industrial development.


Fig. 3.2 Measurement model for data aggregation


3.2.2.1 Measurement Model for Data Aggregation

We first detail the model of the data aggregation process used in our simulation. As illustrated in Fig. 3.2, we assume that each sensor is capable of data sensing, preprocessing, and data aggregation. A sensor node will compress its local raw data before transmitting it on the route back to the sink. If it receives data from another node, in compressed form, to perform data aggregation, the node shall decompress the data first and perform the designated aggregation algorithm on it together with its own raw data. The aggregated data will then be compressed and routed to the next hop. In this model, the aggregation cost is composed of two parts, marked in gray in Fig. 3.2: the cost of decompressing the input data and the cost of performing the fusion algorithm itself. In the following, we will focus only on these two parts.

The energy consumption of a system contains static power dissipation and dynamic power dissipation. Dynamic power dissipation denotes the application processing energy with a given instruction set and data set. Evidently, this part is the investigating target of our experiment. The model for dynamic power dissipation commonly employed in energy measurement (Wang et al. 2001) is

$$E_{total} = C_{total} V_{dd}^2 + V_{dd}\left(I_0\, e^{\frac{V_{dd}}{n V_T}}\right)\left(\frac{N}{f}\right), \tag{3.1}$$

where N represents the number of cycles for executing the algorithm with the given data set; it is determined by the algorithm complexity and affected by the compiling method. C_total denotes the total switched capacitance during the execution of the algorithm; C_total is proportional to N and is also affected by the switching-activity parameter. V_dd and f are the core supply voltage and the clock frequency of the CPU, respectively. V_T denotes the thermal voltage. I0 and n are core-specific; we assign them 1.196 mA and 21.26, respectively, according to Wang et al. (2001). The SA-1110 core can be configured to various combinations of supply voltage and system clock. In our experiment we use a 100 MHz clock and 1.5 V core voltage.
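As a back-of-the-envelope sketch of Eq. (3.1) with these SA-1110 settings (ours, not the book's code; the thermal voltage V_T ≈ 25.85 mV at room temperature and the example C_total value are our assumptions, not values given in the text):

```python
import math

def dynamic_energy(n_cycles, c_total, vdd=1.5, f=100e6,
                   i0=1.196e-3, n=21.26, vt=0.02585):
    """Eq. (3.1): switching energy plus leakage energy over n_cycles."""
    switching = c_total * vdd ** 2
    leakage = vdd * i0 * math.exp(vdd / (n * vt)) * (n_cycles / f)
    return switching + leakage

# e.g. 1e6 cycles with an assumed 1 uF of total switched capacitance:
print(dynamic_energy(n_cycles=1e6, c_total=1e-6))
```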


Fig. 3.3 Image Fusion Example. (a) Input image. (b) Local image. (c) Fused image

3.2.2.2 Image Fusion

One of our experiments is wavelet-based image fusion, which is generally considered the most efficient approach to image fusion; different variants have been proposed (Chipman et al. 1995). In our experiment, we simulate a simple and efficient method described in Chipman et al. (1995). According to the measurement model illustrated in Fig. 3.2, a fusion node uses the Discrete Wavelet Transform (DWT) to generate wavelet coefficients from its local image. To perform fusion with another node's data, the node uses Zerotrees expansion to decompress the input data into wavelet coefficients. Given the wavelet coefficients of the two input images, the averages of these coefficients are computed as those of the fusion result. While this algorithm is simple, it provides a lower bound on the computation cost; more complex algorithms for better fusion results will undoubtedly incur even higher cost. Consequently, the aggregation cost is the cost of the decompression process plus that of the merging process.³

The gray-scale images depicted in Fig. 3.3 are used in the simulation. Figure 3.3a is the input image, representing data from another node; notice that it has a blurred area at the top-left corner. Figure 3.3b is the local image at the fusion node, with a blurred area at the bottom-left corner. Figure 3.3c is the image fused by performing the aforementioned fusion algorithm on Fig. 3.3a and b. Notice that in the fused image, the blurred areas are effectively eliminated.

Figure 3.4a and b depict our experimental results. During our simulation, we scaled the image content from 64 × 64 pixels to 220 × 220 pixels. Reflected on the X axis of the figures is the total data size in bytes of the two input images of the fusion algorithm. As we perform a two-level DWT and apply the embedded coding scheme (Pennebaker and Mitchell 1993; Shapiro 1993) using Zerotrees, the compression ratio is approximately 8:1.

³ The cost of the final Zerotree compression on the merged data shall not be considered part of the fusion cost, as the node would perform Zerotree compression on its local data even if fusion were not performed.


Therefore, the X axis ranges from 1 KB (corresponding to 64 × 64 pixels) to 12 KB (corresponding to 220 × 220 pixels) as the summation of the two image sizes. The total image fusion energy consumption in Joules is shown in Fig. 3.4a, while the per-bit energy cost is shown in Fig. 3.4b. As we can see, the total image fusion cost monotonically increases with the input data size. Figure 3.4b shows that the per-bit energy cost remains roughly constant at 75 nJ/bit, comparable to the per-bit communication cost reported in the literature (Heinzelman et al. 2000; Wang et al. 2001). The irregular variations of the per-bit energy cost between consecutive measurements (image sizes) are due to the Zerotrees encoding scheme.

Fig. 3.4 Wavelet image fusion cost. (a) Energy cost. (b) Per bit energy cost

We have also performed a study on other fusion algorithms, such as byte-wise Huffman coding. The result is also on the order of tens of nJ/bit; we omit the details here due to space limits. From our experimental study, we can conclude that the fusion cost for certain sensor networks is comparable to the transmission cost. Therefore, the impact of the fusion cost on energy-efficient data gathering in sensor networks is an important area that needs to be carefully investigated.

3.2.3 System Model and Problem Formulation

3.2.3.1 Network Model

We model a sensor network as a graph G = (V, E), where V denotes the node set and E denotes the edge set representing the communication links between node pairs. We assume that a set S ⊂ V of n nodes are data sources of interest, whose sensed data need to be sent to a special sink node t ∈ V periodically. We refer to the period of data gathering as a round in this section. For a node v ∈ S, we define the node weight w(v) to denote the amount of information outgoing from v in every round. An edge e ∈ E is denoted by


e = (u, v), where u is the start node and v is the end node. The weight of edge e is equal to the weight of its start node, i.e., w(e) = w(u). Two metrics, t(e) and f(e), are associated with each edge, describing the transmission cost and the fusion cost on the edge, respectively.

The transmission cost, t(e), denotes the cost of transmitting w(e) amount of data from u to v. We abstract the unit cost of the link for transmitting data from u to v as c(e), and thus the transmission cost t(e) is

$$t(e) = w(e)\, c(e). \tag{3.2}$$

Notice that c(e) is edge-dependent and hence can accommodate various per-link conditions, for example, different distances between nodes and local congestion situations. The fusion cost, f(e), denotes the energy consumption of the fusion process at the end node v; f(e) depends on the amount of data to be fused as well as the algorithm utilized. In this section, we use q(e) to abstract the unit fusion cost on edge e. Since data fusion is performed by intermediate nodes to aggregate their own data with their children's, in order to avoid confusion, we use w′(·) to denote the temporary weight of a node before the current data fusion. Then the cost of fusing the data of nodes u and v at node v is given by

$$f(e) = q(e) \cdot \big(w(u) + w'(v)\big). \tag{3.3}$$

Key to a sensor data routing protocol with data fusion is the data aggregation ratio. Unfortunately, this ratio is heavily dependent on the application scenario. Here, we use an abstract parameter σ to denote the data reduction ratio due to aggregation. To be more specific, if node v is responsible for fusing node u's data (denoted by w(u)) with its own, we have w(v) = (w(u) + w′(v))(1 − σuv), where w′(v) and w(v) denote the data amount of node v before and after fusion, respectively. Notice that σuv may be different before and after the fusion process between node u and another node, as the weight w(u) will be changed by the fusion and the data correlation between nodes u and v will differ as well.

Due to the aggregation cost, node v may choose not to perform data aggregation in order to realize the maximum energy saving. Instead, it will simply relay the incoming data of node u. In this case, the new weight of node v is simply w(v) = w(u) + w′(v). Jointly considering both cases described above, we can summarize the aggregation function at node v as

$$w(v) = \big(w(u) + w'(v)\big)\big(1 - \sigma_{uv}\, x_{uv}\big), \tag{3.4}$$

where xuv ∈ {0, 1} denotes whether fusion occurs on edge e = (u, v).


3.2.3.2 Problem Formulation

Given the source node set S and the sink t, our objective is to design a routing algorithm that minimizes the energy consumption when delivering data from all source nodes in S to the sink t. Not only do we need to design routing paths back-hauling the sensed information driven by information aggregation, but we also have to optimize over the decisions as to whether aggregation shall occur or not at a particular node.

Mathematically, a feasible routing scheme is a connected subgraph G′ = (V′, E′), where G′ ⊂ G contains all sources (S ⊂ V′) and the sink (t ∈ V′). Depending on whether fusion is performed or not, the edge set E′ can be divided into two disjoint subsets Ef and En, where Ef = {e | e ∈ E′, xe = 1} and En = {e | e ∈ E′, xe = 0}. Our goal is to find a feasible subgraph G* such that

$$G^* = \arg\min_{G'} \left(\sum_{e \in E_f} \big(f(e) + t(e)\big) + \sum_{e \in E_n} t(e)\right). \tag{3.5}$$

3.2.4 Minimum Fusion Steiner Tree

3.2.4.1 MFST Algorithm

The Minimum Fusion Steiner Tree (MFST) is based on the techniques presented in Meyerson et al. (2000) and Goel and Estrin (2003). It first pairs up source nodes (or a source with the sink) based on defined metrics and then randomly selects a center node from each node pair. The weight of the non-center node is transferred to the center node, paying the appropriate transmission and fusion costs on that edge. Subsequently, the non-center node is eliminated, and the center nodes with aggregated weights are grouped as a new set of sources. This process is then repeated on the new set until the sink is the only remaining node. The algorithm is detailed below for the sake of completeness. In MFST, the size of the set Si is reduced by half after each iteration of the algorithm. Therefore, the process terminates after log(n + 1) iterations. In the remainder of this section, we call each iteration a "stage" of the algorithm.

MFST jointly considers both fusion and transmission costs. It has been shown to yield a (5/4) log(n + 1) approximation ratio to the optimal solution. Although extensive experiments (Luo et al. 2006) have shown that MFST can outperform other routing algorithms, including SLT, SPT, and MST, one optimizing dimension is still missing, namely the aforementioned fusion decisions at sensor nodes. As MFST requires fusion to be performed along a routing path whenever possible, unnecessary energy may be wasted due to the inefficiency of fusion, for example, little information reduction due to weak correlation combined with a high fusion cost. In particular, this phenomenon is magnified in the proximity of the sink itself. As aggregated information streams approach the sink, their correlation decreases, which yields only a small data reduction from fusion. At the same time, directly relaying the data does not incur high communication cost, as fewer hops are needed for relaying. Naturally, in this scenario, there is a high probability that direct relaying outperforms aggregation.

3.2 Adaptive Data Fusion for Energy Efficient Routing in Multimedia Sensor. . .

95

Algorithm 3: MFST Algorithm
1 Initialize the loop index i = 0. Define S_0 = S ∪ {t} and E* = ∅. Let w_0(v) for any v ∈ S denote its original weight, and let w_0(t) = 0.
2 for every pair of nodes (u, v) ∈ S_i do
3   if (u, v) is a non-sink pair then
4     α(w_i(u), w_i(v)) = w_i(u) w_i(v) (w_i(u) + w_i(v)) / (w_i²(u) + w_i²(v));
5   else if v is the sink t then
6     α(w_i(u), w_i(v)) = w_i(u);
7   find the minimum cost path (u, v) in G according to the metric M(e) = q(e)(w_i(u) + w_i(v)) + α(w_i(u), w_i(v)) c(e);
8   define K_i(u, v) to be the distance under metric M(e) of this path;
9 Find a minimum-cost perfect matching⁴ between nodes in S_i. Let (u_{i,j}, v_{i,j}) denote the j-th matched pair in S_i, where 1 ≤ j ≤ |S_i|/2. If there is only one non-sink node left after matching, match it to itself without any cost, and consider it as the last "single-node pair" in S_i;
10 for each matched pair (u, v) do
11   add the edges that are on the path defining K_i(u, v) to set E*;
12 for each pair of non-sink matched nodes (u, v) do
13   choose u to be the center with probability P(u = center) = w_i²(u) / (w_i²(u) + w_i²(v)); otherwise, node v will be the center;
14 for each pair of sink matched nodes (u, t) do
15   choose the sink t to be the center;
16 Transport the weight of each non-center node to its corresponding center node. The weight of the center satisfies w_{i+1}(center) = (w_i(u) + w_i(v))(1 − σ_{uv});
17 Remove all non-center nodes from S_i; the remaining center nodes induce S_{i+1};
18 if S_{i+1} contains only the sink then
19   return G* = (V*, E*), where E* is the set of edges we constructed and V* includes the source nodes and the sink;
20 else
21   increment i and return to step 2;
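The per-pair quantities used in one MFST stage can be written down directly; the sketch below (with our own helper names) shows the expected-weight factor α, the matching metric M(e), and the randomized center selection with its weight update.

import random

def alpha(wu, wv, v_is_sink=False):
    # Steps 4 and 6 of Algorithm 3: expected weight transferred for a pair.
    if v_is_sink:
        return wu
    return wu * wv * (wu + wv) / (wu ** 2 + wv ** 2)

def edge_metric(q_e, c_e, wu, wv, v_is_sink=False):
    # M(e) = q(e)(w_i(u) + w_i(v)) + alpha(w_i(u), w_i(v)) c(e).
    return q_e * (wu + wv) + alpha(wu, wv, v_is_sink) * c_e

def merge_pair(wu, wv, sigma_uv):
    # Steps 13 and 16: randomized center selection and weight update.
    p_u = wu ** 2 / (wu ** 2 + wv ** 2)
    center = 'u' if random.random() < p_u else 'v'
    return center, (wu + wv) * (1 - sigma_uv)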

3.2.4.2 3-D Binary Tree Structure

In order to make our analysis clearer, we use a 3-D binary tree structure to describe the process of the hierarchical matching technique used in MFST.

⁴ A minimum-cost perfect matching is a matching of edges in a graph that guarantees that the total cost (distance) over all pairs under M(e) is minimized. For polynomial-time algorithms for this problem, see Papadimitriou and Steiglitz (1998).


Fig. 3.5 Expression of data aggregation tree in a 3-D binary tree structure. Solid lines in (b) correspond to edges in (a) and dotted lines represent virtual edges

Figure 3.5 illustrates a mapping example of an original aggregation tree and its transformation. From bottom to top, the edges between two layers represent the result of node matching and center selection in each iteration of MFST. Assuming there are n sources in the aggregation tree G* obtained via MFST, to perform the transformation, we first clone N = log(n + 1) copies of G*, denoted by G_1, G_2, . . . , G_N. For convenience, we label the original G* as G_0 and arrange the copies into vertical layers as shown in Fig. 3.5b. For simplicity, we refer to node v's clone in layer k as v_k. Subsequently, we map the original aggregation tree G* shown in Fig. 3.5a to a new graph Ĝ* embedded into these clones; Ĝ* is the targeted 3-D binary tree. The process maps the result of each matching stage k in MFST to the edges between G_k and G_{k+1}. If we match u and v in stage k and v is selected as the center node in MFST, we first add an edge e linking v's clones in G_k and G_{k+1} with zero unit transmission cost c(e) = 0. This edge is termed a virtual edge (dotted line). Then we connect u_k to v_{k+1} with the same unit communication cost c(e) as in G*. The result of this transformation is a binary tree which is rooted at the sink in G_N and has all leaves in S residing in G_0. For each source v ∈ S, there is a path going through all clones of G*, from v in G_0 to the sink t in G_N, via exactly N hops. In order to guarantee that the resulting 3-D tree is binary, the number of source nodes is assumed to be a power of 2 minus 1. If this is not the case, dummy nodes with weight 0 can be created as needed to complete the binary tree, corresponding to the single-node pair matching cases. These dummy nodes have aggregation ratio 0 with any other nodes and incur zero fusion cost. As virtual edges and dummy nodes/edges incur zero cost for communication and fusion, the 3-D binary tree is equivalent to the original aggregation tree.

For each node v_k ∈ Ĝ* (the 3-D binary tree) in layer k, let w_{v_k} denote the temporary weight of node v after combining the incoming data from the lower layers (with or without data fusion). Each edge e_k = (u_k, v_{k+1}) ∈ Ĝ*, between the k-th layer and the (k + 1)-th layer, is characterized by four parameters: edge weight w_{e_k} = w_{u_k}, unit transmission cost c_{e_k}, data aggregation ratio σ_{e_k}, and unit fusion cost q_{e_k}. Note that virtual edges have c_{e_k} = 0 and σ_{e_k} = 1 (full aggregation), and dummy edges have w_{e_k} = 0 and σ_{e_k} = 0 (no aggregation). To incorporate the fusion decision, we use x_{e_k} ∈ {0, 1} to represent whether or not the information on edge e_k is fused at the end node of this edge. Therefore, finding the optimal routing structure is an optimization problem over the x_{e_k}'s as well as over the tree structure. Towards this end, we present Binary Fusion Steiner Tree (BFST), an approximate solution to this routing problem.
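The construction can be realized compactly; in the sketch below (the stage/record format is our own), each matching stage contributes one zero-cost virtual edge for the center's clone and one real edge for the non-center node.

def build_layered_tree(stages, unit_cost):
    # stages[k] : list of (u, v, center) triples matched in MFST stage k.
    # unit_cost : dict mapping (non_center, center) to the unit transmission
    #             cost c(e) of the chosen path in G*.
    # Nodes of the 3-D binary tree are (node_id, layer) pairs.
    edges = []
    for k, matches in enumerate(stages):
        for u, v, center in matches:
            non_center = u if center == v else v
            edges.append(((center, k), (center, k + 1), 0.0))  # virtual edge
            edges.append(((non_center, k), (center, k + 1),
                          unit_cost[(non_center, center)]))    # real edge
    return edges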

3.2.5 Design and Analysis of AFST

While MFST (Luo et al. 2006) provides an approximation routing algorithm that jointly optimizes over both the transmission and fusion costs with a proven performance bound, it lacks fusion decisions in the route construction. This motivates us to design the Adaptive Fusion Steiner Tree (AFST), which achieves significantly better performance than MFST owing to the incorporation of fusion decisions. In designing AFST to solve the optimization problem in (3.5), our approach is as follows. By exploiting certain network properties, we first propose a heuristic solution termed Binary Fusion Steiner Tree (BFST), which is analytically shown to perform better than MFST. However, BFST is still constrained to the tree structure obtained from MFST. By employing SPT rather than the structure obtained via MFST, where appropriate, we further improve BFST to AFST. As a result, we are able to analytically show that AFST is capable of achieving better performance than BFST.

3.2.5.1 Binary Fusion Steiner Tree (BFST)

In BFST, we first obtain a routing tree using the MFST algorithm, where fusion is performed at every intermediate node. Subsequently, we evaluate whether fusion at individual nodes reduces the energy consumption of the network. If not, the fusion process at the node is cancelled, and the data is instead directly relayed. In our analysis, we employ the 3-D binary tree described in the previous subsection. To simplify the analysis, we assume that in BFST the data reduction ratio σ is non-increasing on each path from a source to the sink while the unit fusion cost q is non-decreasing, excluding virtual edges. This assumption can be naturally justified. First, strong correlation, and thus a high aggregation ratio, is usually due to spatial correlation resulting from short distances between nodes. In turn, these short distances lead to small unit transmission costs. Based on the metric M(e) defined in MFST (which is a combination of fusion cost and transmission cost), MFST will match strongly correlated nodes before matching weakly correlated ones. Therefore, for edges on a source-sink path, the aggregation ratio of edges near the sink will not be larger than that of edges further away. Reflected on the 3-D binary tree, this leads to a non-increasing σ on a particular source-sink path. The reason for skipping virtual edges is that their data aggregation ratio is set to 1 and does not affect the actual energy consumption of the network. Second, the unit fusion cost q is determined mainly by the complexity of the fusion algorithm and the input data set. As the information is routed toward the sink, the data size and complexity naturally increase due to aggregation along the route. Therefore, performing fusion closer to the sink incurs more computation and hence more energy consumption per unit data.


Algorithm 4: BFST Algorithm
1 Run the MFST algorithm to obtain a routing tree with fusion at every node possible. Convert the resulting aggregation tree to the 3-D binary tree as described above;
2 for all edges in the tree do
3   set x_{e_k} = 1;
4 for all edges in the aggregation tree (from bottom to top, excluding virtual edges) do
5   calculate the fusion benefit of each edge, which represents the energy saved by data fusion on that edge. Let Δ_{u_k,v_{k+1}} denote the fusion benefit of edge e_k = (u_k, v_{k+1}). It is defined as
    Δ_{u_k,v_{k+1}} = (w_{u_k} + w_{v_k}) C(v_{k+1}, t_N) − [ (w_{u_k} + w_{v_k})(1 − σ_{e_k}) C(v_{k+1}, t_N) + q_{e_k}(w_{u_k} + w_{v_k}) ],
    where C(v_{k+1}, t_N) denotes the summation of unit transmission costs from v_{k+1} to t_N in MFST;
6   if Δ_{u_k,v_{k+1}} > 0 then
7     set x_{e_k} = 1;
8   else
9     set x_{e_k} = 0;
10 for all edges with x_{u_k,v_{k+1}} = 0 do
11   set x_{v_k,v_{k+1}} = 0 on their corresponding virtual edges;
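The fusion-benefit test of step 5 follows directly from its definition; a small sketch (argument names are ours):

def fusion_benefit(w_u, w_v, sigma_e, q_e, C_downstream):
    # Delta_{u_k, v_{k+1}}: energy saved by fusing on edge e_k, where
    # C_downstream is the summed unit transmission cost from v_{k+1} to t_N.
    relay = (w_u + w_v) * C_downstream
    fuse = (w_u + w_v) * (1 - sigma_e) * C_downstream + q_e * (w_u + w_v)
    return relay - fuse

def fusion_decision(w_u, w_v, sigma_e, q_e, C_downstream):
    # Steps 6-9 of Algorithm 4: x_e = 1 iff fusion saves energy.
    return 1 if fusion_benefit(w_u, w_v, sigma_e, q_e, C_downstream) > 0 else 0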


Fig. 3.6 A fraction of the binary fusion tree for BFST


Based on this assumption, we first introduce Lemma 3.1.

Lemma 3.1 x_e is non-increasing on each path from a source to the sink in BFST.

Proof Let Fig. 3.6 represent a branch of the binary fusion tree produced by BFST, in which solid lines represent actual edges in the aggregation tree, dotted lines denote virtual edges added for analysis, and dash-dotted lines are paths to the sink.


On any path from a source node to the sink, assume e_k is the first edge with x_{e_k} = 0. We enumerate the different cases.

Case 1: If e_k, the first edge not performing fusion, is not a virtual edge, there are two sub-cases depending on its succeeding edge e_{k+1}, as discussed below. If neither e_k nor its succeeding edge e_{k+1} is a virtual edge, as exemplified by e_k = (u_k, v_{k+1}) and e_{k+1} = (v_{k+1}, s_{k+2}) in Fig. 3.6, we have Δ_{u_k,v_{k+1}} ≤ 0. From Δ_{u_k,v_{k+1}} = (w_{u_k} + w_{v_k}) C(v_{k+1}, t_N) − [ (w_{u_k} + w_{v_k})(1 − σ_{e_k}) C(v_{k+1}, t_N) + q_{e_k}(w_{u_k} + w_{v_k}) ], we have σ_{u_k,v_{k+1}} C(v_{k+1}, t_N) ≤ q_{u_k,v_{k+1}}. Since σ_{u_k,v_{k+1}} ≥ σ_{v_{k+1},s_{k+2}} and q_{u_k,v_{k+1}} ≤ q_{v_{k+1},s_{k+2}} by our assumption, and the total unit transmission cost from v_{k+1} to t_N exceeds that from s_{k+2} to t_N, i.e., C(v_{k+1}, t_N) > C(s_{k+2}, t_N), we can infer that σ_{v_{k+1},s_{k+2}} C(s_{k+2}, t_N) < σ_{u_k,v_{k+1}} C(v_{k+1}, t_N) ≤ q_{v_{k+1},s_{k+2}}. This leads to Δ_{v_{k+1},s_{k+2}} < 0. Consequently, we have x_{v_{k+1},s_{k+2}} = 0. If e_k is not a virtual edge but its succeeding edge e_{k+1} is, as exemplified by e_k = (p_k, s_{k+1}) and e_{k+1} = (s_{k+1}, s_{k+2}), the same conclusion can be obtained similarly.

Case 2: If e_k is a virtual edge, as exemplified by e_k = (v_k, v_{k+1}), then according to the BFST algorithm, its matching pair edge e′_k = (u_k, v_{k+1}) must have x_{u_k,v_{k+1}} = 0. From the result of Case 1, we also have x_{v_{k+1},s_{k+2}} = 0.

Inductively, we can conclude that all edges succeeding an edge that does not perform fusion will not perform fusion either. Since the fusion decision x_{e_k} ∈ {0, 1}, it follows that x_e is non-increasing on any path from a source to the sink. □

Theorem 3.1 The total cost of BFST is no more than that of MFST.

Proof Since BFST retains the same tree structure as MFST, for all edges with x_{e_k} = 1, the BFST and MFST schemes consume the same amount of energy. For any edge with x_{e_k} = 0, owing to Lemma 3.1, every edge e_i on the path from this edge to the sink satisfies x_{e_i} = 0. This means that all edges on the path after e_k have a negative fusion benefit; in other words, performing fusion there would introduce additional cost. Therefore, BFST improves on MFST by avoiding fusion whenever direct relaying is the better choice. □

Intuitively, Lemma 3.1 shows that the routing tree generated by BFST can be divided into two parts: a lower part where data aggregation is always performed and an upper part where direct relaying is employed. As no data aggregation is performed in the upper part of the tree, instead of sticking to the MFST structure, we can further improve the routing structure to reduce energy consumption. Inspired thereby, we develop AFST.

3.2.5.2 Adaptive Fusion Steiner Tree (AFST)

AFST further improves BFST by introducing SPT into the routing tree. Similar to BFST, it performs a matching process as in MFST in order to jointly optimize over both transmission and fusion costs. During the matching process, it also dynamically evaluates whether fusion shall be performed. If it is determined at a particular point that fusion is not beneficial to the network, then, as shown by the analysis of BFST, we can conclude that no succeeding node on the routing path shall perform fusion either. Consequently, we can employ SPT as the strategy for the remainder of the route, as SPT is optimal for routing information without aggregation. Our analysis shows that AFST achieves better performance than BFST, and hence than MFST. The size of the set S_i is reduced by at least half after each iteration of the algorithm; however, the process may terminate sooner than MFST and BFST if fusion is deemed unworthy in the early iterations.

Theorem 3.2 The total cost of AFST is no more than that of BFST.

Proof The tree resulting from AFST also contains a lower part where aggregation is always performed and an upper part where no aggregation occurs. The lower part of AFST is the same as that of BFST due to their common MFST-based matching procedure and thus incurs the same cost. The remaining task is then to show that, for any non-fusion pair (u, v) satisfying the inequalities Δ_{u_i,v_{i+1}} < 0 and Δ_{v_i,u_{i+1}} < 0, the transmission costs based on SPT in AFST are no more than the corresponding routing costs, including fusion and transmission costs, incurred in BFST. The proof is given below. From Δ_{u_i,v_{i+1}} < 0 and Δ_{v_i,u_{i+1}} < 0, we have σ_{u_k,v_{k+1}} SP(v_k, t) < q_{u_k,v_{k+1}} and σ_{v_k,u_{k+1}} SP(u_k, t) < q_{v_k,u_{k+1}}, where SP(v_k, t) denotes the summation of unit transmission costs from v_k to the sink t along the shortest path. Without loss of generality, assume that v is selected as the center in BFST; our goal is then to prove that

w_{v_k} SP(v_k, t) + w_{u_k} SP(u_k, t) < w_{u_k} c(u_k, v_{k+1}) + q_{u_k,v_{k+1}}(w_{u_k} + w_{v_k}) + (w_{u_k} + w_{v_k})(1 − σ_{u_k,v_{k+1}}) SP(v_{k+1}, t).     (3.6)

For that, we have

w_{v_k} SP(v_k, t) + w_{u_k} SP(u_k, t)
  ≤ w_{v_k} SP(v_{k+1}, t) + w_{u_k} (c(u_k, v_{k+1}) + SP(v_{k+1}, t))
  = (w_{u_k} + w_{v_k}) SP(v_{k+1}, t) + w_{u_k} c(u_k, v_{k+1})
  ≤ (w_{u_k} + w_{v_k})(1 − σ_{u_k,v_{k+1}}) SP(v_{k+1}, t) + q_{u_k,v_{k+1}}(w_{u_k} + w_{v_k}) + w_{u_k} c(u_k, v_{k+1}). □


Algorithm 5: AFST Algorithm
1 Initialize the loop index i = 0. Define S_0 = S ∪ {t} and E* = ∅. Let w⁰_v for any v ∈ S equal its original weight, and let w⁰_t = 0;
2 for every pair of non-sink nodes (u, v) ∈ S_i do
3   find the minimum cost path (u, v) in G according to the metric M(e) = q(e)(w_i(u) + w_i(v)) + α(w_i(u), w_i(v)) c(e);
4   define K_i(u, v) to be the distance under metric M(e) of this path;
5 Find a minimum-cost perfect matching between nodes in S_i;
6 if there is only one non-sink node left after matching then
7   match it to itself without any cost, and consider it as the last "single-node pair" in S_i;
8 for each matched pair (u, v) do
9   calculate the fusion benefit for nodes u and v respectively according to the new definition
    Δ_{u_i,v_{i+1}} = (w_{u_i} + w_{v_i}) SP(v_i, t) − [ q_{e_i}(w_{u_i} + w_{v_i}) + (w_{u_i} + w_{v_i})(1 − σ_{e_i}) SP(v_i, t) ],
    where SP(v_i, t) denotes the summation of unit transmission costs from v_i to the sink t along the shortest path;
10 We call (u, v) a non-fusion pair if there is no fusion benefit regardless of which node is selected as the center, i.e., if both Δ_{u_i,v_{i+1}} < 0 and Δ_{v_i,u_{i+1}} < 0. Otherwise, we call it a fusion pair;
11 for each non-fusion pair (u, v) do
12   add the edges on the shortest paths of (u, t) and (v, t) to set E*_n;
13   remove both nodes u and v from S_i;
14 for each fusion pair (u, v) do
15   add the edges on the path defining K_i(u, v) to set E*_f;
16   choose u to be the center with probability P(u = center) = w²_{u_i} / (w²_{u_i} + w²_{v_i}); otherwise, v will be the center. For a pair (u, t), choose the sink t to be the center;
17   transport the weight of the non-center node to its corresponding center node. According to Equation (3.4), the weight of the center satisfies w_{i+1}(center) = (w_{u_i} + w_{v_i})(1 − σ_{u_i,v_{i+1}});
18   remove all non-center nodes from S_i; the remaining center nodes induce S_{i+1};
19 if S_{i+1} is empty or contains only the sink then
20   return G* = (V*, E*) with E* = E*_f ∪ E*_n, where E*_f and E*_n are the sets of fusion edges and non-fusion edges, respectively, and V* includes the source nodes and the sink;
21 else
22   increment i and return to step 2;

When the fusion benefit turns out to be positive for every node, the tree structure obtained by AFST degenerates to the tree from MFST. However, when fusion is not beneficial for all nodes, AFST stops performing unprofitable data fusion and delivers the data directly to the sink for greater energy saving; as a result, it can significantly outperform MFST. From the route construction process, we can see that AFST dynamically assigns fusion decisions to routing nodes during route construction by evaluating whether fusion benefits the network, based on the fusion/transmission costs and the network/data structures.


3.2.6 Experimental Study

We select MFST, SPT, MST, and SLT to represent the class of routing schemes in which fusion occurs at all routing nodes whenever possible, and compare the performance of AFST with these routing algorithms. For routing schemes that do not perform aggregation, we employ SPT, as it is the optimal routing strategy in this class. To distinguish it from the SPT scheme in which data aggregation opportunistically occurs where information streams intersect, we denote the SPT variant that performs no aggregation by SPT-nf, short for SPT-no-fusion.

3.2.6.1 Simulation Environment

In our setup, 100 sensor nodes are uniformly distributed in a 50 × 50 m square region. We assume that each node produces one 400-byte packet of original sensed data in each round and sends the data to the sink located at the bottom-right corner. All sensors act as both sources and routers. We assume the maximal communication radius of a sensor is r_c, and instantiate the unit transmission cost on each edge, c(e), using the first-order radio model presented in the work of Heinzelman et al. (2000). According to this model, the transmission cost for sending one bit from one node to another at distance d is given by βd^γ + ε when d < r_c, where γ and β are tunable parameters based on the radio propagation. We set γ = 2 and β = 100 pJ/bit/m² to calculate the energy consumption of the transmit amplifier. ε denotes the energy consumption per bit of the transmitter and receiver circuits. Typical values of ε range from 20 to 200 nJ/bit according to the work of Wang et al. (2001); we set it to 100 nJ/bit in our simulation. We model data reduction due to aggregation based on the correlation among the sensed data. The correlation model employed here is an approximated spatial model in which the correlation coefficient (denoted by ρ) decreases with the distance between two nodes, provided that they are within the correlation range r_s. If two nodes are more than r_s apart, the correlation coefficient is simply ρ = 0; otherwise, it is given by ρ = 1 − d/r_s, where d denotes the distance between the nodes. If node v is responsible for fusing node u's data (denoted by w(u)) with its own, we assume that the weight of node v after fusion is given by w(v) = max(w(u), w′(v)) + min(w(u), w′(v))(1 − ρ_{uv}), where w′(v) and w(v) respectively denote the data amount of node v before and after fusion. In our simulation, we use ρ to describe the impact of the data structure. For the fusion cost, we assume that q is constant and use ω to denote the average fusion cost per bit at each node.
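These models are straightforward to instantiate; the following sketch uses the stated parameter values (γ = 2, β = 100 pJ/bit/m², ε = 100 nJ/bit) and is a reading aid rather than the simulator used for the experiments.

def tx_cost_per_bit(d, gamma=2.0, beta=100e-12, eps=100e-9):
    # First-order radio model: energy (J) to send one bit over d meters.
    return beta * d ** gamma + eps

def correlation(d, r_s):
    # Approximated spatial model: rho decreases linearly within range r_s.
    return 1.0 - d / r_s if d < r_s else 0.0

def fused_weight(w_u, w_v_before, rho_uv):
    # Weight of node v after fusing node u's data in the simulation model.
    return max(w_u, w_v_before) + min(w_u, w_v_before) * (1.0 - rho_uv)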

3.2.6.2 Impact of Correlation Coefficient

Naturally, performing aggregation is futile in a network with no data redundancy (ρ = 0). Even in a network with 100% data redundancy (ρ = 1), data fusion at all possible nodes may not bring a benefit, because of the high fusion cost. In this simulation, we fix the sensor node transmission range and study the impact of the correlation coefficient on the performance of AFST. We increase the correlation range r_s from 0.2 to 2000 m, which corresponds to varying ρ from 0 to 1. Figure 3.7a and b depict the total energy costs under light and heavy fusion cost, respectively, for AFST and the other algorithms, MFST, SPT, SLT, and SPT-nf. Naturally, the costs of all algorithms with data fusion decrease as ρ increases. This exemplifies that data aggregation in sensor networks can greatly benefit the routing performance by reducing redundancy among correlated data. As SPT-nf totally ignores data aggregation, its cost remains constant. When the data correlation in the network is very weak (ρ → 0), AFST follows SPT-nf, the optimal solution, by avoiding any data aggregation. When the data correlation in the network increases, AFST dynamically adjusts its decisions accordingly. We also observe that the fusion cost affects AFST as well. As shown in Fig. 3.7a, when the fusion cost is low, AFST performs data fusion partially in order to benefit from data aggregation and the resultant data reduction, even when the correlation in the network is weak (r_s being 4 m). When the data in the network is highly correlated (ρ → 1), AFST follows MFST to pursue the greatest energy saving by performing data fusion at each possible node. On the other hand, Fig. 3.7b illustrates the performance of AFST when the fusion cost is high. As we can see, even when ρ → 1, only a few nodes (about 15%) perform data fusion. This shows that AFST chooses not to perform fusion most of the time, as the fusion cost itself is too high, overwhelming the benefit of the reduction in data and communication costs.


Fig. 3.7 Impact of the average correlation coefficient on energy consumption (r_c = 30 m, r_s = 0.25 ∼ 2000 m). (a) Low fusion cost (ω = 50 nJ/bit). (b) High fusion cost (ω = 120 nJ/bit)


Figure 3.7a and b also demonstrate that the cost of AFST is remarkably smooth over the whole range of the correlation coefficient and steadily outperforms the others. As the correlation among nodes often varies from application to application, from node to node, and even from time to time, only a general algorithm such as AFST, optimized for a wide range of ρ, can accommodate these versatile scenarios.

3.2.6.3 Impact of Unit Fusion Cost

In this set of experiments, we study the impact of varying the unit fusion cost on the algorithms. Figure 3.8 illustrates the results when ω, the unit fusion cost, increases from 10 to 200 nJ/bit. Figure 3.8a shows the total cost of AFST compared with the other algorithms as the unit fusion cost increases. As we can see, the total costs of MFST and SLT increase unboundedly with ω, even though MFST has a lower slope. On the contrary, AFST first follows the performance curve of MFST and then leans towards SPT-nf, the optimal solution when the fusion cost is high. The figure is best explained when examined jointly with Fig. 3.8b, which depicts the number of clusters, i.e., the branches of the routing tree that always perform aggregation at their nodes. The algorithms with network-wide fusion are unable to stop fusing even when the fusion cost is extremely high. For AFST, in contrast, as shown in Fig. 3.8b, when ω is very small there are only two fusion clusters. This indicates that data fusion is performed at almost all nodes, taking advantage of the low fusion cost. When ω increases, AFST increases the number of fusion clusters and hence reduces the number of fusions, reflecting the reduced fusion benefit, in order to balance the fusion cost and the transmission cost. And when ω is too large, AFST achieves the same constant cost as SPT-nf by completely stopping data fusion.


Fig. 3.8 Impact of the unit fusion cost on energy consumption (r_c = 30 m, r_s = 20 m, ω = 10 ∼ 200 nJ/bit). (a) Total cost. (b) Number of fusion clusters


As described in Sect. 3.2.3, the fusion cost may vary widely from network to network and from application to application. As an example, a temperature surveillance sensor network may incur little fusion cost to calculate the maximum, minimum, or average temperature. On the other hand, a wireless video sensor network may incur a significant fusion cost when performing image fusion. Our experiments show that, among all the algorithms, AFST adapts best to a wide range of fusion costs and hence is applicable to a variety of applications.

3.3 Physarum Optimization: A Biology-Inspired Algorithm for the Steiner Tree Problem in Networks

3.3.1 Motivation

As illustrated in the last section, data fusion routing can be modelled as the Steiner tree problem, an important NP-hard problem that arises in numerous applications, especially in network design. Given an arbitrary weighted graph with a distinguished vertex subset, the Steiner tree problem, named after Jakob Steiner, is to find a minimum-cost subtree spanning the distinguished vertices (Gilbert and Pollak 1968). Finding the minimum-cost Steiner tree is important for many applications, such as VLSI physical design (Cong et al. 1998), FPGA routing placement (Alexander et al. 1998), telecommunication network design (Cheng and Du 2001), keyword-based selection of relational databases (Yu et al. 2007), data-centric routing in wireless sensor networks (Krishnamachari et al. 2002), multicast packing (Oliveira and Pardalos 2005; Wang et al. 2004), network topology control (Misra et al. 2009; Mao et al. 2012), and the design of access strategies for ISP networks (Chiaraviglio et al. 2012). However, the Steiner tree problem is NP-hard even in the Euclidean and rectilinear metrics.⁵ Therefore, many approximation algorithms for the Steiner tree problem in graphs have been proposed that run in polynomial time and return results not far from the optimum. The first Steiner tree approximation algorithm is the minimum spanning tree heuristic, apparently first mentioned in the work of Gilbert and Pollak (1968). The idea is simply to compute a minimum spanning tree instead of a minimum-cost Steiner tree. The Steiner tree (minimum spanning tree) obtained by this algorithm is at most twice as long as the optimum, i.e., the approximation ratio⁶ is 2.

⁵ Most versions of the Steiner tree problem are NP-hard. In fact, one of them was among Karp's original 21 NP-hard problems (Karp 1972). A few restricted cases can be solved in polynomial time.


For more than twenty years, no algorithm achieving a better approximation ratio was found. In 1990, Zelikovsky proposed a simple greedy algorithm using the idea of 3-Steiner trees, which has an approximation ratio of 1.834 (Zelikovsky 1993). This approach was extended by Berman and Ramaiyer using k-Steiner trees; they designed a family of algorithms which achieves an approximation ratio of 1.734 for large k (Berman and Ramaiyer 1994). Since then, all approximation algorithms for the Steiner tree problem have used Zelikovsky's idea. In the work of Zelikovsky (1996), Zelikovsky presented the relative greedy algorithm, which achieves an approximation ratio of 1.694. In the paper of Karpinski and Zelikovsky (1997), Karpinski and Zelikovsky introduced the concept of the loss of a Steiner tree, defined as the cost of the minimum spanning forest in the Steiner tree. They obtained an approximation ratio of 1.644 with an algorithm that minimizes the sum of the edge cost and the loss of a Steiner tree. Their idea was generalized by Hougardy and Prömel, resulting in an approximation ratio of 1.598 (Hougardy and Prömel 1999). In 2000, Robins and Zelikovsky incorporated the loss of a Steiner tree into the relative greedy algorithm and achieved the current best approximation ratio of 1.550 (Robins and Zelikovsky 2000). Voss and Gutenschwager (1999) presented a chunking-based genetic algorithm for the Steiner tree problem in graphs. It extends the simple GA, with chunking serving as a runtime improvement. This chunking-based algorithm consumed less time than the simple GA based on the shortest-distance-graph heuristic in the average case, and its solution quality was not significantly different from that of the slower GA. Gendreau et al. (1999) proposed a tabu search algorithm, Tabusteiner, to solve the Steiner tree problem. The main features of Tabusteiner are a sophisticated strategy for quickly obtaining a solution and diversification mechanisms. Based on a library of medium-to-large-sized problems, the authors compared Tabusteiner with the best two genetic algorithms found in the literature; the results illustrate that Tabusteiner outperforms the genetic algorithms. Cortes et al. (2008) presented a new bio-inspired optimization approach, viral systems, to deal with the Steiner tree problem. They also applied it to a library of medium-to-large-sized cases and compared it with the metaheuristics that had provided the best results for the Steiner problem. The VS algorithm provides better solutions than genetic algorithms, and solutions of quality similar to those of the most sophisticated tabu search approaches.

On the other hand, Physarum polycephalum is a large, single-celled amoeba-like organism. Its body contains a tubular network through which nutrients, signals, and body mass are transported (see Fig. 3.9a). In the past decade, this organism has been well studied from a computational point of view. T. Nakagaki and his co-workers found that the physarum is able to determine the shortest path through a maze, as well as connect different arrays of food sources (FSs) in an efficient manner, with a low total length yet a short average minimum distance between pairs of FSs (Nakagaki et al. 2000, 2001).

⁶ The quality of an approximation algorithm A is usually measured by its performance ratio R_A. For the Steiner tree problem, the performance ratio is defined as the maximum ratio between the solution returned by A and the optimum, i.e., R_A := sup { A(I)/OPT(I) : all instances I }.


Fig. 3.9 Photographs of the physarum polycephalum. (a) The tubular body of a physarum. (b) Example of maze-solving by a physarum. (This photograph comes from Nakagaki et al. 2000.) (c) Examples of connecting paths in uniformly/nonuniformly illuminated fields. (These photographs come from Nakagaki et al. 2007.) (d) A tubular network formed by the physarum for multiple food sources, which could be applied to Tokyo rail system design. (This photograph comes from Tero et al. 2010.)

The maze-solving behavior of a physarum is shown in Fig. 3.9b. In Fig. 3.9c, the upper picture shows that when two FSs are presented to the physarum in the dark, a thick tube for absorbing nutrients is formed that connects the FSs through the shortest route. Because the physarum is photophobic, when the organism is illuminated by an inhomogeneous light field, the tubes connecting the FSs do not follow the simple shortest path but rather react to the illumination inhomogeneity (see the lower picture in Fig. 3.9c). From experimental observations, researchers have found that the physarum can find the risk-minimum path in an inhomogeneous field of risk (Nakagaki et al. 2007). Researchers have also found that when multiple (more than two) FSs are presented at different points on the physarum, the organism constructs a network appropriate for maximizing the nutrient uptake from the FSs. As shown in Fig. 3.9d, they observed a physarum connecting a template of 36 FSs that represented the geographical locations of cities in the Tokyo area, and compared the result with the actual rail network in Japan (Tero et al. 2010). Their findings indicate that the physarum can form a tubular network linking the FSs through direct connections and additional intermediate junctions that reduce the overall length of the connecting network. Therefore, the core mechanisms captured in the network formation of physarum may be useful to guide Steiner tree construction in networks. Before extracting a computing method from the physarum, it is necessary to answer one fundamental question regarding the information processing of physarum: how does the organism realize its capacity for path-finding and network formation? Tero et al. (2007, 2008) provide the answer from a mathematical point of view. Although the proposed physarum model gives many insights into the physarum's behaviors, parts of their model only focus on using computational and mathematical methods to analyze biological phenomena. This is the reverse of using insights from biology to advance computational solutions.


Therefore, the existing physarum model fails to describe some important characteristics of information processing in the physarum, such as its locality and parallelism. Specifically, in the physarum organism, the information used for the dynamics of each tube is not global but local, and the organism can thus perform in the style of parallel computing. However, Nakagaki et al. (2009) point out that the numerical scheme used in their model simulation needs global information and does not apply parallel processing. By advancing the physarum model in the literature, this section designs a practical biology-inspired algorithm, named physarum optimization, that combines low complexity and high parallelism for solving the Steiner tree problem. We further present two schemes, an edge-cutting scheme and a feedback-adjusting scheme, to accelerate convergence. More importantly, our proposed algorithm can be implemented in parallel, and we also give the parallel implementation of physarum optimization. We further compare our proposed algorithm with the approximation algorithms that achieve the current best approximation ratio and with the best metaheuristic approximations. Complexity analysis shows that our algorithm has lower complexity while achieving performance similar to the classical algorithms in most cases.

3.3.2 Biology-Inspired Optimization and Physarum Computing

Many biological processes in nature can be considered processes of constrained optimization, and inspiration from biology has led to some successful algorithms. Biology-inspired algorithms have been applied to numerous applications and have achieved tremendous success. Two typical categories of biology-inspired algorithms are evolutionary algorithms and swarm intelligence algorithms, inspired by natural evolution and by the collective behaviors of swarms of animals, respectively. Evolutionary Algorithms (EAs), inspired by the biological mechanisms of evolution, use an iterative process comprising growth, development, reproduction, selection, and survival in a population. The common underlying idea behind EAs is that environmental pressure causes natural selection (the principle of Charles Darwin's theory of the survival of the fittest), and this causes a rise in the fitness of the population. As the most successful algorithm among EAs, the Genetic Algorithm (GA) proposed by Holland is an evolution-based stochastic optimization algorithm with a global search for generating useful solutions (Holland 1973). The Artificial Immune System (AIS) (Farmer et al. 1986), inspired by the principles and processes of the human immune system, is a highly evolved, distributed, adaptive system. It has been gaining significant attention due to its powerful adaptive learning and memory capabilities. The Viral System (VS) (Cortes et al. 2008), inspired by the behavior of viruses, is an approach for solving combinatorial problems. In this approach, the replication mechanism as well as the hosts' infection processes is used to generate a metaheuristic that allows valuable results to be obtained.


Swarm Intelligence (SI) encompasses the implementation of collective intelligence based on the behavior of insect swarms. As the most famous swarm intelligence algorithm, Ant Colony Optimization (ACO) was proposed by Dorigo in 1999 (Dorigo et al. 1999). It is a metaheuristic used for complex combinatorial optimization problems; ACO models and simulates ant foraging behavior, brood sorting, nest building, and self-assembling. Particle Swarm Optimization (PSO) (Kennedy and Eberhart 1995) is a stochastic population-based global optimization technique, inspired by the social behavior of bird flocks searching for food. The Artificial Bee Colony (ABC) (Karaboga and Basturk 2007) is a prominent algorithm which simulates the intelligent foraging behavior of a honeybee swarm with three groups of bees: employed bees, onlookers, and scouts. The Fish Swarm Algorithm (FSA) (Li et al. 2002) is inspired by the natural schooling behavior of fish; FSA presents a strong ability to avoid local minima in order to achieve global optimization.

In recent years, cellular computing, which models computation on the structure and processes of living cells, has become an important branch of biology-inspired computing. Membrane computing, or P Systems, draws inspiration from the manner in which cells are organized in complex biological structures and from their interactions in tissues or higher-order biological structures (Păun and Pérez-Jiménez 2006). Physarum computing is another cellular computing model, one which attracts more and more research attention. Next, we briefly review the related works on physarum computing.

The physarum organism has been well studied from the perspective of computation in the past decade. T. Nakagaki and his co-workers did a lot of pioneering work on cellular computing in physarum. In the year 2000, they found that the physarum could track the shortest path between two selected points in a labyrinth (Nakagaki et al. 2000). They also revealed that the physarum is capable of solving a problem involving spatial inhomogeneity of conditions, as the real organism can find the risk-minimum path in an inhomogeneous field of risk introduced by the spatial patterning of toxic light illumination (Nakagaki et al. 2007). In the work of Tero et al. (2010), the computational capabilities of physarum were applied to network design; they showed that the physarum forms networks with efficiency, fault tolerance, and cost comparable to those of a real-world infrastructure network, the Tokyo rail system. Tero et al. (2007) described a mathematical model of the adaptive dynamics of the physarum which contains a key parameter corresponding to the extent of the feedback regulation between the thickness of a tube and the flux through it. Nakagaki et al. (2009) introduced a biologically inspired method, the physarum solver, for path-finding and the Steiner problem. Recently, more and more researchers have started to study physarum computing. The study of physarum was the focus of a special issue of the International Journal of Unconventional Computing, entitled "Physarum Computing" (Special issue on Physarum Computing 2008), as originally proposed by A. Adamatzky. Adamatzky (2010) contains many illustrative examples of the computational power of physarum. In our previous work, Liu et al. (2012) designed a physarum-inspired heuristic algorithm to solve the shortest path problem for large-scale weighted graphs or inhomogeneous fields, and exploited the algorithm to solve the minimal exposure path problem in wireless sensor networks.

3.3.3 Problem Formulation and Physarum Model

3.3.3.1 Steiner Tree Problem

Given a graph G = (V, E) with a nonnegative cost function C : E → R⁺ on its edges, the cost of a connected subgraph of G is defined as the sum of its edge costs. Let R ⊆ V be a set of terminals. Any tree in G spanning all terminals is called a Steiner tree. Note that a Steiner tree may contain non-terminal vertices in V \ R; these are referred to as Steiner points. The Steiner tree problem in graphs seeks a minimum-cost Steiner tree, i.e., a tree that spans all terminals in R and whose total cost is minimum. This minimum-cost Steiner tree is also called the Steiner minimum tree and denoted SMT. The Steiner tree problem in graphs can then be formally formulated as follows.

Given: A graph G = (V, E) with n vertices, m edges, and edge cost function C : E → R⁺, and a subset R ⊆ V with g vertices.

Find: SMT = argmin_{ST} C(ST), where ST is a tree in G spanning all g vertices in R.

As one of Karp's 21 problems, the Steiner tree problem in graphs was proved to be NP-hard by Richard Karp in 1972. We thus have the following theorem.

Theorem 3.3 (Karp 1972) The Steiner tree problem in graphs is NP-hard.
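To fix ideas, the exponential-time sketch below computes the exact SMT cost of a tiny instance by minimizing, over all candidate Steiner-point sets, the MST cost in the metric closure of G; it is only a didactic baseline, not one of the algorithms discussed in this section.

from itertools import combinations

def smt_cost(n, edges, terminals):
    # Exact minimum Steiner tree cost by brute force (tiny graphs only).
    INF = float('inf')
    # Metric closure via Floyd-Warshall.
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v, c in edges:
        d[u][v] = d[v][u] = min(d[u][v], c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    def mst_cost(nodes):
        # Prim's algorithm on the metric closure restricted to `nodes`.
        nodes = list(nodes)
        in_tree, cost = {nodes[0]}, 0.0
        while len(in_tree) < len(nodes):
            c, v = min((d[a][b], b) for a in in_tree
                       for b in nodes if b not in in_tree)
            in_tree.add(v)
            cost += c
        return cost
    best = INF
    others = [v for v in range(n) if v not in set(terminals)]
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            best = min(best, mst_cost(set(terminals) | set(extra)))
    return best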

3.3.3.2 Mathematical Model for Physarum

The path-finding and network formation process of physarum is based on the morphogenesis of its tubular structure: a high rate of protoplasmic flow stimulates an increase in tube diameter, whereas tubes tend to decline at a low flow rate. Tube thickness therefore adapts to the flow rate. On the other hand, the decrease of tube thickness is accelerated in the illuminated parts of the organism. Thus, the tube structure evolves according to a balance of these mutually antagonistic processes. Based on the observed phenomena of the tube structure's evolution, a simple physarum model, which takes a mathematically simplified and tractable form, was proposed by Nakagaki et al. (2009) and Tero et al. (2007, 2008, 2010). Next, we outline the basic physarum model; further details are available in the papers referred to above. Suppose the initial shape of a physarum organism is represented by a randomly meshed lattice, as shown in Fig. 3.10, in which an edge and a vertex correspond to a plasmodial tube and a junction between tubes, respectively.


Fig. 3.10 Schematic illustration of a part of a randomly meshed lattice which represents the initial shape of a physarum

For a given pair of vertices, v_i and v_j, let p_i and p_j be the pressures at v_i and v_j, respectively. The vertices v_i and v_j are connected by an edge e_{i,j} of length L_{i,j} and radius r_{i,j}. Assuming a Poiseuille flow, the flux through e_{i,j} from v_i to v_j is

Q_{i,j} = π r⁴_{i,j} (p_i − p_j) / (8 ξ L_{i,j}) = (D_{i,j} / L_{i,j}) (p_i − p_j),     (3.7)

where ξ is the viscosity of the fluid, and D_{i,j} = π r⁴_{i,j} / (8ξ) is a measure of the conductivity of the tube. As the length L_{i,j} is a constant, the behavior of the tube network is described by the conductivities D_{i,j} of the edges. At each time step, a random FS (food source) is selected as a source (v_s), and a second random FS is chosen as a sink (v_e). The flux source v_s drives flow through the network, so that Σ_j Q_{s,j} = I_0, where I_0 is the flux flowing into the source vertex. The vertex v_e has a corresponding withdrawal of I_0, such that Σ_j Q_{e,j} = −I_0. By the conservation law of flux, the inflow and outflow at each internal vertex must balance, i.e., Σ_j Q_{i,j} = 0 for v_i ≠ v_s, v_e. Then the network Poisson equation for the vertex pressures, derived from the above equations, is given by

Σ_j Q_{i,j} = Σ_j (D_{i,j} / L_{i,j}) (p_i − p_j) =
    I_0,    if v_i = v_s;
    −I_0,   if v_i = v_e;
    0,      otherwise.     (3.8)

By setting p_e = 0 as the basic pressure level, all p_i's can be determined by solving the above equation system, and each Q_{i,j} = D_{i,j}(p_i − p_j)/L_{i,j} is then also obtained. To accommodate the adaptive behavior of the tubes, all conductivities D_{i,j} change in time according to

dD_{i,j}/dt = f_q(|Q_{i,j}|) − α D_{i,j}.     (3.9)

The first term on the right side of Eq. (3.9) describes the expansion of tubes in response to the flux. The function f_q is monotonically increasing and satisfies f_q(0) = 0. The functional form of f_q(|Q_{i,j}|) is generally given by f_q(|Q_{i,j}|) = |Q_{i,j}|^μ (with μ commonly equal to 1 for the sake of simplicity) or f_q(|Q_{i,j}|) = |Q_{i,j}|^μ / (1 + |Q_{i,j}|^μ). The second term on the right side of Eq. (3.9) represents the rate of tube constriction, so that in the absence of flow the tubes gradually disappear. The parameter α expresses how rapidly the thickness decreases in response to the illumination (risk). Because the total amount of fluid in the tube network must be conserved, the paths appear to compete with each other. This competition continues for a period of time, until a well-established tree structure connecting all of the FSs appears. Although this physarum model captures some mechanistic features of the physarum, it has two main shortcomings.

• Considering the numerical scheme used in the model simulation, Eq. (3.8) yields a linear equation system with a sparse symmetric matrix, which is numerically solved by the standard Incomplete Cholesky Conjugate Gradient (ICCG) method (Kershaw 1978). Because the number of equations equals the number of vertices, at each time step ICCG runs in O(|V|²), where |V| denotes the number of vertices. Therefore, the computational complexity of the physarum model is very high when |V| is large.
• The model simulation does not apply parallel processing. In contrast, an actual physarum organism can perform in the style of parallel computing.
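As a concrete illustration of Eqs. (3.7)–(3.9), here is a minimal numerical sketch of one time step of the basic model for a fixed source-sink pair, using a dense solver in place of ICCG and f_q(|Q|) = |Q|; all names and parameter values are our own.

import numpy as np

def physarum_step(L, D, source, sink, I0=1.0, alpha=0.1, dt=0.01):
    # One time step of the basic physarum model.
    # L, D: n x n symmetric matrices of tube lengths and conductivities
    # (entries are 0 where no tube exists).
    n = L.shape[0]
    W = np.where(L > 0, D / np.where(L > 0, L, 1.0), 0.0)
    A = np.diag(W.sum(axis=1)) - W           # network Poisson system, Eq. (3.8)
    b = np.zeros(n)
    b[source], b[sink] = I0, -I0
    A[sink, :] = 0.0
    A[sink, sink] = 1.0
    b[sink] = 0.0                             # pin the sink pressure to zero
    p = np.linalg.solve(A, b)
    Q = W * (p[:, None] - p[None, :])         # fluxes, Eq. (3.7)
    D_new = D + dt * (np.abs(Q) - alpha * D)  # adaptation, Eq. (3.9), mu = 1
    return p, Q, D_new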

3.3.4 Physarum Optimization for the Steiner Tree Problem

As mentioned in the mathematical model of physarum, the shape of a physarum can be represented by a graph G(V, E) with n vertices and m edges. The set of g food sources is regarded as the set of terminals R ⊆ V, where |R| = g. The illumination (risk) on a given edge is defined as the edge cost, i.e., the cost function C represents the illumination on the edges, which determines the rate of tube (edge) constriction. We can then utilize the core mechanism of physarum to study the Steiner tree problem and propose a practical heuristic algorithm.

3.3.4.1 Initial Pressures of Vertices

We first need to initialize the pressures of the vertices. Because the Steiner tree problem uses edge costs instead of Euclidean edge lengths, for generality we assume the lengths of all edges are equal, denoted by l. We also suppose the initial radii of all the tubes are the same; according to Eq. (3.7), the initial conductivities of all edges are then equal, denoted by D⁰. In this section, we let [variable]^t denote the value of a variable at time t and [variable]⁰ denote its initial value.


Definition 3.1 (Neighboring vertex of a vertex) For an arbitrary vertex v_i in the graph G, another vertex v_j is called a neighboring vertex of v_i if there exists an edge e_{i,j} between v_i and v_j. Let N_i denote the set of all neighboring vertices of v_i.

Let a given vertex r_k ∈ R be the sink and the other g − 1 vertices in R be the sources. Each source in R\{r_k} drives flow I_0 through G, and r_k has a corresponding withdrawal of (g − 1)I_0. Similar to Eq. (3.8), the equations of the initial pressures for all vertices are

    Σ_{v_j ∈ N_i} (p⁰_i − p⁰_j) = I_0 l / D⁰,    for v_i ∈ R\{r_k};
    Σ_{v_j ∈ N_i} p⁰_j = (g − 1) I_0 l / D⁰,     for v_i = r_k;
    Σ_{v_j ∈ N_i} (p⁰_i − p⁰_j) = 0,             for v_i ∉ R.     (3.10)

By solving this linear equation system, we obtain the initial pressure p⁰_i of each vertex when r_k is the sink. We let every vertex in R be the sink in turn, with the other vertices in R as the sources, and repeat the above procedure g times. We thereby get g different pressure values for each vertex.

Definition 3.2 (Pressure vector) For an arbitrary vertex v_i in the graph G, the pressure vector of v_i, denoted by P_i, is defined as the g-tuple (p_{i,1}, p_{i,2}, . . . , p_{i,g}), where p_{i,k}, k = 1, 2, . . . , g, denotes the pressure of v_i when r_k is the sink.

We denote the initial pressures by p⁰_{i,k}. Because the calculation of the p⁰_{i,k} runs in O(n²), where n = |V|, the initialization of the pressure vectors runs in O(gn²).
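Since Eq. (3.10) is a linear system, the initialization can be done with a standard solver; the sketch below (ours, for illustration) pins the sink pressure to zero to make the system non-singular.

import numpy as np

def initial_pressures(adj, terminals, sink, I0=1.0, l=1.0, D0=1.0):
    # Solve Eq. (3.10) for the initial pressures when r_k = sink.
    # adj: n x n 0/1 adjacency matrix of G; terminals: vertex ids in R.
    n = adj.shape[0]
    A = np.diag(adj.sum(axis=1)) - adj.astype(float)  # graph Laplacian
    b = np.zeros(n)
    for v in terminals:
        if v != sink:
            b[v] = I0 * l / D0                        # source rows of Eq. (3.10)
    A[sink, :] = 0.0
    A[sink, sink] = 1.0
    b[sink] = 0.0                                     # sink pressure set to 0
    return np.linalg.solve(A, b)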

3.3.4.2 Main Process of Physarum Optimization

For a given edge e_{i,j} ∈ E, the corresponding cost c_{i,j} ≜ C(e_{i,j}) expresses how rapidly the conductivity D_{i,j} decreases. The main characteristic of the physarum model described in Sect. 3.3.3 is the positive feedback between the thickness (conductivity) of each tube and the flux through it. The state of a physarum is described by the fluxes Q_{i,j} and the conductivities D_{i,j} of all edges. Therefore, our proposed physarum optimization uses an iterative process to express the feedback-driven evolution of the physarum. In the traditional physarum model, a random FS is selected as the source and a second random FS is chosen as the sink at each time step. As mentioned above, this section instead lets every vertex in R be the sink in turn at each time step; the main advantage of this solution is to reduce the randomness of the final results. We call an iteration of physarum optimization the evolution of P_i and D_{i,j} caused by the flux through e_{i,j} in the interval (t, t + 1).


According to Eq. (3.7), we calculate the flux through e_{i,j} at the t-th iteration as

Q^t_{i,j} = (D^t_{i,j} / l) Σ_{k=1}^{g} (p^t_{i,k} − p^t_{j,k}).     (3.11)

Based on Eq. (3.9), we further redefine the adaptation equation of the conductivity as

D^{t+1}_{i,j} = D^t_{i,j} + σ |Q^t_{i,j}| − ρ c_{i,j} D^t_{i,j} = κ^t_{i,j} D^t_{i,j},     (3.12)

where

κ^t_{i,j} = 1 + (σ / l) | Σ_{k=1}^{g} (p^t_{i,k} − p^t_{j,k}) | − ρ c_{i,j},

and σ and ρ are two parameters that control the relative weights of the flux and the cost, respectively.

From Eq. (3.8), p^{t+1}_{i,k} is determined by solving the network Poisson equation

Σ_{v_j ∈ N_i} (D^{t+1}_{i,j} / l) (p^{t+1}_{i,k} − p^{t+1}_{j,k}) =
    I_0,            for v_i ∈ R\{r_k};
    −(g − 1) I_0,   for v_i = r_k;
    0,              for v_i ∉ R.     (3.13)

When v_i = r_k, p^{t+1}_{i,k} = 0. Then P^{t+1}_i = (p^{t+1}_{i,1}, p^{t+1}_{i,2}, . . . , p^{t+1}_{i,g}) is determined by solving g network Poisson equations. As mentioned above, this equation system would have to be solved numerically by the ICCG method, i.e., each iteration would run in O(gn²). For an iterative algorithm, this overall computational complexity is too high. Therefore, finding a way to calculate P^{t+1}_i with lower complexity is the key to designing a practical physarum computing algorithm.

From a series of numerical simulations of the physarum model, we observe that the variation of the pressures is continuous and relatively small between two consecutive time steps. Moreover, the evolution process cares about the trend of the evolution and the final state more than about the specific values of the intermediate states.


Thus, we substitute p^{t+1}_{j,k} = p^t_{j,k} into Eq. (3.13) to derive an approximate expression for p^{t+1}_{i,k}:

p^{t+1}_{i,k} ≈
    ( I_0 l + Σ_{v_j ∈ N_i} D^{t+1}_{i,j} p^t_{j,k} ) / Σ_{v_j ∈ N_i} D^{t+1}_{i,j},    for v_i ∈ R\{r_k};
    0,                                                                                  for v_i = r_k;
    ( Σ_{v_j ∈ N_i} D^{t+1}_{i,j} p^t_{j,k} ) / Σ_{v_j ∈ N_i} D^{t+1}_{i,j},            for v_i ∉ R.     (3.14)

Figure 3.11 illustrates a comparison between the pressures obtained by the two methods. For each time step, we use Eq. (3.14) to calculate the pressure of v_i, denoted by p_{i,k}, and we also obtain the pressure, denoted by p′_{i,k}, by solving Eq. (3.13). From Fig. 3.11, we observe that the variation trends of p_{i,k} and p′_{i,k} are similar. When t is relatively small, both p_{i,k} and p′_{i,k} decrease swiftly. After a while, p′_{i,k} decreases slowly while p_{i,k} increases slowly, and p_{i,k} gradually approaches p′_{i,k}.

Fig. 3.11 Comparison between p_{i,k} and p′_{i,k} over time, where p_{i,k} is the pressure of v_i calculated by Eq. (3.14) and p′_{i,k} is the pressure of v_i obtained by solving Eq. (3.13). In this case, the sink r_k = (2, 10) is a vertex in a 10 × 10 grid

Figure 3.11 also shows that when t is relatively small, the difference between p_{i,k} and p′_{i,k} is sometimes large. However, this difference does not affect the final result, because the conductivity is determined not by the pressure itself but by the pressure difference. Table 3.1 lists p_{i,k} − p_{j,k} and p′_{i,k} − p′_{j,k} at different time steps. Obviously, p_{i,k} − p_{j,k} is close to p′_{i,k} − p′_{j,k} most of the time. Therefore, using p^t_{j,k} instead of p^{t+1}_{j,k} for calculating p^{t+1}_{i,k} is reasonable, and each iteration runs in O(gm + gn), where m = |E|.

Table 3.1 Comparison between p_{i,k} − p_{j,k} and p′_{i,k} − p′_{j,k} at different time steps

t                      50    100   150   200   250   300   350   400
p_{i,k} − p_{j,k}      3.27  1.66  0.86  0.60  0.55  0.56  0.60  0.61
p′_{i,k} − p′_{j,k}    2.39  1.42  1.09  0.93  0.84  0.79  0.72  0.70

t                      450   500   550   600   650   700   750   800
p_{i,k} − p_{j,k}      0.62  0.63  0.63  0.63  0.63  0.62  0.62  0.61
p′_{i,k} − p′_{j,k}    0.68  0.67  0.66  0.65  0.64  0.64  0.63  0.63

Moreover, using p^t_{j,k} instead of p^{t+1}_{j,k} in Eq. (3.13) brings another important advantage: the calculation of p^{t+1}_{i,k} depends only on local information, i.e., all the p^{t+1}_{i,k} can be calculated independently. Taking advantage of this, our physarum optimization can be implemented in a parallel form. We will discuss the parallel implementation of physarum optimization later.

3.3.4.3 Convergence of Physarum Optimization

For an iterative algorithm, convergence is an important measure of performance. Our physarum optimization uses two schemes to accelerate its convergence.

(i) Edge-cutting scheme

The main idea of this scheme is to delete any edge whose conductivity has become small enough. It is simple but faces two challenges. The first challenge is that when an edge is deleted, the pressures of nearby vertices fluctuate quickly (see Fig. 3.12); we call this phenomenon pressure fluctuation. It arises because p^{t+1}_{i,k} is calculated using p^t_{j,k} instead of p^{t+1}_{j,k}; this approximate calculation causes an imbalance of pressures when an edge deletion occurs. Moreover, as shown in Fig. 3.12, the fluctuations of p_{i,k} and p_{j,k} are in opposite phases at the same time step. Thus, the pressure difference p_{i,k} − p_{j,k} changes with a large amplitude, and the fluctuation of the pressure difference further enhances the pressure fluctuation. When the fluctuation becomes big enough, the edge e_{i,j} is cut by mistake (see Fig. 3.12, where e_{i,j} is cut at t = 55); the pressure fluctuation then often causes all edges to be cut quickly.

Fig. 3.12 Pressure fluctuations of v_i and v_j, where v_i and v_j are two vertices in a 10 × 10 grid

To eliminate the pressure fluctuation, we let p^{t+1}_{i,k} be the average of the value calculated by Eq. (3.14) and p^t_{i,k}, i.e.,


p^{t+1}_{i,k} ≈
    ( I_0 l + Σ_{v_j ∈ N_i} D^{t+1}_{i,j} (p^t_{i,k} + p^t_{j,k}) ) / ( 2 Σ_{v_j ∈ N_i} D^{t+1}_{i,j} ),    for v_i ∈ R\{r_k};
    0,                                                                                                      for v_i = r_k;
    ( Σ_{v_j ∈ N_i} D^{t+1}_{i,j} (p^t_{i,k} + p^t_{j,k}) ) / ( 2 Σ_{v_j ∈ N_i} D^{t+1}_{i,j} ),            for v_i ∉ R.     (3.15)

Figure 3.12 also illustrates p_i and p_j after eliminating the fluctuations by using Eq. (3.15): the changes of p_{i,k} − p_{j,k} become continuous and steady.

The second challenge is how to judge whether the conductivity of an edge is small enough. The most intuitive method is to set a threshold, denoted by T, on the conductivity: when D^{t+1}_{i,j} is smaller than T, the corresponding tube is regarded as vanishing, and we delete e_{i,j} from G. From a series of numerical simulations, we observe that the number of remaining edges rapidly decreases at the beginning, after which it decreases very slowly. This is because, after several iterations, the conductivities of the remaining edges become relatively large, and it is difficult to delete further edges using a fixed threshold. To overcome this problem, we use a dynamic threshold to cut edges: we let T increase over time, i.e., T is a monotonically increasing function f_d of t. In this section, we define T = f_d(t) = εt, where ε is a proportionality coefficient. Simulation results (presented later) show that at the beginning the convergence using our method is slower than with fixed thresholds, but our method makes the graph converge to a Steiner tree rapidly.

(ii) Feedback-adjusting scheme

The positive feedback between the flux and the conductivity is expressed by Eq. (3.12). We can accelerate the convergence of physarum optimization by adjusting the parameters σ and ρ in Eq. (3.12). Let T_c be the least number of iterations needed for physarum optimization to find the Steiner tree.

3.3.4.4 Algorithms of Physarum Optimization

Given the process of physarum optimization in the preceding subsections, the so-called physarum optimization algorithm (POA for short) is described by Algorithm 6. From the formal description of POA, we see that each iteration updates $D_{i,j}$ and $P_i$ for each vertex $v_i$ according to the pressures of the neighboring vertices. Because the vertex pressure vector changes in each iteration, it is not easy to perform


Algorithm 6: Physarum optimization algorithm for Steiner tree problem

Require: The graph G = (V, E) with n vertices and m edges; edge costs $c_{i,j} = C(e_{i,j})$, $\forall e_{i,j} \in E$; the subset $R \subseteq V$ with g terminals.
1: Initialize the pressure vector $P_i^0$, $\forall v_i \in V$, according to Eq. (3.10), and the conductivity $D^0$;
2: for each vertex $v_i$ do
3:   $P_i \leftarrow P_i^0$;
4:   for each neighboring vertex $v_j$ do
5:     $D_{i,j} \leftarrow D^0$;
6: for t = 1 to $T_c$ do
7:   for each vertex $v_i$ do
8:     for each neighboring vertex $v_j$ do
9:       Update $D_{i,j}$ according to Eq. (3.12);
10:      if $D_{i,j} < \varepsilon t$ then
11:        Delete $e_{i,j}$;
12:    Update $P_i$ according to Eq. (3.15);

Fig. 3.13 A subgraph Si of G including 4 vertices. The red points denote the neighboring vertices of Si

different iterations in parallel. However, each iteration can be performed in parallel as follows. Assume that the graph G is partitioned into $n_s$ subgraphs, where $1 < n_s \leq n$. The work associated with each subgraph is assigned to a different process, i.e., the number of processes is also $n_s$. Theoretically, the graph G can be partitioned into at most n subgraphs, i.e., each subgraph includes only one vertex. As shown in Fig. 3.13, let $S_i$ be a given subgraph and $P_i$ be the corresponding process. We first give two related definitions.

Definition 3.3 Neighboring vertex of a subgraph. For a given subgraph $S_i$, a vertex not in $S_i$ is called a neighboring vertex of $S_i$ if it is a neighbor of at least one vertex in $S_i$.

Definition 3.4 Neighboring subgraph. For a given subgraph $S_i$, another subgraph that includes at least one neighboring vertex of $S_i$ is called a neighboring subgraph of $S_i$.


Algorithm 7: Parallel Physarum optimization algorithm for $S_i$

Require: The subgraph $S_i$ and its neighboring vertices; edge costs of all edges in $S_i$ and of edges between $S_i$ and the neighboring vertices; terminals in $S_i$; initial pressure vectors of both the vertices and the neighboring vertices of $S_i$.
1: for each vertex $v_i$ in $S_i$ do
2:   $P_i \leftarrow P_i^0$;
3:   for each neighboring vertex $v_j$ do
4:     $D_{i,j} \leftarrow D^0$;
5: for t = 1 to $T_c$ do
6:   for each vertex $v_i$ in $S_i$ do
7:     for each neighboring vertex $v_j$ do
8:       Update $D_{i,j}$ according to Eq. (3.12);
9:       if $D_{i,j} < \varepsilon t$ then
10:        Delete $e_{i,j}$;
11:    Update $P_i$ according to Eq. (3.15);
12:  Broadcast the updated pressure vectors of vertices in $S_i$ to the neighboring processes;
13:  Receive the pressure vectors of the neighboring vertices of $S_i$ from the neighboring processes;

The red vertices in Fig. 3.13 are the neighboring vertices of $S_i$. After initializing the pressure vector $P_i^0$, $\forall v_i \in V$, the initial pressure vectors of both the vertices and the neighboring vertices of $S_i$ are assigned to $P_i$. In each iteration, $P_i$ updates the conductivities and the pressure for each vertex in $S_i$. Then, $P_i$ broadcasts the updated pressures to all neighboring processes (we also call the process for a neighboring subgraph a neighboring process), and receives the updated pressures of all neighboring vertices of $S_i$ from the neighboring processes. The procedure executed by $P_i$ is described by Algorithm 7.

Complexity Analysis: The complexity of Algorithm 6 is mainly determined by its nested loops (the initialization loop in lines 2–5 and the main loop in lines 6–12), and POA runs in $O(n^2 g + T_c m g + T_c n g)$, where $T_c$ is the number of iterations, n = |V| is the number of vertices, m = |E| is the number of edges, and g = |R| is the number of terminals. Simulation results show that $T_c$ is approximately proportional to g. From the description of Algorithm 7, it is easy to see that $P_i$ runs in $O(T_c g |V_i| + T_c g |N_i|)$, where $|V_i|$ is the number of vertices of $S_i$ and $|N_i|$ is the number of neighboring vertices of $S_i$.

Complexity Comparisons: Cortes et al. (2008) pointed out that the viral system runs in $O(ITER \cdot g n^2)$, where ITER denotes the maximum number of iterations. We also implemented the 1.55 approximation algorithm and the tabu search algorithm, which run in $O(mn + n^2 \log n + g^3 n + g^6)$ and $O(n^2 \log n + ITER \cdot n^2 + ITER \cdot mn)$, respectively. Thus, we can obtain the following:



• The complexity of centralized POA (Algorithm 6) is lower than that of the 1.55 approximation algorithm when the number of terminals, g, is relatively large, and similar to it when g is relatively small.
• The complexity of Algorithm 6 is lower than that of the tabu search algorithm, especially when g is relatively small.
• The complexity of Algorithm 6 is obviously lower than that of the viral system.
• The complexity of parallel POA (Algorithm 7) is much lower than those of the 1.55 approximation algorithm, the tabu search algorithm, and the viral system, especially when the number of vertices in each subgraph is small.

3.4 A Trust-Based Framework for Fault-Tolerant Data Aggregation in Multimedia Sensor Networks

3.4.1 Motivation

A common problem in multimedia sensor networks is information error and loss caused by component failure, external interference, wireless transmission errors, and security threats such as fake data injection. Because the data sensed by individual sensor nodes has low reliability and hence low accuracy, data aggregation is widely used for reliable event detection and for the prevention of faulty or fake reports. In addition, users of MSN applications are often concerned about whether the aggregated results are trustworthy. Therefore, for aggregation-based event detection in environment monitoring systems, it is important not only to gather comprehensive data, but also to provide reliable and fault-tolerant data aggregation with measurable trust in order to improve the quality of information (QoI).

In a typical hierarchical environment monitoring system (as shown in Fig. 3.14), sensor nodes collect environment signals according to a certain sampling mechanism and report them to higher level sensor nodes (called aggregators). An aggregator and its direct children form an aggregation set. The aggregators forward the aggregated results to their higher level aggregators recursively, and eventually to the sink node.

Fig. 3.14 Multi-layer trustworthy aggregation architecture for MSNs


Since the nodes participating in the process may be destroyed or the data may be corrupted or manipulated, the aggregators and the sink node should have a mechanism for conveying the trustworthiness of the aggregated results to the users.

Recently, there has been much research on the reliability, fault tolerance, and security of data aggregation. In the field of trust management for data aggregation, RDAT (Ozdemir 2008) presented a reliable data aggregation protocol that enables the aggregators to evaluate the actions of each sensor node using the concept of functional reputation. MERIG (Luo et al. 2011) studied the problem of Minimum Energy Reliable Information Gathering by adaptively using redundant transmission on fusion routes, so that packets carrying more information are delivered with higher reliability. In Shaikh et al. (2008), sensor nodes send data and the corresponding trust opinions to the cluster-head, which in turn comprehensively analyzes the trust opinion of the data reported by each node and thus identifies unreliable nodes. Zhang et al. (2006) introduced a trust mechanism for data aggregation aimed at preventing attacks, and addressed fault tolerance implicitly; however, they did not consider the trustworthiness of continuous media streams.

In the field of fault tolerance in MSNs, most fault-tolerant techniques (Ding et al. 2007; Merhi et al. 2009) make use of temporal or spatial correlation and statistical characteristics to reduce the impact of erroneous data. In Merhi et al. (2009), a lightweight acoustic target localization system based on Time Difference of Arrival (TDOA) is presented for wireless sensor networks; a fuzzy adaptive resonance theory (ART) is then used by the data fusion center to detect errors, and the estimates are fused along a decision tree based on spatial correlation and consensus voting. Although these fault-tolerant mechanisms consider data aggregation, they seldom deal with security problems.

Because of resource limitations, WSNs are also vulnerable to attacks that cannot be tackled using cryptographic or authentication techniques alone. In an unsecured network, there are in general two types of attacks on the data aggregation operation. The first is that aggregators receive false data from sensor nodes; the second is that the base station receives false data from compromised aggregator nodes. A number of methods have been developed to make wireless sensor networks secure and reliable (Chen et al. 2009; Ho et al. 2011, 2012; Han et al. 2007). However, current research seldom considers data aggregation, information trust, fault tolerance, and security jointly.

Consider a temperature monitoring network as an example. When the sensing part of a sensor node fails suddenly, the sampling value collected by the sensor changes accordingly, and the node itself can immediately judge its data abnormal based on temporal correlation. But if the sensed data from this node maintains the abnormal value from then on, as time goes by, the node may come to treat its reports as normal. In this case, by comparing the reports from this sensor with those from its neighbors, the aggregator should still judge the trustworthiness of this sensor to be low and thus reduce its weight (contribution) in the aggregation process. However, if all sensor nodes report data with a sudden change, any individual node may judge itself abnormal, while the aggregator may conclude that the


trustworthiness of all reports is almost the same due to high spatial correlation, and hence it captures the event immediately.

Moreover, in an unsecured network, fake data can be injected at both sensor nodes and aggregators. Usually, there are two ways to build fake data. One is sending abnormal data (e.g., ultra-large or ultra-small values) to interfere with the calculation of the data aggregation; the other is generating random data to make the aggregated results fluctuate. As far as compromised sensor nodes are concerned, they can report their data with very high trustworthiness. To withstand the former attack, the aggregator should check the signal coherence between the compromised sensor node and its neighbors, and infer that the trustworthiness of the compromised node is low. To confront the latter attack, the aggregator should check not only the signal coherence with the neighbors but also the regularity of the node's own signal sequence, and thus infer that the trustworthiness of the compromised node is low. In this way, the trustworthiness of the aggregated results cannot be heavily distorted by failed or compromised nodes and can rapidly adapt to the changing environment.

Motivated by this simple example, we develop a formal framework to derive the trustworthiness of the reported data and thus implement a fault-tolerant and trust-measurable system. Meanwhile, the framework can also defend against fake data injection attacks. Furthermore, the framework takes into consideration the trust opinions on both discrete data and continuous media streams, thus extending the method proposed in Zhang et al. (2006).

More specifically, we introduce a trust mechanism to evaluate the trustworthiness of a node and its data. For this purpose, we define memory depth to describe the temporal correlation and aggregation breadth to describe the spatial correlation. A sensor node uses the memory depth to calculate the self trustworthiness of its data based on the stored historical data and the current data. In an aggregation set, for discrete data as well as continuous media streams, the aggregator calculates the trustworthiness of each node based on both temporal and spatial correlations. When a node reports its sensed data to its aggregator, the trustworthiness of the aggregator in the reported result is related to both the trustworthiness of the aggregator in the node and that of the node in its sensed data. Similar to Josang's trust model (Josang 2001), the aggregator calculates the trustworthiness of the report from each node through a trust transfer rule based on the concept of the opinion quadruple, and calculates the trustworthiness of the aggregated result through a trust combination rule. This process significantly reduces the impact of erroneous data and achieves local fault tolerance. Moreover, the trustworthiness of the final aggregated result precisely reflects the trustability of the reported event or media stream. In contrast, in traditional wireless sensor networks with fault tolerance, the trustworthiness is usually calculated based only on temporal or spatial correlation, without considering the effect of self data trustworthiness.


3.4.2 System Model

In this section, we first introduce the multi-layer aggregation architecture and the source model for MSNs, and then summarize the basic trust model due to Josang that is used in developing our framework.

3.4.2.1 Multi-Layer Trustworthy Aggregation Architecture

The proposed trust-based framework for fault-tolerant data aggregation in MSNs is designed for sensor networks with a multi-layer architecture, as shown in Fig. 3.14. In this architecture, nodes are classified as sensor nodes, aggregators, or sink nodes according to the different roles they play in the data aggregation process, which naturally constructs multiple aggregation levels.

Sensor node: In an aggregation set, each sensor node is associated with a reputation representing the self trustworthiness of its collected data. This reputation is related to the source of the data, the temporal correlation with historical data, and the statistical characteristics of the information. The memory depth is defined as the number of stored historical data items, which represents the temporal correlation. The sensor node calculates its self data trustworthiness with the help of the memory depth and then reports to its aggregator. If the memory depth is zero, the node's self data trustworthiness is 1.

Aggregator: According to the spatial correlation among sensor nodes in an aggregation set, the aggregator determines the aggregation breadth, which is the number of children taking part in the data aggregation. Then, the aggregator evaluates the trustworthiness of each node's report within the aggregation breadth, performs trust-weighted data aggregation, and calculates the trustworthiness of the aggregated result. Finally, the aggregator reports the aggregated result and its trustworthiness to the higher level aggregator, and eventually to the sink node.

Sink node: The sink node receives the aggregated results from the lower level aggregators, fuses them to obtain the final report, and determines the resulting trustworthiness.

In an unreliable sensor network, measurement errors and transmission errors may cause wrong aggregated results. In an unsecured sensor network, fake data injection can also lead to wrong aggregated results. The focus of our model is to give a comprehensive solution to these two problems.
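As a reading aid, the following toy sketch (all class and function names are ours, not part of the framework) shows how the three roles compose: sensors emit (value, self trust) pairs, aggregators fold the reports of their children and forward upward, and the sink is simply the top-level aggregator. The trust computations here are placeholders for the rules developed in Sect. 3.4.3.

```python
class Sensor:
    def __init__(self, read_fn):
        self.read_fn = read_fn                  # returns one sampled value

    def report(self):
        value = self.read_fn()
        self_trust = 1.0                        # placeholder; see Sect. 3.4.3.1
        return value, self_trust

class Aggregator:
    def __init__(self, children):
        self.children = children                # sensors or lower aggregators

    def report(self):
        reports = [child.report() for child in self.children]
        # Trust-weighted mean as a stand-in for the trust-weighted
        # aggregation of Sect. 3.4.3; the returned trust is a placeholder.
        total = sum(t for _, t in reports) or 1.0
        value = sum(v * t for v, t in reports) / total
        return value, total / len(reports)

# Two aggregation levels feeding a sink-level aggregator:
# sink = Aggregator([Aggregator([Sensor(lambda: 26.0) for _ in range(3)])])
# print(sink.report())
```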

3.4.2.2 Source Model

In MSNs, a source provides the original signal of an event, and sensors around it may capture the signal of the source. Usually, in densely deployed MSNs, the location of the nodes with the highest signal intensity can be regarded as the source location. Based on the spatial propagation features of the signal, a source can be classified as a source without a center or a source with a center.


The signal strength of the former is almost the same throughout the event region, such as the temperature and humidity in a room. The signal strength of the latter decreases uniformly with increasing distance from the center when the signal spreads out omnidirectionally, such as the smoke concentration when a fire breaks out. The cases of directional signal propagation and a mobile source center are not considered in this section.

Considering the temporal features of the signal, a source can be classified as a smoothly fluctuating source or a suddenly changing source. The signal strength of the former is almost stationary with little fluctuation; the signal strength of the latter may jump suddenly, such as the on/off status of a lamp.

Different source models can be set up for different systems, where each source can be represented by a time-dependent and rule-based model. For example, the variation of room temperature should be stationary, and the difference among sampling values (from different nodes and at different times) should be within a range. Audio changes faster than temperature, and voice strengths at different places differ a lot, but an audio prediction model can be set up by using historical sampling values to predict the current sampling value.

3.4.2.3 Trust Model

In MSNs, the data received by the sensors may be noisy and unreliable, making it hard to extract precise information. Due to this imperfect knowledge, we use Josang's trust model (Josang 1999; Josang et al. 2007) to deal with data uncertainty. In Josang's trust model, a belief metric, called an opinion, is used to express the degree of belief in the truth of a statement. The definition of an opinion is given as follows.

Definition 3.5 An opinion, ω = {b, d, u, a}, is a quadruple whose components respectively correspond to the belief, disbelief, uncertainty, and relative atomicity, such that a, b, d, u ∈ [0, 1] and b + d + u = 1 (Josang 1999; Josang et al. 2007).

The relative atomicity a is used for computing the expected opinion as

$$O = E(\omega) = b + au. \tag{3.16}$$

Therefore, a determines the contribution of the uncertainty u to E(ω). In this section, we use opinions to mathematically formulate the concept of trustworthiness. Furthermore, we define three kinds of trust opinion, namely the peer node trust opinion, the self data trust opinion, and the peer data trust opinion, to formulate the different types of trust in the multi-layer MSN architecture. In our proposed framework, the trust opinion of a node covers two aspects: its trust opinion on its own data and its trust opinions on other nodes. The peer node trust opinion represents the trust opinion of a parent node about its child node; the self data trust opinion denotes a node's trust opinion about its own collected data (if the node is a sensor) or aggregated result (if the node is an aggregator); and the peer data trust opinion formulates the trust opinion of a parent node about the data reported by a child node.
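For concreteness, a minimal encoding of Definition 3.5 and Eq. (3.16) follows (the class name is ours):

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    b: float  # belief
    d: float  # disbelief
    u: float  # uncertainty, with b + d + u = 1
    a: float  # relative atomicity

    def expectation(self) -> float:
        # O = E(omega) = b + a * u, Eq. (3.16)
        return self.b + self.a * self.u

# Opinion(0.9, 0.0, 0.1, 0.5).expectation() -> 0.95
```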


Once an aggregator receives data from a child node in its aggregation set, the aggregator's peer data trust opinion about the report is related to both its peer node trust opinion about the reporter and the self data trust opinion of the reporter; if either is low, the aggregator will not believe the report. Subsequently, the aggregator aggregates the reports from all children in its aggregation set, weighted by the peer data trust opinions, and generates its self data trust opinion about the aggregated result. Finally, the aggregator reports the aggregated result along with its self data trust opinion to the aggregator in the next higher layer. In this process, the trust opinion on the final report at the sink node depends only on the peer data trust opinions on the reports from each of the ith level aggregators, and the self data trust opinion of the ith level aggregator depends on its peer data trust opinions on the reports from the (i − 1)th level aggregators. Therefore, the computation process can be effectively combined with the multi-layer data aggregation process. Through this layered trust mechanism, we can eliminate the impact of inaccurate data step by step during data transmission and aggregation. Thus, the goal of obtaining high-precision information through the collaboration of many low-precision sensors is achieved.

3.4.3 Trust-Based Framework for Fault-Tolerant Data Aggregation

Figure 3.15 illustrates an example of the trust model in an aggregation set with one aggregator J, three sensor nodes W1, W2, W3, and one target z. An arrow denotes trust or an opinion about truth. W1, W2, and W3 determine their self data trust opinions for the target z based on the temporal correlation. The aggregator J calculates the peer node trust opinion about each sensor node based on the spatial correlation. From the self data trust opinions and peer node trust opinions, the aggregator determines the peer data trust opinion about the report provided by each sensor through trust transfer. Finally, the aggregator J derives its self data trust opinion about the aggregated result through trust combination. In this subsection, we define operations for all kinds of trust opinions and then introduce the trust-based algorithm for fault-tolerant data aggregation.

Fig. 3.15 Trust model in an aggregation set


3.4.3.1 Self Data Trust Opinion of Sensor Node

A sensor node calculates its self data trust opinion by judging whether the collected data conforms to its source model. Take an audio stream as an example. An audio prediction model is set up to predict the current sample value $\hat{x}$ from several historical samples, while the actual sample value is $x$. Then, the self data reputation $c$ can be calculated as

$$c = \max\{1 - rel(x, \hat{x}), 0\}, \tag{3.17}$$

where

$$rel(x, y) = \begin{cases} 0, & x = y = 0,\\ 1, & y = 0,\ x \neq y,\\ \left|\dfrac{x - y}{y}\right|, & \text{otherwise.} \end{cases}$$

For fault tolerance, the aggregator calculates the peer data trust opinion according to the self data trust opinion and the peer node trust opinion of each node in the aggregation set. To simplify the calculation, we ignore data whose peer data trust opinion is less than 0.5, so as to shield against hostile attacks and fake data effectively. Then we have disbelief d = 0, and for uncertain data, a = 0.5.

Definition 3.6 Let $c_n$ be the self data reputation of node n. Then the self data trust opinion of node n about its collected data is given by

$$\omega_{c_n} = \{c_n, 0, 1 - c_n, 0.5\}. \tag{3.18}$$

Depending on the precision requirement of the prediction model, we use the memory depth, denoted by μ, to describe the temporal correlation; it specifies the number of historical sampling values required for predicting the current sampling value. For example, in the kth sampling period, if μ = 10, then $\hat{x}_k = \alpha_1 x_{k-1} + \alpha_2 x_{k-2} + \cdots + \alpha_{10} x_{k-10}$ and $c_k = \max\{1 - rel(x_k, \hat{x}_k), 0\}$. Once μ = 0, we have $c_k = 1$ and the self data reputations of all nodes are the same, which is simply the case in the traditional trust system proposed in Chen et al. (2007); thus, our framework generalizes their work. By adjusting the value of the memory depth, we can adapt to different scenarios. For instance, for slowly changing data (e.g., data in some environmental monitoring applications), one can increase the memory depth to avoid interference from false and fake data and thus prevent data misstatement. Conversely, for fast changing data (e.g., fire detection, or monitoring the temperature of the reactor in a nuclear plant), one can decrease the memory depth to weaken the impact of outdated data, avoid event omission, and prevent alarm delay.
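A sketch of the self data reputation computation of Eqs. (3.17) and (3.18) with memory depth μ follows; the linear predictor coefficients are illustrative, since the text leaves the concrete prediction model (e.g., ADPCM for audio) to the application.

```python
def rel(x, y):
    # The relative-error function used in Eq. (3.17).
    if x == y == 0:
        return 0.0
    if y == 0:
        return 1.0
    return abs((x - y) / y)

def self_data_trust_opinion(history, x, alphas):
    # history: the mu most recent samples x_{k-1}, ..., x_{k-mu};
    # alphas: predictor coefficients (illustrative choice of model).
    x_hat = sum(a * h for a, h in zip(alphas, history))   # predicted sample
    c = max(1.0 - rel(x, x_hat), 0.0)                     # Eq. (3.17)
    return (c, 0.0, 1.0 - c, 0.5)                         # Eq. (3.18)
```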


3.4.3.2 Peer Node Trust Opinion

Sensor nodes in a neighborhood have high spatial correlation in their sensory data. We define the aggregation breadth, denoted by ν, to describe the spatial correlation; it specifies the size of an aggregation set, which is determined by the application. In an aggregation set, the peer node trust opinion of a parent node about a child node can be calculated according to a consensus-based principle. Here, we adopt a distance-based method to calculate the peer node trust opinion of the aggregator about a child node by measuring the consistency of the report from that child with those of the other children. Different methods of distance calculation should be used for different source models. In the following, we consider two cases.

Case 1: Source Without Center

For a source without a center, the collected data in the monitoring region are similar (Zhang et al. 2006). Therefore, in the normal state, the distance between the sampling value of one node and the median sampling value of all neighbors describes the coherence of this node with all its neighbors in the aggregation set. Hence, faulty data generated by a disabled node, as well as continual fake data with high self data trust opinion generated by a compromised node, can be screened out by exploiting the spatial correlation. If the compromised node alternately reports normal and faulty data with high self data reputation, we need to track the coherence trend based on the temporal correlation to avoid the attack effectively.

Now, let us mathematically define the peer node reputation. Suppose the memory depth is μ = K and the aggregation breadth is ν = N. That is, K sampling values are used to measure the signal consistency (the current moment is in the Kth sampling period), and the size of the aggregation set is N. Let $x_{nk}$ denote the sampling value of node n (1 ≤ n ≤ N) in the kth (1 ≤ k ≤ K) sampling period, and let $\tilde{x}_k$ denote the median value of the N nodes in the kth sampling period. Then, based on the spatial correlation, the distance between the sample value of node n and the median is given as

$$h_{nk} = \max\{1 - rel(x_{nk}, \tilde{x}_k), 0\}. \tag{3.19}$$

During a consistency calculation period (i.e., K sampling periods), based on the temporal correlation, the peer node reputation of node n is calculated as

$$r_n = \bar{h}_n = \frac{1}{K}\sum_{k=1}^{K} h_{nk}. \tag{3.20}$$

We remark that the memory depth, K, can be set to 1 if there is no attack in the network. Then, we can simply use Eq. 3.19 to calculate the peer node reputation.


Case 2: Source with a Center

For a source with a center, the sampling value of each node in the monitoring region is related to the distance from the node to the source. It is normal that the data collected by different nodes differ at the same instant, but the changing behavior of the data should be similar over a time period. Therefore, the peer node trust opinion is calculated according to the coherence of the signal for this type of source. In the calculation of signal consistency, the number of historical sampling values is determined by the time-domain characteristics of the signal. For example, during the time interval (0, t), the audio signal curves of four nodes are shown in Fig. 3.16. Considering the historical sampling values, we examine whether the distance of one node's signal curve from all other nodes changes in a stationary way, so as to determine whether the reading of this node is consistent with its neighbors. As shown in Fig. 3.16, although the average distance $\bar{h}_3$ between the signal curve of node 3 and the median curve is smaller than the average distance $\bar{h}_1$ of node 1, the variation of $h_3$ is larger than that of $h_1$, implying that the signal consistency of node 3 is poor.

Fig. 3.16 Illustration of distance for audio source

Based on the temporal correlation, the average distance of node n during a consistency calculation period is given by Eq. (3.20). Therefore, the peer node reputation of node n can be obtained in terms of the standard deviation as follows:

$$r_n = 1 - \sqrt{\frac{1}{K}\sum_{k=1}^{K} \left|h_{nk} - \bar{h}_n\right|^2}. \tag{3.21}$$

To effectively avoid hostile attacks and fake data, we ignore the data whose peer data trust opinion is less than 0.5. Then, we can set the disbelief d = 0 for the remaining data to simplify the calculation, and set a = 0.5 for uncertain data. Combining the two cases above, we define the peer node trust opinion as follows.


Definition 3.7 Letting $r_n$ be the peer node reputation of the aggregator about node n, the peer node trust opinion of the aggregator about this node is given by

$$\omega_{r_n} = \{r_n, 0, 1 - r_n, 0.5\}. \tag{3.22}$$

3.4.3.3 Trust Transfer and Peer Data Trust Opinion

The peer data trust opinion is the aggregator's trust opinion about the report from a child in an aggregation set. When an aggregator receives a sensor's report, if the sensor believes the report with high confidence and the aggregator trusts this sensor, then the aggregator will also show high confidence in the sensor's report. However, if the aggregator has doubts about the sensor, it discredits the sensor's report regardless of the sensor's own opinion. Following this rule, opinions can be properly propagated along the transmission path. This process is defined as trust transfer.

Definition 3.8 Let $\omega_r = \{b_r, d_r, u_r, a_r\}$, $\omega_c = \{b_c, d_c, u_c, a_c\}$, and $\omega_t = \{b_t, d_t, u_t, a_t\}$ be, respectively, the trust opinion of the aggregator about a node, that of the node about its report, and that of the aggregator about the node's report. The operator ⊗ is defined for trust transfer, $\omega_t = \omega_r \otimes \omega_c$, such that

$$b_t = b_r b_c;\quad d_t = b_r d_c;\quad u_t = 1 - b_r(1 - u_c);\quad a_t = a_c. \tag{3.23}$$
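Encoded over (b, d, u, a) tuples, the trust transfer operator is one line per component (a sketch; the tuple representation is ours):

```python
def transfer(w_r, w_c):
    # Trust transfer, Eq. (3.23): omega_t = omega_r (x) omega_c.
    b_r, d_r, u_r, a_r = w_r        # aggregator's opinion about the node
    b_c, d_c, u_c, a_c = w_c        # node's opinion about its own report
    return (b_r * b_c, b_r * d_c, 1.0 - b_r * (1.0 - u_c), a_c)

# transfer((0.90, 0, 0.10, 0.5), (0.90, 0, 0.10, 0.5))
# -> (0.81, 0.0, 0.19, 0.5), matching the example below.
```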

Definition 3.9 Let $\omega_t = \{b_t, d_t, u_t, a_t\}$ be the peer data trust opinion; then $b_t$ is the peer data reputation.

Property 3.1 Trust transfer does not satisfy the commutative law, that is, $\omega_r \otimes \omega_c \neq \omega_c \otimes \omega_r$.

Property 3.2 Trust transfer reduces the belief.

Corollary 3.1 Trust transfer may make the expected opinion O either increase or decrease. If $O_c \geq a_c$, then $O_t \leq O_c$; else if $O_c < a_c$, then $O_t > O_c$. When $d_c = 0$, the expected opinion O definitely decreases.

Proof In our system, $d_c = 0$ when calculating the trust opinion, since every node trusts itself; nevertheless, we prove the general statement. We have

$$O_t = b_t + u_t a_t = b_r b_c + (1 - b_r(1 - u_c))a_c = b_r(b_c + u_c a_c) + (1 - b_r)a_c = b_r O_c + (1 - b_r)a_c.$$

(i) When $d_c \neq 0$, there are two sub-cases. If $O_c \geq a_c$, then

$$O_t = b_r O_c + (1 - b_r)a_c \leq b_r O_c + (1 - b_r)O_c = O_c \Rightarrow O_t \leq O_c;$$

if $O_c < a_c$, then

$$O_t = b_r O_c + (1 - b_r)a_c > b_r O_c + (1 - b_r)O_c = O_c \Rightarrow O_t > O_c.$$

(ii) When $d_c = 0$, since

$$O_c - a_c = b_c + u_c a_c - a_c = b_c - b_c a_c \geq 0 \Rightarrow O_c \geq a_c,$$

we get $O_c \geq a_c \Rightarrow O_t \leq O_c$.

Remark 3.1 When $d_c = 0$ and $a_c = 0.5$: if $O_c > 0.5$, which means the expected probability of the self data trust opinion of a child node is more than 50%, then due to trust transfer the probability expectation $O_t$ of the peer data trust opinion of the parent node decreases. On the contrary, if $O_c < 0.5$, then due to trust transfer the belief decreases ($b_t \leq b_c$) although $O_t$ increases. Since $O_t$ takes uncertainty factors into account, if $O_c$ is very small, the trust opinion on the aggregation result may become larger through trust transfer. In a practical application system, however, we can discard unbelievable data once its trust opinion is less than 50% to ensure the accuracy of the aggregation result.

As an example, in Fig. 3.15, we denote the self data trust opinions of the three sensor nodes as $\omega_z^{W_1}$, $\omega_z^{W_2}$, and $\omega_z^{W_3}$, and the peer node trust opinions of the aggregator about the three nodes as $\omega_{W_1}^{J}$, $\omega_{W_2}^{J}$, and $\omega_{W_3}^{J}$, respectively. Assume that $\omega_z^{W_1} = \omega_z^{W_2} = \omega_z^{W_3} = (0.90, 0.00, 0.10, 0.50)$, $\omega_{W_1}^{J} = (0.90, 0.00, 0.10, 0.50)$, $\omega_{W_2}^{J} = (0.90, 0.00, 0.10, 0.50)$, and $\omega_{W_3}^{J} = (0.10, 0.00, 0.90, 0.50)$. The trust opinion of each node on z is discounted by the aggregator J according to J's trust opinion about that node. Then the aggregator J computes the peer data trust opinions on z as $\omega_z^{J W_1} = \omega_{W_1}^{J} \otimes \omega_z^{W_1} = (0.81, 0.00, 0.19, 0.50)$, $\omega_z^{J W_2} = (0.81, 0.00, 0.19, 0.50)$, and $\omega_z^{J W_3} = (0.09, 0.00, 0.91, 0.50)$.

3.4.3.4 Trust Combination and Self Data Trust Opinion of Aggregator

In Josang’s trust model, the trust consensus of two opinions is an opinion that reflects both opinions in a fair and equal way. For example, if two sensors have observed a target over two different time intervals, they might have different opinions about it depending on the behavior of the target in the respective periods. The consensus opinion is the opinion that a single sensor would have after having observed the target during both periods. In this section, we use the concept of trust consensus (Josang et al. 2007) to calculate the aggregator’s trust opinion on the aggregated result in an aggregation set, which is called the trust combination. Definition 3.10 Nodes A and B collect data of one source in an aggregation set independently. Let ωA = {bA , dA , uA , aA }, ωB = {bB , dB , uB , aB }, and ω =

132

3 Data Fusion Based Transmission in Multimedia Sensor Networks

{b, d, u, a} be the aggregator’s trust opinion on the report from node A, that on the report from node B, and that on the aggregated result. Operator ⊕ is defined for trust combination of data, that is ω = ωA ⊕ ωB . The operator ⊕ satisfies the following rules: bA uB + bB uA dA uB + dB uA ; d= ; κ κ aA uB + aB uA − (aA + aB )uA uB uA uB ; a= ; u= κ uA + uB − 2uA uB

b=

κ = uA + uB − uA uB ; a=

aA + aB , 2

if

uA = uB = 1.
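A direct transcription of Definition 3.10 follows (our sketch; dogmatic opinions with u = 0 are a known limit case of the consensus operator and are not handled here):

```python
def combine(wA, wB):
    # Trust combination, Definition 3.10: omega = omega_A (+) omega_B.
    bA, dA, uA, aA = wA
    bB, dB, uB, aB = wB
    if uA == uB == 1.0:            # both vacuous: only atomicities average
        return (0.0, 0.0, 1.0, (aA + aB) / 2.0)
    k = uA + uB - uA * uB
    a = (aA * uB + aB * uA - (aA + aB) * uA * uB) / (uA + uB - 2.0 * uA * uB)
    return ((bA * uB + bB * uA) / k,
            (dA * uB + dB * uA) / k,
            uA * uB / k,
            a)
```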

Property 3.3 Trust combination decreases the uncertainty of the data.

Proof Let $0 \leq u_A < u_B \leq 1$. Since $u_A + u_B - u_A u_B > 0$, we have

$$u = \frac{u_A u_B}{u_A + u_B - u_A u_B} \leq \frac{u_A\left(u_B + u_A(1 - u_B)\right)}{u_A + u_B - u_A u_B} = u_A.$$

Property 3.4 If the disbelief d = 0, then trust combination increases the belief.

Proof Let $0 \leq b_A < b_B \leq 1$ and $d_A = d_B = 0$. Then

$$b - b_B = \frac{b_A u_B + b_B u_A}{u_A + u_B - u_A u_B} - b_B = \frac{u_B\left(b_A - b_B(1 - u_A)\right)}{u_A + u_B - u_A u_B}.$$

Since $b_A + d_A + u_A = 1$ and $d_A = d_B = 0$, we have $1 - u_A = b_A$; therefore,

$$b_A - b_B(1 - u_A) = b_A - b_A b_B \geq 0 \Rightarrow b \geq b_B.$$

In a fault-tolerant system that does not consider the effects of fake data and hostile attacks, we have d = 0. Therefore, trust combination can improve the aggregator's trust opinion about the aggregated result.

Property 3.5 Trust combination satisfies the commutative law, i.e., $\omega_A \oplus \omega_B = \omega_B \oplus \omega_A$.

Proof Straightforward from Definition 3.10.

Property 3.6 Trust combination satisfies the associative law, i.e., $(\omega_A \oplus \omega_B) \oplus \omega_C = \omega_A \oplus (\omega_B \oplus \omega_C)$.


Proof Let $\omega_1 = (\omega_A \oplus \omega_B) \oplus \omega_C$, $\omega_2 = \omega_A \oplus (\omega_B \oplus \omega_C)$, and $\kappa = u_A u_B + u_B u_C + u_C u_A - 2u_A u_B u_C$. Direct calculation gives

$$b_1 = \frac{b_A u_B u_C + b_B u_C u_A + b_C u_A u_B}{\kappa} = b_2;\quad d_1 = \frac{d_A u_B u_C + d_B u_C u_A + d_C u_A u_B}{\kappa} = d_2;$$
$$u_1 = \frac{u_A u_B u_C}{\kappa} = u_2;$$
$$a_1 = a_2 = \frac{a_A u_B u_C + a_B u_C u_A + a_C u_A u_B - (a_A + a_B + a_C)u_A u_B u_C}{u_A u_B + u_B u_C + u_C u_A - 3u_A u_B u_C};$$
$$a_1 = a_2 = \frac{a_A + a_B + a_C}{3},\quad \text{if } u_A = u_B = u_C = 1.$$

Therefore, $\omega_1 = \omega_2$.

Property 3.7 The trust combination operation is insensitive to the computation order.

By the commutative and associative laws, this property obviously holds. It means that the trust opinion about the aggregated result is not affected by the order of the aggregation calculation.

Corollary 3.2 Let N independent nodes collect data from the same source. Let $\omega_n = \{b_n, d_n, u_n, a_n\}$ be the aggregator's trust opinion about the report from node n, and let $\omega = \{b, d, u, a\}$ be the aggregator's trust opinion about the aggregation result. Then $\omega = \omega_1 \oplus \omega_2 \oplus \cdots \oplus \omega_N$, and the computation rules are

$$b = \frac{\sum_{n=1}^{N} b_n \prod_{j \neq n} u_j}{\kappa};\qquad d = \frac{\sum_{n=1}^{N} d_n \prod_{j \neq n} u_j}{\kappa};\qquad u = \frac{\prod_{j=1}^{N} u_j}{\kappa};$$

$$a = \frac{\sum_{n=1}^{N} a_n \prod_{j \neq n} u_j - \left(\sum_{n=1}^{N} a_n\right) \prod_{j=1}^{N} u_j}{\sum_{n=1}^{N} \prod_{j \neq n} u_j - N \prod_{j=1}^{N} u_j};$$

$$\kappa = \sum_{n=1}^{N} \prod_{j \neq n} u_j - (N - 1) \prod_{j=1}^{N} u_j;\qquad a = \frac{\sum_{n=1}^{N} a_n}{N}\ \text{ when } \forall n,\ u_n = 1. \tag{3.24}$$
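Since Property 3.7 makes the combination order irrelevant, the N-ary rule of Eq. (3.24) can equivalently be computed by folding the pairwise operator, e.g., with the `combine` function sketched above:

```python
from functools import reduce

def combine_all(opinions):
    # Fold the pairwise consensus; by Properties 3.5-3.6 the result
    # equals the closed form of Eq. (3.24) for any fold order.
    return reduce(combine, opinions)
```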

Proof We prove this corollary by induction on the number of nodes collecting data from the same source. If there are only two reporting nodes, the rules are trivially true by Definition 3.10. Assume the rules are correct for K ≥ 2 reporting nodes; in other words, let $\omega' = \{b', d', u', a'\}$ be the aggregator's combined trust opinion on the K nodes. Then $\omega' = \omega_1 \oplus \omega_2 \oplus \cdots \oplus \omega_K$ satisfies the computation rules in Eq. (3.24), i.e.,

$$b' = \frac{\sum_{n=1}^{K} b_n \prod_{j \leq K,\, j \neq n} u_j}{\kappa'};\quad d' = \frac{\sum_{n=1}^{K} d_n \prod_{j \leq K,\, j \neq n} u_j}{\kappa'};\quad u' = \frac{\prod_{j=1}^{K} u_j}{\kappa'};$$
$$a' = \frac{\sum_{n=1}^{K} a_n \prod_{j \leq K,\, j \neq n} u_j - \left(\sum_{n=1}^{K} a_n\right)\prod_{j=1}^{K} u_j}{\sum_{n=1}^{K} \prod_{j \leq K,\, j \neq n} u_j - K \prod_{j=1}^{K} u_j};\quad \kappa' = \sum_{n=1}^{K} \prod_{j \leq K,\, j \neq n} u_j - (K - 1)\prod_{j=1}^{K} u_j.$$

Adding the (K + 1)th node, we show that the aggregator's combined trust opinion, $\omega = \omega_1 \oplus \omega_2 \oplus \cdots \oplus \omega_{K+1}$ on the K + 1 nodes, also satisfies the computation rules in Eq. (3.24). By the associative law, $\omega = \omega' \oplus \omega_{K+1}$. Let $\omega = \{b, d, u, a\}$. Applying Definition 3.10, we get

$$b = \frac{b' u_{K+1} + b_{K+1} u'}{u' + u_{K+1} - u' u_{K+1}} = \frac{\sum_{n=1}^{K} b_n \prod_{j \leq K,\, j \neq n} u_j \cdot u_{K+1} + b_{K+1} \prod_{j=1}^{K} u_j}{\prod_{j=1}^{K} u_j + \kappa' u_{K+1} - u_{K+1} \prod_{j=1}^{K} u_j} = \frac{\sum_{n=1}^{K+1} b_n \prod_{j \leq K+1,\, j \neq n} u_j}{\kappa},$$

where, setting

$$\kappa = \prod_{j=1}^{K} u_j + \kappa' u_{K+1} - u_{K+1}\prod_{j=1}^{K} u_j = \sum_{n=1}^{K+1} \prod_{j \leq K+1,\, j \neq n} u_j - K \prod_{j=1}^{K+1} u_j,$$

we recover exactly the form required by Eq. (3.24) for K + 1 nodes.

Therefore, b satisfies the first computation rule in Eq. (3.24). Similarly, we can prove that d, u, and a satisfy the computation rules. As a result, the corollary is true.

For the example in Fig. 3.15, assuming that the trust opinion of each node is independent, the opinions can finally be combined using the trust combination operator to produce the aggregator's own trust opinion on z: $\omega_z^{J(W_1,W_2,W_3)} = (\omega_{W_1}^{J} \otimes \omega_z^{W_1}) \oplus (\omega_{W_2}^{J} \otimes \omega_z^{W_2}) \oplus (\omega_{W_3}^{J} \otimes \omega_z^{W_3})$.

From the results calculated in the last subsection, we observe that the aggregator shall distrust node $W_3$, since $\omega_z^{J W_3}$ is highly uncertain. Both $\omega_z^{J W_1}$ and $\omega_z^{J W_2}$ represent trust opinions that can be used for making a decision. By combining all three independent trust opinions into one, the aggregator calculates $\omega_z^{J(W_1,W_2,W_3)} = (0.8961, 0.0000, 0.1039, 0.5000)$, which is not impacted by node $W_3$.

Calculating the trust opinion about the final aggregated result at the sink node is similar to the calculation at an aggregator. First, the peer data trust opinion $\omega_t = \{b_t, d_t, u_t, a_t\}$ about the report from each lower-level aggregator is calculated through trust transfer, and then the self data trust opinion $\omega = \{b, d, u, a\}$ about the final aggregated result is calculated by means of trust combination. The expected probability of the trust opinion on the final aggregated result is finally obtained according to Eq. (3.16).

3.4.3.5 Trust-Based and Fault-Tolerant Data Aggregation Algorithm

From the above discussions, the trust-based data aggregation process with fault tolerance is described in Algorithm 8.

Algorithm 8: Trust-based algorithm for fault-tolerant data aggregation

1: Each node collects data, calculates the self data reputation by Eq. (3.17), and forms the self data trust opinion in accordance with Eq. (3.18);
2: Each node sends its collected data and self data trust opinion to the higher level aggregator;
3: The aggregator receives the reports from its children in a collection period. If the network is unsecured, the aggregator first verifies the self data reputation from each child according to the source model. Then the aggregator calculates the peer node reputation of each child node with the help of Eq. (3.20) or Eq. (3.21), depending on the source model, forms the peer node trust opinion by Eq. (3.22), and generates the peer data trust opinion on the report from each child using Eq. (3.23);
4: The aggregator uses Eq. (3.16) to calculate the expected probability $O_n$ of the trust opinion for the report from each child node;
5: The aggregator assigns an aggregation weight $G_n = O_n / \sum_{n=1}^{N} O_n$ to the report from node n and calculates the aggregated result;
6: The aggregator calculates the self reputation of the aggregated result with the help of Eq. (3.24) according to the peer data trust opinions on the reports from its children, and reports the aggregated result with its self data trust opinion to the higher level aggregator;
7: The higher level aggregators perform the aggregation process according to steps 3 to 6, eventually reaching the sink node.
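A sketch of steps 4–5 of Algorithm 8 for an equal-gain average follows (our function; the 0.5 cut-off follows the discard rule discussed in Remark 3.1):

```python
def weighted_aggregate(reports):
    # reports: list of (x_n, opinion) with opinion = (b, d, u, a);
    # the expected probability O_n = b + a*u (Eq. 3.16) becomes the weight
    # G_n = O_n / sum(O), and reports with O_n < 0.5 are discarded.
    kept = [(x, b + a * u) for x, (b, d, u, a) in reports if b + a * u >= 0.5]
    total = sum(o for _, o in kept)
    return sum(x * o for x, o in kept) / total if total else None
```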


In the weighted aggregation process, the aggregation weight $G_n$ determines the contribution of the data from child node n to the aggregated result. It also participates in the aggregation calculation in different ways for different aggregation operations. For example, when calculating the average temperature or combining audio signals with equal gain, we use the formula $x = \sum_{n=1}^{N} G_n x_n$; when combining audio signals with maximum ratio gain, the aggregation weights serve as part of the signal-to-noise ratio of each lower level node. When calculating the maximum or minimum value, the aggregation weight is a reference that indicates whether a report shall participate in the aggregation calculation.

In MSNs, since a node's energy is extremely limited, a good trust management scheme should economize on energy consumption. In some trust management frameworks (Chen et al. 2007), the sensor nodes need to communicate with neighboring nodes for behavior observation and trust recommendation. By contrast, in the proposed multi-layer trustworthy aggregation architecture, we limit the communication to nodes in adjacent levels. Therefore, trust is calculated without recommendations from neighboring nodes. Furthermore, the self data trust opinion can be transmitted along with the reported data without additional messages, so each data packet needs only two additional bytes for storing the opinion. If the size of one packet is 512 bytes, then the overhead for transmitting trust opinions is far less than 1%. Therefore, our framework provides trust calculation, data aggregation, and information cleaning with minimal communication overhead and energy consumption; hence it is lightweight and practical.

3.4.4 Experimental and Simulation Studies

In this section, experiments on a real testbed (Sun et al. 2009) and simulation studies are conducted to evaluate the performance of our framework against node failure and environmental interference in MSNs. There are 60 nodes in the network. Since both media streams and discrete data can be transmitted simultaneously in one sensor network, we use these two kinds of data to verify the correctness of forming the trust opinion on the aggregated results and to examine the efficiency of the framework under different fault patterns.

3.4.4.1 Continuous Audio Stream

To illustrate the performance of our proposed framework for media streams, we take one aggregation set as an example and depict the waveform of the signal from each node. There are four nodes in this aggregation set: three act as audio sensors and one acts as the aggregator. The sampling frequency of each audio sensor is 4 kHz. We took 25,400 sampling values and divided them into eight phases. In the experiments, we set μ = 127 and adopted the ADPCM algorithm (Gibson et al. 1980)


to predict the current sampling value. In each phase, we took the minimum value among all predictions to calculate the self data reputation. The 1st sensor node is perfect during all eight phases, so the audio signal collected by it reflects the real audio source, as shown in Fig. 3.17a. We added strong external noise to the 2nd node in the 2nd and 3rd phases; as illustrated in Fig. 3.17b, the signal quality looks poor in these two phases. We reduced the gain of the microphone on the 3rd node from the 5th phase onward to simulate node failure. As shown in Fig. 3.17c, the audio signal of the 3rd node decays significantly from the 5th phase. The audio signal of the aggregated result is shown in Fig. 3.17d: the waveform of the aggregated result is similar to the audio signal collected by the 1st sensor node. It is known that MRC (Maximal Ratio Combining) (Luo et al. 2010) is the most efficient way to combine several audio streams. Our trust-based framework belongs to the MRC scheme, as it assigns larger weights to signals with high signal-to-noise ratio and high reputation in the aggregation. Therefore, it can provide safer and higher quality media streams with trust opinions at the sink node.

3.4.4.2 Discrete Data

We further evaluate the performance of our proposed framework for discrete data by analyzing the temperature monitoring process in one aggregation set of the same sensor network. The aggregation set includes one aggregator and 30 sensor nodes, and each sensor reports one datum every sampling round. There are 80 sampling rounds in the experiment, divided into four phases as follows. The first phase (rounds 1–20) simulates normal circumstances, in which the 30 sensor nodes collect and report the correct value of 26 °C. The second phase (rounds 21–40) simulates fixed abnormal data injection, in which six sensor nodes fail and report 40 °C, while two sensor nodes are captured and purposely report 200 °C. The third phase (rounds 41–60) simulates random error injection, in which the six failed nodes fix their reports at 40 °C and the two compromised nodes report values between 26 °C and 40 °C randomly. In the fourth phase (rounds 61–80), the reports from the other 22 sensor nodes suddenly increase to 100 °C and keep this value, simulating a sudden environmental change, while the six abnormal and two compromised nodes keep reporting false data.

Figure 3.18 shows the trends of the trust reputations and aggregated results over time. We find that even though the abnormal and compromised nodes report data with high self data reputation in phases 1 and 2 (as shown in Fig. 3.18a), the peer data reputations of the abnormal and compromised nodes are low (as shown in Fig. 3.18c); thus their reports are discarded and make no contribution to the aggregated results. The aggregated results stay around 26 °C in the first three phases. In the last phase, when the temperature suddenly changes to 100 °C, although the self data reputations of the normal nodes are very low, the aggregator rapidly captures the change with a self data reputation of 0.62 and reports the emergency in time. From the above analysis, the proposed framework effectively weakens the impact of the false and fake data, and can reflect the real change of the environmental temperature.


Fig. 3.17 Audio signal aggregation from three nodes. (a) Audio signal collected by the 1st sensor node. (b) Audio signal collected by the 2nd sensor node. (c) Audio signal collected by the 3rd sensor node. (d) Audio signal of the aggregated result


Fig. 3.18 Variation of trust reputations and aggregated results along with the changes of sensor state and environment. (a) Self-data reputation of node. (b) Peer node reputation of aggregator. (c) Peer data reputation of aggregator. (d) Self-data reputation of aggregator. (e) Aggregated results of aggregator


From the above series of experimental and simulation studies, we conclude that the proposed trust-based framework can reduce the impact of erroneous data. The trust-based algorithm for data aggregation with fault tolerance can effectively identify failed nodes and filter out their data to keep the aggregated results consistent with the actual values at all times.

References

Adamatzky, A.: Physarum Machines: Computers from Slime Mold. World Scientific (2010)
Ahmed, A.A., Shi, H., Shang, Y.: A survey on network protocols for wireless sensor networks. In: Proceedings of the IEEE ITRE'03 (2003)
Alexander, M., Cohoon, J., Ganley, J., Robins, G.: Placement and routing for performance-oriented FPGA layout. VLSI Des. 7(1), 97–110 (1998)
Austin, T., Larson, E., Ernst, D.: Simplescalar: an infrastructure for computer system modeling. Computer 35(2), 59 (2002)
Bao, F., RayChen, I., Chang, M., Cho, J.: Hierarchical trust management for wireless sensor networks and its applications to trust-based routing and intrusion detection. IEEE Trans. Netw. Serv. Manag. 9(2), 169–183 (2012)
Berman, P., Ramaiyer, V.: Improved approximations for the Steiner tree problem. J. Algorithms 17(3), 381–408 (1994)
Carman, D.W., Kruus, P.S., Matt, B.J.: Constraints and approaches for distributed sensor network security. NAI Labs Technical Report 00-010 1(1), 1–39 (2000)
Chen, H., Wu, H., Cao, X., Gao, C.: Trust propagation and aggregation in wireless sensor networks. In: Proceedings of the Frontier of Computer Science and Technology (2007)
Chen, X., Makki, K., Yen, K., Pissinou, N.: Sensor network security: a survey. IEEE Commun. Surv. Tutor. 11(2), 52–73 (2009)
Cheng, X., Du, D.: Steiner Trees in Industries. Springer (2001)
Chiaraviglio, L., Mellia, M., Neri, F.: Minimizing ISP network energy cost: formulation and solutions. IEEE/ACM Trans. Netw. 20(2), 463–476 (2012)
Chipman, L.J., Orr, T.M., Graham, L.N.: Wavelets and image fusion. In: Proceedings of the International Conference on Image Processing (1995)
Cong, J., Kahng, A., Leung, K.: Efficient algorithms for the minimum shortest path Steiner arborescence problem with applications to VLSI physical design. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 17(1), 24–39 (1998)
Cortes, P., Garcia, J., Munuzuri, J., Onieva, L.: Viral systems: a new bio-inspired optimisation approach. Comput. Oper. Res. 35(9), 2840–2860 (2008)
Cristescu, R., Beferull-Lozano, B., Vetterli, M.: On network correlated data gathering. In: Proceedings of IEEE INFOCOM (2004)
Ding, M., Liu, F., Thaeler, A., Chen, D., Cheng, X.: Fault-tolerant target localization in sensor networks. EURASIP J. Wirel. Commun. Netw. 2007(1), 1–9 (2007)
Dorigo, M., Maniezzo, V., Colorni, A.: Ant system: optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybern.-Part B 26(1), 29–41 (1999)
Farmer, J., Packard, N., Perelson, A.: The immune system, adaptation and machine learning. Phys. D: Nonlinear Phenom. 22(1–3), 187–204 (1986)
Gendreau, M., Larochelle, J., Sanso, B.: A tabu search heuristic for the Steiner tree problem. Netw.: Int. J. 32(2), 162–172 (1999)


Gibson, J., Berglund, V., Sauter, L.: Kalman backward adaptive predictor coefficient identification in ADPCM with PCQ. IEEE Trans. Commun. 28(3), 361–371 (1980)
Gilbert, E., Pollak, H.: Steiner minimal trees. SIAM J. Appl. Math. 16(1), 1–29 (1968)
Goel, A., Estrin, D.: Simultaneous optimization for concave costs: single sink aggregation or single source buy-at-bulk. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms (2003)
Han, G., Choi, D., Lim, W.: A novel sensor node selection method based on trust for wireless sensor networks. In: Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing (2007)
Heinzelman, W., Kulik, J., Balakrishnan, H.: Adaptive protocol for information dissemination in wireless sensor networks. In: Proceedings of ACM MobiCom (1999)
Heinzelman, W.R., Chandrakasan, A., Balakrishnan, H.: Energy-efficient communication protocol for wireless microsensor networks. In: Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (2000)
Ho, J.W., Wright, M., Das, S.K.: Fast detection of mobile replica node attacks in wireless sensor networks using sequential hypothesis testing. IEEE Trans. Mob. Comput. 10(6), 767–782 (2011)
Ho, J.W., Wright, M., Das, S.K.: ZoneTrust: fast zone-based node compromise detection and revocation in wireless sensor networks using sequential hypothesis testing. IEEE Trans. Depend. Secur. Comput. 9(4), 494–511 (2012)
Holland, J.: Genetic algorithms and the optimal allocation of trials. SIAM J. Comput. 2(2), 88–105 (1973)
Hougardy, S., Prömel, H.: A 1.598 approximation algorithm for the Steiner tree problem in graphs. In: Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms (1999)
Intanagonwiwat, C., Estrin, D., Govindan, R., Heidemann, J.: Impact of network density on data aggregation in wireless sensor networks. In: Proceedings of ICDCS'02 (2002)
Intanagonwiwat, C., Govindan, R., Estrin, D., Heidemann, J., Silva, F.: Directed diffusion for wireless sensor networking. IEEE/ACM Trans. Netw. 11(1), 2–16 (2003)
Josang, A.: Trust-based decision making for electronic transactions. In: Proceedings of the Nordic Workshop on Secure IT Systems (NORDSEC'99) (1999)
Josang, A.: A logic for uncertain probabilities. Int. J. Uncertainty, Fuzziness Knowl.-Based Syst. 9(3), 279–311 (2001)
Josang, A., Ismail, R., Boyd, C.: A survey of trust and reputation systems for online service provision. Decis. Support Syst. 43(2), 618–644 (2007)
Karaboga, D., Basturk, B.: A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J. Glob. Optim. 39(3), 459–471 (2007)
Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W., Bohlinger, J.D. (eds.) Complexity of Computer Computations. The IBM Research Symposia Series. Springer, Boston, MA (1972)
Karpinski, M., Zelikovsky, A.: New approximation algorithms for the Steiner tree problems. J. Comb. Optim. 1(1), 47–65 (1997)
Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks (1995)
Kershaw, D.S.: The incomplete Cholesky-conjugate gradient method for the iterative solution of systems of linear equations. J. Comput. Phys. 26(1), 43–65 (1978)
Krishnamachari, B., Estrin, D., Wicker, S.: Impact of data aggregation in wireless sensor networks. In: Proceedings of the International Conference on Distributed Computing Systems (2002)
Krishnamachari, B., Estrin, D., Wicker, S.: Modelling data-centric routing in wireless sensor networks. In: Proceedings of IEEE INFOCOM (2002)
Li, X., Shao, Z., Qian, J.: An optimizing method based on autonomous animats: fish-swarm algorithm. Syst. Eng. Theory Pract. 22(11), 32–38 (2002)
Lindsey, S., Raghavendra, C.S.: PEGASIS: power-efficient gathering in sensor information systems. In: Proceedings of IEEE Aerospace Conference (2002)
Liu, L., Song, Y., Ma, H., Zhang, X.: Physarum optimization: a biology-inspired algorithm for minimal exposure path problem in wireless sensor networks. In: Proceedings of IEEE INFOCOM (2012)
Liu, L., Song, Y., Zhang, H., Ma, H., Vasilakos, A.: Physarum optimization: a biology-inspired algorithm for the Steiner tree problem in networks. IEEE Trans. Comput. 64(3), 818–831 (2013)
Luo, H., Liu, Y., Das, S.K.: Routing correlated data with fusion cost in wireless sensor networks. IEEE Trans. Mob. Comput. 5(11), 1620–1632 (2006)
Luo, H., Luo, J., Liu, Y., Das, S.K.: Adaptive data fusion for energy efficient routing in wireless sensor networks. IEEE Trans. Comput. 55(10), 1286–1300 (2006)
Luo, H., Wang, J., Sun, Y., Ma, H., Li, X.: Adaptive sampling and diversity reception in multi-hop wireless audio sensor networks. In: Proceedings of IEEE International Conference on Distributed Computing Systems (2010)
Luo, H., Tao, H., Ma, H., Das, S.K.: Data fusion with desired reliability in wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 22(3), 501–513 (2011)
Mao, X., Miao, X., He, Y., Zhu, T., Wang, J., Dong, W., Li, X., Liu, Y.: CitySee: urban CO2 monitoring with sensors. In: Proceedings of IEEE INFOCOM (2012)
Merhi, Z., Elgamel, M., Bayoumi, M.: A lightweight collaborative fault tolerant target localization system for wireless sensor networks. EURASIP J. Wirel. Commun. Netw. 8(12), 1690–1704 (2009)
Meyerson, A., Munagala, K., Plotkin, S.: Cost-distance: two metric network design. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science (2000)
Michaelides, M.P., Panayiotou, C.G.: SNAP: fault tolerant event location estimation in sensor networks using binary data. IEEE Trans. Comput. 58(9), 1185–1197 (2009)
Misra, S., Hong, S., Xue, G., Tang, J.: Constrained relay node placement in wireless sensor networks: formulation and approximations. IEEE/ACM Trans. Netw. 18(2), 434–447 (2009)
Nakagaki, T., Yamada, H., Tóth, Á.: Maze-solving by an amoeboid organism. Nature 407(6803), 470 (2000)
Nakagaki, T., Yamada, H., Tóth, Á.: Path finding by tube morphogenesis in an amoeboid organism. Biophys. Chem. 92(1–2), 47–52 (2001)
Nakagaki, T., Iima, M., Ueda, T., Nishiura, Y., Saigusa, T., Tero, A., Kobayashi, R., Showalter, K.: Minimum-risk path finding by an adaptive amoebal network. Phys. Rev. Lett. 99(6), 068104 (2007)
Nakagaki, T., Tero, A., Kobayashi, R., Onishi, I., Miyaji, T.: Computational ability of cells based on cell dynamics and adaptability. New Gener. Comput. 27(1), 57–81 (2009)
Oliveira, C., Pardalos, P.: A survey of combinatorial optimization problems in multicast routing. Comput. Oper. Res. 32(8), 1953–1981 (2005)
Ozdemir, S.: Functional reputation based data aggregation for wireless sensor networks. In: Proceedings of IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (2008)
Papadimitriou, C., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Courier Corporation (1998)
Pattem, S., Krishnamachari, B., Govindan, R.: The impact of spatial correlation on routing with compression in wireless sensor networks. In: Proceedings of IPSN'04 (2004)
Pennebaker, W.B., Mitchell, J.L.: JPEG: Still Image Data Compression Standard. Springer Science & Business Media (1993)
Păun, G., Pérez-Jiménez, M.J.: Membrane computing: brief introduction, recent results and applications. Biosystems 85(1), 11–22 (2006)
Rickenbach, P.V., Wattenhofer, R.: Gathering correlated data in sensor networks. In: Proceedings of ACM DIALM-POMC'04 (2004)
Robins, G., Zelikovsky, A.: Improved Steiner tree approximation in graphs. In: Proceedings of SODA (2000)
Scaglione, A., Servetto, S.D.: On the interdependence of routing and data compression in multi-hop sensor networks. In: Proceedings of ACM MobiCom (2002)


Shaikh, R.A., Jameel, H., d'Auriol, B.J., Lee, H., Lee, S., Song, Y.: Group-based trust management scheme for clustered wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 20(11), 1698–1712 (2008)
Shapiro, J.M.: Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Signal Process. 41(12), 3445–3462 (1993)
Special issue on Physarum computing. Int. J. Unconv. Comput. (2008)
Sun, Y., Zhao, G., Ma, H., Luo, H.: SMSN: a smart multimedia sensor network for surveillance. In: Proceedings of International Conference on Mobile Computing and Networking, Poster (2009)
Sun, Y., Luo, H., Das, S.K.: A trust-based framework for fault-tolerant data aggregation in wireless multimedia sensor networks. IEEE Trans. Depend. Secur. Comput. 9(6), 785–797 (2012)
Tero, A., Kobayashi, R., Nakagaki, T.: A mathematical model for adaptive transport network in path finding by true slime mold. J. Theor. Biol. 244(4), 553–564 (2007)
Tero, A., Yumiki, K., Kobayashi, R., Saigusa, T., Nakagaki, T.: Flow-network adaptation in Physarum amoebae. Theory Biosci. 127(2), 89–94 (2008)
Tero, A., Takagi, S., Saigusa, T., Ito, K., Bebber, D., Fricker, M., Yumiki, K., Kobayashi, R., Nakagaki, T.: Rules for biologically inspired adaptive network design. Science 327(5964), 439–442 (2010)
Voss, S., Gutenschwager, K.: A chunking based genetic algorithm for the Steiner tree problem in graphs. Netw. Des.: Connectivity Facil. Locat. 40, 335–355 (1999)
Wang, A., Heinzelman, W.B., Sinha, A., Chandrakasan, A.P.: Energy-scalable protocols for battery-operated microsensor networks. J. VLSI Signal Process. 29(3), 223–237 (2001)
Wang, W., Li, X., Wang, Y.: Truthful multicast in selfish wireless networks. In: Proceedings of ACM MobiCom (2004)
Yu, Y., Krishnamachari, B., Prasanna, V.: Energy-latency tradeoff for data gathering in wireless sensor networks. In: Proceedings of IEEE INFOCOM (2004)
Yu, B., Li, G., Sollins, K., Tung, A.: Effective keyword-based selection of relational databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data (2007)
Zelikovsky, A.: An 11/6-approximation algorithm for the network Steiner problem. Algorithmica 9(5), 463–470 (1993)
Zelikovsky, A.: Better approximation bounds for the network and Euclidean Steiner tree problems. University of Virginia, Charlottesville (1996)
Zhang, W., Cao, G.: DCTC: dynamic convoy tree-based collaboration for target tracking in sensor networks. IEEE Trans. Wirel. Commun. 3(5), 1685–1701 (2004a)
Zhang, W., Cao, G.: Optimizing tree reconfiguration for mobile target tracking in sensor networks. In: Proceedings of IEEE INFOCOM (2004b)
Zhang, W., Das, S.K., Liu, Y.: A trust based framework for secure data aggregation in wireless sensor networks. In: Proceedings of IEEE SECON (2006)

Chapter 4

In-Network Processing for Multimedia Sensor Networks

4.1 Introduction

Doubtless, a plethora of data-hungry applications will benefit from the availability of rich information. Because of their ability to ubiquitously capture multimedia content from the environment, multimedia sensor networks have great potential for strengthening traditional wireless sensor network applications, as well as creating a series of new multimedia applications such as multi-camera surveillance and location-based multimedia services. Multimedia sensor networks also provide an important supporting platform for sensing, transmission, and processing of multimedia data in the Internet of Things (IoT) (Alvia et al. 2015). However, developing practical techniques for multimedia sensor networks faces an inherent contradiction: on the one hand, the resource constraints of sensor nodes, such as limited energy, low bandwidth, and limited processing capability, still exist; on the other hand, visual and acoustic information, the dominant part of multimedia data, requires more sophisticated processing techniques and much higher bandwidth to deliver. Exploring the relationship between transmission and processing in multimedia sensor networks is the key to overcoming this contradiction. In fact, transmission and processing of multimedia data are not independent, i.e., multimedia sensor networks should perform in-network processing of the collected multimedia data. This processing mode has a major impact on energy-aware multimedia processing algorithms and energy-efficient transmission aimed at maximizing network lifetime while meeting application requirements. But there is no easy answer to the questions of how, when, and where the multimedia data should be transmitted or processed.


In this chapter, we first present three collaborative in-network processing schemes for visual information collection, target tracking, and target recognition, respectively.

• Visual information collection. Based on video sensor correlations, we propose a method of cooperative image processing that can effectively reduce the load of both sensor nodes and the whole network for visual information collection (Ma and Liu 2005). By pairing up highly correlated nodes and dividing the visual information collection tasks among them, each sensor node only needs to cover a fraction of the targeted area. Compared to the non-cooperative approach, each video sensor's image processing and transmission workload is significantly reduced. We design efficient algorithms to calculate the correlations, partition sensing areas, and fuse partial images.

• Target tracking. We propose a dynamic node collaboration scheme for mobile target tracking in camera sensor networks (Liu et al. 2009). Based on our localization-oriented sensing model, we apply the sequential Monte Carlo (SMC) technique to estimate the belief state of the target location. In order to implement the SMC based tracking mechanism efficiently, we propose a dynamic node collaboration scheme which can balance the tradeoff between the quality of tracking and the network cost. The proposed scheme employs an election method for the cluster heads during the tracking process and an optimal subset selection method for the camera sensors (as the cluster members) that estimate the target location cooperatively.

• Target recognition. Most surveillance applications require that the surveillance system can recognize the targets of interest. Thus, multi-class classification of targets becomes a critical task of multimedia sensor networks. After analyzing the procedure of target classification utilizing the acoustic and visual information sensed by multimedia sensor nodes, we propose a binary classification tree based framework for distributed target classification, which includes generation of the binary classification tree, division of the binary classification tree, and selection of multimedia sensor nodes (Liu et al. 2012).

As described above, cooperative in-network processing is a common mode of computation for many emerging applications of multimedia sensor networks. We therefore further propose a computing mode, decomposition-fusion (DF), which combines the communication process with in-network processing of raw data streams that describe the phenomenon of interest from multiple views, with multiple modal media, and on multiple nodes. Based on this mode, we build a high-performance cooperative framework in which a complex MSN task can be efficiently divided into a set of subtasks and suitable multimedia sensor nodes are selected to execute the subtasks in a cooperative fashion.


4.2 Correlation Based Image Processing in Multimedia Sensor Networks

4.2.1 Motivation

While visual surveillance systems have long been mature commercial products and the subject of numerous research efforts (Akyildiz et al. 2002; Collins et al. 2001; Matsuyama and Ukita 2002; Buxton and Gong 1995), their underlying infrastructure is different in spirit from sensor networks: the front end is composed of powerful nodes with high-resolution cameras and ample computation power, connected to central servers via readily available high-bandwidth backhaul networks (Chandramohan and Christensen 2002; Foresti and Snidaro 2002; Morita et al. 2003). The research therefore mainly focuses on post-analysis of the captured video data from single or multiple cameras for the purpose of object tracking or identification. Clearly such an approach deviates dramatically from the essence of distributed sensor networks, which is to harness the power of a vast number of deeply embedded nodes. These differences call for novel approaches to sensor networking, and in particular to in-network data processing for reducing the amount of data.

In this section, we propose a systematic image processing method based on sensor correlation that can effectively reduce sensor and network load. Toward this objective, we first define correlation among video sensors based on their overlapping sensing areas and propose an algorithm to effectively obtain the correlation matrix. Based on this result, highly correlated sensors covering the same object of interest are paired together to cooperatively perform the sensing task, each capturing part of the image. Along independent routes, these partial images are delivered to the sink, where a composite image can be reconstructed. Though simple, this approach can significantly reduce the processing workload (including image encoding and transmission) on individual sensors. Furthermore, it is simple and efficient, in particular for implementation on resource-limited video sensors.

4.2.2 Sensing Correlation

We consider a 2D model where the sensing area of a video sensor s is a sector denoted by the 4-tuple (P, R, V, α). Here P is the location of the sensor node, R is the sensing radius, V is the center line of sight of the camera's field of view, which will be termed the sensing direction, and α is the offset angle of the field of view on both sides of V.

Sensors s1 and s2 are correlated if their sensing areas overlap with each other. Furthermore, if they are correlated and the angle between V1 and V2 is within [−π/2, π/2], s1 and s2 are called strongly correlated; otherwise they are weakly correlated. In the following sections, correlation denotes strong correlation if not specified otherwise, as weak correlation requires 3D models.
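As a concrete illustration, the following minimal Python sketch (with names of our own choosing, not taken from any released code) represents a sensor by the 4-tuple above, tests whether a point falls inside the sensing sector, and checks the strong-correlation condition on the two sensing directions:

```python
import math

class VideoSensor:
    """2D directional sensing model: the 4-tuple (P, R, V, alpha)."""
    def __init__(self, px, py, radius, vx, vy, alpha):
        self.px, self.py = px, py                 # location P
        self.R = radius                           # sensing radius
        norm = math.hypot(vx, vy)
        self.vx, self.vy = vx / norm, vy / norm   # unit sensing direction V
        self.alpha = alpha                        # offset angle on each side of V

    def covers(self, x, y):
        """True if point (x, y) lies in the sensing sector."""
        dx, dy = x - self.px, y - self.py
        d = math.hypot(dx, dy)
        if d == 0:
            return True
        if d > self.R:
            return False
        # The point is inside iff the angle between V and the direction to
        # the point is at most alpha, i.e., cos(angle) >= cos(alpha).
        return (dx * self.vx + dy * self.vy) / d >= math.cos(self.alpha)

def strongly_correlated(s1, s2):
    """Direction test only: the angle between V1 and V2 is in [-pi/2, pi/2].
    Overlap of the two sensing areas must be checked separately."""
    return s1.vx * s2.vx + s1.vy * s2.vy >= 0.0
```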


Below we define the correlation degree between two video sensors based on the overlapping area of their sensing ranges.

Definition 4.1 Given the sensing areas of video sensors s1 and s2 as F(s1) and F(s2), respectively, the correlation degree between s1 and s2, denoted by c(s1, s2), is defined as

$$c(s_1, s_2) = \frac{F(s_1) \cap F(s_2)}{\alpha R^2}. \tag{4.1}$$

For a group of sensors S = {si}, their correlations can be represented as a matrix C = [cij], where cij denotes the correlation degree between video sensors si and sj. The correlation degree is determined by the overlapping sensing area of correlated nodes. Naturally the sensing region of a sensor is continuous, and so is the overlapping area. However, due to the numerous possible intersections between two sectors, calculating the correlation directly by integrating over the overlapping area of two sectors demands enumerating various cases. To simplify the analysis, instead of treating the overlapping region as a continuous area, we view the sensing region as a set of discrete points. For video sensors, we can intuitively map the discrete points to image pixels. Assume that sensor s is located at P0(0, 0) with its sensing direction along the Y-axis (see Fig. 4.1). Our objective is to calculate the correlation degree of s with a set of sensor nodes S = {si}, i = 1, 2, ..., K. By assuming the sensing area of sensor s is composed of discrete points, we can examine every point therein and determine whether it also falls in the others' sensing areas. If a point is also in the sensing area of sensor si, the number of overlapping points between the sensing areas of s and si is incremented; otherwise it remains unchanged. To traverse the sensing area of s efficiently, we divide the sensing sector of s into two parts, the triangle area and the segment area, which are treated with different calculation loops. The algorithm

Fig. 4.1 Correlation calculation


for calculating the correlation is given in Algorithm 9. Notice that the correlation degree c(s, si) is normalized by the total number of points in one sensor's sensing area (equal to αR²).

Algorithm 9: Calculating the correlation degree
1 CorrComp(s0, S, C)
2 /* s0 is the given sensor. S is the set of candidate correlated sensors. C is the set of correlation degrees between s0 and its correlated sensors. */
3 /* Initialize the correlation degrees */
4 for h = 1 to K do
5   c[h] := 0;
6 /* Calculate over the triangle area */
7 for j = 0 to R cos α do
8   for i = −j sin α to j sin α do
9     for h = 1 to K do
10      if (i, j) in the sensing region of s[h] then
11        c[h] := c[h] + 1;
12 /* Calculate over the segment area */
13 for j = R cos α to R do
14   for i = −√(R² − j²) to √(R² − j²) do
15     for h = 1 to K do
16       if (i, j) in the sensing region of s[h] then
17         c[h] := c[h] + 1;
18 /* Normalize */
19 for h = 1 to K do
20   c[h] := c[h]/(αR²);
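A direct Python transcription of Algorithm 9 is sketched below, reusing the illustrative VideoSensor class from above. The candidate sensors are assumed to be expressed already in the frame where s0 sits at the origin facing the Y-axis; note that we use j·tan α for the half-width of the triangle part, which is the exact sector boundary at height j:

```python
import math

def corr_comp(s0, sensors):
    """Correlation degrees between s0 (at the origin, facing the Y-axis)
    and each candidate sensor, following the two loops of Algorithm 9."""
    R, alpha = s0.R, s0.alpha
    c = [0] * len(sensors)
    j_split = int(R * math.cos(alpha))
    # Triangle part of the sector.
    for j in range(0, j_split + 1):
        half = int(j * math.tan(alpha))       # sector half-width at height j
        for i in range(-half, half + 1):
            for h, s in enumerate(sensors):
                if s.covers(i, j):
                    c[h] += 1
    # Segment (circular) part of the sector.
    for j in range(j_split + 1, int(R) + 1):
        half = int(math.sqrt(max(R * R - j * j, 0)))
        for i in range(-half, half + 1):
            for h, s in enumerate(sensors):
                if s.covers(i, j):
                    c[h] += 1
    # Normalize by the (approximate) number of points in one sensing area.
    return [v / (alpha * R * R) for v in c]
```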

Since a simple coordinate transformation can make any sensor's sensing direction consistent with the Y-axis and move the sensor location to (0, 0), the above algorithm can actually be utilized to calculate the correlation degree for any sensor node. Assume sensor s has sensing direction V = (Vx, Vy) and is located at P = (x0, y0). The transformation for the above purpose is given as

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} V_y & V_x & 0 \\ -V_x & V_y & 0 \\ -x_0 & -y_0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{4.2}$$

where (x', y') is the transformed coordinate for any point (x, y).


4.2.3 Image Processing Based on Correlation

If an object of interest is covered by multiple sensors, to reduce information redundancy and hence network load, they can cooperatively capture the scene and independently deliver partial information to the sink, where the composite scene can then be constructed. Intuitively, we can divide the sensing area into multiple parts; each sensor is responsible for capturing one part, resulting in video streams of only a fraction of the original size. In this way, each sensor's workload can be significantly reduced and network lifetime can be extended.

Our approach of grouping sensors into cooperative teams is based on the correlation matrix obtained in the previous section. If sensors are highly correlated, redundancy will be high among their captured images, and hence partitioning the sensing task across multiple sensors can most effectively reduce each individual sensor's workload. Toward this objective, given their correlation matrix, sensors covering the same object with a high correlation degree can be teamed together. If the correlation is not strong enough, it can be augmented by certain deployment policies, e.g., pan, tilt, and zoom of cameras. However, this manipulation is beyond the scope of this section.

Using a two-cooperative-sensor example, we detail the video processing based on correlation in this section. The case of multiple sensors is a straightforward extension, which is omitted here. We first identify how the monitoring task can be effectively divided between these two sensors. Then details on how to perform video capturing, video transmission, and video fusion are presented.

4.2.3.1 Allocating the Sensing Task

Given a set of sensors {si} covering the targeted object and the corresponding correlation matrix C = [ci,j], we can pair up nodes that have maximum correlation. Without loss of generality, assume sensors s1 and s2 are paired up. Let P1 and P2 denote their respective locations, and V1 = (V1,x, V1,y) and V2 = (V2,x, V2,y) denote their respective sensing directions. Their relative sensing areas are depicted in Fig. 4.2a.

In order to efficiently divide the monitoring task between s1 and s2, we first need to determine the respective sensing areas for them to cover, which can be fused together at the sink to reconstruct the full image. Our objective is to obtain a fused image as if it was captured by a virtual camera located at the middle point Pm of P1P2. Assume V1 and V2 intersect at point Pv; then PmPv is the sensing direction of the virtual camera. Let the unit-length vector of PmPv be denoted by Vm = (Vm,x, Vm,y). As illustrated in Fig. 4.2b, assume that segment A1A2 is the scan line in the image plane and Am is the middle point of A1A2. If the images delivered by s1 and s2 can be utilized to restore A1Am and AmA2 respectively, they can then be fused together to construct the virtual image. For this purpose, sensor s1 needs to cover the sector defined by points P1, e1 and e2, which corresponds to L1Lm in its image


Fig. 4.2 Video processing

plane. Similarly, we can conclude the required coverage RmR2 for sensor s2 (not shown in Fig. 4.2b for clearer illustration). The task now is to determine the exact partial images [L1, Lm] and [Rm, R2] that should be captured and delivered by s1 and s2, respectively.

Assume that the locations and sensing directions of the two sensors are in the same plane, and that the distance between the location and the image plane is the same for all sensors, denoted by f. The scan line l that spans A1Am can be represented as

$$l := \begin{cases} x = x_m + f V_{m,x} - t V_{m,y}, \\ y = y_m + f V_{m,y} + t V_{m,x}. \end{cases} \tag{4.3}$$

Notice that the distance between the left end point A1 and the middle point Am(xm, ym) is f tan α. Therefore the coordinate of A1 is

$$\begin{cases} x_{A_1} = x_m + f V_{m,x} - f \tan\alpha\, V_{m,y}, \\ y_{A_1} = y_m + f V_{m,y} + f \tan\alpha\, V_{m,x}. \end{cases} \tag{4.4}$$

For sensor s1 located at P1 = (x1, y1) with sensing direction V1 = (V1,x, V1,y), the middle point of its scan line l1 is (x1 + f V1,x, y1 + f V1,y), and hence l1 can be represented as

$$l_1 := \begin{cases} x = x_1 + f V_{1,x} - t V_{1,y}, \\ y = y_1 + f V_{1,y} + t V_{1,x}. \end{cases} \tag{4.5}$$


As illustrated in Fig. 4.2, for the virtual camera located at Pm with sensing direction Vm, assume that M pixels cover the segment [A1, Am]. According to Equations (4.3) and (4.5), points A1 and Am in the image plane will be mapped to (M − t1/ρ) and (M − tm/ρ) respectively in s1's image plane, where ρ = f tan α / M, and t1 and tm can be obtained by replacing (xi, yi) in Equation (4.6) with the coordinates of A1 and Am:

$$t = \frac{(x_1 + f V_{1,x} - x_i)\,\Delta y - (y_1 + f V_{1,y} - y_i)\,\Delta x}{V_{1,x}\,\Delta x + V_{1,y}\,\Delta y}, \tag{4.6}$$

where Δy = (ym − yi) and Δx = (xm − xi). Scan line l2 of sensor s2 can be represented similarly. Following the same approach we can also determine the mapping relation between l and l2 and the mapping from RmR2 to AmA2.
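A small sketch of this mapping (our own code, written under the reconstruction above that the pixel index is M − t/ρ with ρ = f tan α / M):

```python
import math

def scanline_t(x1, y1, v1x, v1y, xm, ym, f, xi, yi):
    """Position parameter t of Eq. (4.6) on sensor s1's scan line."""
    dx, dy = xm - xi, ym - yi                  # delta-x and delta-y
    num = (x1 + f * v1x - xi) * dy - (y1 + f * v1y - yi) * dx
    den = v1x * dx + v1y * dy
    return num / den

def pixel_index(t, f, alpha, M):
    """Map the parameter t to a pixel index on a scan line of 2M pixels."""
    rho = f * math.tan(alpha) / M
    return int(round(M - t / rho))
```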

4.2.3.2 Image Capturing

Once the relation between the two sensors' coverage is determined, we can employ this information to empower the network with cooperative visual information capturing. Assume the resolution of the image captured by a sensor is (2M) × N, where 2M and N are the horizontal and vertical resolutions of an image frame. According to the results in the previous section, as long as the partial images are individually delivered, the sink can successfully fuse them to reconstruct the scene of interest. To achieve this goal, sensor s1 only needs to capture and deliver the partial image from pixel (M − t1/ρ) to pixel (M − tm/ρ) of every scan line, and sensor s2 only needs to capture and deliver the partial image from pixel (M − tm/ρ) to pixel (M − t2/ρ) of every scan line.

4.2.3.3 Image Delivering

Once the images are captured and tailored accordingly, the partial images are sent to the sink independently by s1 and s2 via different routes to balance the network load. To find the route from the sensor to the sink, our architecture adopts existing sensor-initiated routing algorithms such as SPIN (Heinzelman et al. 1999). Notice that SPIN employs a two-step routing strategy. In the first step, the sensor broadcasts a probe packet describing the sensor data, which locates a QoS-guaranteed route through negotiation and resource adaptation. Once the route is determined, in the second step, the whole video stream is transmitted.


Fig. 4.3 Video fusion

4.2.3.4 Image Fusion

Once the images are delivered to the sink, they are fused together to construct the composite image from the two sensors. The fusion process is depicted in Fig. 4.3. The images from sensors s1 and s2, shown in Fig. 4.3a, b, are each utilized to generate half of the fused image G shown in Fig. 4.3c. For each pixel of the resulting image G, one corresponding pixel needs to be found in the image from either s1 or s2, depending on which half it resides in. Since we consider only a 2-D model, the mapping relations are the same for every scan line. According to the result in Sect. 4.2.3.1, pixels [1...M] of one horizontal scan line of G correspond to pixels [(M − t1/ρ), (M − tm/ρ)] on the scan line of the image captured by s1; pixels [M...2M] of one horizontal scan line of G correspond to pixels [(M − tm/ρ), (M − t2/ρ)] on the scan line of the image captured by s2. The detailed fusion algorithm is summarized in Algorithm 10.

4.2.4 Experimental Results

We performed a set of experiments as a proof of concept of our approach. In particular, the quality of the fused image using the proposed method is studied. In the experiments, all sensors have the same parameter setting. The size of the image captured by a sensor is 320 × 240 pixels. The sensing offset angle is π/8, and the angle between the sensing directions of the two sensors is varied from 0 to π. The experiment parameters are summarized in Table 4.1.


Algorithm 10: Image Fusion
1 Fusion(G, G1, G2)
2 /* G is the fused image. G1 and G2 are the images from sensors s1 and s2 */
3 /* Initialize */
4 (x, y) := A1;
5 ρ := f tan α / M;
6 /* map pixels in G1 */
7 for i = 1 to M do
8   /* find the position parameter t in G1 */
9   mapping(x, y, G1, t);
10  /* find the pixel index corresponding to t */
11  i1 := M − t/ρ;
12  for j = 1 to N do
13    /* copy G1(i1, j) into G(i, j) */
14    putpixel(G1, i1, G, i, j);
15  x := x + ρVm,y; y := y − ρVm,x;
16 /* map pixels in G2 */
17 for i = M to 2M do
18   /* find the position parameter t in G2 */
19   mapping(x, y, G2, t);
20   /* find the pixel index corresponding to t */
21   i2 := M − t/ρ;
22   for j = 1 to N do
23     /* copy G2(i2, j) into G(i, j) */
24     putpixel(G2, i2, G, i, j);
25   x := x + ρVm,y;
26   y := y − ρVm,x;
27 /* display the fused image */
28 output(G);
29 end

Table 4.1 Parameter setting

Parameter                   Value
Offset angle (α)            π/8
f                           377 cm
Image size                  320 × 240
Angle between V1 and V2     0–π
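For illustration, a simplified NumPy rendering of the column-copy step of Algorithm 10 is given below (our own sketch; it assumes the per-column source indices have been precomputed, e.g., with pixel_index() above, and that images are grayscale arrays):

```python
import numpy as np

def fuse_images(G1, G2, cols1, cols2):
    """Fuse two partial images (N x 2M arrays) into one N x 2M image.
    cols1[i] is the column of G1 feeding column i of the left half;
    cols2[i] is the column of G2 feeding column M+i of the right half."""
    N = G1.shape[0]
    M = len(cols1)
    G = np.zeros((N, 2 * M), dtype=G1.dtype)
    for i, c in enumerate(cols1):        # left half from sensor s1
        G[:, i] = G1[:, c]
    for i, c in enumerate(cols2):        # right half from sensor s2
        G[:, M + i] = G2[:, c]
    return G
```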

Figure 4.4 shows a typical set of experimental results. Two images taken by two different sensors s1 and s2 for the same scene from different viewpoints (sensor locations) are shown in Fig. 4.4a, b. The angle between their sensing directions (V1 and V2) is set to 5π/18. Figure 4.4d, e represent the partial images actually transmitted by the two sensors. They are used to obtain the fused image depicted in Fig. 4.4f. To measure the quality of the fused image, an actual sensor s3 is deployed at the middle point between s1 and s2 with its sensing direction pointing at the intersection


Fig. 4.4 Experimental results (α = π/8, angle between V1 and V2 is 5π/18)

of V1 and V2. In other words, s3 is the targeted virtual sensor for the fused image. By comparing the fused image in Fig. 4.4f with the image in Fig. 4.4c actually taken by s3, we can see that the quality degrades little, which validates our concept. Indeed, there is a noticeable fusion mark, which is due to the large angle between the sensing directions. In our experiments we notice that the quality of the fused image becomes worse as the angle between the two sensing directions becomes larger. This essentially results from the 2-D model we employed; by employing a 3-D sensing model, this problem can be corrected.

4.3 Dynamic Node Collaboration for Mobile Target Tracking in Multimedia Sensor Networks

4.3.1 Motivation

Target tracking in wireless sensor networks has received much attention in the literature (Gupta et al. 2006), and it is also one of the most important applications of camera sensor networks (Denman et al. 2006; Bramberger et al. 2006). Generally speaking, a target tracking application has the following characteristics:

• The tracking system needs to report the location of the moving target to the users periodically.


• For most types of sensors, one sensor node cannot locate a target accurately by itself. Thus, sensor nodes need to collaborate on estimating the target location during the tracking process.

Compared with traditional wireless sensor networks, target tracking using camera sensor networks has two main advantages which the other types of sensor networks cannot achieve:

• Camera sensors, perhaps the most informative sensing devices, can provide more information about the moving target and the related background than other types of sensors.

• High-level analyses using image processing and object recognition techniques can intelligently track targets, and even determine their activities.

To facilitate collaborative target tracking, a dynamic cluster architecture is usually used. In the dynamic cluster architecture, formation of a cluster is triggered by detection of a target. Once a sensor node detects the target, it becomes a leader node, called the cluster head (CH). The CH selects some other sensor nodes as the cluster members from its neighboring nodes, and updates the estimate of the target location by incorporating its measurement with the cluster members' measurements. After that, the CH relays the control of the tracking to one of its neighboring nodes, a new CH, which is elected based on a certain rule. The new CH takes the role of the previous CH. The above procedure repeats until the target disappears from the surveillance region.

From the above description, we can observe that there are two key problems for collaborative target tracking:

• How to design a cooperative localization algorithm that estimates the target location accurately while satisfying the low-complexity requirement of wireless sensor networks?

• How to form a dynamic cluster during the target tracking process to balance the tradeoff between the quality of target tracking and the network energy consumption? This leads to the following two sub-problems: (1) how to elect the cluster head, and (2) how to select the cluster members.

To implement target tracking using wireless camera sensor networks, we first need to develop a localization-oriented sensing model for camera sensors. This model is based on the perspective projection and needs to take the observation noise into account, and thus it is nonlinear. Based on this sensing model, we then use the sequential Monte Carlo (SMC) technique to estimate the target location during the tracking process. In order to implement the SMC based tracking procedure efficiently, we propose a dynamic node collaboration scheme for wireless camera sensor networks, which can balance the tradeoff between the quality of tracking and the network cost. Our scheme deploys the dynamic cluster architecture, which mainly includes the following two components. First, we design a scheme to elect the cluster heads during the tracking process. Second, we develop an


optimization-based algorithm to select an optimal subset of camera sensors as the cluster members for estimating the target location cooperatively.

4.3.2 Related Works

The existing works on target tracking in wireless sensor networks can be roughly divided into two categories. The first category mainly focuses on the signal processing aspects of target localization and tracking. Li et al. (2002) proposed a framework for collaborative signal processing in distributed sensor networks, and applied minimum square estimation to locate the target during the tracking process. Liu et al. (2003) estimated the target location by incorporating the current measurement at a sensor with the past history at other sensors. Brooks et al. (2003) and Moore et al. (2003) presented location-centric CSP (Collaborative Signal Processing) approaches for target tracking in wireless sensor networks; in their approaches, a selected region, instead of an individual sensor node, is activated. However, since the above works focus mainly on signal processing, it is unclear how their methods can be efficiently implemented in wireless sensor networks.

The other category mainly involves algorithms and network protocols that enable collaborative information processing among multiple sensor nodes. Zhang and Cao (2004a) and Zhang and Cao (2004b) designed a series of cluster-based approaches for target tracking: they proposed to dynamically construct a tree-structured cluster for the target, and then presented optimized tree reconfiguration for target tracking networks. Song and Hatzinakos (2007) proposed a low energy self-organizing protocol (LESOP) for target tracking in dense wireless sensor networks from the perspective of cross-layer design; for high protocol efficiency, direct interactions between the application layer and the MAC layer were exploited in their protocol. Gui and Mohapatra (2004) proposed a power conservation protocol and a sensor deployment scheme for target tracking.

There are also other works addressing the target tracking problem by taking both signal processing and network protocols into account. Zhao et al. (2002) and Chu et al. (2002) proposed a tracking scheme called IDSQ (Information Driven Sensor Querying), where a leader sensor node intelligently selects the best neighbor node to perform sensing and serve as the next leader; a cost function jointly considering the energy expenditure and the information gain was employed. Based on a similar idea, Guo and Wang (2004) applied the Bayesian sequential Monte Carlo method to solve the problems of optimal sensor selection and information fusion for target tracking. However, all the above works and related techniques are based on the omnidirectional sensing model, and thus cannot be applied to camera sensor networks directly.


4.3.3 System Models and Description

The state-space model is one of the most commonly used models for target tracking (Arulampalam et al. 2002). Let N be the set of natural numbers. The evolution of the state sequence {xt, t ∈ N} of a target is given by the following state equation:

$$\mathbf{x}_t = F_t(\mathbf{x}_{t-1}, \mathbf{v}_t), \tag{4.7}$$

where Ft(·, ·) is a function of the target state, and {vt, t ∈ N} is an i.i.d. process noise sequence. The objective of tracking is to recursively estimate xt from the measurement zt. Consider the measurement model given by

$$\mathbf{z}_t = H_t(\mathbf{x}_t, \mathbf{w}_t), \tag{4.8}$$

where Ht(·, ·) is a nonlinear function, and {wt, t ∈ N} is an i.i.d. measurement noise sequence. In particular, we seek the filtered estimates of xt based on the set of all available measurements z1:t ≜ {zi, i = 1, ..., t} up to time t.

In this section, we derive the motion model of the target and the sensing model of camera sensors by applying Eqs. (4.7) and (4.8), respectively. Then, we implement the target tracking by using the sequential Monte Carlo method. Finally, we propose a dynamic node collaboration scheme.

4.3.3.1 Motion Model of the Target

The target motion model describes the evolution of the target state with respect to time. In most motion models, the target is usually treated as a point object without a shape. In this section, we choose the commonly used nearly constant velocity model (Guo and Wang 2004; Hue et al. 2002) as the target motion model. The state vector xt represents the coordinates, (xt, yt), and the velocities, (vtx, vty), in the x-y plane: xt ≜ {xt, yt, vtx, vty}. Let Δt be the interval between time t − 1 and t. Then, the discretized state equation of Eq. (4.7) takes the following form:

$$\mathbf{x}_t = \begin{pmatrix} I_{2\times 2} & \Delta t\, I_{2\times 2} \\ 0 & I_{2\times 2} \end{pmatrix} \mathbf{x}_{t-1} + \begin{pmatrix} \frac{\Delta t^2}{2}\, I_{2\times 2} \\ \Delta t\, I_{2\times 2} \end{pmatrix} \mathbf{v}_t, \tag{4.9}$$

where I2×2 is the identity matrix in dimension 2, and vt is a zero-mean Gaussian vector with covariance matrix Σv = diag(σx², σy²).
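A minimal simulation sketch of Eq. (4.9) follows (our own illustrative code; rng is a NumPy random generator, e.g. np.random.default_rng()):

```python
import numpy as np

def step_motion(x_prev, dt, sigma_x, sigma_y, rng):
    """Propagate the state [x, y, vx, vy] one interval of length dt
    under the nearly-constant-velocity model of Eq. (4.9)."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    G = np.array([[dt**2 / 2, 0],
                  [0, dt**2 / 2],
                  [dt, 0],
                  [0, dt]], dtype=float)
    v = rng.normal(0.0, [sigma_x, sigma_y])   # zero-mean Gaussian noise v_t
    return F @ x_prev + G @ v
```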


Fig. 4.5 (a) Schematic of target localization in a camera sensor network. There are several camera sensors deployed in a surveillance region, and a target is in the center of this region. (b) The image captured by the camera sensor indicated by the dotted ellipse. The distance, Z, from the vertical centerline of the target blob to the centerline of the image is the observation measurement by this camera sensor for target localization

4.3.3.2 Sensing Model of Camera Sensors

As shown in Fig. 4.5a, given a number of randomly deployed camera sensors, we first analyze how to use camera sensors to locate a target, and then build a localization-oriented sensing model for camera sensors. For simplicity, we assume that all camera sensors are modeled by the perspective projection and have the same FOV (field-of-view) region.

Identifying moving objects from a set of pictures or a video sequence is a fundamental task in most applications of wireless camera sensor networks. When a camera sensor captures a frame, it can employ background subtraction (Piccardi 2004; Kim et al. 2005) to remove the static background. Background subtraction is a commonly used technique for segmenting out objects of interest in a scene for applications such as video surveillance. As shown in Fig. 4.5b, the area of an image frame where there is a significant difference between the observed and estimated images indicates the location of a moving object in the image plane. The area containing the change in the frame is further processed to find the horizontal shift, denoted by Z, of the target's image from the center of the image plane. During the tracking process, Z is the measurement of the camera sensor, and only Z is communicated to the fusion center.
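A toy frame-differencing sketch of this measurement extraction is shown below (our own code; real deployments would use a more robust background model than a single reference frame):

```python
import numpy as np

def horizontal_shift(frame, background, thresh=30):
    """Return the shift Z (in pixels) of the foreground blob centroid
    from the image centerline, or None if no foreground is detected."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    mask = diff > thresh                     # foreground pixels
    cols = np.nonzero(mask.any(axis=0))[0]   # columns containing foreground
    if cols.size == 0:
        return None
    centroid_col = cols.mean()
    center_col = frame.shape[1] / 2.0
    return centroid_col - center_col
```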


Fig. 4.6 Perspective projection model

Table 4.2 Parameters of perspective projection

Parameter      Description
T(xt, yt)      Location of the target in the ground plane at time t
Li(xi, yi)     Location of camera sensor ci in the ground plane
Zi(T)          Ideal horizontal shift of the target in the image plane
θi             Orientation angle of ci
F              Focal length of the camera sensor

Let T(xt, yt) be the location of the target at time t. For a given camera sensor ci, we can get the theoretical horizontal shift, denoted by Zi(T), of the target's image by using the perspective projection model. As shown in Fig. 4.6, the relationship between Zi(T) and T is

$$Z_i(\mathbf{T}) = F \cdot \tan\!\left(\theta_i - \arctan\frac{y_t - y_i}{x_t - x_i}\right) = F \cdot \frac{x_t \tan\theta_i - x_i \tan\theta_i - y_t + y_i}{x_t - x_i + y_t \tan\theta_i - y_i \tan\theta_i}. \tag{4.10}$$

The descriptions of the parameters in Eq. (4.10) are summarized in Table 4.2. When the distance between T and Li becomes too large, background subtraction cannot segment out the objects of interest. This implies that the camera sensor ci cannot detect the target at T. Let r be the maximum detecting distance. Because r ≫ F, we employ a sector model to describe the sensing region of a camera sensor. Here, we use Di to denote the sensing region of ci. If a point belongs to Di, then the point can be detected by ci. As shown in Chapter 2, the sector model can be denoted by a 4-tuple ⟨Li, r, Vi, α⟩, where Vi is the unit vector that evenly splits the sensing sector into two halves, determining the sensing direction, and α is the offset angle of the field of view on both sides of Vi.


However, the perspective projection model used in Eq. (4.10) is an ideal one. In practice, the measurement Zti of ci at time t is corrupted by additive noise. This noise mainly comes from two sources: the sensing model of the camera sensors and the processing of background subtraction. Referring to Ercan et al. (2006), we assume that the measurement error variance, denoted by σi², for ci is of the following form:

$$\sigma_i^2 = \zeta d_i^2 + \sigma_p^2 + \sigma_s^2. \tag{4.11}$$

In Eq. (4.11), di is the distance from ci to the target. Making the camera noise variance dependent on distance can efficiently model the weak perspective projection while allowing the usage of the projective model in Eq. (4.10). Our noise model takes the errors in the calibration of camera sensors into account: errors in the location of ci are contained in σp², and errors in the orientation are measured by ζ. Moreover, the accuracy of the background subtraction method and the posture/motion of the target also cause errors, and these errors are reflected in σs². Therefore, we adopt the Gaussian error model to represent the relationship between the measurement Zti of ci and the location of the target, T. The conditional probability density function of Zti given T is

$$p(Z_t^i \mid \mathbf{T}) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(Z_t^i - Z_i(\mathbf{T}))^2}{2\sigma_i^2}}. \tag{4.12}$$
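The following sketch (our own code, using atan2 in place of the arctan of a ratio for quadrant safety) evaluates the ideal shift of Eq. (4.10), the noise variance of Eq. (4.11), and the per-camera likelihood of Eq. (4.12):

```python
import math

def ideal_shift(F, theta_i, xi, yi, xt, yt):
    """Z_i(T) of Eq. (4.10) for camera c_i observing target T = (xt, yt)."""
    return F * math.tan(theta_i - math.atan2(yt - yi, xt - xi))

def noise_var(zeta, d_i, sigma_p, sigma_s):
    """sigma_i^2 of Eq. (4.11)."""
    return zeta * d_i**2 + sigma_p**2 + sigma_s**2

def likelihood(z_meas, F, theta_i, xi, yi, xt, yt, zeta, sigma_p, sigma_s):
    """p(Z_t^i | T) of Eq. (4.12); the joint likelihood of Eq. (4.13)
    is the product of this term over all k detecting cameras."""
    d = math.hypot(xt - xi, yt - yi)
    var = noise_var(zeta, d, sigma_p, sigma_s)
    z_ideal = ideal_shift(F, theta_i, xi, yi, xt, yt)
    return math.exp(-(z_meas - z_ideal) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```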

If T can be detected by k camera sensors simultaneously, then k measurements are available. Let zt denote the set of measurements at time t, i.e., zt ≜ {Zti}, i = 1, ..., k. According to Eq. (4.10) and the noise model of camera sensors, we have, for all i ∈ {1, ..., k}, Zti = Zi(T) + ei, where ei is the additive noise of Zti and ei ∼ N(0, σi). Then,

$$p(\mathbf{z}_t \mid \mathbf{x}_t) = \prod_{i=1}^{k} p(Z_t^i \mid \mathbf{T}) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(Z_t^i - Z_i(\mathbf{T}))^2}{2\sigma_i^2}}. \tag{4.13}$$

4.3.3.3 Target Tracking by Sequential Monte Carlo Method

The target tracking problem consists of (1) calculating the conditional density of the state xt at time t, given all the measurements accumulated up to t, i.e., p(xt|z1:t), and (2) estimating the location of the target by the expectation E[(xt, yt)|z1:t]. Thus, it is required to construct the pdf p(xt|z1:t). In principle, the pdf p(xt|z1:t) can be obtained recursively in two stages (Arulampalam et al. 2002): prediction and update.


Suppose that the required pdf p(xt−1|z1:t−1) at time t − 1 is available. The prediction is done according to the following equation:

$$p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}) = \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1} \mid \mathbf{z}_{1:t-1})\, d\mathbf{x}_{t-1}. \tag{4.14}$$

At time t, the measurement zt is available. Then, we can use zt to update the prediction via Bayes' rule:

$$p(\mathbf{x}_t \mid \mathbf{z}_{1:t}) = \frac{p(\mathbf{z}_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1})}{\int p(\mathbf{z}_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1})\, d\mathbf{x}_t}. \tag{4.15}$$

This recursive propagation of p(xt|z1:t) is only a conceptual solution. If Eqs. (4.7) and (4.8) satisfy the assumptions of Gaussian noises vt and wt and linear functions Ft(·, ·) and Ht(·, ·), then Eqs. (4.14) and (4.15) lead to the Kalman filter equations. However, from the sensing model of camera sensors, Ht(·, ·) is nonlinear. Thus, p(xt|z1:t) cannot be determined analytically in wireless camera sensor networks.

The sequential importance sampling (SIS) based sequential Monte Carlo (SMC) method is a technique that implements a recursive Bayesian filter by Monte Carlo simulations. The key idea is to represent the required posterior density function by a set of random samples with associated weights and to compute estimates based on these samples and weights. Let {xt(i), wt(i)}, i = 1, ..., Ns, denote a particle set, where xt(i) is a sample of xt with associated weight wt(i). The weights are normalized such that the wt(i) sum to 1. Then, the posterior density p(xt|z1:t) can be approximated as

$$p(\mathbf{x}_t \mid \mathbf{z}_{1:t}) \approx \sum_{i=1}^{N_s} w_t^{(i)}\, \delta(\mathbf{x}_t - \mathbf{x}_t^{(i)}), \tag{4.16}$$

where δ(·) is the Dirac delta measure. Let ∝ denote the proportionality relation. The weight update equation is

$$w_t^{(i)} \propto w_{t-1}^{(i)}\, \frac{p(\mathbf{z}_t \mid \mathbf{x}_t^{(i)})\, p(\mathbf{x}_t^{(i)} \mid \mathbf{x}_{t-1}^{(i)})}{q(\mathbf{x}_t^{(i)} \mid \mathbf{x}_{t-1}^{(i)}, \mathbf{z}_t)}, \tag{4.17}$$

where xt(i) is generated from a proposal importance density q(xt | xt−1(i), zt), i.e., xt(i) ∼ q(xt | xt−1(i), zt), i = 1, ..., Ns.

To evaluate the degeneracy of the particle set, a resampling step is performed in an adaptive way. That is, when the effective number of particles, N̂eff, is less than a given threshold, Nthr, the Ns samples are resampled from the current particle set


with probabilities proportional to their weights. The resampling step replaces the current particle set with the new one and sets wt(i) = 1/Ns for i = 1, ..., Ns. The procedure of SMC based target tracking is summarized in Algorithm 11; more details can be found in the work of Arulampalam et al. (2002).

Algorithm 11: SMC based target tracking algorithm
1 Initialization: x0(i) ∼ p(x0), w0(i) = 1/Ns for i = 1, ..., Ns
2 for t = 1 to T do
3   for i = 1 to Ns do
4     Draw xt(i) ∼ q(xt | xt−1(i), zt)
5     wt(i) = wt−1(i) · p(zt | xt(i)) p(xt(i) | xt−1(i)) / q(xt(i) | xt−1(i), zt)   // wt(i) is un-normalized
6   Calculate the total weight Wt = Σi wt(i)
7   for i = 1 to Ns do
8     wt(i) = wt(i)/Wt   // Normalize the weight
9   (x̂t, ŷt) = (Σi wt(i) xt(i), Σi wt(i) yt(i))
10  Calculate N̂eff = 1 / Σi (wt(i))²
11  if N̂eff < Nthr then
12    // Resampling
13    for i = 1 to Ns do
14      Draw xt(i) ∼ Σj wt(j) δ(xt − xt(j))
15      wt(i) = 1/Ns
16 return (x̂t, ŷt)
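A compact NumPy sketch of one step of Algorithm 11 is given below (our own illustrative code, not the authors' implementation). It uses the prior p(xt | xt−1) as the proposal density, so the weight update of Eq. (4.17) reduces to wt ∝ wt−1 · p(zt | xt); step_motion() and likelihood() are the sketches given earlier, and cameras/params are hypothetical containers for the calibration data:

```python
import numpy as np

def smc_step(particles, weights, z_t, cameras, dt, sigma_x, sigma_y,
             params, rng, n_thr):
    """One SMC iteration: particles is an (Ns, 4) array, weights an (Ns,)
    array; z_t holds the measurements of the detecting cameras."""
    Ns = len(weights)
    # Draw x_t^(i) ~ p(x_t | x_{t-1}^(i)) and update the weights.
    for i in range(Ns):
        particles[i] = step_motion(particles[i], dt, sigma_x, sigma_y, rng)
        xt, yt = particles[i][0], particles[i][1]
        for z_meas, cam in zip(z_t, cameras):
            weights[i] *= likelihood(z_meas, params['F'], cam['theta'],
                                     cam['x'], cam['y'], xt, yt,
                                     params['zeta'], params['sigma_p'],
                                     params['sigma_s'])
    weights /= weights.sum()                      # normalize
    estimate = (weights[:, None] * particles[:, :2]).sum(axis=0)
    # Resample when the effective sample size drops below the threshold.
    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < n_thr:
        idx = rng.choice(Ns, size=Ns, p=weights)
        particles[:] = particles[idx]
        weights[:] = 1.0 / Ns
    return estimate
```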

4.3.3.4 The Dynamic Node Collaboration Scheme

As shown in Fig. 4.7, there is a set of N geographically distributed camera sensors C = {c1, c2, ..., cN} in a surveillance region S. A target (the car in Fig. 4.7) traverses the surveillance region. Next, we describe the procedure of dynamic clustering at time t to illustrate our dynamic node collaboration scheme.

Assume that CHt−1 is the cluster head at time t − 1 and that p(xt−1|z1:t−1) is available. By using the motion model of the target and p(xt−1|z1:t−1), CHt−1 can obtain the prior pdf of the target state at time t, p(xt|z1:t−1). According to p(xt|z1:t−1) and the sensing region of a given camera sensor, it is easy to estimate the probability that this camera sensor detects the target at time t. When CHt−1 becomes a cluster head, it collects the locations and orientation angles of the camera sensors in its communication range (a detailed description of this collection procedure is given in the next paragraph). This implies that CHt−1 knows the sensing region of each camera sensor in its communication range. Then, for a given camera sensor cm in the communication range of CHt−1, CHt−1 can estimate the probability that the target can be detected by cm at time t. The camera sensor with


Fig. 4.7 Illustration of target tracking in wireless camera sensor networks

the highest probability is elected as the cluster head at time t, denoted by CHt. CHt−1 relays the needed tracking information to CHt. After receiving the tracking information from CHt−1, CHt broadcasts a message, QUERY, to the camera sensors in its communication range. After receiving QUERY, the camera sensors send their locations and orientation angles back to CHt. In general, the communication radius of sensor nodes, denoted by rc, is assumed to be two times larger than the sensing radius r. We also follow this assumption in wireless camera sensor networks.¹ This implies that if a camera sensor cj can detect the target, then the other camera sensors which can detect the target must be in the communication range of cj. After receiving the locations and orientation angles, for a given camera sensor cm in its communication range, CHt can estimate the probability that the target can be detected by cm at time t. If this probability exceeds a predefined threshold, i.e., cm can detect the target with a high probability at time t, then cm becomes a candidate for the tracking process. From all the candidates, CHt properly selects a set of camera sensors as the cluster members, which provide the measurements at time t. Then, CHt takes the role of the fusion center, and estimates p(xt|z1:t) by incorporating p(xt|z1:t−1) with the

¹ If the distance between the target and the camera sensor is too large, the background subtraction technique cannot segment out the target even though the target can be captured by the camera sensor. In our experiments, the maximum distance is about 30 m. Therefore, the assumption rc = 2r is reasonable.


measurements from the cluster members. The above procedure is repeated until the target disappears from the surveillance region.

4.3.4 Election of the Cluster Heads

First, we need to obtain the pdf p(xt|z1:t−1) by using the following lemma.

Lemma 4.1 Let {xt(i), wt−1(i)}, i = 1, ..., Ns, be a particle set. The weighted approximation to the pdf p(xt|z1:t−1) is given by

$$p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}) \approx \sum_{i=1}^{N_s} w_{t-1}^{(i)}\, \delta(\mathbf{x}_t - \mathbf{x}_t^{(i)}), \tag{4.18}$$

where xt(i) ∼ p(xt | xt−1(i)).

Proof From the importance sampling principle, p(x0:t|z1:t−1) can be approximated as

$$p(\mathbf{x}_{0:t} \mid \mathbf{z}_{1:t-1}) \approx \sum_{i=1}^{N_s} w_{t|t-1}^{(i)}\, \delta(\mathbf{x}_{0:t} - \mathbf{x}_{0:t}^{(i)}), \tag{4.19}$$

where x0:t(i) ∼ q(x0:t|z1:t−1) and

$$w_{t|t-1}^{(i)} \propto \frac{p(\mathbf{x}_{0:t}^{(i)} \mid \mathbf{z}_{1:t-1})}{q(\mathbf{x}_{0:t}^{(i)} \mid \mathbf{z}_{1:t-1})}. \tag{4.20}$$

Because

$$q(\mathbf{x}_{0:t} \mid \mathbf{z}_{1:t-1}) = q(\mathbf{x}_t \mid \mathbf{x}_{0:t-1})\, q(\mathbf{x}_{0:t-1} \mid \mathbf{z}_{1:t-1}), \tag{4.21}$$

we can obtain the sample x0:t(i) ∼ q(x0:t|z1:t−1) by augmenting the existing sample x0:t−1(i) ∼ q(x0:t−1|z1:t−1) with the new sample xt(i) ∼ q(xt|xt−1(i)). Furthermore, p(x0:t|z1:t−1) can also be factorized in the following form:

$$p(\mathbf{x}_{0:t} \mid \mathbf{z}_{1:t-1}) = p(\mathbf{x}_t \mid \mathbf{x}_{0:t-1})\, p(\mathbf{x}_{0:t-1} \mid \mathbf{z}_{1:t-1}). \tag{4.22}$$

Substituting Eqs. (4.21) and (4.22) into Eq. (4.20), the weight update equation is then

$$w_{t|t-1}^{(i)} \propto \frac{p(\mathbf{x}_t^{(i)} \mid \mathbf{x}_{0:t-1}^{(i)})\, p(\mathbf{x}_{0:t-1}^{(i)} \mid \mathbf{z}_{1:t-1})}{q(\mathbf{x}_t^{(i)} \mid \mathbf{x}_{0:t-1}^{(i)})\, q(\mathbf{x}_{0:t-1}^{(i)} \mid \mathbf{z}_{1:t-1})} \overset{(a)}{\propto} w_{t-1}^{(i)}\, \frac{p(\mathbf{x}_t^{(i)} \mid \mathbf{x}_{0:t-1}^{(i)})}{q(\mathbf{x}_t^{(i)} \mid \mathbf{x}_{0:t-1}^{(i)})} \overset{(b)}{=} w_{t-1}^{(i)}\, \frac{p(\mathbf{x}_t^{(i)} \mid \mathbf{x}_{t-1}^{(i)})}{q(\mathbf{x}_t^{(i)} \mid \mathbf{x}_{t-1}^{(i)})}, \tag{4.23}$$


where (a) is obtained directly from the equation

$$w_{t-1}^{(i)} \propto \frac{p(\mathbf{x}_{0:t-1}^{(i)} \mid \mathbf{z}_{1:t-1})}{q(\mathbf{x}_{0:t-1}^{(i)} \mid \mathbf{z}_{1:t-1})},$$

and (b) is obtained by considering that {xt, t ∈ N} is a first-order Markov process (see Eq. (4.9)), i.e., q(xt|x0:t−1) and p(xt|x0:t−1) equal q(xt|xt−1) and p(xt|xt−1), respectively. This implies that the importance density becomes dependent only on xt−1.

Let q(xt|xt−1(i)) = p(xt|xt−1(i)). From Eq. (4.23), we can obtain that the normalized wt|t−1 is equal to wt−1. Therefore, p(xt|z1:t−1) can be approximated as in Eq. (4.18).

For a given camera sensor cm in the communication range of CHt−1, the probability, denoted by P(cm), that cm can detect the target is

$$P(c_m) = \iint_{D_m} p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1})\, dx_t\, dy_t \approx \sum_{i=1}^{N_s} w_{t-1}^{(i)}\, \varphi(\mathbf{x}_t^{(i)}, D_m), \tag{4.24}$$

where Dm is the sensing region of cm and

$$\varphi(\mathbf{x}_t^{(i)}, D_m) = \begin{cases} 1, & \text{if } (x_t^{(i)}, y_t^{(i)}) \in D_m; \\ 0, & \text{otherwise.} \end{cases} \tag{4.25}$$
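In code, Eqs. (4.24)–(4.25) amount to summing the weights of the particles that fall inside a camera's sensing sector. A sketch (our own; in_sector() stands for the sector-membership test of the sensing model, cf. the VideoSensor.covers() sketch in Sect. 4.2.2):

```python
def detect_probability(particles, weights, cam):
    """P(c_m) of Eq. (4.24): total weight of particles inside D_m."""
    p = 0.0
    for (x, y, _, _), w in zip(particles, weights):
        if in_sector(cam, x, y):   # indicator phi of Eq. (4.25), assumed given
            p += w
    return p
```

Cluster head election, Eq. (4.26) below, then reduces to taking the argmax of this probability over the cameras in communication range.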

Let Cht be the set of camera sensors in the communication range of CHt, i.e., Cht ≜ {cm | dis(cm, CHt) < rc, 0 < m ≤ N}, where dis(cm, CHt) is the distance between cm and CHt. Then, the cluster head election at time t − 1 can be expressed as:

$$CH_t = \arg\max_{c_m \in Ch_{t-1}} P(c_m). \tag{4.26}$$

The cluster head election algorithm is described in Algorithm 12.

4.3.5 Selection of the Cluster Members

After election of the cluster head, the new CH needs to select some camera sensors as the cluster members to participate in the target tracking process. The CH first needs to obtain the set, denoted by Cc, of candidate camera sensors which can detect the target with high probability.


Algorithm 12: Cluster head election algorithm
1 Initialization: Ma = 0 // Ma keeps the maximum P(cm)
2 CHt = null
3 for i = 1 to Ns do
4   // Generate Ns particles.
5   Draw xt(i) ∼ p(xt | xt−1(i))
6 for m = 1 to |Cht| do
7   for i = 1 to Ns do
8     if (xt(i), yt(i)) ∈ Dm then
9       P(cm) = P(cm) + wt−1(i) // wt−1(i) is the weight of xt(i)
10  if P(cm) > Ma then
11    // Find the maximum P(cm).
12    Ma = P(cm) and CHt = cm
13 return CHt

Let CHt be the cluster head at time t; the corresponding Cc of CHt is then

$$C_c = \{c_m \mid P(c_m) > \xi,\ \mathrm{dis}(CH_t, c_m) < r_c,\ 0 < m \le N\}, \tag{4.27}$$

where ξ is a predefined threshold and P(cm) can be obtained directly from Eqs. (4.24) and (4.25). Please note that P(cm) > ξ only implies that cm can detect the target with a high probability: it is possible that a few candidate camera sensors in Cc cannot detect the target, and that a few camera sensors which can detect the target are not in Cc.

Ideally, we could obtain the maximum information gain by having the CH merge the measurements from all camera sensors in Cc, but this would be too costly. Our goal is to select the optimal set of camera sensors from Cc to obtain a precise estimate of the target location while minimizing the energy cost. In general, there exist two different criteria to define the optimal selection problem:

1. Maximum utility: maximize the accuracy of localization under a specified cost;
2. Minimum cost: minimize the cost so as to attain a specified accuracy of localization.

Due to the resource constraints of wireless camera sensor networks, energy saving is one of the most important problems to be considered. Motivated by this, we study the selection of locating camera sensors by using the second criterion, minimum cost.

Let 2^Cc denote the set of all subsets of Cc. In order to model the tradeoff between utility and cost, we need to define the following functions:

1. A utility function U: 2^Cc → R+, which quantifies the localization utility of measurements obtained by each C′c ⊆ Cc.


2. A cost function C: 2^Cc → R+, which quantifies the energy cost of taking measurements from each C′c ⊆ Cc.

Then, we can formulate the optimal selection problem as follows.

CMS (Cluster Member Selection). Choose a subset C′c ⊆ Cc which minimizes C(C′c) subject to U(C′c) ≥ ϑ, where ϑ is the predefined threshold for localization accuracy. Thus, CMS can also be expressed as:

$$\mathrm{CMS}(C_c, \vartheta) = \arg\min_{C'_c \subseteq C_c,\ \mathrm{U}(C'_c) \ge \vartheta} \mathrm{C}(C'_c). \tag{4.28}$$

4.3.5.1 Utility Function

Let z′t be a set of measurements from C′c ⊆ Cc at time t, i.e., z′t ≜ {Zti | ∀ci ∈ C′c}. The utility of C′c can be defined as the uncertainty of the target state reduced by the new measurements z′t. Continuous entropy and mutual information are applied for quantifying the information gain (or uncertainty reduction) of target localization in Wang et al. (2004). We also use mutual information to define the utility function. Then, according to the formula of continuous entropy, we have

$$\mathrm{U}(C'_c) \triangleq h[p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1})] - h[p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}, \mathbf{z}'_t)] = -\int p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}) \log p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1})\, d\mathbf{x}_t + \int p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}, \mathbf{z}'_t) \log p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}, \mathbf{z}'_t)\, d\mathbf{x}_t, \tag{4.29}$$

where h[·] is the continuous entropy function. However, Eq. (4.29) is difficult to compute in practice since we would need to have the measurements before deciding how useful they are. A more practical alternative is to use the expected posterior distribution instead of the true posterior distribution to compute the entropy. That is, we predict the new state (a posterior distribution) as if the expected measurements of C′c from the current state were incorporated. From Eq. (4.15), we can obtain that p(xt|z1:t−1, z′t) ∝ p(z′t|xt) p(xt|z1:t−1). Then, the expected posterior pdf can be defined as follows:

$$\hat{p}(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}, \bar{\mathbf{z}}'_t) \triangleq p(\bar{\mathbf{z}}'_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}), \tag{4.30}$$

where $\bar{\mathbf{z}}'_t \triangleq \int \bar{\mathbf{Z}}(\mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1})\, d\mathbf{x}_t$ is the expected measurement of C′c and $\bar{\mathbf{Z}}(\mathbf{x}_t) \triangleq \{Z_i(\mathbf{x}_t) \mid \forall c_i \in C'_c\}$.


Then, h[p(xt|z1:t−1, z̄′t)] computed using p̂(xt|z1:t−1, z̄′t) serves as an approximation to the entropy of p(xt|z1:t−1, z′t). Based on the importance sampling principle, p̂(xt|z1:t−1, z̄′t) can be represented by the set of particles {xt(i), ŵt(i)}, i = 1, ..., Ns, i.e.,

$$\hat{p}(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}, \bar{\mathbf{z}}'_t) \approx \sum_{i=1}^{N_s} \hat{w}_t^{(i)}\, \delta(\mathbf{x}_t - \mathbf{x}_t^{(i)}). \tag{4.31}$$

According to Eq. (4.30), the weight update equation is then

$$\hat{w}_t^{(i)} \propto p(\bar{\mathbf{z}}'_t \mid \mathbf{x}_t^{(i)})\, w_{t|t-1}^{(i)} \overset{(a)}{=} p(\bar{\mathbf{z}}'_t \mid \mathbf{x}_t^{(i)})\, w_{t-1}^{(i)}, \tag{4.32}$$

where (a) can be obtained directly from wt|t−1(i) = wt−1(i). From the definition of z̄′t, the importance sampling principle, and Eq. (4.18), we can also get an approximation of z̄′t, i.e.,

$$\bar{\mathbf{z}}'_t \approx \sum_{j=1}^{N_s} \bar{\mathbf{Z}}(\mathbf{x}_t^{(j)})\, w_{t-1}^{(j)}. \tag{4.33}$$

Therefore, we can obtain the approximation of the utility function as follows:

$$\mathrm{U}(C'_c) \approx \sum_{i=1}^{N_s} \hat{w}_t^{(i)} \log \hat{w}_t^{(i)} - \sum_{i=1}^{N_s} w_{t-1}^{(i)} \log w_{t-1}^{(i)}. \tag{4.34}$$

4.3.5.2 Cost Function

The energy cost of the tracking procedure for a cluster member can be partitioned into two main parts: (1) the energy cost for capturing and processing images, denoted by ep; and (2) the energy cost for transmitting the measurement Z, denoted by et. In this section, we assume all the camera sensors have the same energy cost for image capturing and processing, and for measurement transmitting. For a wireless sensor network, the failure of several sensor nodes can affect the whole network topology. So, energy saving requires not only minimizing the total cost of the sensor network, but also homogenizing the cost across the sensor nodes. Let em be the remaining energy of a given camera sensor cm. We define the cost of cm as the ratio between the energy consumption of localization and the remaining energy, i.e.,

$$\mathrm{C}(c_m) = \begin{cases} \dfrac{e_p + e_t}{e_m}, & \text{if } c_m \text{ is not the CH;} \\[6pt] \dfrac{e_p}{e_m}, & \text{if } c_m \text{ is the CH.} \end{cases} \tag{4.35}$$


For a set of camera sensors C′c, the cost function of the set is the maximum cost in the set, i.e.,

$$\mathrm{C}(C'_c) = \max_{c_m \in C'_c} \mathrm{C}(c_m). \tag{4.36}$$

4.3.5.3 The Cluster Members Selection Algorithm

Algorithm 13: Cluster members selection algorithm
1 Initialize the Qc array to null, id = 0
2 if |Cc| > 1 then
3   // There exist more than one candidate camera sensor.
4   Sort Cc in ascending order of cost value and store it into Qc
5   id = 1 // Move the index to the second element of Qc.
6   while Qc[id] is not null do
7     if |Qc[id]| > 1 then
8       if U(Qc[id]) > ϑ then
9         return Qc[id] // Qc[id] is the optimal set of camera sensors.
10    else
11      // Qc[id] is a set consisting of one camera sensor.
12      for i = 0 to id − 1 do
13        Insert Qc[i] ∪ Qc[id] into Qc
14    id = id + 1

If there exist more than one candidate camera sensor, then we sort these candidates by their cost values and generate an ascending queue Qc. Every element in Qc is a subset of Cc. Let id be an index pointing at the elements of Qc, and Qc[id] be the id-th element of Qc. The initial value of id is 0, i.e., id points at the head of Qc. The head of Qc consists of only the camera sensor with the minimum cost value. It is impossible to obtain the target's location by using only one measurement, i.e., U(Qc[0]) cannot satisfy the requirement. Then, id points at the next element of Qc. Because Qc[1] is also a set consisting of only one camera sensor, U(Qc[1]) cannot satisfy the requirement either. From Eq. (4.35), C(Qc[0] ∪ Qc[1]) = C(Qc[1]) ≤ C(Qc[2]). Then, we insert Qc[0] ∪ Qc[1] into Qc after Qc[1], i.e., Qc[2] = Qc[0] ∪ Qc[1], and move the index id to the next element Qc[2]. If U(Qc[2]) is larger than the predefined threshold ϑ, then Qc[2] is the optimal set we want; otherwise, id moves to the next element of Qc. For the element Qc[id], if Qc[id] is a set which has only one camera sensor, then insert Qc[id] ∪ Qc[0], Qc[id] ∪ Qc[1], ..., Qc[id] ∪ Qc[id−1] into Qc after Qc[id], and move id to the next element. On the other hand, if Qc[id] is a set with at least two camera sensors and U(Qc[id]) is larger than ϑ, then Qc[id] is the optimal set we look for; otherwise, id moves to the next element of Qc. The pseudocode of the CMS algorithm is shown as Algorithm 13.
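The queue manipulation above can be sketched in Python as follows (our own illustrative code: utility is a callable implementing Eq. (4.34) on a subset, and cost a callable implementing the per-camera cost of Eq. (4.35); by Eq. (4.36), the cost of each inserted union equals the cost of its most expensive, i.e. latest, singleton):

```python
def select_cluster_members(candidates, utility, cost, theta):
    """Queue-based CMS search in the spirit of Algorithm 13.
    Returns a minimum-cost subset with utility above theta, or None."""
    if len(candidates) < 2:
        return None
    # Seed the queue with singletons in ascending order of cost.
    queue = [frozenset([c]) for c in sorted(candidates, key=cost)]
    idx = 1                        # a single measurement can never suffice
    while idx < len(queue):
        subset = queue[idx]
        if len(subset) > 1:
            if utility(subset) > theta:
                return subset      # first qualifying set found
        else:
            # Insert unions of this singleton with every earlier subset
            # right after the current position, as in Algorithm 13.
            for j in range(idx):
                queue.insert(idx + 1 + j, queue[j] | subset)
        idx += 1
    return None
```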


Fig. 4.8 (a) Deployment of 350 camera sensor nodes in 500×500 surveillance area. (b) Actual and estimated target trajectories (“” denotes the actual target location and “+” denotes the estimated target location)

is the optimal set we look for. Otherwise, id moves to the next element of Qc. The pseudocode of the CMS algorithm is shown in Algorithm 13.
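As an illustration, the following Python sketch mirrors Algorithm 13. The callables utility and cost are hypothetical stand-ins for U(·) of Eq. (4.34) and the per-sensor cost C(·) of Eq. (4.35); they are not part of the original system.

    def select_cluster_members(candidates, utility, cost, theta):
        # candidates: the set Cc of camera sensors that detect the target.
        if len(candidates) <= 1:
            return None  # a single measurement cannot localize the target
        # Qc: queue of subsets, seeded with singletons in ascending cost order.
        Qc = [frozenset([c]) for c in sorted(candidates, key=cost)]
        idx = 1  # Qc[0] alone can never meet the utility requirement
        while idx < len(Qc):
            subset = Qc[idx]
            if len(subset) > 1 and utility(subset) > theta:
                return subset  # lowest-cost subset meeting the threshold
            if len(subset) == 1:
                # Insert the unions of this singleton with every earlier
                # element right after the current position; their cost
                # equals cost(subset), so the queue stays sorted.
                for i in range(idx):
                    Qc.insert(idx + 1 + i, Qc[i] | subset)
            idx += 1
        return None  # no subset satisfied the utility threshold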

4.3.6 Simulation Results

In order to perform empirical evaluations of the proposed node collaboration scheme for target tracking, we built a simulation platform in VC++. The fixed parameters of the simulation platform are as follows: S = 500 × 500, r = 30, α = π/3, F = 9.5e−3, ζ = 5e−4, σp = 0.001, and σs = 0.001.² As shown in Fig. 4.8a, 350 camera sensors are deployed in S. Set x0 = 0, y0 = 500, v0^x = 1, v0^y = −1, t = 20, σx = 0.005, and σy = 0.005. Then, according to Eq. (4.9), we can generate 20 points, see the "" marks in Fig. 4.8b, on the target trajectory from t = 1 to t = 20. For each t, we can use the proposed schemes to elect a CH and the corresponding cluster members, and then estimate the target location. The "+" marks in Fig. 4.8b denote the estimated target trajectory, and the detailed information for each t can be found in Table 4.3. From Fig. 4.8b and Table 4.3, we can observe that the difference between the actual and the estimated target trajectories is small. The number of camera sensors which can detect the target with high probability at

² The values of the parameters are based on a commonly used digital camera (Sony DSC-717F) and the related calibration process.


Table 4.3 Target trajectories

    t    xt    yt    x̂t        ŷt        |Cc|   CH (xi, yi, θi)
    1    20    480   19.6685   479.866   2      158 (44, 467, 208°)
    2    35    462   39.9896   460.909   1      158 (44, 467, 208°)
    3    49    445   50.9202   440.54    2      247 (49, 423, 86°)
    4    63    425   61.78     422.46    3      194 (83, 432, 230°)
    5    76    403   72.6      405.08    4      334 (72, 393, 10°)
    6    90    380   87.18     386.76    2      334 (72, 393, 10°)
    7    96    358   101.06    365.6     2      148 (106, 345, 75°)
    8    105   338   115.44    344.24    1      181 (123, 341, 216°)
    9    120   315   129.7     327.6     2      293 (121, 301, 44°)
    10   129   293   136.38    305.06    3      322 (128, 281, 36°)
    11   134   275   144.22    287.26    1      309 (134, 272, 49°)
    12   142   258   150.5     269.9     1      192 (153, 244, 98°)
    13   148   240   156.32    254.32    2      316 (129, 252, 310°)
    14   155   224   165.08    237.18    2      273 (164, 216, 193°)
    15   163   206   172.2     219.66    2      266 (144, 225, 290°)
    16   169   188   179.2     203.98    2      118 (169, 166, 48°)
    17   176   172   185.28    187.22    2      118 (169, 166, 48°)
    18   184   157   192.96    177.18    1      107 (184, 157, 208°)
    19   191   144   199.56    161.74    1      343 (199, 158, 256°)

time t is about 2 or 3. This implies that the sizes of the clusters are small. For each t, the corresponding CH can detect the target, and its location is close to the target. Moreover, we generate the same target trajectory in an area with 200 deployed camera sensors and one with 500 deployed camera sensors, respectively. As shown in Fig. 4.9, we can also obtain the corresponding estimated target trajectories using the proposed schemes. From Figs. 4.9a, 4.8b, and 4.9b, we find that the difference between the actual and the estimated target trajectories decreases as the number of deployed camera sensors increases. The main reason is as follows: when the density of camera sensors is low, the size of Cc is zero or small, and the lack of measurements causes a large estimation error; when the density of camera sensors is high, there exist enough measurements to obtain an accurate estimate. On the other hand, if the density of camera sensors is high, the size of Cc is big in general. Thus, selecting some informative camera sensors from Cc as cluster members is necessary for saving network energy. Assume that the initial energy of each camera sensor is 100. For a cluster member, the energy consumption of detecting the target and sending the corresponding measurement to the CH is 1. We can also use the same motion model in Eq. (4.9) to generate Nt different target trajectories. Then, we apply three different schemes, described as follows, to select cluster members in a wireless camera sensor network of 500 nodes.


Fig. 4.9 Actual and estimated target trajectories. (a) Deployment of 200 camera sensor nodes. (b) Deployment of 500 camera sensors

1. U scheme. In this scheme, all the camera sensors which can detect the target with high probability estimate the target location collaboratively, i.e., they are all cluster members.
2. C scheme. In this scheme, cluster members are the camera sensors which can provide informative measurements to meet the requirement on the target state uncertainty.
3. B scheme. This is the scheme proposed in this section.

Figure 4.10a–c shows the statistical results for the remaining energy of the camera sensors. The X-axis denotes the remaining energy, and the Y-axis denotes the corresponding number of camera sensors. It is obvious that the energy cost of the U scheme is much higher than that of the C and B schemes, and the difference increases as Nt increases. The main reason is that the cluster size in the U scheme is bigger than in the C and B schemes, so there exist some redundant measurements during the tracking process in the U scheme. When Nt is small, for most camera sensors, the remaining energy under the B scheme is the same as under the C scheme (Fig. 4.10a). Using the C scheme, there exist a few camera sensors which consume much more energy than the others. For example, when Nt = 30, shown in Fig. 4.10b, the energies of 2 camera sensors are in [60, 65], and the energies of 4 camera sensors are in [65, 70]. This is because the few camera sensors which can provide highly informative measurements are always selected as cluster members. As Nt increases, for most camera sensors, the difference in remaining energy between the C scheme and the B scheme increases. From Fig. 4.10c, when Nt = 45, for the C scheme there are 8 camera sensors whose energy is below 60; for the B scheme, only 2 camera sensors' energies are below 60.


Fig. 4.10 Statistical results for remaining energy of 500 camera sensors. (a) Nt = 15. (b) Nt = 30. (c) Nt = 45

4.4 Distributed Target Classification in Multimedia Sensor Networks

4.4.1 Motivation

Previous works on target classification in traditional surveillance systems focus on a centralized paradigm in which signals are obtained at a single sensor or a few sensors and processed at a signal fusion center. As a new type of surveillance system, a wireless sensor network can provide an extra advantage: redundant sensor nodes which classify a target in a collaborative way. Therefore, target classification in wireless sensor networks is a relatively recent topic of study (Brooks et al. 2003; Moore et al. 2003; Zhang and Cao 2004a,b; Song and Hatzinakos 2007; Gui and Mohapatra 2004; Zhao et al. 2002; Chu et al. 2002; Guo and Wang 2004; Arulampalam et al. 2002; Hue et al. 2002; Piccardi 2004; Kim et al. 2005; Ercan et al. 2006; Wang et al. 2004, 2007, 2009; Duda et al. 2000; Arora et al. 2004; Duarte and Hu 2004, 2003; Malhotra et al. 2008; Wang and Wang 2007; D'Costa et al. 2004; Kotecha et al. 2005).


Compared with traditional wireless sensor networks, multimedia sensor networks have the potential to provide more powerful abilities for target classification, especially for multiclass classification. This is because a multimedia sensor node with acoustic and visual information collection modules, perhaps the most informative sensing devices, can provide more information about the target and the related background than other types of sensor nodes. However, target classification utilizing multimedia sensor networks faces two new challenges:

• Existing works on target classification in wireless sensor networks are often based on homogeneous sensor nodes and use only one category of features. But in multimedia sensor networks, multimedia sensor nodes can extract multiple categories of features from the sensed acoustic and visual information. It is necessary to develop a uniform framework to efficiently utilize the various features for target classification.
• In many real surveillance applications, there exist multiple targets of various types. Consider the traffic scenario of a street: the targets of interest may be cars, buses, walking people, bike-riding people, and so on. However, for a multimedia sensor node with limited computing ability and a non-rechargeable battery, it is difficult to implement the whole procedure of multiclass classification. Therefore, the conventional paradigm of target classification in wireless sensor networks is not directly applicable to multiclass classification in multimedia sensor networks. It is necessary to design a distributed scheme to resolve the contradiction between the resource limitation of the multimedia sensor node and the high computational complexity of multiclass classification.

To overcome the above challenges, we analyzed the main stages of target classification (target detection, feature extraction, and classification) in a typical multimedia sensor network developed by our group. Then, we proposed a binary classification tree based framework for multiclass classification. By using this framework, the complex multiclass classification task can be efficiently divided into a set of subtasks, i.e., each selected multimedia sensor node only performs a sub-classifier. Specifically, the proposed framework includes three components:

1. Generation of the binary classification tree. In order to take advantage of both the efficient computation of the classification tree and the high classification accuracy of the Support Vector Machine (SVM), we generate an SVM-based binary classification tree according to the rule of maximizing the impurity reduction. This binary classification tree gives insight into the relationship between the multiclass targets and the various features.
2. Division of the binary classification tree. Based on a complexity analysis of the binary classification tree, we present a scheme to divide the classification tree into a set of subtrees which are suitable for the multimedia sensor node's ability, while minimizing the number of subtrees.


3. Selection of multimedia sensor nodes. By defining a cost function and a utility function, we map the node selection problem into an optimization problem, and then develop an optimal selection scheme to properly select a subset of multimedia sensor nodes for performing the subtrees split from the binary classification tree.

4.4.2 Related Works

The classification problem has been studied extensively in the literature (Duda et al. 2000), and it has also attracted considerable attention in the area of wireless sensor networks. At the beginning, target classification was studied as a subproblem of target tracking in wireless sensor networks. Brooks et al. (2003) discussed a distributed classification algorithm which exploits signals from multiple nodes in several modalities and relies on prior statistical information about target classes. Arora et al. (2004) presented a comprehensive study on detection, classification, and tracking of moving persons, soldiers, and vehicles in wireless sensor networks. In their work, multiple sensors measured the influence field generated by an intruder, and the classifier then fused these measurements to label the intruder class. However, these works mainly focused on detecting or tracking objects, without giving a detailed design of the classifiers.

Recently, more and more researchers have realized that target classification plays an important role in surveillance applications of wireless sensor networks. Thus, some works began to focus on distributed target classification as well as target tracking. In these works, acoustic and seismic sensors were most commonly used. Based on extensive real-world experiments, Duarte and Hu (2004) detailed the collection procedure of acoustic and seismic data, the feature extraction, and baseline classifier development for vehicle classification. They further proposed a distance-based decision fusion scheme exploiting the relationships among sensor-to-target distance, signal-to-noise ratio, and classification rate (Duarte and Hu 2003). Malhotra et al. (2008) also addressed the problem of classifying moving ground vehicles. They presented a distributed framework to classify vehicles based on FFT (fast Fourier transform) and PSD (power spectral density) features, and proposed three distributed algorithms based on the k-nearest neighbor (k-NN) classification method.

In the literature, visual-information-based target classification is one of the most important research topics in pattern classification, but there are only a few works about target classification in the environment of networked camera sensors. Wang et al. (2007) and Wang and Wang (2007) presented a distributed multi-view tracking system using collaborative signal processing (CSP) techniques. The target classification result is achieved by a static classifier in a centralized manner, and the classifier learning is based on a supervised learning method. They further constructed a fully autonomous target surveillance system, in which localization and classifier learning are both


executed by the progressive data-fusion paradigm, with a specifically designed sensor-node-selection strategy (Wang et al. 2009). There also exist some works which study the target classification problem of wireless sensor networks from the perspective of signal processing. D'Costa et al. (2004) modeled the spatio-temporal signal field generated by an object as a bandlimited stationary ergodic Gaussian field, and analyzed the classifier performance for both soft and hard decision fusion across coherence regions, assuming noise-free as well as noisy communication links between nodes. Kotecha et al. (2005) studied distributed strategies for the classification of multiple targets in a wireless sensor network. They modeled the target signals as zero-mean Gaussian processes with distinct temporal power spectral densities. Their proposed classifier used a simple distributed architecture: local hard decisions from each node are communicated over noisy links to a manager node which optimally fuses them to make the final decision.

4.4.3 Procedure of Target Classification in Multimedia Sensor Networks

We consider n geographically distributed multimedia sensor nodes, M = {m1, m2, ..., mn}, deployed in a surveillance region S. Each multimedia sensor node is equipped with acoustic and visual information collection modules. In surveillance applications, multimedia sensor nodes first capture the acoustic and visual signals, and preprocess them to determine whether there exists any possible target. When a target is detected, its features are extracted from the sensed signals. Finally, the classification result is obtained by passing these features to one or more classifiers. Thus, the target classification procedure of a multimedia sensor network includes three major stages: target detection, feature extraction, and classification.

4.4.3.1 Target Detection

Because the multimedia sensor node can capture both acoustic and visual information, we need to use different methods to detect and segment the target from the background.

(1) Target detection utilizing acoustic information

In our system, we utilize a constant volume threshold based method for acoustic target detection. Let ns be the number of sample points in an audio frame. Then, the volume of the audio frame can be defined as \sum_{i=1}^{n_s} |κ_i|, where κ_i is the amplitude of the i-th sample point.



Fig. 4.11 Acoustic target detection. (a) Calculation of the volume threshold. (b) A period of the acoustic signals captured by a multimedia sensor node. (c) The curve of frame volume. From 0.19 s to 0.31 s, the corresponding frame volumes exceed the threshold

As shown in Fig. 4.11a, we first choose a period of acoustic signals of the target, and calculate the volume of each frame. Then, we can decide a threshold of frame volume, denoted by Eth, according to the following equation:

    E_{th} = \frac{E_{max} − E_{min}}{α} + E_{min},

where Emax and Emin are the maximum and minimum frame volumes in the period of acoustic signals shown in Fig. 4.11a, respectively, and α is a proportionality constant for adjusting the threshold value. After sensing a period of acoustic signals, the multimedia sensor node calculates the volumes of all frames. Figure 4.11b, c illustrate a period of acoustic signals sensed by a multimedia sensor node and the corresponding frame volumes, respectively. If there exist some frames whose volumes exceed the threshold, then the target is detected. The acoustic signals of the target can be further segmented for feature extraction.
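A minimal numpy sketch of this volume-threshold detector follows; the frame length and the calibration inputs e_max, e_min, and alpha are assumed to be supplied by the application.

    import numpy as np

    def frame_volumes(samples, frame_len):
        # Volume of each audio frame: the sum of absolute sample amplitudes.
        n_frames = len(samples) // frame_len
        frames = np.abs(samples[:n_frames * frame_len]).reshape(n_frames, frame_len)
        return frames.sum(axis=1)

    def detect_target(samples, frame_len, e_max, e_min, alpha):
        # e_max/e_min come from a calibration recording of the target (Fig. 4.11a).
        e_th = (e_max - e_min) / alpha + e_min  # the threshold Eth
        return frame_volumes(samples, frame_len) > e_th  # True where detected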


Fig. 4.12 Background subtraction. (a) A background scene. (b) One image captured by a multimedia sensor node. (c) Binary image of the background subtraction result, where the black pixels are the background and the white pixels are the foreground. (d) The minimum boundary rectangle containing the target

(2) Target detection utilizing visual information

In the computer vision literature, background subtraction (Piccardi 2004) is a commonly used technique for segmenting out objects of interest in a scene. Thus, when a multimedia sensor node captures an image frame, it can employ background subtraction to remove the static background. Because of the limited computing ability of multimedia sensor nodes, a simple background subtraction method (Haritaoglu et al. 2000) is used for target detection in the images. For a background scene such as that shown in Fig. 4.12a, we first build a statistical background model. Each pixel in a sensed image (see Fig. 4.12b) is classified as either a background or a foreground pixel using the background model. The results are illustrated in Fig. 4.12c, where the black pixels are the background and the white pixels are the foreground. We further compute the connected white areas. If a connected white area is very small, it is treated as noise. Otherwise, the foreground is the target. Then, we label the minimum boundary rectangle of this foreground on the original image (see the red rectangle in Fig. 4.12d). The image in the minimum boundary rectangle is segmented for feature extraction.

Next, we define a detection-oriented sensing model for the multimedia sensor node as follows:

Definition 4.2 Detection-oriented sensing model of multimedia sensor node. As shown in Fig. 4.13, for a given multimedia sensor node mi, the corresponding sensing region Di is characterized by the 5-tuple (Li, ra, rv, V⃗i, α), where Li = (xi, yi) is the location of mi, ra and rv are the sensing radii of acoustic and visual information,³ respectively, V⃗i is the unit vector determining the sensing direction of

³ Because this sensing model is for target detection, the sensing radius of acoustic information denotes the maximal distance at which our acoustic detection method works. This implies that if the distance between the target and mi exceeds ra, mi cannot detect the target using the acoustic detection method. Similarly, rv denotes the maximal distance at which the background subtraction works. Ordinarily, rv is larger than ra.


Fig. 4.13 The detection-oriented sensing model of multimedia sensor node

the visual information collection module, and α is the offset angle of the field of view on both sides of V⃗i. From Fig. 4.13, we have that Di is composed of a disk Di,1 and a sector Di,2. When a target appears in Di,1 (or Di,2), the corresponding acoustic (or visual) information is available; when a target is in Di,3 = Di,1 ∩ Di,2, the acoustic and visual information are available simultaneously.
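The membership test implied by this model can be sketched in a few lines of Python; the node record (x, y, theta, ra, rv, alpha), where theta is the orientation angle of V⃗i, is an illustrative layout, not the book's data structure.

    import math

    def region_of(target, node):
        # Classify a target location against the sensing region Di of
        # Definition 4.2: returns 'D1' (acoustic disk only), 'D2' (visual
        # sector only), 'D3' (both), or None.
        x, y, theta, ra, rv, alpha = node
        tx, ty = target
        d = math.hypot(tx - x, ty - y)
        in_disk = d <= ra                      # Di,1: acoustic disk
        ang = math.atan2(ty - y, tx - x)       # bearing of the target
        off = abs((ang - theta + math.pi) % (2 * math.pi) - math.pi)
        in_sector = d <= rv and off <= alpha   # Di,2: visual sector
        if in_disk and in_sector:
            return 'D3'
        if in_disk:
            return 'D1'
        if in_sector:
            return 'D2'
        return None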

4.4.3.2 Feature Extraction

After segmenting the acoustic and visual information of the target from the background, we further extract the features of the target. The goal of feature extraction is to characterize a target to be recognized by measurements whose values are very similar for targets in the same category and very different for targets in different categories. Generally speaking, multiple types of features can be extracted from the acoustic and visual information using different techniques. On the other hand, using the same technique, features with different numbers of dimensions can also be extracted. Because the feature dimensions affect the computational complexity and the classification results, we regard features extracted by the same technique but with different dimensions as being of different types. Let F = {F1, F2, ..., Fw} be the set of feature types used in the multimedia sensor network, and |Fi| denote the number of dimensions of the feature type Fi ∈ F. Next, we review some commonly used feature types for acoustic and visual signal processing, respectively.

For acoustic signals, the extracted features are often based on the frequency spectrum of the signals. The FFT (Fast Fourier Transform) (Brenner and Rader 1976) based feature is generated using an FFT of data points that provides a linear vector representing frequencies. The PSD (Power Spectral Density) (Davenport and Root 1987) based feature is generated by taking the power spectral density estimates of data points using the Thomson multitaper method.


In the area of computer vision, there are many different features subject to different applications. PCA (Principal Component Analysis) is a well-known feature extraction tool. In the work of Wang and Wang (2007), PCA was used for networked camera sensors. However, the computation of PCA contains a matrix operation, which is computationally expensive for embedded systems. HOG (Histogram of Oriented Gradients) descriptors (Dalal and Triggs 2005) are feature descriptors used for human detection. This technique counts occurrences of gradient orientations in localized portions of an image. In our system, the HOG feature and the FFT features with 4 and 8 dimensions are used.
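For illustration, one plausible way to obtain a d-dimensional FFT feature is to average the magnitude spectrum over d frequency bands, as sketched below; the text fixes only the dimensionalities (4 and 8), so the band-averaging step is an assumption.

    import numpy as np

    def fft_feature(signal, dims):
        # Magnitude spectrum of the acoustic frame, folded into `dims` bands.
        spectrum = np.abs(np.fft.rfft(signal))
        bands = np.array_split(spectrum, dims)
        feat = np.array([band.mean() for band in bands])
        return feat / (np.linalg.norm(feat) + 1e-12)  # normalize for scale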

4.4.3.3 Classification

We consider that there are k categories of targets to be classified, denoted by T = {T1, T2, ..., Tk}, where Ti ∈ T is the i-th category. The task of the classification stage is to use the features provided by the feature extraction stage to assign the target to a category. Classification is typically done in two steps: first, a classifier is constructed by learning from a set of training data; second, the classifier is used to determine the class of the target according to its features. A wide variety of classifiers have been proposed in the literature (Duda et al. 2000), each having its own advantages and disadvantages. Recent results have shown that SVM classifiers (Vapnik 1998) often have superior recognition rates in comparison to other classification methods. In our system, the SVM is used to construct the classifier for target classification; it is essentially a linear classifier operating in a higher-dimensional space. Next, we consider the classification problem: how to determine whether the target belongs to a category Tc or not using features of type Fj? Given l learning samples (x1, y1), (x2, y2), ..., (xl, yl), xi ∈ R^d, yi ∈ {−1, +1}, where xi is the extracted feature vector of type Fj, d = |Fj| denotes the number of dimensions of xi, and yi is the label indicating whether the point xi belongs to Tc or not. The SVM aims to separate the vectors with a hyperplane wx + b = 0, where w and b are the parameters of the hyperplane. The hyperplane with the largest margin is desired, and the optimization problem can be formulated as:

    \min_{w,b} \frac{w^T w}{2} + C \sum_i ξ_i, \quad \text{s.t.} \quad y_i (w · x_i − b) ≥ 1 − ξ_i,

where ξi ≥ 0 is the slack variable and C is the parameter for adjusting the cost of constraint violation. Then, for a target t with feature vector xt, if w·xt + b > 0, then t belongs to the category Tc. Otherwise, t belongs to the other categories.
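As a hedged illustration of this two-step procedure, the snippet below trains a linear SVM with scikit-learn on placeholder data; note that scikit-learn's decision_function uses the sign convention w·x + b.

    import numpy as np
    from sklearn.svm import SVC

    # Training pairs (xi, yi): xi is a feature vector of type Fj; yi = +1 if
    # the sample belongs to category Tc and -1 otherwise (values made up).
    X = np.array([[0.2, 0.1], [0.9, 0.8], [0.15, 0.3], [0.85, 0.95]])
    y = np.array([-1, 1, -1, 1])

    clf = SVC(kernel='linear', C=1.0)  # C penalizes the slack variables
    clf.fit(X, y)

    x_t = np.array([[0.8, 0.9]])       # feature vector of a target t
    # Positive decision value: t is assigned to the category Tc.
    print(clf.decision_function(x_t) > 0)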


However, SVM-based classification is for binary decision problems, and its extension to multiclass problems is not straightforward. The popular methods for applying SVMs to multiclass classification problems, such as one-against-all (Vapnik 1998), one-against-one (Friedman 1997), and the binary tree of SVMs (Fei and Liu 2006), usually decompose the multiclass problem into several two-class problems that can be addressed directly using a set of SVMs. But these methods are not suitable for multiclass classification in multimedia sensor networks. The main reason is that in these methods the SVMs used for one multiclass classification task are all based on the same feature type, i.e., the classifier composed of multiple SVMs operates in one feature vector space. But the feature types are various in the multimedia sensor network. This implies that the classifier can potentially operate in several different feature vector spaces in a distributed manner. Therefore, we need to design a new method to organize multiple SVMs with various feature types in multimedia sensor networks.

4.4.4 Binary Classification Tree Based Framework

In the pattern classification literature, the classification tree (Breiman 1984) is a powerful tool to predict a discrete category using a pattern described by lists of attributes. Because the multiclass classification studied in this section is based on feature vectors, the classification tree cannot be used for our problem straightforwardly. On the other hand, the output of every SVM is a binary decision which can be regarded as an attribute. Therefore, the multiple SVMs can be organized as a binary classification tree. Moreover, because of the universal expressive power of binary trees and their comparative simplicity in training, we use the binary classification tree to provide a uniform framework for SVM-based multiclass classification in multimedia sensor networks. The proposed binary classification tree based framework takes advantage of both the efficient computation of the classification tree architecture and the high classification accuracy of SVMs.

Because of the limitations of computing ability and energy supply, it is difficult to perform the classification task corresponding to the whole binary classification tree on a single multimedia sensor node. An intuitive method is to deploy the m SVM classifiers (assume that the multiclass classifier consists of m SVM classifiers) on m multimedia sensor nodes, respectively. However, too many multimedia sensor nodes taking part in collaborative classification will cause extra communication cost among nodes. Therefore, our framework divides the binary classification tree into a set of subtrees, i.e., the multiclass classifier is divided into a set of sub-classifiers. The goal is to ensure that the computational complexity of each subtree is suitable for the ability of a multimedia sensor node while minimizing the number of sub-classifiers. In multimedia sensor networks, especially densely deployed ones, a target is often detected by multiple multimedia sensor nodes simultaneously. Thus, after dividing the binary classification tree, the framework should select the proper


Fig. 4.14 Overview of the binary classification tree based framework. The small grey squares and small circles denote SVMs and categories, respectively

multimedia sensor nodes to perform the sub-classifiers. The goal is to satisfy the accuracy requirement of classification while minimizing the energy cost of the multimedia sensor network. Therefore, as shown in Fig. 4.14, the binary classification tree based framework includes three components: generation of the binary classification tree, division of the binary classification tree, and selection of multimedia sensor nodes.

4.4.4.1 Generation of the Binary Classification Tree

Assume that we have a set D of labeled training targets: D = {t1,1, t1,2, ..., t1,n, t2,1, ..., t2,n, ..., tk,1, ..., tk,n}, where ti,j, 0 < i ≤ k, 0 < j ≤ n, denotes the j-th training target of category Ti. Then, there are n training targets for each category and kn training targets in all. For a given training target, the multimedia sensor node can extract w types of feature vectors. Clearly, the binary classification tree will progressively split the set of training targets into smaller and smaller subsets. Each decision outcome at a node⁴ is called a split, because it corresponds to splitting a subset of the training targets. The root node splits the full training set D; each successive decision splits a proper subset of the training targets. The split at each node is based on an SVM classifier, thus the number of splits at each node is 2. Assume that there exist τ categories of training targets represented at a given node N. Let the class of categories be TN = {T1, T2, ..., Tτ}. The goal is to find the split that best separates groups of the τ categories, that is, a candidate "supercategory" TN,1 ⊂ TN, TN,1 ≠ ∅, consisting of all training targets in some subset of the

⁴ Please note that in this section the word "node" means "the node of the tree", not "multimedia sensor node".

note that in this section the word “node” means “the node of tree” not “multimedia sensor

184

4 In-Network Processing for Multimedia Sensor Networks

categories, and candidate “supercategory” TN,2 = TN − TN,1 as all remaining training targets. Definition 4.3 Atomic classifier. Let Fi ∈ F be an arbitrary feature type. If a Fi based SVM classifier at node N is used to determine whether a target belongs to TN,1 ⊂ TN , TN,1 = ∅, we then  call the SVM classifier as a atomic classifier described by the 2-tube Fi , TN,1 . Much of work in generating the binary classification tree focuses on deciding which atomic classifier should perform at each node. The fundamental principle underlying tree generation is that of simplicity: we prefer binary decisions that lead 5 to simple, compact  few atomic classifiers. To this end, we seek a property  tree with atomic classifier Fi , TN,1 at each node N that makes the training targets reaching the immediate descendent nodes as “pure” as possible. Next, we define a measure of impurity using the information entropy. Definition 4.4 Entropy impurity. Let I(N ) denote the impurity of a node N . Define I(N )  −

τ 

P (Tj ) log2 P (Tj ),

(4.37)

j =1

where P (Tj ) is the fraction of training targets at node N that are in category Tj . By the well-known properties of entropy, if all the training targets are of the same category, I(N ) = 0; otherwise I(N ) is positive, with the greatest value occurring when the different categories are equally likely.   Given a partial tree down to node N, the problem is what value of Fi , TN,1 should we choose? An heuristic is to choose the atomic classifier that decreases the impurity as much as possible. The drop in impurity is defined by I(N ) = I(N ) − PL I(NL ) − (1 − PL )I(NR ),

(4.38)

where NL and NR are the left and right descendent nodes, I(NL ) and I(NR ) are their impurities, and PL is the  fractionof training targets at node N that will go to NL when the atomic classifier Fi , TN,1 is used. The impurity reduction corresponds to an information gain provided by the atomic classifier. If the maximum of I(N ) is below to a predefined threshold ζ , then slipping is stopped and N becomes a leaf (and state what category to assign to it). Otherwise, the atomic classifier that maximizes I(N ) is chosen for the slipping at N .

5 According

to the principle of Occam’s razor, the simplest model that explains data is the one to be preferred (Duda et al. 2000).


Because TN,1 ⊂ TN, the set of all possible values of TN,1 is 2^{TN} − {∅, TN}, where 2^{TN} denotes the set of all subsets of TN. Then, the number of possible choices of TN,1 is 2^τ − 2. Moreover, |F| = w. Therefore, we need to calculate the impurity reduction value (2^τ − 2)w times to find the maximum.
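The impurity computations of Eqs. (4.37) and (4.38) are small enough to state directly in Python; the example counts at the end are made up.

    import math

    def impurity(counts):
        # Entropy impurity, Eq. (4.37); counts = training targets per category.
        n = sum(counts)
        return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

    def impurity_drop(counts, left_counts, right_counts):
        # Delta I(N), Eq. (4.38), for one candidate atomic classifier that
        # sends left_counts to NL and right_counts to NR.
        p_left = sum(left_counts) / sum(counts)
        return (impurity(counts)
                - p_left * impurity(left_counts)
                - (1 - p_left) * impurity(right_counts))

    # A node with 10/10/10 targets of three categories: peeling off the
    # first category cleanly yields a drop of about 0.918 bits.
    print(impurity_drop([10, 10, 10], [10, 0, 0], [0, 10, 10]))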

4.4.4.2 Division of the Binary Classification Tree

In our framework, we use the number of clock cycles to measure the computational complexity of classification. A target classification process corresponds to the path from the root node to a leaf node in the binary classification tree.

Definition 4.5 Classification path. As shown in Fig. 4.15, Nr is the root node of a binary classification tree and Nl,i is an arbitrary leaf node with the class label Ti. For a given classification process with the result Ti, the classification path, denoted by pi, is defined as the path from Nr to the parent node of Nl,i, i.e., pi ≜ (Nr, Ni,2, ..., Ni,j).

Let Cc(Nl,i) denote the computational complexity of the classification process with the result Ti. Assume that k types of features {Fi,1, Fi,2, ..., Fi,k} are used in this classification. Then, Cc(Nl,i) is given by

    C_c(N_{l,i}) = C_d + \sum_{l=1}^{k} C_f(F_{i,l}) + \sum_{q=1}^{j} C_s(N_{i,q}),    (4.39)

where Cd, Cf(Fi,l), and Cs(Ni,q)⁶ denote the computational complexities of the target detection, the extraction of the feature Fi,l, and the classification using the SVM at the node Ni,q, respectively.

Fig. 4.15 Classification path. The bold line denotes the classification path (Nr, Ni,2, ..., Ni,j)

⁶ Ni,1 denotes the root node Nr.


Definition 4.6 Computational complexity of the binary classification tree. For a binary classification tree Tr, its computational complexity, denoted by Cc(Tr), is defined as:

    C_c(T_r) ≜ \max_{T_i ∈ T} C_c(N_{l,i}).    (4.40)

Let δ be the maximal number of allowable clock cycles for performing one classification on a multimedia sensor node. If Cc(Tr) < δ, then the classifier corresponding to the binary classification tree Tr can run on one multimedia sensor node. Otherwise, we need to divide Tr into a set of subtrees. Let Tr,L and Tr,R be the left and right subtrees of Tr, respectively. According to Eq. (4.40), we can obtain Cc(Tr,L) and Cc(Tr,R). If the bigger one is smaller than δ, then the corresponding subtree, denoted by Ts,1, is removed from Tr, and a new tree Tr − Ts,1 is generated. Otherwise, we calculate the computational complexities of the left and right subtrees of Ts,1. The above process is repeated until Tr is divided into a set of subtrees. Each subtree corresponds to a sub-classifier running on a multimedia sensor node.
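One possible Python rendering of this division procedure is sketched below. The Node type, the detach helper, and the complexity callable (a stand-in for Cc(·) of Eq. (4.40), assumed to return 0 for missing children) are all illustrative.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        svm: str                       # label of the atomic classifier here
        left: Optional['Node'] = None
        right: Optional['Node'] = None

    def detach(tree, subtree):
        # Copy of `tree` with `subtree` reduced to a stub leaf; the removed
        # sub-classifier reports its decision back as an interim result.
        if tree is subtree:
            return Node(tree.svm)
        return Node(tree.svm,
                    detach(tree.left, subtree) if tree.left else None,
                    detach(tree.right, subtree) if tree.right else None)

    def divide(tree, complexity, delta):
        # Split `tree` into sub-classifiers that each fit the budget delta.
        if complexity(tree) < delta:
            return [tree]
        heavier = max(tree.left, tree.right, key=complexity)
        rest = detach(tree, heavier)       # the new tree Tr - Ts,1
        if complexity(heavier) < delta:
            return [heavier] + divide(rest, complexity, delta)
        # The heavier subtree itself exceeds delta: split it first.
        return divide(heavier, complexity, delta) + divide(rest, complexity, delta)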

4.4.4.3 Selection of Multimedia Sensor Nodes

Assume that the binary classification tree Tr is divided into a set of subtrees Ts = {Ts,1, Ts,2, ..., Ts,u}, i.e., there are u types of sub-classifiers running on u multimedia sensor nodes. An arbitrary node mi ∈ M performs the sub-classifier corresponding to Ts,g ∈ Ts. From Definition 4.2, we have that: if Ts,g is based on only acoustic (or visual) information, the target should be in Di,1 (or Di,2); if Ts,g is based on both acoustic and visual information, the target should be in Di,3. Let χ be the variable of the target location, and f(χ) be the pdf (probability density function) of χ. How to compute f(χ) from the measurements is an important research topic for target localization, but we will not discuss it in this section; we refer the reader to the work of Liu et al. (2009) for a discussion. Then, the probability, denoted by pi, that mi performs Ts,g is:

    p_i = \int_D f(χ)\, dχ,    (4.41)

where

    D = \begin{cases} D_{i,1}, & \text{if } T_{s,g} \text{ uses acoustic information;} \\ D_{i,2}, & \text{if } T_{s,g} \text{ uses visual information;} \\ D_{i,3}, & \text{if } T_{s,g} \text{ uses both acoustic and visual information.} \end{cases}


If pi exceeds a predefined threshold ε, then mi becomes a candidate for executing Ts,g. Thus, the set of candidates for Ts,g, denoted by γg, is: γg = {mi | pi > ε, 0 ≤ i < n}.

Let ρ ≜ (mc,1, mc,2, ..., mc,u) be a group of u candidates for the u subtrees, where mc,1 ∈ γ1, mc,2 ∈ γ2, ..., mc,u ∈ γu. Our goal is to select the optimal group of u candidate multimedia sensor nodes so as to obtain a precise target classification result while minimizing the energy cost. Due to the constrained resources of multimedia sensor networks, energy saving is one of the most important problems to be considered. Motivated by this, we study the selection of multimedia sensor nodes using the criterion of minimum cost, that is, minimizing the cost so as to attain a specified classification accuracy. Let G denote the set of all possible groups, i.e., G ≜ {(mc,1, ..., mc,u) : mc,1 ∈ γ1, ..., mc,u ∈ γu}. In order to model the tradeoff between utility and cost, we need to define the following functions:

(1) A utility function U: G → R+, which quantifies the classification accuracy of each ρ ∈ G. The utility of ρ can be defined as:

    U(ρ) ≜ \prod_{k=1}^{u} A(m_{c,k}, T_{s,k}),    (4.42)

where A(mc,k, Ts,k) denotes the expected classification accuracy of the subtree Ts,k running on the multimedia sensor node mc,k. Next, we deduce the expression of A(mc,k, Ts,k). Let Nl,i be a leaf node in Ts,k with the class label Ti. For a given node Ni,q in the classification path pi, let A(mc,k, Ni,q) be the expected classification accuracy of the corresponding atomic classifier running on mc,k. Under the same application environment, the distance between the target and the multimedia sensor node is the main factor affecting the classification accuracy. Then,

    A(m_{c,k}, N_{i,q}) = \int_D f(χ)\, f_{i,q}(|L_{c,k} − χ|)\, dχ,    (4.43)

where |χ − Lc,k| denotes the Euclidean distance between the target location χ and the multimedia sensor node location Lc,k, and fi,q(·) is an accuracy function of the distance. The expected classification accuracy of the classification path pi, denoted by A(mc,k, pi), is given by:

    A(m_{c,k}, p_i) = \prod_{q=1}^{j} A(m_{c,k}, N_{i,q}),    (4.44)


Let P(pi) denote the probability that the classification reaches the leaf node Nl,i. We have:

    A(m_{c,k}, T_{s,k}) = \sum_i P(p_i)\, A(m_{c,k}, p_i).    (4.45)

(2) A cost function C: G → R+, which quantifies the energy cost of classification for each ρ ∈ G. In our framework, only decisions are transmitted among multimedia sensor nodes, thus the energy consumption of communication is small. We also use the number of clock cycles to measure the energy consumption. Let E(Ts,k) be the expected energy consumption of the subtree Ts,k running one time. Then, we have:

    E(T_{s,k}) = \sum_i P(p_i)\, E(p_i),    (4.46)

where E(pi), denoting the energy consumption when the classification path is pi, can be obtained from Eq. (4.39). On the other hand, for a multimedia sensor network, the failure of several multimedia sensor nodes can affect the whole network topology. So, energy saving requires not only minimizing the total cost of the network, but also homogenizing the cost across the multimedia sensor nodes. Let ec,k be the remaining energy of the multimedia sensor node mc,k. We define the cost function of mc,k as the ratio between the total energy consumption for classification and the remaining energy, i.e.,

    E(m_{c,k}) = \frac{E(T_{s,k})}{e_{c,k}}.    (4.47)

For ρ ∈ G, the cost value is defined as the maximum cost value of a multimedia sensor node in this group, i.e., the cost function is

    E(ρ) = \max_{1 ≤ k ≤ u} E(m_{c,k}).    (4.48)

Then, we can formulate the optimal selection problem as follows: choose a group ρ ∈ G which minimizes E(ρ) subject to U(ρ) ≥ ϑ, where ϑ is the predefined threshold for classification accuracy. Thus, this problem can also be expressed as:

    ρ* = \arg\min_{ρ ∈ G,\; U(ρ) ≥ ϑ} E(ρ),

where ρ* is the optimal group of multimedia sensor nodes for distributed classification. Let |G| denote the size of the value space of ρ. In real applications, the number of multimedia sensors which detect a target simultaneously is limited. This implies that |G| = \prod_{i=1}^{u} |γ_i| is small. Therefore, it is easy to get ρ* using the method of exhaustion.
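The exhaustive search over G can be sketched as follows; accuracy, energy, and remaining are hypothetical stand-ins for A(mc,k, Ts,k) of Eq. (4.45), E(Ts,k) of Eq. (4.46), and the remaining energies ec,k, and the product form of U(ρ) follows the reconstruction of Eq. (4.42).

    from itertools import product

    def select_group(candidate_sets, accuracy, energy, remaining, theta):
        # candidate_sets: [gamma_1, ..., gamma_u]; returns rho* or None.
        best, best_cost = None, float('inf')
        for rho in product(*candidate_sets):          # every group in G
            utility = 1.0
            for k, m in enumerate(rho):               # U(rho), Eq. (4.42)
                utility *= accuracy(m, k)
            if utility < theta:
                continue                              # accuracy requirement
            cost = max(energy[k] / remaining[m] for k, m in enumerate(rho))
            if cost < best_cost:                      # E(rho), Eq. (4.48)
                best, best_cost = rho, cost
        return best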


4.4.5 Case Study and Simulations

We first choose a typical surveillance scenario for our experiment. There exist 5 target categories: car, bus, bike-riding people, speaking people, and non-speaking people. For each target, we extract 4 types of features: the aspect ratio of the target, the HOG feature, the 4-dimensional FFT feature, and the 8-dimensional FFT feature. We then generate a binary classification tree, shown in Fig. 4.16, using the samples captured by our developed multimedia sensor nodes (see Fig. 4.19). The binary classification tree includes four atomic classifiers: SVM 1, SVM 2, SVM 3, and SVM 4. We run these atomic classifiers on a multimedia sensor node, respectively. The running time of each atomic classifier gives the experimental value of its computational complexity. We have Cc(SVM 1) : Cc(SVM 2) : Cc(SVM 3) : Cc(SVM 4) ≈ 1 : 9 : 3 : 2. Then, we can divide the binary classification tree into three subtrees: {SVM 4}, {SVM 2}, and {SVM 1, SVM 3}. Moreover, from the prototype of the multimedia sensor node, we also obtain: ra = 30 m, rv = 50 m, and α = π/4. However, our developed prototype of a multimedia sensor network is still a small-size one with several multimedia nodes. Therefore, to show the effect of our proposed framework on energy saving for large-scale networks, we built a simulation platform using the above parameters. As shown in Fig. 4.17a, 300 multimedia sensor nodes are randomly deployed in the region S = 500 m × 500 m. The coverage situation is shown in Fig. 4.17b. For an arbitrary point in S, let Na and Nv be the numbers of multimedia sensor nodes whose acoustic sensing regions and visual sensing regions cover this point, respectively. We also define the Na-Nv-coverage probability as the probability that an arbitrary point is in Na

Fig. 4.16 The case of binary classification tree in our experiment


Fig. 4.17 Simulation of multimedia sensor network. (a) A randomly deployed multimedia sensor network with 300 nodes. (b) Coverage situation of the multimedia sensor network

Fig. 4.18 Relationships among Na , Nv , and the Na -Nv -coverage probability

acoustic sensing regions and Nv visual sensing regions. Figure 4.18 illustrates the relationships among Na , Nv , and the Na -Nv -coverage probability. From Fig. 4.18, we observe that for most points in S, their Na ’s and Nv ’s are in the range of [2,6]. This result verifies that |G| is small and the method of exhaustion for ρ∗ is reasonable.


Assume that the initial energy of each multimedia sensor node is 100. Let Cc(SVM 4) = 1; then we have E({SVM 1, SVM 3}) = 2, E({SVM 4}) = 1, and E({SVM 2}) = 4.5. Next, we apply three different schemes, described as follows, to select multimedia sensor nodes for classifying a target.

1. A scheme. In this scheme, all the multimedia sensor nodes which can detect the target perform the classification collaboratively. This scheme follows the traditional decision fusion paradigm.
2. B scheme. In this scheme, the distance between the target and the candidate multimedia sensor node is the criterion for selecting multimedia sensor nodes. This scheme can provide the most informative measurements to meet the requirement on classification accuracy.
3. C scheme. This is the scheme proposed in this section.

We randomly generate one target in S at a time. Let Nt be the number of times a target is generated. Table 4.4 and Fig. 4.19 show the statistical results for the remaining energy of the 300 multimedia sensors with different Nt's. In Table 4.4, the left column lists the values of remaining energy, and the other columns give the number of multimedia

Table 4.4 Statistical results for the remaining energy of multimedia sensor nodes using the A, B, and C schemes

    Energy   | Nt=300        | Nt=600        | Nt=900        | Nt=1200       | Nt=1500
             | A    B    C   | A    B    C   | A    B    C   | A    B    C   | A    B    C
    100−95   | 45   160  138 | 7    74   26  | 4    33   2   | 2    15   0   | 1    9    0
    95−90    | 57   66   113 | 9    60   109 | 0    40   29  | 2    28   8   | 3    25   5
    90−85    | 79   49   35  | 30   70   84  | 7    58   85  | 2    39   42  | 1    19   11
    85−80    | 56   19   12  | 43   46   57  | 12   65   88  | 2    56   56  | 1    37   37
    80−75    | 28   3    2   | 51   27   18  | 20   42   60  | 10   40   67  | 5    42   43
    75−70    | 15   2    0   | 52   15   2   | 25   23   20  | 10   44   74  | 2    39   69
    70−65    | 13   1    0   | 39   4    3   | 43   16   8   | 20   30   36  | 10   34   59
    65−60    | 5    0    0   | 31   4    1   | 54   10   5   | 14   17   7   | 9    26   45
    60−55    | 2    0    0   | 13   0    0   | 34   8    2   | 32   9    3   | 11   26   18
    55−50    | 0    0    0   | 12   0    0   | 36   3    1   | 38   11   3   | 12   13   3
    50−45    | 0    0    0   | 7    0    0   | 30   2    0   | 44   2    3   | 24   13   5
    45−40    | 0    0    0   | 3    0    0   | 12   0    0   | 31   5    1   | 26   5    3
    40−35    | 0    0    0   | 2    0    0   | 11   0    0   | 23   3    0   | 39   2    2
    35−30    | 0    0    0   | 1    0    0   | 3    0    0   | 23   1    0   | 28   4    0
    30−25    | 0    0    0   | 0    0    0   | 5    0    0   | 17   0    0   | 33   5    0
    25−20    | 0    0    0   | 0    0    0   | 3    0    0   | 9    0    0   | 27   1    0
    20−15    | 0    0    0   | 0    0    0   | 1    0    0   | 9    0    0   | 18   0    0
    15−10    | 0    0    0   | 0    0    0   | 0    0    0   | 5    0    0   | 21   0    0
    10−5     | 0    0    0   | 0    0    0   | 0    0    0   | 3    0    0   | 10   0    0
    5−0      | 0    0    0   | 0    0    0   | 0    0    0   | 3    0    0   | 6    0    0
    0−       | 0    0    0   | 0    0    0   | 0    0    0   | 1    0    0   | 13   0    0


Fig. 4.19 The variances of the remaining energies using A, B, and C schemes

sensor nodes corresponding to the different remaining energies and schemes. It is obvious that the energy cost of the A scheme is much higher than that of the B and C schemes, and the difference increases as Nt increases. The main reason is that in the A scheme all multimedia sensor nodes which detect the target take part in the classification, while in the B and C schemes only three multimedia sensor nodes are chosen to perform the 3 sub-classifiers, respectively. When Nt is small, for most multimedia sensor nodes, the remaining energy under the B scheme is the same as under the C scheme. However, using the B scheme, there exist a few multimedia sensor nodes which consume much more energy than the other ones. For example, when Nt = 600, using the B scheme, the remaining energies of 23 multimedia sensor nodes are below 75, whereas only 6 multimedia sensor nodes have remaining energies below 75 using the C scheme. As Nt increases, for most multimedia sensor nodes, the difference in remaining energies between the B scheme and the C scheme increases. When Nt = 1500, using the B scheme, there are 30 multimedia sensor nodes whose remaining energies are below 50; using the C scheme, only 10 multimedia sensor nodes' energies are below 50. We also observe that if the A scheme is used, 13 multimedia sensor nodes will be out of power when Nt = 1500.

In order to verify that our scheme can homogenize the cost of the multimedia sensor nodes, we compute the variances of the remaining energies using the A, B, and C schemes, respectively. The results are illustrated in Fig. 4.19. From Fig. 4.19, we clearly see that the variance of the C scheme is smaller than the variances of the A and B schemes. As Nt increases, the variance of the C scheme increases slowly, and the differences among the variances also increase.


4.5 Decomposition-Fusion: A Cooperative Computing Mode for Multimedia Sensor Networks

4.5.1 Motivation

In this section, we first analyze the typical transmission-processing paradigms for MSNs. Then, we propose a computing mode, Decomposition-Fusion (DF), based on a combination of the communication process and in-network processing of raw data streams that describe the phenomenon of interest from multiple views, with multiple modal media, and on multiple nodes. By using this cooperative computing mode, a complex task of an MSN can be efficiently divided into a set of subtasks, and suitable multimedia sensor nodes are selected to execute the subtasks in a uniform cooperative computing framework. Moreover, from the perspective of computing mode, the DF mode embodies the idea of edge computing, a new paradigm that pushes intelligence and processing capabilities down closer to where the sensory data originates (Ahmeda and Rehmanib 2017). The proposed DF mode offers a natural vantage point for aggregating and analyzing multimedia data from devices, and provides an ideal high-performance infrastructure to support many emerging intelligent applications.

4.5.2 Typical Paradigms of Transmission-Processing for MSNs

MSNs can be understood as the convergence of the concepts of WSNs and distributed smart cameras, and there exist two complementary transmission-processing paradigms in WSNs and smart cameras, respectively. As shown in Fig. 4.20a, traditional WSNs commonly exploit a centralized paradigm, "transmit first, then process", in which the sensor node captures scalar data and then sends it to the sink node for centralized processing. Figure 4.20b illustrates the paradigm of "process first, then transmit", which is commonly used for distributed smart cameras. All smart cameras around the target yield measurements and perform the processing tasks independently. The final result is obtained on the sink node by fusing the results from the smart cameras. Obviously, streaming all the multimedia data to the sink node is impractical due to the energy and bandwidth constraints of MSNs. Also, most multimedia processing tasks are complicated and computation intensive, so they can hardly be executed on a single sensor node. Moreover, running the whole task on a single node cannot take advantage of the information retrieved from multiple views, multiple modal media, and multiple nodes. The underlying problem of these paradigms is that the processing of multimedia content has mostly been approached as a task isolated from the network design. However, the transmission and processing of multimedia data in MSNs are not independent. The desired paradigm will be provided by means of a combination of cross-layer optimization of the communication process and in-network processing.


Fig. 4.20 Transmission-processing paradigms. (a) Transmit first, then process. (b) Process first, then transmit. (c) Filter-fusion. (d) Decomposition-fusion

In traditional WSNs, in-network processing is exploited to avoid the transmission of redundant sensory data, since processing costs are significantly lower than communication costs. However, as multimedia data is much more complex than scalar data, the models for WSNs cannot be directly applied to MSNs. Still, in many research efforts on MSNs, collaborative multimedia in-network processing is also suggested as an effective way to avoid the transmission of redundant information. According to the requirements of specific applications, each sensor node can filter out part of the sensory data and coordinate with the others to aggregate correlated data. In our prior work (Ma and Liu 2005), we proposed a method for cooperative video processing in video sensor networks based on sensor correlations. Each sensor node only needs to cover a fraction of the targeted area so that the image processing and transmission workload is significantly reduced. In the work of Wagner et al. (2003), each sensor node transmits a low-resolution version of a common area, and the sink combines multiple low-resolution versions into a high-resolution image. In the work of Wu and Chen (2006), based on spatial and temporal correlation information, images from correlated sensors are transmitted collaboratively. An FPGA architecture is proposed in the work of Aziz and Pham (2013) to extract updated objects from the background image. Only the updated objects are transmitted, providing energy-efficient image transmission in error-


prone environments. As shown in Fig. 4.20c, this is a "filter-fusion" paradigm. However, in this paradigm in-network processing serves only the transmission task, whereas in many applications of MSNs the task is to analyze and process the multimedia content, not to transmit the data to the sink node. For example, in a surveillance application, the multimedia sensor nodes should track and recognize the target of interest. Moreover, the degree of cooperation among sensor nodes is low in this paradigm. In this section, we propose a decomposition-fusion mode in which the complex task of an MSN can be efficiently divided into a set of sub-tasks, and suitable multimedia sensor nodes are selected to execute the sub-tasks in a cooperative fashion (see Fig. 4.20d). The interim results are exchanged among the cooperative nodes, and only the final result is transmitted to the sink node.

4.5.3 Decomposition-Fusion Cooperative Computing Framework

Although multimedia processing techniques are application and media type dependent, it is necessary to design a uniform framework that provides a high-level specification of the decomposition-fusion mode. As shown in Fig. 4.21, the framework consists primarily of five components.

4.5.3.1 Task Decomposition

As mentioned above, because of the limitations of computing ability and energy supply, it is difficult to perform a complex task on a single sensor node. An intuitive method is to decompose the whole task into a set of sub-tasks and deploy each sub-task on one multimedia sensor node. Although the procedure of task decomposition is application dependent, we can give two criteria as follows:

• Each sub-task should keep a certain independence so that the sub-tasks can be easily connected by simply exchanging interim results.
• Too many multimedia sensor nodes taking part in collaborative processing will cause extra communication cost among nodes. Therefore, the second criterion is to ensure that the computational complexity of each sub-task is suitable for the ability of a multimedia sensor node while minimizing the number of sub-tasks.

4.5.3.2 Target Detection

In many applications, multimedia sensor nodes first capture the raw data, and preprocess them to determine whether there exists any possible target. Because


Fig. 4.21 Decomposition-fusion Framework for MSNs

the multimedia sensor node can capture both acoustic and visual information, different methods are utilized to detect and segment the target from the background. For example, we can utilize a constant volume threshold based method and background subtraction for acoustic and visual target detection, respectively.

4.5.3.3 Selection of Candidates

Assume that the task T is divided into a set of sub-tasks Ts = {Ts,1, Ts,2, ..., Ts,u}, i.e., there are u types of sub-tasks running on u multimedia sensor nodes. An arbitrary node mi ∈ M performs the sub-task corresponding to Ts,g ∈ Ts. From Fig. 4.13, we have that: if Ts,g is based on only acoustic (or visual) information, the target should be in Di,1 (or Di,2); if Ts,g is based on both acoustic and visual information, the target should be in Di,3. Let χ be the variable of the target location, and f(χ) be the pdf (probability density function) of χ. How to compute f(χ) from the measurements is an important research topic for target localization, but we will not discuss it in this section; we refer the reader to the work of Liu et al. (2009) for a discussion. Then, the probability, denoted by pi, that mi performs Ts,g is:

    p_i = \int_D f(χ)\, dχ,    (4.49)

where

    D = \begin{cases} D_{i,1}, & \text{if } T_{s,g} \text{ uses acoustic information;} \\ D_{i,2}, & \text{if } T_{s,g} \text{ uses visual information;} \\ D_{i,3}, & \text{if } T_{s,g} \text{ uses both acoustic and visual information.} \end{cases}

If pi exceeds a predefined threshold ε, then mi becomes a candidate for executing Ts,g. Thus, the set of candidates for Ts,g, denoted by γg, is: γg = {mi | pi > ε, 0 ≤ i < n}.
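Since the integral in Eq. (4.49) rarely has a closed form, pi can be estimated by Monte Carlo integration, as sketched below; pdf, region_contains, and bbox are illustrative names for the target-location density f(χ), the membership test for D, and the bounds of the surveillance region.

    import random

    def candidate_probability(pdf, region_contains, bbox, n=10000):
        # Monte Carlo estimate of p_i = integral of f(chi) over D, Eq. (4.49).
        x0, y0, x1, y1 = bbox
        area = (x1 - x0) * (y1 - y0)
        acc = 0.0
        for _ in range(n):
            x, y = random.uniform(x0, x1), random.uniform(y0, y1)
            if region_contains(x, y):
                acc += pdf(x, y)
        return acc * area / n  # mi becomes a candidate if this exceeds epsilon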

4.5.3.4 Selection of Cooperators

Let ρ ≜ (mc,1, mc,2, ..., mc,u) be a group of u candidates for the u sub-tasks, where mc,1 ∈ γ1, mc,2 ∈ γ2, ..., mc,u ∈ γu. Our goal is to select the optimal group of u candidate multimedia sensor nodes to obtain the desired performance of the target task while minimizing the energy cost. Due to the constrained resources of multimedia sensor networks, energy saving is one of the most important problems to be considered. Motivated by this, we study the selection of multimedia sensor nodes using the criterion of minimum cost, that is, minimizing the cost so as to attain a specified task utility. Let G denote the set of all possible groups, i.e., G ≜ {(mc,1, ..., mc,u) : mc,1 ∈ γ1, ..., mc,u ∈ γu}.

In order to model the tradeoff between utility and cost, we need to define the following functions:
• A utility function U: G → R+, which quantifies the classification accuracy of each ρ ∈ G.
• A cost function C: G → R+, which quantifies the energy cost of task execution for each ρ ∈ G.
In our framework, only interim results are transmitted among multimedia sensor nodes, so the energy consumption of communication is small. We use the number of clock cycles to measure the energy consumption of computation. On the other hand, for a multimedia sensor network, the failure of several multimedia sensor nodes can affect the whole network topology. So, energy saving requires not only minimizing the total cost of the network, but also homogenizing the cost across the multimedia sensor nodes. Let C(Ts,k) be the expected energy consumption of running the sub-task Ts,k once, and let ec,k be the remaining energy of the multimedia sensor node mc,k. We define the cost function of mc,k as the ratio between the total energy consumption for classification and the remaining energy, i.e., C(mc,k) = C(Ts,k)/ec,k. For ρ ∈ G, the cost value is defined as the maximum cost value of the multimedia sensor nodes in this group, i.e., the cost function is

C(ρ) ≜ max_{1≤k≤u} C(mc,k).    (4.50)

Then, we can formulate the optimal selection problem as follows: choose a group ρ ∈ G that minimizes C(ρ) subject to U(ρ) ≥ ϑ, where ϑ is the predefined threshold for classification accuracy. Thus, this problem can also be expressed as

ρ* = argmin_{ρ∈G, U(ρ)≥ϑ} C(ρ),

where ρ* is the optimal group of multimedia sensor nodes for distributed classification. Let |G| denote the size of the value space of ρ. In real applications, the number of multimedia sensors that detect a target simultaneously is limited. This implies that |G| = |γ1| × |γ2| × ··· × |γu| is small. Therefore, it is easy to obtain ρ* by exhaustive search.
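Because |G| is small, ρ* can indeed be found by direct enumeration. The sketch below illustrates Eq. (4.50) and the constrained minimization; the candidate sets, per-sub-task costs, energies, and the constant utility function are hypothetical placeholders.

```python
from itertools import product

def select_cooperators(candidate_sets, subtask_cost, energy, utility, theta):
    # Enumerate G = gamma_1 x ... x gamma_u and return
    # rho* = argmin_{rho in G, U(rho) >= theta} C(rho),
    # with C(rho) = max_k C(T_{s,k}) / e_{c,k} as in Eq. (4.50).
    # (A real system would also exclude groups that reuse one node.)
    best, best_cost = None, float("inf")
    for group in product(*candidate_sets):
        cost = max(subtask_cost[k] / energy[node]
                   for k, node in enumerate(group))
        if utility(group) >= theta and cost < best_cost:
            best, best_cost = group, cost
    return best

# Hypothetical example with u = 2 sub-tasks.
candidate_sets = [["m1", "m2"], ["m2", "m3"]]
subtask_cost = [4.0, 6.0]                 # expected cycles per run
energy = {"m1": 10.0, "m2": 8.0, "m3": 20.0}
utility = lambda group: 0.9               # assume all groups meet accuracy
print(select_cooperators(candidate_sets, subtask_cost, energy, utility, 0.8))
```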

4.5.3.5 Interim Results Fusion

After determining the cooperators, each sub-task is assigned to the corresponding cooperator. Compared with traditional results fusion, which is commonly based on voting, interim results fusion is more complex and application dependent. The execution results of some sub-tasks may be the inputs of other sub-tasks. Thus, the cooperators transmit the interim results via the cooperation communication module. After executing all sub-tasks, the obtained final result is transmitted to the sink node. Ordinarily, these cooperators around the target form a cluster (Bernabe et al. 2015), and the formation of the cluster is triggered by detection of the target. Our proposed DF mode can prolong network lifetime while meeting application requirements. It also provides a high-level specification of edge computing for multi-modal sensory data. In the future, we plan to investigate software-defined transmission-processing in MSNs to further introduce reconfigurability into the DF mode. Another particularly interesting topic would be to implement emerging AI algorithms, taking deep learning as an example, in the DF mode to support more and more intelligent applications of IoT.

References

Ahmed, E., Rehmani, M.: Mobile edge computing: Opportunities, solutions, and challenges. Future Generation Computer Systems, online (2017). https://doi.org/10.1016/j.future.2016.09.015
Akyildiz, I., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A survey on sensor networks. IEEE Commun. Mag. 40(8), 102–114 (2002)
Alvi, S.A., Afzal, B., Shah, G.A., Atzori, L., Mahmood, W.: Internet of multimedia things: Vision and challenges. Ad Hoc Netw. 33, 87–111 (2015)
Arora, A., et al.: A line in the sand: A wireless sensor network for target detection, classification, and tracking. Comput. Netw. 46(5), 605–634 (2004)
Arulampalam, M., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
Aziz, S.M., Pham, D.M.: Energy efficient image transmission in wireless multimedia sensor networks. IEEE Commun. Lett. 17(6), 1084–1087 (2013)
Bernabe, A., Dios, J., Ollero, A.: Efficient cluster-based tracking mechanisms for camera-based wireless sensor networks. IEEE Trans. Mob. Comput. 14(9), 1820–1832 (2015)
Bramberger, M., Doblander, A., Maier, A., Rinner, B., Schwabach, H.: Distributed embedded smart cameras for surveillance applications. Computer 39(2), 68–75 (2006)
Breiman, L.: Classification and Regression Trees. Chapman & Hall (1984)
Brenner, N., Rader, C.: A new principle for fast Fourier transformation. IEEE Trans. Acoust. Speech Signal Process. 24(3), 264–266 (1976)
Brooks, R., Ramanathan, P., Sayeed, A.: Distributed target classification and tracking in sensor networks. Proc. IEEE 91(8), 1163–1171 (2003)
Buxton, H., Gong, S.: Visual surveillance in a dynamic and uncertain world. Artif. Intell. 78(1–2), 431–459 (1995)
Chandramohan, V., Christensen, K.: A first look at wired sensor networks for video surveillance systems. In: Proceedings of the High Speed Local Networks Workshop at the IEEE Conference on Local Computer Networks (2002)
Chu, M., Haussecker, H., Zhao, F.: Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks. Int. J. High Perform. Comput. Appl. 16(3), 293–313 (2002)
Collins, R., Lipton, A., Fujiyoshi, H., Kanade, T.: Algorithms for cooperative multisensor surveillance. Proc. IEEE 89(10), 1456–1477 (2001)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (2005)
Davenport, W.B., Root, W.L.: An Introduction to the Theory of Random Signals and Noise. IEEE Press (1987)

D'Costa, A., Ramachandran, V., Sayeed, A.: Distributed classification of Gaussian space-time sources in wireless sensor networks. IEEE J. Sel. Areas Commun. 22(6), 1026–1036 (2004)
Denman, S., Fookes, C., Cook, J., Davoren, C., et al.: Multi-view intelligent vehicle surveillance system. In: Proceedings of IEEE International Conference on Video and Signal Based Surveillance (2006)
Duarte, M., Hu, Y.-H.: Distance based decision fusion in a distributed wireless sensor network. In: Proceedings of International Workshop on Information Processing in Sensor Networks (2003)
Duarte, M., Hu, Y.-H.: Vehicle classification in distributed sensor networks. J. Parallel Distrib. Comput. 64(7), 826–838 (2004)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience (2000)
Ercan, A.O., Yang, D.B.-R., El Gamal, A., Guibas, L.J.: Optimal placement and selection of camera network nodes for target localization. In: Proceedings of International Conference on Distributed Computing in Sensor Systems (2006)
Fei, B., Liu, J.: Binary tree of SVM: A new fast multiclass training and classification algorithm. IEEE Trans. Neural Netw. 17(3), 696–704 (2006)
Foresti, G., Snidaro, L.: A distributed sensor network for video surveillance of outdoor environments. In: Proceedings of IEEE International Conference on Image Processing (2002)
Friedman, J.H.: Another approach to polychotomous classification. Technical Report, Statistics Department, Stanford University (1997)
Gui, C., Mohapatra, P.: Power conservation and quality of surveillance in target tracking sensor networks. In: Proceedings of International Conference on Mobile Computing and Networking (MobiCom'04) (2004)
Guo, D., Wang, X.: Dynamic sensor collaboration via sequential Monte Carlo. IEEE J. Sel. Areas Commun. 22(6), 1037–1047 (2004)
Gupta, A., Gui, C., Mohapatra, P.: Mobile target tracking using sensor networks. In: Mobile Wireless Sensor Networks, pp. 173–196 (2006)
Haritaoglu, I., Harwood, D., Davis, L.S.: W4: Real-time surveillance of people and their activities. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 809–830 (2000)
Heinzelman, W., Kulik, J., Balakrishnan, H.: Adaptive protocols for information dissemination in wireless sensor networks. In: Proceedings of ACM MobiCom'99, pp. 174–185, Seattle, Washington (1999)
Hue, C., Cadre, J., Perez, P.: Sequential Monte Carlo methods for multiple target tracking and data fusion. IEEE Trans. Signal Process. 50(2), 309–325 (2002)
Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.S.: Real-time foreground-background segmentation using codebook model. Real-Time Imaging 11(3), 172–185 (2005)
Kotecha, J.H., Ramachandran, V., Sayeed, A.: Distributed multitarget classification in wireless sensor networks. IEEE J. Sel. Areas Commun. 23(4), 703–713 (2005)
Li, D., Wong, K., Hu, Y., Sayeed, A.: Detection, classification, tracking of targets in micro-sensor networks. IEEE Signal Process. Mag. 19(2), 17–29 (2002)
Liu, J., Liu, J., Reich, J., Cheung, P., Zhao, F.: Distributed group management for track initiation and maintenance in target localization applications. In: Proceedings of Information Processing in Sensor Networks (2003)
Liu, L., Zhang, X., Ma, H.: Dynamic node collaboration for mobile target tracking in wireless camera sensor networks. In: Proceedings of IEEE INFOCOM (2009)
Liu, L., Zhang, X., Ma, H.: Optimal nodes selection for target localization in wireless camera sensor networks. IEEE Trans. Veh. Technol. 59(7), 3562–3576 (2009)
Liu, L., Ming, A., Ma, H., Zhang, X.: A binary-classification-tree based framework for distributed target classification in multimedia sensor networks. In: Proceedings of IEEE INFOCOM (2012)
Ma, H., Liu, Y.: Correlation based video processing in video sensor networks. In: Proceedings of International Conference on Wireless Networks, Communications and Mobile Computing (2005)
Malhotra, B., Nikolaidis, I., Harms, J.: Distributed classification of acoustic targets in wireless audio-sensor networks. Comput. Netw. 52(13), 2582–2593 (2008)

Matsuyama, T., Ukita, N.: Real-time multitarget tracking by a cooperative distributed vision system. Proc. IEEE 90(7), 1136–1150 (2002)
Moore, J., Keiser, T., Brooks, R., Phoha, S., Friedlander, D., Koch, J., Reggio, A., Jacobson, N.: Tracking targets with self-organizing distributed ground sensors. In: Proceedings of IEEE Aerospace Conference (2003)
Morita, S., Yamazawa, K., Yokoya, N.: Networked video surveillance using multiple omnidirectional cameras. In: Proceedings of IEEE International Symposium on Computational Intelligence in Robotics and Automation (2003)
Piccardi, M.: Background subtraction techniques: A review. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics (2004)
Song, L., Hatzinakos, D.: A cross-layer architecture of wireless sensor networks for target tracking. IEEE/ACM Trans. Netw. 15(1), 145–158 (2007)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Wagner, R., Nowak, R., Baraniuk, R.: Distributed image compression for sensor networks using correspondence analysis and super-resolution. In: Proceedings of International Conference on Image Processing (2003)
Wang, X., Wang, S.: Collaborative signal processing for target tracking in distributed wireless sensor networks. J. Parallel Distrib. Comput. 67(5), 501–515 (2007)
Wang, H., Yao, K., Pottie, G., Estrin, D.: Entropy-based sensor selection heuristic for localization. In: Proceedings of International Symposium on Information Processing in Sensor Networks (2004)
Wang, X., Wang, S., Bi, D., Ma, J.J.: Distributed peer-to-peer target tracking in wireless sensor networks. Sensors 7(6), 1001–1027 (2007)
Wang, X., Wang, S., Bi, D.: Distributed visual-target-surveillance system in wireless sensor networks. IEEE Trans. Syst. Man Cybern. Part B Cybern. 39(5), 1134–1146 (2009)
Wu, M., Chen, C.W.: Collaborative image coding and transmission over wireless sensor networks. EURASIP J. Adv. Signal Process. 2007, 1–9 (2007)
Zhang, W., Cao, G.: DCTC: Dynamic convoy tree-based collaboration for target tracking in sensor networks. IEEE Trans. Wirel. Commun. 3(5), 1689–1701 (2004a)
Zhang, W., Cao, G.: Optimizing tree reconfiguration for mobile target tracking in sensor networks. In: Proceedings of IEEE INFOCOM (2004b)
Zhao, F., Shin, J., Reich, J.: Information-driven dynamic sensor collaboration. IEEE Signal Process. Mag. 19(2), 61–72 (2002)

Chapter 5

Multimedia Sensor Network Supported IoT Service

5.1 Introduction

Based on traditional information carriers such as the Internet and telecommunication networks, the Internet of Things (IoT) is a network that interconnects ordinary physical objects with identifiable addresses to provide intelligent services (Ma 2011). Owing to the low price of sensors and wireless sensing networks, the development of communication techniques, and the emergence of various smart objects, the number of connected physical objects was projected to reach 50 billion by 2020 (Zhou et al. 2016). The proliferation of IoT extends the Internet into the physical world, such that objects can be managed remotely and act as physical access points to Internet services. With ubiquitous sensors, IoT can sense the activities we perform every day by tracking physical objects in real time. Correspondingly, it opens up tremendous opportunities for economic and social development, while also posing immense technical challenges. At present, there are several significant research topics in the IoT, such as sensing networks, big sensing data computing, security and privacy protection, and innovative IoT services (Ma et al. 2016). Among them, innovative IoT services have attracted increasing attention in industry and academia recently. Different from Internet services, the main characteristic of IoT services is the active participation of physical objects in service processes. As shown in Fig. 5.1, Internet services are performed through interactions between users and cyberspace. In contrast, IoT services are executed through interactions among users, cyberspace, and the physical world. Such services contain three different service patterns:
• Information publish service utilizes ubiquitous sensors to sense the states of objects in the physical world. Then the sensed data will be actively transmitted

Fig. 5.1 The illustration of IoT service patterns

into cyberspace and published to the users. Thus, the information publish service is driven by the data in the physical world.
• Sensing-controlling service, also called Cyber-Physical System, refers to smart systems comprising computational and physical components. These components are seamlessly integrated and closely interacting to sense the changing states of the real world. In a sensing-controlling service, embedded computers and networks monitor and control the physical processes, usually with feedback loops in which physical processes affect computations and vice versa. Therefore, the sensing-controlling service is driven by events in the physical world.
• IoT search service is a physical object search process. When a user wants to search for the location and state of a physical object, he/she sends a query into the cyberspace. Through the information exchange between the cyberspace and the physical world, the system returns the locations and states of the searched physical object in the real world. Hence, the IoT search service is driven directly by the users.
Doubtlessly, a plethora of data-hungry applications will benefit from the availability of rich information. Multimedia sensor networks provide an important supportive platform for sensing, transmission, and processing of multimedia data in the IoT. Our group developed three versions of MSN prototypes, shown in Fig. 5.2, which provide IEEE 802.11b- and 802.15.4-based networking connectivity. Our multimedia sensor nodes adopt a modular structure, which consists of the main

Fig. 5.2 Prototypes of multimedia sensor network deployed outside our laboratory. (a) Version I. (b) Version II. (c) Version III

Fig. 5.3 Version III nodes for monitoring the environment and conditions of the Great Wall in Yulin

processing module, wireless communication module, image capturing module,1 and power supply module. Moreover, we also deployed a small-scale experimental network of seven multimedia sensor nodes (Version III) in the field. To this end, the main node modules are encapsulated in a waterproof and dustproof box. We also exploit a solar power supply solution, and provide 802.11b/g- and 4G-based networking connectivity. This network is deployed on a part of the Ming Great Wall in Yulin, Shaanxi province of China (see Fig. 5.3), and has been running for nearly two years. The camera sensor nodes collect and send back images of the Great Wall for monitoring intruders and detecting wall cracks. Because of the ability to ubiquitously capture multimedia content from the environment, multimedia sensor networks have great potential for strengthening traditional wireless sensor networks' applications, as well as creating a series of new applications such as IoT searching. Take urban vehicle search as an example: given a vehicle image as a probe, the task is to search a database for images that contain the same vehicle captured by multiple cameras. Vehicle search can quickly discover, locate, and track target vehicles in large-scale surveillance videos, which has many potential applications in urban video surveillance, intelligent transportation systems,

1 To reduce cost, the audio capturing module is integrated in the main processing module.


and urban computing. However, IoT searching faces two main challenges: (1) how to search for optimal results given the large intra-class differences of the same object across different nodes and the subtle inter-instance differences between different objects in the same views; and (2) how to realize real-time object search in large-scale networks. In this section, we first analyze the service patterns in the IoT and summarize the main characteristics and challenges of IoT search. To address these challenges, we propose a progressive search paradigm (Ma and Liu 2017), which contains three important search strategies: (1) coarse-to-fine search in the feature space; (2) near-to-distant search in the spatial-temporal space; and (3) low-to-high permission search in the security space. All three strategies utilize simple features and computation to instantly reduce the search space, within which a complex matching process can then be efficiently exploited to finely find the matched objects. Based on the progressive search paradigm, we further propose PROVID, a PROgressive Vehicle re-IDentification framework based on deep neural networks (Liu et al. 2018). In particular, our framework not only utilizes the multi-modality data in large-scale video surveillance, such as visual features, license plates, camera locations, and contextual information, but also performs vehicle re-identification in two progressive procedures: coarse-to-fine search in the feature domain, and near-to-distant search in the physical space. Furthermore, to evaluate our progressive search framework and facilitate related research, we construct the VeRi dataset, which is the most comprehensive dataset built from real-world surveillance videos. It not only provides large numbers of vehicles with varied labels and sufficient cross-camera recurrences but also contains license number plates and contextual information.

5.2 Searching in IoT

5.2.1 Motivation

As an innovative service pattern in IoT, IoT search has attracted massive research attention in recent years (Zhang et al. 2011). Just as information search in the traditional Internet has become one of the most popular services, we believe that IoT search will be a killer IoT service that allows users to search for physical objects in a certain state. The vast number of sensors connected to the IoT, the rapidly changing data in sensors, and the requirement to search for the real-time states of physical objects all make huge demands on a real-world search engine (Romer et al. 2010). IoT search has many potential applications in a wide range of human-centric services, whose primary tasks are to facilitate interactions between human users and intelligent systems, such as smart cities, home automation, manufacturing, smart grids, and network management (Zhou et al. 2016). However, because billions of diverse sensors – deployed or embedded in infrastructures and objects – keep acquiring information, IoT presents a much larger and more dynamic search space than traditional information search on the Internet. The multiple sensing modes and multi-source cooperation in IoT search undoubtedly bring massive multi-modality data to be sensed, processed, transmitted, and analyzed. Moreover, because the states of objects in the physical world change very frequently within a short time, the search process must be finished in real time. Therefore, IoT search is an extra challenging problem that requires very high search accuracy and speed. On the other hand, different from Internet search, a comprehensive platform for IoT search is still lacking. In this section, we explore implementing IoT search on the multimedia sensors based urban sensing network. This network contains interconnected multimedia sensing nodes that are able to ubiquitously capture multimedia data from the physical world. Conventional sensing networks are usually specialized for a particular purpose, constrained by resource limitations on a single sensor node. In contrast, the multimedia sensors based urban sensing network is able to retrieve, store, and process multimedia data in the city. Moreover, it can also correlate and fuse multimedia data originating from heterogeneous sources. However, there is no doubt that the big data sensed by the multimedia sensors create a much larger and more dynamic search space than traditional information search on the Internet. In addition, the states of entities observed by sensors continuously change. Besides, the rich contents contained in audio and visual streams demand much more sophisticated processes for sensing, processing, transmitting, securing, and analyzing in the urban sensing network. In conclusion, owing to the huge number of objects in the IoT and their highly dynamic states, IoT search in the multimedia sensors based urban sensing network is a very challenging task.

5.2.2 Concept of IoT Search

The IoT search can be defined as the process of finding physical objects with given states captured by the diverse sensors in the IoT. Therefore, IoT search includes four elements: (1) physical objects; (2) diverse sensors that sense the states of physical objects; (3) queries to search for physical objects with a certain state; and (4) a search engine that accepts queries and returns references to objects matching the query. In practice, it is often sufficient to return a subset of (the most relevant) objects matching the query. With the above problem description, we can define IoT search as R ≜ (O, S, Q, E), where O is the set of physical objects, S is the set of sensors, Q is the set of queries, and E is the IoT search engine. Furthermore, the query set Q can be defined as

Q ≜ {qr | qr ∈ O × A × V},    (5.1)

where A is the set of object attributes and V denotes the set of attribute values sensed by S. Finally, we can define the search engine as

E : Q × S → O.    (5.2)

It means that given a query qr ∈ Q and a sensor set Sr ⊆ S, the search results are Or = E(qr, Sr), where Or ⊆ O.
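The tuple (O, S, Q, E) maps naturally onto code. The following sketch is an illustrative rendering of E : Q × S → O; the `readings()` sensor interface and the attribute dictionary are assumptions for the example, not part of the formal model.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Set

@dataclass(frozen=True)
class Query:
    obj_type: str    # kind of physical object sought
    attribute: str   # attribute in A, e.g., "location" or "temperature"
    value: Any       # required attribute value in V

def search_engine(query: Query, sensors: Iterable) -> Set[str]:
    # E: Q x S -> O. Collect the objects whose currently sensed
    # attribute matches the queried value.
    results = set()
    for sensor in sensors:
        # readings() is a hypothetical API: object id -> attribute dict.
        for obj_id, attrs in sensor.readings().items():
            if attrs.get("type") == query.obj_type and \
               attrs.get(query.attribute) == query.value:
                results.add(obj_id)
    return results
```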

5.2.3 Characters of Searching in IoT

The key characteristic of the service patterns in IoT is the active participation of physical objects in the service process. Therefore, as shown in Table 5.1, the main differences between Internet search and IoT search are as follows:
(1) Searching Targets. In traditional Internet search, the search targets are the information entities in cyberspace, such as text terms, images, and videos. In contrast, the targets of IoT search are wider, containing not only information entities but also physical objects' locations and states in the real-world environment.
(2) Interactive Modes. As shown in Fig. 5.1, besides the traditional human-computer interaction in cyberspace, searching in IoT needs human-computer-object interaction across cyberspace and the physical world. This means the users not only need to obtain the information of the searched results, but also want to operate and control the searched objects to realize their search target.
(3) Data Sources. Besides the traditional cyberspace, the data sources of IoT also include the physical world. Billions of sensors continuously generate immense data originating from heterogeneous sources, which gives a much larger, more diverse, and more dynamic search space than traditional cyberspace in the Internet.
(4) Results Presentation. Different from information ranking in traditional Internet search, searching in IoT usually aims at locating physical objects in the spatio-temporal space. For example, people and vehicle searches require the system to detect, locate, and track the target objects for further tasks. Therefore, the results presentation in IoT search includes both the states and spatial-temporal information of physical objects.
(5) Information Timeliness. The physical objects in the real world change in real time with the variation of time and environments. Although the cyberspace in Internet search is dynamic, the timeliness of object state information captured by sensors in IoT is more sensitive. Consequently, IoT search must be real-time to guarantee information timeliness.

Table 5.1 Comparison of Internet search and IoT search

Characters              Internet search         IoT search
Target                  Information entities    Physical objects
Interactive modes       Human + Computer        Human + Computer + Object
Data source             Cyberspace              Cyberspace + Physical World
Results presentation    Information ranking     States + Spatio-temporal
Timeliness              Non-real-time           Real-time

5.2.4 Challenges of Searching in IoT

Although traditional search on the Internet has already attracted extensive research, searching in IoT is still in its early stage. Rather than simply porting traditional Internet search into a new scenario, searching in IoT meets the following unique challenges:
• The comprehensive sensing of physical environments – Different from Internet search, which employs web crawlers to crawl webpage information on servers, IoT search utilizes diverse sensors to capture information about physical objects. However, given the huge search space and massive data in the physical world, how to comprehensively sense the physical world is the primary challenge. First of all, multiple sensing modes are exploited to collect real-world information. For example, we not only need to utilize proactively deployed sensors to collect the states of objects in special regions, but also explore taking advantage of the sensors in mobile devices to expand the sensed region and decrease the cost. Moreover, the emerging social networks also give us a more effective way to collect data. Besides combining multiple sensing modes, multi-source cooperation is another important topic in IoT search. Multiple kinds of sensors give us multi-modality data. How to holistically collect and purify the beneficial information from multimedia data, low-quality data, and loosely-coupled data is another great challenge for IoT search.
• The correlation discovery of massive multi-modality data – Different from Internet search, the multiple sensing modes and multi-source cooperation in IoT search undoubtedly bring massive multi-modality data. How to effectively and efficiently preprocess these data and discover their correlations is another great challenge. The correlation discovery of massive multi-modality data contains two important steps: correlation mining and value reconstruction. Physical objects in the real world exhibit complex spatial, temporal, and logical correlations. For example, in vehicle search, the vehicles' spatial-temporal trajectories give significant clues for finding the same vehicles. Moreover, the effects of traffic rules, traffic lights, and the road network also need to be considered. Besides, different from the structured data in Internet search, although the quantity of multi-modality data in IoT search is huge, their values are sparse and fragmentary. Therefore, how to mine the valuable information from these data needs to be further studied.
• The guarantee of computing timeliness – The main difference between Internet search and IoT search is that, besides cyberspace, the search space of IoT also contains the huge physical world. This not only means that IoT search needs a larger-scale sensing network to capture information in the physical world, but also produces a much larger search space to be processed and indexed. The larger search space brings two stronger search requirements. Firstly, higher search accuracy is needed. With the growth of the search data, even small search errors will cause massive noise and unrelated search results, which may greatly decrease the search effectiveness. Moreover, the larger search space undoubtedly increases the search time. However, a real-time search experience is a basic demand of IoT search. On one side, the states of physical objects change instantly; on the other side, the users hope to obtain the search results as soon as possible. Therefore, besides obtaining high search accuracy, how to guarantee the search timeliness is a major challenge for IoT search.
• The information security and privacy protection – The IoT interconnects trillions of smart things around us, which can collect, store, process, and communicate information about the things and their physical environment (Ziegeldorf et al. 2014). Especially with IoT search, the increasingly invisible, dense, and pervasive collection, processing, and dissemination of data in people's private lives bring serious security and privacy concerns. Ignorance of these issues can have undesired consequences, e.g., damage to reputation and costly lawsuits. Therefore, how to protect information security and personal privacy in IoT search is a big problem. The search system must set particular access permissions for different users, who will obtain different search results matching their access permissions. For example, when users search for specific vehicles in an intelligent urban traffic surveillance system, low-level permission users can only obtain the vehicles' location and time information. In contrast, users with high-level permission may further acquire the personal information of the vehicle owners for further searches.

5.2.5 The Progressive Search Paradigm

The essence of IoT search is exploiting machines to imitate and supersede human beings in sensing the physical world and finding the needed objects with diverse sensors. According to our observations, when a human searches for objects in the physical world, his/her cognitive way usually goes from the appearance to the intrinsic property, and from the easy to the difficult. In other words, the search is a progressive process. Therefore, by imitating the human search process, we propose a progressive search paradigm with the captured multi-view information in the IoT. The basic idea of progressive search in IoT is to first utilize the easily computed appearance information to filter out the most dissimilar objects. Then, in the reduced search space, more complex intrinsic properties can be extracted for further search. To achieve this, three important search strategies are exploited in the progressive framework: (1) coarse-to-fine search in the feature space; (2) near-to-distant search in the spatial-temporal space; and (3) low-to-high permission search in the security space. Although in three different spaces, the strategies all try to simultaneously improve the search accuracy and speed in one unified framework – utilizing simple computation to instantly reduce the search space, and then exploiting complex but more discriminative features to significantly improve the search accuracy. The proposed progressive search framework can comprehensively decrease the computation complexity, guarantee timeliness, and improve security. Next, we will introduce each strategy in detail.

5.2.5.1 Coarse-to-Fine Search Strategy

Since the multiple sensors and sensing modes in IoT search bring massive multi-modality data, how to holistically exploit the complementary features extracted from these multi-modality data to improve the search accuracy and speed is the first key point in the progressive search framework. First of all, according to computation complexity and discriminative power, we can classify these data into three categories: (1) appearance features – features that can be directly and easily extracted, which usually capture the objects' external or global information, such as the appearance features of vehicles and people; (2) semantic features – features abstracted from the external features, which contain the semantic description of the physical objects, such as categories, characters, and functions; and (3) unique features – most of these features can uniquely identify an object, but they are difficult to extract and easily disturbed by noisy information. In the coarse search process, the light-weight appearance features and middle-weight semantic features are first utilized to instantly filter out the most dissimilar objects. Although these features are not discriminative enough to uniquely identify the final results, they can significantly reduce the search space at little time cost. Then, in the subsequent fine search, we can employ heavy-weight unique features to improve the search accuracy. As an example, for vehicle search, we can extract and match the license plate information to further verify whether the targets are the same as the query. Although the extraction of these features is time consuming, since the search space has been greatly reduced by the first coarse search step, the search speed is acceptable. More importantly, these features are discriminative enough to uniquely find the target instances.
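The strategy reduces to a two-stage filter, sketched below under the assumption that the caller supplies a cheap distance over appearance/semantic features and a costly distance over unique features.

```python
def coarse_to_fine_search(query, objects, cheap_dist, costly_dist, keep=100):
    # Coarse stage: rank all objects with light-weight appearance and
    # semantic features, keeping only the `keep` best candidates.
    coarse = sorted(objects, key=lambda o: cheap_dist(query, o))[:keep]
    # Fine stage: re-rank the small candidate set with expensive but
    # highly discriminative unique features (e.g., license plates).
    return sorted(coarse, key=lambda o: costly_dist(query, o))
```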

5.2.5.2 Near-to-Distant Search Strategy

One of the important differences between physical object search in IoT and conventional Internet search is that the objects carry spatio-temporal information captured by the sensors in real-world space. In practice, it is reasonable to perform the object search in a near-to-distant fashion in the spatio-temporal space. From the observation in Liu et al. (2016a), we make a general assumption: two entities have a higher probability of being the same if they have a small spatial or temporal distance, and a lower probability of being the same if they have a large spatial or temporal distance. Therefore, we can holistically exploit the complex spatio-temporal correlation either to re-rank the objects in the coarse search step to reduce the search space, or to re-rank the final search results in the final step to improve the search accuracy. The spatio-temporal similarity is thus a significant clue for improving the search speed and accuracy of physical object search in the real world.
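As an illustration of this strategy, the sketch below re-ranks coarse candidates by a weighted sum of appearance distance and spatio-temporal distance, so that entities observed near the query in space and time rise in the ranking; the weights and record fields are assumptions for the example.

```python
import math

def near_to_distant_rerank(query, candidates, appearance_dist,
                           alpha=1.0, beta=0.1):
    # Score each candidate by visual dissimilarity plus spatial and
    # temporal distance to the query observation; smaller is better.
    def score(item):
        d_app = appearance_dist(query, item)
        d_space = math.dist(query["camera_xy"], item["camera_xy"])
        d_time = abs(query["timestamp"] - item["timestamp"])
        return d_app + alpha * d_space + beta * d_time
    return sorted(candidates, key=score)
```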

5.2.5.3 Low-to-High Permission Search Strategy

Protecting information security and personal privacy is a big issue in IoT search. It means that users with different access permissions will obtain different search results that match their permissions. So, in the progressive search framework, a user will first obtain the search results within the low-level permission scope. If he/she has higher permission, he/she can further perform a fine search within the previous results to obtain more useful results with higher-level permission information. The low-to-high permission search strategy can simultaneously ensure search timeliness and privacy security. For example, in vehicle search, users with low-level permission can only retrieve the vehicles with appearance features and license plate information. In contrast, users with high-level permission can further utilize the drivers' personal information, e.g., personal ID, the GPS positions of vehicles and mobile phones, credit card consumption information, etc., to obtain more useful information about the vehicles.

5.2.6 Progressive IoT Search in the Multimedia Sensors Based Urban Sensing Network

As shown in Fig. 5.4, the rapid proliferation of urbanization has modernized many people's lives, but it has also engendered critical issues, such as traffic congestion, energy consumption, and environmental pollution (Zheng et al. 2014). Nowadays, multiple multimedia sensing technologies and large-scale computing infrastructures have produced a vast variety of big data in urban spaces, which provide rich knowledge about a city to help tackle these challenges. Consequently, intelligent urban computing, which holistically exploits the big data in cities to improve the urban environment, human life quality, and city operation systems, attracts massive attention in academia and industry. Many efforts have been dedicated to connecting unobtrusive and ubiquitous sensing technologies, advanced data management and

Fig. 5.4 The overview of multimedia sensors based urban sensing system. The progressive search paradigm gives a possible way to solve the large-scale object search in the urban sensing network. Moreover, the multimedia sensors in the urban network also give massive multiple-modalities data to support progressive IoT search

analytic models, and novel visualization methods to construct intelligent urban sensing networks for smart cities. IoT search is one of the most important service patterns in the urban sensing network, and it is emerging and becoming pervasive in the fields of urban planning, transportation systems, environmental conservation, energy consumption, economy, and public security. We will discuss applications of progressive IoT search in the multimedia sensors based urban sensing network. First of all, the multimedia sensors based urban sensing network contains interconnected nodes that are able to ubiquitously capture multimedia data from the environment. Different from conventional sensing networks, which are specialized for a particular purpose and constrained by severe resource limitations on a single sensor node, multimedia sensor networks can produce immense data. In addition to the ability to retrieve multimedia data, the multimedia sensors based urban sensing network will also be able to store, process in real time, correlate, and fuse multimedia data originating from heterogeneous sources. However, given that millions of multimedia sensors keep acquiring information, the multimedia sensing network provides a much larger and more dynamic search space than traditional Internet search. Moreover, the states of objects observed by sensors change very frequently. The rich contents contained in acoustic and visual streams demand much more sophisticated processes for sensing, processing, transmitting, securing, and analyzing in the multimedia sensing network. Therefore, the multimedia sensors in the urban network provide massive multi-modality data to support progressive IoT search. More importantly, the progressive search paradigm also supplies a possible way to solve the large-scale IoT search in the multimedia sensors based urban sensing network.

5.3 PROVID: Progressive and Multi-modal Vehicle Re-identification for Large-Scale Urban Surveillance

5.3.1 Motivation

Vehicles, such as cars, buses, and trucks, have been an indispensable part of human life as well as an important class of objects in urban surveillance systems. Many researchers in the multimedia and computer vision fields have focused on vehicle-related research, such as detection (Feris et al. 2012), fine-grained categorization (Yang et al. 2015), 3-D pose estimation (Matei et al. 2011), and driver behavior modeling (Li et al. 2013). Nevertheless, vehicle re-identification (Re-Id) is a significant yet frontier area that has long been overlooked and is far from solved by the community. Taking a query vehicle as the input, vehicle Re-Id aims to search the surveillance data and find the same vehicle recorded by different cameras, as shown in Fig. 5.5. Vehicle Re-Id can be pervasively applied in intelligent surveillance systems (Valera and Velastin 2005), smart transportation (Zhang et al. 2011), and urban computing (Zheng et al. 2014). Through the ubiquitous surveillance networks, it can quickly tell users where and when the vehicle was in the city. Vehicle Re-Id can be considered an instance-level object search task, which is different from traditional vehicle detection, tracking, and categorization problems. Similar to near-duplicate image retrieval (Song et al. 2013; Xie et al. 2015), content-based video search (Hu et al. 2011), and object instance search (Meng et al. 2016), vehicle Re-Id is to find the vehicle with the same identity in urban surveillance videos. In real-world practice, humans treat this task in a progressive manner (Ma and Liu 2017). For instance, if security officers need to find a suspect car in a city with large-scale video surveillance networks, appearance attributes such as models, types, and colors can be used initially to find similar vehicles and reduce the search field. Then, they can identify the targets precisely among the filtered vehicles by matching the license plates, which greatly reduces the enormous workload. Meanwhile, they will search videos recorded by cameras from near to far positions and from close to distant time ranges. The contextual information such as spatiotemporal cues can thus decidedly assist in the search process. Inspired by real-world practice, we can construct a progressive vehicle search framework as a two-step procedure with multi-level attributes and multi-modal data: (1) searching from coarse to fine in the feature domain, which first

Fig. 5.5 An example of vehicle Re-Id: searching for the white BMW SUV with multi-modality data in urban surveillance

employs the appearance features as a coarse but fast filter and then exploits the license plate as the unique identifier to find the same vehicles; and (2) searching from close to far in the physical world, which considers the time and locations as the key cues for vehicle search. Nevertheless, the construction of the progressive vehicle re-identification framework with multi-modal data from practical urban video surveillance faces three significant challenges. First, the appearance-based methods usually cannot give satisfactory results because of the trivial inter-class differences between different vehicles from similar viewpoints and the dramatic within-class differences of the same vehicle from various viewpoints, as shown in Fig. 5.6a. Moreover, conventional license plate recognition systems can hardly recognize the license plate in an unconstrained surveillance environment because of the various lighting conditions and viewpoints, noise, and low resolution, as shown in Fig. 5.6b. In addition, a plate recognition system usually contains multiple procedures such as plate localization, calibration, character segmentation, and recognition, as in Wen et al. (2011) and Du et al. (2013). If one of the steps fails or any of the characters on the plate is mis-recognized, the vehicle Re-Id results might be incorrect. How to utilize the license plate effectively and efficiently in unconstrained urban surveillance is a crucial challenge. Furthermore, the contextual information, such as the spatiotemporal patterns of vehicles, camera locations, and topology of the

Fig. 5.6 (a) The same vehicles have great within-class differences in different viewpoints (left). Different but similar vehicles have trivial inter-class differences (right). (b) The license number plates as the unique ID for vehicle search. (Plate is masked to protect privacy.) (c) The contextual information can assist in vehicle search in the city

city roads, is difficult to discover and model. The environmental factors and drivers' behavior can introduce great uncertainty (Li et al. 2013). How to utilize the contextual information is another great challenge. Existing vehicle Re-Id approaches predominantly focus on appearance features of vehicles, such as colors, types, shapes, and detailed attributes (Feris et al. 2012; Liu et al. 2016c,d; Zapletal and Herout 2016). Therefore, they can hardly differentiate among vehicles with similar models and colors or identify the same vehicle in a varied environment. Moreover, they usually overlook unique identifiers, such as the number plate, when matching a vehicle. In contrast, we comprehensively utilize the appearance attributes and the license plate information in a coarse-to-fine manner for vehicle search. The appearance features can be employed to find similar vehicles, and then the license number plates are used to match the same vehicle precisely. In addition, existing approaches neglect the spatiotemporal context. Contextual information has been exploited in several research fields such as intelligent surveillance (Kettnaker and Zabih 1999), cross-camera person tracking (Javed et al. 2008), person Re-Id (Sunderrajan and Manjunath 2016), and object retrieval (Xu et al. 2013). With contextual cues recorded by the surveillance system, we treat the search procedure in a from-close-to-far manner in the physical space. This section proposes a PROgressive Vehicle re-IDentification framework based on deep neural networks, named PROVID, which features four important properties: (1) a progressive vehicle Re-Id paradigm designed to exploit multi-modality data in urban surveillance, such as multi-level visual features, license plates, camera locations, and contextual information; (2) the appearance of the target vehicle used as a coarse filter by integrating hand-crafted features and high-level attributes learned by a convolutional neural network; (3) a Siamese neural network adopted to verify license number plates for precise vehicle search; and (4) a spatiotemporal model exploited to further improve the search procedure. In particular, we consider the plates as the fingerprints of vehicles, so we only need to verify two plate images instead of precisely recognizing the characters. Furthermore, a spatiotemporal relation (STR) model is designed as the context to re-rank the results.

To evaluate the proposed framework and facilitate related research, "VeRi", a comprehensive vehicle Re-Id dataset, is constructed from a practical urban video surveillance system. It includes not only large numbers of vehicles with various annotations and sufficient cross-camera recurrences but also plenty of license plates and spatiotemporal information. Extensive experiments on the VeRi dataset demonstrate that our PROVID framework achieves excellent accuracy and speed. Finally, we discuss several extensions of the progressive search, which can be utilized in various applications. Compared with our previous works (Liu et al. 2016c,b), we propose a Null space based Fusion of Color and Attribute feaTure model (NuFACT), which can significantly improve the accuracy of appearance-based vehicle search, e.g., by 29.73% in mean Average Precision (mAP) and 24.55% in HIT@1. In Liu et al. (2016c,b), the texture, color, and high-level attributes are fused by a direct early-fusion or late-fusion strategy, while NuFACT adopts a Null Foley-Sammon Transform (NFST)-based metric learning approach for the fusion of multi-level features. It can not only learn a discriminative representation of vehicle appearance from different viewpoints but also reduce the feature redundancy (from approximately 7,000-D to 1,000-D) to guarantee efficiency. To evaluate the adaptation ability of PROVID under different conditions, we conduct extensive experiments on two large-scale vehicle Re-Id datasets, i.e., VeRi (Liu et al. 2016b) and VehicleID (Liu et al. 2016d). Comprehensive experiments demonstrate that PROVID not only dramatically improves the accuracy but also reduces the computational cost of vehicle Re-Id.

5.3.2 Related Work

Vehicle re-identification/search. Vehicle search, or Re-Id, is a frontier area with limited related research in recent years. Feris et al. (2012) designed a vehicle detection and retrieval framework. They first classified vehicles by type, size, and color, and then organized and retrieved vehicles with a relational database. Yang et al. (2015) proposed adopting a deep convolutional neural network for fine-grained vehicle categorization, model verification, and attribute prediction, and collected a vehicle image dataset, CompCars, to validate the proposed method. Recently, Liu et al. (2016c) explored appearance features, such as the texture, color, and semantic attributes learned by convolutional neural networks. They also built an appearance-based model integrating low-level and high-level semantic features for vehicle search. Liu et al. (2016d) proposed a Deep Relative Distance Learning (DRDL) framework, which can jointly learn the feature representation and metric mapping. Nevertheless, appearance-based methods can hardly distinguish among similar vehicles from the same viewpoints or identify the same

vehicle under different conditions, such as various illuminations and viewpoints. Additionally, the license plate, as a distinct property of vehicles, should be utilized to precisely identify the same vehicle. Furthermore, existing datasets, such as CompCars (Yang et al. 2015) and VehicleID (Liu et al. 2016d), only provide appearance labels such as types and models, neglecting the license plate and contextual information, which are important for vehicle Re-Id in large-scale urban surveillance.
License plate for vehicle search. In real-world practice, parks and highways have adopted license plate recognition systems to identify vehicles (Wen et al. 2011; Du et al. 2013). However, existing systems require high-quality license plate images. Therefore, the cameras are usually installed in constrained situations such as entrances of parks or toll gates of highways, calibrated with proper viewpoints, and require auxiliary infrastructure such as flashlights and sensors. In unconstrained traffic environments, license plate recognition systems cannot work well because of uncertain factors such as various lighting conditions and occlusions (Feris et al. 2012; Liu et al. 2016c). Thus, we propose to verify the license plates instead of recognizing all characters on the plates. Recently, deep learning models, such as convolutional neural networks (CNNs), have obtained state-of-the-art results in many multimedia and vision tasks such as image categorization (Krizhevsky et al. 2012), object detection (Girshick et al. 2015), image analysis (Frome et al. 2013), video summarization (Liu et al. 2015), and multimedia retrieval (Mei et al. 2014). In particular, Bromley et al. (1993) proposed a Siamese Neural Network (SNN) for hand-written signature verification. SNN is built with two CNNs with shared parameters to extract discriminative features, and trained with the contrastive loss to learn a latent space for the similarity metric. Chopra et al. (2005) employed the SNN to verify faces and achieved state-of-the-art results. Zhang et al. (2016) proposed identifying persons with gait features learned by SNN and obtained significant improvement. Inspired by these methods, we adopt SNN to verify license plates in our vehicle Re-Id framework.
Contextual models. Contextual information, e.g., spatiotemporal records, object locations, and the topology of cameras, has been widely exploited in multi-camera systems (Kettnaker and Zabih 1999; Javed et al. 2008; Xu et al. 2013). For example, Kettnaker and Zabih (1999) adopted a Bayesian estimation model to assemble likely paths of objects over different cameras. Javed et al. (2008) proposed estimating the inter-camera correspondence with spatiotemporal information for cross-camera person tracking. Recently, Xu et al. (2013) designed a graph-based object retrieval framework to find persons and cyclists on a campus. However, existing approaches usually consider objects that move at low speed, such as persons and cyclists. In addition, they mainly focus on constrained environments, e.g., parks, campuses, and buildings. In an urban area, the traffic scenes, such as roads and crossroads, are mostly unconstrained environments with significant uncertainty due to the complex environments and varied road topology. We can still gain some insights from the above works to exploit the contextual cues for vehicle Re-Id.

5.3.3 Overview of the PROVID Framework

In Fig. 5.7, we show the architecture of the PROVID framework. In our framework, the input query is a vehicle image together with contextual information from the surveillance system, e.g., the camera ID and spatiotemporal cues. With the query, the PROVID framework searches for the same vehicle in three procedures: (1) coarse filtering by vehicle appearance: the framework utilizes the appearance model to find vehicles that have similar texture, shape, color, and type in surveillance videos; (2) precise search by license plate verification: with the Siamese neural network, the license plate distances between the query vehicle and the filtered gallery vehicles are estimated to match the same vehicles; and (3) spatiotemporal re-ranking: the spatiotemporal relation model (STR) is proposed to re-rank the previous results and identify the optimal vehicles.

Fig. 5.7 Framework of PROVID
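A minimal sketch of this three-stage flow is shown below; the callbacks `appearance_dist`, `plate_dist`, and `st_score` stand in for the NuFACT appearance model, the Siamese plate-verification network, and the STR model described in the following subsections, and `top_k` and `lam` are illustrative parameters rather than values from the original system.

```python
def provid_search(query, gallery, appearance_dist, plate_dist,
                  st_score, top_k=100, lam=0.5):
    # Stage 1: coarse filtering by appearance keeps only the top_k
    # most similar vehicles from the whole gallery.
    coarse = sorted(gallery, key=lambda v: appearance_dist(query, v))[:top_k]
    # Stages 2 and 3: precise license plate verification, re-ranked by
    # the spatiotemporal relation model (lam balances the two cues).
    return sorted(coarse,
                  key=lambda v: plate_dist(query, v) + lam * st_score(query, v))
```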


5.3.4 Vehicle Filtering by Appearance

5.3.4.1 Multi-level Vehicle Representation

In practical vehicle search, it is effective to filter vehicles by appearance features, e.g., texture, shape, type, and color. Besides, these features can be extracted and matched efficiently on large-scale data. In our previous work (Liu et al. 2016c), we proposed using multi-level appearance features as the coarse filter to search for vehicles that have a similar appearance. For the texture feature, we adopt the traditional Scale-Invariant Feature Transform (SIFT) (Lowe 2004) as the local descriptor. Then, the bag-of-words (BOW) model is used to quantize the SIFT descriptors because of its efficiency and effectiveness in multimedia retrieval (Sivic and Zisserman 2003). For the color feature, the Color Name (CN) descriptor (Van De Weijer 2009) is extracted and then encoded by the BOW, as in high-accuracy person re-identification (Zheng et al. 2015). For the high-level semantic features, we exploit a deep convolutional neural network (CNN), i.e., GoogLeNet (Szegedy et al. 2015), as the feature extractor. The CNN is pretrained on the ImageNet dataset (Russakovsky et al. 2015) and fine-tuned on the CompCars dataset (Yang et al. 2015), which is labeled with many detailed attributes, e.g., the light shape, the number of seats, the number of doors, and the vehicle model. Therefore, by fine-tuning on CompCars, the model can learn rich high-level semantic features that are very effective for vehicle search.
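A sketch of how the three levels could be assembled into one descriptor is given below; `bow_sift`, `bow_cn`, and `cnn_embed` are hypothetical callables standing in for the SIFT-BOW, CN-BOW, and fine-tuned GoogLeNet extractors.

```python
import numpy as np

def multilevel_appearance(image, bow_sift, bow_cn, cnn_embed):
    x_t = bow_sift(image)   # texture: BOW-quantized SIFT descriptors
    x_c = bow_cn(image)     # color: BOW-quantized Color Name descriptors
    x_a = cnn_embed(image)  # semantics: CNN attribute features
    # Concatenation yields the original appearance feature X = (X_t, X_c, X_a)
    # that is later fused by the null-space method of Sect. 5.3.4.2.
    return np.concatenate([x_t, x_c, x_a])
```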

5.3.4.2 The Null-Space-Based FACT Model

The FACT model in the work of Liu et al. (2016c) adopted a post-fusion scheme that directly sums the Euclidean distances of three types of features extracted from vehicle images. However, it cannot effectively integrate the complementary multi-level features. The Null Foley-Sammon Transform (NFST) was first proposed to address the small sample size problem in face recognition (Guo et al. 2006). Zhang et al. (2016) proposed a kernelized NFST for person Re-Id by mapping the multiple features into a discriminative null space; this method significantly outperforms the state-of-the-art methods. In this section, we propose a Null-space-based FACT (NuFACT) to extract effective and robust representations of vehicle appearance. The NFST is one type of metric learning method; other examples include Linear Discriminant Analysis (LDA) and the Foley-Sammon Transform (FST) (Foley and Sammon 1975). The basic idea of the FST is to learn a projection matrix W ∈ R^{d×m} that maximizes the Fisher discriminant criterion:

J(w) = \frac{w^{\top} S_b w}{w^{\top} S_w w},    (5.3)

where w denotes a column of W, and S_b and S_w are the between-object scatter matrix and within-object scatter matrix, respectively.


Fig. 5.8 The appearance features of the same vehicle are mapped to a single point by NFST

With W, the original visual features can be mapped into a latent metric space in which the distances between features from the same object are much smaller than those between features from different objects. However, NFST aims to learn a null space by adopting an extremely restrictive pair of constraints:

w^{\top} S_w w = 0,    (5.4)
w^{\top} S_b w > 0.    (5.5)

In the null space, the features of each object collapse to a single point, which means the intra-object distance is zero while the inter-object distance is positive, as shown in Fig. 5.8. Furthermore, to learn a discriminative null space for person Re-Id, Zhang et al. (2016) introduce a kernel function φ(x) into NFST that maps the original feature x into an implicit high-dimensional space. During learning of the discriminative null space on the training data, the multiple features are fused effectively and generate a discriminative representation for person Re-Id.
In this section, we adopt the discriminative NFST method to integrate the multi-level features of vehicles, i.e., the texture, color, and high-level attribute features. First, the three types of features, X_t, X_c, and X_a, are extracted from all training vehicle images and concatenated to obtain the original appearance feature X = (X_t, X_c, X_a). Then, the training features X are kernelized by φ(·) to obtain φ(X). Finally, the projection matrix W of the discriminative null space is learned by NFST on φ(X) as in the work of Zhang et al. (2016). In the test phase, the original features X_q and X_g of the query and gallery vehicles are also kernelized with φ(·) and mapped by W. Finally, the similarity of the query and gallery vehicles is measured by the Euclidean distance in the discriminative null space.


By NFST-based multi-level feature fusion, vehicles with an appearance similar to the query's are retrieved effectively and efficiently. After this procedure, a small number of vehicles are extracted from the whole vehicle database. Nevertheless, appearance features alone can hardly match images of the same vehicle uniquely, because they cannot distinguish similar vehicles whose inter-class differences are trivial, especially under varying environmental conditions. In these situations, the distinct identifier, i.e., the license plate, must be considered for precise vehicle search.
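To make the null-space idea concrete, the following is a minimal NumPy sketch of the linear NFST step: it computes the scatter matrices, takes the null space of S_w, and orders the directions by between-object scatter, mirroring constraints (5.4) and (5.5). The kernelized variant used in our framework additionally maps features with φ(x); the function names here are illustrative rather than taken from the actual implementation.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-object (S_w) and between-object (S_b) scatter of features X
    (n_samples x d) grouped by object identity."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    return Sw, Sb

def nfst_projection(X, labels, tol=1e-8):
    """Directions w satisfying constraints (5.4)-(5.5): null space of S_w,
    ordered by between-object scatter. The null space is non-trivial in
    the small-sample-size case (n_samples <= d)."""
    Sw, Sb = scatter_matrices(X, labels)
    eigval, eigvec = np.linalg.eigh(Sw)            # ascending eigenvalues
    null_basis = eigvec[:, eigval < tol * max(eigval.max(), 1.0)]
    Sb_null = null_basis.T @ Sb @ null_basis
    bval, bvec = np.linalg.eigh(Sb_null)
    order = np.argsort(bval)[::-1]                 # largest S_b scatter first
    return null_basis @ bvec[:, order]             # columns are directions w
```

Training features mapped by the returned matrix collapse per-object, so query-gallery similarity can then be measured by the Euclidean distance in the projected space.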

5.3.5 License Plate Verification Based on Siamese Neural Network

As shown in Fig. 5.6b, the characters on a license plate can hardly be recognized correctly in unconstrained environments because the varied viewpoints and lighting conditions make the plate images blurry. In addition, license plate recognition systems are usually composed of several components, such as plate detection, calibration, character segmentation, and recognition. Thus, license plate recognition techniques are unsuitable for the vehicle Re-Id task. Therefore, we propose to verify the license plate instead of recognizing the plate number for precise vehicle search.
The Siamese neural network (SNN) proposed by Bromley et al. (1993) was originally designed to verify hand-written signatures. An SNN is built with convolutional layers to discover the feature representation and fully connected layers to learn a mapping function from a large number of training images. With an SNN, discriminative features can be extracted directly from image pairs, and the features are then mapped into a metric space in which the distance between different objects is large while the distance between instances of the same object is small. Therefore, an SNN is very suitable for tasks in which there are large numbers of objects but the samples of each class are insufficient. Accordingly, an SNN can be adopted for license plate verification, which has exactly this property.
In our framework, we design the SNN for plate verification as illustrated in Fig. 5.9. Two parallel CNNs have the same structure and share the same weights in forward and backward computations. Each CNN is built with two convolutional layers and max-pooling layers for feature representation, and three fully connected layers to learn the metric space. The detailed parameters are shown in Fig. 5.9. In the training phase, a pair of license plate images is assigned the label 1 if the two plates have the same number and 0 otherwise. After that, the contrastive loss layer takes the output features of the last layer and the labels as input to calculate the cost of the model. With the Stochastic Gradient Descent algorithm, the SNN is optimized with the contrastive loss.


Fig. 5.9 The architecture of license plate verification based on the Siamese neural network

In particular, we denote by W the weights of the neural network, and by x_1 and x_2 a pair of input plates. The features obtained by forward propagation are denoted by S_W(x_1) and S_W(x_2). The difference between x_1 and x_2 is defined as

E_W(x_1, x_2) = \lVert S_W(x_1) - S_W(x_2) \rVert.    (5.6)

With E_W(x_1, x_2), the contrastive loss is defined as

L(W, (x_1, x_2, y)) = (1 - y) \cdot \max(m - E_W(x_1, x_2), 0) + y \cdot E_W(x_1, x_2),    (5.7)

where (x_1, x_2, y) is a three-tuple of two training plates and the corresponding label, and m is a positive hyperparameter that adjusts the margin (m = 1 in our method). In our framework, the Caffe deep learning tool (Jia et al. 2014) is adopted to implement the SNN and train the model. In the testing phase, the output of the second fully connected layer (FC2) of the learned SNN is extracted as the 1,000-D feature representation of the plate images. Finally, the similarity of two input plates is computed by the Euclidean distance.
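For illustration, the following is a minimal PyTorch sketch of the weight-sharing branch and the contrastive loss of Eqs. (5.6) and (5.7); note that the framework itself is implemented in Caffe, and the layer sizes and input shape here are placeholders rather than the exact parameters of Fig. 5.9.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlateBranch(nn.Module):
    """One of the two weight-sharing branches: convolution + max-pooling
    for representation, three fully connected layers for the metric
    embedding. Layer sizes are illustrative only."""
    def __init__(self, embed_dim=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1024), nn.ReLU(),
            nn.Linear(1024, embed_dim), nn.ReLU(),   # "FC2": test-time feature
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x):
        return self.fc(self.features(x))

def contrastive_loss(f1, f2, y, m=1.0):
    """Eqs. (5.6)-(5.7): y = 1 for a same-plate pair, 0 otherwise."""
    e = F.pairwise_distance(f1, f2)                  # E_W(x1, x2)
    return ((1.0 - y) * F.relu(m - e) + y * e).mean()

# Both inputs pass through the same branch, so the weights are shared.
branch = PlateBranch()
x1, x2 = torch.randn(8, 3, 48, 144), torch.randn(8, 3, 48, 144)
y = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(branch(x1), branch(x2), y)
```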

5.3.6 Spatiotemporal Relation-Based Vehicle Re-ranking

In practical vehicle search, humans usually execute the search process in a close-to-far manner in the physical world. Therefore, spatiotemporal information is explored in our progressive vehicle Re-Id framework. Nevertheless, how to model the behavioral features of vehicles and discover the spatiotemporal properties of the same vehicle remains a significant challenge, especially in unconstrained environments with only a video surveillance network available.
To explore the effect of spatiotemporal information on vehicle Re-Id in unconstrained scenes, we select 20,000 pairs of the same vehicles and 20,000 pairs of vehicles picked at random. Then, the spatiotemporal difference of each pair is calculated for analysis. The histograms in Fig. 5.10 show the statistics (the spatial distances and temporal distances of all samples are normalized to [0, 1] for better representation).


Fig. 5.10 Statistics of spatiotemporal information. (a) Histograms of space distances. (b) Histograms of time distances

It is obvious that pairs of the same vehicle have smaller spatiotemporal differences than pairs of randomly selected vehicles. Hence, an assumption is made based on this observation: two images are more likely to show the same vehicle when their spatiotemporal difference is small, whereas they are more likely to show different vehicles when their spatiotemporal difference is large. Based on this assumption, given a pair of images i and j, the spatiotemporal similarity ST(i, j) is formulated as

ST(i, j) = \frac{|T_i - T_j|}{T_{max}} \times \frac{\delta(C_i, C_j)}{D_{max}},    (5.8)


Fig. 5.11 The physical distance matrix of surveillance cameras

where T_i and T_j are the timestamps at which the images were captured by the cameras, and T_{max} is a global maximum value obtained from all vehicle images captured over a long time period. δ(C_i, C_j) is the physical distance between cameras C_i and C_j, and D_{max} is the global maximum distance between all cameras. The physical distance between each pair of cameras is obtained from a public online map service, i.e., Google Maps, and organized as a distance matrix, as illustrated in Fig. 5.11. In our framework, we assume the distance matrix is symmetric, which means the distances from camera C_i to C_j and from camera C_j to C_i are equal. Finally, the spatiotemporal similarity can be integrated with the appearance and plate features using late fusion or the top-K re-ranking scheme for efficiency.
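Equation (5.8) reduces to a few lines of code; the sketch below assumes a symmetric camera-distance matrix as in Fig. 5.11, and the function name is ours.

```python
def spatiotemporal_similarity(t_i, t_j, cam_i, cam_j,
                              dist_matrix, t_max, d_max):
    """Eq. (5.8): a smaller value means the two images are more likely to
    show the same vehicle. dist_matrix[cam_i][cam_j] is the physical
    distance between the two cameras (assumed symmetric)."""
    return (abs(t_i - t_j) / t_max) * (dist_matrix[cam_i][cam_j] / d_max)
```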

5.3.7 Applications

In this section, we show how the progressive vehicle Re-Id framework can be utilized in various practical applications.

5.3.7.1 Application I: Suspect Vehicle Search

As its core functionality, our PROVID framework can support suspect vehicle search for vehicle and traffic management departments. With a query vehicle image captured by a surveillance camera, users can instantly obtain information on where and when the vehicle has appeared across the whole city. Our framework can also incorporate cameras deployed in constrained scenes in which a license plate recognition system can be applied.


Then, vehicles can be searched even more accurately with cameras in both constrained and unconstrained environments. With more precise license plate information, users can find detailed information about a vehicle. For example, our PROVID system can be integrated with cameras at park entrances or toll gates, and then connected with the vehicle registration information system. When security officers have an image of a suspect car, they can first use our system to find the locations and times at which the car appeared. Then, they can use the license plate recognition system to obtain its license number via a toll gate camera. With the license number, detailed information such as the owner of the vehicle, the registration time, and criminal records can be retrieved from the vehicle database in the registration system. Using this information, the staff can manage vehicles or investigate criminal events more effectively and efficiently. In summary, our progressive vehicle Re-Id system becomes a vehicle search engine for urban surveillance networks.

5.3.7.2 Application II: Cross-Camera Vehicle Tracking

The proposed vehicle search framework can also be applied to track a target vehicle across multiple cameras. For example, if police officers want to track a suspect car in the city, they can first specify a target vehicle in one camera from the back-end browser. Then, our progressive vehicle search system takes the vehicle image, location, and time as input to find the same vehicle in the neighboring cameras. Consequently, the system can track the target vehicle from one camera to another and recover the route of the target, which provides significant assistance for criminal investigation and urban security. Another example is live broadcasts of car races. Car races such as the Dakar Rally or Formula One are usually broadcast by multiple cameras. Viewers often want to watch a specific car in videos from different cameras at different times, even though all cars look very similar. With the vehicle search system, users or directors can specify the car to be tracked at a specific time. The system can then instantly track the target car by its appearance and unique identifiers, such as the numbers or names on the car. In conclusion, our system can help users localize and track vehicles across multiple cameras automatically, which is very useful for suspect car tracking in urban surveillance and for live broadcasts of car races.

5.3.8 Experiments

5.3.8.1 Dataset

(1) VeRi dataset
To facilitate related research and evaluate the proposed progressive vehicle search framework, we build a comprehensive vehicle Re-Id dataset named VeRi. A total of 20 surveillance cameras installed along several roads in a 1.0 km² area are selected to guarantee data quality and real-world traffic scenarios.


Various scenes are captured by the cameras, such as crossroads, two-lane roads, and four-lane roads. The cameras record videos at a resolution of 1920 × 1080 and 25 frames per second. The cameras are installed in arbitrary positions and directions (the orientation and tilt-angle information is not available). In addition, overlaps exist between some of the cameras. The construction process of the VeRi dataset is introduced in our previous papers (Liu et al. 2016c,b). Figure 5.12 shows some sample images and the main statistics of the dataset (the latest version of the VeRi dataset can be obtained from https://github.com/VehicleReId/VeRidataset). The VeRi dataset has four featured properties that make it a valuable and challenging dataset:

• Large-scale data from real-world surveillance. We select continuous one-day raw videos from 20 surveillance cameras. Then, the videos from 16:00 to 17:00 are segmented from the original videos with basic compression and transcoding. To balance quality and efficiency, one in every five frames is extracted from the 25-fps videos to obtain over 360,000 frames for vehicle annotation. After the annotation in Liu et al. (2016c), we obtain approximately 50,000 images and 9,000 tracks of 776 vehicles, which guarantee scalability for vehicle search. Each vehicle is captured by at least two cameras from various viewpoints, lighting conditions, and backgrounds, which guarantees a practical urban traffic environment, as shown in Fig. 5.12a, and sufficient cross-camera recurrence for vehicle search, as shown in Fig. 5.12b. The dataset is split into a training set containing 37,781 images of 576 vehicles and a testing set with 11,579 images of 200 vehicles. From the testing set, we select one image of each vehicle from each camera as a query, obtaining a query set of 1,678 images.
• Rich attribute labels. Each vehicle image in the VeRi dataset is labeled with various attributes. First, we annotate the bounding boxes (BBoxes) as well as the locations of the vehicle images in video frames, which can also be used for vehicle detection tasks. Moreover, we annotate 10 colors, i.e., black, gray, white, red, green, orange, yellow, golden, brown, and blue, to label the color of the vehicles. Furthermore, each vehicle is labeled with one of nine types, i.e., sedan, SUV, hatchback, MPV, van, pickup, bus, truck, and estate car. In addition, some of the vehicles are labeled with about 30 common brands, such as BMW, Audi, Ford, and Toyota. The statistics of colors and types are shown in Fig. 5.12c.
• License plate annotation. As one of the most noteworthy contributions of the VeRi dataset, we annotate the license plate if it can be detected in the vehicle image by the annotators. For each image in the training, testing, and query sets, we annotate the location of the license plate and its characters if they can be recognized. At least three annotators label each image to guarantee high quality. Finally, 999, 4,825, and 7,647 plates are obtained from the query, testing, and training sets, respectively.
• Contextual information annotation. As important contextual information, the spatiotemporal information of the vehicles, the camera topology, and the distances between cameras are annotated.


Fig. 5.12 The main properties of the VeRi dataset. (a) Sample images in VeRi dataset. (b) Distribution of numbers of vehicle tracks. (c) Statistics of types and colors


Firstly, we annotate the camera ID that records each vehicle track and the time at which it is captured. Then, the distance between each pair of cameras in the surveillance system is obtained from Google Maps, as shown in Fig. 5.11. With the above contextual information, the multi-modal data can be exploited for progressive vehicle Re-Id.

(2) VehicleID dataset
Recently, Liu et al. (2016d) built a large-scale dataset for vehicle re-identification named VehicleID. It contains images captured in the daytime by different cameras in the traffic surveillance system of a small city. Similar to our VeRi dataset, each vehicle appears more than once in different cameras. It contains a total of 26,267 vehicles with 221,763 images, and 10,319 vehicles are labeled with models such as Ford Focus, Toyota Corolla, and Honda Accord. To facilitate research, the VehicleID dataset is split into a training set with 110,178 images of 13,134 vehicles and a testing set with 111,585 images of 13,133 vehicles. In addition, from the original testing data, three subsets, which contain 800, 1,600, and 2,400 vehicles, are extracted for vehicle search at different scales.
There are two main differences between our VeRi dataset and the VehicleID dataset. First, although the scale of VehicleID is larger than that of VeRi, the vehicles in VehicleID are captured only from the front or the back, whereas our dataset contains vehicle images captured by 20 cameras with various viewpoints, resolutions, and occlusions, which reflects practical situations. This makes VeRi closer to a real-world unconstrained environment and more challenging for vehicle Re-Id. Furthermore, VehicleID can only be used for appearance-based vehicle Re-Id and related research. In addition to vehicle images, our dataset contains license plate annotations and spatiotemporal information. This means that VeRi can not only facilitate vehicle Re-Id in a surveillance network but also provide potential value for license plate recognition, traffic data mining, and urban computing.

5.3.8.2 Experimental Settings

In this section, we first compare different appearance-based methods on both the VehicleID and VeRi datasets. Then, we evaluate the license-plate-based vehicle search and the complete progressive PROVID framework on the VeRi dataset. For VehicleID, image-to-image search is conducted because each vehicle is captured in one image by one camera. For each test subset (size = 800, 1,600, and 2,400), one image of each vehicle is randomly selected into the gallery set. All other images are probe queries. To measure the accuracy of the approaches, we adopt HIT@1, HIT@5, and the Cumulative Matching Characteristic (CMC) curve, as in Liu et al. (2016d). For VeRi, cross-camera matching is performed, which means that one image of a vehicle from one camera is used as the query to search images of the same vehicle from other cameras.


In addition to the image-to-image search used for the VehicleID dataset, we also adopt an image-to-track approach, in which the image is used as the query, while the gallery consists of tracks of the same vehicle captured by other cameras. A track is the trajectory of a vehicle recorded by one camera over a period of time, which means the images in a track are organized together. The similarity between an image and a track is computed by max-pooling over the images in the test track because, in the practical search procedure of humans, it is reasonable to find the most likely image in the track from one camera capturing the target vehicle. Therefore, we use 1,678 query images and 2,021 testing tracks for the image-to-track search. The CMC curve, HIT@1 (precision at rank 1), and HIT@5 (precision at rank 5) are also adopted to evaluate the accuracy of the methods. In addition, each query has more than one ground truth, so both precision and recall should be considered in our experiments. Hence, we also use the mean average precision to evaluate the comprehensive performance. The average precision (AP) is computed for each query as

AP = \frac{\sum_{k=1}^{n} P(k) \times gt(k)}{N_{gt}},    (5.9)

where n and N_{gt} are the numbers of tests and ground truths, respectively, P(k) is the precision at the k-th position of the results, and gt(k) is an indicator function that equals 1 if the k-th result is correctly matched and 0 otherwise. Over all queries, the mean average precision (mAP) is formulated as

mAP = \frac{\sum_{q=1}^{Q} AP(q)}{Q},    (5.10)

in which Q is the number of queries.
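Both metrics are straightforward to compute from a ranked result list; the following is a minimal sketch (function names ours) of Eqs. (5.9) and (5.10).

```python
import numpy as np

def average_precision(ranked_hits, n_gt):
    """Eq. (5.9): ranked_hits[k] is 1 if the k-th returned result matches
    the query vehicle and 0 otherwise; n_gt is the number of ground truths."""
    hits = np.asarray(ranked_hits, dtype=float)
    precision_at_k = np.cumsum(hits) / (np.arange(len(hits)) + 1.0)  # P(k)
    return float((precision_at_k * hits).sum() / n_gt)

def mean_average_precision(hit_lists, gt_counts):
    """Eq. (5.10): mean of AP over all Q queries."""
    return sum(average_precision(h, g)
               for h, g in zip(hit_lists, gt_counts)) / len(hit_lists)
```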

5.3.8.3 Evaluation of Appearance-Based Vehicle Re-Id

In this experiment, we compare eight vehicle Re-Id approaches, which are evaluated on both VehicleID and VeRi. The details of the approaches are as follows.

(1) Texture-based feature (BOW-SIFT). For both the VeRi and VehicleID datasets, the image is first resized to 64 × 128. Then, we extract SIFT local descriptors (Lowe 2004) from the images. After that, the descriptors are encoded by the BOW model with a pre-trained codebook (size k = 10,000). Finally, we obtain a 10,000-D feature to represent the texture of the vehicle.
(2) Local Maximal Occurrence Representation (LOMO). LOMO is proposed as a local feature for person Re-Id that is robust to the varied lighting conditions in practical surveillance scenes (Liao et al. 2015). We consider LOMO the state-of-the-art texture feature. For both VehicleID and VeRi, we extract the LOMO feature with the parameters given in Liao et al. (2015) and obtain a 26,960-D feature vector for each vehicle image.


(3) Color-based feature (BOW-CN). This model is the benchmark for person Re-Id on the Market-1501 dataset (Zheng et al. 2015) due to its robustness in outdoor scenes. It first adopts the Color Name (CN) descriptor (Van De Weijer 2009) as a local color descriptor. Similar to BOW-SIFT, the image is resized to 64 × 128. Then, we divide the image into 4 × 4 patches to extract the CN descriptors densely. Before testing, a pre-trained codebook is built on VeRi and VehicleID separately using k-means (size k = 350). After that, the avgIDF and geometrical priors are applied as in Zheng et al. (2015). Finally, a 5,600-D color feature is obtained for each image.
(4) Semantic feature learned by CNN (GoogLeNet). For VeRi, we adopt the GoogLeNet model (Szegedy et al. 2015) pre-trained on ImageNet (Russakovsky et al. 2015). As in Yang et al. (2015), the model is fine-tuned on the CompCars dataset, which contains images of whole cars and car parts with rich attributes such as the number of doors, the light shape, and the car model. The fine-tuned CNN model is employed as a feature extractor for high-level attributes. Finally, we obtain a 1,024-D feature from the last pooling layer of the neural network to represent the semantic feature of the vehicles.
(5) Fusion of Attributes and Color feaTures (FACT). As in Liu et al. (2016c), by combining the low-level color feature and the high-level semantic attributes, the FACT model achieves excellent performance on the VeRi dataset. We implement the FACT model on both VeRi and VehicleID. The fusion weights are obtained on a small subset of the training data reserved for validation.
(6) Deep Relative Distance Learning with VGG (DRDL-VGG). The DRDL framework is proposed to jointly learn a discriminative feature representation and a metric mapping with an end-to-end CNN, and it achieves state-of-the-art results on the VehicleID dataset (Liu et al. 2016d). It adopts a mixed network structure based on the VGG_M model (Chatfield et al. 2014) with a coupled cluster loss to learn the relative distances between different vehicles. Because the VeRi dataset does not contain model information as VehicleID does, we only evaluate DRDL-VGG on VehicleID.
(7) Semantic feature learned by VGG (VGG). To evaluate different deep-learning-based models, we directly use the VGG_M model from DRDL-VGG (Liu et al. 2016d) as a feature extractor for testing on the VeRi dataset. The 1,024-D feature is extracted from the fc_7 layer of the VGG_M model.
(8) Null-space-based Fusion of Attribute and Color feaTures (NuFACT). As introduced in Sect. 5.3.4.2, we concatenate the color feature and the semantic attributes to obtain the original features of the vehicles for VeRi and VehicleID separately. Then, the projection matrix to the null space is learned on the corresponding training sets. Finally, we evaluate the NuFACT model on both VeRi and VehicleID.

Table 5.2 The image-to-track search results on VeRi

Methods                         mAP      HIT@1    HIT@5
BOW-SIFT                        1.51     1.91     4.53
LOMO (Liao et al. 2015)         9.64     25.33    46.48
BOW-CN (Zheng et al. 2015)      12.20    33.91    53.69
VGG (Liu et al. 2016d)          12.76    44.10    62.63
GoogLeNet (Yang et al. 2015)    17.89    52.32    72.17
FACT (Liu et al. 2016c)         18.75    52.21    72.88
NuFACT                          48.47    76.76    91.42

Fig. 5.13 The CMC curves of different methods on VeRi

Table 5.2 lists the mAP, HIT@1, and HIT@5 on VeRi, and Fig. 5.13 shows the CMC curves. The results on VehicleID are shown in Table 5.3 and Fig. 5.14. From the results, we obtain the following findings:
(1) For both the VehicleID and VeRi datasets, the hand-crafted features, i.e., BOW-SIFT, LOMO, and BOW-CN, achieve relatively lower accuracy than the deep-learning-based models, i.e., GoogLeNet and VGG. This demonstrates that the features learned by deep neural networks are more discriminative and robust than conventional features for vehicle Re-Id. Moreover, the fusion model of multi-level features, i.e., FACT, and the mixed neural network structure, i.e., DRDL, obtain higher accuracy than the above single-model approaches. This shows that the high-level attributes and the low-level hand-crafted features have complementary effects for vehicle Re-Id. Finally, our proposed NuFACT model achieves the best results on both VehicleID and VeRi. This means that, in the null space learned by NFST, the multi-level features can be fused effectively for vehicle Re-Id.
(2) By comparing the results on the two datasets, we find that different methods have different characteristics, as discussed below.


Table 5.3 Comparison of different methods on VehicleID

Methods                         Test size = 800    Test size = 1600   Test size = 2400   Average
                                HIT@1    HIT@5     HIT@1    HIT@5     HIT@1    HIT@5     HIT@1    HIT@5
BOW-SIFT                        2.81     4.23      3.11     5.22      2.11     3.76      2.68     3.76
LOMO (Liao et al. 2015)         19.74    32.14     18.95    29.46     15.26    25.63     17.98    3.76
BOW-CN (Zheng et al. 2015)      13.14    22.69     12.94    21.09     10.20    17.89     12.09    20.56
GoogLeNet (Yang et al. 2015)    47.90    67.43     43.45    63.53     38.24    59.51     43.20    60.04
FACT (Liu et al. 2016c)         49.53    67.96     44.63    64.19     39.91    60.49     44.69    64.21
DRDL (Liu et al. 2016d)         48.91    66.71     46.36    64.38     40.97    60.02     45.41    63.70
NuFACT                          48.90    69.51     43.64    65.34     38.63    60.72     43.72    65.19

Fig. 5.14 The CMC curves of different methods on VehicleID


First, the texture feature, i.e., LOMO, achieves better accuracy than the color feature on VehicleID, while we observe the opposite on VeRi. By examining the two datasets, we find that the vehicle images in VehicleID are relatively larger and sharper than those in VeRi, so LOMO can extract more detailed texture from the images in VehicleID than from those in VeRi. Besides, some of the images in VehicleID are captured at night and are almost black in hue, while VeRi contains only images captured in the daytime. Therefore, we can obtain more effective color features from the images in VeRi than from those in VehicleID. Second, NuFACT achieves a much larger improvement on VeRi than on VehicleID. One reason is that the color feature is more effective on VeRi than on VehicleID, so the fusion of the color feature with the semantic attributes works better on VeRi. The other reason is that each vehicle in VeRi has many more images (64 images/vehicle) than the vehicles in VehicleID (8.4 images/vehicle). During training of the null space, more information, such as different viewpoints, occlusions, and resolutions, can be learned on VeRi. Thus, NuFACT achieves a greater improvement over FACT on the VeRi dataset.

5.3.8.4 Evaluation of Plate Verification

In this section, we compare the plate verification based on the SNN with that based on the traditional texture feature, i.e., SIFT (Lowe 2004). The plate features obtained by the two models are fused with the appearance features of the NuFACT model by late fusion to evaluate the performance. The details of the two methods are as follows:

(1) NuFACT + Plate-SIFT. This approach uses the hand-crafted SIFT as the basic representation. Then, the SIFT features are quantized by the BOW model on the whole plate image. In the training phase, a codebook (size k = 1,000) is learned on the training data of the VeRi dataset. During testing, each license plate image is encoded by the trained model into a 1,000-D feature. Finally, the plate feature and the appearance-based feature are integrated by late fusion.
(2) NuFACT + Plate-SNN. This method adopts the SNN as the feature extractor for license plate images. During training, we first select over 100,000 plate pairs from the original 7,647 plates in the training set. Half of the pairs are from the same vehicles and are labeled with 1 as positive samples; the other half are from different vehicles and are labeled with 0 as negative samples. All samples are shuffled before training. The Caffe deep learning tool (Jia et al. 2014) is adopted to implement the SNN with the structure and parameters in Sect. 5.3.5. The model is optimized by the Stochastic Gradient Descent algorithm and converges after 60,000 iterations. Then, the output of the FC2 layer (1,000-D) is extracted as the feature of the license plate images.

Similar to the appearance-based search, we estimate the similarity with the Euclidean distance and perform the image-to-track search. The weights for late fusion are set to 0.86 and 0.14 for the NuFACT and Plate-SNN models, respectively.
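For illustration, the late fusion step can be sketched as below; min-max scaling is our assumption for the normalization, since the text only states that the score vectors are normalized before fusion, and the function names are ours.

```python
import numpy as np

def min_max_normalize(scores):
    """Normalize a score vector to [0, 1]; min-max scaling is an assumption."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return (scores - scores.min()) / (span + 1e-12)

def late_fusion(appearance_dist, plate_dist, w_app=0.86, w_plate=0.14):
    """Weighted sum of normalized appearance and plate distances; smaller
    fused scores rank higher (weights from this section)."""
    return (w_app * min_max_normalize(appearance_dist)
            + w_plate * min_max_normalize(plate_dist))
```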

Table 5.4 Comparison of different models for plate verification

Methods                  mAP      HIT@1    HIT@5
NuFACT + Plate-SIFT      42.48    75.27    90.41
NuFACT + Plate-SNN       50.87    81.11    92.79

Table 5.4 shows the mAP, HIT@1, and HIT@5 on the VeRi dataset. The results show that the plate representation model learned by the deep neural network significantly outperforms the hand-crafted feature. Therefore, the features learned by the SNN are more robust to uncertain environmental factors such as varied lighting conditions and low resolution. This also demonstrates that the deep neural network has higher discriminative power, especially with a large amount of training data. The effectiveness of the learned SNN is guaranteed by the use of sufficiently many license plate images.

5.3.8.5 Evaluation of Progressive Vehicle Re-Id

To evaluate the performance of the progressive search paradigm, we compare four methods on the VeRi dataset:

(1) NuFACT. We utilize the NuFACT model to calculate the appearance similarities with the same settings as in Sect. 5.3.8.3.
(2) NuFACT + Plate-SNN. In this method, NuFACT is first used to filter out the dissimilar vehicles by appearance. The late fusion scheme is then adopted to integrate the scores of the NuFACT and Plate-SNN models for precise Re-Id. The weights for the NuFACT and Plate-SNN models are set to 0.86 and 0.14, respectively, as in Sect. 5.3.8.4.
(3) NuFACT + Plate-REC. This approach uses a commercial plate recognition tool (Plate-REC) to recognize the plate characters from the plate images for accurate vehicle search. The weights for NuFACT and Plate-REC are set to 0.9 and 0.1, respectively, following the procedure in Sect. 5.3.8.4.
(4) PROVID. This is the proposed progressive vehicle search framework, which fuses the scores of the NuFACT, Plate-SNN, and STR models. The Euclidean distance is adopted to compute the similarity between a query image and a test track. The NuFACT + Plate-SNN score is obtained as introduced in Sect. 5.3.8.4. Before late fusion, the similarity vectors are normalized to (0, 1). Finally, the two vectors are added linearly to obtain the final scores. The weights are set to 0.85 and 0.15 for NuFACT + Plate-SNN and STR, respectively.

In this way, progressive vehicle Re-Id is achieved by comprehensively integrating the appearance features, license plate information, and spatiotemporal cues. Figure 5.15 shows the CMC curves of the progressive search, and the mAP, HIT@1, and HIT@5 values are listed in Table 5.5. The results indicate that the proposed framework is effective for vehicle search, with coarse filtering by appearance and precise matching by plate verification. The coarse filtering scheme can find most vehicles of similar shape, color, and type to the query vehicle, especially those with similar plate images.


Fig. 5.15 The CMC curves of different methods

Table 5.5 Comparison of different methods on the VeRi dataset

Methods                  mAP      HIT@1    HIT@5
NuFACT                   48.47    76.76    91.42
NuFACT + Plate-REC       48.55    76.88    91.48
NuFACT + Plate-SNN       50.87    81.11    92.79
PROVID                   53.42    81.56    95.11

Moreover, after filtering the vehicles, the framework matches vehicles by license plate verification to eliminate incorrect matches. The Plate-REC approach shows only negligible improvement because the recognition technique cannot produce correct results under unconstrained conditions. Furthermore, the PROVID framework outperforms all other tested approaches. In particular, the proposed framework searches for vehicles in the spatiotemporal space progressively in a close-to-far manner. The results validate the effectiveness of the PROVID framework as well as the significance of multi-modal data for vehicle search in large-scale urban surveillance.
In Fig. 5.16, we give some examples comparing the efficacy of the proposed framework with our previous methods (Liu et al. 2016b) on the VeRi dataset. For each query, the left three rows are the results of FACT, FACT+Plate-SNN, and FACT+Plate-SNN+STR in Liu et al. (2016b), and the right three rows are the results of NuFACT, NuFACT+Plate-SNN, and the PROVID framework proposed in this section. The three queries are hard cases from Liu et al. (2016b). For example (a), the methods in Liu et al. (2016b) cannot return optimal results, even through the progressive search procedure, while the proposed PROVID achieves excellent results in the top-five lists using only the appearance-based NuFACT model. This demonstrates


Fig. 5.16 The top-5 search results on the VeRi dataset. For each query, the left three rows are the results of the FACT, FACT+Plate-SNN, and FACT+Plate-SNN+STR in Liu et al. (2016b), and the right three rows are the results of NuFACT, NuFACT+Plate-SNN, and PROVID proposed in this section. The green box denotes a true positive, the red denotes a false positive. (Best seen in color)

the effectiveness and robustness of our NuFACT model in representing vehicle appearance. Example (b) shows the importance of license plate verification in vehicle Re-Id. Vehicles of similar types and colors are found by the appearance features, but the correct results are not among the top-ranked vehicles. Through license plate verification, the target vehicles are matched precisely. From example (c), we can see that license plate verification may fail due to low resolution and significant blur. Nevertheless, the target vehicles are found through the contextual information, i.e., the spatiotemporal similarity. These examples show the superior performance of the proposed PROVID framework compared to previous methods.
However, the examples also reflect some limitations and difficulties of the system, which mainly come from three aspects. The first difficulty is caused by environmental factors. For example, varied illumination makes the same vehicle appear in very different colors, especially in dark conditions. Moreover, the vehicle body under sunlight can be very bright due to specular reflection. The second difficulty is caused by arbitrary camera settings.


For example, the cameras in an urban surveillance system are not only installed at arbitrary locations, heights, and orientations but also configured with varied parameter settings, such as resolution, focal distance, and shutter speed. Therefore, the vehicle images captured by these cameras may contain significant blur, noise, and occlusion. The last difficulty is the ambiguity in the appearances of vehicles that are made by the same manufacturer and are of similar model and color. In this case, the license plate is the only information that can identify a vehicle. If the license plate is fake, occluded, or removed, the proposed method might become invalid. However, even in these extreme conditions, PROVID can still provide valuable assistance in finding the target vehicle with the multi-modal information from urban surveillance.

5.3.8.6 Time Cost of the PROVID Framework

In our PROVID framework, we can select the top-K percent of outputs from the appearance-based filter and the license-plate-based search as the inputs to their subsequent procedures. To evaluate the mAP for different top-K percentages, we implement the PROVID framework on VeRi and reduce the percentage from 100% to 10%. To measure the time cost under each percentage, we add 99,029 junk tracks to the original 2,021 test tracks to build a 50-times-larger gallery. As shown in Fig. 5.17, from top-100% to top-30%, the mAP decreases only marginally, while the time cost drops from 92.4 ms/query to 32.5 ms/query. PROVID can thus retain near-optimal accuracy and reduce the time cost by 64.8% by using the top 30% of outputs of each process as the inputs to the next step. This demonstrates that PROVID can significantly improve the precision and reduce the time cost of instance-level vehicle search in large-scale urban surveillance.
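Putting the pieces together, the progressive top-K pipeline can be sketched as follows; the staging and weights follow Sects. 5.3.8.4 and 5.3.8.5, while the normalization scheme and function names are our assumptions.

```python
import numpy as np

def _norm(x):
    # Min-max normalize to [0, 1] (an assumption; the text only says the
    # similarity vectors are normalized before late fusion).
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def progressive_search(app_dist, plate_dist, st_dist, k_percent=0.30,
                       w_app=0.86, w_plate=0.14, w_st=0.15):
    """Sketch of PROVID's progressive top-K search for one query. All
    inputs are distances from the query to every gallery track
    (smaller = more similar)."""
    app_dist = np.asarray(app_dist, dtype=float)
    plate_dist = np.asarray(plate_dist, dtype=float)
    st_dist = np.asarray(st_dist, dtype=float)
    # Stage 1: coarse appearance filter keeps the top-K percent.
    keep1 = np.argsort(app_dist)[:max(1, int(len(app_dist) * k_percent))]
    # Stage 2: plate verification fused with appearance on the survivors.
    fused = w_app * _norm(app_dist[keep1]) + w_plate * _norm(plate_dist[keep1])
    order2 = np.argsort(fused)[:max(1, int(len(keep1) * k_percent))]
    keep2, fused2 = keep1[order2], fused[order2]
    # Stage 3: spatiotemporal re-ranking of the remaining candidates.
    final = (1 - w_st) * _norm(fused2) + w_st * _norm(st_dist[keep2])
    return keep2[np.argsort(final)]          # gallery indices, best first
```

Restricting each stage to the top 30% of the previous stage's outputs reproduces the accuracy/time trade-off reported above.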

Fig. 5.17 The time cost and mAP under different top-K percentages

References


Bromley, J., Bentz, J.W., Bottou, L., Guyon, I., LeCun, Y., Moore, C., Säckinger, E., Shah, R.: Signature verification using a Siamese time delay neural network. Int. J. Pattern Recognit. Artif. Intell. 7(4), 669–688 (1993)
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531 (2014)
Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2005)
Du, S., Ibrahim, M., Shehata, M., Badawy, W.: Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Trans. Circuits Syst. Video Technol. 23(2), 311–325 (2013)
Feris, R.S., Siddiquie, B., Petterson, J., Zhai, Y., Datta, A., Brown, L.M., Pankanti, S.: Large-scale vehicle detection, indexing, and search in urban surveillance videos. IEEE Trans. Multimedia 14(1), 28–42 (2012)
Foley, D.H., Sammon, J.W.: An optimal set of discriminant vectors. IEEE Trans. Comput. C-24(3), 281–289 (1975)
Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al.: DeViSE: A deep visual-semantic embedding model. In: Proceedings of Advances in Neural Information Processing Systems (2013)
Girshick, R.: Fast R-CNN. In: Proceedings of IEEE International Conference on Computer Vision (2015)
Guo, Y.F., Wu, L., Lu, H., Feng, Z., Xue, X.: Null Foley-Sammon transform. Pattern Recogn. 39(11), 2248–2251 (2006)
Hu, W., Xie, N., Li, L., Zeng, X., Maybank, S.: A survey on visual content-based video indexing and retrieval. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 41(6), 797–819 (2011)
Javed, O., Shafique, K., Rasheed, Z., Shah, M.: Modeling inter-camera space-time and appearance relationships for tracking across non-overlapping views. Comput. Vis. Image Underst. 109(2), 146–162 (2008)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of ACM International Conference on Multimedia (2014)
Kettnaker, V., Zabih, R.: Bayesian multi-camera surveillance. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (1999)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems (2012)
Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2015)
Li, N., Jain, J.J., Busso, C.: Modeling of driver behavior in real world scenarios using multiple noninvasive sensors. IEEE Trans. Multimedia 15(5), 1213–1225 (2013)
Liu, W., Mei, T., Zhang, Y., Che, C., Luo, J.: Multi-task deep visual-semantic embedding for video thumbnail selection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2015)
Liu, X., Liu, W., Mei, T., Ma, H.: A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In: Proceedings of European Conference on Computer Vision (2016a)
Liu, X., Liu, W., Mei, T., Ma, H.: A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In: Proceedings of European Conference on Computer Vision (2016b)
Liu, X., Liu, W., Ma, H., Fu, H.: Large-scale vehicle re-identification in urban surveillance videos. In: Proceedings of IEEE International Conference on Multimedia and Expo (2016c)


Liu, H., Tian, Y., Yang, Y., Pang, L., Huang, T.: Deep relative distance learning: Tell the difference between similar vehicles. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2016d)
Liu, X., Liu, W., Mei, T., Ma, H.: PROVID: Progressive and multimodal vehicle re-identification for large-scale urban surveillance. IEEE Trans. Multimedia 20(3), 645–658 (2018)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Ma, H.: Internet of Things: Objectives and scientific challenges. J. Comput. Sci. Technol. 26(6), 919–924 (2011)
Ma, H., Liu, W.: Progressive search paradigm for the Internet of Things. IEEE MultiMedia (2017)
Ma, H., Liu, L., Zhou, A., Zhao, D.: On networking of Internet of Things: Explorations and challenges. IEEE Internet Things J. 3(4), 441–452 (2016)
Matei, B.C., Sawhney, H.S., Samarasekera, S.: Vehicle tracking across nonoverlapping cameras using joint kinematic and appearance features. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2011)
Mei, T., Rui, Y., Li, S., Tian, Q.: Multimedia search reranking: A literature survey. ACM Comput. Surv. 46(3), 1–38 (2014)
Meng, J., Yuan, J., Yang, J., Wang, G., Tan, Y.P.: Object instance search in videos via spatio-temporal trajectory discovery. IEEE Trans. Multimedia 18(1), 116–127 (2016)
Romer, K., Ostermaier, B., Mattern, F., Fahrmair, M., Kellerer, W.: Real-time search for real-world entities: A survey. Proc. IEEE (2010)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: Proceedings of IEEE International Conference on Computer Vision (2003)
Song, J., Yang, Y., Huang, Z., Shen, H.T., Luo, J.: Effective multiple feature hashing for large-scale near-duplicate video retrieval. IEEE Trans. Multimedia 15(8), 1997–2008 (2013)
Sunderrajan, S., Manjunath, B.: Context-aware hypergraph modeling for re-identification and summarization. IEEE Trans. Multimedia 18(1), 51–63 (2016)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2015)
Valera, M., Velastin, S.A.: Intelligent distributed surveillance systems: A review. IEE Proc. Vis. Image Signal Process. 152(2), 192–204 (2005)
Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. IEEE Trans. Image Process. 18(7), 1512–1523 (2009)
Wen, Y., Lu, Y., Yan, J., Zhou, Z., Von Deneen, K.M., Shi, P.: An algorithm for license plate recognition applied to intelligent transportation system. IEEE Trans. Intell. Transp. Syst. 12(3), 830–845 (2011)
Xie, L., Wang, J., Zhang, B., Tian, Q.: Fine-grained image search. IEEE Trans. Multimedia 17(5), 636–647 (2015)
Xu, J., Jagadeesh, V., Ni, Z., Sunderrajan, S., Manjunath, B.: Graph-based topic-focused retrieval in distributed camera network. IEEE Trans. Multimedia 15(8), 2046–2057 (2013)
Yang, L., Luo, P., Loy, C.C., Tang, X.: A large-scale car dataset for fine-grained categorization and verification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2015)
Zapletal, D., Herout, A.: Vehicle re-identification for automatic video traffic surveillance. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops (2016)
Zhang, D., Yang, L.T., Huang, H.: Searching in Internet of Things: Vision and challenges. In: Proceedings of IEEE International Symposium on Parallel and Distributed Processing with Applications (2011)


Zhang, J., Wang, F.Y., Wang, K., Lin, W.H., Xu, X., Chen, C.: Data-driven intelligent transportation systems: A survey. IEEE Trans. Intell. Transp. Syst. 12(4), 1624–1639 (2011)
Zhang, C., Liu, W., Ma, H., Fu, H.: Siamese neural network based gait recognition for human identification. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (2016)
Zhang, L., Xiang, T., Gong, S.: Learning a discriminative null space for person re-identification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2016)
Zheng, Y., Capra, L., Wolfson, O., Yang, H.: Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol. 5(3), 38:1–38:55 (2014)
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: A benchmark. In: Proceedings of IEEE International Conference on Computer Vision (2015)
Zhou, Y., De, S., Wang, W., Moessner, K.: Search techniques for the web of things: A taxonomy and survey. Sensors 16(5), 600 (2016)
Ziegeldorf, J.H., Morchon, O.G., Wehrle, K.: Privacy in the Internet of Things: Threats and challenges. Security Commun. Netw. 7(12), 2728–2742 (2014)

Chapter 6

Prospect of Future Research

The Internet of Things (IoT) has a distinctive feature: application requirements drive the progress of technologies. After nearly two decades of development, and especially as IoT applications in smart cities, intelligent manufacturing, the Internet of Vehicles (IoV), ocean observation, and other domains have deepened, the traditional IoT development goals of "thorough perception, extensive interconnection, and intelligent services" have been raised, i.e., to "more extensive thorough perception, denser extensive interconnection, and faster intelligent services". Driven by the need for thorough perception, IoT sensing technology has developed over two generations: the first-generation sensing networks mainly focus on scalar information such as temperature and humidity, while the second-generation sensing networks emphasize the collection of multimedia information such as video and audio. MSIG, the world's largest sensor industry alliance, predicts that the number of global sensors will surge from the current tens of billions to trillions (Trillion Sensor, TSensor) by 2025. This leap in magnitude will fundamentally change sensing networks in terms of optimal deployment, data transmission, and information processing. On the other hand, there are still many physical phenomena in the world that are difficult or impossible to sense directly. For example, in industrial manufacturing, many important parameter variables and monitoring target postures in the production process are difficult to obtain through sensors, so the production status cannot be obtained in real time. In short, the sensing ability of IoT has not yet reached the ability of human beings to perceive the physical world in many respects. Therefore, it is difficult to continue relying only on large numbers of traditional sensors to meet the needs of thorough perception, which has become a very challenging problem that puzzles IoT research and restricts large-scale applications.
Exploring novel sensing paradigms and technologies has always been a frontier hotspot in IoT research. In recent years, technologies such as crowdsensing, wireless sensing, and passive sensing have emerged and have effectively improved the sensing ability of the IoT.


However, these technologies still have their limitations and lack theoretical support, such as the modeling and characterization of sensing mechanisms and the derivation of the bounds of sensing ability. Through the analysis of emerging IoT applications, we find that more thorough perception requires further breakthroughs in the sensing ability of IoT: emerging sensing network elements such as smart terminals, vehicles, robots, and unmanned aerial vehicles (UAVs) will go beyond the simple collection of scalar data and audio/video data and realize the comprehensive collection and intelligent use of sensory information such as sight, hearing, force, and touch, so as to achieve human-like perception ability and further extend human perception in precision, breadth, and speed. The ultimate goal is to achieve full information sensing of the physical environment. Human-like perception requires high-density deployment of the IoT sensing network (like the human nervous system), which needs to support dense networking. For example, a smart workshop needs instantaneous connection capability for thousands of users and tens of thousands of terminals. In addition, the massive data obtained by human-like perception requires more complicated computation, and the human-machine-object interaction in IoT applications is more delay-sensitive. For example, the interaction delay requirement in AR is 20 ms, and the delay requirement of many scenarios in unmanned systems is less than 10 ms, which demands new computing models to ensure timeliness. These requirements set a higher bar not only for IoT sensing technology but also for its networking and processing technologies, in order to provide better quality of experience for IoT services and construct a healthy IoT ecosystem.
At the same time, we also note the rapid development of Artificial Intelligence (AI) technology in recent years. Intellectualization is an important way to solve many challenges faced by IoT, and it is also the inevitable direction driving the evolution of IoT. In July 2017, China released the "New Generation AI Development Plan", which proposed five new directions for the development of AI: big data intelligence, swarm intelligence, cross-media intelligence, human-computer interaction enhanced intelligence, and autonomous automation systems. At present, most AI studies are data-driven. In the era of big data, the various forms of sensory data bring great challenges to intelligent processing. Moreover, human intelligence and machine intelligence each have their own advantages; the best way forward is to integrate machines and humans and merge the two intelligences. At the same time, the information space created by human beings is evolving from the binary human-machine space to the ternary human-machine-object space. A large amount of information directly from the physical world is sensed through sensors, and the generated information is expanding so rapidly that humans have no time to process it. The future IoT system, especially its sensing technology, must be deeply combined with big data intelligence, cross-media intelligence, swarm intelligence, hybrid enhanced intelligence, autonomous collaborative control and optimized decision-making, advanced machine learning, brain-like intelligent computing, quantum intelligent computing, and other new-generation AI theories to improve the intelligence of IoT.


However, the meaning of intellectualization is constantly evolving and advancing with the times. For the new generation of IoT, the goal of improving intellectualization is not simply to add existing IoT technology and AI technology together, but to endow IoT with endogenous intelligence, which urgently requires connotative development and a disruptive technological revolution. Therefore, future research on IoT sensing will focus on three major challenges: human-like perception, intelligent networking and transmission, and intelligent services. These three aspects are explained in detail below.

6.1 Human-Like Perception

Human-like perception is a way to realize intelligent sensing. At present, the main difficulty in realizing human-like perception is that the ability of a sensor is limited and static: the type of sensory data is determined by the sensing mechanism, the spatiotemporal range is determined by the deployment method, and the sensing accuracy is determined by the device and signal-processing levels. Simply superimposing multiple sensors or sensing methods, i.e., the "simple-combining, best-effort" sensing mode, cannot guarantee the comprehensive collection of sensory information, and it is difficult to fully satisfy the needs of IoT users. Moreover, this mode lacks intelligence and flexibility, which restricts the comprehensive and effective use of sensory information. The future research direction is to deeply integrate sensing technology with neural science, material science, biological engineering, wireless communication, AI, and other fields to explore new sensing mechanisms. The goal is to enhance capability in data types, sensing mode, spatiotemporal range, sensing resolution, and other aspects, and to build the basic theories and key technologies of human-like perception. Specifically, this includes:

(1) Theories of human-like perception. It is foreseeable that the new generation of IoT sensing networks will consist of a huge number (TB level or even PB level) of sensing units that can independently interact with the outside world, forming a huge network through interconnection. The sensing units mainly include various sensors, intelligent wireless sensing units (millimeter-wave base stations, WiFi hotspots, etc.), human-centric crowdsensing terminals (smartphones, vehicles, etc.), and autonomous unmanned system terminals (unmanned vehicles/machines, robots, etc.). Facing this new form of sensing network, it is necessary to establish an ability model of the human-like perception unit and a measurement system for human-like perception quality, and to derive the quantitative relationship between human-like perception quality and the characteristics, quantity, distribution, and movement regularity of the units.
(2) New sensing mechanisms. By introducing the latest achievements in materials, biology, wireless communication, and other fields, the photoelectric effect, biochemical effect, Doppler effect, and other effects will be integrated into the sensing process.


For example, by analyzing the changes in signal parameters (signal strength, reflection angle, etc.) that occur when the ubiquitous millimeter-wave wireless signals deployed with 5G pass through, or are reflected by, a sensing target, the body-shape information of the target can be obtained, so as to realize non-contact sensing of, and interaction with, scene targets without deploying special sensors.
(3) Intelligent sensing technologies. The new generation of AI technology, big data technology, and sensing technology will be deeply integrated. Moreover, the interconnection of sensor nodes will be extended to the extensive interconnection of node-data-model-knowledge. When a certain type of data required by an application cannot be directly sensed, intelligent sensing technology can automatically infer the related data based on domain knowledge, and then select the corresponding learning method to establish an association model between the required data and the related data, i.e., the required data is obtained through association mapping. Furthermore, the sensing ability can realize adaptive optimization and dynamic evolution according to the sensing requirements and changes of applications.

6.2 Intelligent Networking and Transmission

At present, IoT networking technology can hardly meet the requirement of thorough perception. First, the network elements of local application scenarios are heterogeneous and dense. For example, an unmanned cluster system gathers multiple communication connections in a dense, small space; many vehicles communicate with one another in the IoV; and thousands of devices communicate with each other in a smart workshop. The existing 5G system theoretically supports only one communication connection per square meter, which cannot satisfy the connection requirements of the future IoT. Second, the overall coverage and capabilities are limited. Areas with weak infrastructure such as mountains, wilderness, and deserts, which cover 60% of the land area, and the ocean, which accounts for 71% of the earth's surface, have no wired or wireless network coverage. Moreover, space network coverage is extremely limited and can hardly support the expansion of IoT applications into space and the ocean.

To meet the wide-coverage, massive-connection, and low-latency performance requirements of IoT, more heterogeneous nodes need to be deployed on top of 5G dense networking. The complexity of the network thus increases exponentially, and a sharply growing set of parameters needs to be configured and optimized. Moreover, the future IoT needs to construct a flexible, reconfigurable, and minimalist wireless network on a common hardware platform through the perception of available spectrum, wireless environment, user relationship attributes, service characteristics, transmission status, and network performance, so as to efficiently support massive services and highly differentiated user needs in dynamic application environments. How to design a concise but powerful wireless networking mechanism that can self-adapt and be reconfigured is extremely challenging. AI is an important way to meet the above performance requirements. The main research work includes:

(1) Wireless concise-intelligent networking theory. A concise-intelligent networking mechanism is needed for wide-area coverage, massive connection, and low latency. The theoretical fundamentals include an analysis model of networking performance, the quantitative impact of machine learning methods on networking performance, and analytical expressions for networking performance. By condensing the channel characteristics, topology structure, data characteristics, energy consumption, and other factors that affect transmission quality, a knowledge graph that models the dependencies between these factors is built (see the first sketch after this list).

(2) Intelligent networking methods. Combining advanced solutions such as Information-Centric Networking (ICN) and Software-Defined Networking (SDN), intelligent wireless access network architectures and dynamic networking solutions are studied. Based on application needs and multi-dimensional sensory information, knowledge-driven wireless networking technology is developed. It is necessary to build Radio Access Network (RAN) slicing and orchestration functional entities and the corresponding protocol procedures, and to explore three-dimensional spatial hierarchical networking for wide-area coverage, local ultra-dense networking for massive connection, intelligent offloading networking with edge computing for low latency, and other intelligent networking methods. Collaborative scheduling and optimization of multi-dimensional resources, self-adapting to the goals of wide coverage, massive connection, and low latency, is also an important direction.

(3) Intelligent transmission technologies. Based on movement-trajectory data mining, the research work includes analyzing the distribution laws of human movement and establishing a human movement model that can accurately describe human movement statistics and urban scene information (see the second sketch after this list). Based on historical data such as network status, node movement trajectories, sensory data, and the urban environment, the laws of network evolution and efficient forwarding methods for sensory data collection are investigated. To support the data transmission needs of real-time interactive IoT sensing, intelligent transmission methods and intelligent audio/video flow control mechanisms are necessary. The research focuses on comprehensively improving transmission quality and efficiency through the network's adaptation to task requirements and its perception of dynamic changes in the environment and network conditions.
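A minimal sketch of the factor knowledge graph mentioned in (1) follows; the factor set and the "influences" edges are illustrative assumptions, not a validated dependency model:

    import networkx as nx

    # Nodes are factors; a directed edge u -> v means "u influences v".
    G = nx.DiGraph()
    G.add_edges_from([
        ("channel_state", "link_rate"),
        ("topology", "route_length"),
        ("data_burstiness", "queue_delay"),
        ("energy_budget", "tx_power"),
        ("tx_power", "link_rate"),
        ("link_rate", "transmission_quality"),
        ("route_length", "transmission_quality"),
        ("queue_delay", "transmission_quality"),
    ])

    # All factors that transitively influence transmission quality; a
    # learning-based networking policy could condition on exactly this set.
    print(nx.ancestors(G, "transmission_quality"))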
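For the human movement modeling in (3), a common first step is to mine displacement statistics from trajectory logs. The sketch below fits a power-law tail to synthetic displacement lengths; real input would be per-user (timestamp, location) traces:

    import numpy as np

    rng = np.random.default_rng(2)
    # Synthetic displacement lengths (km) with a heavy tail, standing in
    # for jumps extracted from real GPS trajectories.
    jumps = rng.pareto(1.6, size=5000) + 0.1

    # Fit the tail exponent by linear regression on the log-log empirical
    # complementary CDF (CCDF).
    x = np.sort(jumps)
    ccdf = 1.0 - np.arange(1, len(x) + 1) / len(x)
    mask = (x > 1.0) & (ccdf > 0)                  # fit the tail only
    slope, _ = np.polyfit(np.log(x[mask]), np.log(ccdf[mask]), 1)
    print(f"estimated tail exponent: {-slope:.2f}")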


6.3 Intelligent Services

Current service generation and provision can hardly satisfy the needs of intelligent IoT services. First, the flexibility of the service provision architecture is insufficient. The centralized and passive service provision architecture represented by cloud services focuses on the centralized sharing of resources and passively serves discrete, non-real-time requests with weak low-latency guarantees; it is difficult for this architecture to support the continuous, real-time service provision required by the future IoT's distributed scenarios. Second, the dynamic scalability of services is insufficient. Current solutions mainly rely on a small amount of semantic adaptation, simple rule combination, and fixed process scenarios. They can hardly adapt to the low-latency service demands brought by changeable and cross-cutting business scenarios (such as AR-assisted driverless cars) and complex, random adaptation combinations (such as mixed-reality immersive interaction) in the IoT environment.

To solve the above problems, future research will focus on the active recognition of service capabilities in terms of service generation, interaction, and provision for diverse goals, changeable scenarios, and personalized users, so as to achieve adaptive restructuring of service provision in terms of service generation adaptation, service composition processes, and service experience quality. Therefore, the main research directions of IoT intelligent services are as follows:

(1) Theories of IoT intelligent services. The research work includes: methods of sensing, comprehending, and expressing target behaviors, scene semantics, and user characteristics based on human-like perception information and machine learning; polymorphic interfaces, cross-domain collaborative behaviors, and internal interaction processes that can support cognitive services, together with a unified description method for IoT cognitive services (see the sketch after this list); a cognitive service provision model that couples business logic and cognitive functions based on human cognitive mechanisms; exploration, from multiple perspectives such as syntax and semantics, business logic, interface components, composition processes, target objects, and behavioral contexts, combined with the knowledge graph, of the mechanism of constructing cognitive services with self-learning ability; and the mechanism of cognitive service fission and reflection that supports multiple scenarios and realizes the dynamic collaboration and adaptation of cross-domain cognitive services.

(2) Support technologies of IoT intelligent services. The research work includes: lightweight learning agents for cognitive services and distributed service discovery technology; technologies for automatic reasoning about application logic and active adaptation of interface components; quality-guarantee strategies for agility-enhanced services; multi-scale, multi-brain-area cooperative cognition for brain-inspired computing models in urban sensing networks; and the mechanisms of concept formation, interactive learning, and environmental adaptation, together with brain-like autonomous learning methods for continuously emerging big sensory data.


(3) Development of a common platform for IoT intelligent services. The research work includes: exploring the realization technologies of brain-like computing systems over big sensory data; the technologies of automatic generation and management of cognitive services, dynamic collaboration and on-demand supply, and user Quality of Experience (QoE) guarantees; and developing a common platform for intelligent services that provides cross-domain, cross-industry, and cross-scenario support for emerging IoT services such as unmanned systems, AR, smart workshops, IoV, and ocean observation.
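To make the unified service description in (1) concrete, the following minimal sketch models a cognitive service as a typed record with an interface check on composition. The schema and field names are illustrative assumptions, not a standardized IoT service model:

    from dataclasses import dataclass, field

    @dataclass
    class CognitiveService:
        name: str
        inputs: list[str]    # required sensory data types
        outputs: list[str]   # produced data or actions
        context: dict = field(default_factory=dict)  # behavioral context
        subservices: list["CognitiveService"] = field(default_factory=list)

        def compose(self, other: "CognitiveService") -> "CognitiveService":
            # Chain two services when this one produces what the other needs.
            assert set(other.inputs) <= set(self.outputs), "interface mismatch"
            return CognitiveService(
                name=f"{self.name}->{other.name}",
                inputs=self.inputs,
                outputs=other.outputs,
                subservices=[self, other],
            )

    detect = CognitiveService("pedestrian_detect", ["camera_frame"], ["pedestrian_track"])
    warn = CognitiveService("collision_warn", ["pedestrian_track"], ["driver_alert"])
    print(detect.compose(warn).name)    # pedestrian_detect->collision_warn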