Gang Xiao · Durga Prasad Bavirisetti · Gang Liu · Xingchen Zhang

Image Fusion
Gang Xiao School of Aeronautics and Astronautics Shanghai Jiao Tong University Shanghai, China
Durga Prasad Bavirisetti School of Aeronautics and Astronautics Shanghai Jiao Tong University Shanghai, China
Gang Liu School of Automation Engineering Shanghai University of Electrical Power Shanghai, China
Xingchen Zhang School of Aeronautics and Astronautics Shanghai Jiao Tong University Shanghai, China
ISBN 978-981-15-4866-6    ISBN 978-981-15-4867-3 (eBook)
https://doi.org/10.1007/978-981-15-4867-3
Jointly published with Shanghai Jiao Tong University Press, Shanghai, China

The print edition is not for sale in China Mainland. Customers from China Mainland please order the print book from: Shanghai Jiao Tong University Press.

© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
This book is dedicated to the memory of my father, who left us during my PhD study.
Preface
Image fusion has been a hot topic for many years, and it has been widely applied to fields ranging from robotics, medical engineering, and surveillance to military applications and many others. However, almost 10 years have passed since the publication of the last book on image fusion. Within these years, we have seen rapid progress and more applications of image fusion, which have not been reviewed systematically. For instance, with the emergence and fast growth of artificial intelligence, some researchers have begun to investigate how machine learning and deep learning would benefit image fusion. Therefore, we think the time is ripe to write a new book that systematically and thoroughly summarizes the recent development of image fusion.

The purpose of this book is to provide an extensive introduction to the field of image fusion. Based on the discussion of pixel-level, feature-level, and decision-level fusion, this book systematically introduces the basic concepts, basic theories, and latest research outcomes as well as practical applications of image fusion. There are 10 chapters in total, arranged in two parts. Part I is about image fusion theory, including basic concepts and principles of image fusion presented in Chap. 1 and multi-source image fusion at the pixel, feature, and decision levels discussed in Chaps. 2, 3, and 4, respectively. Chapter 5 introduces multi-source dynamic image fusion, and Chap. 6 discusses image fusion evaluation metrics. The recent trend of image fusion, namely fusion based on machine learning and deep learning, is given in Chap. 7. Part II is about the practical applications of image fusion, including medical image fusion discussed in Chap. 8 and night vision image fusion introduced in Chap. 9. Finally, Chap. 10 describes in detail the image fusion simulation platform developed at Shanghai Jiao Tong University.

This book is written based on the research work of the authors on image fusion in recent years. It reflects research outcomes from the authors and members of the Advanced Avionics and Intelligent Information Laboratory (AAII Lab), Shanghai Jiao Tong University, and exhibits, to some extent, the latest development of the image fusion field in China and the international community.
This book can be used as a textbook or reference book for senior undergraduate and postgraduate students. It should also be useful to researchers working on image fusion and to practicing engineers who wish to use the concept of image fusion in practical applications. We hope you enjoy reading this book and the field of image fusion.

Shanghai, China    Gang Xiao
Shanghai, China    Gang Liu
January 2020
Acknowledgments
This book would not have been possible without the funding from the National Natural Science Foundation of China under Grant 61973212 and Grant 61673270. This book is sponsored by the National Science and Technology Academic Publications Fund (2019).

I would like to sincerely thank Prof. Zhongliang Jing, a distinguished professor of Shanghai Jiao Tong University, for his pioneering work in the field of information fusion. It is my great honor to complete this book with the help of members of the Advanced Avionics and Intelligent Information Laboratory (AAII Lab), Shanghai Jiao Tong University. They are Dr. Durga Prasad Bavirisetti, Dr. Gang Liu, Dr. Xingchen Zhang, and others.

We would like to thank all the people who have contributed to this book, without whom it would not have been possible. This includes all current and former members of the AAII Lab at Shanghai Jiao Tong University and the students in Prof. Gang Liu's research group. We also very much appreciate the continuous help from Shanghai Jiao Tong University Press.

I also thank Ms. Shu Wang, my wife, and my daughter, Suyang Xiao. They always understand and support me in my scientific research.
Contents

Part I  Image Fusion Theories

1  Introduction to Image Fusion
   1.1  History and Development
        1.1.1  History
        1.1.2  Development
   1.2  Image Fusion Fundamentals
        1.2.1  Necessity to Combine Information of Images
        1.2.2  Definition of Image Fusion
        1.2.3  Image Fusion Objective
   1.3  Categorization
   1.4  Fundamental Steps of an Image Fusion System
   1.5  Types of Image Fusion Systems
   1.6  Applications
   1.7  Summary and Outline of the Book
   References

2  Pixel-Level Image Fusion
   2.1  Introduction
        2.1.1  Single-Scale Image Fusion
        2.1.2  Multi-Scale Image Fusion
   2.2  Pyramid Image Fusion Method Based on Integrated Edge and Texture Information
        2.2.1  Background
        2.2.2  Fusion Framework
        2.2.3  Pyramid Image Fusion of Edge and Texture Information-Specific Steps
        2.2.4  Beneficial Effects
   2.3  Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames
        2.3.1  Introduction
        2.3.2  Discrete Wavelet Frame Multi-Resolution Transform
        2.3.3  Basic Structure of the New Fusion Scheme
        2.3.4  Fusion of the Low-Frequency Band Using the EM Algorithm
        2.3.5  The Selection of the High-Frequency Band Using the Informative Importance Measure
        2.3.6  Computer Simulation
        2.3.7  Conclusions
   2.4  Image Fusion Method Based on Optimal Wavelet Filter Banks
        2.4.1  Introduction
        2.4.2  The Generic Multi-Resolution Image Fusion Algorithm
        2.4.3  Design Criteria of Filter Banks
        2.4.4  Optimization Design of Filter Bank for Image Fusion
        2.4.5  Experiments
        2.4.6  Conclusion
   2.5  Anisotropic Diffusion-Based Fusion of Infrared and Visible Sensor Images (ADF)
        2.5.1  Anisotropic Diffusion
        2.5.2  Anisotropic Diffusion-Based Fusion Method (ADF)
        2.5.3  Experimental Setup
        2.5.4  Results and Analysis
   2.6  Two-Scale Image Fusion of Infrared and Visible Images Using Saliency Detection
        2.6.1  Two-Scale Image Fusion (TIF)
        2.6.2  Experimental Setup
        2.6.3  Results and Analysis
   2.7  Multi-Focus Image Fusion Using Maximum Symmetric Surround Saliency Detection (MSSSF)
        2.7.1  Maximum Symmetric Surround Saliency Detection (MSSS)
        2.7.2  MSSS Detection-Based Image Fusion (MSSSF)
        2.7.3  Experimental Setup
        2.7.4  Results and Analysis
   References

3  Feature-Level Image Fusion
   3.1  Introduction
   3.2  Fusion Based on Grads Texture Characteristic
        3.2.1  Multi-Scale Transformation Method Based on Gradient Features
        3.2.2  Multi-Scale Transformation Based on Gradient Feature Fusion Strategy
   3.3  Fusion Based on United Texture and Gradient Characteristics
        3.3.1  Joint Texture and Gradient Features of Multi-Scale Transformation Method
        3.3.2  Multi-Scale Image Fusion Method Based on Gradient Feature
   3.4  Fusion Algorithm Based on Fuzzy Regional Characteristics
        3.4.1  Area-Based Image Fusion Algorithm
        3.4.2  Image Fusion Method Based on Fuzzy Region Features
        3.4.3  Image Fusion Method Based on Fuzzy Region Features
   3.5  Multi-Source Face Feature Fusion Recognition Algorithm Based on Genetic Algorithm
        3.5.1  Feature-Level Fusion Algorithm
        3.5.2  Fusion Recognition Based on Genetic Algorithm
        3.5.3  Experimental Results and Evaluation
   3.6  Summary
   References

4  Decision-Level Image Fusion
   4.1  Introduction
   4.2  Fusion Algorithm Based on Voting Method
   4.3  Fusion Algorithm Based on D-S Evidence Theory
   4.4  Fusion Algorithm Based on Bayes Inference
   4.5  Fusion Algorithm Based on Summation Rule
   4.6  Fusion Algorithm Based on Min-Max Rule
        4.6.1  Maximum Rules
        4.6.2  Minimum Rules
   4.7  Fusion Algorithm Based on Fuzzy Integral
        4.7.1  ICA Feature Extraction
        4.7.2  SVM Classification
        4.7.3  Decision Fusion with Fuzzy Integral
   4.8  Label Fusion for Segmentation Via Patch Based on Local Weighted Voting
        4.8.1  Introduction
        4.8.2  Method
        4.8.3  Experiments and Results
   4.9  Summary
   References

5  Multi-sensor Dynamic Image Fusion
   5.1  Introduction
   5.2  Multi-sensor Dynamic Image Fusion System
   5.3  Improved Dynamic Image Fusion Scheme for Infrared and Visible Sequence Based on Image Fusion System
        5.3.1  Introduction
        5.3.2  Generic Pixel-Based Image Fusion Scheme
        5.3.3  Improved Dynamic Image Fusion Scheme Based on Region-Based Target
        5.3.4  The Platform of the Visible-Infrared Dynamic Image Fusion System
        5.3.5  Experimental Results and Analysis
        5.3.6  Conclusion
   5.4  Infrared and Visible Dynamic Image Sequence Fusion Based on Region Target Detection
        5.4.1  Introduction
        5.4.2  Generic Pixel-Based Image Fusion Scheme
        5.4.3  The Region-Based Target Detection Dynamic Image Fusion Scheme
        5.4.4  Experimental Results and Analysis
        5.4.5  Conclusion
   5.5  Multi-sensor Moving Image Registration Algorithm
        5.5.1  Dynamic Image Registration
        5.5.2  Dynamic Image Registration Method
   5.6  Criteria-Based Wavelet Moving Image Fusion Displacement Analysis
        5.6.1  Multi-scale Image Fusion Scheme
        5.6.2  Wavelet Transform-Related Theory
        5.6.3  Convergence Rules
        5.6.4  Mobility Issues
        5.6.5  Performance Evaluation of Fusion Algorithms
        5.6.6  Summary
   5.7  Multi-sensor Dynamic Image Fusion Algorithm
        5.7.1  Image Fusion Algorithm Based on Low-Redundancy Discrete Wavelet Frame
        5.7.2  Image Fusion Algorithm Based on Plum-Shaped Discrete Wavelet Framework
        5.7.3  Image Fusion Algorithm Based on Nonlinear Wavelet
   5.8  Experimental Results and Evaluation
        5.8.1  LRDWF Image Fusion Algorithm Experiment and Evaluation
        5.8.2  QWWF Image Fusion Algorithm Experiment and Evaluation
        5.8.3  Two Kinds of Nonlinear Wavelet Image Fusion Algorithm Experiment and Evaluation
   5.9  Summary
   References

6  Objective Fusion Metrics
   6.1  Introduction
   6.2  Essentiality of Metrics
   6.3  Research on Evaluation Methods
        6.3.1  Evaluation Index Based on the Regional Fusion
        6.3.2  Noise Performance Evaluation Index of Fusion System
   6.4  Traditional Fusion Metrics
        6.4.1  Evaluation Metrics Based on the Amount of Information
        6.4.2  Evaluation Metrics Based on Statistical Characteristics
        6.4.3  Evaluation Metrics Based on the Visual System
   6.5  Performance Measure for Image Fusion Considering Region Information
        6.5.1  Introduction
        6.5.2  Performance Measure for Image Fusion
        6.5.3  Experiments
   6.6  Region Mutual Information-Based Objective Evaluation Measure for Image Fusion Considering Robustness
        6.6.1  Performance Measure for Image Fusion
        6.6.2  The Image Segment Algorithm
        6.6.3  Experiments
   6.7  Summary
   References

7  Image Fusion Based on Machine Learning and Deep Learning
   7.1  Introduction
   7.2  Machine Learning Basics
        7.2.1  Supervised Learning
        7.2.2  Unsupervised Learning
        7.2.3  Reinforcement Learning
        7.2.4  Important Machine Learning Algorithms
   7.3  Image Fusion Based on Machine Learning
   7.4  Deep Learning Basics
   7.5  Image Fusion Based on Deep Learning
   7.6  Future Scope on AI-Based Image Fusion
   7.7  Summary
   References

Part II  Experimental Examples

8  Example 1: Medical Image Fusion
   8.1  Introduction
   8.2  Traditional Medical Image Fusion Methods
        8.2.1  Local Weighted Voting
        8.2.2  MV Algorithm
        8.2.3  Global Weighted Voting
        8.2.4  Semi-Local Weighted Fusion
   8.3  Recent Medical Image Fusion Methods
        8.3.1  Patch-Based Local Weighted Voting Segmentation Algorithm
        8.3.2  Patch-Based Global Weighted Fusion Segmentation Algorithm
   8.4  Summary
   References

9  Example 2: Night Vision Image Fusion
   9.1  Introduction
   9.2  A Novel Night Vision Image Color Fusion Method Based on Scene Recognition
   9.3  An Evaluation Metric for Color Fusion of Night Vision Images
   9.4  Summary
   References

10  Simulation Platform of Image Fusion
    10.1  Image Fusion Platform
          10.1.1  Introduction to the Image Fusion Simulator
          10.1.2  Functions of Each Module
          10.1.3  Demonstration of Simulation
    10.2  Fusion Tracking Platform
          10.2.1  Introduction to the Fusion Tracking Simulator
          10.2.2  Functions of Each Module
          10.2.3  Demonstration of Simulation
    10.3  Summary
    References
About the Authors
Gang Xiao received his bachelor's degree, master's degree, and PhD in 1998, 2001, and 2004, respectively. He is currently a full professor at the School of Aeronautics and Astronautics and the director of the Advanced Avionics and Intelligent Information (AAII) Laboratory, Shanghai Jiao Tong University. He was a visiting scholar at the University of California, San Diego (UCSD) in 2010 and at Southern Illinois University Edwardsville (SIUE) in 2014. His current research interests include image fusion, target tracking, and avionics integration and simulation.

Durga Prasad Bavirisetti received both his MTech and PhD degrees from VIT University, India, in 2012 and 2016, respectively. Currently, he is pursuing his postdoctoral studies at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, China, and is a member of the Advanced Avionics and Intelligent Information (AAII) Laboratory. He has published his research on image fusion in several reputed journals and conferences. Dr. Durga Prasad serves as an active reviewer for Information Fusion, IEEE Transactions on Multimedia, IEEE Transactions on Instrumentation and Measurement, IEEE Sensors, Infrared Physics and Technology, Neurocomputing, International Journal of Imaging Systems and Technology, etc. His research interests are in image fusion, target detection, and tracking.

Gang Liu is a full professor at the School of Automation Engineering, Shanghai University of Electrical Power. He received his PhD degree from Shanghai Jiao Tong University in 2005. His research interests are image fusion, pattern recognition, and machine learning.
Xingchen Zhang received his BSc degree from the Huazhong University of Science and Technology in 2012 and his PhD degree from the Queen Mary University of London in 2017. He is currently a postdoctoral research fellow at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, China. He is also the director of the Artificial Intelligence and Image Processing Group of the Advanced Avionics and Intelligent Information (AAII) Laboratory. His current research interests include object fusion tracking, image fusion, deep learning, and computer vision.
Part I
Image Fusion Theories
Chapter 1
Introduction to Image Fusion
Abstract Human beings possess a wonderful sense for appreciating visuals, and the eye plays a key role in supporting various human activities. An image capture of a visual scene always conveys much more information than any other description attached to it. Human beings have five sensing capabilities or systems, as shown in Fig. 1.1: eyes, ears, nose, tongue, and skin. These sensors are able to acquire independent information. Eyes can visualize a scene. Ears can sense data by listening to sounds. The nose can smell the odor of an object. The tongue can sense the taste of an object. The skin can sense the texture and size of an object. As shown in Fig. 1.2, all these five sensing systems act as sensors. The human brain collects data from these individual sensors and fuses or combines the information into a compact representation or a better description of a scenario. This compact data is useful for decision-making and task execution. Data fusion is a process of combining information from several sources into an optimal or compact representation of a large amount of data, supporting better description and decision-making. The human brain is the best example of a data fusion system. Even a single sensor, for example, the eye, can derive many useful details of a scene by looking at the scenario more than once; the brain can integrate these visuals and reveal details hidden in a single view. Multiple views will always improve decisions. Whenever we take a snapshot of a scene with our digital camera, we are often not satisfied with a single image. We try to take a few more images of the same scene to gain more clarity and information. It is not rare to find that none of these images contain all the required qualities, and it is common to feel that all the positive aspects of these images need to be combined to obtain the desired image. This motivates us to fuse images for the desired output. We can also use different cameras and fuse their images; likewise, many options exist. Image fusion has a long history and development phase, and research on image fusion has been carried out by adopting various mathematical tools and techniques. It is therefore important to introduce the concepts of image fusion in a systematic way. To this end, in this chapter, fundamental concepts, categories, types, and applications of image fusion are covered.
Fig. 1.1 Human sensors: (a) eyes, (b) ears, (c) nose, (d) tongue, (e) skin

Fig. 1.2 Human sensor fusion system for decision-making
1.1 History and Development
Image fusion has a history of more than 35 years and has contributed to the image processing community by supporting various application fields. Image fusion research has been developing progressively by adopting various mathematical
tools and techniques. Here, a brief overview of the history and development is presented for an overall understanding of image fusion.
1.1.1 History
In the late twentieth century, the United States Navy installed the first image fusion prototype on the Memphis submarine (SSN-691), enabling the operator to directly observe all captured sensor images from the best position. During the Gulf War, the Low Altitude Navigation and Targeting Infrared for Night (LANTIRN) pod, which showed better combat performance, was an image fusion system. In 1995, the American company TI launched an image fusion system that was able to combine the forward-looking infrared and low-light sensor systems of the advanced helicopter (AHP). In May 2000, the Boeing avionics flight laboratory successfully demonstrated multi-source information fusion technology, which is also a function of the integrated avionics system of the Joint Strike Fighter (JSF). In October 1999, "CBERS-01," the satellite jointly launched by China and Brazil, used a CCD camera and an infrared multispectral scanner to fuse remote sensing images. At the same time, image fusion also played a crucial role in other fields such as medicine, surveillance, and robotics. In cranial radiotherapy and brain surgery in the medical field, fusion was used to combine the advantages of multi-modal images. In security inspection systems, multi-source image fusion technology provided a good solution to the concealed weapon inspection problem. In computer vision, multi-source image fusion technology was used for scene perception of the surrounding environment and for supporting the navigation of robots.
1.1.2 Development
During the 1990s, major attention in image processing was drawn by pyramid decomposition-based techniques. In 1980, Burt and Julesz [1] were the first authors to propose an image fusion algorithm based on pyramids for binocular images. They then proposed another fusion algorithm for multi-modal images [2]. Later, Burt and Kolczynski [3] developed an improved approach to fuse images based on pyramid decomposition. Other well-known image fusion work, by Alexander Toet, fused visible and infrared images using various pyramids and wavelet transforms [4–7] for surveillance applications. Later, neural networks were successfully adopted for visible and infrared image fusion by Ajjimarangsee and Huntsberger [8]. An image fusion device was developed for visible and thermal images by Lillquist in 1988 [9]. An integrated analysis of infrared and visible images was implemented by Nandhakumar and Aggarwal [10] for scene interpretation. Multi-sensor target detection, classification, and segmentation were addressed by Ruck et al. [11] and Rogers et al. [12] using a fusion approach. At the same time, Li et al.
[13] and Chipman et al. [14] developed fusion algorithms using the discrete wavelet transform. Koren et al. [15] introduced a new method to fuse multi-sensor images with the help of the steerable dyadic wavelet transform. For night vision, Waxman et al. [16, 17] proposed a visible and thermal imagery fusion algorithm based on biological color vision. Prasad [18] developed a data fusion method for visual and range data for robotics and machine intelligence. Dasarathy [19] implemented various fusion strategies for enhancing decision reliability in a multi-sensor environment. Numerous transform-based methods such as the stationary wavelet [20], complex wavelet [21], curvelet [22], contourlet [23], and non-sub-sampled contourlet [24] transforms were utilized for fusion. Optimization-based methods [25, 26] were also adopted for image fusion. Recently, many researchers have implemented image fusion methods based on filtering-based decompositions. Techniques based on the cross bilateral filter [27], guided image filter [28], rolling guided image filter [29], anisotropic diffusion [30], and weighted least squares filter [31] are a few notable examples among them.
1.2 Image Fusion Fundamentals
An image is a two-dimensional quantity. It can be viewed as the combination of illumination and reflectance. Illumination stands for the amount of light from the source falling on the object, and reflectance corresponds to the amount of light that is reflected from the same object. A sensor is a device which converts incoming energy into an electrical signal, as shown in Fig. 1.3a. In the case of imaging sensors, the reflected energy is converted into a corresponding electrical signal. As displayed in Fig. 1.3b, the sensor array gives a large number of signals; the sampling in the spatial domain is performed by the sensor array. These signals are quantized to obtain a digital image representation. This entire process is termed digitization, as shown in Fig. 1.3c. Thus, the visual information present in a scene can be captured as a digital image f(x, y) using a sensor array as shown in Fig. 1.3b. All elements in the sensor array are of the same modality; hence, image capture using a sensor array is simply referred to as single-sensor image capture. We may also be interested in the details of a scene using multiple sensor arrays, each operating in a different wavelength range. This is simply termed multi-sensor image capture. In the following discussion, note that the term sensor is used simply in place of a sensor array. Next, fundamental concepts such as the problem specification (necessity to combine information), definition, and objective of image fusion are described.
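As a supplementary note (the symbols below are not introduced in the original text), the illumination–reflectance view of an image described above is commonly written as

\[
f(x, y) = i(x, y)\, r(x, y), \qquad 0 < i(x, y) < \infty, \qquad 0 < r(x, y) < 1,
\]

where i(x, y) is the illumination falling on the scene point imaged at (x, y) and r(x, y) is the reflectance of that point. The sensor array then samples f(x, y) spatially and quantizes the resulting values to produce the digital image.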
1.2.1 Necessity to Combine Information of Images
Single-sensor image capture may not always provide complete information about a target scene. Sometimes we need two or more images of the same scene for better visual understanding. These images may be captured using a single sensor or using multiple sensors of different modalities, depending on the application [32]. Such image captures provide complementary or visually different information, and a human observer cannot reliably combine and observe a composite image from these multiple image captures. Useful or complementary information from these images should be integrated into a single image to provide a more accurate description of the scene than any one of the individual source images. Two different examples where we need to capture multiple images and combine the required information are discussed below. The first is single-sensor imaging, in which multiple images of the same scene are captured using one sensor to extract more details of a targeted scene. The second is multi-sensor imaging, which requires multiple images captured by different sensors for the same purpose.

Fig. 1.3 Single sensor imaging. (a) Single sensor, (b) Single sensor array and its corresponding image represented as a matrix, (c) Digitization process

Fig. 1.4 Image formation model

1. Example 1: Single-sensor imaging
In digital photography, objects of a scene at different distances cannot be focused at the same time. If the lens of a camera focuses on an object at a certain distance, then other objects appear blurred. The image formation model of a sensor or camera system is displayed in Fig. 1.4. If a point P1 on an object is in focus, then a dot p1 corresponding to that particular point is generated on the sensor plane. Therefore, all points at the same distance from the lens as P1 appear sharp. The region of acceptable sharpness of the object is referred to as the depth of field (DoF). Consider another point P2 on the object behind P1. Since P2 is out of the DoF, it generates a dot p2 somewhere before the sensor plane; as its distance from P1 increases, the object appears more blurred. As shown in Fig. 1.4, P3 is located in front of P1 on the object plane. Since P3 falls out of the DoF, it produces a dot p3 behind the sensor plane, resulting in an unsharp dot on the image plane (sensor plane). For better visual quality, images should have all objects in focus. One of the best approaches to achieve this is to capture images with different focusing conditions and combine them to generate an all-in-one focused image. Now we discuss another example, where we need to acquire multiple images using different modalities.

2. Example 2: Multi-sensor imaging
Visual information present in a scene can be captured as an image using a charge-coupled device (CCD). The wavelength of the visible (VI) light that can be captured by a CCD sensor ranges from 4×10^-7 to 7×10^-7 m. However, in most image processing and computer vision applications, a CCD image alone is not sufficient to provide all the details of the scene. To extract more details, complementary images of the same scene should be captured using multiple sensors of different modalities. This can be done by capturing images in wavelengths other than the VI band of the electromagnetic spectrum. The electromagnetic spectrum is illustrated in Fig. 1.5, and the corresponding wavelengths are presented in Table 1.1. As discussed before, the VI light wavelength ranges from 4×10^-7 to 7×10^-7 m. The infrared (IR) spectrum wavelength ranges from 7×10^-7 to 10^-3 m, and the IR spectrum is further divided into five sub-bands: near, short-wave, mid-wave, long-wave, and far IR. Usually, objects with a temperature above 0 K emit radiation throughout the IR spectrum. The energy emitted by these objects can be sensed by IR sensors and displayed as images for the end users. However, these images alone are not sufficient to provide an accurate description of the targeted scene. Hence, information from the VI spectrum also needs to be integrated for better scene understanding using fusion algorithms.

Fig. 1.5 The electromagnetic spectrum

Table 1.1 Electromagnetic wavelength range

Electromagnetic waves    Wavelength, λ (m)
Gamma rays               10^-15 to 10^-11
X-rays                   10^-11 to 10^-9
Ultraviolet              10^-9 to 4×10^-7
Visible (VI)             4×10^-7 to 7×10^-7
Infrared (IR)            7×10^-7 to 10^-3
Microwave                10^-3 to 0.1
Radio                    0.1 to 10^5
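As a supplementary note to Example 1 (the symbols u, v, and f below are not defined in the original figure), the depth-of-field behavior illustrated in Fig. 1.4 follows from the classical thin-lens relation

\[
\frac{1}{u} + \frac{1}{v} = \frac{1}{f},
\]

where u is the distance from an object point to the lens, v is the distance from the lens to the corresponding in-focus image point, and f is the focal length. A point such as P2 or P3, whose object distance u differs from that of the focused point P1, has its sharp image formed in front of or behind the sensor plane, so the sensor records a small blur disk (the circle of confusion in Fig. 1.4) instead of a point.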
1.2.2 Definition of Image Fusion
Image fusion has numerous definitions. Among them, some well-known definitions are presented below. Produce a single image from a set of input images. The fused image should have complete information which is more useful for human or machine perception [33].
or Generation of a result which describes the scene better than any images captured in a single shot [34, 35].
or Image fusion is a process of combining images, obtained by sensors of different wavelengths simultaneously viewing of the same scene, to form a composite image. The composite image is formed to improve image content and to make it easier for the user to detect, recognize, and identify targets and increase his situational awareness [36].
or Image fusion is the process of merging or combining or integrating useful or complementary information of several source images such that the resultant image provides a more accurate description about the scene than any one of the individual source images [37].
1.2.3 Image Fusion Objective
The image fusion process can also be explained using a set theory representation [38, 39] as the transfer of information between sets. The Venn diagram representation of this process is shown in Fig. 1.6. Each set stands for the information contribution of a particular image. Sets A and B represent the information contributed by the two source images A and B, respectively, and set F corresponds to the information contributed by the fused image F. Ideally, this fused image should contain all the information from the source images; however, this is not achievable in practice. Not all of the source image information is transferred into the fused image; only the required and necessary information is transferred. Information loss from the source images may occur during the fusion process. Simultaneously, the fusion process itself may introduce extra or false information, called "fusion artifacts," into the fused image. In Fig. 1.6, the blue portion represents the information transferred from the source images into the fused image, which is simply referred to as the "fusion gain" or "fusion score." The green portion indicates the information lost during the fusion process, termed the "fusion loss"; this information is present in the source images but not in the fused image. The red portion corresponds to unnecessary information (fusion artifacts) introduced into the fused image; it has no relevance to the source images. Hence, a fusion algorithm should consider all these factors for better performance.
Fig. 1.6 Graphical illustration of image fusion process
The main objective of an image fusion algorithm is to generate a visually good fused image with less computational time, by maximizing the fusion gain and minimizing the fusion loss and fusion artifacts.
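As an illustrative aside (this notation is not used elsewhere in the chapter), the Venn-diagram decomposition of Fig. 1.6 can be written compactly in set form as

\[
\text{fusion gain} = F \cap (A \cup B), \qquad
\text{fusion loss} = (A \cup B) \setminus F, \qquad
\text{fusion artifacts} = F \setminus (A \cup B),
\]

where A, B, and F denote the information contributed by the two source images and the fused image, respectively, matching the blue, green, and red regions described above.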
1.3 Categorization
Image fusion algorithms are broadly divided into three categories: pixel, feature, and decision levels. Pixel-level fusion is performed on the input images pixel by pixel. Pixel-level fusion methods can be implemented in the spatial domain [40, 41] or in a transform domain [6, 13, 14]. In the spatial domain, these methods operate pixel by pixel, whereas transform-domain methods operate coefficient by coefficient, so that a small change in a frequency coefficient affects the whole resultant image. To obtain a better fused image without artifacts, the best transform technique together with a suitable fusion rule should be chosen. Substantial work has been contributed at the pixel level because of its effectiveness and ease of implementation compared to the other fusion levels; a minimal sketch of the two pixel-level styles is given below. At the feature level, fusion is executed on features extracted from the source images. Feature-level fusion schemes usually consider segmented regions based on different properties such as entropy, variance, and activity-level measurements [21, 42, 43]. These algorithms give robust performance in the presence of noise [44]. At the decision level, fusion is performed on the probabilistic decision information of local decision makers, which are in turn derived from the extracted features. These fusion techniques integrate information from source images based on decision maps derived from the features. Relational graph matching is used for image fusion by Williams et al. [45], and the organization of relational models is also used for decision-level fusion by Shapiro [46].
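The following is a minimal, hedged sketch of the two pixel-level fusion styles just described (spatial domain vs. transform domain). It assumes NumPy and the PyWavelets package (pywt) are available; the simple averaging and maximum-absolute-coefficient rules are illustrative choices only, not the specific methods developed later in this book (see Chap. 2).

```python
import numpy as np
import pywt  # PyWavelets, assumed to be installed


def fuse_spatial_average(img_a, img_b):
    """Spatial-domain pixel-level fusion: plain pixel-wise averaging."""
    return (img_a.astype(np.float64) + img_b.astype(np.float64)) / 2.0


def fuse_wavelet_max(img_a, img_b, wavelet="db2", level=2):
    """Transform-domain pixel-level fusion: average the low-frequency
    (approximation) band and keep the larger-magnitude detail coefficients."""
    ca = pywt.wavedec2(img_a.astype(np.float64), wavelet, level=level)
    cb = pywt.wavedec2(img_b.astype(np.float64), wavelet, level=level)
    fused = [(ca[0] + cb[0]) / 2.0]                    # approximation band
    for details_a, details_b in zip(ca[1:], cb[1:]):   # (H, V, D) per level
        fused.append(tuple(np.where(np.abs(da) >= np.abs(db), da, db)
                           for da, db in zip(details_a, details_b)))
    return pywt.waverec2(fused, wavelet)


if __name__ == "__main__":
    # Stand-ins for two registered source images of the same scene.
    a = np.random.rand(128, 128)
    b = np.random.rand(128, 128)
    print(fuse_spatial_average(a, b).shape)
    print(fuse_wavelet_max(a, b).shape)
```

In practice, the choice of transform and fusion rule determines how well complementary detail is preserved; the maximum-absolute rule above is only the simplest example of a coefficient-selection strategy.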
Fig. 1.7 Number of articles published in Ei Compendex Web for a duration of 35 years from 1980 to 2015. Keywords used for searching are (a) image fusion and (b) video fusion
Even though image and video fusion research started 35 years ago, many contributions are still being made in this area today because of its diverse applications. The numbers of articles published in the Ei Compendex engineering literature database from 1980 to 2015, a duration of 35 years, are displayed in Fig. 1.7a, b. The keywords used for searching were "image fusion" and "video fusion." From these statistics, it is obvious that image and video fusion is an active research area and is following an increasing trend.
1.4 Fundamental Steps of an Image Fusion System
An image fusion system mainly consists of eight fundamental steps, as shown in Fig. 1.8: (1) image acquisition, (2) pre-processing, (3) image registration, (4) image fusion, (5) fusion performance evaluation, (6) post-processing, (7) storage, and (8) display.

1. During the image acquisition stage, visually different or complementary images are captured using a single sensor or multiple sensors of different modalities.
2. In the pre-processing step, noise or artifacts introduced into the source images during the image acquisition process are removed or reduced.
3. Image registration is the process of aligning two or more images of the same scene according to a common coordinate system. In this process, one of the source images is taken as a reference image, also termed the fixed image. A geometric transformation is then applied to the remaining source images to align them with the reference image.
4. The fusion process can be performed at three levels [32]: pixel, feature, and decision level. Pixel-level fusion schemes are often preferred over the other levels because of their effectiveness and ease of implementation.
5. During the fusion process, some required information of the source images may be lost, and visually unnecessary information or artifacts may be introduced into the fused image. Hence, fusion algorithms need to be assessed and evaluated for better performance. This performance analysis can be carried out qualitatively by visual inspection and quantitatively using fusion metrics.
6. In post-processing, fused images are further processed depending on the application. This processing may involve segmentation, classification, and feature extraction.
7. Source images, fused images, post-processing results, and their corresponding data are stored with the help of storage devices such as hard disks and flash memories.
8. Finally, fused images and post-processing results such as segmented images, features, and classification results can be displayed using devices like LCD and LED monitors.
Fig. 1.8 Fundamental steps in image fusion system
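As a hedged illustration of how the steps above chain together (the function names and toy operations here are placeholders invented for this sketch, not an API from this book or any specific library), the pipeline can be organized as follows:

```python
import numpy as np


def preprocess(img):
    """Step 2 stand-in: simple intensity normalization to [0, 1]."""
    rng = img.max() - img.min()
    return (img - img.min()) / (rng if rng > 0 else 1.0)


def register(img, reference):
    """Step 3 stand-in: identity mapping (a real system estimates and
    applies a geometric transform aligning img to the reference)."""
    return img


def fuse(images):
    """Step 4 stand-in: pixel-wise averaging of the registered images."""
    return np.mean(images, axis=0)


def evaluate(fused, sources):
    """Step 5 stand-in: toy score, mean absolute deviation from the sources
    (a real system would use proper fusion metrics, see Chap. 6)."""
    return float(np.mean([np.abs(fused - s).mean() for s in sources]))


if __name__ == "__main__":
    # Step 1 stand-in: two synthetic "captures" of the same scene.
    sources = [np.random.rand(64, 64) for _ in range(2)]
    pre = [preprocess(s) for s in sources]                                # step 2
    reference, *others = pre
    registered = [reference] + [register(s, reference) for s in others]  # step 3
    fused = fuse(registered)                                              # step 4
    print("toy evaluation score:", evaluate(fused, registered))           # step 5
    # Steps 6-8 (post-processing, storage, display) would follow here.
```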
1.5 Types of Image Fusion Systems
Image fusion systems are broadly classified into single-sensor image fusion (SSIF) systems (Fig. 1.9a) and multi-sensor image fusion (MSIF) systems (Fig. 1.9b). In SSIF, a sequence of images of the same scene is captured using a single sensor, and the useful information of these several images is integrated into a single image by the process of fusion. In noisy environments and improper illumination conditions, human observers may not be able to detect objects of interest that can easily be found from fused images of the targeted scene. Digital photography applications such as multi-focus imaging and multi-exposure imaging [47] come under SSIF. However, these fusion systems have their drawbacks: they depend on conditions such as the illumination and the dynamic range of the sensors. For example, a VI sensor like a digital camera can capture visually good images in high-illumination conditions, but it fails to capture useful images under improper illumination conditions such as night, fog, and rain.

Fig. 1.9 Types of image fusion systems: (a) SSIF system and (b) MSIF system

To overcome the shortcomings of SSIF, MSIF systems are introduced to capture images in adverse environmental conditions. In MSIF, multiple images of the same scene are captured using various sensors of different modalities to acquire complementary information. For example, VI sensors are good in high-lighting conditions, whereas IR sensors are able to capture images in low-lighting conditions. The required and necessary information of these images is combined into a single image by the fusion process. Applications such as medical imaging, military, navigation, and concealed weapon detection fall under the MSIF category. The various advantages of MSIF systems are as follows:

1. Reliable and accurate information. MSIF systems provide a reliable and accurate description of the scene compared to the source images.
2. Robust performance. Even if one sensor of the MSIF system fails, the system generates a composite image by considering the redundant information of the other working sensors, so it is robust.
3. Compact representation. The fused image of the MSIF system is compact and provides all the necessary information of the source images in a single image.
4. Extended operating range. The range of operation is extended by capturing images at different operating conditions of the sensors.
5. Reduced uncertainty. The combined information of various sensors reduces the uncertainty present in the individual captures of the scene.
1.6 Applications
Image fusion finds applications in various fields such as digital photography [33, 47], medical imaging [48], remote sensing [34], concealed weapon detection [27], military [49], night vision [16], autonomous vehicles [50], visual inspection in industrial plants [51], ambient intelligence [52], and person re-identification [53]. As shown in Fig. 1.10, we consider four scenarios, namely digital photography, medical imaging, concealed weapon detection, and military, to explain how image fusion is useful in these applications. In digital photography [33, 47], a whole scene cannot be focused at the same time due to inherent system limitations. If we focus on one object, we may lose information about other objects, and vice versa. Figure 1.10a shows foreground- and background-focused images of a bottle dataset. The foreground-focused image provides information about the bottle in the foreground, whereas the background-focused image gives information about the bottle in the background of the same scene. These individual images do not provide complete information about the targeted scene. For better visual understanding, the focused regions of these two images have to be combined into an all-in-one focused image.
Fig. 1.10 Fusion results. (a) Multi-focus imaging, (b) medical imaging, (c) concealed weapon detection, (d) battle field monitoring in military
In medical imaging, different modalities like positron emission tomography (PET), single-photon emission computed tomography (SPECT), computed tomography (CT), and magnetic resonance imaging (MRI) are used to capture complementary information. These individual image captures do not provide all the required details. Therefore, information from different captures has to be incorporated into a single image. Figure 1.10b shows CT and MR images of a human brain. As shown in Fig. 1.10b,
CT can capture bone structure or hard tissue information, whereas MRI can capture the soft tissue information present in the brain. For a radiologist, a fused image obtained from these two images will be helpful in computer-assisted surgery and radiosurgery for better diagnosis and treatment.

In concealed weapon detection, visible light (VI) and millimeter wave (MMW) sensors are used to capture complementary images. In Fig. 1.10c, the left one is a VI image and the middle one is an MMW image. The VI image conveys information about three persons but does not provide any sign of the existence of a weapon. The MMW image conveys the weapon information alone. From these individual images, it is difficult to identify which person concealed the weapon. To accurately locate and detect the weapon, useful information from these complementary images has to be combined in a single image.

In military and navigation, VI and IR imaging sensors are used to acquire complementary information about the targeted scene. Due to bad weather circumstances, such as rain and foggy winters, the images captured using VI sensors alone are not sufficient to provide the essential information about a situation. The VI image is able to provide background details such as vegetation, texture, area, and soil. In contrast, IR sensors provide information about the foreground, such as weapons and enemy and vehicle movements. For the detection and localization of a target as well as the improvement of situational awareness, information from both IR and VI images needs to be merged in a single image. In Fig. 1.10d, the first image is the VI output and the second one is the IR image of a battle field. The VI image provides information about the battle field but is incapable of identifying the person near the fencing. The IR image identifies the person's existence but cannot provide sufficient visual information about the battle field. If we integrate the useful information from these images into a single image, then we can easily identify and localize the enemy or target. Hence, for a better understanding of the scene, we need to combine the essential visual information of the source images to obtain a meaningful image.

As we discussed, image fusion [32] is a phenomenon of integrating useful information from source images into the fused image. In Fig. 1.10, the third images from the left are the fused images of the various applications. The all-in-one focused image in Fig. 1.10a, obtained from two out-of-focus images, provides visually more information. The fused image in Fig. 1.10b would assist a radiologist in better diagnosis and treatment than the individual CT and MR images. The combined image in Fig. 1.10c gives information about the persons as well as the concealed weapon; from it, one can say that the third person from the left concealed the weapon inside his shirt. From the fused image in Fig. 1.10d, one can identify enemy movement on the battle field near the fencing. In the following chapters, these applications are explained further for in-depth understanding.
1.7 Summary and Outline of the Book
In this chapter, an overview of image fusion is presented. In particular, the history, development, problem specification, definitions, objectives, and categories of image fusion are explained in detail. This chapter also provides an overview of image fusion systems, their components, and their types. In addition, it gives a brief summary of image fusion applications. The remaining contents of the book are organized as follows. Chapters 2–4 contribute an in-depth discussion on pixel-, feature-, and decision-level fusion, respectively. Chapter 5 provides a detailed discussion of multi-sensor dynamic image fusion. Chapter 6 summarizes the existing quantitative fusion metrics and also introduces new metrics developed by the authors. Nowadays, machine learning, especially deep learning, is making significant changes in the entire image processing community, including image fusion. In Chap. 7, an attempt is made to give an overview of image fusion based on these concepts. As mentioned before, another important aspect of this book is its experimental examples. Chapters 8 and 9 present experimental examples of medical imaging and night vision. Finally, an image fusion platform and a fusion tracking platform are introduced in Chap. 10.
References

1. P. Burt, B. Julesz, A disparity gradient limit for binocular fusion. Science 208, 615–617 (1980)
2. P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
3. P.J. Burt, R.J. Kolczynski, Enhanced image capture through fusion. ICCV, 173–182 (1993)
4. A. Toet, Image fusion by a ratio of low-pass pyramid. Pattern Recogn. Lett. 9(4), 245–253 (1989)
5. A. Toet, V. Ruyven, Merging thermal and visual images by a contrast pyramid. Opt. Eng. 28(7), 789–792 (1989)
6. A. Toet, Hierarchical image fusion. Mach. Vis. Appl. 3(1), 1–11 (1990)
7. A. Toet, Adaptive multi-scale contrast enhancement through non-linear pyramid recombination. Pattern Recogn. Lett. 11(11), 735–742 (1990)
8. P. Ajjimarangsee, T.L. Huntsberger, Neural network model for fusion of visible and infrared sensor outputs, in Sensor Fusion: Spatial Reasoning and Scene Interpretation, vol. 1003, (1989), pp. 153–161
9. R.D. Lillquist, Composite visible/thermal-infrared imaging apparatus, Google Patents, 14 Jun 1988
10. N. Nandhakumar, J.K. Aggarwal, Integrated analysis of thermal and visual images for scene interpretation. IEEE Trans. Pattern Anal. Mach. Intell. 10(4), 469–481 (1988)
11. D.W. Ruck, S.K. Rogers, J.P. Mills, M. Kabrisky, Multisensor target detection and classification. Sens. Fusion 931, 14–22 (1988)
12. S.K. Rogers, C.W. Tong, M. Kabrisky, J.P. Mills, Multisensor fusion of ladar and passive infrared imagery for target segmentation. Opt. Eng. 28(8), 288881 (1989)
13. H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform. Graph. Models Image Process. 57, 235–245 (1995)
14. L.J. Chipman, T.M. Orr, L.N. Graham, Wavelets and image fusion, in Proceedings of International Conference on Image Processing, 1995, vol. 3 (1995), pp. 248–251
15. I. Koren, A. Laine, F. Taylor, Image fusion using steerable dyadic wavelet transform, in Proceedings of International Conference on Image Processing, 1995, vol. 3 (1995), pp. 232–235
16. A.M. Waxman et al., Color night vision: Fusion of intensified visible and thermal IR imagery, in Synthetic Vision for Vehicle Guidance and Control, vol. SPIE-2463 (1995), pp. 58–68
17. A.M. Waxman et al., Color night vision: Opponent processing in the fusion of visible and IR imagery. Neural Netw. 10(1), 1–6 (1997)
18. K.V. Prasad, Data fusion in robotics and machine intelligence. Control. Eng. Pract. 1(4), 753–754 (1993)
19. B.V. Dasarathy, Fusion strategies for enhancing decision reliability in multisensor environments. Opt. Eng. 35(3), 603–616 (1996)
20. O. Rockinger, Image sequence fusion using a shift-invariant wavelet transform. Proc. Int. Conf. Image Process. 3, 288–291 (1997)
21. P. Hill, N. Canagarajah, D. Bull, Image fusion using complex wavelets, in 13th Br. Mach. Vis. Conf. (2002), pp. 487–496
22. M. Choi, R.Y. Kim, M.R. Nam, H.O. Kim, Fusion of multispectral and panchromatic satellite images using the curvelet transform. IEEE Geosci. Remote Sens. Lett. 2(2), 136–140 (2005)
23. M. Qiguang, W. Baoshu, A novel image fusion method using contourlet transform, in International Conference on Communications, Circuits and Systems Proceedings, 2006, vol. 1 (2006), pp. 548–552
24. B.Y. Bin Yang, S.L.S. Li, F.S.F. Sun, Image fusion using nonsubsampled contourlet transform, in Fourth Int. Conf. Image Graph. (ICIG 2007) (2007), pp. 719–724
25. R. Shen, I. Cheng, J. Shi, A. Basu, Generalized random walks for fusion of multi-exposure images. IEEE Trans. Image Process. 20(12), 3634–3646 (2011)
26. M. Xu, H. Chen, P.K. Varshney, An image fusion approach based on Markov random fields. IEEE Trans. Geosci. Remote Sens. 49(12), 5116–5127 (2011)
27. B.K. Shreyamsha Kumar, Image fusion based on pixel significance using cross bilateral filter. Signal Image Video Process., 1193–1204 (2013)
28. S. Li, X. Kang, J. Hu, Image fusion with guided filtering. IEEE Trans. Image Process. 22(7), 2864–2875 (2013)
29. A. Toet, M.A. Hogervorst, Multiscale image fusion through guided filtering, in SPIE Security + Defence (2016), pp. 99970J–99970J
30. D.P. Bavirisetti, R. Dhuli, Fusion of infrared and visible sensor images based on anisotropic diffusion and Karhunen-Loeve transform. IEEE Sensors J. 16(1), 203–209 (2016)
31. Y. Jiang, M. Wang, Image fusion using multiscale edge-preserving decomposition based on weighted least squares filter. IET Image Process. 8(3), 183–190 (2014)
32. A. Ardeshir Goshtasby, S. Nikolov, Image fusion: Advances in the state of the art. Inf. Fusion 8(2 SPEC Issue), 114–118 (2007)
33. Z. Zhang, R.S. Blum, A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application. Proc. IEEE 87(8), 1315–1326 (1999)
34. C. Pohl, J.L. Van Genderen, Multisensor image fusion in remote sensing: Concepts, methods and applications. Int. J. Remote Sens., 37–41 (2010)
35. M.B.A. Haghighat, A. Aghagolzadeh, H. Seyedarabi, Multi-focus image fusion for visual sensor networks in DCT domain. Comput. Electr. Eng. 37(5), 789–797 (2011)
36. Q. Miao, J. Lou, P. Xu, Image fusion based on NSCT and Bandelet transform, in Proceedings of the 2012 8th International Conference on Computational Intelligence and Security, CIS 2012 (2012), pp. 314–317
37. D.P. Bavirisetti, R. Dhuli, Fusion of infrared and visible sensor images based on anisotropic diffusion and Karhunen-Loeve transform. IEEE Sens. J. 16(1) (2016)
38. C.S. Xydeas, Objective image fusion performance measure. Electron. Lett. 36(4), 308–309 (2000)
39. V. Petrovic, C. Xydeas, Objective image fusion performance characterisation, in Tenth IEEE International Conference on Computer Vision, 2005. ICCV 2005 (2005), pp. 1866–1871
40. A.A. Goshtasby, 2-D and 3-D Image Registration: For Medical, Remote Sensing, and Industrial Applications (John Wiley & Sons, Hoboken, NJ, 2005)
41. S. Li, J.T. Kwok, Y. Wang, Using the discrete wavelet frame transform to merge Landsat TM and SPOT panchromatic images. Inf. Fusion 3(1), 17–23 (2002)
42. Z. Zhang, R. Blum, Region-based image fusion scheme for concealed weapon detection. Annu. Conf. Inf. Sci. Syst., 168–173 (1997)
43. G. Piella, A general framework for multiresolution image fusion: From pixels to regions. Inf. Fusion 4(4), 259–280 (2003)
44. G. Piella, A region-based multiresolution image fusion algorithm, in Proc. Fifth Int. Conf. Inf. Fusion, FUSION 2002 (IEEE Cat. No. 02EX5997), vol. 2 (2002), pp. 1557–1564
45. M.L. Williams, R.C. Wilson, E.R. Hancock, Deterministic search for relational graph matching. Pattern Recogn. 32(7), 1255–1271 (1999)
46. L.G. Shapiro, Organization of relational models, in Proceedings-International Conference on Pattern Recognition (1982)
47. S. Li, X. Kang, Fast multi-exposure image fusion with median filter and recursive filter. IEEE Trans. Consum. Electron. 58(2), 626–632 (2012)
48. Q. Guihong, Z. Dali, Y. Pingfan, Medical image fusion by wavelet transform modulus maxima. Opt. Express 9(4), 184–190 (2001)
49. W. Gan et al., Infrared and visible image fusion with the use of multi-scale edge-preserving decomposition and guided image filter. Infrared Phys. Technol. 72, 37–51 (2015)
50. Q. Li, L. Chen, M. Li, S.L. Shaw, A. Nüchter, A sensor-fusion drivable-region and lane-detection system for autonomous vehicle navigation in challenging road scenarios. IEEE Trans. Veh. Technol. 63(2), 540–555 (2014)
51. B. Majidi, B. Moshiri, Industrial assessment of horticultural products' quality using image data fusion, in Proceedings of the 6th International Conference on Information Fusion, FUSION 2003, vol. 2 (2003), pp. 868–873
52. H. Irshad, M. Kamran, A.B. Siddiqui, A. Hussain, Image fusion using computational intelligence: A survey, in 2009 Second Int. Conf. Environ. Comput. Sci. (2009), pp. 128–132
53. L. Zheng, S. Wang, L. Tian, F. He, Z. Liu, Q. Tian, Query-adaptive late fusion for image search and person re-identification, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2015), pp. 1741–1750
Chapter 2
Pixel-Level Image Fusion
Abstract Image fusion can be performed at various levels of information representation; a generic classification is pixel, feature, and decision levels. In this chapter, our focus is on pixel-level image fusion. Pixel-level fusion mainly finds applications in image processing where human perception is given higher priority than machine vision. Possible applications are digital photography, medical imaging, remote sensing, surveillance, pilot navigation, and so on. In this chapter, first a brief introduction, the different categories, and traditional pixel-level image fusion approaches are presented. Next, new image fusion methods based on pyramids, wavelet filter banks, and edge-preserving decomposition are introduced. Both qualitative and quantitative analyses are provided for an in-depth discussion of these methods.
2.1 Introduction
If source images are combined by performing pixel-wise operations, the process is referred to as pixel-level image fusion. The main objective of any pixel-level image fusion algorithm is to generate a visually good fused image at a low computational cost, with the following properties:
1. It has to transfer the complementary or useful information of the source images into the composite image.
2. It should not lose source image information during the fusion process.
3. It should not introduce artifacts into the fused image.
In view of this objective, various fusion algorithms have been proposed over the past few decades. As shown in Fig. 2.1, image fusion methods are broadly classified into single-scale and multi-scale fusion methods.
Fig. 2.1 A generic classification of image fusion methods
2.1.1 Single-Scale Image Fusion
Here, fusion is performed directly on the source images at their original scale, without further decomposition. These methods are also referred to as spatial domain techniques. A brief survey of spatial domain methods, together with their advantages and drawbacks, is given in Table 2.1.

Table 2.1 Single-scale fusion methods, their advantages, and drawbacks

1. Average, minimum, maximum, and morphological operators (Ardeshir and Nikolov [1]). Advantages: easy to implement. Drawbacks: reduce the contrast or produce brightness or color distortions.
2. Principal component analysis (PCA) (Yonghong [2]), independent component analysis (ICA) (Mitianoudis and Stathaki [3]), and intensity-hue-saturation (IHS) (Tu et al. [4]). Advantages: computationally efficient. Drawbacks: may suffer from spectral distortion; may give desirable results for only a few fusion datasets.
3. Focus measures (Huang and Jing [5]) and bilateral sharpness criteria (Tian et al. [6]). Advantages: may produce desirable results. Drawbacks: applicable to a few datasets; computationally expensive.
4. Optimization methods (Shen et al. [7]; Xu and Varshney [8]). Advantages: may produce desirable results. Drawbacks: take multiple iterations; computationally expensive; over-smooth the fused image.

Simple operators (Ardeshir and Nikolov [1]) such as the average, weighted average, minimum, maximum, and morphological operators are used for fusion. In the simple average method, the fused image F(i, j) is obtained by a pixel-wise average of the input images A(i, j) and B(i, j), as in Eq. (2.1):

F(i, j) = (A(i, j) + B(i, j)) / 2    (2.1)

In the weighted average method, the fused image F(i, j) is obtained by a pixel-wise weighted average of the input images, as in Eq. (2.2):

F(i, j) = w A(i, j) + (1 − w) B(i, j),   i = 0, 1, ..., m,   j = 0, 1, ..., n    (2.2)

where w is the weight factor and m × n is the image size. In the selective maximum method, the fused image F(i, j) is obtained by applying a pixel-wise maximum operation to the input images:

F(i, j) = max(A(i, j), B(i, j))    (2.3)

In the selective minimum method, the fused image is obtained by a pixel-wise minimum operation:

F(i, j) = min(A(i, j), B(i, j))    (2.4)

These methods are easy to implement, but they may introduce brightness or color distortions into the fused image. (A short code sketch of these simple rules is given at the end of this subsection.)

Principal component analysis (PCA) [2], independent component analysis (ICA) [3], and intensity-hue-saturation (IHS) [4] are some of the well-known methods in the spatial domain category. These methods may suffer from spectral distortion and give desirable results only for a few fusion datasets. Focus measure-based approaches [5] are popular in this class. Here, the source images are divided into blocks, and various focus measures are employed to select the best among the image blocks. Variance, energy of image gradient, the Tenenbaum algorithm (Tenengrad), energy of Laplacian (EOL), sum-modified Laplacian (SML), and spatial frequency (SF) are focus measures that have been used successfully for fusion. Huang and Jing [5] observed that SML gives superior performance compared with other focus measures; however, it is computationally expensive. To address this problem, the bilateral gradient-based sharpness criterion (BGS) (Tian et al. [6]) was proposed for fusion, but it fails to produce a better-focused image and is also computationally demanding. To overcome these problems, optimization-based fusion schemes [7, 8] have been proposed. These methods take multiple iterations to find an optimal solution (the fused image) and, because of the repeated iterations, may over-smooth the fused image.
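To make the four rules in Eqs. (2.1)–(2.4) concrete, the following is a minimal NumPy sketch of single-scale, pixel-wise fusion of two pre-registered grayscale images. The function and parameter names are illustrative and not part of any of the cited methods.

```python
import numpy as np

def simple_fusion(A: np.ndarray, B: np.ndarray,
                  rule: str = "average", w: float = 0.5) -> np.ndarray:
    """Pixel-wise single-scale fusion of two registered grayscale images.

    A, B : arrays of identical shape; rule : "average", "weighted", "max", or "min".
    """
    A = A.astype(np.float64)
    B = B.astype(np.float64)
    if rule == "average":
        return (A + B) / 2.0                # Eq. (2.1)
    if rule == "weighted":
        return w * A + (1.0 - w) * B        # Eq. (2.2)
    if rule == "max":
        return np.maximum(A, B)             # Eq. (2.3)
    if rule == "min":
        return np.minimum(A, B)             # Eq. (2.4)
    raise ValueError(f"unknown rule: {rule}")
```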
2.1.2 Multi-Scale Image Fusion
Multi-scale fusion methods are developed to overcome the drawbacks of single-scale fusion. Multi-scale decomposition (MSD) extracts the salient (visually significant) information of the source images for fusion. MSD methods perform better than single-scale fusion methods for the following reasons:
1. The human visual system (HVS) is sensitive to changes in saliency information such as edges and lines. These features can be well extracted and fused with the help of MSD.
Fig. 2.2 The multi-scale image decomposition (MSD) (ψ)
2. It offers better spatial and frequency resolution.

In MSD, source images are decomposed into approximation and detail coefficients/layers at several scales. For example, as displayed in Fig. 2.2, a given source image I is decomposed into the approximation coefficient C_A^1 and detail coefficient C_D^1 at level 1 (L1). C_A^1 at L1 is further decomposed into C_A^2 and C_D^2 at level 2 (L2). In general, the approximation and detail coefficients at the n-th decomposition level can be represented as C_A^n and C_D^n, respectively. This MSD process is called analysis and is represented as ψ; its inverse process is termed synthesis and is indicated as ψ^−1. In multi-scale image fusion, source images I1 and I2 are decomposed into approximation and detail coefficients as shown in Fig. 2.3. The approximation and detail coefficients of source image I1 at the i-th level are denoted C_A^{i,1} and C_D^{i,1}, respectively; similarly, those of I2 are denoted C_A^{i,2} and C_D^{i,2}. This decomposition process is called analysis. Fusion is performed on the decomposed coefficients, by employing various fusion rules, to obtain the final approximation coefficient C_A^{i,F} and detail coefficient C_D^{i,F}.
Fig. 2.3 A general block diagram of the multi-scale image fusion
All the fused coefficients at the different levels are then combined to obtain the fused image F; this process is termed synthesis. Multi-scale image fusion methods are further classified as:
1. Pyramid-based fusion.
2. Wavelet transform-based fusion.
3. Filtering-based fusion.
2.1.2.1 Pyramid-Based Fusion
During the 1990s, pyramid decomposition-based fusion methods were introduced by Akerman [9]. The basic idea of these methods is as follows: first, decompose the source images into successive sub-images using operations such as blurring and down-sampling; next, apply fusion rules to the decomposed sub-images; finally, reconstruct the fused image from the fused sub-images. The general block diagram of pyramid-based fusion is depicted in Fig. 2.4. As shown in the figure, the source images I1 and I2 are blurred with a linear filter and down-sampled by 2 along the rows and columns. This process is given as

C_P^{i+1,1} = [C_P^{i,1}(x, y) ∗ w(x, y)]↓2,
C_P^{i+1,2} = [C_P^{i,2}(x, y) ∗ w(x, y)]↓2,   i = 0, 1, 2, ..., N    (2.5)

where C_P^{i+1,1} represents the sub-image obtained from the pyramid decomposition of source image I1 at the (i + 1)-th level, which depends on its previous-level sub-image C_P^{i,1}(x, y); C_P^{0,1} represents the input image I1; ∗ denotes convolution; ↓2 denotes down-sampling by 2; w is a linear filter; and N is the number of levels. The same holds for the source image I2. Various fusion rules can be employed on these decomposed sub-images to obtain fused sub-images C_P^{i,F} at the various levels Li. The pyramid is then reconstructed from these fused sub-images to obtain the fused image F.
Fig. 2.4 A general block diagram of pyramid-based image fusion
Gradient (GRAD) [10], Laplacian [11], morphological difference [12], ratio (RATIO) [13], contrast [14], and filter-subtract-decimate (FSD) [10, 11] pyramid-based methods are well-known methods in this class. These methods may produce halo effects near edges.
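To illustrate the pyramid-based pipeline of Eq. (2.5) and Fig. 2.4, the sketch below builds a Gaussian-like pyramid with a 5-tap binomial kernel and fuses the difference (Laplacian-style) layers with a max-absolute rule and the coarsest layer by averaging. The kernel and fusion rules are common illustrative choices, not those of the specific GRAD, RATIO, contrast, or FSD pyramids cited above.

```python
import numpy as np
from scipy.ndimage import convolve

# 5-tap binomial (approximately Gaussian) kernel, made 2-D by an outer product
W = np.outer([1, 4, 6, 4, 1], [1, 4, 6, 4, 1]) / 256.0

def pyramid_decompose(img, levels=4):
    """Blur with w and down-sample by 2 along rows and columns (cf. Eq. 2.5)."""
    layers = [img.astype(np.float64)]
    for _ in range(levels):
        blurred = convolve(layers[-1], W, mode="nearest")
        layers.append(blurred[::2, ::2])
    return layers                                   # C_P^0 (finest) ... C_P^levels (coarsest)

def expand(layer, shape):
    """Up-sample a coarse layer to `shape` and interpolate with the same kernel."""
    up = np.zeros(shape)
    up[::2, ::2] = layer
    return convolve(up, 4.0 * W, mode="nearest")    # factor 4 compensates the inserted zeros

def laplacian_fuse(img1, img2, levels=4):
    """Laplacian-pyramid style fusion: max-abs on difference layers, average on the top."""
    p1, p2 = pyramid_decompose(img1, levels), pyramid_decompose(img2, levels)
    fused = (p1[-1] + p2[-1]) / 2.0                 # coarsest level: average
    for i in range(levels - 1, -1, -1):
        d1 = p1[i] - expand(p1[i + 1], p1[i].shape) # detail layer of I1
        d2 = p2[i] - expand(p2[i + 1], p2[i].shape)
        d = np.where(np.abs(d1) >= np.abs(d2), d1, d2)
        fused = expand(fused, p1[i].shape) + d      # reconstruct while fusing
    return fused
```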
2.1.2.2 Wavelet Transform-Based Fusion
The next class of fusion schemes in the multi-resolution category is based on discrete wavelet transform (DWT) decompositions [15]. The DWT is preferred over pyramids for several reasons: it provides a compact representation and directional information of a given image, which makes it suitable for fusion, and wavelet methods produce fewer blocking effects than pyramid methods. In DWT-based fusion, each source image is decomposed into wavelet coefficients at various levels, the wavelet coefficients are fused using different fusion rules, and finally the inverse wavelet transform is applied to the fused wavelet coefficients to obtain the desired fused image.

A general block diagram of wavelet-based fusion is shown in Fig. 2.5. As depicted in the figure, the source image I1 is decomposed into a set of four sub-images in various directions. {CLL}^1 is the approximation coefficient of I1, which represents the low-frequency content; {CLH}^1, {CHL}^1, and {CHH}^1 are the detail coefficients in the horizontal, vertical, and diagonal directions, respectively. The approximation coefficient {CLL}^1 at level 1 is further decomposed into an approximation coefficient and detail coefficients in the horizontal, vertical, and diagonal directions at level 2, and so on. The same is true for the source image I2. By employing various fusion rules, the wavelet coefficients of the source images at the various levels are combined, and the final fused image F is generated by applying the inverse wavelet transform to the fused coefficients.

Fig. 2.5 A general block diagram of wavelet-based image fusion

The DWT is shift-variant because of its multi-rate operations, and this shift variance may introduce artifacts into the fused image. To overcome this problem, the shift-invariant discrete wavelet transform (SIDWT) has been introduced; a SIDWT image fusion method can be found in Rogers [16]. Image fusion is also carried out using recent transforms such as the curvelet transform [17], the non-subsampled contourlet transform [18], multi-resolution singular value decomposition (MSVD) [19], high-order singular value decomposition [20], empirical mode decomposition [21], the discrete cosine harmonic wavelet transform (DCHWT) [22], shearlets [23], and so on.
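A minimal sketch of DWT-based fusion using the PyWavelets package is given below, with the commonly used average rule for the approximation band and the max-absolute rule for the detail bands; the wavelet name and level count are illustrative defaults, not the specific choices of the methods cited above.

```python
import numpy as np
import pywt

def dwt_fusion(img1: np.ndarray, img2: np.ndarray,
               wavelet: str = "db2", levels: int = 3) -> np.ndarray:
    """Fuse two registered grayscale images in the wavelet domain."""
    c1 = pywt.wavedec2(img1.astype(np.float64), wavelet, level=levels)
    c2 = pywt.wavedec2(img2.astype(np.float64), wavelet, level=levels)

    fused = [(c1[0] + c2[0]) / 2.0]                   # approximation band: average
    for (h1, v1, d1), (h2, v2, d2) in zip(c1[1:], c2[1:]):
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip((h1, v1, d1), (h2, v2, d2))))
    return pywt.waverec2(fused, wavelet)              # inverse transform of fused coefficients
```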
2.1.2.3 Filtering-Based Fusion
The next category of multi-scale fusion is filtering-based techniques. These techniques use image filters, either edge-preserving decomposition (EPD) filters [24] or non-EPD filters (e.g., the average filter), to perform the decomposition (these filtering concepts are discussed thoroughly in later sections). In this category, each source image is decomposed into approximation/base layers containing the large-scale variations in intensity and detail layers containing the small-scale variations in intensity. Separate fusion rules are employed to combine the decomposed base and detail layers, and the final base and detail layers are then combined to generate the fused image. A generic block diagram of filtering-based fusion methods is shown in Fig. 2.6, where ψ and ψ^−1 represent the filtering-based image decomposition and reconstruction processes, respectively.

Fig. 2.6 A general block diagram of filtering-based image fusion

First, the two source images {I_n}, n = 1, 2, are decomposed into base layers containing large-scale variations in intensity and detail layers containing small-scale variations in intensity as

B_n^{k+1} = B_n^k ∗ w,   n = 1, 2,   k = 0, 1, ..., K    (2.6)

where B_n^{k+1} is the base layer of the n-th source image at level k + 1, which depends on its previous-level base layer B_n^k; B_n^0 is the n-th input image I_n; ∗ denotes convolution; w is an image filter, which can be an EPD or a non-EPD filter; and K is the number of levels. The detail layer D_n^{k+1} at the present level k + 1 is obtained by subtracting the base layer B_n^k at the previous level k from the base layer B_n^{k+1} at level k + 1:

D_n^{k+1} = B_n^{k+1} − B_n^k    (2.7)
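The two equations above can be illustrated with a single-level, non-EPD decomposition: a box (average) filter produces the base layer, the detail layer is taken as the image minus its base, and generic average/max-absolute rules recombine them. The filter size and fusion rules below are illustrative assumptions, not those of the ADF, TIF, or MSSSF methods introduced later in this chapter.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def two_scale_fusion(img1, img2, size=31):
    """Two-scale (base/detail) fusion with a non-EPD averaging filter."""
    b1 = uniform_filter(img1.astype(np.float64), size=size)   # base layer of I1
    b2 = uniform_filter(img2.astype(np.float64), size=size)   # base layer of I2
    d1, d2 = img1 - b1, img2 - b2                              # detail layers

    base_f = (b1 + b2) / 2.0                                   # average rule for base layers
    detail_f = np.where(np.abs(d1) >= np.abs(d2), d1, d2)      # max-absolute rule for detail layers
    return base_f + detail_f                                   # reconstruction
```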
EPD filters decompose source images into base and detail layers while preserving edge information; thus, these filters may provide desirable decompositions for fusion. The weighted least squares filter [25], L1 fidelity using the L0 gradient [26], L0 gradient minimization [27], the combination of weighted least squares and guided image filters [24], the guided image filter [28], and the cross bilateral filter (CBF) [29] are recently proposed EPD-based fusion methods.

In this chapter, new multi-scale image fusion methods are presented to address the problems of existing methods. They are:
1. A pyramid image fusion method based on integrated edge and texture information.
2. An image fusion method based on the expected maximum and discrete wavelet frames.
3. An image fusion method based on optimal wavelet filter banks.
4. Anisotropic Diffusion-based Fusion for infrared and visible sensor images (ADF).
5. Two-scale Image Fusion of visible and infrared images using visual saliency detection (TIF).
6. Maximum Symmetric Surround Saliency detection-based multi-focus image Fusion (MSSSF).
Of these six methods, the first is based on pyramid decomposition, the second and third are developed in the wavelet domain, the fourth is an EPD-based fusion method, and the fifth and sixth are non-EPD-based image fusion methods.
2.2 Pyramid Image Fusion Method Based on Integrated Edge and Texture Information

2.2.1 Background
The simplest method of image fusion is to take a weighted average of the original images. The advantage of this method is that it is simple and has good real-time performance, but it has the negative effect of reducing the contrast of the image. Image pyramid decomposition closely resembles the way the human visual system observes a scene. Multi-scale decomposition techniques include the wavelet transform, multi-rate filter representations, and pyramid transforms, and pyramid-based fusion is considered one of the most promising fusion approaches. Although the wavelet transform exhibits many advantages in image representation, such as orthogonality, direction sensitivity, and noise reduction, and in these respects outperforms pyramid decomposition, commonly used wavelets are asymmetric, so the transform has poor shift invariance, which degrades the fusion results. Therefore, a new pyramid-based image fusion method is developed to avoid the poor shift invariance of the wavelet and to make up for the shortcomings of traditional pyramid structures in extracting texture and edge features.
2.2.2 Fusion Framework
The pyramid image fusion method that combines edge and texture information improves the quality of the fused image and achieves good practical results. The pyramid decomposition of the original image is obtained with a Gaussian filter, considering the linear relationship between the binarized Gaussian filter, the texture extraction filters, and the edge extraction filters. The coefficients corresponding to the texture and edge images are obtained by the singular value decomposition method, and each layer of the decomposed image is represented using the feature information at that scale. The fusion process is as follows: first, calculate the similarity and saliency measures of each pair of texture and edge images of the two source images; next, adopt a suitable fusion strategy (selecting the maximum or taking a weighted average) according to the saliency measures, which yields a fused texture-and-edge pyramid representation; finally, reconstruct the fused image from this representation.
2.2.3 Pyramid Image Fusion of Edge and Texture Information: Specific Steps
The pyramid image fusion method using integrated edge and texture information is shown in the flowchart of Fig. 2.7, and the decomposition and reconstruction structure used is shown in Fig. 2.8. The specific steps are as follows.

Fig. 2.7 Diagram of the image fusion scheme using integrated edge and texture information

Fig. 2.8 Block diagram of pyramid decomposition and reconstruction based on texture and gradient features

1. Create a structure for pyramid decomposition and reconstruction based on edge and texture information. Decomposing the image into a pyramid representation of texture and edge information must satisfy the reconstruction condition

(1 − ẇ_new) = Σ_{i=1}^{25} t_i (T_i ∗ T_i) + Σ_{i=1}^{4} c_i (D_i ∗ D_i)    (2.8)
Here, t_i and c_i are undetermined coefficients, which can be obtained by the singular value decomposition method; the T_i are the Laws texture extraction filters, and the D_i are the edge extraction filters. In establishing the pyramid decomposition and reconstruction based on edge and texture information, the main task is to find these undetermined coefficients. The five Laws texture kernel vectors are:

l5 = [ 1   4   6   4   1 ]
e5 = [−1  −2   0   2   1 ]
s5 = [−1   0   2   0  −1 ]
u5 = [−1   2   0  −2   1 ]
r5 = [ 1  −4   6  −4   1 ]    (2.9)
T_i represents the twenty-five 9 × 9 filters obtained by cross-convolution and self-convolution of these kernel vectors, followed by convolutional expansion. The four edge extraction filters are as follows:

d1 = [  0    0    0            d2 = [  0     0   −0.5
       −1    2   −1                    0     1    0
        0    0    0  ]               −0.5    0    0  ]

d3 = [  0   −1    0            d4 = [ −0.5   0    0
        0    2    0                    0     1    0
        0   −1    0  ]                 0     0   −0.5 ]
Similarly to the processing of the texture extraction filters, the four edge extraction filters D_i are obtained by convolutional expansion. The coefficients are determined by the singular value decomposition of Eq. (2.8), which yields the pyramid decomposition and reconstruction structure.

2. Pyramid decomposition of an image. The edge extraction filters and texture extraction filters together constitute 29 feature extraction filters F_l (l = 1, 2, ..., 29). The pyramid decomposition can be written as

L_kl = f_l (F_l ∗ F_l) ∗ G_k + ẇ_new ∗ G_k    (2.10)

Here, F_l and f_l are the feature extraction filters and their corresponding coefficients, respectively, and L_kl is the decomposed feature image.
3. Layer-by-layer fusion processing. After the images are decomposed into texture-based and edge-based pyramid forms, the fusion method adopts a strategy based on a similarity measure and a saliency measure (a code sketch of this rule is given after these steps). Denote the k-th layer in the l-th direction of an image by L_kl. First, the activity of the pyramid decomposition coefficients of the two images is calculated. Suppose the saliency measures of the two decomposition coefficients are S_A(p) and S_B(p), respectively. A window-based measure is used, with a 3 × 3 window whose template coefficients are

α = (1/16) [ 1  1  1
             1  8  1
             1  1  1 ]

The saliency measure is

S²(p) = Σ_{s∈S, t∈T} α(s, t) L_kl(m + s, n + t, k, l)²

The similarity measure is

M_AB(p) = 2 Σ_{s∈S, t∈T} α(s, t) L_kl^A(m + s, n + t, k, l) L_kl^B(m + s, n + t, k, l) / [S_A²(p) + S_B²(p)]

If the similarity measure M_AB ≥ α, then

ω_A = 1/2 − (1/2)(1 − M_AB)/(1 − β),   and   ω_B = 1 − ω_A.

If the similarity measure M_AB < α, then

ω_A = 1   if S_A > S_B,
ω_A = 0   otherwise,

and ω_B = 1 − ω_A. Finally, the fusion strategy is

L_kl^F(p) = ω_A(p) L_kl^A(p) + ω_B(p) L_kl^B(p)    (2.11)
4. Pyramid reconstruction of the fused image based on texture and edge information. Using the fused coefficients L_kl^F obtained from Eq. (2.11), the inverse pyramid transform based on texture and edge information is computed to reconstruct the fused image. The top image G_n represents the low-pass information of the image; it is interpolated to obtain a 2M × 2M image, bringing it up to the size of the texture and edge feature images at the level below. Since the coefficients t_i and c_i obtained in step 1 satisfy the reconstruction condition, the texture and edge images are multiplied by these coefficients and then added to the interpolated top (low-pass) image to obtain the low-pass image G_{n−1}; repeating this procedure level by level yields the fused image.
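As a concrete illustration of the fusion rule in step 3, the sketch below computes window-based saliency and match measures for one pair of decomposed layers and blends or selects coefficients accordingly. It is a minimal NumPy/SciPy sketch: the function name and the single parameter thr (used both as the match threshold and as the constant in the weight formula, in place of the α and β above) are assumptions, not values from the text.

```python
import numpy as np
from scipy.ndimage import convolve

# 3x3 window template used in step 3 (weights sum to 1)
ALPHA = np.array([[1, 1, 1],
                  [1, 8, 1],
                  [1, 1, 1]], dtype=np.float64) / 16.0

def match_saliency_fuse(LA, LB, thr=0.75):
    """Combine one pair of decomposed feature layers LA, LB using
    window-based saliency and match measures."""
    SA = convolve(LA * LA, ALPHA, mode="nearest")                    # local saliency of A
    SB = convolve(LB * LB, ALPHA, mode="nearest")                    # local saliency of B
    M = 2.0 * convolve(LA * LB, ALPHA, mode="nearest") / (SA + SB + 1e-12)  # match measure

    wA = np.where(SA > SB, 1.0, 0.0)                                 # selection when match is low
    blend = 0.5 - 0.5 * (1.0 - M) / (1.0 - thr)                      # weighted average when match is high
    wA = np.where(M >= thr, blend, wA)
    wB = 1.0 - wA
    return wA * LA + wB * LB
```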
2.2.4 Beneficial Effects
After the pyramid expansion of an image based on edge and texture information, the image fully reflects its edge and texture features at each scale. Fusing the images level by level on such a pyramid decomposition allows the fused image to fully reflect the features of the original images, which is necessary for subsequent image recognition. The pyramid image fusion method based on texture and edge information greatly improves the quality of the fused image, which is of practical value for subsequent processing in application systems and for image display.
2.3 Image Fusion Method Based on the Expected Maximum and Discrete Wavelet Frames

2.3.1 Introduction
The simplest fusion approach is to take the average of the input images, pixel by pixel, to create a new composite image. However, this approach is not appropriate, since it creates a blurred image in which details are reduced. Image fusion approaches based on multi-resolution representations are now widely used. The basic idea of these approaches is to perform a multi-resolution transform (MST) on the source images and to construct a composite multi-resolution representation using an appropriate fusion rule; the fused image is then obtained by taking the inverse multi-resolution transform (IMST). The MST approaches include the Laplacian pyramid, the ratio of low-pass pyramid, and the discrete wavelet transform (DWT). Because of an underlying down-sampling process, the fusion results of these approaches are shift-dependent: when there is a slight camera or object movement, or when the source images are misregistered, the performance of these MST approaches [15, 30] degrades quickly. For the discrete wavelet frame (DWF) transform, each frequency band has the same size, because the DWF uses dilated analysis filters instead of down-sampling the input signal. The DWF transform [31], which is more suitable for image fusion, has the properties of freedom from aliasing and shift invariance.

Theoretically, the DWF decomposition process can be continued until the low-pass approximation is just the mean information contained within a (1 × 1) pixel signal. This is not desirable in practice, because at low resolution feature selection becomes less accurate and prone to bias toward one of the input multi-resolution representations; this can cause large-scale ringing artifacts and significantly degrade the quality of the fused image. Thus, the decomposition is halted before the theoretical (1 × 1) pixel minimum is reached. In this case, the low-frequency band contains only the very large-scale features that form the background of the input images and are important for their natural appearance, while the high-frequency bands show the detailed information.

For the fusion rule in the low-frequency band, there are two appropriate choices: weighted averaging and estimation methods [32, 33]. The weighted-average fusion method averages the input low-frequency bands to compose a single low-frequency band. The estimation fusion methods formulate the fusion result in terms of estimated parameters of an imaging model; the fusion result for the low-frequency band is obtained by maximizing its likelihood function. For the fusion rule in the high-frequency band, the basic approach is absolute-value maximum selection, i.e., the largest absolute values in the sub-bands are retained for reconstruction, since the largest absolute values correspond to features in the image such as edges, lines, and region boundaries. Recently, estimation theory has been used to improve the efficiency of image fusion algorithms. However, these approaches are all based on the assumption that the disturbance follows a Gaussian distribution. Since natural images actually follow a Gaussian scale mixture distribution in multi-resolution space, the Gaussian assumption might mistreat the useful signal as disturbance and hence degrade the quality of the fused image.

In the proposed method, the registered images are decomposed using the DWF transform, which yields multi-resolution representations with both low-frequency coarse information and high-frequency detail information. We assume that there exists an optimal low-frequency band underlying the source images; thus, the low-frequency fusion problem is formulated as a parameter estimation problem, and the EM algorithm [33] is used to estimate these parameters. A new measure is used to characterize important image information, and maximum selection on this measure is implemented to improve the robustness of the fusion algorithm; this informative importance measure is applied to the high frequencies. The final fused image is obtained by taking the inverse transform of the fused low-frequency and high-frequency multi-resolution representations.
2.3.2 Discrete Wavelet Frame Multi-Resolution Transform
The DWF MST is aliasing-free and translation-invariant. For this reason, when this MST method is used in an image fusion system, better fusion results may be expected. Figure 2.9 illustrates the i-th stage of the two-dimensional (2-D) DWF introduced in [31], where a particular pair of analysis filters h(x) and g(x) corresponding to a particular type of wavelet is used. Here, S_0 is the original image, and the processing is applied recursively at each decomposition level. Figure 2.9 shows that after one stage of processing, an image is decomposed into four frequency bands: low-low (LL), low-high (LH), high-low (HL), and high-high (HH). They contain the coarse information and the vertical, horizontal, and diagonal high-frequency information, respectively. A DWF transform with N decomposition levels has M = 3N + 1 such bands. For the i-th decomposition level, S_LL^i is processed to produce S_LL^{i+1}, D_LH^{i+1}, D_HL^{i+1}, and D_HH^{i+1}. Because the DWF uses dilated analysis filters instead of down-sampling the input signal, each frequency band has the same size as the source image. On the contrary, for the DWT the sub-band images do not have the same size as the source images. Figure 2.9 shows the DWF transform of the "Lena" image; the transformed image and the source image have the same size.

Fig. 2.9 One stage of 2-D DWF decomposition (a) and reconstruction (b)

D_{i+1}(n) = [g]_{↑2^i} ∗ S_i(k)    (2.12)

S_{i+1}(n) = [h]_{↑2^i} ∗ S_i(k)    (2.13)

where the analysis filters [h]_{↑2^i} and [g]_{↑2^i} at level i are obtained by inserting the appropriate number of zeros between the taps of the prototype filters. The reconstruction process is similarly computed via the 1-D synthesis filters h̃(x) and g̃(x):

S(n) = h̃_N ∗ S_N(k) + Σ_{i=1}^{N} g̃_i ∗ D_i(k)    (2.14)

where h̃_N = [h̃]_{↑2^N} and g̃_i = [g̃]_{↑2^i}, and h̃(x) and g̃(x) are the synthesis filters corresponding to the analysis filters h(x) and g(x), respectively.
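The following sketch implements a simplified, shift-invariant wavelet frame of the kind described by Eqs. (2.12)–(2.14), in the à trous form: the low-pass prototype filter is dilated at each level by inserting zeros between its taps, the detail band is taken as the difference of successive approximations, and reconstruction reduces to a plain sum. The B3-spline prototype and this trivial synthesis choice are assumptions for illustration only; they are not the 9/7 filter pair used in the experiments later in this section.

```python
import numpy as np
from scipy.ndimage import convolve1d

H = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0   # B3-spline low-pass prototype

def atrous_kernel(level):
    """Insert 2**level - 1 zeros between the taps of the prototype filter (cf. Eq. 2.13)."""
    k = np.zeros((len(H) - 1) * 2 ** level + 1)
    k[:: 2 ** level] = H
    return k

def dwf_decompose(img, levels=3):
    """Undecimated (a trous) decomposition: every band keeps the image size."""
    s = img.astype(np.float64)
    details = []
    for i in range(levels):
        k = atrous_kernel(i)
        s_next = convolve1d(convolve1d(s, k, axis=0, mode="mirror"), k, axis=1, mode="mirror")
        details.append(s - s_next)      # detail band D_{i+1} = S_i - S_{i+1}
        s = s_next
    return details, s                   # high-frequency bands and low-pass residual S_N

def dwf_reconstruct(details, approx):
    """Exact reconstruction for this filter choice: S_0 = S_N + sum_i D_i (cf. Eq. 2.14)."""
    return approx + np.sum(details, axis=0)
```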
2.3.3 Basic Structure of the New Fusion Scheme
The basic structure of the new fusion scheme is shown in Fig. 2.10. The multi-resolution process can be summarized as follows:

Fig. 2.10 The proposed multi-resolution image fusion scheme

1. Decompose the source images into multi-resolution representations with both low-frequency coarse information and high-frequency detail information, including the vertical, horizontal, and diagonal high-frequency information:
SS(A) = {D_1(A), ..., D_i(A), ..., D_N(A), S_N(A)},
SS(B) = {D_1(B), ..., D_i(B), ..., D_N(B), S_N(B)},
where D_i(A) and D_i(B) are the high-frequency detail information of the input images, D_i = {D_LH^i, D_HL^i, D_HH^i}, and S_N(A) and S_N(B) are the low-frequency coarse information, S_N = S_LL^N. The high-frequency detail information carries the detailed features of the source images, such as edges and texture, while the low-frequency coarse information shows the large-scale features that form the background of the source images. The relative importance of the high and low frequencies mainly depends on the effective information content, which varies with the source images.
2. Optimally estimate the low-frequency coarse information S_N(F) of the fused image from S_N(A) and S_N(B).
3. Apply the feature selection rule to the high-frequency detail information, taking the important image information into account: D_i(F) = Rule(D_i(A), D_i(B)).
4. Construct the multi-resolution representation of the fused image as SS(F) = {D_1(F), ..., D_i(F), ..., D_N(F), S_N(F)}.
5. Perform the inverse discrete wavelet frame transform (IDWF) to obtain the final fused image F.
2.3.4 Fusion of the Low-Frequency Band Using the EM Algorithm
Before the proposed EM method is applied to the low-frequency band, the image model should be defined as

S_N(X, j) = α(X, j) S_N(F, j) + β(X, j) + ε(X, j)    (2.15)

where X = A or B is the source image index; j denotes the pixel location in the low-frequency band; S_N(X, j) denotes the low-frequency band of image X at the j-th pixel; S_N(F, j) represents the optimally fused low-frequency band at the j-th pixel, which is a parameter to be estimated; α(X, j) = 1 or 0 is the sensor selectivity factor; β(X, j) is the bias of the image, which reflects the mean of the low-frequency image; and ε(X, j) is the random noise, which is modeled by a K-term mixture of Gaussian probability density functions (pdfs), that is,

f_{ε(X,j)}(ε(X, j)) = Σ_{k=1}^{K} λ_{k,X}(j) [2π σ²_{k,X}(j)]^{−1/2} exp{ −ε(X, j)² / (2σ²_{k,X}(j)) }    (2.16)

where the λ_{k,X}(j) are the weights of the K-term Gaussian mixture and the σ²_{k,X}(j) are the variances of its components. The image formation model is generally different at each location j. However, to a first-order approximation, the selectivity factor and the parameters of the pdf in Eqs. (2.15) and (2.16) can be considered constant over a small region of neighboring j. Estimation is performed in a neighborhood centered at j. In this study, a neighborhood of size L = 5 × 5 is selected, and j scans all pixels in the low-frequency image; when the boundary is reached, symmetric mirror extension is used to extend the boundary by 2 pixels. Inside the neighborhood, we assume that the model parameters β(X, j), λ_{k,X}(j), σ²_{k,X}(j), and α(X, j) are constant. For simplicity, we drop the index j of these parameters in the sequel, writing β(X), λ_{k,X}, σ²_{k,X}, and α(X).

The EM algorithm is a general method for finding the maximum-likelihood estimate of the parameters of an underlying distribution from a given dataset in which the data are incomplete or have missing values. The first step in deriving the EM algorithm is the specification of the complete and incomplete datasets. For the image formation model of Eqs. (2.15) and (2.16), the incomplete dataset Y consists of the observed data

Y = { S_N(X, l) : X = A or B,  l = 1, ..., L }    (2.17)

where l indexes the coefficient location in the small region of neighboring j, X denotes the source image A or B, and S_N(X, l) denotes the coefficient at location l in region j of the low-frequency band of source image A or B. The complete dataset Y_c is defined as

Y_c = { S_N(X, l), k(X, l) : X = A or B,  l = 1, ..., L }    (2.18)

where k(X, l) identifies which term in the Gaussian mixture pdf of Eq. (2.16) produces the additive distortion sample in the observation S_N(X, l). The common parameter set is F = { S_N(F, l), β(X), λ_{k,X}, σ²_{k,X}, α(X); X = A or B; l = 1, ..., L; k = 1, ..., K }. The number of terms K in the Gaussian mixture pdf model of Eq. (2.16) is assumed to be fixed; we choose K = 2. A standard technique of the SAGE version of the EM algorithm is used to derive the iterative estimation equations. The elements of the incomplete data Y in Eq. (2.17) are independently and identically distributed with marginal pdf
S_N(X, l) | F ~ h(S_N(X, l) | F) = Σ_{k=1}^{K} λ_{k,X} (2πσ²_{k,X})^{−1/2} exp{ −[S_N(X, l) − β(X) − α(X) S_N(F, l)]² / (2σ²_{k,X}) }    (2.19)

where the symbol ~ denotes that the two sides are identically distributed, and h(S_N(X, l)|F) is the marginal probability density function of the incomplete data Y conditioned on the common parameter set F. The elements of the complete data Y_c in Eq. (2.18) are independent, with marginal probability density function

(S_N(X, l), k(X, l)) | F ~ h_c(S_N(X, l), k(X, l) | F) = λ_{k,X} (2πσ²_{k,X})^{−1/2} exp{ −[S_N(X, l) − β(X) − α(X) S_N(F, l)]² / (2σ²_{k,X}) }    (2.20)

where h_c(S_N(X, l), k(X, l)|F) is the marginal probability density function of the complete data Y_c conditioned on F. The conditional distribution of k(X, l) given S_N(X, l) and F is

g_{k,Xl}[S_N(X, l)] = h_c(S_N(X, l), k(X, l)|F) / h(S_N(X, l)|F)
 = λ_{k,X} (2πσ²_{k,X})^{−1/2} exp{ −[S_N(X, l) − β(X) − α(X) S_N(F, l)]² / (2σ²_{k,X}) } / Σ_{p=1}^{K} λ_{p,X} (2πσ²_{p,X})^{−1/2} exp{ −[S_N(X, l) − β(X) − α(X) S_N(F, l)]² / (2σ²_{p,X}) }    (2.21)

The joint probability density functions of the incomplete and complete datasets are

h(Y|F) = Π_{X=A,B} Π_{l=1}^{L} h(S_N(X, l)|F)   and   h_c(Y_c|F) = Π_{X=A,B} Π_{l=1}^{L} h_c(S_N(X, l), k(X, l)|F)
Each iteration of the EM algorithm involves two steps: the expectation step (E step) and the maximization step (M step). The E step of the EM algorithm performs an average over complete data, conditioned on the incomplete data to produce the cost function
Q(F′|F) = E{ ln h_c(Y_c|F′) | Y, F }
 = Σ_{X=A,B} Σ_{l=1}^{L} E{ ln h_{c,Xl}(S_N(X, l), k(X, l)|F′) | S_N(X, l), F }
 = B + Σ_{X=A,B} Σ_{l=1}^{L} Σ_{k=1}^{K} ln λ′_{k,X} g_{k,Xl}[S_N(X, l)]
   − (1/2) Σ_{X=A,B} Σ_{l=1}^{L} Σ_{k=1}^{K} { ln σ′²_{k,X} + [S_N(X, l) − β′(X) − α′(X) S_N(F, l)]² / σ′²_{k,X} } g_{k,Xl}[S_N(X, l)]    (2.22)
where B is a term independent of F′. The EM algorithm updates the parameter estimates to new values F′ that maximize Q(F′|F) in Eq. (2.22); this is the M step of the EM algorithm. In order to maximize Q(F′|F) analytically, we update each parameter one at a time. Because α(X) is discrete, α′(X) is updated to the value from the set {−1, 0, +1} that maximizes Eq. (2.22) with all the other parameters kept at their old values: S′_N(F, l) = S_N(F, l), λ′_{k,X} = λ_{k,X}, and σ′²_{k,X} = σ²_{k,X}. The optimal fused low-frequency coarse information S′_N(F, l) is obtained by maximizing Eq. (2.22) analytically, solving ∂Q/∂S_N(F, l) = 0 with the updated α′(X) and the old values of the other parameters. The updated estimates of λ′_{k,X} and σ′²_{k,X} are obtained by solving ∂Q/∂λ_{k,X} = 0 and ∂Q/∂σ_{k,X} = 0, respectively, for k = 1, ..., K.

Initial values of the parameters are required to start the EM algorithm. A simple estimate of S_N(F, l) is the weighted average of the low-frequency bands of the source images

S_N(F, l) = Σ_{X=A,B} w_X S_N(X, l)    (2.23)

where Σ_{X=A,B} w_X = 1. The simplest case is to use an equal weight for each source image, that is, w_X = 1/q, where q is the number of source images. A simple initialization is α(X) = 1 for X = A or B. To model the distortion in a robust way, the distortion is initialized as impulsive: we initialize the distortion parameters with λ_{1,X} = 0.8 and λ_{2,X} = ... = λ_{K,X} = 0.2/(K − 1). Then we set σ²_{k,X} = γ σ²_{k−1,X}, k = 2, ..., K, where the choice of σ²_{1,X} is based on an estimate of the total variance σ²_X = Σ_{k=1}^{K} λ_{k,X} σ²_{k,X} given by

σ²_X = Σ_{l=1}^{L} [S_N(X, l) − S_N(F, l)]² / L    (2.24)

where L = h × h. We choose γ = 10 so that the initial distortion model is fairly impulsive. The sensor bias is
β_X = Σ_{l=1}^{L} S_N(X, l) / L    (2.25)

and is equal to the mean value in the small region. This initialization scheme worked very well for the cases we have studied; we observed that in our experiments the algorithm generally converged in fewer than five iterations to obtain the fusion result S_N(F) in each local analysis window. According to the preceding derivation, the standard SAGE version of the EM algorithm can be summarized as the following iterative procedure:

1. Compute the conditional probability density

g_{k,X,l}[S_N(X, l)] = λ_{k,X} (2πσ²_{k,X})^{−1/2} exp{ −[S_N(X, l) − α(X) S_N(F, l) − β(X)]² / (2σ²_{k,X}) } / Σ_{p=1}^{K} λ_{p,X} (2πσ²_{p,X})^{−1/2} exp{ −[S_N(X, l) − α(X) S_N(F, l) − β(X)]² / (2σ²_{p,X}) }    (2.26)

2. Update the parameter α(X) by giving it the value from the set {−1, 0, +1} that maximizes

Q = −(1/2) Σ_{X=A,B} Σ_{l=1}^{L} Σ_{k=1}^{K} { ln σ²_{k,X} + [S_N(X, l) − α(X) S_N(F, l) − β(X)]² / σ²_{k,X} } g_{k,X,l}[S_N(X, l)]    (2.27)

3. Recalculate the conditional probability density g_{k,X,l}[S_N(X, l)] and the bias β(X); then update

S′_N(F, l) = [ Σ_{X=A,B} α′(X) Σ_{k=1}^{K} [S_N(X, l) − β(X)] g_{k,X,l}(S_N(X, l)) / σ²_{k,X} ] / [ Σ_{X=A,B} α′(X)² Σ_{k=1}^{K} g_{k,X,l}(S_N(X, l)) / σ²_{k,X} ]    (2.28)

β′(X) = [ Σ_{l=1}^{L} Σ_{k=1}^{K} g_{k,X,l}(S_N(X, l)) [S_N(X, l) − α′(X) S′_N(F, l)] / σ²_{k,X} ] / [ Σ_{l=1}^{L} Σ_{k=1}^{K} g_{k,X,l}(S_N(X, l)) / σ²_{k,X} ]    (2.29)

4. Recalculate g_{k,X,l} and β(X) and update the model parameters λ_{k,X} and σ²_{k,X}:

λ′_{k,X} = (1/L) Σ_{l=1}^{L} g_{k,X,l}(S_N(X, l)),   k = 1, ..., K,   X = A, B    (2.30)

σ′²_{k,X} = Σ_{l=1}^{L} [S_N(X, l) − α′(X) S′_N(F, l) − β′(X)]² g_{k,X,l}(S_N(X, l)) / Σ_{l=1}^{L} g_{k,X,l}(S_N(X, l)),   k = 1, ..., K,   X = A, B    (2.31)

5. Repeat steps 1–4 using the new parameters S′_N(F, l), α′(X), λ′_{k,X}, σ′_{k,X}, and β′(X). When all of the parameters have converged to a fixed range at each location of the low-frequency images, we obtain the optimal fusion result S_N(F).
2.3.5 The Selection of the High-Frequency Band Using the Informative Importance Measure
In the pattern-selective fusion scheme, we propose a new measure to characterize important image information, which models early retinal processing [34]. A quantitative estimate of this important information is provided by the measure of uncertainty in the pixel-to-neighbors interaction. Two sources of such uncertainty must be considered: luminance uncertainty and topological uncertainty:

PI_X(m, n) = C(m, n) I(m, n)    (2.32)

where PI_X(m, n) indicates the important information of the wavelet frame coefficient, C(m, n) is the absolute value of the wavelet frame coefficient (high-frequency band) and reflects the luminance uncertainty, and I(m, n) denotes the topological uncertainty:

C(m, n) = |D_X(m, n)|    (2.33)

where D_X(m, n) is the high-frequency coefficient D_LH^i, D_HL^i, or D_HH^i; we omit the superscript i and the subscripts LH, HL, HH that denote the scale and the directions, respectively, as in Figs. 2.9 and 2.10. The subscript X indicates the source image A or B, and (m, n) is the location of the high-frequency coefficient, m being the row number and n the column number.

Considering the relationship among neighboring coefficients, the sign of the high-frequency coefficient is decided first:

sign(m, n) = sign(D_X(m, n))    (2.34)

where sign(·) is 1 if the high-frequency coefficient is equal to or greater than zero, and 0 otherwise. We then have

I(m, n) = P_X(m, n) [1 − P_X(m, n)]    (2.35)

where P_X(m, n) is the probability of finding the surrounding coefficients in the same state (sign) as the central coefficient at position (m, n):

P_X(m, n) = (1/8) Σ_{i=m−1}^{m+1} Σ_{j=n−1}^{n+1} sign(i, j),        if sign(m, n) = 1
P_X(m, n) = 1 − (1/8) Σ_{i=m−1}^{m+1} Σ_{j=n−1}^{n+1} sign(i, j),    if sign(m, n) = 0    (2.36)

Two extreme situations, when all neighbors and the central pixel are in the same state (a flat region, p = 1) and when none of the neighbors is in the central pixel's state (an outlier, p = 0), have the same intuitively expected result I(m, n) = 0 and consequently PI_X(m, n) = 0: there is no "edginess" at that location. We then obtain the fusion selection rule

D_F(m, n) = D_A(m, n)   if PI_A(m, n) ≥ PI_B(m, n),
D_F(m, n) = D_B(m, n)   if PI_A(m, n) < PI_B(m, n),   (m, n) ∈ E    (2.37)

where D_A and D_B denote the high-frequency wavelet frame coefficients of the source images A and B, and D_F is the wavelet frame coefficient of the high-frequency band of the fused representation SS(F). Finally, when the optimal fused low-frequency band S_N(F) and the fused high-frequency bands D_i are obtained, the final fused image F is achieved by performing the inverse discrete wavelet frame transform as in Fig. 2.9.
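The informative importance measure of Eqs. (2.32)–(2.36) and the selection rule of Eq. (2.37) can be sketched as follows. Note that the 3 × 3 neighbourhood average below includes the central coefficient, a small simplification of Eq. (2.36); the function names are ours.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def informative_importance(D: np.ndarray) -> np.ndarray:
    """PI(m, n) = C(m, n) * I(m, n): luminance uncertainty |D| times
    topological uncertainty p(1 - p), with p the fraction of the 3x3
    neighbourhood sharing the central sign."""
    C = np.abs(D)                                        # luminance uncertainty, Eq. (2.33)
    s = (D >= 0).astype(np.float64)                      # sign map, Eq. (2.34)
    frac_ones = uniform_filter(s, size=3, mode="nearest")  # local fraction of "1" states
    p = np.where(s == 1.0, frac_ones, 1.0 - frac_ones)   # cf. Eq. (2.36)
    I = p * (1.0 - p)                                    # topological uncertainty, Eq. (2.35)
    return C * I                                         # Eq. (2.32)

def select_high_band(DA: np.ndarray, DB: np.ndarray) -> np.ndarray:
    """Coefficient selection of Eq. (2.37): keep the coefficient with larger PI."""
    return np.where(informative_importance(DA) >= informative_importance(DB), DA, DB)
```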
2.3.6 Computer Simulation
Performance measures are essential to determine the possible benefits of fusion as well as to compare results obtained with different algorithms. However, it is difficult to evaluate image fusion results objectively. Therefore, three evaluation criteria are used to quantitatively assess the fusion performance.

The first evaluation measure is the objective performance metric proposed by Petrovic and Xydeas [35]. It models the accuracy with which visual information is transferred from the source images to the fused image. Important information is associated with the edge information measured at each pixel; correspondingly, by evaluating the relative amount of edge information that is transferred from the input images to the fused image, a measure of fusion performance is obtained. A larger objective performance metric means that more important information from the source images has been preserved.

Mutual information has also been proposed for fusion evaluation. Given two images x_R and x_F, we define their mutual information as

Q(x_R; x_F) = Σ_{u=1}^{L} Σ_{v=1}^{L} h_{R,F}(u, v) log2[ h_{R,F}(u, v) / (h_R(u) h_F(v)) ]    (2.38)

where x_R is the ideal reference, x_F is the obtained fused image, h_R and h_F are the normalized gray-level histograms of x_R and x_F, respectively, h_{R,F} is the joint gray-level histogram of x_R and x_F, and L is the number of bins; we select L = 100. The higher the mutual information between x_R and x_F, the more likely it is that x_F resembles the ideal x_R. The mutual information evaluation method may be modified into an objective measure according to [36]; this is the second evaluation measure.

The third evaluation measure is the entropy

EN = − Σ_i p_i ln p_i    (2.39)

where p_i is the probability that a pixel takes gray level i.

Since multi-source images are often corrupted by noise or have registration errors, the robustness of an image fusion system is also important. We propose a measure to evaluate robustness using the measures described above; we call it the relative difference

E = |Q′ − Q| / Q    (2.40)

where Q′ denotes the evaluation measure when the source images are corrupted by noise or have registration errors, and Q denotes the measure when the source images are free of such disturbances.
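For reference, the entropy, mutual information, and relative difference measures of Eqs. (2.38)–(2.40) can be computed with a few lines of NumPy; the bin counts below are illustrative defaults (the text uses L = 100 bins for the mutual information).

```python
import numpy as np

def entropy(img: np.ndarray, bins: int = 256) -> float:
    """Image entropy EN = -sum_i p_i ln p_i (Eq. 2.39)."""
    p, _ = np.histogram(img, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(ref: np.ndarray, fused: np.ndarray, bins: int = 100) -> float:
    """Mutual information Q(x_R; x_F) between a reference and a fused image (Eq. 2.38)."""
    joint, _, _ = np.histogram2d(ref.ravel(), fused.ravel(), bins=bins)
    joint = joint / joint.sum()
    pr = joint.sum(axis=1, keepdims=True)      # marginal histogram of the reference
    pf = joint.sum(axis=0, keepdims=True)      # marginal histogram of the fused image
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pr @ pf)[mask])).sum())

def relative_difference(q_disturbed: float, q_clean: float) -> float:
    """Robustness measure E = |Q' - Q| / Q (Eq. 2.40)."""
    return abs(q_disturbed - q_clean) / q_clean
```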
Figure 2.11a, b shows a visual image and a millimeter wave (MMW) image employed in concealed weapon detection (CWD). The size of the source images is 200 × 256. From Fig. 2.11b we can see that a weapon is concealed on the third person. The discrete wavelet method, Yang's statistical fusion method, the traditional DWF method, and the method proposed in this section are applied separately in the fusion process; in all cases we perform a three-level decomposition. The fusion results are shown in Fig. 2.11c–f.
Fig. 2.11 (a) Visual image and (b) MMW image, and the fusion results employing (c) the 9/7 wavelet, (d) Yang’s statistical fusion method, (e) the DWF-based fusion method, and (f) the proposed method (Sect. 2.5)
It is clear that the proposed method outperforms the others. The values of the evaluation measures are given in Table 2.2; the entropy, the pixel mutual information, and the edge mutual information are all increased. The implementation time is also shown in Table 2.2.

Figure 2.12b shows a visual image of a scene whose background contains a road, grassland, and a fence as well as a house, while the person in the scene is revealed only by the infrared image in Fig. 2.12a. The same figure displays the results of fusion by the discrete wavelet method, Yang's statistical fusion method, the traditional DWF method, and the method proposed in this section; in all cases, we perform a three-level decomposition.
Table 2.2 Fusion results of MMW image and visual image

Method        Entropy   Pixel mutual information   Edge mutual information   Implementation time (s)
9/7 wavelet   4.0415    0.8812                     0.5465                    0.3280
Yang's        4.2556    1.6135                     0.6312                    188.3900
DWF           4.0110    0.9622                     0.5670                    1.4690
Proposed      4.7009    1.6140                     0.6975                    126.4220
Fig. 2.12 (a) IR image and (b) visual image, and the fusion results employing (c) the 9/7 wavelet, (d) Yang’s statistical fusion method, (e) the DWF-based fusion method, and (f) the proposed method
It is clear that the proposed method outperforms the others. The evaluation measures are given in Table 2.3; the entropy and the pixel and edge mutual information all increase.
Table 2.3 Fusion results of IR image and visual image

Method        Entropy   Pixel mutual information   Edge mutual information
9/7 wavelet   4.4520    0.9893                     0.3686
Yang's        4.9306    1.4593                     0.4715
DWF           4.4711    1.0089                     0.4308
Proposed      4.9643    1.5308                     0.5042
Figure 2.13 shows CT and MRI image fusion; the results are given in Table 2.4. Figure 2.14 shows SAR and IR image fusion; the results are given in Table 2.5. The filter coefficients of the DWF are those of the 9/7 wavelet. The high-frequency and the low-frequency filter coefficients are

[0.0378, −0.0238, −0.1106, 0.3774, 0.8527, 0.3774, −0.1106, −0.0238, 0.0378],
[−0.0645, −0.0407, 0.4181, 0.7885, 0.4181, −0.0407, −0.0645].
Table 2.6 shows the fusion results for the infrared and visual images of Fig. 2.12a, b corrupted by noise with zero mean and variance 0.01. To evaluate the robustness of the fusion systems, the discrete wavelet-based fusion method, Yang's statistical fusion method, the traditional DWF-based fusion method, and the proposed method are used. The smaller the relative difference (RD), the more robust the fusion method. When the source images have registration errors, the image fusion system is also affected, which leads to changes in the evaluation measures of the various fusion methods. Table 2.7 gives the values of the evaluation measures and the relative differences when there is a 1-pixel registration error in the source images. Both Tables 2.6 and 2.7 indicate that the proposed fusion method is more robust than the existing methods.
2.3.7 Conclusions
In this section, an image fusion method has been proposed for merging multiple source images based on the expectation maximum (EM) algorithm and the discrete wavelet frame. Experimental results indicate that the proposed method outperforms the methods based on the discrete wavelet transform and the existing wavelet frame transforms. We also proposed a relative difference measure to evaluate the robustness of an image fusion system and showed that the proposed method is more robust than the existing image fusion methods.
Fig. 2.13 (a) CT image and (b) MRI image, and the fusion results employing (c) the 9/7 wavelet, (d) Yang's statistical fusion method, (e) the DWF-based fusion method, and (f) the proposed method

Table 2.4 Fusion results of CT image and MRI image

Method        Entropy   Pixel mutual information   Edge mutual information
9/7 wavelet   3.7655    1.4793                     0.5458
Yang's        4.0594    2.3301                     0.7705
DWF           3.7461    1.5958                     0.6663
Proposed      4.2133    2.4954                     0.7715
Fig. 2.14 (a) SAR image and (b) IR image, and the fusion results employing (c) the 9/7 wavelet, (d) Yang’s statistical fusion method, (e) the DWF-based fusion method, and (f) the proposed method
Table 2.5 Fusion results of SAR image and IR image

Method        Entropy   Pixel mutual information   Edge mutual information
9/7 wavelet   4.9012    0.7751                     0.3957
Yang's        5.0762    1.2709                     0.7705
DWF           4.9139    0.8583                     0.5124
Proposed      5.1471    1.2246                     0.5623
Table 2.6 Fusion results of noise-corrupted IR image and noise-corrupted visual image

              Entropy              Pixel mutual information   Edge mutual information
Method        Measure    RD        Measure    RD              Measure    RD
9/7 wavelet   5.0249     0.1287    0.8478     0.1430          0.2608     0.2925
Yang's        4.9747     0.0089    1.1087     0.2403          0.3405     0.2925
DWF           5.0176     0.1222    0.9163     0.0917          0.3142     0.2716
Proposed      5.2703     0.0616    1.5349     0.0027          0.5080     0.0075
Table 2.7 Fusion results of IR image and visual image corresponding to a registration error of 1 pixel

              Entropy              Pixel mutual information   Edge mutual information
Method        Measure    RD        Measure    RD              Measure    RD
9/7 wavelet   4.4816     0.0067    1.0305     0.0417          0.3789     0.0279
Yang's        4.9258     0.01      1.4631     0.0026          0.4828     0.0193
DWF           4.4886     0.0039    1.0266     0.0175          0.4324     0.0023
Proposed      4.9663     0.0004    1.4349     0.0028          0.4074     0.0068

2.4 Image Fusion Method Based on Optimal Wavelet Filter Banks

2.4.1 Introduction
The primary image fusion approaches are based on combining the multi-resolution decomposition coefficients of the source images [30]. The basic idea is to perform a multi-scale transform (MST) on all source images and construct a composite multi-scale representation of them; the fused image is then obtained by taking the inverse multi-scale transform (IMST). In a multi-scale transform, a filter bank is applied to split the input signal into a low-frequency approximate signal and a high-frequency detail signal. The design of digital filter banks is therefore a key issue in image fusion.

Existing filter-bank design methods focus on biorthogonal filter banks with perfect reconstruction [15, 37]. A nice property of perfect-reconstruction filter banks is that they do not introduce any errors by themselves. However, in image fusion, the fused image is in any case an incomplete representation of the source images, so small reconstruction errors introduced by the filter banks do not necessarily lead to worse fusion quality. In this method, we relax the perfect reconstruction condition in designing filter banks and emphasize the overall fusion performance. We formulate the design problem as a nonlinear optimization problem whose design objectives include both performance metrics of the overall image fusion, such as the RMSE to the reference image, and metrics of each individual filter, such as the stopband and passband energies of a low-pass filter. The optimization problem is solved using the simulated annealing nonlinear optimization method. First, filters are designed for each
Fig. 2.15 The generic image fusion scheme
individual training image to maximize the fusion quality. Then, the filter bank with the best performance across training images is selected as the final result.
2.4.2
The Generic Multi-Resolution Image Fusion Algorithm
The basic structure of the new fusion scheme is shown in Fig. 2.15. The multi-resolution process can be summarized as follows [16]:

1. The source images are decomposed into a multi-resolution representation containing both low-frequency coarse information and high-frequency detail information:

SS(A) = \{D_1(A), \ldots, D_i(A), \ldots, D_N(A), S_N(A)\},   (2.41)

SS(B) = \{D_1(B), \ldots, D_i(B), \ldots, D_N(B), S_N(B)\},   (2.42)

where D_i(A) and D_i(B) are the high-frequency detail information of the input images, and S_N(A) and S_N(B) are the low-frequency coarse information.

2. Average S_N(A) and S_N(B) to obtain the low-frequency coarse information S_N(F) of the fused image.

3. Apply the feature selection rule to the high-frequency detail information:

D_i(F) = \begin{cases} D_i(A), & \text{if } D_i(A) \ge D_i(B) \\ D_i(B), & \text{if } D_i(A) < D_i(B) \end{cases}   (2.43)

4. Construct the multi-resolution representation of the fused image as

SS(F) = \{D_1(F), \ldots, D_i(F), \ldots, D_N(F), S_N(F)\}.   (2.44)

5. Perform the inverse multi-scale transform (IMST) to obtain the final fused image F.
Fig. 2.16 A two-channel filter bank

Different combinations of MSD methods yield different performance. A careful study of this issue has been lacking; in fact, there has been very little work comparing various MSD-based fusion algorithms, even at a basic level. Here, we attempt to provide such a study: we search for the filter bank, in the context of an image fusion scheme, that maximizes fusion quality.
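For illustration only, the following is a minimal sketch of the generic multi-resolution fusion scheme of Fig. 2.15, written in Python with the PyWavelets package. It uses an ordinary discrete wavelet transform (the built-in 'bior4.4', i.e., 9/7 biorthogonal, wavelet) in place of the discrete wavelet frame, the detail rule compares coefficient magnitudes (one common reading of Eq. (2.43)), and the function name and defaults are illustrative rather than the authors' implementation.

```python
# Minimal sketch of the generic MST fusion scheme (Fig. 2.15), assuming PyWavelets.
import numpy as np
import pywt

def generic_mst_fusion(img_a, img_b, wavelet="bior4.4", levels=3):
    """Fuse two registered, equally sized grayscale images (float arrays)."""
    # Step 1: multi-resolution decomposition SS(A), SS(B)
    coeffs_a = pywt.wavedec2(img_a, wavelet, level=levels)
    coeffs_b = pywt.wavedec2(img_b, wavelet, level=levels)

    # Step 2: average the low-frequency coarse information S_N
    fused = [(coeffs_a[0] + coeffs_b[0]) / 2.0]

    # Step 3: choose the detail coefficient with the larger magnitude (cf. Eq. (2.43))
    for da, db in zip(coeffs_a[1:], coeffs_b[1:]):
        fused.append(tuple(np.where(np.abs(ca) >= np.abs(cb), ca, cb)
                           for ca, cb in zip(da, db)))

    # Steps 4-5: assemble SS(F) and apply the inverse transform
    return pywt.waverec2(fused, wavelet)
```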
2.4.3
Design Criteria of Filter Banks
A generic two-channel finite impulse response (FIR) filter bank [38] is shown in Fig. 2.16, where H0(z) and H1(z) represent the low-pass and high-pass filters of the analysis bank, respectively, and G0(z) and G1(z) are the synthesis filters. The filter bank consists of an analysis stage and a synthesis stage. Each sub-band signal is down-sampled by 2 to form the outputs of the analysis filters, in accordance with the Nyquist theorem; these signals can then be analyzed or processed in various ways depending on the application. When no error occurs in this stage, the input to the synthesis stage is the same as the output of the analysis stage. In the synthesis stage, each sub-band signal is up-sampled by 2 and processed by the synthesis filters G0(z) and G1(z), and the outputs of the synthesis filters are summed to form the reconstructed signal. The Z-transform of the reconstructed signal, which is a function of x(n) and the filters, is

\hat{X}(z) = \frac{1}{2}\left[ G_0(z)H_0(z) + G_1(z)H_1(z) \right] X(z) + \frac{1}{2}\left[ G_0(z)H_0(-z) + G_1(z)H_1(-z) \right] X(-z) = T(z)X(z) + S(z)X(-z).   (2.45)
There are three types of undesirable distortion in a filter bank. Aliasing distortions include aliasing caused by down-sampling and imaging caused by up-sampling; the term S(z)X(−z) in Eq. (2.45) represents the aliasing distortion. Amplitude distortions are deviations of the magnitude of T(z) in Eq. (2.45) from unity. Phase distortions are deviations of the phase of T(z) from the desired phase property, such as linear phase. Extensive research has been conducted to remove these distortions: aliasing distortions can be removed by selecting the synthesis filters based on the analysis filters, and phase distortions can be removed by using linear-phase filters. Perfect reconstruction of the original signal by a filter bank requires S(z) = 0 for all z and T(z) = c·z^{−d}, where c and d are constants; that is, to perfectly reconstruct a signal, the transfer function must be a pure delay with no aliasing, no amplitude change, and linear phase.

In biorthogonal filter banks, perfect reconstruction and linear phase of the filters can be achieved by a proper selection of the filter parameters. First, aliasing is removed by choosing the synthesis filters according to the analysis filters; for example, when the two synthesis filters are defined as

G_0(z) = H_1(-z) \quad \text{and} \quad G_1(z) = -H_0(-z),   (2.46)

the aliasing term in Eq. (2.45) is 0 and the transfer function becomes

T(z) = \left[ H_0(z)H_1(-z) - H_1(z)H_0(-z) \right] / 2.   (2.47)
Due to the pure-delay constraint T(z) = c·z^{−d}, the two analysis filters H0(z) and H1(z) should satisfy the following conditions [38, 39]:

1. The sum of their lengths must be a multiple of 4.
2. Both are FIR filters and must be of even length or of odd length at the same time. When they are of even length, H0(z) should be symmetric and H1(z) antisymmetric; when they are of odd length, both should be symmetric.
3. To make the transfer function a pure delay, i.e., T(z) = z^{−d} for a constant d, the coefficients of H0(z) and H1(z) need to satisfy a set of equations called the perfect reconstruction (PR) condition. When both H0(z) and H1(z) are symmetric and of odd length, the PR condition is

\sum_{k=1}^{2i} (-1)^{k-1}\, h_0(2i+1-k)\, h_1(k) = \frac{1}{2}\,\theta\!\left( i - \frac{N_0 + N_1}{4} \right), \quad \text{for } i = 1, 2, \ldots, \frac{N_0 + N_1}{4},   (2.48)

where h_0(n), n = 1, \ldots, N_0, are the coefficients of H0(z) with length N_0; h_1(n), n = 1, \ldots, N_1, are the coefficients of H1(z) with length N_1; and θ(x) = 1 if x = 0, and 0 otherwise.
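To make the alias-cancellation and distortion relations concrete, the following sketch checks Eqs. (2.45)–(2.47) numerically for a pair of analysis filters. The coefficient values are the 9/7 filters quoted in Sect. 2.3.6, with a sign pattern reconstructed from the standard 9/7 wavelet, so they should be treated as illustrative rather than authoritative.

```python
import numpy as np

# Analysis filters (9/7 coefficients from Sect. 2.3.6; signs are an assumption).
h0 = np.array([0.0378, -0.0238, -0.1106, 0.3774, 0.8527,
               0.3774, -0.1106, -0.0238, 0.0378])
h1 = np.array([0.0645, -0.0407, -0.4181, 0.7885, -0.4181, -0.0407, 0.0645])

def modulate(h):
    """Coefficients of H(-z): negate the odd powers of z^-1."""
    return h * ((-1.0) ** np.arange(len(h)))

# Alias-cancelling synthesis filters, Eq. (2.46): G0(z) = H1(-z), G1(z) = -H0(-z).
g0 = modulate(h1)
g1 = -modulate(h0)

# Distortion function T(z) and aliasing term S(z) of Eq. (2.45);
# polynomial products are convolutions of the coefficient sequences.
t = (np.convolve(g0, h0) + np.convolve(g1, h1)) / 2.0
s = (np.convolve(g0, modulate(h0)) + np.convolve(g1, modulate(h1))) / 2.0

print("aliasing residual:", np.max(np.abs(s)))   # exactly zero by construction
print("T(z) coefficients:", np.round(t, 4))      # close to a pure delay for a PR bank
```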
2.4.4
Optimization Design of Filter Bank for Image Fusion
In this method, the goal is to find the filter bank that produces the best image fusion quality. We use a metric to measure the performance of image fusion: the root mean square error (RMSE) to the reference image, defined as

\mathrm{RMSE} = \sqrt{ \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( R(i,j) - F(i,j) \right)^2 },   (2.49)
where R is the reference image, F is the fused image, and M × N is the size of the two images.

When the filter length is fixed, the filter coefficients form the search space. We apply the design criteria of both the filter bank and the individual filters to constrain the search space. First, the synthesis filters are determined by the analysis filters according to Eq. (2.46); the design of the filter bank therefore reduces to the design of a pair of analysis filters H0(z) and H1(z). Second, the two analysis filters are FIR filters whose total length is a multiple of 4, and they are of even length or of odd length at the same time. When they are of even length, H0(z) should be symmetric and H1(z) antisymmetric; when they are of odd length, both should be symmetric. Note that these are the conditions of a biorthogonal filter bank except for the PR condition: because we focus on fusion quality rather than perfect reconstruction, the PR condition is not critical here.

In addition, the design criteria for the individual filters include the regularity and smoothness properties from wavelet theory. Regularity requires that the iterated low-pass filter converge to a continuous function. For a low-pass filter H0(z), regularity of order m requires at least m zeros of its amplitude response H0(ω) at ω = π; similarly, for a high-pass filter H1(z), regularity of order m requires at least m zeros of its amplitude response H1(ω) at ω = 0 [40, 41]. It has been shown that regularity of order 2 is good enough, and it can be achieved by letting H0(ω = π) = 0 for the low-pass filter and H1(ω = 0) = 0 for the high-pass filter, i.e.,

\sum_{n=1}^{N_0} (-1)^{n+1} h_0(n) = 0 \quad \text{and} \quad \sum_{n=1}^{N_1} h_1(n) = 0.   (2.50)

Furthermore, the following condition is necessary for the infinite iteration of the low-pass filter to converge:

\sum_{n=1}^{N_0} h_0(n) = 1 \quad \text{and} \quad \sum_{n=1}^{N_1} (-1)^{n+1} h_1(n) = 1.   (2.51)
These conditions are linear equations and can be used to remove variables through substitution. For example, in the design of a pair of symmetric analysis filters of lengths 9 and 7, we have a total of 5 free variables, because the symmetry property reduces the number of variables to 5 + 4 = 9 and the wavelet conditions remove another 4 variables. After constraining the search space, we then search for the filter coefficients that minimize the RMSE to the reference image. Thus, the design problem becomes a nonlinear optimization problem whose objective is minimizing the RMSE. This design formulation is general and can be extended to other design objectives.

Empirically, we find that it is difficult to obtain good filters based on the RMSE objective alone. The reason is that the majority of the search space corresponds to filter coefficients that lead to bad fusion results, and the RMSE by itself does not provide enough guidance for the search to escape from bad search regions. To overcome this difficulty, we introduce a second objective that consists of the stopband and passband energies of the individual filters. The stopband and passband energies of a filter measure its proximity to the ideal step filter. Assume an odd-length symmetric low-pass filter H0(z) with coefficients h0(n), n = 1, ..., N0; the Fourier transform of H0(z) is [42]

F_0(e^{j\omega}) = H_0(\omega)\, e^{-j\frac{N_0-1}{2}\omega},   (2.52)
where

H_0(\omega) = h_0\!\left(\frac{N_0+1}{2}\right) + \sum_{n=1}^{(N_0-1)/2} 2\,h_0(n)\cos\!\left(\left(\frac{N_0+1}{2}-n\right)\omega\right).   (2.53)
The stopband energy E_S(h_0) with stopband cut-off frequency ω_s is

E_S(h_0) = \int_{\omega_s}^{\pi} H_0^2(\omega)\, d\omega
 = h_0^2\!\left(\tfrac{N_0+1}{2}\right)(\pi - \omega_s)
 - 4\, h_0\!\left(\tfrac{N_0+1}{2}\right) \sum_{n=1}^{(N_0-1)/2} h_0(n)\, \frac{\sin\!\left(\left(\tfrac{N_0+1}{2} - n\right)\omega_s\right)}{\tfrac{N_0+1}{2} - n}
 + 2 \sum_{n=1}^{(N_0-1)/2} h_0^2(n)\,(\pi - \omega_s)
 - 2 \sum_{n=1}^{(N_0-1)/2} h_0^2(n)\, \frac{\sin\!\left((N_0+1-2n)\,\omega_s\right)}{N_0+1-2n}
 - 2 \sum_{n=1}^{(N_0-1)/2} \sum_{\substack{m=1 \\ m \neq n}}^{(N_0-1)/2} h_0(n)\,h_0(m)\!\left[ \frac{\sin\!\left((n-m)\,\omega_s\right)}{n-m} + \frac{\sin\!\left((N_0+1-n-m)\,\omega_s\right)}{N_0+1-n-m} \right].   (2.54)

The passband energy E_P(h_0) with passband cut-off frequency ω_p is

E_P(h_0) = \int_{0}^{\omega_p} \left( H_0(\omega) - 1 \right)^2 d\omega
 = \left( h_0\!\left(\tfrac{N_0+1}{2}\right) - 1 \right)^2 \omega_p
 + 4 \left( h_0\!\left(\tfrac{N_0+1}{2}\right) - 1 \right) \sum_{n=1}^{(N_0-1)/2} h_0(n)\, \frac{\sin\!\left(\left(\tfrac{N_0+1}{2} - n\right)\omega_p\right)}{\tfrac{N_0+1}{2} - n}
 + 2 \sum_{n=1}^{(N_0-1)/2} h_0^2(n)\, \omega_p
 + 2 \sum_{n=1}^{(N_0-1)/2} h_0^2(n)\, \frac{\sin\!\left((N_0+1-2n)\,\omega_p\right)}{N_0+1-2n}
 + 2 \sum_{n=1}^{(N_0-1)/2} \sum_{\substack{m=1 \\ m \neq n}}^{(N_0-1)/2} h_0(n)\,h_0(m)\!\left[ \frac{\sin\!\left((n-m)\,\omega_p\right)}{n-m} + \frac{\sin\!\left((N_0+1-n-m)\,\omega_p\right)}{N_0+1-n-m} \right].   (2.55)
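Rather than re-deriving the closed forms above, the stopband and passband energies can be evaluated by numerically integrating the amplitude response of Eq. (2.53). The sketch below does this for an odd-length symmetric low-pass filter and shows how the energies enter a scalarized objective of the kind defined next in Eq. (2.57); the example coefficients are the 9-tap low-pass quoted earlier, normalized to sum to 1 as in Eq. (2.51), and all names are illustrative.

```python
import numpy as np

def amplitude_response(h0, omega):
    """H0(omega) of Eq. (2.53) for an odd-length symmetric filter h0(1..N0)."""
    n0 = len(h0)
    resp = np.full_like(omega, h0[(n0 + 1) // 2 - 1], dtype=float)  # h0((N0+1)/2)
    for n in range(1, (n0 - 1) // 2 + 1):                           # n = 1 .. (N0-1)/2
        resp += 2.0 * h0[n - 1] * np.cos(((n0 + 1) / 2.0 - n) * omega)
    return resp

def band_energies(h0, omega_s, omega_p, samples=4096):
    """Stopband and passband energies of Eqs. (2.54)-(2.55) by numerical integration."""
    w_stop = np.linspace(omega_s, np.pi, samples)
    w_pass = np.linspace(0.0, omega_p, samples)
    e_s = np.trapz(amplitude_response(h0, w_stop) ** 2, w_stop)
    e_p = np.trapz((amplitude_response(h0, w_pass) - 1.0) ** 2, w_pass)
    return e_s, e_p

def rmse(reference, fused):
    """Root mean square error to the reference image, Eq. (2.49)."""
    diff = np.asarray(reference, dtype=float) - np.asarray(fused, dtype=float)
    return np.sqrt(np.mean(diff ** 2))

def overall_objective(reference, fused, energy, w=0.02):
    """Scalarized design objective in the spirit of Eq. (2.57)."""
    return w * rmse(reference, fused) + (1.0 - w) * energy

# Example with the 9-tap low-pass quoted earlier, normalized to sum to 1.
h0 = np.array([0.0378, -0.0238, -0.1106, 0.3774, 0.8527,
               0.3774, -0.1106, -0.0238, 0.0378])
h0 = h0 / h0.sum()
print(band_energies(h0, omega_s=np.pi - 1.0, omega_p=1.0))
# In the full method, this objective would be minimized over the free filter
# coefficients, e.g. with scipy.optimize.dual_annealing standing in for ASA.
```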
The second objective is formulated to minimize the total stopband and passband energies of both analysis filters, i.e.,

\min_{h_0, h_1} E(h_0, h_1),   (2.56)

where E(h_0, h_1) = E_S(h_0) + E_S(h_1) + E_P(h_0) + E_P(h_1) is the total energy of both filters. Under this objective, filters approximating the ideal step filters are sought. The overall objective is a weighted combination of the main objective (the RMSE to the reference image) and the secondary objective:

\min_{h_0, h_1} \; w \cdot \mathrm{RMSE}(A, R, F) + (1 - w)\, E(h_0, h_1),   (2.57)

where 0 ≤ w ≤ 1 is a constant weight, A is the image fusion algorithm, R is the reference image, and F is the fused image. Note that when w = 1, the objective reduces to minimizing the RMSE alone. The nonlinear optimization problem in Eq. (2.57) can be solved by various nonlinear optimization methods such as random search, simulated annealing, and evolutionary algorithms; simulated annealing was used in our experiments.

The filters designed based on Eq. (2.57) depend on the training images R and F, and different training images lead to different filter designs. Instead of obtaining the best filter bank for each training image, we want a filter bank that performs well on many images. To achieve this goal, the symmetric improvement ratio is used to measure how well a filter bank performs across multiple images. The symmetric improvement ratio [43] is a normalization that avoids anomalies in which two hypotheses are ordered inconsistently depending on the choice of the baseline hypothesis. For two hypotheses,
A and B, with performance values (A_1, \ldots, A_m) and (B_1, \ldots, B_m), respectively, on m test cases, the symmetric improvement ratio S is defined as

S_i = \begin{cases} A_i / B_i - 1, & \text{if } A_i \ge B_i \\ 1 - B_i / A_i, & \text{if } A_i < B_i \end{cases}   (2.58)

S = \frac{1}{m} \sum_{i=1}^{m} S_i.   (2.59)
In the regular improvement ratio Ai/Bi , degradations are between 0 and 1, whereas improvements are between 1 and infinity. Consequently, when improvement ratios are averaged, degradations carry less weight than improvements. This problem does not exist in symmetric improvement ratio because it puts equal weight on degradations and improvements. In our method, the symmetric improvement ratio is used to aggregate the performance values of a filter bank on different training images. The filter bank that has the best symmetric improvement ratio is selected as the final result.
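The following small function computes the symmetric improvement ratio as reconstructed in Eqs. (2.58)–(2.59); the exact piecewise form is inferred from the garbled source, so treat it as an assumption. It assumes larger performance values are better and all values are positive.

```python
import numpy as np

def symmetric_improvement_ratio(a, b):
    """Average symmetric improvement ratio of A over baseline B, Eqs. (2.58)-(2.59)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    s = np.where(a >= b, a / b - 1.0, 1.0 - b / a)  # improvements >= 0, degradations <= 0
    return s.mean()

# Example: an equal gain and loss cancel, unlike the plain ratio A/B.
print(symmetric_improvement_ratio([2.0, 1.0], [1.0, 2.0]))  # 0.0
```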
2.4.5
Experiments
In our experiments, we applied the optimization- and generalization-based method to design biorthogonal filter banks for image fusion and obtained promising results. Our goal is to find generalizable filter banks that perform better than the best existing solutions. The five image sets used in the experiments, shown in Fig. 2.17, consist of five pairs of natural images together with their reference images; they were picked as the training images for the optimization and generalization phases of our method, and the other images were used to test the performance of the final result. The performance of the filter banks was evaluated using the generic image fusion method. In the design of 9/7 biorthogonal filter banks, the widely used Antonini 9/7 filter bank was used as the baseline solution, and in the optimization phase ASA with multiple starts was applied. In calculating the passband and stopband energies, the passband cut-off frequency ωp was set to 1 and the stopband cut-off frequency ωs was set to π − 1. The initial points of ASA were always the baseline solution, and ASA usually found better solutions.

In the first experiment of designing 9/7 biorthogonal filter banks, ASA was used as the optimization method and Eq. (2.57) with w = 1, 0.8, 0.2, 0.1, and 0.02 was used as the objective; the best result was obtained when w = 0.02. To speed up the convergence of ASA, we restricted the search ranges of the variables. In the design of the 9/7 filter banks, the search ranges of the first three coefficients of H0(z) were limited to [0.01, 0.1], [−0.1, −0.01], and [−0.12, −0.11], respectively, and the search ranges of the first two coefficients of H1(z) were limited to [0.01, 0.1] and [−0.1, −0.01], respectively. The execution of ASA was limited to 10,000 function evaluations in each run, which corresponds to about 50 h for a pair of images.

Fig. 2.17 Image pairs and ideal fusion result used to optimize filter banks. The top four images are part of "clock" and the bottom images are "cameraman." They are different-focus images, and the right side shows their reference images

The amplitude responses of the Antonini 9/7 filter bank and the new filter bank designed by our method are shown in Fig. 2.18. Since the Antonini filters were designed to have high-order regularity, their amplitude responses are very smooth; in contrast, our new filters have sharper transitions.

Fig. 2.18 Amplitude responses of the pair of analysis filters in the Antonini (9/7) filter bank [44] (left) and those in the new (9/7) filter bank obtained by the proposed method (right)

An image fusion example is shown in Fig. 2.19. The input images in Fig. 2.19a, b are fused using the proposed filter bank with the generic fusion scheme into the fused image in Fig. 2.19d; the image fusion result using the Antonini 9/7 filter bank is shown in Fig. 2.19c. Tables 2.8, 2.9, and 2.10 compare the performance of our new 9/7 filter bank with that of the Antonini 9/7 filter bank using the generic fusion scheme. They show that the new filter bank improves on the Antonini filter bank on most images; for some images, such as "disk" and "cameraman," the improvement is significant. Although the new filter bank was designed based on five training images, it performs well on other images and under different image fusion schemes. Furthermore, in the experiments, we found that the entropy of the fused image is increased. The entropy measure is defined as

EN = -\sum_{i=1}^{H} p_i \ln p_i,   (2.60)

where p_i is the probability of gray level i, i.e., the fraction of pixels having that gray level. The measure indicates how much information an image contains, so the larger the value, the better the quality of the fused image. Tables 2.8, 2.9, and 2.10 compare our new 9/7 filter bank with the Antonini (9/7) filter bank.
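A minimal way to compute the entropy measure of Eq. (2.60) for a grayscale image is sketched below; the assumption of 256 gray levels (8-bit data) and the function name are illustrative, and natural logarithms are used as in the text.

```python
import numpy as np

def image_entropy(img, levels=256):
    """Entropy of a grayscale image, Eq. (2.60): EN = -sum_i p_i ln p_i."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist.astype(float) / hist.sum()
    p = p[p > 0]                     # empty gray levels contribute 0 * ln 0 = 0
    return -np.sum(p * np.log(p))
```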
Fig. 2.19 The fusion results of "disk" images: (a) and (b) are the input multi-focus images, (c) is the fused image using the optimal filter bank, and (d) is the fused image using the Antonini 9/7 filter bank

Table 2.8 The fusion results of "disk" images

Filter selection      RMSE      Entropy
9/7 wavelet           12.3364   4.9880
Optimal 9/7 wavelet   12.2813   4.9965

Table 2.9 The fusion results of "clock" images

Filter selection      RMSE      Entropy
9/7 wavelet           10.6446   4.8247
Optimal 9/7 wavelet   10.0970   4.8393

Table 2.10 The fusion results of "cameraman" images

Filter selection      RMSE      Entropy
9/7 wavelet           14.2595   4.8842
Optimal 9/7 wavelet   13.9386   4.8907
2.4.6
Conclusion
The design of digital filter banks is important because filter banks are used in many applications, including modems, data transmission, speech and audio coding, and image fusion. The method presented here is general enough to be applicable to the design of other types of filter banks, such as multi-rate and multi-band filter banks for various applications. In the experiments, the fusion results using the optimal filter banks outperform those using the traditional 9/7 filter bank.
2.5
Anisotropic Diffusion-Based Fusion of Infrared and Visible Sensor Images (ADF)
In applications such as military surveillance, navigation, and concealed weapon detection, different imaging systems such as CCD/VI and forward-looking infrared (FLIR)/IR cameras are used to monitor a targeted scene. In these applications, details of the target are very difficult to detect from the VI image alone because of low visual contrast, but they can easily be obtained from the IR image; the VI image, in turn, provides background details such as vegetation, texture, area, and soil. It is tedious to derive meaningful information by consecutively looking at multiple images, so for a better understanding of the scene, the useful information from these multiple images needs to be integrated into a single image. As discussed in Chap. 1, a fusion algorithm can integrate information from these complementary images to give a composite image, transferring the useful information of the source images into a single image with little fusion loss and few artifacts.

As briefed in Sect. 2.1, many MSD fusion techniques such as pyramid- and wavelet-based methods have been developed with this fusion objective in mind. It has been observed that edge-preserving filters can extract more salient information, such as lines and edges, than the pyramid and wavelet methods, and they can also generate more appreciable fused images than the remaining MSD methods (pyramids and wavelets). These advantages of EPD motivated us to explore an edge-preserving decomposition process called anisotropic diffusion for multi-scale fusion.

As shown in Fig. 2.20, anisotropic diffusion is used to decompose the source images into approximation and detail layers. The final detail and approximation layers are calculated with the help of the Karhunen–Loeve transform (KL transform) and linear superposition, respectively, and a fused image is generated from the linear combination of the final detail and approximation layers. The advantages of this ADF method are outlined as follows:

1. This method is very effective and easy to implement.
2. It transfers most of the information from the source images to the fused image.
3. Fusion loss is very low.
Fig. 2.20 Schematic diagram of the ADF method
4. Fusion artifacts introduced in the fused image are almost negligible.
5. Computational time is low.
2.5.1
Anisotropic Diffusion
Diffusion is a kind of smoothing in the context of image processing. It is classified into isotropic diffusion and anisotropic diffusion. In isotropic diffusion, smoothing is performed uniformly over the whole image without regard to edge information. In contrast, the anisotropic diffusion process (Perona and Malik [45]) smooths a given image in homogeneous regions while preserving non-homogeneous regions (edges) using partial differential equations (PDEs). It thus overcomes the drawback of isotropic diffusion: isotropic diffusion uses inter-region smoothing, so edge information is lost, whereas anisotropic diffusion uses intra-region smoothing to generate coarser-resolution images in which, at each coarser resolution, edges remain sharp and meaningful. For a better understanding, the isotropic and anisotropic diffusion processes are demonstrated in Fig. 2.21: Fig. 2.21a is the original Lena image, and Fig. 2.21b, c are the isotropic and anisotropic diffused images, respectively. From the figure, it can be observed that the isotropic diffused image is totally blurred, whereas in the anisotropic diffused image smoothing is performed in the required regions while the edge information is preserved. Anisotropic diffusion belongs to the EPD techniques, whereas average and Gaussian filtering belong to the non-EPD filtering techniques. The anisotropic diffusion equation uses a flux function to control the diffusion of an image I as
Fig. 2.21 Demonstration of isotropic and anisotropic diffusion process
I_t = c(x, y, t)\,\Delta I + \nabla c \cdot \nabla I,   (2.61)

where c(x, y, t) is the flux function or rate of diffusion, Δ is the Laplacian operator, ∇ is the gradient operator, and t is the time, scale, or iteration. Equation (2.61) is also called the heat equation. A forward-time central-space (FTCS) scheme is used to solve this equation, and the solution of this PDE is directly given by

I_{i,j}^{t+1} = I_{i,j}^{t} + \lambda \left[ c_N \nabla_N I + c_S \nabla_S I + c_E \nabla_E I + c_W \nabla_W I \right]_{i,j}^{t}.   (2.62)

In Eq. (2.62), I_{i,j}^{t+1} is the coarser-resolution image at scale t + 1, which depends on the previous coarser-scale image I_{i,j}^{t}; λ is the stability constant satisfying 0 ≤ λ ≤ 1/4, and the superscript and subscripts apply to all terms enclosed in the square bracket. ∇_N, ∇_S, ∇_E, and ∇_W are the nearest-neighbor differences in the north, south, east, and west directions, respectively, defined as

\nabla_N I_{i,j} \equiv I_{i-1,j} - I_{i,j}, \quad \nabla_S I_{i,j} \equiv I_{i+1,j} - I_{i,j},   (2.63)
\nabla_E I_{i,j} \equiv I_{i,j+1} - I_{i,j}, \quad \nabla_W I_{i,j} \equiv I_{i,j-1} - I_{i,j}.

Similarly, c_N, c_S, c_E, and c_W are the conduction coefficients or flux functions in the north, south, east, and west directions, respectively:
c_{N_{i,j}}^{t} = g\big(\|(\nabla I)_{i+(1/2),\,j}^{t}\|\big) = g\big(\|\nabla_N I_{i,j}^{t}\|\big), \quad
c_{S_{i,j}}^{t} = g\big(\|(\nabla I)_{i-(1/2),\,j}^{t}\|\big) = g\big(\|\nabla_S I_{i,j}^{t}\|\big),
c_{E_{i,j}}^{t} = g\big(\|(\nabla I)_{i,\,j+(1/2)}^{t}\|\big) = g\big(\|\nabla_E I_{i,j}^{t}\|\big), \quad
c_{W_{i,j}}^{t} = g\big(\|(\nabla I)_{i,\,j-(1/2)}^{t}\|\big) = g\big(\|\nabla_W I_{i,j}^{t}\|\big).   (2.64)

In Eq. (2.64), g(.) is a monotonically decreasing function with g(0) = 1. Perona and Malik [45] suggested the two functions

g(\nabla I) = e^{-\left(\|\nabla I\| / k\right)^{2}},   (2.65)

g(\nabla I) = \frac{1}{1 + \left(\|\nabla I\| / k\right)^{2}}.   (2.66)
These functions offer a trade-off between smoothing and edge preservation. The first function favors high-contrast edges over low-contrast edges, while the second is preferred if the image consists of wide regions rather than smaller ones. Both functions contain a free parameter k, which decides the validity of a region boundary based on its edge strength. In subsequent discussions, the anisotropic diffusion of a given image I is denoted as aniso(I).
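A minimal sketch of the aniso(·) operator follows, implementing the FTCS update of Eq. (2.62) with the conduction functions of Eqs. (2.65)–(2.66). The default parameters follow the values reported later (t = 10, λ = 0.15, k = 30), and the wrap-around boundary handling via np.roll is a simplification, not part of the original method.

```python
import numpy as np

def anisotropic_diffusion(img, iterations=10, lam=0.15, k=30.0, option=1):
    """Perona-Malik anisotropic diffusion, Eqs. (2.62)-(2.66) (periodic borders)."""
    out = img.astype(float).copy()
    if option == 1:
        g = lambda d: np.exp(-(d / k) ** 2)          # Eq. (2.65)
    else:
        g = lambda d: 1.0 / (1.0 + (d / k) ** 2)     # Eq. (2.66)
    for _ in range(iterations):
        # Nearest-neighbour differences in N, S, E, W directions, Eq. (2.63)
        dn = np.roll(out, 1, axis=0) - out
        ds = np.roll(out, -1, axis=0) - out
        de = np.roll(out, -1, axis=1) - out
        dw = np.roll(out, 1, axis=1) - out
        # FTCS update with conduction coefficients of Eq. (2.64)
        out = out + lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return out
```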
2.5.2
Anisotropic Diffusion-Based Fusion Method (ADF)
Various steps involved in the ADF method are briefly listed below and discussed in detail in the following sub-sections.

1. Extract base and detail layers from the source images using anisotropic diffusion.
2. Fuse the detail layers based on the KL transform.
3. Fuse the base layers using weighted superposition.
4. Add the final detail and base layers.
2.5.2.1
Extracting Base and Detail Layers
Consider source images {I_n(x, y)}, n = 1, ..., N, of size p × q. All images are assumed to be co-registered. These images are passed through the edge-preserving anisotropic diffusion process to obtain the base layers.
Fig. 2.22 Base and detail layer decomposition of kayak dataset. (a) IR image, (b) VI image, (c) base layer of IR image, (d) base layer of VI image, (e) detail layer of IR image, (f) detail layer of VI image
B_n(x, y) = \mathrm{aniso}(I_n(x, y)),   (2.67)

where B_n(x, y) is the n-th base layer and aniso(I_n(x, y)) represents the anisotropic diffusion process applied to the n-th source image (see Sect. 2.5.1). Detail layers are obtained by subtracting the base layers from the source images:

D_n(x, y) = I_n(x, y) - B_n(x, y).   (2.68)
Base and detail layer decomposition of a kayak dataset is shown in Fig. 2.22.
2.5.2.2
Detail Layer Fusion Based on KL Transform
Detail layers are fused with the help of KL transform. This technique transforms correlated components into uncorrelated components. It provides a compact representation for the given dataset. Other names for KL transform are Hotelling
transform or principal component analysis. The KL transform basis vectors depend on the dataset, unlike the fast Fourier transform (FFT) and the discrete cosine transform (DCT). The algorithm used for detail layer fusion is as follows:

1. Take two detail layers D_1(x, y) and D_2(x, y) corresponding to the two input images I_1(x, y) and I_2(x, y), and arrange these detail layers as the column vectors of a matrix X.
2. Find the covariance matrix C_{XX} of X, considering each row as an observation and each column as a variable.
3. Calculate the eigenvalues σ_1, σ_2 and eigenvectors \xi_1 = \begin{bmatrix} \xi_1(1) \\ \xi_1(2) \end{bmatrix} and \xi_2 = \begin{bmatrix} \xi_2(1) \\ \xi_2(2) \end{bmatrix} of C_{XX}.
4. Compute the uncorrelated components KL_1 and KL_2 corresponding to the larger eigenvalue σ_max = max(σ_1, σ_2). If ξ_max is the eigenvector corresponding to σ_max, then KL_1 and KL_2 are given by

KL_1 = \frac{\xi_{\max}(1)}{\sum_i \xi_{\max}(i)}, \quad KL_2 = \frac{\xi_{\max}(2)}{\sum_i \xi_{\max}(i)}.   (2.69)

5. The fused detail layer D is given by

D(x, y) = KL_1 D_1(x, y) + KL_2 D_2(x, y).   (2.70)

The generalized expression for N detail layers is

D(x, y) = \sum_{n=1}^{N} KL_n D_n(x, y).   (2.71)
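The KL-transform weighting of the detail layers (steps 1–5 above, Eqs. (2.69)–(2.71)) can be sketched as follows; the function name is illustrative, and the eigenvector normalization follows Eq. (2.69).

```python
import numpy as np

def kl_detail_fusion(detail_layers):
    """Fuse detail layers with KL-transform weights, Eqs. (2.69)-(2.71)."""
    # Arrange each detail layer as one column of X
    x = np.stack([d.ravel() for d in detail_layers], axis=1)
    # Covariance with rows as observations and columns as variables
    cxx = np.cov(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cxx)      # eigenvalues in ascending order
    xi_max = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    kl = xi_max / xi_max.sum()                  # weights KL_n, Eq. (2.69)
    return sum(k * d for k, d in zip(kl, detail_layers))
```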
2.5.2.3
Base Layer Fusion
Here, the base layer information of each image is weighted by proper weights w_n. The final base layer is calculated as

B(x, y) = \sum_{n=1}^{N} w_n B_n(x, y), \quad n = 1, \ldots, N,   (2.72)

where \sum_n w_n = 1 and 0 ≤ w_n ≤ 1.
If w_1 = w_2 = ... = w_N = 1/N, then this process represents the average of the base layers. The final detail layer is obtained by using the KL transform in Eq. (2.71), and the final base layer is obtained by using the weighted superposition in Eq. (2.72).
2.5.2.4
Superposition of Final Detail and Base Layers
The fused image F is given by a simple linear combination of the final base layer B and detail layer D:

F = B + D.   (2.73)

2.5.3
Experimental Setup
This section presents the image database on which experiments are carried out, other fusion methods that are used for comparison, objective fusion metrics, and free parameter analysis of the ADF algorithm.
2.5.3.1
Image Database
ADF method is applied on various IR and VI image pairs. As shown in Fig. 2.23, qualitative and quantitative analyses of the ADF method are done for ten image datasets. These datasets consist of nine gray-scale image pairs and one color image
Fig. 2.23 VI and IR image dataset. (a) Battle field, (b) tree, (c) forest (d) industry, (e) kayak, (f) garden, (g) gun, (h) pedestrian, (i) traffic, (j) house image pairs
pair. These image datasets are referred to by their names, as shown in Fig. 2.23, in the later part of the discussion. Qualitative analysis is presented for two image pairs (kayak and gun), whereas quantitative analysis is done for all ten image datasets.
2.5.3.2
Fusion Metrics
Gradient-based objective fusion performance characterization (Petrovic and Xydeas [46]) is considered for the analysis of the ADF method. Fusion is quantified by considering the information contribution of each sensor: the fusion gain or fusion score (Q^{XY/F}), the fusion loss (L^{XY/F}), and the fusion artifacts (N^{XY/F}). The metrics Q^{XY/F}, L^{XY/F}, and N^{XY/F} represent complementary information, and their summation should be unity. However, this does not hold for all types of source images, so modified fusion artifacts (N_k^{XY/F}) are considered to make the summation unity. For better performance, the Q^{XY/F} value should be high and the L^{XY/F}, N^{XY/F}, and N_k^{XY/F} values should be low.
2.5.3.3
Methods for Comparison
This method is compared with six existing multi-scale fusion methods. Of these, three (GRAD [10], FSD [11], and RATIO [14]) are pyramid-based methods and two (SIDWT [47] and MSVD [19]) are transform domain-based methods. The recently proposed EPD-based method CBF [29] is also considered for comparison. For all of these methods, default parameter settings are adopted.
2.5.3.4
Effect of Free Parameters on the ADF Method
Here, with the help of objective fusion metrics, the effect of free parameters on the ADF method is assessed. Individual source images are filtered using the anisotropic diffusion process, and the degree of smoothing depends on the constant k, the number of iterations t, and the stability constant λ. The effect of these parameters on the ADF is analyzed by considering the average fusion metric values calculated over the 10 image datasets. The effect of k on the ADF algorithm is shown in Fig. 2.24, which demonstrates the behavior of Q^{XY/F}, L^{XY/F}, N^{XY/F}, and N_k^{XY/F} with respect to k. It can be observed that Q^{XY/F} is almost constant after k = 20, as is L^{XY/F}, while N^{XY/F} and N_k^{XY/F} are almost constant for any value of k. A similar analysis can be done for t: in Fig. 2.25, as the number of iterations t increases, Q^{XY/F}, L^{XY/F}, N^{XY/F}, and N_k^{XY/F} change, but for t ≥ 10 they are almost constant. Similarly, as demonstrated in Fig. 2.26, above λ = 0.1 the fusion metrics show consistent performance. Hence, from the above analysis, optimal performance of the ADF method is obtained for the free parameters t = 10, λ = 0.15, and
Fig. 2.24 Fusion score, fusion loss, and fusion artifacts with respect to k
Fig. 2.25 Fusion score, fusion loss, and fusion artifacts with respect to t
k = 30. Equation (2.65) is used for the g(.) function, and weights w1 = w2 = 0.5 are taken for the base layer fusion.
2.5.4
Results and Analysis
In this section, a comparative analysis of various image fusion algorithms with ADF algorithm for ten image datasets is presented in terms of visual quality and fusion metrics.
Fig. 2.26 Fusion score, fusion loss, and fusion artifacts with respect to λ
2.5.4.1
Qualitative Analysis
In this section, qualitative analysis of the fused images of the different methods, along with the ADF method, is presented. Figure 2.27 displays the visual quality of the various fusion methods on the kayak image dataset. Figure 2.27a, b are the VI and IR images; they give an idea of the seashore, persons, ship, and sky, but no single image gives complete information about the scene, whereas the fusion result can. Figure 2.27c–i present the results of the different methods. The RATIO method introduces artifacts into the fused image, and CBF introduces gradient-reversal artifacts. The GRAD, FSD, MSVD, and SIDWT images are fine, but these fused images do not convey the entire information of the scene. In contrast, the fused image of the ADF method gives a clear idea of the scene (seashore, ship, sky, and persons) without introducing extra information into the combined image, compared to the other methods.

A qualitative comparison on a gun dataset is presented in Fig. 2.28. Figure 2.28a, b show the VI and MMW images. Information about three persons, with some object held by the middle person, can be found in the VI image, while information about the weapon is conveyed by the MMW image; no single image conveys the entire information about the scene, such as which person concealed the pistol, but the fusion result does. Figure 2.28c–i show the fused images of the various methods. The RATIO fused image does not provide the pistol information. The GRAD, FSD, MSVD, and SIDWT methods are able to integrate the source image information into the fused image, but their results are not visually significant enough to provide the complementary details of the source images. Even though the CBF image is visually good, it produces artifacts in the background of the combined image. However, the ADF method transfers all the necessary image content of the input imagery into the combined image with
Fig. 2.27 Comparison of visual quality of fused images of various methods for a kayak dataset. (a) VI image, (b) IR image, (c) GRAD, (d) FSD, (e) RATIO, (f) SIDWT, (g) MSVD, (h) CBF, (i) ADF
Fig. 2.28 Comparison of visual quality of fused images of various methods for a gun dataset. (a) VI image, (b) MMW image, (c) GRAD, (d) FSD, (e) RATIO, (f) SIDWT, (g) MSVD, (h) CBF, (i) proposed ADF
reduced artifacts. From the fused image, one can tell that the third person from the left concealed the weapon under his shirt.
2.5.4.2
Quantitative Analysis
Quantitative analysis is done with the help of Petrovic metrics. Figure 2.29 displays the bar chart comparison of average fusion metric values calculated over 10 image datasets of various image fusion methods. As demonstrated in Fig. 2.29, ADF
Fig. 2.29 Quantitative analysis of various MSD fusion methods. (a) Fusion score Q^{XY/F}, (b) fusion loss L^{XY/F}, (c) fusion artifacts N^{XY/F}, (d) modified fusion artifacts N_k^{XY/F}
method gives superior performance on all fusion metrics. Note that the summation of these metrics should tend to unity.
2.5.4.3
Computational Time
A comparison of the computational time of various image fusion methods for the IR–VI datasets is shown in Table 2.11. The experiments are carried out on a computer with 4 GB RAM and a 2.27 GHz CPU. The experiment on each dataset is conducted 25 times, and the average of the 25 computational times is taken for better accuracy; the average computational time calculated over the 10 image datasets is then reported. The ADF computational time is less than that of CBF but more than that of the remaining methods (GRAD, FSD, RATIO, SIDWT, and MSVD). Overall, the ADF method gives superior results to the pyramid, transform domain, and edge-preserving methods for IR–VI images in terms of visual quality and Petrovic metrics. Next, we discuss another MSD fusion technique based on two-scale image decomposition and saliency extraction.
Table 2.11 Average computational time in seconds of various fusion methods

Method    GRAD     FSD      RATIO    SIDWT    MSVD     CBF      ADF
Time (s)  0.3274   0.1210   0.1749   0.7681   1.4042   73.64    4.5875

2.6
Two-Scale Image Fusion of Infrared and Visible Images Using Saliency Detection
Here, an image fusion method based on saliency detection and two-scale image decomposition, referred to as TIF, is presented to fuse IR and VI images. This method is beneficial because of the visual saliency extraction process introduced in the fusion algorithm, which can highlight the saliency information of the source images very well. A new weight map construction process based on visual saliency is developed; it is able to integrate the visually significant information of the source images into the fused image. Unlike most multi-scale fusion techniques, this method uses a two-scale image decomposition into base and detail layers. In the ADF method, anisotropic diffusion was used for this purpose; in the TIF method, to reduce the computational time further, an average filter (a non-EPD filter) is used instead of anisotropic diffusion, so it is computationally fast and efficient. The TIF method is tested on several image pairs and is evaluated qualitatively by visual inspection and quantitatively using objective fusion metrics. Its outcomes are compared with state-of-the-art multi-scale fusion techniques, and the results reveal that this method outperforms the existing methods, including the ADF method.
2.6.1
Two-Scale Image Fusion (TIF)
TIF method needs three steps to perform fusion: image decomposition/analysis, fusion, and image reconstruction/ synthesis. Decomposition is done by using an average filter (mean filter) to obtain base and detail layers. These decomposed base and detail layers are fused using different fusion rules. Fused image is reconstructed from the final base and detail layers. Block diagram of this method is shown in Fig. 2.30.
2.6.1.1
Two-Scale Image Decomposition
Consider two co-registered source images ϕ1(x, y) and ϕ2(x, y) of same size. These source images are decomposed into base layers containing large-scale variations and detail layers containing small variations. For this purpose, a simple mean filter is employed. This operation is represented by
Fig. 2.30 Schematic diagram of the TIF method
\phi_1^B(x, y) = \phi_1(x, y) * \mu(x, y),   (2.74)

\phi_2^B(x, y) = \phi_2(x, y) * \mu(x, y),   (2.75)

where \phi_1^B and \phi_2^B are the base layers corresponding to the source images \phi_1 and \phi_2, respectively, \mu is the mean filter of square window size w_\mu, and * denotes convolution. Detail layers are extracted by subtracting the base layers from the source images:

\phi_1^D(x, y) = \phi_1(x, y) - \phi_1^B(x, y),   (2.76)

\phi_2^D(x, y) = \phi_2(x, y) - \phi_2^B(x, y),   (2.77)

where \phi_1^D and \phi_2^D are the detail layers. These detail layers are fused with the help of saliency maps, which are discussed in the next section.
2.6.1.2
Visual Saliency Detection
Visual saliency detection, or saliency detection (SD) [48], is the process of detecting or identifying regions, such as persons, objects, or pixels, which are more significant than their neighbors. These salient regions attract more human visual attention than the other regions present in the scene. In this method, a simple yet effective saliency map detection algorithm is introduced to extract the visual saliency of VI and IR images for the purpose of fusion. As shown in Fig. 2.31, a mean filter is applied on each source image to reduce the intensity variations between a pixel and its neighbors; this is a linear (non-EPD) filter which smooths the entire image without regard to edge information. A median filter is also applied on each source image to remove noise or artifacts; this is a nonlinear filter, so it smooths each source image while preserving the edge information. The saliency map of each source image is calculated as the difference of the mean- and median-filtering outputs, because this difference highlights saliency information such as edges and lines that are more significant than their neighbors. A norm of the difference is taken, since we are interested only in the magnitude of the differences. As shown in Fig. 2.31, the saliency map ξ of a source image \phi is given as

\xi(x, y) = \left| \phi_\mu(x, y) - \phi_\eta(x, y) \right|,   (2.78)

where |·| denotes the absolute value, \phi_\mu is the output of a mean filter with a square window of size w_\mu, and \phi_\eta is the output of a median filter with a square window of size w_\eta. This saliency detection algorithm can be extended to color images: the visual saliency of a color image \phi is given as
Fig. 2.31 Schematic diagram of the visual saliency detection
\xi(x, y) = \left\| \phi_\mu(x, y) - \phi_\eta(x, y) \right\|,   (2.79)

where ‖·‖ is the L2 norm or Euclidean distance. This equation can be expanded as

\xi(x, y) = d_e(\phi_\mu, \phi_\eta) = \sqrt{ \left(\phi_\mu^R(x,y) - \phi_\eta^R(x,y)\right)^2 + \left(\phi_\mu^G(x,y) - \phi_\eta^G(x,y)\right)^2 + \left(\phi_\mu^B(x,y) - \phi_\eta^B(x,y)\right)^2 },   (2.80)

where d_e is the Euclidean distance, \phi_\mu = [\phi_\mu^R, \phi_\mu^G, \phi_\mu^B]^T, and \phi_\eta = [\phi_\eta^R, \phi_\eta^G, \phi_\eta^B]^T. Here, \phi_\mu^R, \phi_\mu^G, and \phi_\mu^B are the R, G, and B channels of the mean-filtering output, and \phi_\eta^R, \phi_\eta^G, and \phi_\eta^B are the R, G, and B channels of the median-filtering output. For source images \phi_1 and \phi_2, the saliency maps are denoted as \xi_1 and \xi_2, respectively. To understand this better, the analysis of the battlefield images is presented in Fig. 2.32: the VI and IR images are illustrated in Fig. 2.32a, b, and the corresponding saliency maps are presented in Fig. 2.32c, d, respectively.
2.6.1.3
Weight Map Construction
VI and IR images provide complementary information, i.e., information available in one image may not be available in the other. For example, Fig. 2.32a is the VI image, which provides information about the battlefield but fails to reveal the existence of a person, whereas the IR image in Fig. 2.32b provides information about the person but insufficient battlefield information. The visually significant information therefore needs to be integrated from both source images into a single image. This can be done by assigning proper weights to the detail layers of the source images: pixels with insignificant information get low weights and pixels with significant information get higher weights. This is because the HVS is more sensitive to the detail layer information of an image than to its base layer information, as is evident from Fig. 2.32. Making use of this observation, a new weight map construction technique is developed based on saliency information to fuse the detail layers. The weight maps are calculated by normalizing the saliency maps, because normalization brings the range to [0, 1] and satisfies the above requirements. These weight maps are given by

\psi_1(x, y) = \frac{\xi_1(x, y)}{\xi_1(x, y) + \xi_2(x, y)},   (2.81)

\psi_2(x, y) = \frac{\xi_2(x, y)}{\xi_1(x, y) + \xi_2(x, y)},   (2.82)

Fig. 2.32 Saliency maps and weight maps of a battlefield dataset. (a) and (b) are source images, (c) and (d) are saliency maps, (e) and (f) are weight maps

where \psi_1 and \psi_2 are the weight maps corresponding to \phi_1^D and \phi_2^D, calculated from \xi_1 and \xi_2, respectively. As shown in Fig. 2.32e, f, these weights \psi_1 and \psi_2 are complementary to each other, i.e., \psi_1 + \psi_2 = 1. Therefore, if this weight map construction process assigns more weight to a pixel with significant information in one detail image, then it assigns less weight to the pixel at the same location in the other detail image, and vice versa.
2.6.1.4
Detail Layer Fusion
Significant information from the detail layers is integrated into a single image by multiplying the weight maps \psi_1 and \psi_2 with the detail layers \phi_1^D and \phi_2^D, respectively:

\phi^D(x, y) = \psi_1(x, y)\,\phi_1^D(x, y) + \psi_2(x, y)\,\phi_2^D(x, y),   (2.83)
where ϕD is the final detail layer.
2.6.1.5
Base Layer Fusion
An average fusion rule is employed to combine the base layers:

\phi^B(x, y) = \frac{1}{2}\left( \phi_1^B(x, y) + \phi_2^B(x, y) \right),   (2.84)
where ϕB is the final base layer.
2.6.1.6
Two-Scale Image Reconstruction
Finally, the fused image is constructed from the linear combination of the final base and detail layers:

\gamma(x, y) = \phi^B(x, y) + \phi^D(x, y).   (2.85)
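Putting Eqs. (2.74)–(2.85) together, a compact sketch of the TIF pipeline for two grayscale images is shown below. The saliency argument can be the tif_saliency function sketched earlier (or any saliency routine), and the small eps constant, which is not in the original equations, only guards against division by zero in Eqs. (2.81)–(2.82).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def tif_fuse(phi1, phi2, saliency, w_mu=35, eps=1e-12):
    """Two-scale saliency-based fusion (TIF), Eqs. (2.74)-(2.85), grayscale inputs."""
    phi1 = phi1.astype(float)
    phi2 = phi2.astype(float)
    # Two-scale decomposition with a mean filter, Eqs. (2.74)-(2.77)
    b1, b2 = uniform_filter(phi1, size=w_mu), uniform_filter(phi2, size=w_mu)
    d1, d2 = phi1 - b1, phi2 - b2
    # Saliency maps and normalized weight maps, Eqs. (2.78), (2.81)-(2.82)
    xi1, xi2 = saliency(phi1), saliency(phi2)
    psi1 = xi1 / (xi1 + xi2 + eps)
    psi2 = 1.0 - psi1
    # Detail fusion, base fusion, and reconstruction, Eqs. (2.83)-(2.85)
    detail = psi1 * d1 + psi2 * d2
    base = 0.5 * (b1 + b2)
    return base + detail
```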
Fig. 2.33 Color image fusion process
2.6.1.7
Color Image Fusion
The TIF algorithm can also be applied to color images by performing the fusion process on the individual color channels (red, green, and blue). As shown in Fig. 2.33, the red channel of the fused image is obtained by applying the TIF fusion method to the red channels of the source images, and the same procedure is applied to the remaining channels. The fused image is then reconstructed by concatenating the fused color channels.
2.6.2
Experimental Setup
The experimental setup covers the image database on which the experiments are carried out, the fusion methods used for comparison, the objective fusion metrics, and the free parameter analysis of the TIF algorithm.
2.6.2.1
Image Database
The TIF method is applied on various IR and VI image pairs. As shown in Fig. 2.23, qualitative and quantitative analyses of this method are done for ten image datasets. Qualitative analysis is presented for two image pairs (battlefield and house), whereas quantitative analysis is done for all ten image pairs.
2.6.2.2
Other Methods for Comparison
TIF method is compared with seven state-of-the-art multi-scale fusion methods. Out of these, three (GRAD [10], FSD [11], and RATIO [14]) are pyramid-based methods. Two (SIDWT [47] and MSVD [19]) are transform domain-based methods. Recently proposed EPD-based method (CBF [22]) is also considered. Along with these, ADF method in Sect. 2.5 is also considered for comparison. For all of these methods, default parameter settings are adopted.
2.6.2.3
Objective Fusion Metrics
As discussed in Sect. 2.5.3.2, the Petrovic metrics [46] are considered for the quantitative analysis.
2.6.2.4
Parameter Analysis
As discussed in Sect. 2.6.1.2, saliency map extraction in the TIF is done with the help of mean and median filters. If the window sizes of these filters vary, the performance of the method also changes, so the performance needs to be analyzed with respect to the window sizes w_μ and w_η. This is done with the help of the Petrovic fusion metrics [46]. The analysis is performed on ten image pairs, as shown in Fig. 2.34, and the average of the metric values is considered for accuracy. While examining the effect of w_μ on the TIF, the median filter window size is fixed at w_η = 3; while inspecting the influence of w_η, w_μ = 35 is taken. As shown in Fig. 2.34a, when w_μ is increased from 3 to 35, the performance of the algorithm increases; however, when it is increased beyond 35, the performance gradually decreases. The effect of w_η on the TIF is shown in Fig. 2.34b: at w_η = 3, the TIF gives maximum performance, and if w_η increases further, the performance decreases. Hence, the parameters for this experiment are taken as w_μ = 35 and w_η = 3.
Fig. 2.34 Effect of the free parameters w_μ and w_η on the performance of the proposed TIF method. (a) Effect of w_μ on the performance when w_η = 3, (b) effect of w_η on the performance when w_μ = 35
2.6.3
Results and Analysis
TIF is evaluated both qualitatively (by visual display) and quantitatively (by measuring fusion metrics) to verify its effectiveness. Computational time analysis of the TIF in comparison with other methods is also carried out.
2.6.3.1
Qualitative Analysis
The visual quality comparison on the battlefield dataset is presented in Fig. 2.35, where (a) is the VI image, (b) is the IR image, (c)–(i) are the outputs of the GRAD, FSD, RATIO, SIDWT, MSVD, CBF, and ADF methods, respectively, and (j) is the resultant image of the TIF. In Fig. 2.35a, the VI image provides the field information, but there is no sign of a person because of the low lighting conditions; the IR image in Fig. 2.35b is able to provide information about the person. However, we need the total information in a single composite image. As shown in Fig. 2.35, the outputs of GRAD (Fig. 2.35c), FSD (Fig. 2.35d), SIDWT (Fig. 2.35f), and MSVD (Fig. 2.35g) provide visually less information, the RATIO image (Fig. 2.35e) is visually distorted, and blocking artifacts can be observed in the CBF result (Fig. 2.35h). The ADF result (Fig. 2.35i) is good; however, the TIF method integrates both the battlefield and the person information from the VI and IR images into the fused image (Fig. 2.35j) more effectively than the remaining methods.

Figure 2.36a is a VI image, which gives the visual information about the scene, and Fig. 2.36b is the IR image, which gives information that is not available in the VI image. A fused image has to contain the complementary information of both. In Fig. 2.36, GRAD, FSD, SIDWT, MSVD, and ADF are not able to give complete information about the scene, and the RATIO and CBF outputs are not clear because of artifacts, whereas the output of the TIF method conveys the complementary information of the source images very well.
Fig. 2.35 Comparison of visual quality of fused images of various methods for battlefield dataset. (a) VI image, (b) IR image, (c) GRAD, (d) FSD, (e) RATIO, (f) SIDWT, (g) MSVD, (h) CBF, (i) ADF, (j) TIF
Fig. 2.36 Comparison of visual quality of fused images of various methods for a house dataset. (a) VI image, (b) IR image, (c) GRAD, (d) FSD, (e) RATIO, (f) SIDWT, (g) MSVD, (h) CBF, (i) ADF, (j) TIF
2.6.3.2
Quantitative Analysis
Sometimes visual quality alone is not sufficient to judge the effectiveness of a fusion algorithm; it has to be assessed both qualitatively and quantitatively. So far, the visual quality analysis has been discussed. Now, the TIF method is assessed using the Petrovic fusion metrics on ten VI and IR image datasets. A bar chart comparison of the various methods along with this method is presented in Fig. 2.37.
Fig. 2.37 Quantitative analysis of the TIF method in comparison with other MSD methods. (a) Fusion score Q^{XY/F}, (b) fusion loss L^{XY/F}, (c) fusion artifacts N^{XY/F}, (d) modified fusion artifacts N_k^{XY/F}
Table 2.12 Average computational time comparison

Method    GRAD     FSD      RATIO    SIDWT    MSVD     CBF       ADF      TIF
Time (s)  0.2918   0.1077   0.1544   0.6916   1.1710   56.7938   4.0574   0.4846
Average Petrovic metric values calculated over the 10 image datasets are considered for the quantitative analysis. From this bar chart comparison, it can be observed that the TIF outperforms all the existing methods in terms of the fusion metrics.
2.6.3.3
Computational Time
For real-time implementation, an image fusion method should have a low computational time along with good visual quality and fusion metric values. The computational time comparison of the various methods for different image pairs is presented in Table 2.12. The experiments are conducted on a computer with a 2.27 GHz CPU and 4 GB RAM, and the average computational time is calculated over 10 image pairs, as shown in Table 2.12. From the average computational time, one can conclude that this method has a longer execution time than the GRAD, FSD, and RATIO methods and a shorter computational time than the SIDWT, MSVD, CBF, and ADF methods. From the above simulations and analysis, it can be observed that, compared to the existing methods, the TIF is capable of transferring most of the useful source information into the fused image with less information loss and fewer fusion artifacts. It also consumes considerably less computational time and is thus preferable for real-time implementation. So far, we have discussed the ADF and TIF methods in Sects. 2.5 and 2.6; these methods were developed for VI and IR images. In the next section, a fusion algorithm for multi-focus images is discussed.
2.7
Multi-Focus Image Fusion Using Maximum Symmetric Surround Saliency Detection (MSSSF)
In digital photography, two or more objects of a scene cannot be focused at the same time as discussed before. If one object is focused, then information about other objects may be lost and vice versa. Multi-focus fusion (MFF) is the process of generating an all-in-focus image from several out-of-focus images. In this section, a recently proposed multi-focus image fusion algorithm [49] based on visual saliency and weight map construction is presented. This method is very advantageous because the saliency map used here can highlight the saliency information present in source images with well-defined boundaries. A weight map construction process based on saliency information is developed. This process can identify focus and defocus regions present in the source image very well. It can also integrate only
focused region information into the fused image. We now discuss the visual saliency detection process based on the maximum symmetric surround.
2.7.1
Maximum Symmetric Surround Saliency Detection (MSSS)
SD [50] is the process of detecting and highlighting visually significant regions which attract more human visual attention than the other regions present in the scene. SD is useful in many applications such as object segmentation, object recognition, and adaptive compression; here, however, SD is utilized for multi-focus fusion. A good SD method exhibits the following properties:

• It should highlight the largest salient regions, not only the smallest ones.
• It should uniformly highlight whole salient regions.
• Salient region boundaries should be well-defined.
• It should ignore texture or noise artifacts.
In this view, so far many SD methods have been proposed. SD algorithms (Frintrop et al. [51]; Harel et al. [52]; Hou and Zhang [43]; Itti et al. [53]; Ma and Zhang [54]) produce low-resolution saliency maps. Some SD algorithms (Harel et al. [52]; Hou and Zhang [43]; Ma and Zhang [54]) generate ill-defined object boundaries. The saliency maps of these methods are not useful to generate weight maps for the purpose of fusion because of these limitations. Achanta et al. [50] proposed a frequency-tuned SD method which overcomes the limitations of the existing saliency methods. This SD method is able to generate uniformly highlighted full-resolution saliency maps with well-defined boundaries. However, it fails if the image consists of complex background or large salient regions. To solve these problems, Achanta and Susstrunk [48] proposed another SD algorithm called maximum symmetric surround SD method which can highlight the salient object along with well-defined boundaries. MSSS algorithm is preferred for fusion over other SD methods because: 1. It gives saliency maps with full-resolution and well-defined boundaries. 2. Salient regions are calculated based on symmetric surrounds. Hence, it can effectively highlight salient regions in images with complex background. In multi-focus images, focused regions provide more visual information than defocused regions because focused regions are more salient than defocused regions. So, these salient regions (visually significant regions) should be identified from source images using SD algorithms. Hence, we adopted a SD algorithm for visual saliency extraction for the purpose of fusion. At first, Achanta et al. [50] proposed a frequency-tuned saliency detection algorithm to utilize almost all low-frequency content and most of the high-frequency content to obtain perceptually good saliency maps with full resolution. This saliency
map is obtained by taking the Euclidean distance between the average Iμ of the image and each pixel of the Gaussian-blurred version If(u, v) of the same image:

S(u, v) = ‖Iμ − If(u, v)‖,    (2.86)
where S(u, v) is the saliency value at pixel location (u, v). A Gaussian blur of size 3 × 3 is chosen to get If(u, v). However, this method fails when the source image contains a complex background. It highlights the background along with the salient object, since it treats the entire image as the common surround for all pixels in the image. Treating the entire image as a common surround is not desirable. To detect a pixel at the center of the salient object, a small lower cut-off frequency is needed, whereas a high lower cut-off frequency is required to detect a pixel near the boundary. So, as we approach the image boundaries, we should use local surround regions instead of a common surround region. This can be achieved by defining a symmetric surround around the center pixel of its own sub-image near the boundary, which increases the lower cut-off frequency. The MSSS saliency detection (Achanta and Susstrunk [48]) of an image I of width w and height h is defined as

Sss(u, v) = ‖Iμ(u, v) − If(u, v)‖.    (2.87)
Here, Iμ(u, v) is the average of the sub-image centered at pixel (u, v), and it is given as

Iμ(u, v) = (1/A) Σ_{i=u−u0}^{u+u0} Σ_{j=v−v0}^{v+v0} I(i, j),    (2.88)
where u0 and v0 represent the offsets and A indicates the area of the sub-image:

u0 = min(u, w − u),   v0 = min(v, h − v).    (2.89)

Sub-images obtained using Eqs. (2.88) and (2.89) are the maximum symmetric surround regions for a given central pixel. In multi-focus images, focused regions provide visually more information than defocused regions; in other words, focused regions are more salient than defocused regions. So, salient regions need to be detected from these out-of-focus images using SD algorithms. It can be observed that the MSSS saliency detection algorithm is able to extract the salient regions of multi-focus images. The multi-focus datasets used for simulations are shown in Fig. 2.38 and their corresponding saliency maps are displayed in Fig. 2.39.
Fig. 2.38 Multi-focus image datasets: (a) flower, (b) book, (c) bookshelf, (d) clock, (e) aircraft, (f) pepsi, (g) bottle, (h) parachute, (i) leopard, (j) flower wage
Fig. 2.39 Saliency maps of multi-focus image datasets: (a) flower, (b) book, (c) bookshelf, (d) clock, (e) aircraft, (f) pepsi, (g) bottle, (h) parachute, (i) leopard, (j) flower wage
The process of saliency extraction using the MSSS algorithm is denoted as

S = MSSS(I),    (2.90)
where I is the input image and S is the output saliency map. Using this saliency map, a new multi-focus image fusion algorithm is developed in the following section.
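To make the MSSS computation above concrete, the following is a minimal Python/NumPy sketch of Eqs. (2.86)–(2.90) for a single-channel image. The function name, the use of an integral image for the symmetric-surround mean, and the Gaussian-blur parameter (sigma = 1.0 as a stand-in for the 3 × 3 blur) are illustrative assumptions, not the authors' reference implementation; the published method computes the distance in a color space.

```python
# A minimal sketch of MSSS saliency detection (Eqs. 2.86-2.90), assuming a
# single-channel float image; names and parameters are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def msss_saliency(img):
    """Maximum symmetric surround saliency map S_ss(u, v) of a 2-D image."""
    img = img.astype(np.float64)
    h, w = img.shape
    blurred = gaussian_filter(img, sigma=1.0)          # I_f: blurred version of I
    # Integral image makes the symmetric-surround mean cheap per pixel.
    integral = np.cumsum(np.cumsum(img, axis=0), axis=1)
    integral = np.pad(integral, ((1, 0), (1, 0)), mode='constant')
    sal = np.zeros_like(img)
    for u in range(h):
        r_off = min(u, h - 1 - u)                      # symmetric offset in rows
        r0, r1 = u - r_off, u + r_off + 1
        for v in range(w):
            c_off = min(v, w - 1 - v)                  # symmetric offset in columns
            c0, c1 = v - c_off, v + c_off + 1
            area = (r1 - r0) * (c1 - c0)
            s = (integral[r1, c1] - integral[r0, c1]
                 - integral[r1, c0] + integral[r0, c0])
            mean = s / area                            # I_mu(u, v): sub-image average
            sal[u, v] = abs(mean - blurred[u, v])      # Eq. (2.87), scalar case
    return sal
```

A call such as S = msss_saliency(I) then plays the role of Eq. (2.90).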
2.7.2
MSSS Detection-Based Image Fusion (MSSSF)
The key idea of this method is illustrated in the block diagram (Fig. 2.40). It is summarized in the following steps:
A. Decompose the source images into base and detail layers using an average filter.
B. Calculate the saliency map of each source image using the MSSS detection algorithm.
C. Compute weight maps from the extracted saliency map of each source image.
D. Scale the detail layers with these weight maps and combine all the scaled detail layers to obtain the final detail layer.
E. Compute the final base layer by taking the average of all base layers.
F. Take a linear combination of the final base and detail layers to get the fused image.

Fig. 2.40 General block diagram of the MSSSF algorithm
2.7.2.1
Two-Scale Image Decomposition
Let us consider co-registered source images {In(x, y)}, n = 1, . . ., N, of the same size p × q. These N images are decomposed into base layers Bn containing large-scale variations:

Bn = In * A,    (2.91)
where A is an average filter of size w and * denotes the convolution operation. The detail layers Dn containing small-scale variations are obtained by subtracting the base layers Bn from their corresponding source images In:

Dn = In − Bn.    (2.92)
Base and detail layer decomposition for a flower dataset is shown in Fig. 2.41.
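As an illustration, a minimal sketch of this two-scale decomposition (Eqs. 2.91–2.92) is given below; the filter size w = 5 follows the parameter study in Sect. 2.7.3.4, and the function name is ours.

```python
# A minimal sketch of the two-scale decomposition in Eqs. (2.91)-(2.92),
# using a w x w average filter (w = 5 assumed, cf. Sect. 2.7.3.4).
import numpy as np
from scipy.ndimage import uniform_filter

def two_scale_decompose(img, w=5):
    """Return (base, detail) layers: B_n = I_n * A,  D_n = I_n - B_n."""
    base = uniform_filter(img.astype(np.float64), size=w)   # average filtering
    detail = img - base
    return base, detail
```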
Fig. 2.41 Base and detail layers of a flower dataset: (a) left focused flower image, (b) and (c) are base and detail layers of (a), (d) right focused flower, (e) and (f) are base and detail layers of (d)
2.7.2.2
Saliency Detection Algorithm
Saliency information from the out-of-focus images is extracted using the MSSS detection algorithm [48]. This algorithm is reviewed in Sect. 2.7.1. The process of saliency extraction from the source images In is represented as

Sn = MSSS(In),    (2.93)
where Sn is the saliency map of the n-th source image. Saliency maps of a flower dataset are shown in Fig. 2.42.
Fig. 2.42 Saliency maps of a flower dataset. (a) and (b) are saliency maps of left and right focused images
Fig. 2.43 Weight maps of a flower dataset (red rectangle indicates the focused region; green rectangle indicates the defocused region). (a) and (b) are weight maps of left and right focused images
2.7.2.3
Weight Map Calculation
In digital photography, each multi-focus image provides information about a particular focused region. We need to integrate all focused regions into a single fused image. This can be done by properly choosing the weight map of each source image. These weight maps should highlight the focused and defocused regions of the source images. Figure 2.43 shows the weight maps of the flower dataset. These weight maps represent the complementary information, i.e., focused and defocused regions. For example, as shown in Fig. 2.43, weight maps of focused and defocused regions are highlighted in red and green rectangles, respectively, representing the complementary information. These weight maps are calculated by normalizing the saliency maps as follows:

wi = Si / Σ_{n=1}^{N} Sn,   ∀ i = 1, 2, . . ., N.    (2.94)
2.7.2.4
Detail Layer Fusion
Here, the detail layers are scaled with the help of the weight maps wn calculated from the MSSS detection algorithm, and these scaled detail layers are combined to get the final detail layer D as shown below:

D = Σ_{n=1}^{N} wn Dn.    (2.95)
Fig. 2.44 Visual display of final base, detail layers and fused image. (a) Final detail layer, (b) final base layer, (c) MSSSF fused image
2.7.2.5
Base Layer Fusion
The final base layer is generated by taking the average of the base layers as

B = (1/N) Σ_{n=1}^{N} Bn.    (2.96)

2.7.2.6
Two-Scale Image Reconstruction
The fused image is synthesized by taking the linear combination of B and D:

F = B + D.    (2.97)
The reconstruction of the flower dataset is displayed in Fig. 2.44.
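Putting steps A–F together, the following hedged sketch implements Eqs. (2.91)–(2.97) end to end, reusing the two_scale_decompose and msss_saliency sketches given earlier; the small epsilon in the weight normalization is our own safeguard against division by zero and is not part of the published formulation.

```python
# A minimal end-to-end sketch of the MSSSF steps A-F (Eqs. 2.91-2.97),
# reusing the two_scale_decompose and msss_saliency sketches above.
import numpy as np

def msssf_fuse(images, w=5, eps=1e-12):
    """Fuse a list of co-registered grayscale multi-focus images."""
    bases, details, saliencies = [], [], []
    for img in images:
        b, d = two_scale_decompose(img, w)       # Eqs. (2.91)-(2.92)
        bases.append(b)
        details.append(d)
        saliencies.append(msss_saliency(img))    # Eq. (2.93)
    total = np.sum(saliencies, axis=0) + eps
    weights = [s / total for s in saliencies]    # Eq. (2.94): normalized saliency
    D = np.sum([wm * d for wm, d in zip(weights, details)], axis=0)  # Eq. (2.95)
    B = np.mean(bases, axis=0)                   # Eq. (2.96): average of base layers
    return B + D                                 # Eq. (2.97): fused image
```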
2.7.3
Experimental Setup
In this section, the image database, the existing MFF methods used for comparison, and the free parameter analysis (the effect of the filter size w on the performance of MSSSF) will be discussed.
2.7.3.1
Image Database
Experiments are conducted on several multi-focus image datasets. Results and analysis for 10 image datasets, viz. flower, leopard, bookshelf, clock, aircraft, pepsi, bottle, parachute, book, and flower wage, are presented. These image datasets are shown in Fig. 2.38. They are available at https://sites.google.com/view/durgaprasadbavirisetti/home.
2.7.3.2
Fusion Metrics
Assessment of a fusion method is a challenging task when no ground truth is available. Any image fusion algorithm can be assessed qualitatively by visual inspection and quantitatively by measuring fusion metrics. The Petrovic fusion metrics [46] are considered for the quantitative analysis.
2.7.3.3
Methods for Comparison
The MSSSF method is compared with the spatial-domain MFF method BGS [6] and the multi-scale MFF methods DCT with variance measure (DCT + var.) (Haghighat et al. [42]), DCT with variance measure and consistency verification (DCT + var. + cv) (Haghighat et al. [55]), DCHWT (Shreyamsha Kumar [22]), DWT with absolute measure (DWT + AB) [20], CBF (Shreyamsha Kumar [29]), and GFF [28]. Default parameter settings are adopted for all of these methods.
2.7.3.4
Effect of Free Parameters on the MSSSF Method
The MSSSF method uses an average filter to extract base and detail layer information from the source images. The size of the average filter affects the performance of the algorithm, so it has to be tuned for the best performance. The Petrovic metrics Q^{XY/F}, L^{XY/F}, N^{XY/F}, and N_k^{XY/F} are plotted in Fig. 2.45 by averaging over the 10 multi-focus datasets. It can be observed that w = 5 is the best choice.
2.7.4
Results and Analysis
The aim of any MFF method is to obtain a properly focused image with low execution time. The performance of an MFF algorithm can be verified qualitatively by visual inspection, quantitatively using fusion metrics, and by measuring the computational time.
2.7.4.1
Qualitative Analysis
Qualitative analysis is presented for two multi-focus image datasets, namely flower and book, as shown in Fig. 2.38. The visual quality of MSSSF along with various MFF methods is presented in Figs. 2.46 and 2.47. For these datasets, a zoomed portion of a particular region of the fused image is also presented for in-depth qualitative analysis. In Figs. 2.46 and 2.47, subfigures (a)–(h) give the visual display of our MSSSF, GFF, CBF, DCHWT, DWT + AB, BGS, DCT + var. + cv, and DCT + var., respectively.
Fig. 2.45 Effect of w on the MSSSF method
The subfigures (i)–(p) illustrate the zoomed portions of (a)–(h), respectively. In Fig. 2.46, the fused images for the flower dataset are displayed. As emphasized in Fig. 2.46a with red rectangles, MSSSF is able to generate more focused regions (such as the flower wage and the switch). For example, the switch portions (Fig. 2.46m, n) of the DWT + AB and BGS methods are blurred. The remaining methods (Fig. 2.46j–l, o, p) are visually good, but they are not able to provide a more focused switch region. However, as shown in Fig. 2.46i, the MSSSF method obtains a sharper switch region compared to the remaining methods. Figure 2.47 displays the visual quality of the results on the book dataset. Zoomed portions of the various MFF algorithms for the book dataset, highlighted in Fig. 2.47a–h (with red rectangles), are displayed in Fig. 2.47i–p. The zoomed portions of GFF (Fig. 2.47j), CBF (Fig. 2.47k), DCHWT (Fig. 2.47l), DCT + var. + cv (Fig. 2.47o), and DCT + var. (Fig. 2.47p) are visually good, but these methods are not able to provide more focused information. The zoomed portions of DWT + AB (Fig. 2.47m) and BGS (Fig. 2.47n) are visually distorted. However, as shown in Fig. 2.47i, MSSSF gives sharpened information about the text, objects, and book present in the fused image. Hence, MSSSF integrates both foreground and background regions of the source images into the fused image effectively compared to the state-of-the-art methods.
Fig. 2.46 Comparison of visual quality of fused images of various methods for a flower dataset. (a) Proposed MSSSF, (b) GFF, (c) CBF, (d) DCHWT, (e) DWT + AB, (f) BGS, (g) DCT + var. + cv, (h) DCT + var. Subfigures (i)–(p) show the zoom version of switch portion of (a)–(h), respectively
2.7.4.2
Quantitative Analysis
It is difficult to judge the performance of a fusion algorithm by visual inspection alone; a fusion algorithm has to be evaluated both qualitatively and quantitatively for a better assessment. In the previous section, qualitative analysis was performed. Here, quantitative analysis is conducted by evaluating the Petrovic fusion metrics (the average Q^{XY/F}, L^{XY/F}, N^{XY/F}, and N_k^{XY/F} metrics calculated over the 10 datasets) of the other MFF algorithms (GFF (Li et al. [28]), CBF (Shreyamsha Kumar [29]), BGS (Tian et al. [6]), DCT + var. (Haghighat et al. [42]), DCT + var. + cv (Haghighat et al. [55]), DCHWT (Shreyamsha Kumar [22]), and DWT + AB [20]) along with the MSSSF algorithm. A bar chart comparison of the fusion metrics for the various MFF algorithms is shown in Fig. 2.48a–d. From this comparison, it is easy to observe that MSSSF achieves superior values in all fusion metrics.
Fig. 2.47 Comparison of visual quality of fused images of various methods for a book dataset. (a) Proposed MSSSF, (b) GFF, (c) CBF, (d) DCHWT, (e) DWT + AB, (f) BGS, (g) DCT + var. + cv, (h) DCT + var. Subfigures (i)–(p) show the zoom version of a particular focused region of (a)–(h), respectively
2.7.4.3
Computational Time
The computational time in seconds for the various MFF methods is shown in Table 2.13. These experiments are performed on a computer with 4 GB RAM and a 2.2 GHz CPU. The time is calculated by averaging the run time over the 10 multi-focus image datasets. As shown in the table, the MSSSF computational time is less than that of BGS, DCHWT, DWT + AB, and CBF and more than that of the DCT + var., DCT + var. + cv, and GFF methods. From the above results and analysis, one can conclude that the MSSSF method integrates more focused and sharpened regions of the source images. The quantitative analysis of MSSSF against several state-of-the-art MFF methods using the Petrovic fusion metrics shows that it outperforms the existing MFF methods. Its computational time is promising for real-time implementation.
Fig. 2.48 Quantitative analysis of various MFF methods along with MSSSF (average fusion metrics over 10 image datasets are considered). (a) Fusion score Q^{XY/F}, (b) fusion loss L^{XY/F}, (c) fusion artifacts N^{XY/F}, (d) modified fusion artifacts N_k^{XY/F}
2.7.4.4
Summary
In this chapter, six new image fusion algorithms based on the authors' contributions were introduced, after discussing the shortcomings of the state-of-the-art traditional pyramid and wavelet-based image fusion methods. New pyramid, discrete wavelet frame, optimal wavelet filter bank, and edge-preserving decomposition-based image fusion methods were introduced in Sects. 2.2–2.7, respectively. In Sect. 2.2, a new image fusion method based on pyramid decomposition was developed to avoid the poor shift invariance of wavelets and to make up for the shortcomings of the traditional pyramid structure in extracting texture and edge features. MST approaches, including the Laplacian pyramid, the ratio-of-low-pass pyramid, and the discrete wavelet transform (DWT), are shift-dependent. When there is slight camera or object movement, or when there is misregistration of the source images, the performance of those MST approaches is degraded. To address this issue, a new image fusion method based on DWT was introduced in Sect. 2.3.
Table 2.13 Average computational time comparison of various MFF methods along with the MSSSF method

Method    BGS      DCT + var.  DCT + var. + cv  DCHWT   DWT + AB  CBF      GFF     MSSSF
Time (s)  22.654   2.2423      2.5194           6.6002  17.7984   66.8385  2.4904  2.8708
Existing filter-bank design methods depend on biorthogonal filter banks with perfect reconstruction. Usually, these approaches do not introduce any errors themselves because of the perfect reconstruction. However, in image fusion, the information in the fused image is an incomplete representation of the source images. Hence, small reconstruction errors introduced by the filter banks do not necessarily lead to worse fusion quality. In the method of Sect. 2.4, the perfect reconstruction condition in designing the optimal wavelet filter banks is relaxed to emphasize the overall fusion performance. In the ADF method (Sect. 2.5), a new fusion algorithm was developed using anisotropic diffusion and the KL transform. Base and detail layers were extracted using anisotropic diffusion. The base layers were averaged to get the final base layer, and the final detail layer was calculated using the KL transform. Finally, both layers were added to get the fused image. The ADF method was evaluated qualitatively by visual inspection, quantitatively using the Petrovic fusion metrics, and by measuring its computational time, and its performance was compared with state-of-the-art MSD fusion methods. The results show that this method can generate visually good fused images with the best fusion metric values and appreciable computational time. In the ADF method, base and detail layers were extracted using anisotropic diffusion. Even though ADF is used as a two-scale image decomposition process for the purpose of fusion, it takes ten iterations (t = 10) to achieve this decomposition for good results, so the process carries some computational burden. To reduce the computational time further and improve the visual quality of the fused images, we presented the TIF algorithm (Sect. 2.6) based on two-scale decomposition using an average filter and visual saliency detection. In this method, a new visual saliency detection algorithm was developed to extract visually significant regions of the source images, and a new weight map construction process was developed based on the visual saliencies. In the detail layers, this process assigns larger weights to visually significant information and smaller weights to visually insignificant information. The base layers were averaged to get the final base layer, and the fused image was obtained by combining the final base and detail layers. The algorithm was assessed with the help of the Petrovic fusion metrics, and its performance was compared with state-of-the-art MSD fusion techniques along with the ADF method. From the results and analysis, the TIF method is able to generate visually good fused images with better fusion performance than the remaining methods. Its computational time is more than that of pyramid-based methods and less than that of the remaining methods, including the ADF method. In addition to multi-sensor fusion, we also developed a new fusion algorithm (MSSSF) (Sect. 2.7) for multi-focus images based on saliency detection and two-scale image decomposition. A new maximum symmetric surround (MSSS) saliency detection was explored for the purpose of fusion. This SD algorithm is able to highlight visually significant regions. A new weight map construction based on MSSS saliency detection was implemented, which is able to identify the focused and defocused regions of the source images. Hence, we developed a new image fusion algorithm (MSSSF) which integrates only the visually significant and focused regions of the source images into a single image. This algorithm was
applied to various multi-focus images. The results reveal that the MSSSF method is more reliable than the remaining MFF algorithms.
References 1. A.A. Goshtasby, S. Nikolov, Image fusion: advances in the state of the art. Inform. Fusion 2(8), 114–118 (2007) 2. J. Yonghong, Fusion of landsat TM and SAR images based on principal component analysis. Remote Sens. Technol. Appl. 13(1), 46–49 (2012) 3. N. Mitianoudis, T. Stathaki, Pixel-based and region-based image fusion schemes using ICA bases. Inform. Fusion 8(2), 131–142 (2007) 4. T.M. Tu, S.C. Su, H.C. Shyu, P.S. Huang, A new look at IHS-like image fusion methods. Inform. Fusion 2(3), 177–186 (2001) 5. W. Huang, Z. Jing, Evaluation of focus measures in multi-focus image fusion. Pattern Recogn. Lett. 28(4), 493–500 (2007) 6. J. Tian, L. Chen, L. Ma, W. Yu, Multi-focus image fusion using a bilateral gradient-based sharpness criterion. Optics Commun. 284(1), 80–87 (2011) 7. R. Shen, I. Cheng, J. Shi, A. Basu, Generalized random walks for fusion of multi-exposure images. IEEE Trans. Image Process. 20(12), 3634–3646 (2011) 8. M. Xu, H. Chen, P.K. Varshney, An image fusion approach based on Markov random fields. IEEE Trans. Geosci. Remote Sens. 49(12), 5116–5127 (2011) 9. A. Akerman III, Pyramidal techniques for multisensor fusion. Sensor Fusion V. Int. Soc. Optics Photon. 1828, 124–131 (1992) 10. P.J. Burt, A gradient pyramid basis for pattern-selective image fusion. Proc. SID 1992, 467–470 (1992) 11. P. Burt, E. Adelson, The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983) 12. A. Toet, A morphological pyramidal image decomposition. Pattern Recogn. Lett. 9(4), 255–261 (1989) 13. A. Toet, Image fusion by a ratio of low-pass pyramid. Pattern Recogn. Lett. 9(4), 245–253 (1989) 14. A. Toet, L.J. Van Ruyven, J.M. Valeton, Merging thermal and visual images by a contrast pyramid. Opt. Eng. 28(7), 287789 (1989) 15. H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform. Graph. Models Image Process. 57(3), 235–245 (1995) 16. T.A. Wilson, S.K. Rogers, M. Kabrisky, Perceptual-based image fusion for hyperspectral data. IEEE Trans. Geosci Remote Sens. 35(4), 1007–1017 (1997) 17. M. Choi, R.Y. Kim, M.G. Kim, The curvelet transform for image fusion. Int. Soc. Photogr. Remote Sens. 35(Part 88), 59–64 (2004) 18. B. Yang, S. Li, F. Sun, Image fusion using nonsubsampled contourlet transform, in Fourth International Conference on Image and Graphics (ICIG 2007), (IEEE, Piscataway, NJ, 2007), pp. 719–724 19. V.P.S. Naidu, Image fusion technique using multi-resolution singular value decomposition. Defence Sci. J. 61(5), 479–484 (2011) 20. J. Liang, Y. He, D. Liu, X. Zeng, Image fusion using higher order singular value decomposition. IEEE Trans. Image Process. 21(5), 2898–2909 (2012) 21. D. Looney, D.P. Mandic, Multiscale image fusion using complex extensions of EMD. IEEE Trans. Signal Process. 57(4), 1626–1630 (2009)
22. B.S. Kumar, Multifocus and multispectral image fusion based on pixel significance using discrete cosine harmonic wavelet transform. Signal Image Video Process. 7(6), 1125–1143 (2013) 23. Q.G. Miao, C. Shi, P.F. Xu, M. Yang, Y.B. Shi, A novel algorithm of image fusion using shearlets. Optics Commun. 284(6), 1540–1547 (2011) 24. W. Gan, X. Wu, W. Wu, X. Yang, C. Ren, X. He, K. Liu, Infrared and visible image fusion with the use of multi-scale edge-preserving decomposition and guided image filter. Infrared Phys. Technol. 72, 37–51 (2015) 25. Y. Jiang, M. Wang, Image fusion using multiscale edge-preserving decomposition based on weighted least squares filter. IET Image Process. 8(3), 183–190 (2014) 26. G. Cui, H. Feng, Z. Xu, Q. Li, Y. Chen, Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition. Optics Commun. 341, 199–209 (2015) 27. J. Zhao, H. Feng, Z. Xu, Q. Li, T. Liu, Detail enhanced multi-source fusion using visual weight map extraction based on multi scale edge preserving decomposition. Optics Commun. 287, 45–52 (2013) 28. S. Li, X. Kang, J. Hu, Image fusion with guided filtering. IEEE Trans. Image Process. 22(7), 2864–2875 (2013) 29. B.S. Kumar, Image fusion based on pixel significance using cross bilateral filter. Signal Image Video Process. 9(5), 1193–1204 (2015) 30. Z. Zhang, R.S. Blum, A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application. Proc. IEEE 87(8), 1315–1326 (1999) 31. M. Unser, Texture classification and segmentation using wavelet frames. IEEE Trans. Image Process. 4(11), 1549–1560 (1995) 32. R.S. Blum, R.J. Kozick, B.M. Sadler, An adaptive spatial diversity receiver for non-Gaussian interference and noise. IEEE Trans. Signal Process. 47(8), 2100–2111 (1999) 33. J. Yang, R.S. Blum, A statistical signal processing approach to image fusion for concealed weapon detection, in Proceedings. International Conference on Image Processing, vol. 1, (IEEE, Piscataway, NJ, 2002), pp. I–I 34. E. Diamant, Single-pixel information content, in Image Processing: Algorithms and Systems II, vol. 5014, (International Society for Optics and Photonics, Bellingham, WA, 2003), pp. 460–465 35. V.S. Petrović, C.S. Xydeas, Sensor noise effects on signal-level image fusion performance. Inform. Fusion 4(3), 167–183 (2003) 36. G. Qu, D. Zhang, P. Yan, Information measure for performance of image fusion. Electr. Lett. 38 (7), 313–315 (2002) 37. L.J. Chipman, T.M. Orr, L.N. Graham, Wavelets and image fusion, in Proceedings, International Conference on Image Processing, vol. 3, (IEEE, Piscataway, NJ, 1995), pp. 248–251 38. T.Q. Nguyen, P.P. Vaidyanathan, Two-channel perfect-reconstruction FIR QMF structures which yield linear-phase analysis and synthesis filters. IEEE Trans. Acoust. Speech Signal Process. 37(5), 676–690 (1989) 39. B.R. Horng, A.N. Wilson, Lagrange multiplier approaches to the design of two-channel perfectreconstruction linear-phase FIR filter banks, in International Conference on Acoustics, Speech, and Signal Processing, (IEEE, Piscataway, NJ, 1990), pp. 1731–1734 40. M. Antonini, M. Barlaud, P. Mathieu, I. Daubechies, Image coding using wavelet transform. IEEE Trans. Image Process. 1(2), 205–220 (1992) 41. O. Rioul, Simple regularity criteria for subdivision schemes. SIAM J. Math. Anal. 23(6), 1544–1576 (1992) 42. M.B.A. Haghighat, A. Aghagolzadeh, H. 
Seyedarabi, Real-time fusion of multi-focus images for visual sensor networks, in 2010 6th Iranian Conference on Machine Vision and Image Processing, (IEEE, Piscataway, NJ, 2010), pp. 1–6
43. D.P. Bavirisetti, R. Dhuli, Multi-focus image fusion using multi-scale image decomposition and saliency detection, Ain Shams Eng. J. (2016) 44. C.H. Anderson, 4,718,104. 5. U.S. Patent (1988) 45. P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12(7), 629–639 (1990) 46. V. Petrovic, C. Xydeas, Objective image fusion performance characterisation, in Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, vol. 2, (IEEE, Piscataway, NJ, 2005), pp. 1866–1871 47. O. Rockinger, Image sequence fusion using a shift-invariant wavelet transform, in Proceedings of International Conference on Image processing, vol. 3, (IEEE, Piscataway, NJ, 1997), pp. 288–291 48. R. Achanta, S. Süsstrunk, Saliency detection using maximum symmetric surround, in 2010 IEEE International Conference on Image Processing, (IEEE, Piscataway, NJ, 2010), pp. 2653–2656 49. D.P. Bavirisetti, R. Dhuli, Multi-focus image fusion using maximum symmetric surround saliency detection. ELCVIA: Electr. Lett. Comput. Vision Image Anal. 14(2), 58–73 (2015) 50. R. Achanta, S. Hemami, F. Estrada, S. Süsstrunk, Frequency-tuned salient region detection, in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2009), vol. CONF, (2009), pp. 1597–1604 51. S. Frintrop, M. Klodt, E. Rome, A real-time visual attention system using integral images, in International Conference on Computer Vision Systems: Proceedings, (2007) 52. J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in Advances in Neural Information Processing Systems, (2007), pp. 545–552 53. L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 11, 1254–1259 (1998) 54. Y.F. Ma, H.J. Zhang, Contrast-based image attention analysis by using fuzzy growing, in Proceedings of the eleventh ACM international conference on Multimedia, (ACM, New York, 2003), pp. 374–381 55. M.B.A. Haghighat, A. Aghagolzadeh, H. Seyedarabi, Multi-focus image fusion for visual sensor networks in DCT domain. Comput. Electr. Eng. 37(5), 789–797 (2011)
Chapter 3
Feature-Level Image Fusion
Abstract Fusion at the feature level is also referred to as an intermediate level of image fusion. This process can represent and analyze multi-sensor data for realizing classification and recognition tasks. Multi-resolution techniques are important theoretical and analytical tools of signal and image processing. These techniques mainly include pyramid- and wavelet transform-based methods. The present chapter studies feature-level image fusion methods based on pyramid decomposition. New feature-level image fusion algorithms in the pyramid domain are introduced based on multi-resolution gradient, texture, and fuzzy region features.
3.1
Introduction
Feature-level fusion is an intermediate-level fusion process that uses the feature information extracted from the original information of each source for comprehensive analysis and processing. Generally, the extracted feature information should be a sufficient representation or sufficient statistic of the original information, so that the multi-source information can be classified, collected, and synthesized. The idea of feature-level fusion is to first extract useful features from the original multi-sensor images and then merge these features into new feature vectors for further processing. Typical image features include edges, corners, lines, and the like. Compared with pixel-level fusion, feature-level fusion has more information loss but less computation. Feature-level fusion is a fusion of information at the intermediate level: it not only retains a sufficient amount of important information but also compresses the information, which is beneficial to real-time processing. Sitting between pixel-level and decision-level fusion, feature-level fusion is generally regarded as the level preceding decision-level fusion. Although a variety of sensors (such as forward-looking infrared, laser imaging radar, synthetic aperture radar, etc.) that can obtain high-quality images
have been developed, and advanced research results have been obtained in pixel-level fusion algorithms, when the sensor characteristics differ greatly and the data transmission bandwidth is limited, the best practice is to extract the useful features of the images and perform feature-level image fusion. Some pixel-level and decision-level methods can also be used for feature-level fusion. Feature-level image fusion is mainly based on the comprehensive analysis and processing of scene feature information, such as edge shape, contour, direction, area, and distance, obtained after preprocessing and feature extraction. Not only does this kind of intermediate-level information fusion retain a sufficient amount of important information, but the information is also compressed, which is beneficial to real-time processing. Therefore, the study of feature fusion has focused on the extraction of the information contained in the image and the optimization of the fusion rule for the extracted feature information. Since feature extraction in a complex context is a difficult and active area in the field of image understanding, the preliminary results of image feature fusion have been most successfully applied to the fusion of simple backgrounds, such as face recognition and character recognition. In the field of remote sensing applications, research on the fusion of image features is basically limited to the extraction, fusion, and recognition of line features such as roads and airports, or to the extraction and recognition of planar features with typical characteristics.

A feature-level image fusion system consists of four parts: information acquisition, information processing, feature extraction, and information fusion. This process is consistent with human cognition of things. Based on the obtained target characteristics, a sample bank of all kinds of target characteristics is established. On this basis, the model samples are preprocessed, and feature extraction highlights some useful information and suppresses useless information, thereby forming distinctive features that do not vary with distance, delay, and direction. Invariant feature extraction and fusion algorithms will be the focus of feature information fusion. Feature extraction is very important for the design of a pattern recognition system. Feature extraction can effectively compress the information of the target model, highlighting the structural differences among the target models and reducing the dimension of the model space and the size of the target template library, so as to improve the generalization ability of the recognition system, reduce its computational load, and improve its real-time performance.

McMichael D. [1] used early information fusion methods to detect image edges and perform feature extraction, and then used a neural network to carry out multi-source image fusion. Jocelyn Chanussot's feature-level morphological image fusion method based on neural networks and the hierarchical neural network feature-level fusion method are suitable for road network detection and feature extraction. Subsequently, Heene G. proposed a multi-source image fusion method for coastline detection, and Pigeon L. developed a knowledge-based multi-source image fusion method. Based on the IHS color space transform and wavelet multi-resolution analysis, Liu Zhe defines multiple eigenvalues based on the features of the high-frequency wavelet
coefficients of the image and uses the eigenvalue product as a basis to propose a new image fusion algorithm. Geng Bo Ying proposed an edge image fusion method based on multi-resolution wavelet analysis and Gaussian Markov random field theory: based on the multi-resolution wavelet analysis of the image, a set of statistical parameters is extracted by regression analysis based on Gaussian Markov random field theory in the corresponding regions of different images; these parameters represent the local structural features of the image, and after calculating the similarity measure, the similarity matrix of the input images and their features are used to generate the fused edge image. Jiang Xiao Yu, aiming at the general law of feature-level image fusion, realized multi-resolution segmentation of the image using a wavelet transform algorithm, extracted the multi-resolution feature vector of the target, and completed the information fusion of the CCD image and the thermal image at the feature-level classification stage. Dennis used multi-layer perceptron neural networks for feature-level fusion and classification, and the classification results are more accurate than K-nearest-neighbor classification and nonparametric Bayesian classification. Anne H. et al. used the gray-level co-occurrence matrix to extract and fuse the statistical features of the texture information of SAR images, then classified the features and fused the multi-sensor data at the feature level, and applied the approach to the detection of buried mines; the results show that the detection rate of feature-level fusion is higher than that of single-sensor detection and decision-level fusion. Yu Xiu lan, Qian Guo hui, Jia Xiao guang, et al., aiming at the feature classification fusion of TM and SAR image information, proposed an iterative classification method based on Markov random fields and a BP neural network. Li Jun, Lin Zong, et al. used multi-resolution wavelet decomposition for data fusion between high-resolution and multispectral images; the fusion of subset data at different scales was carried out using a regional variance maximum criterion, giving feature-level remote sensing image data fusion of the corresponding baseband data. Chen Xiaozhong, Sun Huayan, et al. used the inherent multi-scale features of the wavelet transform to detect edge features of different scales and different precisions, and then used them to fuse images. Li Qin Shuang, Chen Dong Lin, and others proposed a spectral feature fusion method based on spectral feature knowledge. In addition, Gao Xiu Mei et al. applied feature fusion to face recognition, and Ju Yan and others applied it to handwritten Chinese character recognition.

This chapter consists of four parts: the first part briefly describes the concept of feature-level fusion; the second part introduces a multi-scale image fusion method based on gradient features from the perspective of multi-resolution transformation; the third part introduces fusion based on joint texture and gradient features; and the fourth part,
based on the fusion rules of image fusion, introduces an image fusion method based on fuzzy region features.
3.2
Fusion Based on Gradient Features
The gradient feature of an image reflects a transform-coefficient characteristic of the image. This section describes a multi-scale image fusion method based on gradient features. First, the gradient-based multi-scale transformation of the image is introduced, and then the image fusion strategy based on this transformation is presented.
3.2.1
Multi-Scale Transformation Method Based on Gradient Features
The gradient of the image is calculated at each pixel. Each pixel reflects the gradient in four directions, namely the horizontal, vertical, and two diagonal directions. In order to reflect these gradient features at each scale of the image, a connection must be established between this gradient operator and the traditional Laplacian pyramid multi-scale transformation. The Laplacian pyramid is a set of band-pass sequence images that can be obtained by calculating the difference between adjacent levels of a Gaussian pyramid, that is,

LN = GN,    (3.1)
Ll = Gl − EXPAND(Gl+1) = Gl − Gl+1,↑,   0 ≤ l < N.    (3.2)
Suppose the matrix G0 represents the source image. The image G1 is obtained by low-pass filtering G0 and sampling at intervals, so that the length and width of G1 are only half of those of G0; G1 is then low-pass filtered and subsampled to obtain G2. In this way, each level of the Gaussian pyramid is obtained by repeating the same operation, yielding a sequence of images of decreasing size. The inter-level operation can be expressed as the function REDUCE:

Gl = REDUCE(Gl−1).    (3.3)
That is, for a point (i, j) on the l-th level image (1 ≤ l ≤ N, where N is the total number of levels of the multi-scale pyramid), with 0 ≤ i < Cl and 0 ≤ j < Rl (Cl and Rl are the numbers of columns and rows of the level-l image),

Gl(i, j) = Σ_{m=−2}^{2} Σ_{n=−2}^{2} ω(m, n) Gl−1(2i + m, 2j + n),    (3.4)
where ω(m, n) is a Gaussian template, usually ω(m, n) = ω(m)ω(n), and the ω functions are symmetric and normalized. It can be seen that the REDUCE function is equivalent to convolving the source image with a Gaussian template, followed by interval sampling. The size of two adjacent pyramid images differs by a factor of 1/4, and the passband becomes progressively narrower, so the Gaussian pyramid can be regarded as a multi-resolution low-pass filter. The function EXPAND is the inverse operation of the function REDUCE, which is used to expand the image of a certain level in the Gaussian pyramid to the size of the previous level by interpolation. Let Gl,k denote the image obtained after applying the EXPAND operation to Gl k times (0 ≤ k ≤ l). Then

Gl,0 = Gl,    (3.5)
Gl,k = EXPAND(Gl,k−1),    (3.6)
Gl,k(i, j) = 4 Σ_{m=−2}^{2} Σ_{n=−2}^{2} ω(m, n) Gl,k−1((i + m)/2, (j + n)/2).    (3.7)
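The REDUCE and EXPAND operations of Eqs. (3.3)–(3.7) can be sketched as follows; the separable 5 × 5 kernel ω(m)ω(n) with ω = [1, 4, 6, 4, 1]/16 is an assumption, since the text only requires a symmetric, normalized Gaussian-like template.

```python
# A minimal sketch of REDUCE and EXPAND (Eqs. 3.3-3.7); the kernel is assumed.
import numpy as np
from scipy.ndimage import convolve

_w1d = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
_W = np.outer(_w1d, _w1d)                      # omega(m, n) = w(m) w(n)

def reduce_level(G):
    """G_l = REDUCE(G_{l-1}): Gaussian smoothing followed by 2:1 subsampling."""
    return convolve(G, _W, mode='nearest')[::2, ::2]

def expand_level(G, shape):
    """EXPAND: upsample by 2 (zero insertion) and interpolate with 4 * omega."""
    up = np.zeros(shape, dtype=np.float64)
    up[::2, ::2] = G
    return convolve(up, 4.0 * _W, mode='nearest')

def gaussian_pyramid(img, levels):
    """Build [G_0, G_1, ..., G_levels] by repeated REDUCE operations."""
    pyr = [img.astype(np.float64)]
    for _ in range(levels):
        pyr.append(reduce_level(pyr[-1]))
    return pyr
```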
After a series of EXPAND and subtraction operations, we get a set of band-pass images that form the Laplacian pyramid. The Laplacian pyramid can completely represent a source image: G0 can be accurately reconstructed by the inverse process of constructing the pyramid, and the reconstructed image is unique. Starting from GN = LN, we perform an EXPAND operation on GN and add the result to LN−1 to obtain GN−1, then perform an EXPAND operation on GN−1 and add the result to LN−2 to obtain GN−2. This continues until G0 is restored. The gradient pyramid decomposition can be obtained by performing gradient direction filtering on all images of the Gaussian pyramid (except the highest level):

Dlk = dk * (Gl + w * Gl),   0 ≤ l < N,  k = 1, 2, 3, 4,    (3.8)
where * denotes the convolution operation, Dlk is the k-th gradient pyramid image of the l-th level, Gl is the l-th level image of the Gaussian pyramid, and dk is the k-th gradient filter operator, defined as

d1 = [1  −1],   d2 = (1/√2) [0 −1; 1 0],   d3 = [1; −1],   d4 = (1/√2) [−1 0; 0 1],    (3.9)
where w is a weight function satisfying ω = w * w, and ω is the Gaussian template. After gradient direction filtering of the Gaussian pyramid images, four decomposition images, in the horizontal, vertical, and two diagonal directions, are obtained at each level.
Gradient pyramids need to be transformed into Laplacian pyramids to reconstruct the image. Each Dlk is transformed into a corresponding second-order partial derivative pyramid (direction pyramid), namely

L̂lk = −(1/8) dk * Dlk.    (3.10)

The sum over all directions forms the Laplacian pyramid Ll:

Ll = Σ_{k=1}^{4} L̂lk.    (3.11)
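A minimal sketch of the gradient pyramid construction (Eq. 3.8) and its conversion back to a Laplacian level (Eqs. 3.10–3.11) is given below; W denotes the Gaussian template from the previous sketch, and the exact signs and scales of the directional filters follow the standard gradient-pyramid formulation where the printed text is ambiguous.

```python
# A minimal sketch of Eqs. (3.8)-(3.11); filter signs/scales are assumptions.
import numpy as np
from scipy.ndimage import convolve

d_filters = [
    np.array([[1.0, -1.0]]),                                   # d1: horizontal
    np.array([[0.0, -1.0], [1.0, 0.0]]) / np.sqrt(2.0),        # d2: diagonal
    np.array([[1.0], [-1.0]]),                                 # d3: vertical
    np.array([[-1.0, 0.0], [0.0, 1.0]]) / np.sqrt(2.0),        # d4: diagonal
]

def gradient_pyramid_level(G_l, W):
    """D_lk = d_k * (G_l + w * G_l) for k = 1..4 (Eq. 3.8)."""
    smoothed = G_l + convolve(G_l, W, mode='nearest')
    return [convolve(smoothed, d, mode='nearest') for d in d_filters]

def laplacian_from_gradients(D_l):
    """L_l = sum_k -(1/8) d_k * D_lk (Eqs. 3.10-3.11)."""
    L = np.zeros_like(D_l[0])
    for d, D in zip(d_filters, D_l):
        L += -0.125 * convolve(D, d, mode='nearest')
    return L
```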
3.2.2
Multi-Scale Transformation Based on Gradient Feature Fusion Strategy
The concrete fusion process includes six steps:

1. Input the spatially registered original images x1(n) and x2(n) (two original input images are taken as an example).
2. According to the given gradient filters di (i = 1, . . ., 4), a multi-resolution decomposition algorithm based on gradient features is established.
3. Perform multi-resolution pyramid decomposition based on joint texture and gradient features on the original images.
4. After the images are decomposed into multi-resolution form based on texture and edge, the fusion method adopts a strategy based on the similarity measure and the saliency measure; a sketch of this rule is given after step 6 below. The multi-resolution sub-band image of the k-th layer in the i-th direction is denoted by Lki. The similarity and saliency measures are calculated as follows. A 3 × 3 window is opened around each pixel, and the window template coefficients are taken as

α = (1/16) [1 1 1; 1 8 1; 1 1 1].    (3.12)

The saliency measure is
S(m, n, k, i) = Σ_{s=−1}^{1} Σ_{t=−1}^{1} α(s, t) Lki(m + s, n + t, k, i)^2.    (3.13)
Similarity measures are 2 M AB ðm, n, k, iÞ ¼
1 1 P P s¼1 t¼1
!A
!B
αðs, t ÞL ki ðm þ s, n þ t, k, iÞL ki ðm þ s, n þ t, k, iÞ S2A ðm, n, k, iÞ þ S2B ðm, n, k, iÞ ð3:14Þ
Set a threshold β (0 β 1), if the similarity measure MAB β, then 1 1 1 M AB ωA ¼ 2 2 1β
ω B ¼ 1 ωA
If the similarity measure MAB < β, then
ωA ¼ 1 if SA > SB ωA ¼ 0
else
ω B ¼ 1 ωA
ð3:15Þ
Finally, the fusion strategy !F L ki ðm, n, k, iÞ
!A
!B
¼ ωA ðm, n, k, iÞL ki ðm, n, k, iÞ þ ωB ðm, n, k, iÞL ki ðm, n, k, iÞ
ð3:16Þ
5. In order to avoid the situation that a certain point and its neighborhood come from different input original images, we verify the consistency of each direction image !F L ki ðm, n, k, iÞ after the fusion. The so-called consistency is that if a point from the image A and most of its neighborhood from the image B, then the point will be changed to the corresponding value of the image B. !F
6. The fused multi-resolution expanded image L ki ðm, n, k, iÞ is obtained by multiresolution inverse transformation based on gradient features. Specific examples of fusion and fusion evaluation will be introduced in the next section.
110
3.3
3 Feature-Level Image Fusion
Fusion Based on United Texture and Gradient Characteristics
Texture is a very important feature in many images, such as SAR images. For example, most aeronautical and satellite remote sensing images, medical microscopic images, artificial seismic profiling images from petroleum geophysical prospecting, etc. can be thought of as consisting of different types of textures, and therefore, the study of textures is image processing in the field of important theoretical research topics and has a wide range of applications. There are three main signs of texture: (1) some local sequence repeats over a larger area than the sequence; (2) the sequence is composed of non-random basic elements; and (3) a uniform body of parts, approximately the same size of the structure anywhere in the texture area. The basic part of the series is often referred to as texture primitives. It is also thought that textures are arranged by texture primitives according to certain laws or only some statistical laws. The former is called deterministic texture, and the latter is called random texture. Texture description methods can be divided into three categories: statistical methods, structural methods, and filtering methods. The Laws texture extraction method belongs to a method of extracting features by using filtering analysis. The texture filter uses five extracted texture kernel vectors of Laws [2]. l5 ¼ ½ 1 4 6 4 1 e5 ¼ ½ 1 2 0 2 1 s5 ¼ ½ 1
0
2
0 1
u5 ¼ ½ 1 2 r 5 ¼ ½ 1 4
0 6
2 4
ð3:17Þ
1 1
These kernel vectors are referred to as flatness, edge measure, speckle measure, waviness, and graininess, respectively.
3.3.1
Joint Texture and Gradient Features of Multi-Scale Transformation Method
The pyramid decomposition method based on texture and edge is obtained by improving the gradient pyramid method. Considering the important role of texture features in image processing and image segmentation and classification, the texture features are added to make the pyramid decomposition. The resolution transformation domain can include the texture information in the original image. Thus, providing a more comprehensive measure of information for further integration. Each layer of the Gaussian pyramid is filtered using a texture extraction filter and an edge-gradient filter template to generate a series of texture and edge images. The
3.3 Fusion Based on United Texture and Gradient Characteristics
111
L0(x) L0(x)
L0
L1(x) L2(x)
L2(x)
L2(x)
L1(x)
L1(x)
Gi Fig. 3.1 Multi-scale pyramid decomposition method based on texture and gradient features
image pyramid can be completely represented by these tower signals. When designing pyramid algorithms that combine texture and gradient features, you should first consider the reconstruction conditions. This is mainly because the selected texture, edge filter template, and pyramid decomposition used by the two filters must ensure that the part of the dashed box in Fig. 3.1 has the same role, so as to find the reconstruction conditions. Reconstruction Conditions: When a set of coefficients (ti, ci) can be found to make the system P P4 P29 satisfy ð1 w_ new Þ ¼ 25 i¼1 t i T i T i þ i¼1 ci Di Di ¼ i¼1 vi U i U i , the multiresolution transformation method of joint texture and gradient features can be reconstructed. Wherein, w_ new is a new kernel window function constructed by two convolutions of the kernel window function w_ in Eq. (3.3), w_ new ¼ ðw_ w_ Þ ðw_ w_ Þ; Ti is a texture extraction filter, which is obtained by filtering each texture extraction filter in Eq. (3.10), Di is the edge-gradient filter, which is obtained by performing two convolutions through the four gradient filters of Eq. (3.5), Ti and Di are collectively referred to as a feature extraction filter, and ti and ci are coefficients to be determined that satisfy the reconstruction conditions. Suppose vi ¼
ti
i 25
ci
25 < i 29
,
Ui ¼
Ti
i 25
Di
25 < i 29
ð3:18Þ
It is proved that the so-called reconstruction refers to that when a signal ω is equal e obtained by performing to the reconstructed signal
* *ω inverse transform in the * * e . Where transforming domain V 0 , V 1 , . . . , V k , . . . , V N1 , GN , that is, ω ¼ ω
* * * * V 0 , V 1 , . . . , V k , . . . , V N1 is the high-frequency part based on multi-scale expan* sion of texture and gradient features, V k ¼ ½V k1 , . . . , V ki , . . . , V k29 , and (GN) is its low-frequency part. It has been previously demonstrated that the FSD Laplacian pyramid can be approximately completely reconstructed [3], and here only the multi-resolution
112
3 Feature-Level Image Fusion
pyramid based on texture and gradient features can be constructed to make the filter resampling Laplacian Si pyramid form. V ki ¼ U i ½Gk þ w_ new Gk ð1 w_ new Þ ¼
29 X
ð3:19Þ
vi U i U i
ð3:20Þ
i¼1
Direction Laplacian pyramid: Lk ¼ ð1 w_ new w_ new ÞGk ¼ ð1 w_ new Þð1 þ w_ new ÞGk ¼
29 X i¼1
vi U i ½U i ð1 þ w_ new ÞGk ¼
29 X
vi U i V ki
ð3:21Þ
i¼1
Therefore, as long as a set of coefficients are found for the equation to be true, the multi-resolution pyramid based on texture and gradient features can produce the form of the filter resampling Laplacian pyramid. However, it is important to note here that since the FSD Laplacian pyramid is approximately reconstructed, that is, there is some error in the reconstructed result from the original image, the multi-resolution pyramid based on texture, and the gradient features described in this book. It also belongs to the approximate reconstruction of multi-resolution transformation method. The steps of multi-resolution pyramid transformation based on texture and gradient features are as follows: 1. To establish Gaussian multi-resolution pyramid Similar to the Gaussian multi-resolution pyramid, the difference is that the nuclear filter window function ω needs to be replaced by ω_ new ω_ new . 2. To create multi-resolution pyramid based on texture and gradient features. The pyramid image representation is obtained by filtering each layer of the Gaussian pyramid in 29 directions. V ki ¼ ½ð1 þ w_ new Þ Gk U i
ð3:22Þ
where U ¼ [T1, T2, . . ., T25, D1, . . ., D4] is a combination of a texture filter and a directional gradient filter, the subscript i is used to designate a certain feature extraction filter, Gk is a Gaussian pyramid decomposition result of the k-th layer, and w_ new a 9 9 filter kernel window function.
3.3 Fusion Based on United Texture and Gradient Characteristics
113
w_ new ¼ ðw_ w_ Þ ðw_ w_ Þ
ð3:23Þ
Directional gradient filter D is a given 5 5 filter obtained by convolving the filter twice (Table 3.1). 2
0
6 60 6 6 D1 ¼ 6 61 6 60 4 0 2 0 6 60 6 6 D3 ¼ 6 60 6 60 4 0
0
0
0
0
4
6
0
0
0 0
0 1
0 4 0
6
0 4 0
1
0
0
3
7 07 7 7 4 1 7 7, 7 0 07 5 0
0 0 3 0 0 7 0 07 7 7 0 07 7, 7 0 07 5 0 0
2
0
0
0
0
6 6 0 0 0 1 6 6 D2 ¼ 6 0 1:5 0 6 0 6 6 0 1 0 0 4 0:25 0 0 0 2 0:25 0 0 0 6 6 0 1 0 0 6 6 6 D4 ¼ 6 0 0 1:5 0 6 6 0 0 0 1 4 0
0
0
0
0:25
3
7 0 7 7 7 0 7 7, 7 0 7 5
0 0
3
7 0 7 7 7 0 7 7 7 0 7 5
0:25 ð3:24Þ
By performing a convolution operation on the Gaussian pyramid layers based on joint texture and gradient direction operators, a decomposition image containing 29 characteristic information is available at each level. Therefore, the pyramid decomposition algorithm based on the texture and the gradient direction can well represent the texture information and the edge information of the image under the multi-resolution transform domain. This gives the image processing, and fusion operation provides a good condition. 1. Reconstruction of multi-resolution pyramid based on texture and gradient features Reconstruction of the pyramid is equivalent to obtaining the undetermined coefficient of Eq. (3.20). Pending coefficients can be obtained using the singular value decomposition method in matrix theory. Singular value decomposition (SVD) method is a commonly used mathematic method of matrix inversion. Any M N (M N ) matrix A can be written as the product of three matrices A ¼ UWV T
ð3:25Þ
where W is a N N diagonal matrix whose diagonal elements are the singular values of A. U and V are M N and N N matrices, respectively, columns are orthogonal to each other, UTU ¼ VTV ¼ I and I is an identity matrix. Since V is a square matrix, it is also row-orthogonal, that is, VVT ¼ I.
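The undetermined coefficients of Eq. (3.20) can then be obtained with the SVD pseudo-inverse of Eqs. (3.25)–(3.27), for example as in the following sketch; the matrix Omega stacking the 29 flattened filter correlations and the 1 × 81 target vector w follow the shapes stated in the text, and the singular-value cut-off is our own numerical safeguard.

```python
# A minimal sketch of solving v = w V S^{-1} U^T (Eqs. 3.25-3.27), where
# Omega = U S V^T is 29 x 81 and w is the flattened 1 x 81 target (1 - w_new).
import numpy as np

def solve_coefficients(w_vec, Omega):
    """Return the 1 x 29 coefficient vector v minimizing ||v Omega - w_vec||."""
    U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
    S_inv = np.diag(np.where(s > 1e-10, 1.0 / s, 0.0))     # pseudo-inverse of S
    return w_vec @ Vt.T @ S_inv @ U.T                      # Eq. (3.27)
```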
Table 3.1 Texture filter structure
T 1 ¼ e5 lT5 =96 T 2 ¼ s5 lT5 =64
T 7 ¼ s5 eT5 =24 T 6 ¼ e5 eT5 =36
T 12 ¼ s5 sT5 =16 T 11 ¼ e5 sT5 =24
T 17 ¼ s5 uT5 =24 T 16 ¼ e5 uT5 =36
T 22 ¼ s5 r T5 =64 T 21 ¼ e5 r T5 =96
T 3 ¼ u5 lT5 =96
T 8 ¼ u5 eT5 =36
T 13 ¼ u5 sT5 =24
T 18 ¼ u5 uT5 =36
T 23 ¼ u5 r T5 =96
T 4 ¼ r 5 lT5 =256
T 9 ¼ r 5 eT5 =96
T 14 ¼ r 5 sT5 =64
T 19 ¼ r 5 uT5 =96
T 24 ¼ r 5 r T5 =256
T 5 ¼ l5 eT5 =96
T 10 ¼ l5 sT5 =64
T 15 ¼ l5 uT5 =96
T 20 ¼ l5 r T5 =256
T 25 ¼ l5 lT5 =256
114 3 Feature-Level Image Fusion
3.3 Fusion Based on United Texture and Gradient Characteristics
115
When M < N, singular value decomposition can also be performed. At this time, the singular value wj ( j ¼ M + 1, . . ., N ) on the diagonal of the matrix W is equal to zero. The columns corresponding to matrix U with wj are also zero. It can be shown that Eq. (3.15) holds true at any time and is almost unique. Equation (3.20) can be converted to w ¼* vΩ
!
ð3:26Þ
where w is ð1 w_ new Þ 1 81 vector of b transform; * v is a vector of coefficients to be determined, 1 29 in size; and Ω is a 29 81 matrix vector for transforming Ui Ui by a texture and gradient filter. !
By a singular value, decomposition can be written as Ω, and substituting Ω ¼ USVT into Eq. (3.26), we get
* v ¼* w VS1 U T
ð3:27Þ
where S1 is the pseudo inverse of S, so that the undetermined coefficient can be obtained as 2
0:5625, 6 0:0808, 6 * v¼6 4 0:5625, 0:0478,
0:3750,
0:5625,
0:4767,
0:5625,
0:3164,
0:2109,
0:2125, 0:0808, 0:2125,
0:3750, 0:2109, 0:2445,
0:2109, 0:3164, 0,
0:0243, 0:2125, 0:0024,
0:2109, 0:0477, 0:0133,
0:1417, 0:2109, 0:0240,
...
3
... 7 7 7 ... 5 0:0133 ð3:28Þ
Reconstruction of the pyramid and the directional gradient pyramid reconstruction methods are basically the same. Pyramid reconstructions based on texture and gradient features are relatively complex, and the directional Laplacian pyramid and Filter-Subtract-Decimated (FSD) Laplacian pyramid images are constructed as intermediate results. Defining the direction, the Laplacian pyramid is !
1 L ki ¼ U i GPki , 8
! L ki
ð3:29Þ
is the Laplacian pyramid image of the i-th feature of the k-th layer. Directions Laplace pyramid can be converted into FSD Laplace pyramid through accumulation, P ! to form FSD Laplacian pyramid, Lk ¼ 4l¼1 L kl convert Laplacian pyramid to Laplacian pyramid image, LPk ¼ [1 ω] Lk Si pyramid algorithm reconstruction to get the image, where ω ¼ ω_ new ω_ new .
116
3.3.2
3 Feature-Level Image Fusion
Multi-Scale Image Fusion Method Based on Gradient Feature
For multi-focus image fusion, we obtain the standard reference fusion image of multi-focus image by artificial shear splicing, and use the following two evaluation criteria to objectively determine the merits of the fusion results. 1. Root mean square error of standard reference fusion image and fusion image vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u M X N u 1 X RSME ¼ t F 0 ði, jÞ F ði, jÞÞ2 M N i¼1 J¼1
ð3:30Þ
where M, N is the number of rows and columns of the image; F0(i, j) is the pixel gray value of the algorithm fusion result at (i, j); and F(i, j) is the pixel gray value of the standard reference fusion image at (i, j). The smaller the p value, the better the fusion result. Conversely, the larger the p value, the poorer the fusion result. 2. The common information between the standard reference fusion image and the fused image MI ¼
LN X LN X i¼1
j¼1
hR,X ði, jÞ ln
hR,X ði, jÞ hR ðiÞhX ð jÞ
ð3:31Þ
where hR, X(i, j) is the joint probability f when the pixel gray in image R is i and the pixel with the same name in image X is j; hR(i), hZ( j) is the pixel gray value of image R (or Z ); and LN is the gray level. In general, the larger the MI, the more common information the two have. 1. For different sensor image fusion such as infrared and visible light, it is more difficult to evaluate the fusion results quantitatively. We cannot find an ideal fusion result as a reference image to measure the fusion result. Here, in order to be able to evaluate the fusion results objectively, two objective measures (Object Measure) are used: one is objective evaluation measure of image fusion proposed by Xydeas and Petrovic [4] in 2000, and for convenience, we call the mutual information of the edge; and another by the Chinese scholar Qu [5] and other information proposed measure, we call the pixel mutual information. Objective evaluation index of mutual information of edge measures how much edge information of “inherited” input image in fusion image. Based on the edge extraction of input image and fusion image, this method calculates the amount of edge information stored and uses the weighted amount of edge information as a measure to evaluate the fusion result. Therefore, the larger the mutual information of edge, the more edge information. Specific steps are as follows: Calculate the edge feature amplitude and phase image of the original image A, B and the fused image F.
3.3 Fusion Based on United Texture and Gradient Characteristics
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi sxA ðn, mÞ2 þ syA ðn, mÞ2 y s ðn, mÞ αA ðn, mÞ ¼ tan 1 Ax sA ðn, mÞ
gA ðn, mÞ ¼
117
ð3:32Þ ð3:33Þ
The above is the A-image to get the edge of the amplitude and phase images of the formula, sxA ðn, mÞ and syA ðn, mÞ, the application of Sobel operator to filter the horizontal and vertical results. 2. Calculate the relative amplitude value GAF(m, n) and phase value AAF(m, n) of the original image A and the fused image F, and the relative amplitude values and phase values of the B image and the fused image F.
$$G^{AF}(m,n) = \begin{cases}\dfrac{g_F(m,n)}{g_A(m,n)}, & \text{if } g_A(m,n) > g_F(m,n)\\[6pt] \dfrac{g_A(m,n)}{g_F(m,n)}, & \text{otherwise}\end{cases}$$

$$A^{AF}(m,n) = 1 - \frac{|\alpha_A(m,n) - \alpha_F(m,n)|}{\pi/2} \qquad (3.34)$$
3. Calculate the edge amplitude and phase preservation values:

$$Q_g^{AF}(m,n) = \frac{\Gamma_g}{1 + e^{K_g\left(G^{AF}(n,m) - \sigma_g\right)}} \qquad (3.35)$$

$$Q_\alpha^{AF}(m,n) = \frac{\Gamma_\alpha}{1 + e^{K_\alpha\left(A^{AF}(n,m) - \sigma_\alpha\right)}} \qquad (3.36)$$
$Q_g^{AF}(m,n)$ and $Q_\alpha^{AF}(m,n)$ describe the edge amplitude and phase preservation between images A and F, respectively. $\Gamma_g, K_g, \sigma_g$ and $\Gamma_\alpha, K_\alpha, \sigma_\alpha$ are adjustable parameters.

4. Calculate the edge information preservation value:

$$Q^{AF}(m,n) = Q_g^{AF}(m,n)\,Q_\alpha^{AF}(m,n) \qquad (3.37)$$
5. Finally, obtain the objective edge mutual information measure:

$$Q^{AB/F} = \frac{\sum_{n=1}^{N}\sum_{m=1}^{M}\left[Q^{AF}(n,m)\,w_A(n,m) + Q^{BF}(n,m)\,w_B(n,m)\right]}{\sum_{n=1}^{N}\sum_{m=1}^{M}\left[w_A(n,m) + w_B(n,m)\right]} \qquad (3.38)$$
where $w_A(n,m) = |g_A(n,m)|^L$ and $w_B(n,m) = |g_B(n,m)|^L$. In order to measure the performance of image fusion, following the definition of Petrovic V., this book fixes all the parameters at

$$\Gamma_g = 0.9994,\quad K_g = -15,\quad \sigma_g = 0.5,\quad \Gamma_\alpha = 0.9879,\quad K_\alpha = -22,\quad \sigma_\alpha = 0.8.$$
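As a concrete illustration, the following is a minimal NumPy/SciPy sketch of the edge mutual information index $Q^{AB/F}$ of Eqs. (3.32)–(3.38) with the parameter values given above. The function name, the use of scipy.ndimage.sobel, and the small constants guarding divisions are our own choices, not taken from the book.

```python
import numpy as np
from scipy.ndimage import sobel

def edge_mutual_information(A, B, F, L=1.0,
                            Gg=0.9994, Kg=-15.0, sg=0.5,
                            Ga=0.9879, Ka=-22.0, sa=0.8):
    """Edge information preservation index Q^{AB/F}, Eqs. (3.32)-(3.38)."""
    eps = 1e-12

    def amplitude_and_phase(X):
        sx = sobel(X.astype(float), axis=1)       # horizontal Sobel response s^x
        sy = sobel(X.astype(float), axis=0)       # vertical Sobel response s^y
        g = np.hypot(sx, sy)                      # edge amplitude, Eq. (3.32)
        a = np.arctan(sy / (sx + eps))            # edge phase, Eq. (3.33)
        return g, a

    gA, aA = amplitude_and_phase(A)
    gB, aB = amplitude_and_phase(B)
    gF, aF = amplitude_and_phase(F)

    def preservation(gX, aX):
        # Relative amplitude and phase between input X and fused image F, Eq. (3.34)
        G = np.where(gX > gF, gF / (gX + eps), gX / (gF + eps))
        Aang = 1.0 - np.abs(aX - aF) / (np.pi / 2.0)
        # Sigmoid amplitude and phase preservation, Eqs. (3.35)-(3.36)
        Qg = Gg / (1.0 + np.exp(Kg * (G - sg)))
        Qa = Ga / (1.0 + np.exp(Ka * (Aang - sa)))
        return Qg * Qa                            # edge information preservation, Eq. (3.37)

    QAF, QBF = preservation(gA, aA), preservation(gB, aB)
    wA, wB = np.abs(gA) ** L, np.abs(gB) ** L     # edge-strength weights
    return float((QAF * wA + QBF * wB).sum() / ((wA + wB).sum() + eps))  # Eq. (3.38)
```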
The pixel mutual information index adopts mutual information as the evaluation index to quantitatively evaluate the overall effect of the fused image. Mutual information describes the similarity of the information contained in two images. For the input images A and B and the resulting fused image F, the mutual information of F with A and with B is calculated separately, indicating how much of the input image information is contained in the fused image:

$$\mathrm{MI}_{FA} = \sum_{i=1}^{L}\sum_{j=1}^{L} h_{F,A}(i,j)\,\ln\frac{h_{F,A}(i,j)}{h_F(i)\,h_A(j)} \qquad (3.39)$$

$$\mathrm{MI}_{FB} = \sum_{i=1}^{L}\sum_{j=1}^{L} h_{F,B}(i,j)\,\ln\frac{h_{F,B}(i,j)}{h_F(i)\,h_B(j)} \qquad (3.40)$$

In these formulas, $h_{F,A}(i,j)$ is the joint probability that the pixel gray value in image F is i and the pixel gray value at the same position in image A is j; $h_F(i)$ and $h_A(j)$ are the probabilities that a pixel gray value in image F is i and in image A is j, respectively; and L is the number of gray levels. In general, the larger the MI, the more mutual information the two images share. The variables in Eq. (3.40) are defined in the same way as those in Eq. (3.39). The evaluation index of the fusion result can then be defined as

$$\mathrm{MI}_F^{AB} = \mathrm{MI}_{FA} + \mathrm{MI}_{FB} \qquad (3.41)$$
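For reference, here is a minimal NumPy sketch of the reference-based RMSE of Eq. (3.30) and the pixel mutual information index of Eqs. (3.39)–(3.41). The function names and the assumption of 8-bit (256-level) images are illustrative choices, not the book's code.

```python
import numpy as np

def mutual_information(X, Y, levels=256):
    """Mutual information between two gray-level images, Eqs. (3.39)-(3.40)."""
    # Joint gray-level histogram, normalized to a joint probability h_{X,Y}(i, j)
    joint, _, _ = np.histogram2d(X.ravel(), Y.ravel(),
                                 bins=levels, range=[[0, levels], [0, levels]])
    h_xy = joint / joint.sum()
    h_x = h_xy.sum(axis=1)          # marginal probability of X
    h_y = h_xy.sum(axis=0)          # marginal probability of Y
    nz = h_xy > 0                   # avoid log(0); zero terms contribute nothing
    return float((h_xy[nz] * np.log(h_xy[nz] / np.outer(h_x, h_y)[nz])).sum())

def pixel_mutual_information(A, B, F, levels=256):
    """MI_F^{AB} = MI_FA + MI_FB, Eq. (3.41)."""
    return mutual_information(F, A, levels) + mutual_information(F, B, levels)

def rmse(F_ref, F):
    """Root mean square error against a reference fusion image, Eq. (3.30)."""
    d = F.astype(float) - F_ref.astype(float)
    return float(np.sqrt(np.mean(d ** 2)))
```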
The index in Eq. (3.41) reflects how much information from the input images A and B is contained in the fused image F. Figure 3.2 shows the structure of the image fusion algorithm based on the texture and gradient multi-resolution pyramid (taking a single-layer decomposition as an example). The specific fusion steps are as follows:

1. Register the original images x1(n) and x2(n) spatially (taking two input images as an example).
Fig. 3.2 Schematic diagram of image fusion based on the joint texture and gradient feature multi-resolution method. Images A and B are each decomposed by the multi-scale transformation into directional sub-band images $L_{k,1}, \ldots, L_{k,29}$; significance and similarity measures computed on the sub-bands drive the fusion decision and fusion operation, and the fused sub-bands are passed through the inverse transform to produce the fusion image
2. Construct the filter vectors $U_i$ (i = 1, ..., 29) from the given texture filters $T_i$ (i = 1, ..., 25) and the directional gradient filters $D_i$, and obtain $\dot{\omega}_{\text{new}}$.
3. According to the reconstruction condition, find the undetermined coefficient vector v using the singular value decomposition method.
4. Perform the multi-resolution pyramid decomposition based on joint texture and gradient features on the original images. After each image is decomposed into its multi-resolution form based on texture and edge features, the fusion method adopts a strategy based on a similarity measure and a saliency measure. Denote by $\vec{L}_{ki}$ the sub-band image of the k-th layer in the i-th direction of the multi-resolution representation. The similarity and significance measures are calculated as follows: a 3 × 3 window is opened around each pixel, with the window template coefficients taken as
$$\alpha = \frac{1}{16}\begin{bmatrix}1 & 1 & 1\\ 1 & 8 & 1\\ 1 & 1 & 1\end{bmatrix} \qquad (3.42)$$

The significance measure is

$$S(m,n,k,i) = \sum_{s=-1}^{1}\sum_{t=-1}^{1}\alpha(s,t)\,\vec{L}_{ki}(m+s,\,n+t,\,k,\,i)^2 \qquad (3.43)$$

The similarity measure is

$$M_{AB}(m,n,k,i) = \frac{2\sum_{s=-1}^{1}\sum_{t=-1}^{1}\alpha(s,t)\,\vec{L}^A_{ki}(m+s,n+t,k,i)\,\vec{L}^B_{ki}(m+s,n+t,k,i)}{S_A(m,n,k,i) + S_B(m,n,k,i)} \qquad (3.44)$$
Set a threshold β (0 ≤ β ≤ 1). If the similarity measure $M_{AB} \geq \beta$, then

$$\omega_A = \frac{1}{2} - \frac{1}{2}\left(\frac{1 - M_{AB}}{1 - \beta}\right), \qquad \omega_B = 1 - \omega_A$$

If the similarity measure $M_{AB} < \beta$, then

$$\omega_A = \begin{cases}1, & \text{if } S_A > S_B\\ 0, & \text{otherwise}\end{cases}, \qquad \omega_B = 1 - \omega_A \qquad (3.45)$$

Finally, the fusion strategy is

$$\vec{L}^F_{ki}(m,n,k,i) = \omega_A(m,n,k,i)\,\vec{L}^A_{ki}(m,n,k,i) + \omega_B(m,n,k,i)\,\vec{L}^B_{ki}(m,n,k,i) \qquad (3.46)$$

A code sketch of this weighting rule is given after the remaining steps below.
5. In order to avoid the situation where a point and its neighborhood come from different input images, we verify the consistency of each fused directional sub-band image $\vec{L}^F_{ki}(m,n,k,i)$. Consistency here means that if a point comes from image A while most of its neighborhood comes from image B, the point is changed to the corresponding value from image B.
6. The fused image is obtained from the fused multi-resolution representation $\vec{L}^F_{ki}(m,n,k,i)$ by the multi-resolution inverse transformation based on joint texture and gradient features.
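As promised above, the following is a minimal NumPy/SciPy sketch of the similarity- and significance-based weighting of Eqs. (3.42)–(3.46) for one pair of directional sub-band images. The threshold value β = 0.75, the function name, and the boundary handling are illustrative assumptions, and the measures are implemented as reconstructed in this section.

```python
import numpy as np
from scipy.ndimage import convolve

# Window template of Eq. (3.42)
ALPHA = np.array([[1.0, 1.0, 1.0],
                  [1.0, 8.0, 1.0],
                  [1.0, 1.0, 1.0]]) / 16.0

def fuse_subband(LA, LB, beta=0.75):
    """Fuse one directional sub-band pair with the rule of Eqs. (3.43)-(3.46)."""
    eps = 1e-12
    # Significance measures, Eq. (3.43): locally weighted energy of each sub-band
    SA = convolve(LA * LA, ALPHA, mode="nearest")
    SB = convolve(LB * LB, ALPHA, mode="nearest")
    # Similarity measure, Eq. (3.44): normalized local correlation of the two sub-bands
    MAB = 2.0 * convolve(LA * LB, ALPHA, mode="nearest") / (SA + SB + eps)

    # Weighted-average weights used when the sub-bands are similar (M_AB >= beta)
    wA_avg = 0.5 - 0.5 * (1.0 - MAB) / (1.0 - beta)
    # Selection weights used when they differ (M_AB < beta): keep the more significant one
    wA_sel = (SA > SB).astype(float)

    wA = np.where(MAB >= beta, wA_avg, wA_sel)
    wB = 1.0 - wA
    return wA * LA + wB * LB                      # fusion strategy, Eq. (3.46)
```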
This section presents the results of fusing multi-focus images and infrared and visible images with the multi-resolution image fusion method based on joint texture and gradient features, and compares them with representative fusion methods: the Laplacian pyramid (LP) fusion method, the filter-subtract-decimation (FSD) Laplacian pyramid fusion method, and the gradient-feature-based multi-resolution pyramid (GP) fusion method. The filters used in the experiment are Laws' texture feature extraction filters T1, T5, T8, T11 and the Gaussian directional gradient filters D1, D2, D3, D4. Figure 3.3 shows the frequency responses of the four texture feature extraction filters and the four Gaussian directional gradient filters; a 3-layer decomposition is used.

Fig. 3.3 (a) Texture feature extraction filter frequency response; (b) Gaussian directional gradient filter frequency response

Figure 3.4a, b each show an image containing two clocks. The two clocks are located at different distances, and in each of the two images a different one of them is in focus.

Fig. 3.4 Original images and the fusion results. (a) Multi-focus original image 1; (b) multi-focus original image 2; (c) fusion results based on the LP method; (d) fusion results based on the FSD method; (e) fusion results based on the GP method; (f) fusion results based on the TGP method
Table 3.2 Image fusion results of index evaluation

Image fusion method          RMSE      MI
Method based on the LP       9.8688    2.6611
Method based on the GP       10.6518   2.3352
Method based on the FSD      10.6619   2.3568
Method based on the TGP      9.6868    2.6613

Table 3.3 Image fusion results of index evaluation

Image fusion method          EMI       PMI
Based on the LP method       0.4344    0.6456
Based on the GP method       0.4189    0.6411
Based on the FSD method      0.4179    0.6427
Based on the TGP method      0.4930    0.6599
Figure 3.4c shows the fusion results obtained with the fusion method based on the Laplacian pyramid transform (LP). Its multi-resolution decomposition adopts the Laplacian pyramid transform, while the fusion rules are the same as those used in this book, so that the comparison indicates which multi-resolution method is more suitable for an image fusion system. Figure 3.4d shows the results of the filter-subtract-decimation Laplacian pyramid fusion method, with the same fusion rules; it is abbreviated as FSD in Table 3.2. Figure 3.4e shows the fusion results obtained with the directional gradient pyramid transform, again with the same fusion rules; it is abbreviated as GP in Table 3.2. Figure 3.4f shows the result of the multi-scale transform image fusion method based on joint texture and gradient features, abbreviated as TGP in Table 3.2, which gives a visually better fused result than the rest.

From the fusion results, we can clearly see that both the fusion method proposed in this chapter and the LP-based fusion method are better than the FSD-based and GP-based fusion methods. Especially in the upper left part of the back clock, both the FSD-based method and the GP-based method show significant distortion in the fused image, whereas the proposed fusion method achieves a better fusion effect. The evaluation results in Table 3.3 are consistent with this intuitive qualitative assessment.

Figure 3.5a, b show the infrared and visible images of the same scene, respectively. Figure 3.5c–e show the fusion results based on the LP method, the FSD method, and the GP method, respectively, and Fig. 3.5f shows the result of the TGP fusion method. Table 3.3 shows the objective evaluation of the fusion results. Fusion of infrared and visible images makes the specific position of the target clearly observable, which is convenient for target detection by computer or by a human operator.
Fig. 3.5 Original images and the fusion results. (a) Infrared images; (b) visible light images; (c) fusion results based on the LP method; (d) fusion results based on the FSD method; (e) fusion results based on the GP method; (f) fusion results based on the TGP method
3.4 Fusion Algorithm Based on Fuzzy Regional Characteristics
We know that in image fusion methods based on multi-resolution decomposition, the fusion rule has a direct impact on the speed and quality of the fused image, so the fusion rule is a very important part of the image fusion method. In 1984, Burt [6] proposed a fusion rule based on pixel selection: after decomposing the original images into images at different resolutions, the gray value of the pixel with the largest absolute value is selected as the fused pixel gray value. This is based on the fact that, in the different resolution images, pixels with larger coefficient magnitudes carry more information, such as edges, lines, and region boundaries. While improving the contrast pyramid, Zhou [7] also improved the pixel-selection fusion rule and improved the image fusion quality. Petrovic and Xydeas [4] proposed a pixel-selection fusion rule that considers the correlation of each image within and between decomposition layers: pixels are not selected independently, as in earlier fusion rules, but by taking into account the interrelationship between the images within a layer and across layers. When using the wavelet transform for image fusion, Pu Tian [27] adopted a contrast-based pixel-level fusion rule motivated by the sensitivity of the human visual system to local contrast. However, pixel-based fusion selection treats a single pixel as the fusion object and does not consider the correlation between adjacent pixels of the image, so the fusion result is not ideal.

Considering the correlation between adjacent pixels, Burt and Kolczynski [8] proposed in 1993 a weighted average fusion rule based on regional characteristics, linking the fusion selection of a pixel's gray value with the window region in which it is located. Different fusion operations are performed according to the matching degree of the window region: when the matching degree is low, the fused pixel value is selected directly according to the energy value; when the matching degree is high, a weighted average rule is used. In the fusion rule proposed by Li et al. [9], the i-th largest gray value in the selected window region is used as the fused pixel value; usually the maximum gray value in the window region is selected, but to account for the influence of noise, i = 2 or i = 3 can also be chosen. While choosing the pixel gray value, this fusion rule also considers the correlation of pixels within the window region. Koren et al. [10] proposed a fusion rule based on local directional energy, obtained from quadrature pairs of directional filters; this rule is based on the characteristic that human vision is sensitive to the energy in the dominant local direction.
Chibani and Houacine [11] determine the selection of fused pixels in their fusion rule by counting the pixels with larger absolute values in the corresponding window regions of the input original images. By taking the correlation of adjacent pixels into account, window-region-based fusion rules reduce erroneous selection of fused pixels and improve the fusion effect. Zhang et al. [12] proposed a region-based fusion rule, in which each pixel in the image is regarded as part of a region or an edge, and image information such as regions and boundaries is used to guide fusion and selection. The fusion effect obtained with this rule is better, but the rule is more complicated than others and is not easy to implement for complex images. Other fusion rules include statistical and estimation methods, which consider image fusion from the perspective of signal and noise and are well suited to fusing noisy images.

From past research on image fusion rules, we can see that fusion rules have evolved from fusion based on single-pixel selection to fusion strategies based on window measurements, and then to region-based fusion methods. This chapter analyzes the region-based fusion rules proposed by Zhang and Blum [12] and by Piella [13] and discusses some of their problems; an image fusion method based on fuzzy region features is then proposed. This method guarantees good consistency between the important regions and the background region, while the sub-important regions retain significant high-frequency characteristics, avoiding the contradiction caused by simply pursuing consistency in all areas of the image. The image fusion method based on fuzzy region features is built on multi-resolution analysis: K-means clustering is performed on the low-frequency components of each layer of the image, decomposing the low-frequency image into important regions, sub-important regions, and background regions; the attributes of the different regions are then fuzzified, and the fusion strategy for each local region is determined according to its fuzzy membership degree. Finally, the multi-resolution representation of the fused image is obtained, and the multi-resolution inverse transform is performed to obtain the image fusion result. Experiments show that the presented image fusion method based on fuzzy region features achieves good fusion results.
3.4.1 Area-Based Image Fusion Algorithm
The region-based multi-resolution image fusion methods of Zhang [12] and Piella [13] are basically similar. They first apply a multi-resolution decomposition to the images and then segment the corresponding low-frequency components of the decomposition; the segmentation results of the sensor images are merged to obtain a single segmented image. Based on this segmented image, the fusion strategy for the multi-resolution high-frequency components is determined, and the lowest-frequency component is selected according to the selection rule of the high-frequency components. The multi-resolution representation of the fused image is obtained, and the corresponding multi-resolution inverse transform (reconstruction) is finally performed to obtain the fusion result.

Although their multi-resolution transform methods and image segmentation methods differ, the design of the fusion criteria and fusion measures is basically similar, and so is the fusion structure. Therefore, the fused images obtained by these methods share a problem of regional feature consistency: the local area of the fused image cannot completely reflect the distribution characteristics of the pixels inside the corresponding area of the original image. Since the regional significance measure is obtained separately in each frequency band of the multi-scale expansion, the selections in different frequency bands may be inconsistent; that is, not all frequency bands of a region are selected from the same image, and the region therefore appears inconsistent. As shown in Fig. 3.6, this decreases both the contrast and the regional consistency of the target area, thus affecting the overall characteristics of the target area.

Fig. 3.6 The result of the two region-based fusion results. (a) Original target; (b) Zhang's method; (c) Piella's method

The purpose of image fusion is to obtain a single image from multiple images; this image should reflect all the important information of the original images. If a human were to select the important features of an image to be fused, the first consideration would be the important image regions, then the important edge features, and finally the fusion of individual pixels. The purpose of doing so is to preserve the consistency of important regions, that is, to obtain all the information of a region from the corresponding part of one image. For less important regions, the requirement of regional consistency is not as strong, and the high-frequency information of pixels that reflect "edge" features can be chosen as the fusion result. This shows that the consistency of a region is more important than the significance of a single pixel. Therefore, the image fusion process should also follow a "region first, pixel second" principle; in other words, the important regions must be merged first, and the pixels of the unimportant regions are merged afterwards. However, because the importance of a region is a vague and uncertain concept, the proposed region-based multi-resolution fusion method is carried out in a fuzzy space.
3.4.2 Image Fusion Method Based on Fuzzy Region Features

3.4.2.1 K-Means Algorithm Image Segmentation
During the application and study of images, people are often interested in certain parts of an image. These parts generally correspond to specific areas of the image that have unique properties, and thus these areas need to be separated and extracted. Image segmentation refers to the technique of dividing an image into regions with specific features and extracting the objects of interest, and it is an essential part of image understanding and analysis. A region in an image is a connected set of pixels with consistent "meaningful" properties. The "meaningful" attributes depend on the specific conditions of the image to be analyzed, such as the color, grayscale, statistical properties, or texture properties of a neighborhood. "Consistency" requires that each region has the same or similar feature attributes. Image segmentation methods generally include threshold segmentation, cluster segmentation, statistical segmentation, region growing, and split-and-merge methods. The K-means segmentation algorithm belongs to the cluster segmentation methods.

The K-means algorithm divides n vectors $x_j$ into c classes $G_i$ and finds the cluster center of each class so that an objective function based on a dissimilarity (distance) index is minimized. When the metric between a vector $x_k$ of class $G_i$ and the corresponding cluster center $c_i$ is the Euclidean distance, the objective function can be defined as

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c}\left(\sum_{k,\,x_k \in G_i} \|x_k - c_i\|^2\right) \qquad (3.47)$$

Here $J_i = \sum_{k,\,x_k \in G_i}\|x_k - c_i\|^2$ is the objective function within the class $G_i$. The value of $J_i$ depends on the geometry of $G_i$ and the position of $c_i$; obviously, the smaller the value of J, the better the clustering effect. The basic idea of the K-means algorithm is as follows:

1. First randomly select c vectors as the centers of the classes.
2. Let U be a c × n two-dimensional membership matrix. If the j-th vector $x_j$ belongs to class i, the element $u_{ij}$ of U is 1; otherwise, the element is 0; that is,
$$u_{ij} = \begin{cases}1, & \text{if } \left\|x_j - c_i\right\|^2 \leq \left\|x_j - c_k\right\|^2 \text{ for each } k \neq i\\[4pt] 0, & \text{otherwise}\end{cases} \qquad (3.48)$$
3. Calculate the value of the objective function in Eq. (3.47) according to $u_{ij}$. If it is lower than a given minimum threshold, or if the difference between two consecutive values is less than a parameter threshold, the algorithm stops.
4. Update each cluster center according to $u_{ij}$: $c_i = \frac{1}{|G_i|}\sum_{k,\,x_k \in G_i} x_k$, where $|G_i| = \sum_{j=1}^{n} u_{ij}$ is the number of elements in class $G_i$. Then return to step 2.
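As an illustration, here is a minimal NumPy sketch of the K-means steps above applied to the gray levels of an image (e.g., a low-frequency component), returning a label image whose classes are ordered by cluster center. The function name, stopping tolerance, and relabeling are our own choices.

```python
import numpy as np

def kmeans_segment(image, c=3, max_iter=100, tol=1e-4, seed=0):
    """Segment a gray-level image into c classes following the K-means steps above.

    Returns (labels, centers): a label image with classes 0..c-1 ordered by
    increasing cluster center, and the sorted cluster centers.
    """
    rng = np.random.default_rng(seed)
    x = image.astype(float).ravel()                  # each pixel gray value is a 1-D sample
    centers = rng.choice(x, size=c, replace=False)   # step 1: random initial centers
    prev_J = np.inf
    for _ in range(max_iter):
        # step 2: hard membership -- assign each sample to its nearest center (Eq. 3.48)
        d2 = (x[:, None] - centers[None, :]) ** 2
        labels = np.argmin(d2, axis=1)
        # step 3: objective function J (Eq. 3.47); stop when it no longer decreases
        J = d2[np.arange(x.size), labels].sum()
        if abs(prev_J - J) < tol * max(J, 1.0):
            break
        prev_J = J
        # step 4: update each cluster center as the mean of its members
        for i in range(c):
            members = x[labels == i]
            if members.size:
                centers[i] = members.mean()
    order = np.argsort(centers)                      # relabel so centers are increasing
    relabel = np.empty(c, dtype=int)
    relabel[order] = np.arange(c)
    return relabel[labels].reshape(image.shape), centers[order]
```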
3.4.2.2 Fuzzy Theory and Regional Feature Fuzzification
Randomness is only one kind of uncertainty in the real world. Besides it, there is another, more general kind of uncertainty: ambiguity, or fuzziness. In order to describe and handle this kind of uncertainty, Zadeh et al. [14] studied the representation and processing of vagueness from the perspective of set theory in 1965 and proposed concepts such as fuzzy sets, membership functions, linguistic variables, linguistic truth, and fuzzy inference, creating the new mathematical branch of fuzzy mathematics and providing a new way to quantitatively describe and process fuzziness. Fuzzy methods show great advantages in classifying pixels and extracting features. The so-called ambiguity refers to the lack of a sharp distinction between objective things in their forms and categories; its root cause is the existence of a series of transitional states between similar things, which blend into one another so that no clear dividing line exists between them. Ambiguity is a characteristic of things in the objective world and is essentially different from randomness. For randomness, the meaning of the event itself is clear: it may or may not happen under certain conditions and cannot be predicted in advance, so a number on [0, 1] indicates the possibility of the event occurring. Ambiguity, by contrast, lies in the concept itself: whether a specific object satisfies a fuzzy concept cannot be clearly determined.

Definition. Let U be the domain of discourse, and let $\mu_A$ be a function that maps any element u of U to a value on [0, 1], i.e.,

$$\mu_A : U \to [0, 1], \qquad u \mapsto \mu_A(u) \qquad (3.49)$$
Then $\mu_A$ is called a membership function defined on U. The set A formed by $\mu_A(u)$ (u ∈ U) is called a fuzzy set on U, and $\mu_A(u)$ is called the membership degree of u in A. From this definition, it can be seen that the fuzzy set A is completely characterized by its membership function: $\mu_A$ maps each element u in U to a value $\mu_A(u)$ on [0, 1], indicating the degree to which the element belongs to A, with a larger value meaning a higher degree of membership.
When the value of $\mu_A(u)$ is only 0 or 1, the fuzzy set A degenerates into an ordinary set, and the membership function degenerates into a characteristic function.

When multiple sensors image a scene, the real scene area of each sensor image can be roughly divided into three types, distinguished by a sensor that is sensitive to the target (such as an infrared imaging sensor): the important target areas, the sub-important areas rich in edge or texture information, and the background areas containing background information. In terms of the fuzzy theory above, the complete set of these three region types defines a regional fuzzy set A on the real scene U. Because the sensor used for region division is sensitive to the target, we call it the target sensor, and the other imaging sensors are called background sensors.

First, we must determine the fusion rule for each element of the regional fuzzy set A. If a region is classified as an important area, this area of the target sensor image is more important than the corresponding area of the background sensor image, and the fusion method is to use all the multi-resolution coefficients of this part of the target sensor image as the corresponding region of the fused image. If it is a sub-important area, the multi-sensor images all show significant features in this area; only the information with edge features needs to be fused rather than all the information, so the multi-resolution coefficients with relatively higher significance can be selected as the fusion result, and breaking the consistency of the region will not degrade the final fusion result. If the area belongs to the background area, the background sensor image is more important in this area than the target sensor image; similar to the important area, the fusion method uses all the multi-resolution coefficients of this part of the background sensor image as the corresponding region of the fused image.

Figure 3.7 shows the target sensor image and its segmentation result. The human figures, roads, and lawns in the image are the target importance area, the sub-important area, and the background area, respectively.

Fig. 3.7 Zone division of the target sensor image. (a) Original infrared image. (b) Segmented image
Fig. 3.8 Schematic diagram of the membership functions of the area attributes: $\mu_{A_1}(u)$, $\mu_{A_2}(u)$, and $\mu_{A_3}(u)$ are plotted against the regional characteristic u for the three area attributes A1, A2, and A3
Let the elements of the regional fuzzy set A be A1, A2, and A3, representing the target importance area, the sub-important area, and the background area, respectively, and let the regional feature attribute be u (u ∈ U); then $\mu_A(u)$ is the membership degree of u in the regional fuzzy set A. When the regional attributes have been determined, the regional fusion strategy should be

$$F = \begin{cases} \text{region of image A}, & \mu_{A_1}(u) = 1 \\ f(A, B), & \mu_{A_2}(u) = 1 \\ \text{region of image B}, & \mu_{A_3}(u) = 1 \end{cases} \qquad (3.50)$$
Since the importance of an image region is relative, it cannot be judged as important or unimportant according to a single feature of the image. The importance of a region is a fuzzy concept, so the importance attribute of the image must be fuzzified and the fusion process carried out in the fuzzy space. The commonly used normal-distribution membership function is adopted here, as shown in Fig. 3.8. This function is defined as

$$\mu_{A_j}(u) = \exp\left[-\left(\frac{ME(u) - E(A_j)}{L_{\max} - L_{\min}}\right)^2\right] \qquad (3.51)$$

where $\mu_{A_j}(u)$ is the membership degree of region u in $A_j$; $E(A_j)$ is the ideal cluster center of $A_j$, with $E(A_1) = L_{\min}$, $E(A_3) = L_{\max}$, and $E(A_2) = (L_{\max} + L_{\min})/2$, where $L_{\max}$ and $L_{\min}$ are the maximum and minimum gray levels of the target sensor image (the ideal cluster centers of the important and background areas); and ME(u) is the actual cluster center of region u.
We take the set of fuzzy region membership degrees obtained from a given sensor image as the fuzzy region features of that image. In Fig. 3.8, A1, A2, and A3 represent the three elements of the image's fuzzy region set, corresponding to three ideal fusion results. When ME(u) corresponds to A1, the region u is background, and the corresponding region of the background sensor image is used directly as the fusion result, denoted F1. When ME(u) corresponds to A2, the region u is a sub-important region, and a pixel-based image fusion method [15] is used to obtain the fusion result F2. When ME(u) corresponds to A3, the region u is an important region, and the corresponding region of the target sensor image is used directly as the fusion result F3. Finally, according to the characteristics of each region, the membership degrees $\mu_{A_j}(u)$ (j = 1, 2, 3) are defined over the membership space of all image pixels, and the fusion result of the image is determined from these membership degrees:

$$F = \frac{\sum_{i=1}^{3}\mu_{A_i}(u)\,F_i}{\sum_{i=1}^{3}\mu_{A_i}(u)} \qquad (3.52)$$
In this formula, F is the multi-resolution representation of the fusion result; the final fused image is obtained by the corresponding multi-resolution inverse transform.
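The following NumPy sketch illustrates Eqs. (3.51)–(3.52) for one decomposition level, assuming a K-means label image from the segmentation step above and three candidate fusion results F1, F2, F3 for the background, sub-important, and important regions. Taking the mean gray level of each labeled region of the target sensor image as the actual cluster center ME(u) is our interpretation, and all names are illustrative.

```python
import numpy as np

def fuzzy_region_fusion(labels, target_img, F1, F2, F3):
    """Blend three candidate fusion results with the fuzzy memberships of Eqs. (3.51)-(3.52).

    labels     : K-means label image (0 = background A1, 1 = sub-important A2, 2 = important A3)
    target_img : target-sensor image used to derive gray-level cluster centers
    F1, F2, F3 : candidate fusion coefficients for background / sub-important / important regions
    """
    Lmin, Lmax = float(target_img.min()), float(target_img.max())
    ideal = np.array([Lmin, 0.5 * (Lmax + Lmin), Lmax])   # E(A1), E(A2), E(A3)

    # Actual cluster center ME(u) of the region each pixel belongs to
    ME = np.zeros_like(target_img, dtype=float)
    for j in range(3):
        mask = labels == j
        if mask.any():
            ME[mask] = target_img[mask].mean()

    # Membership degrees, Eq. (3.51)
    mu = [np.exp(-((ME - ideal[j]) / (Lmax - Lmin + 1e-12)) ** 2) for j in range(3)]

    # Membership-weighted combination of the three candidate results, Eq. (3.52)
    return (mu[0] * F1 + mu[1] * F2 + mu[2] * F3) / (mu[0] + mu[1] + mu[2])
```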
3.4.3 Image Fusion Method Based on Fuzzy Region Features
Multi-resolution image fusion avoids artificial patchwork traces in the fusion result and can exhibit the characteristics of the original images while achieving a natural transition between features. For this reason, this book adopts the multi-resolution analysis method based on the optimal filter bank wavelet frame, or the multi-resolution analysis method based on joint texture and edge-gradient features described in the previous section, to decompose and reconstruct the images; the design of the optimal filter bank and the construction of the wavelet frame have been elaborated earlier in this chapter. The fusion rules follow the method based on fuzzy region features. Figure 3.9 shows the structure of multi-resolution image fusion based on fuzzy region features (FRF_MIF). The basic idea is to first register the original images, decompose them with the optimal filter bank, compute the image segmentation under the multi-resolution transformation, apply the fuzzy region feature fusion rule to the decomposed coefficients to obtain new fused coefficients, and finally obtain the fused image through the corresponding inverse transformation.
Fig. 3.9 Image fusion algorithm structure based on fuzzy region features. The target and background sensor images are each passed through a multi-resolution transformation into low- and high-frequency components; K-means segmentation of the target image's low-frequency component, the regional fuzzy membership function, and the measurement indices drive the fusion decision and fusion operation, and the fused coefficients are passed through the multi-resolution inverse transform to produce the fusion image
The fusion algorithm includes the following steps:

1. Register the input original images spatially (taking two input images as an example).
2. Decompose each input image with the optimal filter bank wavelet frame to obtain its multi-resolution image sequence.
3. Cluster the low-frequency component of the target sensor image into three categories, representing the important target area, the sub-important area, and the background area.
4. Using the ideal cluster centers, compute the fuzzy membership function for the segmented regions to obtain the fuzzy region features. The background sensor image may or may not participate in this process.
5. From the fuzzy region features and the measurement indices computed on the high-frequency components of the multi-sensor images, obtain the fusion decision.
6. On the basis of the measurement indices, establish the multi-resolution representation of the fused image according to the fusion rule based on fuzzy region features.
7. Perform consistency verification on the multi-resolution representation of the fused image.
8. Obtain the fused image by the inverse wavelet frame transform of the optimal filter bank.

Fusion experiments were conducted on millimeter-wave and visible light images, infrared and visible light images, and infrared and SAR images. The performance of the fusion results is evaluated with two objective indicators: pixel mutual information and edge mutual information. Figure 3.10a, b are a millimeter-wave image and a visible light image for detecting concealed weapons. From (a) we can see that a firearm is imaged; in this application, guns are the important target in the detection of dangerous goods, while the rest of the scene consists of sub-important and background areas. The experiment uses the millimeter-wave image as the target sensor image. Through image fusion, it can be clearly seen that the gun is hidden on the third person from the left. Four methods are used to fuse the images: the region-based fusion method of Zhang [12], the region-based fusion method of Piella [13], the fuzzy-region-feature method introduced in this chapter, and the fuzzy-region fusion method combined with the multi-scale transform based on joint texture and gradient features described in the previous section. The multi-scale decomposition uses three layers in all cases. The fusion results are shown in Fig. 3.10c–f. It can be clearly seen that the image fusion method based on fuzzy region features has better regional consistency (integrity) than the other methods. The evaluation indicators are shown in Table 3.4; both pixel mutual information and edge mutual information improve significantly.
Fig. 3.10 Input original images and fused images. (a) Millimeter-wave image; (b) visible light image; (c) fusion result of Zhang's region-based method; (d) fusion result of Piella's region-based method [13]; (e) fusion result based on fuzzy region features; (f) fusion result based on fuzzy region features combined with texture and gradient features
Table 3.4 Index evaluation of image fusion results

Image fusion method            EMI       PMI
Using the method of Zhang Z    0.5178    1.3912
Using the method of Piella G   0.6021    1.5668
Based on FRF method (a)        0.6169    1.6503
Based on FRF method (b)        0.6134    1.7702

(a) The fusion method based on fuzzy region features proposed in this book; (b) the multi-scale fuzzy region fusion method based on joint texture and gradient features
In Zhang's [12] region-based image fusion method in the wavelet domain, the image segmentation is formed by first detecting edges and then linking them, and the fusion rule combines the region size with an activity-level measure under certain priority conditions. The segmentation algorithm is relatively complex to implement, but the final fusion result is determined mainly by the corresponding region fusion rule; therefore, a relatively simple segmentation algorithm can be used to generate the image regions, and a region fusion rule based on a regional significance measure gives a comparable fusion result. Here, a K-means algorithm is used instead of the edge-linking segmentation method; it is simple to implement and still reflects the performance of region fusion in the wavelet domain to some extent. The specific algorithm is as follows:

1. Apply a wavelet transform with a certain number of layers to the two well-registered images.
2. At each decomposition layer, segment the low-frequency component of that layer into regions using the K-means algorithm, obtaining a segmentation result for each layer.
3. Overlap the two segmented images to obtain a joint sub-region image; taking into account the saliency measure of each region in the wavelet domain of the two original images, a decision map is obtained.
4. Fuse according to the decision map to obtain the fusion result in the wavelet domain.
5. Perform consistency verification on the multi-resolution representation of the fused image.
6. Apply the inverse wavelet transform to the wavelet-domain fusion result to obtain the final fusion result.

The region-based image fusion method of Piella [13] is performed in the Laplacian pyramid decomposition space. The segmentation of each layer adopts a multi-resolution "inheritance" segmentation strategy, and the fusion rule is similar to that of Zhang [12]. In order to obtain a better fusion effect, this algorithm also requires consistency verification.
Fig. 3.11 Input original images and fused images. (a) Infrared image; (b) SAR image; (c) fusion result of Zhang's region-based method; (d) fusion result of Piella's region-based method; (e) fusion result based on fuzzy region features; (f) fusion result based on fuzzy region features combined with texture and gradient features
It can be seen from Fig. 3.11a that the road is very clear, and this area can be regarded as an important target area; Fig. 3.11b shows that the background information is very rich. The experiments use the infrared image as the target sensor image. Figure 3.11c–f show the fusion results of Zhang's region-based method [12], Piella's region-based method [13], the fusion method based on fuzzy region features, and the fuzzy-region fusion method combined with the multi-scale transform based on texture and gradient features described in the previous section, respectively. Three decomposition layers are used.
Table 3.5 Image fusion result indicator evaluation

Image fusion method            EMI       PMI
Using the method of Zhang Z    0.5207    1.4870
Using the method of Piella G   0.5178    1.5969
Based on FRF method (a)        0.5692    1.6400
Based on FRF method (b)        0.5835    1.6515

(a) The fusion method based on fuzzy region features proposed in this book; (b) the multi-scale fuzzy region fusion method based on joint texture and gradient features
Compared to other methods, the image fusion method based on fuzzy region features has relatively complete target information and background information. In addition, according to the evaluation index, the fusion result still retains more edge information, as shown in Table 3.5. It can also be seen from the above that when the texture information of the original image is relatively rich, a multi-scale transformation method based on texture and gradient features can obtain better fusion effects.
3.5 Multi-Source Face Feature Fusion Recognition Algorithm Based on Genetic Algorithm
Feature-level fusion recognition refers to synthesizing and processing the information obtained after preprocessing and feature extraction and then performing classification and identification. Commonly used feature extraction methods include principal component analysis (PCA) [16], linear discriminant analysis (LDA) [17], and independent component analysis (ICA) [18]. However, when these methods are applied to infrared faces, they do not take full advantage of the details provided by the infrared face and do not achieve satisfactory results. At present, the Gabor wavelet is widely used in feature detection because of its excellent time-frequency localization and good directional selectivity [19]. However, the maximum bandwidth of the Gabor wavelet is limited to about one octave, so it cannot simultaneously provide the broadest spectral coverage and optimal spatial localization. The Log-Gabor function is a good alternative to the Gabor function [20]: the bandwidth of the Log-Gabor wavelet can be constructed arbitrarily, which overcomes the general Gabor wavelet's over-representation of low-frequency components and insufficient representation of high-frequency components.

Compared with fusion algorithms at other levels, the study of feature-level fusion algorithms has not received due attention in recent years [21]. However, feature-level fusion is very important in information fusion processing. First, it extracts more effective feature information from the original data space and reduces the spatial dimension. Second, it eliminates redundant information between the feature representation vectors obtained from each data source, thereby facilitating subsequent decision-making. In short, feature-level fusion can produce an effective, low-dimensional feature representation vector that is conducive to the final decision.
The existing feature-level fusion algorithms can be divided into two categories: feature selection and feature combination. In feature selection, the feature representation vectors are first brought together, and a suitable method is then used to generate a new feature representation vector in which the element at each position is selected from the elements at the same position of the original vectors. The fusion method based on dynamic programming proposed by Zhang [22] and the fusion method based on supervised neural networks proposed by Battiti [23] fall into this category. In feature combination, all feature representation vectors are directly combined into a new vector; the most typical feature combination method is the serial fusion strategy, which concatenates two or more feature representation vectors into one large vector [21].
3.5.1 Feature-Level Fusion Algorithm
1. Serial Strategy [6, 21]

Assume that A and B are two feature spaces defined on the pattern sample space Ω. For any sample Γ ∈ Ω, the corresponding feature representation vectors are α ∈ A and β ∈ B. The serial fusion strategy concatenates the two feature representation vectors into a larger vector γ:

$$\gamma = \begin{pmatrix}\alpha\\ \beta\end{pmatrix} \qquad (3.53)$$
Obviously, if α is n-dimensional and β is m-dimensional, the synthesized vector γ is (n + m)-dimensional, and all serially synthesized vectors form an (n + m)-dimensional feature space.

2. Parallel Strategy [21, 24]

Assume again that A and B are two feature spaces defined on the pattern sample space Ω, and that for any sample Γ ∈ Ω the corresponding feature representation vectors are α ∈ A and β ∈ B. The parallel fusion strategy combines the two feature vectors into a complex vector γ:

$$\gamma = \alpha + i\beta \qquad (3.54)$$
where i is the imaginary unit. If the dimensions of α and β differ, the lower-dimensional vector is zero-padded to make them equal. For example, for α = (a1, a2, a3)^T and β = (b1, b2)^T, β is first converted to (b1, b2, 0)^T and the combined vector is γ = (a1 + ib1, a2 + ib2, a3 + i0)^T. Define a parallel fusion feature space on Ω as C = {α + iβ | α ∈ A, β ∈ B}. This is an n-dimensional complex vector space, where n = max{dim A, dim B}. In this space, the inner product can be defined as
$$(X, Y) = X^H Y \qquad (3.55)$$
where X, Y ∈ C and the superscript H denotes the conjugate transpose. A complex vector space equipped with the inner product above is called a unitary space. In this space, the following norm can be introduced:

$$\|Z\| = \sqrt{Z^H Z} = \sqrt{\sum_{j=1}^{n}\left(a_j^2 + b_j^2\right)} \qquad (3.56)$$
where Z = (a1 + ib1, ..., an + ibn)^T. Correspondingly, the distance (unitary distance) between two complex vectors Z1 and Z2 can be defined as

$$\|Z_1 - Z_2\| = \sqrt{(Z_1 - Z_2)^H (Z_1 - Z_2)} \qquad (3.57)$$
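A minimal NumPy sketch of the serial and parallel synthesis of Eqs. (3.53), (3.54), and (3.57) is given below; the function names are our own.

```python
import numpy as np

def serial_fuse(alpha, beta):
    """Serial strategy, Eq. (3.53): concatenate the two feature vectors."""
    return np.concatenate([alpha, beta])

def parallel_fuse(alpha, beta):
    """Parallel strategy, Eq. (3.54): complex vector, zero-padding the shorter input."""
    n = max(alpha.size, beta.size)
    a = np.pad(alpha.astype(float), (0, n - alpha.size))
    b = np.pad(beta.astype(float), (0, n - beta.size))
    return a + 1j * b

def unitary_distance(Z1, Z2):
    """Unitary distance between two complex vectors, Eq. (3.57)."""
    d = Z1 - Z2
    return float(np.sqrt(np.real(np.conj(d) @ d)))

# Example: alpha is 3-dimensional, beta is 2-dimensional
alpha, beta = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])
gamma_serial = serial_fuse(alpha, beta)      # 5-dimensional real vector
gamma_parallel = parallel_fuse(alpha, beta)  # 3-dimensional complex vector
```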
Compared with the serial fusion strategy, the parallel fusion strategy reduces the dimension of the fused vector. More importantly, it introduces the concept of unitary space, which transforms the fusion problem of two real vector spaces into a mathematical problem in a single complex vector space.

3. Genetic Algorithm (GA) [1, 25]

Genetic algorithms mimic the evolution of living things. In the process of biological evolution, each species becomes better and better adapted to its environment; the basic characteristics of each individual are inherited by its descendants, but the descendants are not exactly the same as their parents, and characteristics that are better adapted to the environment are preserved, reflecting the principle of survival of the fittest. The basic idea of genetic algorithms is built on this. The algorithm encodes the possible solutions of a problem as strings of 0s and 1s called chromosomes. Given a set of initial chromosomes, the genetic algorithm manipulates them using genetic operators to generate a new generation, which may contain better solutions than previous generations. Every chromosome is evaluated by a fitness function, and the goal of the genetic algorithm is to find the fittest chromosome. A general genetic algorithm consists of four parts: the coding mechanism, the fitness function, the genetic operators, and the control parameters. The coding mechanism is the basis of the genetic algorithm: the algorithm does not operate directly on the research object but maps it, through a coding mechanism, to a string of symbols arranged in a certain order (the chromosome). In the commonly used genetic algorithm, the chromosome consists of 0s and 1s, i.e., the code is a binary string.
The coding of a genetic algorithm can be understood very broadly; in an optimization problem, one chromosome corresponds to one possible solution. Survival of the fittest is the principle of natural evolution, and judging good from bad requires a standard: in genetic algorithms, the quality of each chromosome is described by a fitness function. The purpose of the fitness function is to evaluate and compare chromosomes according to their fitness and determine how good or bad they are. The three most important operators of genetic algorithms are selection, crossover, and mutation. Selection determines, based on the fitness of a chromosome, whether it is eliminated or copied into the next generation; in general, selection gives chromosomes with high fitness a greater chance of surviving, while chromosomes with low fitness, i.e., inferior chromosomes, have a smaller chance of continuing to exist. The crossover operator allows different chromosomes to exchange information, and the mutation operator changes the value at a position on a chromosome. In the actual operation of a genetic algorithm, certain parameters need to be set appropriately to improve the effect of selection: the population size of each generation, the crossover rate (the probability of applying the crossover operator), the mutation rate (the probability of applying the mutation operator), and the number of generations or other criteria used to decide when to stop evolution. The main steps of the genetic algorithm are as follows:
• Begin.
• Determine the fitness function and coding rules.
• Set the number of generations, population size, crossover rate, and mutation rate.
• Randomly generate the initial population and calculate its fitness.
• Repeat:
• Select two (or more) chromosomes in the population and perform crossover at the crossover rate.
• Select chromosomes in the population and mutate them at the mutation rate.
• Calculate the fitness of each chromosome.
• Select the offspring population (the fit reproduce; the unfit are eliminated).
• Until the specified number of generations is reached or a satisfactory result is obtained.
• End.
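A minimal sketch of this loop for binary chromosomes is shown below. The roulette-wheel selection, single-point crossover, bit-flip mutation, and the elitism step are common illustrative choices rather than the book's exact operators, and the fitness function is supplied by the caller.

```python
import numpy as np

def genetic_algorithm(fitness, n_bits, pop_size=100, n_gen=100,
                      crossover_rate=0.96, mutation_rate=0.02, seed=0):
    """Simple generational GA over binary chromosomes, following the steps above."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))           # random initial population
    for _ in range(n_gen):
        fit = np.array([fitness(c) for c in pop])
        # Selection: fitness-proportional (roulette-wheel) sampling of parents
        p = fit - fit.min() + 1e-9
        parents = pop[rng.choice(pop_size, size=pop_size, p=p / p.sum())]
        # Crossover: single-point, applied to consecutive pairs at the crossover rate
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate:
                cut = rng.integers(1, n_bits)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Mutation: flip each bit with probability equal to the mutation rate
        flip = rng.random(children.shape) < mutation_rate
        children = np.where(flip, 1 - children, children)
        # Elitism (optional refinement): keep the best chromosome of this generation
        children[0] = pop[np.argmax(fit)]
        pop = children
    fit = np.array([fitness(c) for c in pop])
    return pop[np.argmax(fit)], float(fit.max())
```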
3.5.2 Fusion Recognition Based on Genetic Algorithm
The previous section described three representative feature-level fusion approaches. The serial fusion strategy is very simple, but it increases the dimensionality, and directly concatenating the vectors limits the effectiveness of the fusion. The parallel fusion strategy introduces the unitary space, and the dimension after fusion remains unchanged; however, the unitary distance of the parallel fusion and the Euclidean distance of the serial fusion clearly yield the same final result. In this way, parallel fusion does not seem to bring much benefit by itself.
Fig. 3.12 Fusion identification framework. The visible face and the IR face are each passed through Log-Gabor filtering and ICA (feature extraction); the genetic algorithm (GA) fuses the two feature vectors (feature fusion); and a top-match classifier produces the recognition result (classification)
In fact, in practical applications, whether the fused vector is a serially obtained large vector or a parallel-derived complex vector, it is often necessary to perform further feature extraction on it in order to fuse the information effectively [21], which undoubtedly increases the workload. The genetic algorithm, because it uses random operations, places no special requirements on the search space and has the advantages of simple operation, fast convergence, and global optimization; it has developed rapidly in recent years and has been widely applied. Therefore, this chapter proposes a multi-source face fusion recognition algorithm based on the genetic algorithm.

Figure 3.12 shows the fusion recognition framework based on the genetic algorithm. Assume that each face corresponds to a pair of face images acquired at the same time, one a visible light face and the other an infrared face; they are acquired synchronously and strictly registered, which can be ensured through the use of an integrated multi-source sensor. The various parts of the recognition framework are described in detail below.

1. Extracting Independent Log-Gabor Features

The Log-Gabor wavelet and independent component analysis (ICA) are used to compute independent Log-Gabor features for the visible and infrared face images. By convolving a face image with a set of Log-Gabor wavelets, a multi-level Log-Gabor representation of the image is obtained. Figure 3.13 shows a test image and the amplitudes of its convolution outputs with a set of Log-Gabor wavelets (2 scales, 6 directions). By serializing the columns of each output matrix, each output can be represented as a column vector. Before concatenation, each output is first down-sampled by a factor ρ to reduce the dimensionality of the vector space. The output column vectors obtained at different frequencies and directions are finally concatenated into a large vector, the Log-Gabor feature vector O, to represent the input face image. Since the dimension of O is very high, it is reduced by PCA to obtain a low-dimensional feature vector F. Using the transformation matrix z obtained by the ICA method, the independent Log-Gabor feature y of the face image is obtained as

$$y = Fz \qquad (3.58)$$
Since ICA does not provide an ordering of the independent components, the components are ordered by the following ratio of inter-class to intra-class variance [26]:
Fig. 3.13 Test images and the amplitudes of their convolution outputs with a set of Log-Gabor wavelets (2 scales, 6 directions). (a) Visible light test image; (b) infrared test image; (c) amplitudes of the convolution outputs of the visible light test image with the Log-Gabor wavelets; (d) amplitudes of the convolution outputs of the infrared test image with the Log-Gabor wavelets
$$r = \frac{\sigma_{\text{between}}}{\sigma_{\text{within}}} = \frac{\sum_k\left(\bar{x}_k - \bar{x}\right)^2}{\sum_k\sum_i\left(x_{ki} - \bar{x}_k\right)^2} \qquad (3.59)$$
where $\sigma_{\text{between}}$ is the between-class variance and $\sigma_{\text{within}}$ is the sum of the within-class variances. Figure 3.13 shows the sampling images and the magnitudes of their convolution outputs with the Log-Gabor wavelets: (a) the visible sampling image, (b) the LWIR sampling image, (c) the convolution outputs of the visible sampling image, and (d) the convolution outputs of the LWIR sampling image.

2. Feature Fusion Using the Genetic Algorithm (GA)

Let $z_V$ and $z_I$ represent the independent Log-Gabor features obtained from the visible and infrared face images, respectively. Through the genetic algorithm, the fused feature representation vector Z is obtained as

$$Z = f(z_V, z_I, x) \qquad (3.60)$$
where x is the optimal chromosome found by the genetic algorithm. Each bit of x is associated with the feature at a particular position; the value of the bit determines whether the feature at that position is selected from $z_V$ (value 1) or from $z_I$ (value 0). In this chapter, the fitness function is designed based on the fusion recognition rate.

3. Top-Match Classification

The vector Z obtained by feature fusion is used to classify the input face. Top-match, i.e., the nearest neighbor rule, is a simple and effective classification method: the face is assigned to the class k for which the Euclidean distance

$$\varepsilon_k = \|Z - C_k\| \qquad (3.61)$$

is minimized.
Here, $C_k$ is the fused feature vector describing the k-th face class.
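The sketch below illustrates, under our own naming and data layout, how a chromosome x fuses the two feature vectors according to Eq. (3.60), how the top-match classification of Eq. (3.61) is applied, and how a recognition-rate fitness can be formed for the genetic algorithm loop sketched earlier. The assumption that the class vectors C_k are built from gallery feature pairs with the same chromosome is ours.

```python
import numpy as np

def fuse_features(zV, zI, x):
    """Eq. (3.60): bit x[i] = 1 takes element i from zV, bit 0 takes it from zI."""
    return np.where(x.astype(bool), zV, zI)

def top_match(Z, class_vectors):
    """Eq. (3.61): assign the face to the class with the smallest Euclidean distance."""
    return int(np.argmin(np.linalg.norm(class_vectors - Z, axis=1)))

def recognition_rate_fitness(x, gallery_V, gallery_I, probe_V, probe_I, probe_labels):
    """Fitness of chromosome x = recognition rate on a validation set.

    gallery_V, gallery_I : one (zV, zI) feature pair per class, used to build C_k with x
    probe_V, probe_I     : validation feature pairs; probe_labels are their class indices
    """
    C = np.stack([fuse_features(v, i, x) for v, i in zip(gallery_V, gallery_I)])
    correct = sum(top_match(fuse_features(v, i, x), C) == y
                  for v, i, y in zip(probe_V, probe_I, probe_labels))
    return correct / len(probe_labels)
```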
3.5.3 Experimental Results and Evaluation
In order to verify the performance of the fusion recognition algorithm based on the genetic algorithm proposed in this section, the following experiments are performed. The experimental data come from the Equinox database [3], in which the visible light images have a gray resolution of 8 bits and the long-wave infrared (LWIR) images 12 bits. All data were acquired with a newly designed integrated CCD and LWIR sensor, which can simultaneously acquire visible and infrared images with an image registration accuracy of 1/3 pixel. The original image size is 320 × 240, and the face region is 180 × 140.
Table 3.6 Recognition performance of PCA, ICA, Log-Gabor, and the proposed scheme (%). For each single-sensor method, the value before the slash is the visible light result and the value after the slash is the infrared result

Test set            PCA            ICA            Log-Gabor+PCA   Log-Gabor+ICA   Proposed algorithm
Test_no             98.83 / 97.83  97.83 / 97.83  97.83 / 97.83   97.83 / 98.83   100
Test_illumination   30.17 / 93.83  72.67 / 87.83  83.83 / 91.83   87.83 / 95.83   97.00
Test_eyeglasses     86.83 / 71.67  87.83 / 69.67  97.83 / 85.83   97.83 / 90.00   97.83
Test_expression     97.83 / 96.83  95.83 / 96.83  95.83 / 96.83   95.83 / 98.83   99.50
Test_all            28.67 / 63.50  62.00 / 68.17  78.67 / 83.83   83.33 / 85.83   92.83
Sixty subjects are selected, and one training set, one verification set, and five test sets are created for the visible light faces and the infrared faces. There are two training images per subject in the training set and ten images per subject in each test set and in the verification set. The training, test, and verification sets are selected according to the following criteria:

• Training set: frontal lighting, no glasses, no expression.
• Test_no set: frontal lighting, no glasses, no expression.
• Test_illumination set: side lighting, no glasses, no expression.
• Test_eyeglasses set: frontal lighting, glasses, no expression.
• Test_expression set: frontal lighting, no glasses, with expression.
• Test_all set: side lighting, glasses, with expression.
• Validation set: side lighting, glasses, with expression.
The genetic algorithm finds the optimal chromosome using the image data of the verification set. Each test set corresponds to one test condition, and the performance of the fusion recognition algorithm presented in this chapter is evaluated under each of these conditions. In the experiments, the Log-Gabor wavelets have four scales and six directions, with parameters set as follows: the wavelength of the smallest filter is 3, the scale factor between adjacent filters is 2, the ratio of the standard deviation of the radial Gaussian function to the filter center frequency is 0.65, and the ratio of the angular spacing of the filter directions to the standard deviation of the angular Gaussian function is 1.5. The down-sampling factor ρ is 4. The control parameters of the genetic algorithm are set as follows: the population size is 100, the number of generations is 100, the crossover rate is 0.96, and the mutation rate is 0.02. Table 3.6 compares the performance of the algorithm proposed in this chapter with single-sensor face recognition algorithms (PCA, ICA, Log-Gabor); for each single-sensor method, the visible light result is listed before the infrared result. In all cases, the number of
Table 3.7 Fusion recognition performance using different feature extraction techniques and GA (%)

            PCA     ICA     Log-Gabor+PCA   Log-Gabor+ICA
Test_all    71.67   78.67   87.83           92.83
feature elements taken by PCA and ICA is 60, and the classification method is top-match. Comparing the second, third, and fourth rows of Table 3.6 with the first row shows the effects of lighting, glasses, and expression on face recognition, respectively; comparing the fifth row with the first row shows the influence of the joint interference. According to the table, Log-Gabor+ICA performs best among the single-sensor algorithms for both visible and infrared faces. Even so, under the influence of lighting the recognition rate of the visible-light face is only 87.83%; under the influence of glasses the recognition rate of the infrared face is 90%; and under the joint interference the recognition rates of the visible-light and infrared faces are 83.33% and 85.83%, respectively. Under all test conditions, the performance of the proposed algorithm is higher than 90%, clearly better than single-sensor face recognition. Table 3.7 compares the face recognition performance after feature-level fusion of different face features (PCA, ICA, Log-Gabor) under the Test_all condition (i.e., under joint interference). After feature extraction from the multi-source faces, the genetic algorithm (GA) is used for the fusion processing. The experimental results show that the Log-Gabor features adopted by the algorithm proposed in this chapter achieve better recognition performance after fusion than the other face features.
3.6 Summary
Image feature-level fusion belongs to the intermediate level of image fusion. It comprehensively analyzes and processes the multiple kinds of feature information obtained by multiple sensors in order to realize the classification, collection, and synthesis of multi-sensor data. In general, the extracted feature information should be a sufficient representation, and a sufficient statistic, of the pixel information, including target edges, texture, and regional characteristics. From the perspective of multi-resolution transform spaces, this chapter has discussed multi-resolution gradient features (corresponding to edge information), texture features, and multi-resolution fuzzy region features, together with their corresponding fusion algorithms.

Multi-resolution analysis of images is an important analytical tool of signal processing and an important theoretical tool of image fusion. Such transforms generally fall into two major branches: multi-resolution transforms based on pyramid decompositions and multi-resolution transforms based on wavelets. This chapter studies pyramid-
based transformation methods and proposes a multi-scale transformation method based on texture and gradient features. The transform expresses the salient features of the image, such as texture and edge information, as fully as possible in the transform coefficients, so that the fused image retains the feature information of the original images at every scale and orientation. When performing multi-scale image fusion based on texture and gradient features, the image is first decomposed into sub-band images with different characteristics by the texture filter and the gradient filter; the sub-band images are then fused according to a contrast-based fusion rule to obtain a new set of fused sub-band images; finally, the fused image is obtained by the multi-scale inverse transform. The fusion results based on the Laplacian pyramid transform, the FSD pyramid transform, and the gradient pyramid transform are compared to show the effectiveness of the image fusion algorithm based on joint texture and gradient features.

The image fusion algorithm based on fuzzy region features focuses on the study of image fusion rules. Fusion rules remain a very important research topic in image fusion: their quality directly affects the speed and quality of the fused image. In this chapter, a fusion method for different sensor images based on fuzzified regional features is also presented, in which regions are ranked by importance and region attributes are classified according to regional features. The different attributes of a region are used to fuse the images in the fuzzy space, improving the contrast of the image while preserving the regional consistency of the important regions and the background regions. Experiments show that this method obtains a better fused image, and its fusion result is better than the region-based fusion method of Zhang et al. [12] and the region-based fusion method of Piella et al. [13]. The selection of the regional features and of the fuzzy membership function is crucial to the fusion performance; therefore, suitable regional features and fuzzy membership functions should be designed according to the application and the imaging characteristics of the original images.
References
1. M.N. Do, M. Vetterli, Frame reconstruction of the Laplacian pyramid, in 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221), vol. 6 (IEEE, Piscataway, NJ, 2001), pp. 3641–3644
2. D.R. Barron, O.D.J. Thomas, Image fusion through consideration of texture components. Electron. Lett. 37(12), 746–748 (2001)
3. A. Toet, L.J. Van Ruyven, J.M. Valeton, Merging thermal and visual images by a contrast pyramid. Opt. Eng. 28(7), 287789 (1989)
4. C.A. Xydeas, V. Petrovic, Objective image fusion performance measure. Electron. Lett. 36(4), 308–309 (2000)
5. G. Qu, D. Zhang, P. Yan, Information measure for performance of image fusion. Electron. Lett. 38(7), 313–315 (2002)
6. P.J. Burt, The pyramid as a structure for efficient computation, in Multiresolution Image Processing and Analysis (Springer, Berlin, 1984), pp. 6–35
7. Y.T. Zhou, Multi-sensor image fusion, in Proceedings of 1st International Conference on Image Processing, vol. 1 (IEEE, Piscataway, NJ, 1994), pp. 193–197
8. P.J. Burt, R.J. Kolczynski, Enhanced image capture through fusion, in 1993 (4th) International Conference on Computer Vision (IEEE, Piscataway, NJ, 1993), pp. 173–182
9. H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform. Graph. Models Image Process. 57(3), 235–245 (1995)
10. I. Koren, A. Laine, F. Taylor, Image fusion using steerable dyadic wavelet transform, in Proceedings, International Conference on Image Processing, vol. 3 (IEEE, Piscataway, NJ, 1995), pp. 232–235
11. Y. Chibani, A. Houacine, On the use of the redundant wavelet transform for multisensor image fusion, in ICECS 2000. 7th IEEE International Conference on Electronics, Circuits and Systems (Cat. No. 00EX445), vol. 1 (IEEE, Piscataway, NJ, 2000), pp. 442–445
12. Z. Zhang, R.S. Blum, Region-based image fusion scheme for concealed weapon detection, in Proceedings of the 31st Annual Conference on Information Sciences and Systems (1997), pp. 168–173
13. G. Piella, A general framework for multiresolution image fusion: from pixels to regions. Inform. Fusion 4(4), 259–280 (2003)
14. L.A. Zadeh, Fuzzy sets. Inf. Control. 8(3), 338–353 (1965)
15. V.S. Petrovic, C.S. Xydeas, Cross-band pixel selection in multiresolution image fusion, in Sensor Fusion: Architectures, Algorithms, and Applications III, vol. 3719 (International Society for Optics and Photonics, Bellingham, WA, 1999), pp. 319–326
16. Y. He, Multi-Sensor Information Fusion with Application (Publishing House of Electronics Industry, Beijing, 2007)
17. T. Liu, Data Mining Technology and Application (National University of Defence Technology Press, Hunan, 1998)
18. A.N. Steinberg, Data fusion system engineering, in Proceedings of the Third International Conference on Information Fusion, vol. 1 (IEEE, Piscataway, NJ, 2000), pp. MOD5–MOD3
19. Y. Kang, Data Mining Theory and Application (Xidian University Press, Xi'an, 1997)
20. A. Farina, F.A. Studer, Radar Data Processing: Introduction and Tracking, vol. 2 (Research Studies Press, Baldock, 1985)
21. E. Waltz, J. Llinas, Multisensor Data Fusion, vol. 685 (Artech House, Boston, 1990)
22. D.L. Hall, S.A. McMullen, Mathematical Techniques in Multisensor Data Fusion (Artech House, Boston, 2004)
23. P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
24. A. Toet, Image fusion by a ratio of low-pass pyramid. Pattern Recogn. Lett. 9(4), 245–253 (1989)
25. B. Aiazzi, L. Alparone, F. Argenti, S. Baronti, I. Pippi, Multisensor image fusion by frequency spectrum substitution: subband and multirate approaches for a 3:5 scale ratio case, in IGARSS 2000. IEEE 2000 International Geoscience and Remote Sensing Symposium. Taking the Pulse of the Planet: The Role of Remote Sensing in Managing the Environment. Proceedings (Cat. No. 00CH37120), vol. 6 (IEEE, Piscataway, NJ, 2000), pp. 2629–2631
26. B. Aiazzi, L. Alparone, A. Barducci, S. Baronti, I. Pippi, Multispectral fusion of multisensor image data by the generalized Laplacian pyramid, in IEEE 1999 International Geoscience and Remote Sensing Symposium. IGARSS'99 (Cat. No. 99CH36293), vol. 2 (IEEE, Piscataway, NJ, 1999), pp. 1183–1185
27. P. Tian, Q. Fang, Contrast-based multiresolution image fusion. Acta Electron. Sin. 28(12), 116–118 (2000)
Chapter 4
Decision-Level Image Fusion
Abstract Image fusion can be performed at three levels: pixel level, feature level, and decision level. Among them, decision-level fusion is a high-level form of information fusion; it is less explored and is a hot spot in the field of information fusion. Compared with low- and middle-level fusion, high-level fusion is accurate, supports real-time processing, and can overcome the disadvantages of single-sensor imaging; its main disadvantage is the comparatively large loss of information. This chapter mainly introduces decision-level information fusion algorithms, including voting, Bayesian inference, evidence theory, fuzzy integrals, and other specific methods. Taking SAR and FLIR images as an example, the algorithm process and implementation of decision-level fusion are presented.
4.1 Introduction
Decision-level fusion is a high-level form of information fusion [1, 2]. It is performed in four steps: first, multi-sensor imaging and processing; second, decision generation at each sensor; third, convergence of the decisions in the fusion center; and finally, the concluding fusion process. In the information processing architecture, decision-level fusion, also known as high-level fusion, sits at the top of the fusion hierarchy. A good everyday example is the way people combine different kinds of current information with prior knowledge to reach an intelligent (best) decision. In general, decision-level fusion is more complete than the other levels and can better overcome the shortcomings of the individual sensors; for the other fusion levels, the failure of one sensor can mean the failure of the entire system. Compared with pixel-level fusion [3, 4] and feature-level fusion [5, 6], decision-level fusion has the best real-time performance, but its main drawback is information loss. Before the fusion, each sensor completes its own decision-making task; the best overall decision is then made according to certain fusion criteria and the credibility of each individual decision.
Decision-level information fusion algorithms include voting, Bayesian inference, evidence theory, fuzzy integrals, and various other specific methods. This chapter first introduces these fusion methods and then presents the algorithm implementation of the fusion between SAR and FLIR images at the decision level.
4.2 Fusion Algorithm Based on Voting Method
The voting method is conceptually very simple: like a democratic election in daily life, the minority obeys the majority and a proposal passes once it receives more than half of the votes. Voting methods (including Boolean "and" and "or") are therefore the simplest techniques for multi-sensor identity declaration. Each sensor provides an input statement about the observed identity of the entity; voting then searches for a statement on which more than half of the sensors agree (or that satisfies another simple decision criterion) and announces the result of the vote as the joint description. Of course, it may sometimes be necessary to introduce weighting, thresholding, and other decision techniques, which increases the complexity of the voting method to a certain extent. Voting is very useful when accurate prior probabilities are not available, especially for real-time fusion.
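As a concrete illustration of the idea (not code from the chapter), the sketch below fuses per-sensor identity declarations by simple or weighted voting with a "more than half" acceptance criterion; the function name and the example labels are invented.

```python
from collections import Counter

def vote_fusion(declarations, weights=None, threshold=0.5):
    """Fuse per-sensor identity declarations by (weighted) voting.

    declarations: list of class labels, one per sensor.
    weights:      optional per-sensor weights (default: equal weights).
    threshold:    fraction of the total weight a label must exceed to be accepted.
    Returns the winning label, or None if no label passes the threshold.
    """
    if weights is None:
        weights = [1.0] * len(declarations)
    tally = Counter()
    for label, w in zip(declarations, weights):
        tally[label] += w
    winner, score = tally.most_common(1)[0]
    return winner if score > threshold * sum(weights) else None

# Three sensors observe the same entity; two of them agree.
print(vote_fusion(["tank", "tank", "truck"]))                     # -> 'tank'
print(vote_fusion(["tank", "truck", "jeep"]))                     # -> None (no majority)
print(vote_fusion(["tank", "truck", "jeep"], weights=[3, 1, 1]))  # -> 'tank' (weighted)
```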
4.3 Fusion Algorithm Based on D-S Evidence Theory
Among the various information fusion methods, the distinguishing characteristic of D-S evidence theory is its ability to reason under uncertainty. It treats the information collected by each sensor as evidence and establishes a corresponding basic credibility assignment over the decision target set. In this way, evidential reasoning can use Dempster's combination rule to combine different pieces of information into a unified representation within the same decision framework. Evidence theory allows credibility to be assigned directly to sensor information, which avoids simplifying assumptions about unknown probability distributions and preserves information. These advantages make evidential reasoning widely used in the multi-source information fusion of various intelligent systems. D-S evidence theory uses a frame of discernment Θ to represent the set of propositions of interest and defines a set function m : 2^Θ → [0, 1] satisfying the following two conditions:
(1) m(\emptyset) = 0, \qquad (2) \sum_{A \subseteq \Theta} m(A) = 1    (4.1)
m is called the basic credibility distribution over the frame of discernment. If A belongs to the frame, m(A) is called the basic credible number of A; it reflects the credibility of A itself. For any proposition, D-S evidence theory defines the credibility (Bel) and plausibility (Pl) functions

Pl(A) = 1 - Bel(\bar{A})    (4.2)

Bel(A) = \sum_{B \subseteq A} m(B)    (4.3)
The pair (Bel(A), Pl(A)) can be used to describe the uncertainty of A. Given several credibility functions based on different pieces of evidence in the same frame of discernment, the credibility function produced by their combined effect can be obtained with Dempster's combination rule. Suppose that Bel_1 and Bel_2 are two credibility functions in the same frame based on independent evidence, m_1 and m_2 are their corresponding basic credibility assignments, and their focal elements are A_1, A_2, ..., A_k and B_1, B_2, ..., B_l, respectively. By the Dempster combination rule we obtain
m(A) = \begin{cases} \dfrac{\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j)}{1 - \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)}, & A \neq \emptyset \\[4pt] 0, & A = \emptyset \end{cases}    (4.4)
where i = 1, 2, 3, ..., k and j = 1, 2, 3, ..., l. Evidence theory is an algorithm based on approximate reasoning. Compared with other pattern recognition algorithms, it is simple and effective and can make better use of human expert knowledge for pattern recognition. The Dempster–Shafer method is a generalization of Bayesian decision-making: evidence theory satisfies weaker axioms than probability theory and shows great flexibility in handling evidence-gathering processes under uncertainty.
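The following sketch applies Dempster's combination rule (4.4) to two basic credibility assignments; the frame of discernment and the mass values are made-up examples, and masses are represented as dictionaries keyed by frozensets of hypotheses.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic credibility assignments (masses over frozenset focal elements)."""
    combined, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
        else:
            conflict += a * b                      # mass falling on the empty set
    k = 1.0 - conflict                             # normalisation factor in (4.4)
    return {A: v / k for A, v in combined.items()}

def belief(m, A):
    """Bel(A): sum of the masses of all subsets of A, as in (4.3)."""
    return sum(v for B, v in m.items() if B <= A)

# Frame of discernment {tank, truck}; two sensors give partially conflicting evidence.
tank, truck = frozenset({"tank"}), frozenset({"truck"})
theta = tank | truck
m1 = {tank: 0.6, theta: 0.4}
m2 = {tank: 0.5, truck: 0.3, theta: 0.2}
m = dempster_combine(m1, m2)
print({tuple(sorted(k)): round(v, 3) for k, v in m.items()})
print("Bel(tank) =", round(belief(m, tank), 3), " Pl(tank) =", round(1 - belief(m, truck), 3))
```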
4.4 Fusion Algorithm Based on Bayesian Inference
Bayesian theory was published by Thomas Bayes in 1763. Its basic principle is that, given a prior likelihood estimate of a hypothesis, the Bayesian method can update the likelihood of the hypothesis as new evidence (observational data) arrives. Bayesian inference is an important method for dealing with stochastic patterns, and many scholars have devoted themselves to research on information fusion methods based on Bayesian decision theory for different application backgrounds [7, 8]. Assume that X is a group of information sources, X = {x_1, x_2, ..., x_R}, and that the target is assigned to ω_j according to the maximum a posteriori probability (MAP):

Z \to \omega_j \quad \text{if} \quad P(\omega_j \mid x_1, \ldots, x_R) = \max_k P(\omega_k \mid x_1, \ldots, x_R)    (4.5)
According to Bayes' theorem, the posterior probability can be expressed as

P(\omega_k \mid x_1, \ldots, x_R) = \frac{p(x_1, \ldots, x_R \mid \omega_k)\, P(\omega_k)}{p(x_1, \ldots, x_R)}    (4.6)
Here p(x_1, ..., x_R) is the joint probability density, which can be expressed as

p(x_1, \ldots, x_R) = \sum_{j=1}^{m} p(x_1, \ldots, x_R \mid \omega_j)\, P(\omega_j)    (4.7)
Assuming that the information sources are statistically independent, we get

p(x_1, \ldots, x_R \mid \omega_k) = \prod_{i=1}^{R} p(x_i \mid \omega_k)    (4.8)

Combining (4.6), (4.7), and (4.8) gives

P(\omega_k \mid x_1, \ldots, x_R) = \frac{P(\omega_k) \prod_{i=1}^{R} p(x_i \mid \omega_k)}{\sum_{j=1}^{m} P(\omega_j) \prod_{i=1}^{R} p(x_i \mid \omega_j)}    (4.9)
Substituting (4.9) into (4.5), we get the following decision rule:

Z \to \omega_j \quad \text{if} \quad P(\omega_j) \prod_{i=1}^{R} p(x_i \mid \omega_j) = \max_k P(\omega_k) \prod_{i=1}^{R} p(x_i \mid \omega_k)    (4.10)

Expressed in terms of the posterior probability of each sensor, this becomes

P^{-(R-1)}(\omega_j) \prod_{i=1}^{R} P(\omega_j \mid x_i) = \max_k P^{-(R-1)}(\omega_k) \prod_{i=1}^{R} P(\omega_k \mid x_i)    (4.11)
The Bayesian approach is actually a special case of D-S evidence theory, so any data fusion problem solved with the Bayesian approach can also be handled with D-S evidence theory. Evidence theory can describe practical decision fusion problems well; however, many works question the evidence-independence condition in Dempster's combination rule, and the rule also suffers from a combinatorial (exponential) explosion, so its efficient implementation is a key issue. With increasing demands on the performance of data fusion systems, relying on a single method is no longer sufficient, and the joint use of multiple complementary methods is becoming a development trend.
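As a minimal sketch of the Bayesian product rule (4.10)–(4.11), assume each sensor reports class posteriors for the same target; the posterior values below are invented for illustration.

```python
import numpy as np

# posteriors[i, k] = P(omega_k | x_i): class posteriors reported by sensor i (illustrative values).
posteriors = np.array([[0.70, 0.20, 0.10],    # sensor 1
                       [0.55, 0.30, 0.15],    # sensor 2
                       [0.40, 0.45, 0.15]])   # sensor 3
priors = np.array([1 / 3, 1 / 3, 1 / 3])      # P(omega_k), assumed equal here
R = posteriors.shape[0]

# Product rule (4.11): P^{-(R-1)}(omega_k) * prod_i P(omega_k | x_i)
scores = priors ** (-(R - 1)) * np.prod(posteriors, axis=0)
print("product-rule decision: omega_%d" % (np.argmax(scores) + 1))
```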
4.5 Fusion Algorithm Based on Summation Rule
The summation rule and the maximum and minimum rules introduced below can all be regarded as evolutions of Bayesian inference and play an important role in practical applications [9]: they not only simplify the calculations but also work well. From the above derivation, Bayesian reasoning gives

Z \to \omega_j \quad \text{if} \quad P^{-(R-1)}(\omega_j) \prod_{i=1}^{R} P(\omega_j \mid x_i) = \max_k P^{-(R-1)}(\omega_k) \prod_{i=1}^{R} P(\omega_k \mid x_i)    (4.12)
Assume that X is a group of information sources, X = {x_1, x_2, ..., x_R}, and that the target is assigned to ω_j according to the maximum a posteriori probability (MAP). Taking the logarithm of the left and right sides of (4.12) gives the summation rule of decision fusion:

Z \to \omega_j \quad \text{if} \quad (1-R)\, P(\omega_j) + \sum_{i=1}^{R} P(\omega_j \mid x_i) = \max_k \Big[ (1-R)\, P(\omega_k) + \sum_{i=1}^{R} P(\omega_k \mid x_i) \Big]    (4.13)

4.6 Fusion Algorithm Based on Min-Max Rule
Bayesian inference and the summation rule construct the basic framework of sensor fusion. Other fusion strategies can be deduced from the following inequalities:

\prod_{i=1}^{R} P(\omega_k \mid x_i) \le \min_{i=1,\ldots,R} P(\omega_k \mid x_i), \qquad \frac{1}{R} \sum_{i=1}^{R} P(\omega_k \mid x_i) \le \max_{i=1,\ldots,R} P(\omega_k \mid x_i)    (4.14)

Equation (4.14) shows that the Bayesian (product) rule and the summation rule can be bounded, and hence approximated, by the minimum and the maximum of the posteriors, respectively.
4.6.1 Maximum Rules

According to the summation rule and formula (4.14), we can get the maximum rule

Z \to \omega_j \quad \text{if} \quad (1-R)\, P(\omega_j) + R \max_{i=1,\ldots,R} P(\omega_j \mid x_i) = \max_k \Big[ (1-R)\, P(\omega_k) + R \max_{i=1,\ldots,R} P(\omega_k \mid x_i) \Big]    (4.15)

Assuming equal prior probabilities for all classes, this reduces to

Z \to \omega_j \quad \text{if} \quad \max_{i=1,\ldots,R} P(\omega_j \mid x_i) = \max_k \max_{i=1,\ldots,R} P(\omega_k \mid x_i)    (4.16)

4.6.2 Minimum Rules
According to Bayesian inference and formula (4.14), we can get the minimum rule

Z \to \omega_j \quad \text{if} \quad P^{-(R-1)}(\omega_j) \min_{i=1,\ldots,R} P(\omega_j \mid x_i) = \max_k \Big[ P^{-(R-1)}(\omega_k) \min_{i=1,\ldots,R} P(\omega_k \mid x_i) \Big]    (4.17)

Assuming equal prior probabilities for all classes, this reduces to

Z \to \omega_j \quad \text{if} \quad \min_{i=1,\ldots,R} P(\omega_j \mid x_i) = \max_k \min_{i=1,\ldots,R} P(\omega_k \mid x_i)    (4.18)
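Continuing the same setting as in Sect. 4.4, the sketch below applies the summation rule (4.13), the maximum rule (4.15), and the minimum rule (4.17) to a matrix of per-sensor posteriors; the numbers are again illustrative.

```python
import numpy as np

posteriors = np.array([[0.70, 0.20, 0.10],
                       [0.55, 0.30, 0.15],
                       [0.40, 0.45, 0.15]])   # P(omega_k | x_i), invented values
priors = np.array([1 / 3, 1 / 3, 1 / 3])
R = posteriors.shape[0]

sum_rule = (1 - R) * priors + posteriors.sum(axis=0)        # Eq. (4.13)
max_rule = (1 - R) * priors + R * posteriors.max(axis=0)    # Eq. (4.15)
min_rule = priors ** (-(R - 1)) * posteriors.min(axis=0)    # Eq. (4.17)

for name, score in [("sum", sum_rule), ("max", max_rule), ("min", min_rule)]:
    print(f"{name}-rule decision: omega_{np.argmax(score) + 1}")
```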
4.7 Fusion Algorithm Based on Fuzzy Integral
Fuzzy integrals have the ability to incorporate both the importance of multi-source information (fuzzy measures) and the objective evidence (h-functions) provided by each source. The fuzzy integral is a nonlinear functional defined on the basis of a fuzzy measure and is able to fuse multi-source information. Fuzzy measures and fuzzy sets are two different concepts: a fuzzy set reflects the degree to which a known element belongs to a collection without a distinct boundary, whereas a fuzzy measure considers the degree of trust, or the probability, that an undetermined element belongs to a (fuzzy or crisp) set. In the mid-1970s, the Japanese scholar Sugeno extended the classical probability measure by replacing the additivity condition of classical probability with a weaker monotonicity condition, proposing the concept of the fuzzy measure. Suppose X is a group of information sources, P(X) is the power set of X, and g is a Sugeno fuzzy measure on X. Then g satisfies:

1. Boundedness: g(\emptyset) = 0, g(X) = 1.
2. Monotonicity: \forall A, B \in P(X), if A \subseteq B then g(A) \le g(B).
3. Continuity: if \forall A_i \in P(X) and \{A_i\}_{i=1}^{\infty} is monotone, then \lim_{i \to \infty} g(A_i) = g(\lim_{i \to \infty} A_i).
4. For all A, B \in P(X) with A \cap B = \emptyset,

   g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B), \qquad \lambda > -1

5. If g_i = g(\{x_i\}) denotes the fuzzy density, then

   \lambda + 1 = \prod_{i=1}^{n} (1 + \lambda g_i)    (4.19)
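Given the fuzzy densities g_i, the parameter λ is obtained by solving (4.19) numerically. A small sketch using SciPy's brentq root finder is given below; the densities are illustrative and are assumed to lie in (0, 1) with at least two sources.

```python
import numpy as np
from scipy.optimize import brentq

def sugeno_lambda(g):
    """Solve lambda + 1 = prod_i (1 + lambda * g_i), Eq. (4.19), for lambda > -1, lambda != 0."""
    g = np.asarray(g, dtype=float)
    f = lambda lam: np.prod(1.0 + lam * g) - (1.0 + lam)
    s = g.sum()
    if np.isclose(s, 1.0):
        return 0.0                                  # densities sum to one: additive measure
    if s > 1.0:
        return brentq(f, -1.0 + 1e-9, -1e-9)        # unique root in (-1, 0)
    return brentq(f, 1e-9, 1e9)                     # unique root in (0, +inf)

print(round(sugeno_lambda([0.4, 0.3, 0.2]), 4))     # densities sum to 0.9 -> small positive lambda
```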
Assuming h_k(x_i) is the evidence from source x_i that the target belongs to class C_k, the fuzzy integral can be expressed as

\int_A h_k(x) \circ g(\cdot) = \sup_{E \subseteq X} \min\Big[ \min_{x \in E} h_k(x),\; g(A \cap E) \Big] = \sup_{\alpha \in [0,1]} \min\big( \alpha,\; g(A \cap F_\alpha) \big)    (4.20)

where F_\alpha = \{ x \mid h_k(x) \ge \alpha \}. Consider two sources x_1 and x_2, and assume h_k(x_1) \ge h_k(x_2) (if not, relabel the sources). Then the fuzzy integral can be expressed as
e_k = \max\big[ \min\big(h_k(x_1), g(A_1)\big),\; \min\big(h_k(x_2), g(A_2)\big) \big], \qquad A_1 = \{x_1\}, \quad A_2 = \{x_1, x_2\}    (4.21)

Fig. 4.1 Flowchart of the fusion recognition scheme: the visible and infrared face images each pass through ICA feature extraction and SVM classification, and the classifier outputs are combined by fuzzy-integral decision fusion to produce the recognition result.

When using the fuzzy integral for fusion, the importance g_i of each sensor can be determined subjectively by experts or estimated from data; in the experiments, the classification success rate of each sensor is used as its fuzzy measure [10]. Experimental results show that the fuzzy integral is an effective fusion algorithm for fuzzy decisions. The improvement in fusion performance comes from the mutual compensation between the sensors: if the sensors produce roughly the same decision regions, multi-sensor fusion cannot significantly improve system performance, so the superiority of data fusion lies in the fact that the sensors can compensate for one another. Figure 4.1 depicts a fusion recognition framework based on fuzzy integrals. It is assumed that each face corresponds to a pair of face images captured at the same moment, one visible and one infrared, which are acquired synchronously and strictly registered.
4.7.1 ICA Feature Extraction
Independent component analysis (ICA) [11] is used to analyze the face images and extract features from the visible and infrared face images separately. In order to reduce the computational complexity, the signal needs to be dimensionally reduced before the ICA processing; in other words, PCA dimensionality reduction must be performed before ICA. Let Γ be the vector representation of a face image. The low-dimensional feature vector y is obtained by PCA, and the ICA feature representation z of the face image is then obtained through the conversion matrix F, as shown in the following formula:

y = U^{T} (\Gamma - \Psi) = Fz    (4.22)

where U is the eigenface space constructed from the face database and Ψ is the mean face of the database.
4.7.2 SVM Classification
The vector z output by ICA is fed into the SVMs for classification. Before using the SVMs, the data must first be prepared by the following steps: proportional scaling, selecting a kernel function, finding the best parameter values by cross-validation, and training the SVMs. Of course, when the SVMs are used, the test data must be scaled in the same way. Here the kernel function is chosen as the radial basis function (RBF):

K(x_i, x_j) = \exp\big( -\gamma \| x_i - x_j \|^2 \big), \qquad \gamma > 0    (4.23)

Thus, there are two parameters for the SVMs: C and γ. Each attribute of the training and test data is proportionally scaled to the range [-1, +1]. The best parameter values found by cross-validation are C = 1 and γ = 0.01. In this work, a sigmoid model is used to map the standard SVM outputs to recognition matching degrees [12].
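A hedged sketch of this training pipeline using scikit-learn (one possible realization, not the chapter's original code): features are scaled to [-1, +1], C and γ are chosen by cross-validation, and probability=True fits a sigmoid (Platt scaling) that plays the role of the sigmoid mapping to matching degrees; the data here are synthetic.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))            # ICA feature vectors z (toy data)
y = rng.integers(0, 6, size=120)          # 6 face classes (toy labels)

scaler = MinMaxScaler(feature_range=(-1, 1))   # proportional adjustment to [-1, +1]
Xs = scaler.fit_transform(X)

# Cross-validation over C and gamma for the RBF kernel K(x_i, x_j) = exp(-gamma ||x_i - x_j||^2).
grid = GridSearchCV(SVC(kernel="rbf", probability=True),
                    {"C": [0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}, cv=3)
grid.fit(Xs, y)

# probability=True fits a sigmoid (Platt scaling), giving per-class matching degrees S_k.
matching_degrees = grid.predict_proba(scaler.transform(X[:2]))
print(grid.best_params_, matching_degrees.round(3))
```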
4.7.3 Decision Fusion with Fuzzy Integral
In the fusion recognition, let x_1 represent the visible-light sensor and x_2 the infrared sensor. The fuzzy density g_i = g({x_i}) is obtained from a statistical analysis of the recognition rate of sensor x_i. The fuzzy integral F_k can then be expressed as

F_k = \begin{cases} \max\big( \min(S_{kV}, g_1),\; S_{kI} \big), & S_{kV} \ge S_{kI} \\ \max\big( \min(S_{kI}, g_2),\; S_{kV} \big), & \text{otherwise} \end{cases}    (4.24)

where S_{kV} and S_{kI} are the classification evidence (matching degrees) obtained from the visible and infrared face images, respectively. The test face Z is assigned to the class ω_j with the largest fuzzy integral:

Z \to \omega_j, \qquad F_j = \max_k F_k    (4.25)
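The two-sensor fusion of (4.24)–(4.25) can be carried out directly, as in the sketch below; the matching degrees and fuzzy densities are invented, and in practice the densities g_1, g_2 would come from the recognition rates of the two sensors as described above.

```python
import numpy as np

def fuzzy_integral_two_sources(s_v, s_i, g1, g2):
    """Two-source Sugeno fuzzy integral of Eq. (4.24) for one class k."""
    if s_v >= s_i:
        return max(min(s_v, g1), s_i)
    return max(min(s_i, g2), s_v)

# Matching degrees S_kV, S_kI for 4 classes from the visible and infrared SVMs (toy values).
S_V = np.array([0.70, 0.10, 0.15, 0.05])
S_I = np.array([0.40, 0.35, 0.20, 0.05])
g1, g2 = 0.9, 0.8     # fuzzy densities, e.g. the recognition rates of the two sensors

F = np.array([fuzzy_integral_two_sources(sv, si, g1, g2) for sv, si in zip(S_V, S_I)])
print("fuzzy integrals:", F, "-> class", int(np.argmax(F)) + 1)   # decision as in Eq. (4.25)
```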
4.8 Label Fusion for Segmentation Via Patch Based on Local Weighted Voting

4.8.1 Introduction
Segmentation is one of the fundamental problems in biomedical image analysis. The traditional approach to segmenting a given biomedical image involves the manual delineation of the regions of interest (ROI) by a trained expert. Because this process is highly labor-intensive and poorly reproducible, accurate automatic segmentation techniques are highly desirable. Early segmentation algorithms mainly dealt with tissue classification, in which the local image intensity carries the important information, but these algorithms cannot guarantee accuracy. Multi-atlas segmentation generally first selects a subset of the best atlases for a given target image based on a predefined measure of anatomical similarity, and then proceeds in two steps: in the registration step, all selected atlases and their corresponding label maps are aligned to the target image; in the label fusion step, the registered label maps from the selected atlases are fused into a consensus label map for the target image by a suitable algorithm. Multi-atlas label fusion methods can effectively solve the segmentation problem using prior knowledge without manual intervention and automatically segment specific anatomical structures with high accuracy. A novel patch-driven level set method for label fusion takes advantage of a probabilistic model and a locally weighted voting scheme. First, patches of the target image and the training atlases are extracted. Second, probabilistic models for label fusion are built on these patches, and Bayesian inference is used to extend the popular local weighted voting method. When calculating the label prior, the image background is analyzed as part of the label fusion procedure and regarded as an isolated label, and the Kronecker delta function is employed as the model of the label prior.
4.8.2 Method
Generally, we assume that similar patches share the same label: if patches extracted from the target image and the training scans have similar image intensities, we deem that they have the same label at the same voxel location. Based on this assumption, a probabilistic model between the target patch and the training atlas patches is established, and the labels from the training data are propagated to the target image by the local weighted voting label fusion algorithm. For each voxel in the test image, its intensity patch is taken from the w × w × w neighborhood. Its patch dictionary can be adaptively built
from all N aligned atlases as follows. First, let N_n(x) be the neighborhood of voxel x in the nth atlas, with neighborhood size w_p × w_p × w_p (called the patch area here). Then, for each voxel in N_n(x), we obtain its corresponding patch from the nth atlas, i.e., a (w × w × w)-dimensional column vector. By gathering all these patches from the w_p × w_p × w_p neighborhoods of all N training atlases, we build a patch dictionary containing M training patches, where M = (w_p)^3 · N.

In the label fusion step, a local weighted voting strategy is used to segment the target image. We build probabilistic models for the intensity prior and the label prior and use Bayesian inference to derive the segmentation algorithm. The overall process of the proposed algorithm is shown in Fig. 4.2.

1. Intensity prior
A voxel of the target image with a certain intensity value can be treated as a weight in the local weighted voting scheme. The intensity prior describes the probability (likelihood) that a voxel belongs to a certain training image patch. We adopt a Gaussian distribution on the intensity difference between the patch of the target image and the patches of the dictionary built from the training subjects:

p_m\big( I(x); I_m \big) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big[ -\frac{1}{2\sigma^2} \big( I(x) - I_m(x) \big)^2 \Big]    (4.26)

where σ is the standard deviation of the Gaussian distribution, and I(x) and I_m(x), m ∈ {1, 2, ..., M}, represent the intensity patch of the target image and the mth training patch at location x, respectively.

2. Label prior
The label prior is a metric describing the possibility that a voxel at a certain location belongs to a certain label. If only the labeled anatomical structures (regions of interest, ROI) are allowed to compete in the label fusion procedure, a specific ROI label will tend to be assigned even where the correct label is background. Based on a large number of our experiments, the Kronecker delta function proves to be well suited as the label prior: it equals 1 if the label produced by the automatic segmentation at a certain location and the manual label in the training label map at the same position are equal, and 0 otherwise. However, the Kronecker delta function is rarely used as the label prior in patch-based, probabilistic label fusion segmentation methods. Here, we define the other anatomical regions (those that are not ROIs) to be the background label; that is, while estimating the label prior, the background is taken into account as an isolated label. Its value is set to 0 so that the background has the same standing as the other labels. The Kronecker delta function is used for the label prior as follows:
Fig. 4.2 The process of the label fusion algorithm via patch based on the probabilistic model and local weighted voting (each of the N atlases is registered to the target image with ANTs, patches are extracted from the target image and the atlases, the probabilistic model of intensity prior and label prior is built, and label fusion is performed with the local weighted voting strategy). We extract patches from the target image and training scans and then establish the probability model between the target patch and the training atlas patches. Finally, the segmentation result is obtained through the proposed label fusion algorithm.
P_m\big( S(x) = l; L_m(x) \big) = \frac{1}{Z_m(x)}\, \delta\big( S(x) = l \big)    (4.27)
where S(x) denotes the estimated label at location x, L_m represents the label maps of the training patches, and l ∈ {0, labels of target}. Z_m(x) = \sum_{l=0}^{\mathcal{L}} \delta(S(x) = l) is the partition function that keeps the probability p_m(S(x) = l; L_m) between zero and one at location x, and \mathcal{L} is the total number of labels including the background label. δ(·) is the Kronecker delta, equal to 1 if its arguments are equal and 0 otherwise:

\delta\big( S(x), l \big) = \begin{cases} 0, & \text{if } S(x) \neq l \\ 1, & \text{if } S(x) = l \end{cases}    (4.28)
3. Label fusion
Through the patch-based process, we may assume that the voxels are independent and identically distributed, so that the label at every location can be estimated separately. The label fusion scheme then becomes a classical local weighted voting scheme, apart from the intensity prior and the minor modification of the label prior. The segmentation is formulated as

\hat{S}(x) = \arg\max_{S(x)} p_m\big( S(x) \mid I(x); \{L_m, I_m\} \big) = \arg\max_{l \in \{0,\, \text{labels of target}\}} \sum_{m=1}^{M} p_m\big( I(x) \mid I_m \big)\, p_m\big( S(x) = l; L_m \big)    (4.29)
Bayes' rule and marginalization are used in Equation (4.29). We then make the following correction:

\hat{S}(x) = \arg\max_{l \in \{0,\, \text{labels of target}\}} \sum_{m=1}^{M} p_m\big( I(x); I_m \big)\, p_m\big( S(x) = l; L_m \big)    (4.30)

where p_m(I(x); I_m) serves as the weights and the label prior values serve as the votes. The local weighted voting strategy is voxel-based, and the neighborhood of voxel x has no impact on the central point within a patch. So we finally take the label value at the center location of the labeled patch as the segmentation result and put it at location x in the whole label image. In the program, the 3D patch matrix is unfolded into a 1D array, and position (w × w × w − 1)/2 + 1 corresponds to the center position. It is not difficult to see that when the patch area shrinks to a single 1 × 1 × 1 voxel, the proposed label fusion algorithm reduces to the local weighted voting strategy without patches.
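A small NumPy sketch of the label fusion rule (4.30) at a single voxel, assuming the patch dictionary (training patches and their centre-voxel labels) has already been extracted; the synthetic data, the use of the centre-voxel intensity in the Gaussian weight, and the parameter values are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def lwvp_label_at_voxel(target_patch, train_patches, train_labels, labels, sigma=5.0):
    """Local weighted voting over a patch dictionary, in the spirit of Eq. (4.30).

    target_patch : (w*w*w,) intensity patch of the target image at voxel x (unfolded to 1D).
    train_patches: (M, w*w*w) patch dictionary gathered from the aligned atlases.
    train_labels : (M,) centre-voxel label of each training patch (0 = background).
    labels       : candidate labels, background included as an isolated label.
    """
    # Intensity prior (4.26): Gaussian weight on the centre-voxel intensity difference.
    centre = (target_patch.size - 1) // 2          # position (w*w*w - 1)/2 in the unfolded patch
    d = target_patch[centre] - train_patches[:, centre]
    w = np.exp(-(d ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    # Label prior (4.27)-(4.28): Kronecker delta votes, summed with the intensity weights.
    scores = np.array([np.sum(w * (train_labels == l)) for l in labels])
    return labels[int(np.argmax(scores))]

# Synthetic example: w = 3 patches, M = 500 training patches, labels {0 (background), 1, 2}.
rng = np.random.default_rng(0)
M, P = 500, 27
train_labels = rng.integers(0, 3, size=M)
train_patches = rng.normal(loc=10.0 * train_labels[:, None], scale=3.0, size=(M, P))
target_patch = rng.normal(loc=20.0, scale=3.0, size=P)       # intensity near label 2
print("estimated label:", lwvp_label_at_voxel(target_patch, train_patches, train_labels, [0, 1, 2]))
```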
4.8.3 Experiments and Results
1. Dataset
The experiments employ 20 brain MRI scans and their corresponding label maps (as illustrated in Fig. 4.3), selected from a larger dataset. The 20 brain scans were resampled to 1 mm isotropic resolution, skull-stripped, bias-field-corrected, and intensity-normalized with FreeSurfer. The intensity normalization is necessary because it enables image intensities to be compared directly (even across datasets). The scans were then pairwise registered with the software package ANTs, using the default parameters for the deformation model, the number of iterations (20 × 30 × 50), and the cost function (neighborhood cross-correlation). The MRI images have dimensions 256 × 256 × 256. Each training label map has been labeled by experts into 45 anatomical regions. The nine anatomical regions of interest used in each of the two hemispheres are white matter (WM), cerebral cortex (CT), lateral ventricle (LV), hippocampus (HP), thalamus (TH), caudate (CA), putamen (PU), pallidum (PA), and amygdala (AM). During the experiments, considering both reasonable processing time and segmentation accuracy, we use the union of the ROIs over all training label maps to fetch the voxels of the ROIs to be segmented in the target image; the remaining 36 anatomical regions are treated as background in the label fusion.
2. Comparison of methods
In order to verify our theoretical analysis, extensive experiments in the context of multi-atlas label fusion are performed to compare the proposed algorithm with a variety of related methods. We use a volume overlap measure known as the Dice score to quantitatively assess the accuracy of segmentation. The evaluation criterion is of the form
Fig. 4.3 2D slices are shown for visualization. The left is the intensity image of the human brain and the right is the corresponding label map
DS = \frac{2\, V(S_{man} \cap S_{auto})}{V(S_{man}) + V(S_{auto})}    (4.31)
where S_auto is the automatic segmentation of the target image, S_man is the manual segmentation, and V(·) denotes the volume of a region. DS ranges between 0 and 1, with 1 meaning perfect volume overlap between the two segmentations (a direct implementation is sketched after the following list). In the experiments, all methods were implemented to segment the brain MRI scans automatically. With the exception of FreeSurfer and joint fusion, all remaining methods treat the image background as an isolated label and use the Kronecker delta function when estimating the label prior. Details are as follows:

• FreeSurfer. We directly use the FreeSurfer software package (http://www.freesurfer.net/) to segment all anatomical regions of the test images one by one. It does not rely on the pairwise-registration preprocessing step used by the label fusion methods below.
• Joint fusion (JF). In this technique [13], weighted voting is formulated in terms of minimizing the total expectation of labeling error, and the pairwise dependency between atlases is explicitly modeled as the joint probability of two atlases making a segmentation error at a voxel.
• Majority voting (MV). Majority voting can be seen as the most likely labeling in a probabilistic model in which the segmentation S(x) is sampled randomly from one of the N atlases.
• Local weighted voting (LWV). This label fusion strategy takes advantage of the image intensities of the deformed atlases, which improves segmentation quality. Compared with the proposed algorithm, the difference is that the processing is not at the patch level.
• Majority voting based on patch (MVP). This method does not consider the image intensity, but it computes the label prior at the patch level, an improvement over MV.
• Local weighted voting based on patch (LWVP). As described in Sect. 4.8.2, the probabilistic models of the intensity prior and the label prior are established on patches. We assume that similar patches share the same label and build a patch dictionary between the target image and the training data; label fusion is then processed over the M training patches instead of the N training subjects.

We also investigate another representation of the label prior term using the LogOdds model based on SDM [14]. It is a compromise between segmentation accuracy and processing time: it reduces the running time considerably but yields lower accuracy because it does not take the background into account. The Kronecker delta function is therefore used to build the label prior, together with the background analysis, in this chapter.
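A direct implementation of the Dice score (4.31) on label volumes (the 3-D masks below are toy data):

```python
import numpy as np

def dice_score(s_man, s_auto, label):
    """Dice overlap (4.31) between manual and automatic segmentations for one label."""
    man = (s_man == label)
    auto = (s_auto == label)
    denom = man.sum() + auto.sum()
    return 2.0 * np.logical_and(man, auto).sum() / denom if denom else 1.0

# Toy 8x8x8 label volumes: label 1 occupies slightly shifted cubes in the two segmentations.
s_man = np.zeros((8, 8, 8), dtype=int)
s_auto = np.zeros((8, 8, 8), dtype=int)
s_man[2:6, 2:6, 2:6] = 1
s_auto[3:7, 2:6, 2:6] = 1
print("Dice =", round(dice_score(s_man, s_auto, 1), 3))   # 0.75 for this example
```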
Fig. 4.4 Dice scores of LWVP scheme according to the patch size (w ¼ 7) and patch area (wp ¼ 1, 3, 5). The segmentation results are obtained with N ¼ 2 in a leave-one-out way. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers span the extreme data points not considered as outliers (which are marked with red crosses)
3. Parameter optimization
The four label fusion methods mentioned above (except FreeSurfer and JF) were used to segment the scans in a leave-one-out manner, where the test subject was left out of the atlas set. To determine the optimal parameter σ of the intensity prior, values from 1 to 10 were tried; for both LWV and LWVP, σ = 5 gives the best segmentation result. The remaining parameter settings and optimization are as follows.
4. Impact of the patch size and patch area
In the MVP and LWVP label fusion schemes, we studied the impact of the patch size and patch area on segmentation accuracy. Within one patch, the neighborhood of voxel x has no impact on the central point, because the local weighted voting and majority voting strategies are both voxel-based. In the experiments, the patch size is set to 7 × 7 × 7 voxels (namely w = 7). In the LWVP scheme with N = 2, a patch area of 3 × 3 × 3 voxels (namely w_p = 3) gives the best segmentation precision (as illustrated in Fig. 4.4); the optimal patch area seems to reflect the complexity of the anatomical structure. As can be seen from Fig. 4.5, when we set N = 19 and w_p = 3, the segmentation accuracy is likewise at its best.
5. Impact of the number of training scans N
Segmentation accuracy was studied from N = 2 to N = 19. The results of the LWV method are presented in Fig. 4.5, which reports the average Dice score as a
Fig. 4.5 The average Dice scores for LWV method as a function of the number of training scans. It is obtained with σ ¼ 5, w ¼ 7, and wp ¼ 3 in leave-one-out way and reaches 90.8% when all 19 subjects are used
Fig. 4.6 (a) Sagittal slice of a segmentation which only includes 9 ROIs in the left hemisphere using the proposed algorithm. (b) 3D rendering of the segmentation
function of N. As expected, the segmentation accuracy improves as the number of selected training subjects increases, and in further experiments we find that the other methods show the same behavior; the remaining results are therefore based on N = 19.
6. Results
We segmented the brain MRI images in a leave-one-out cross-validation fashion using the label fusion methods described above. Figure 4.6 shows the automated segmentation of a scan on the nine ROIs in the left hemisphere
Fig. 4.7 Dice scores for the three methods: LWV-S (green), LWV-K (blue), LWVP (red). On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers span the extreme data points that are not considered outliers (which are marked with red crosses)
using LWVP. The sagittal slice of the result is shown in Fig. 4.6a, and a 3D rendering of the segmentation is shown in Fig. 4.6b; some anatomical regions related to Alzheimer's disease, especially the hippocampus, can be seen clearly. To compare choices of the label prior, we adopt SDM in LWV (denoted LWV-S), the Kronecker delta in LWV (denoted LWV-K), and LWVP; in the latter two methods the image background is analyzed and taken as an isolated label. Figure 4.7 shows box plots of the Dice scores for the nine ROIs in the left hemisphere. Our patch-based method produces the most accurate results. The mean Dice scores of LWV-K are almost as high as those of LWV-S and are even better for the WM structure, while LWV-K takes less running time than LWV-S. This confirms that the background analysis is necessary and that the Kronecker delta function is well suited as the label prior.

We also applied other methods to segment the brain MRI scans, namely FreeSurfer, JF, MV, and MVP. FreeSurfer segments whole-brain anatomical regions, whereas the remaining label fusion algorithms only yield segmentations of the ROIs. Table 4.1 and Fig. 4.8 report the average Dice scores achieved by all algorithms for the nine ROIs in the left hemisphere. To evaluate the performance of the proposed method, we measured the Dice scores between the manual and automated segmentations for each ROI. LWVP, the patch-driven method, yields the most accurate segmentation in all ROIs except the pallidum (PA). LWV obtains better results than the other methods, including FreeSurfer, which is a state-of-the-art whole-brain segmentation tool. The segmentation of LWV and LWVP clearly benefits from the
Table 4.1 Average Dice scores for each ROI corresponding to the six methods. "FS" represents FreeSurfer; boldface font in the original table indicates the best scores.

Method   WM      CT      LV      HP      TH      CA      PU      PA      AM
FS       0.943   0.890   0.935   0.864   0.815   0.904   0.898   0.760   0.825
JF       0.940   0.863   0.921   0.874   0.924   0.896   0.910   0.874   0.851
MV       0.922   0.827   0.903   0.840   0.920   0.871   0.901   0.877   0.840
LWV      0.952   0.892   0.935   0.878   0.925   0.909   0.913   0.878   0.851
MVP      0.902   0.806   0.899   0.835   0.916   0.863   0.898   0.873   0.838
LWVP     0.958   0.906   0.941   0.883   0.925   0.915   0.916   0.873   0.856
Fig. 4.8 Dice scores corresponding to the mentioned methods: FreeSurfer (dark green), JF (purple), MV (light green), MVP (black), LWV (blue), LWVP (red). On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers span the extreme data points that are not considered outliers (which are marked with red crosses)
additional use of local intensity information. The difference between FreeSurfer and the other methods above is that the pairwise registration of the latter is based on ANTs, a mature algorithm that improves segmentation accuracy; for this reason, in the TH, PU, PA, and AM ROIs the mean Dice scores of FreeSurfer are lower than those of the other methods. Joint fusion, a label fusion strategy with the Kronecker delta function as label prior, yields better segmentation accuracy than MV and MVP but worse than LWV and LWVP. The experimental results thus demonstrate that it is necessary to take the background as an isolated label when the label prior is built, and the superiority of the proposed patch-based algorithm is apparent. Compared with FreeSurfer, joint fusion obtains higher average Dice scores for HP, TH, PU, PA, and AM. In addition, we note that the results of MV and MVP are worse than those of the weighted label fusion methods, because the majority voting strategies ignore the image intensity, which contains a significant amount of the
relevant information. We also find that MVP performs slightly worse than MV. This might be due to the combination of patch-level processing with the discarding of image intensity information in MVP, which affects the accuracy of the segmentation.
4.9 Summary
Decision-level fusion is a high-level form of information fusion and a hot spot in the field of information fusion. Compared with lower-level fusion, high-level fusion is more complete, has better real-time performance, and can better overcome the shortcomings of the individual sensors; its disadvantage is a greater loss of information. Before the fusion, each sensor completes its own decision-making task, and the best overall decision is then made according to certain fusion criteria and the credibility of each individual decision. This chapter mainly introduced decision-level information fusion algorithms, including voting, Bayesian inference, evidence theory, fuzzy integrals, and other specific methods. Taking SAR and FLIR images as an example, the algorithm process and implementation of decision-level fusion were introduced.
References
1. A.H. Gunatilaka, B.A. Baertlein, Feature-level and decision-level fusion of noncoincidently sampled sensors for land mine detection. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 577–589 (2001)
2. J.A. Benediktsson, I. Kanellopoulos, Classification of multisource and hyperspectral data based on decision fusion. IEEE Trans. Geosci. Remote Sens. 37(3), 1367–1377 (1999)
3. G. Kiremidjian, Issues in image registration. IEEE Proc. SPIE Image Understand. Man Mach. Interface 758, 80–87 (1987)
4. G. Liu, Study of Multisensor Image Fusion Methods. PhD Thesis, Xidian University
5. R.C. Gonzalez, P. Wintz, Digital Image Processing (Addison-Wesley, Reading, MA, 1977)
6. M.E. Ulug, C.L. McCullough, Feature and data level fusion of infrared and visual images. Proc. SPIE 3719, 312–318 (1999)
7. J. Kittler, Multi-Sensor Integration and Decision Level Fusion (The Institution of Electrical Engineers, London)
8. B. Jeon, D.A. Landgrebe, Decision fusion approach for multitemporal classification. IEEE Trans. Geosci. Remote Sens. 37(3), 1227–1233 (1999)
9. L.O. Jimenez, A. Morales-Morell, A. Cresus, Classification of hyperdimensional data based on feature and decision fusion approaches using projection pursuit, majority voting, and neural networks. IEEE Trans. Geosci. Remote Sens. 37(3), 1360–1366 (1999)
10. M. Petrakos, J.A. Benediktsson, I. Kanellopoulos, The effect of classifier agreement on the accuracy of the combined classifier in decision level fusion. IEEE Trans. Geosci. Remote Sens. 39(11), 2539–2546 (2001)
11. W.K. Pratt, Digital Image Processing (Wiley, New York, 1978)
12. E.L. Hall, Computer Image Processing and Recognition (Academic Press, New York, 1979)
13. H. Wang, J. Suh, S. Das, J. Pluta, C. Craige, P. Yushkevich, Multi-atlas segmentation with joint label fusion. IEEE Trans. Pattern Anal. Mach. Intell. 35, 611–623 (2012). https://doi.org/10.1109/TPAMI.2012.143
14. M.R. Sabuncu, B.T.T. Yeo, K. Van Leemput, B. Fischl, P. Golland, A generative model for image segmentation based on label fusion. IEEE Trans. Med. Imaging 29(10), 1714–1729 (2010)
Chapter 5
Multi-sensor Dynamic Image Fusion
Abstract Image fusion can be performed on dynamic images as well as on static images. Here, "static" refers to images captured from a static platform; for example, multi-focus and remote sensing image fusion assume static source images. In real-time surveillance applications, however, dynamic image sequences have to be fused, where "dynamic" refers to images captured from a moving platform. This chapter presents a detailed discussion of dynamic image fusion and introduces new dynamic image fusion algorithms: an image fusion algorithm based on region target detection, an improved dynamic image fusion algorithm, a new FOD recognition algorithm based on multi-source information fusion, a multi-cue mean-shift target tracking approach based on fuzzified region dynamic image fusion, and a new tracking approach based on tracking-before-fusion.
5.1 Introduction
Previous chapters have studied the fusion of multi-source still images. However, in practical applications such as target detection and identification in security monitoring and battlefield environments, it is often necessary to fuse moving images (sequence images) from multiple sensors. This chapter studies the fusion of multi-sensor dynamic images based on the multi-scale decomposition method and proposes a dynamic image fusion system in which sequence image super-resolution restoration and moving object detection are introduced into multi-sensor dynamic image fusion. Multi-focus images and remote sensing multi-source images are not included in the scope of multi-source dynamic image fusion, because they are usually static images in practical applications. The fusion of static images has been widely studied, but there is little research on dynamic image fusion algorithms. If the sequence images obtained by multiple sensors are fused directly, frame by frame, with a static image fusion method, the motion information of the sequence images along the time axis cannot be used to guide the fusion process. Oliver R. et al. proposed a multi-sensor moving image fusion algorithm based on the discrete wavelet frame transformation, but the
algorithm still processes the sequence images obtained by the multiple sensors according to a static image fusion method. Utilizing the motion information in multi-sensor sequence images thus remains a difficult task. In this chapter, we make an attempt at multi-sensor dynamic image fusion research and propose a multi-sensor dynamic image fusion system based on multi-scale decomposition. In this system, sequence image super-resolution restoration and moving target detection are applied to multi-sensor dynamic image fusion, so as to make better use of the motion information of the sequence images along the time axis.
5.2 Multi-sensor Dynamic Image Fusion System
The multi-sensor dynamic image fusion system proposed in this chapter is shown in Fig. 5.1. The fusion of the sequence images of two sensors involves four processes: intra-sequence image fusion (sequence image super-resolution restoration), moving object detection, image multi-scale decomposition, and inter-sequence image fusion. Each process is described in detail below.

First, intra-sequence image fusion (sequence image super-resolution restoration): super-resolution restoration (or reconstruction) refers to recovering a high-resolution sequence image from a low-resolution sequence image. Many imaging systems, such as forward-looking infrared imagers and visible-light cameras, are limited by the inherent density of the sensor array when rapidly acquiring wide-field images, so the resolution of the images cannot be very high; in practical applications, image transmission speed, image storage capacity, and other factors also limit the achievable resolution. At the same time, undersampling in the imaging process (the discretization of a continuous image) causes spectral aliasing, resulting in degraded image quality. If
Fig. 5.1 Multi-sensor dynamic image fusion system: the A and B sensor sequences each undergo super-resolution restoration and multi-scale decomposition; moving target detection drives the separate fusion of the target areas and the non-target areas, producing the fused sequence images.
you increase the resolution of the sensor array to increase the image resolution, the cost can be very high or the increase difficult to achieve. Super-resolution restoration can instead estimate high-resolution sequence images from low-resolution sequence images while eliminating the additive noise and the blurring caused by the limited sensor array density and by the point spread function of the optical imaging process. Each of the two sensor sequences is super-resolution restored, yielding two enhanced sequence images; the subsequent processing operates on these two restored sequences.

The second step, moving target detection: the purpose of moving target detection is to find the moving target areas in the two sensor sequence images. The target areas often contain the important information that people need, such as targets of illegal intrusion in surveillance, or tanks and fighters on the battlefield. In image fusion, in order to preserve the complete information of the target area, a special fusion strategy must be adopted for it. If a target area is detected at a certain position in the A sensor sequence image but not at that position in the B sensor sequence image, then, to preserve the integrity of the target information, the information of the A sensor sequence image should be retained in that target area during the fusion, and vice versa. If a target is detected at the same position in both sensor sequence images, similarity matching is performed on the gray-level information of the two sequence images in the target area: if the similarity is higher than a certain threshold, a weighted-average fusion strategy is adopted for the target area; otherwise, the target area with more detail information is selected as the fused target area. The specific fusion process is described in the following two steps.

The third step, image multi-scale decomposition: the purpose of multi-scale decomposition is to transform each frame of the two sensor sequence images to obtain their multi-resolution representations. In this chapter, the directional pyramid frame transform or the non-separable directional wavelet frame transform is adopted for the multi-scale decomposition. This step is the basis of the inter-sequence image fusion in the next step.

The fourth step, inter-sequence image fusion: inter-sequence image fusion fuses the multi-scale decomposition coefficients of each frame of the two sensor sequence images according to the target area information produced by the moving target detection. The fusion of the target areas and the fusion of the non-target areas are performed separately. When the target areas are fused, if a target area is detected in the A sensor sequence image but not in the B sensor sequence image, the multi-scale decomposition coefficients of the A sensor sequence image are selected as the fused coefficients, and vice versa.
If the target area is detected in both sensor sequence images, similarity matching is first performed on the target area: if the similarity is greater than a given threshold, the multi-scale decomposition coefficients of the two sensor sequence images within the target area are combined by weighted averaging; otherwise, the fused coefficients are taken from the sensor sequence image whose coefficients have the larger energy in the target area. For the non-target regions, to improve the efficiency of the algorithm, a window-based strategy is adopted for fusing the high-frequency coefficients after multi-scale decomposition: the coefficient similarity is computed in a local window; if the similarity is large, the coefficients are weighted-averaged; otherwise, the coefficients are selected according to their local-window energy. For the low-frequency coefficients after multi-scale decomposition, simple averaging is used.
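As an illustration of this window-based rule, the following minimal Python sketch fuses one pair of high-frequency sub-bands and one pair of low-frequency sub-bands. The local window size, the similarity threshold, and the particular form of the weights are assumed values chosen for the example, not parameters taken from the experiments in this chapter.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_highpass(cA, cB, win=3, thr=0.75, eps=1e-12):
    """Window-based fusion of two registered high-frequency sub-bands
    (illustrative sketch; `win` and `thr` are assumed values)."""
    eA = uniform_filter(cA * cA, size=win)        # local energy of sensor A band
    eB = uniform_filter(cB * cB, size=win)        # local energy of sensor B band
    match = 2.0 * uniform_filter(cA * cB, size=win) / (eA + eB + eps)  # local similarity
    # Weighted average where the local windows are similar; the weight leaning
    # toward the higher-energy band is one reasonable choice, not the book's exact rule.
    w = 0.5 + 0.5 * (eA - eB) / (eA + eB + eps)
    averaged = w * cA + (1.0 - w) * cB
    # Otherwise choose the coefficient whose local window carries more energy.
    selected = np.where(eA >= eB, cA, cB)
    return np.where(match > thr, averaged, selected)

def fuse_lowpass(aA, aB):
    """Low-frequency (approximation) sub-bands are simply averaged."""
    return 0.5 * (aA + aB)
```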
5.3 Improved Dynamic Image Fusion Scheme for Infrared and Visible Sequence Based on Image Fusion System
This section introduces an improved dynamic image fusion scheme for infrared and visible sequences based on the image fusion system described above. As the first step, a target detection technique is employed to segment the source images into target and background regions, and different fusion rules are then adopted in the target and background regions. Two quantitative performance indexes are fed back to determine the optimum weight coefficients of the fusion rules, and this feedback of performance information improves the fusion result, particularly in the target regions. Fusion experiments on real-world image sequences indicate that the improved method is effective and efficient and achieves better performance than fusion methods without feedback.
5.3.1 Introduction
Multi-source image fusion techniques originated in the military field, and their main impetus has also come from military needs. Battlefield sensing technology, in which multi-source image fusion is a pivotal component, has become one of the most important advanced military technologies, covering target detection, tracking, recognition, and scene awareness. Image fusion is a specialization of the more general topic of data fusion that deals with image and video data [1]. It is the process by which multi-modality sensor images of the same scene are intelligently combined into a single view of the scene with extended information content. Image fusion has important applications in the military, medical imaging, remote sensing, and security and surveillance fields. Its benefits include improved spatial awareness, increased accuracy in target detection and recognition, reduced operator workload, and increased system reliability [2]. Image fusion processing must satisfy the following requirements, as described in [3]:
preserve (as far as possible) all salient information in the source images; do not introduce any artifacts or inconsistencies; be shift invariant; and be temporally stable and consistent. The last two points are especially important in dynamic image fusion (image sequence fusion), because the human visual system is highly sensitive to the moving artifacts introduced by a shift-dependent fusion process [3]. The fusion process can be performed at different levels of information representation, sorted in ascending order of abstraction: signal, pixel, feature, and symbol level [4]. Pixel-based fusion methods, from simple weighted pixel averaging to more complicated multi-resolution (MR) methods (including pyramid and wavelet schemes), have been well researched [3, 5–10]. More recently, feature-level fusion with region-based fusion schemes has been reported to achieve both qualitative and quantitative improvements over pixel-based methods, since more intelligent, semantic fusion rules can be formulated on the basis of actual image features [11–16]. An improved dynamic image fusion scheme for infrared and visible sequences based on feedback of optimum weight coefficients is introduced in this section.
5.3.2 Generic Pixel-Based Image Fusion Scheme
The generic pixel-based fusion scheme is briefly reviewed here; more details can be found in [3, 5–16]. Figure 5.2 illustrates the generic wavelet fusion scheme, which can be divided into three steps. First, all source images are decomposed using a multi-resolution method, which can be the pyramid transform (PT) [5, 6], the discrete wavelet transform (DWT) [7–9], discrete wavelet frames (DWF) [3], or the dual-tree complex wavelet transform [10]. Then the decomposition coefficients are fused by applying a fusion rule, which can be a point-based maximum-selection (MS) rule or a more sophisticated area-based rule [6, 7]. Finally, the fused image is reconstructed by applying the corresponding inverse transform to the fused coefficients.

The source sensors of most image fusion systems have different fields of view, resolutions, lens distortions, and frame rates. It is vital to align the input images properly with each other, both spatially and temporally, a problem addressed by image registration [13]. To minimize this problem, the imaging sensors in many practical systems are rigidly mounted side by side and physically aligned as closely as possible. However, in more complex systems where the sensors move relative to each other, the registration of the input images becomes a very challenging problem, in some respects harder than the fusion itself.
Fig. 5.2 Generic pixel-based image fusion scheme (Image A and Image B → pre-processing → MR transform → fusion → inverse transform → fused image)
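A compact sketch of this three-step scheme is given below using PyWavelets' decimated 2-D DWT with the point-based maximum-selection rule on the detail bands. The wavelet, the decomposition level, and the averaging of the approximation band are illustrative choices, and pre-processing and registration are assumed to have been done already.

```python
import numpy as np
import pywt

def dwt_max_fusion(img_a, img_b, wavelet='db2', level=3):
    """Generic pixel-based fusion: MR decompose, fuse coefficients, inverse transform.
    Uses the maximum-selection (MS) rule on detail bands and averages the approximations."""
    ca = pywt.wavedec2(img_a, wavelet, level=level)
    cb = pywt.wavedec2(img_b, wavelet, level=level)
    fused = [0.5 * (ca[0] + cb[0])]                       # approximation band: average
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        # Detail bands: keep the coefficient with the larger absolute value.
        fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                           for x, y in ((ha, hb), (va, vb), (da, db))))
    return pywt.waverec2(fused, wavelet)
```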
Most applications of a fusion scheme are interested in features within the image rather than in the actual pixels, so it seems reasonable to incorporate feature information into the fusion process [14]. A number of region-based fusion schemes have been proposed [11–16]. However, most of them are designed for still-image fusion, and in the image-sequence case every frame of each source sequence is processed individually, so these methods do not take full advantage of the wealth of inter-frame information within the source sequences.
5.3.3 Improved Dynamic Image Fusion Scheme Based on Region-Based Target
In pixel-based approaches, each MR decomposition coefficient is treated independently (MS rule) or filtered with a small fixed window (area-based rule), whereas, as noted above, most applications are interested in image features rather than individual pixels, and existing region-based schemes process each frame of an image sequence individually. The novel region-based fusion scheme proposed here for the fusion of visible and infrared (IR) image sequences is shown in Fig. 5.3, where target detection (TD) techniques are introduced to segment the target regions intelligently. For convenience, both source sequences are assumed to be well registered before fusion. First, both the visible and the IR sequences are enhanced by a pre-processing operator. Then each frame of the source sequences is transformed with an MR method (the LR DWT is adopted here; see Sect. 5.3.3.1). Simultaneously, the frames are segmented into target and background regions by a TD method. Different fusion rules are adopted in the target and background regions. Finally, the fused coefficients belonging to each region are combined, and the fused frames are reconstructed by the corresponding inverse transform.
Fig. 5.3 Region-based IR and visible dynamic image fusion scheme (IR and visible sequences → pre-processing → MR transform and TD → target region fusion and background region fusion → inverse transform → fused sequence)
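The per-frame processing of this scheme can be sketched as follows. The sketch is only illustrative: PyWavelets' stationary wavelet transform stands in for the LR DWT of Sect. 5.3.3.1, the target mask is assumed to be a boolean array produced by the TD step, and the full set of target-region rules of Sect. 5.3.3.3 is replaced by a simplified "keep the IR coefficients" rule.

```python
import numpy as np
import pywt

def fuse_frame(ir, vis, target_mask, wavelet='db2', level=2):
    """Fuse one registered IR/visible frame pair with separate target and
    background rules (illustrative stand-in, not the chapter's LR DWT)."""
    # swt2 keeps every sub-band at full frame size, so the boolean target mask
    # applies to the coefficients directly; frame sides are assumed to be
    # divisible by 2**level.
    ir_c = pywt.swt2(np.asarray(ir, float), wavelet, level=level)
    vis_c = pywt.swt2(np.asarray(vis, float), wavelet, level=level)
    fused = []
    for (aI, (hI, vI, dI)), (aV, (hV, vV, dV)) in zip(ir_c, vis_c):
        a = 0.5 * (aI + aV)                                 # background low-pass: average
        h, v, d = (np.where(np.abs(x) >= np.abs(y), x, y)   # background details: max-abs
                   for x, y in ((hI, hV), (vI, vV), (dI, dV)))
        for src, dst in ((aI, a), (hI, h), (vI, v), (dI, d)):
            dst[target_mask] = src[target_mask]             # target regions: keep IR
        fused.append((a, (h, v, d)))
    return pywt.iswt2(fused, wavelet)
```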
5.3.3.1 The Limited Redundancy Discrete Wavelet Transform
It is well known that the standard DWT produces a shift-dependent signal representation [17, 18] because of the down-sampling operations in every sub-band, which in turn results in a shift-dependent fusion scheme, as described by Rockinger [3]. To overcome this problem, Rockinger presented a perfectly shift-invariant wavelet fusion scheme using the DWF. However, this method is computationally expensive because of the high redundancy of the representation ($2^m n:1$ for an $m$-D signal and an $n$-level decomposition). Bull et al. [10] further developed wavelet fusion by introducing the DT CWT, which provides approximate shift invariance with only limited redundancy ($2^m:1$ for an $m$-D signal at any decomposition level). However, the DT CWT employs two filter banks, which must be designed rigorously to achieve the appropriate delays while satisfying the perfect reconstruction (PR) conditions. Moreover, the decomposition coefficients of the two trees must be regarded as the real and imaginary parts of complex coefficients, which increases the difficulty of the subsequent processing in the fusion rule. A new implementation of the DWT is therefore introduced in this chapter, which provides approximate shift invariance and nearly perfect reconstruction while preserving the advantages of the DWT: computational efficiency and easy implementation. Figure 5.4 shows the decomposition and reconstruction scheme of the new transform in cascaded form, which can be extended easily to 2-D by separable filtering along rows and then columns. Because of its limited redundancy (no more than 3:1 in 1-D), the new transform is referred to here as the LR DWT (limited-redundancy discrete wavelet transform) to distinguish it from the DWT and the DWF.

Fig. 5.4 The LR DWT and inverse transform
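The shift dependence that motivates the LR DWT can be reproduced in a few lines. The snippet below is not the LR DWT itself; it merely contrasts the decimated DWT with an undecimated transform (PyWavelets' SWT) to show why down-sampling makes the representation, and hence the fusion result, depend on small image shifts.

```python
import numpy as np
import pywt

x = np.zeros(64)
x[20:28] = 1.0                 # a simple step-like feature
x1 = np.roll(x, 1)             # the same feature shifted by one sample

# Decimated DWT: detail coefficients of the shifted signal are not a shifted
# copy of the original ones, because down-sampling discards one polyphase.
d = pywt.dwt(x, 'db4')[1]
d1 = pywt.dwt(x1, 'db4')[1]
print(np.allclose(d, d1))                    # False in general

# Undecimated SWT: the coefficients simply shift along with the signal.
s = pywt.swt(x, 'db4', level=1)[0][1]
s1 = pywt.swt(x1, 'db4', level=1)[0][1]
print(np.allclose(np.roll(s, 1), s1))        # True
```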
5.3.3.2 Region Segmentation Algorithm
The target detection (TD) operator aims to segment both source frames into target regions, which contain the significant information such as moving people and vehicles, and background regions. A novel target detection method is proposed in this chapter based on the characteristics of IR imaging. First, a region-merging method [19] is adopted to segment the initial IR frame.
The target regions, which have high contrast with the neighboring background, are easy to find in the segmented IR frame, and a confidence measure [20] is computed for each candidate region. Because it is very inefficient to compute the confidence measure for every candidate in every frame, a model-matching method is adopted to find the target regions in the subsequent frames. A target model is built from the intensity information of the target region in the previous frame, and matching is performed not over the whole of the following frame but only over a small region that corresponds to (and is slightly larger than) the target region in the previous frame. The initial detection operator based on segmentation and the confidence measure is repeated if no target is detected over several successive frames. Target detection in the visible sequence is performed in a similar way to that in the IR sequence.
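A rough sketch of this detection-plus-tracking loop is shown below. It does not reproduce the region-merging segmentation of [19] or the confidence measure of [20]; instead it substitutes Otsu thresholding, a toy contrast score, and normalized cross-correlation template matching, and it assumes 8-bit single-channel IR frames. The structure, however, follows the description above: detect candidate regions in an initial frame, then match each target model only within a slightly enlarged window of the next frame.

```python
import cv2
import numpy as np

def detect_initial_targets(ir_frame, min_area=50):
    """Crude stand-in for the initial segmentation + confidence step: bright,
    sufficiently large IR regions with high contrast to the frame average."""
    _, binary = cv2.threshold(ir_frame, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    targets = []
    for i in range(1, n):                              # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:
            continue
        region = ir_frame[y:y + h, x:x + w]
        contrast = float(region.mean()) - float(ir_frame.mean())  # toy confidence
        if contrast > 20:                              # illustrative threshold
            targets.append((x, y, w, h))
    return targets

def track_target(next_frame, prev_frame, box, margin=8):
    """Model matching: search a slightly enlarged window of the next frame
    for the target template taken from the previous frame."""
    x, y, w, h = box
    template = prev_frame[y:y + h, x:x + w]
    H, W = next_frame.shape
    x0, y0 = max(x - margin, 0), max(y - margin, 0)
    x1, y1 = min(x + w + margin, W), min(y + h + margin, H)
    search = next_frame[y0:y1, x0:x1]
    score = cv2.matchTemplate(search, template, cv2.TM_CCORR_NORMED)
    _, _, _, (dx, dy) = cv2.minMaxLoc(score)
    return (x0 + dx, y0 + dy, w, h)
```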
5.3.3.3 Fusion Rules in the Target Region
To preserve as much information as possible in the target region, a special fusion rule should be employed there. Assume that target detection gives $M$ target region maps $T_{IR} = \{t_{IR}^1, t_{IR}^2, \ldots, t_{IR}^M\}$ in the IR frame and $N$ target region maps $T_V = \{t_V^1, t_V^2, \ldots, t_V^N\}$ in the corresponding visible frame. Each target map is down-sampled by $2^m$ (according to the resolution of the decomposition coefficients) to give a decimated target map at each level. The target maps of the two source frames are analyzed jointly, $T_J = T_{IR} \cup T_V$. The frame is segmented into three sets: single target regions, overlapped target regions, and the background region. Overlapped target regions are defined as $T_O = T_{IR} \cap T_V$, and single target regions are the target regions with no overlap, $T_S = T_J \setminus T_O$; clearly $T_J = T_S \cup T_O$. Background regions are defined as $B = \overline{T_J}$. In the single target regions, the fusion rule can be written as

$$
c_f(x,y) =
\begin{cases}
c_{ir}(x,y), & \text{if } (x,y) \in T_{IR} \\
c_v(x,y), & \text{if } (x,y) \in T_V
\end{cases}
\tag{5.1}
$$
In a connected overlapped target region $t \in T_O$, a similarity measure between the two sources is defined as

$$
M(t) = \frac{2 \sum_{(x,y) \in t} I_{ir}(x,y)\, I_v(x,y)}{\sum_{(x,y) \in t} \left[ I_{ir}(x,y) \right]^2 + \sum_{(x,y) \in t} \left[ I_v(x,y) \right]^2}
\tag{5.2}
$$
where $I_{ir}$ and $I_v$ denote the IR and visible frames, respectively. Then an energy index of the coefficients within the overlapped region is computed in the IR and visible frames, respectively:
$$
S_i(t) = \sum_{(x,y) \in t} c_i(x,y)^2
\tag{5.3}
$$
where $t \in T_O$ and $i = ir, v$ refer to the IR and visible frames, respectively. A similarity threshold $\alpha$ is introduced, where $\alpha \in [0, 1]$; normally $\alpha = 0.85$ is appropriate. When $M(t) < \alpha$, the fusion rule in an overlapped target region can be written as
$$
c_f(x,y) =
\begin{cases}
c_{ir}(x,y), & \text{if } S_{ir}(t) \ge S_v(t) \\
c_v(x,y), & \text{otherwise}
\end{cases}
\tag{5.4}
$$
When $M(t) \ge \alpha$, a weighted-average method is adopted:
$$
c_f(x,y) =
\begin{cases}
\varpi_{\max}(t)\, c_{ir}(x,y) + \varpi_{\min}(t)\, c_v(x,y), & \text{if } S_{ir}(t) \ge S_v(t) \\
\varpi_{\min}(t)\, c_{ir}(x,y) + \varpi_{\max}(t)\, c_v(x,y), & \text{if } S_{ir}(t) < S_v(t)
\end{cases}
\tag{5.5}
$$
where the weights $\varpi_{\min}(t)$ and $\varpi_{\max}(t)$ are obtained as

$$
\varpi_{\min}(t) = \frac{1}{2}\left(1 - \frac{1 - M(t)}{1 - \alpha}\right), \qquad \varpi_{\max}(t) = 1 - \varpi_{\min}(t).
$$
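A direct numpy transcription of Eqs. (5.1)–(5.5) for one sub-band is sketched below. It assumes registered IR and visible frames (as float arrays), coefficient arrays at the same resolution as the already decimated target maps, and an integer label map of the connected overlapped regions (such a map could be obtained, for instance, with scipy.ndimage.label applied to $T_{IR} \cap T_V$). The background coefficients are left untouched here, since they are fused by the separate background rule.

```python
import numpy as np

def fuse_target_regions(c_ir, c_v, I_ir, I_v, T_ir, T_v, overlap_labels,
                        alpha=0.85, eps=1e-12):
    """Apply Eqs. (5.1)-(5.5) to one sub-band. T_ir/T_v are boolean single-sensor
    target maps; overlap_labels > 0 indexes the connected overlapped regions."""
    c_f = np.zeros_like(c_ir)                        # background filled elsewhere
    overlap = overlap_labels > 0
    # Eq. (5.1): single-target regions copy the detecting sensor's coefficients.
    only_ir, only_v = T_ir & ~overlap, T_v & ~overlap
    c_f[only_ir] = c_ir[only_ir]
    c_f[only_v] = c_v[only_v]
    # Eqs. (5.2)-(5.5): treat each connected overlapped region t separately.
    for t in range(1, int(overlap_labels.max()) + 1):
        m = overlap_labels == t
        M = 2.0 * np.sum(I_ir[m] * I_v[m]) / (np.sum(I_ir[m]**2) + np.sum(I_v[m]**2) + eps)
        S_ir, S_v = np.sum(c_ir[m]**2), np.sum(c_v[m]**2)        # Eq. (5.3)
        if M < alpha:                                            # Eq. (5.4): select
            c_f[m] = c_ir[m] if S_ir >= S_v else c_v[m]
        else:                                                    # Eq. (5.5): weighted average
            w_min = 0.5 * (1.0 - (1.0 - M) / (1.0 - alpha))
            w_max = 1.0 - w_min
            hi, lo = (c_ir, c_v) if S_ir >= S_v else (c_v, c_ir)
            c_f[m] = w_max * hi[m] + w_min * lo[m]
    return c_f
```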